graphrag
Microsoft GraphRAG Skill
Expert assistance for using Microsoft GraphRAG, a modular graph-based Retrieval-Augmented Generation system that extracts structured knowledge from unstructured text to enhance LLM reasoning over private data.
When to Use This Skill
This skill should be used when:
- Building RAG systems that need to "connect the dots" across dispersed information
- Querying large document collections holistically
- Extracting structured knowledge graphs from unstructured text
- Implementing graph-based retrieval for LLM applications
- Processing private datasets with enhanced reasoning capabilities
- Working with narrative, unstructured documents
- Building question-answering systems over document corpora
- Extracting entities, relationships, and claims from text
- Creating hierarchical knowledge summaries
- Implementing multi-hop reasoning over documents
- Comparing GraphRAG with traditional vector-based RAG
- Tuning prompts for domain-specific datasets
- Configuring indexing pipelines for knowledge extraction
Overview
What is GraphRAG?
Microsoft GraphRAG is a data pipeline and transformation system that:
- Extracts meaningful, structured data from unstructured text using LLMs
- Builds knowledge graph memory structures
- Enhances LLM outputs through graph-based retrieval
- Supports private data processing without external exposure
Core Innovation:
"GraphRAG addresses fundamental limitations of baseline RAG: connecting the dots across disparate information pieces and holistically understanding summarized concepts over large collections."
Key Differentiators from Baseline RAG
Traditional vector-based RAG has limitations:
- ❌ Struggles to connect information across multiple documents
- ❌ Limited holistic understanding of document collections
- ❌ Misses relationships between dispersed facts
- ❌ Poor performance on "summarize the corpus" queries
GraphRAG solves these with:
- ✅ Knowledge graph extraction from text
- ✅ Hierarchical community detection
- ✅ Multi-level summarization
- ✅ Graph-based reasoning and traversal
- ✅ Better performance on complex queries
Core Concepts
1. Knowledge Graph Extraction
GraphRAG extracts three primary elements:
Entities: Objects, people, places, concepts
Examples:
- "Microsoft" (Organization)
- "Seattle" (Location)
- "Cloud Computing" (Concept)
- "Satya Nadella" (Person)
Relationships: Connections between entities
Examples:
- Microsoft → headquartered_in → Seattle
- Satya Nadella → is_CEO_of → Microsoft
- Microsoft → provides → Cloud Computing
Claims: Factual statements with supporting evidence
Examples:
- "Microsoft is the largest software company" [Source: Document X, Page 5]
- "Azure revenue grew 30% in Q4" [Source: Earnings Report]
2. Hierarchical Community Detection
GraphRAG uses the Leiden algorithm to:
- Cluster related entities into communities
- Create hierarchical levels of organization
- Generate summaries at each level
- Enable bottom-up reasoning
Example Hierarchy:
Level 0 (Detailed):
Community 1: Azure services (Compute, Storage, Networking)
Community 2: Office products (Word, Excel, PowerPoint)
Level 1 (Mid-level):
Community A: Cloud services (includes Community 1)
Community B: Productivity tools (includes Community 2)
Level 2 (High-level):
Community X: Microsoft product ecosystem (includes A & B)
3. TextUnits
Documents are segmented into TextUnits:
- Manageable chunks for analysis
- Sized based on token limits
- Overlapping to preserve context
- Form the basis of entity extraction
4. Query Modes
GraphRAG offers multiple search strategies:
Global Search: Holistic corpus reasoning
- Best for: "Summarize the main themes"
- Uses: Community summaries at all levels
- Method: Bottom-up aggregation
Local Search: Entity-specific reasoning
- Best for: "Tell me about Entity X"
- Uses: Entity neighborhoods in graph
- Method: Traversal from seed entities
DRIFT Search: Entity reasoning with community context
- Best for: "How does X relate to broader themes?"
- Uses: Entities + community summaries
- Method: Hybrid approach
Basic Search: Traditional vector similarity
- Best for: Simple semantic matching
- Uses: Embedding similarity
- Method: Baseline RAG fallback
Installation
Prerequisites
# Python 3.10 or higher required
python --version
# Install GraphRAG
pip install graphrag
# Or install from source
git clone https://github.com/microsoft/graphrag.git
cd graphrag
pip install -e .
Environment Setup
# Create environment file
cat > .env << EOF
# LLM Configuration (OpenAI)
GRAPHRAG_LLM_API_KEY=your-openai-api-key
GRAPHRAG_LLM_TYPE=openai_chat
GRAPHRAG_LLM_MODEL=gpt-4o
# Embedding Configuration
GRAPHRAG_EMBEDDING_API_KEY=your-openai-api-key
GRAPHRAG_EMBEDDING_TYPE=openai_embedding
GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
# Optional: Azure OpenAI
# GRAPHRAG_LLM_API_BASE=https://your-resource.openai.azure.com
# GRAPHRAG_LLM_API_VERSION=2024-02-15-preview
# GRAPHRAG_LLM_DEPLOYMENT_NAME=gpt-4
# Optional: Local models
# GRAPHRAG_LLM_TYPE=ollama
# GRAPHRAG_LLM_API_BASE=http://localhost:11434
EOF
Quick Start
1. Initialize Project
# Create new GraphRAG project
mkdir my-graphrag-project
cd my-graphrag-project
# Initialize configuration
graphrag init --root .
# This creates:
# - settings.yaml (configuration)
# - .env (environment variables)
# - prompts/ (customizable prompts)
2. Prepare Your Data
# Create input directory
mkdir -p input
# Add your documents
cp /path/to/documents/*.txt input/
# Supported formats: .txt, .pdf, .docx, .md
# Each file will be processed independently
3. Run Indexing Pipeline
# Index your data (this can take time and cost money!)
graphrag index --root .
# The indexing process will:
# 1. Load and chunk documents
# 2. Extract entities, relationships, claims
# 3. Build knowledge graph
# 4. Detect communities (Leiden algorithm)
# 5. Generate community summaries
# 6. Create embeddings
# 7. Store results in output/
# Monitor progress
graphrag index --root . --verbose
4. Query Your Data
# Global Search (holistic queries)
graphrag query --root . \
--method global \
--query "What are the main themes in this dataset?"
# Local Search (entity-specific queries)
graphrag query --root . \
--method local \
--query "Tell me about Microsoft's cloud strategy"
# DRIFT Search (entity + community context)
graphrag query --root . \
--method drift \
--query "How does Azure relate to the broader Microsoft ecosystem?"
Configuration
settings.yaml Structure
# Core Configuration
llm:
api_key: ${GRAPHRAG_LLM_API_KEY}
type: openai_chat # or azure_openai_chat, ollama
model: gpt-4o
max_tokens: 4000
temperature: 0
top_p: 1
embeddings:
api_key: ${GRAPHRAG_EMBEDDING_API_KEY}
type: openai_embedding
model: text-embedding-3-small
# Chunking Configuration
chunks:
size: 1200 # Token size per chunk
overlap: 100 # Overlap between chunks
group_by_columns: [id]
# Entity Extraction
entity_extraction:
prompt: "prompts/entity_extraction.txt"
max_gleanings: 1 # Re-extraction passes
entity_types: [organization, person, location, event]
# Community Detection
community_reports:
prompt: "prompts/community_report.txt"
max_length: 2000
max_input_length: 8000
# Claim Extraction
claim_extraction:
enabled: true
prompt: "prompts/claim_extraction.txt"
max_gleanings: 1
# Embeddings
embed_graph:
enabled: true
strategy: node2vec # or deepwalk
# Storage
storage:
type: file # or blob, cosmosdb
base_dir: output
# Reporting
reporting:
type: file
base_dir: output/reports
Advanced Configuration Options
# Custom LLM Configuration
llm:
type: azure_openai_chat
api_base: https://your-resource.openai.azure.com
api_version: "2024-02-15-preview"
deployment_name: gpt-4
api_key: ${AZURE_OPENAI_API_KEY}
request_timeout: 180
max_retries: 10
max_retry_wait: 10
# Parallelization
parallelization:
stagger: 0.3 # Delay between requests
num_threads: 4 # Concurrent workers
# Cache Configuration
cache:
type: file
base_dir: cache
# Input Configuration
input:
type: file
file_type: text # or csv, parquet
base_dir: input
encoding: utf-8
file_pattern: ".*\\.txt$"
Prompt Tuning
Why Tune Prompts?
"Using GraphRAG with your data out of the box may not yield the best possible results."
Domain-specific datasets require custom prompts for:
- Relevant entity types
- Appropriate relationship types
- Domain-specific language
- Expected output format
Auto-Tuning Process
# Generate domain-adapted prompts
graphrag prompt-tune --root . \
--config settings.yaml \
--output prompts/
# This will:
# 1. Analyze your input documents
# 2. Identify domain-specific patterns
# 3. Generate custom entity extraction prompts
# 4. Generate custom summarization prompts
# 5. Save to prompts/ directory
Manual Prompt Customization
# Edit generated prompts
nano prompts/entity_extraction.txt
Example Entity Extraction Prompt:
-Target activity-
You are an AI assistant helping to identify entities in documents about {DOMAIN}.
-Goal-
Extract all entities and relationships from the text below.
Entity Types:
{ENTITY_TYPES}
Relationship Types:
{RELATIONSHIP_TYPES}
Format your response as JSON:
{{
"entities": [
{{"name": "Entity Name", "type": "ENTITY_TYPE", "description": "..."}}
],
"relationships": [
{{"source": "Entity 1", "target": "Entity 2", "type": "RELATIONSHIP_TYPE", "description": "..."}}
]
}}
Text to analyze:
{INPUT_TEXT}
Indexing Pipeline Deep Dive
Step-by-Step Process
1. Document Loading
# Input documents are loaded from input/ directory
# Supported formats: .txt, .pdf, .docx, .md
2. Text Chunking
# Documents split into TextUnits
# Default: 1200 tokens with 100 token overlap
# Preserves context across chunk boundaries
3. Entity Extraction
# For each TextUnit:
# - Extract entities (with types and descriptions)
# - Extract relationships (with types and weights)
# - Extract claims (with sources and confidence)
4. Graph Construction
# Build knowledge graph:
# - Nodes = Entities
# - Edges = Relationships
# - Properties = Attributes and metadata
5. Community Detection
# Leiden algorithm for hierarchical clustering:
# - Level 0: Fine-grained communities
# - Level 1: Mid-level aggregations
# - Level 2+: High-level themes
6. Community Summarization
# For each community at each level:
# - Aggregate entity and relationship info
# - Generate natural language summary
# - Store for query-time retrieval
7. Embedding Generation
# Create vector embeddings for:
# - TextUnits (for similarity search)
# - Entities (for semantic matching)
# - Community summaries (for global search)
8. Output Storage
# Results saved to output/:
# - create_final_entities.parquet
# - create_final_relationships.parquet
# - create_final_communities.parquet
# - create_final_community_reports.parquet
# - create_final_text_units.parquet
Query Modes in Detail
Global Search
Best For:
- "What are the main themes?"
- "Summarize the entire dataset"
- "What are the key trends?"
How It Works:
- Query is matched against community summaries
- Relevant communities selected at all hierarchy levels
- Summaries aggregated bottom-up
- Final answer synthesized from multiple levels
Example:
graphrag query --root . \
--method global \
--query "What are the major technology trends discussed in these documents?"
# Behind the scenes:
# 1. Match query to relevant communities
# 2. Retrieve summaries from levels 0, 1, 2
# 3. Aggregate: AI/ML, Cloud, Cybersecurity communities
# 4. Synthesize comprehensive answer
Python API:
from graphrag.query import GlobalSearch
searcher = GlobalSearch(
llm=llm,
context_builder=context_builder,
map_system_prompt=map_prompt,
reduce_system_prompt=reduce_prompt
)
result = await searcher.asearch(
query="What are the major themes?",
conversation_history=[]
)
print(result.response)
Local Search
Best For:
- "Tell me about [specific entity]"
- "What is the relationship between X and Y?"
- "Find information about [topic]"
How It Works:
- Identify entities mentioned in query
- Traverse graph from those entities
- Collect neighborhood information (N-hop)
- Retrieve associated TextUnits
- Synthesize answer from local context
Example:
graphrag query --root . \
--method local \
--query "What is Microsoft's strategy for artificial intelligence?"
# Behind the scenes:
# 1. Identify: "Microsoft", "artificial intelligence" entities
# 2. Traverse: Find related entities (Azure AI, OpenAI partnership, etc.)
# 3. Collect: Relationships, claims, TextUnits
# 4. Synthesize: Answer from local graph neighborhood
Python API:
from graphrag.query import LocalSearch
searcher = LocalSearch(
llm=llm,
context_builder=context_builder,
system_prompt=system_prompt
)
result = await searcher.asearch(
query="Tell me about Microsoft's AI strategy",
conversation_history=[]
)
print(result.response)
DRIFT Search
Best For:
- "How does [entity] fit into [broader context]?"
- "What is the significance of [topic]?"
- Hybrid queries needing both local and global context
How It Works:
- Identify query entities (like Local Search)
- Find relevant communities (like Global Search)
- Combine entity neighborhoods with community summaries
- Synthesize answer from both perspectives
Example:
graphrag query --root . \
--method drift \
--query "How does Azure AI relate to Microsoft's overall cloud strategy?"
# Behind the scenes:
# 1. Local: Find "Azure AI" entity and neighborhood
# 2. Global: Find "cloud strategy" community summaries
# 3. Combine: Entity details + strategic context
# 4. Synthesize: Comprehensive answer
Python API Usage
Basic Setup
import asyncio
from graphrag.query import LocalSearch, GlobalSearch
from graphrag.llm import create_openai_chat_llm
from graphrag.config import GraphRagConfig
# Load configuration
config = GraphRagConfig.from_file("settings.yaml")
# Create LLM
llm = create_openai_chat_llm(
api_key=config.llm.api_key,
model=config.llm.model,
temperature=0.0
)
Custom Indexing
from graphrag.index import run_pipeline_with_config
# Run indexing programmatically
await run_pipeline_with_config(
config_path="settings.yaml",
verbose=True
)
Advanced Query Customization
from graphrag.query.context_builder import LocalContextBuilder
# Build custom context
context_builder = LocalContextBuilder(
entities=entities_df,
relationships=relationships_df,
text_units=text_units_df,
embeddings=embeddings
)
# Custom search with parameters
result = await searcher.asearch(
query="Your question here",
conversation_history=[
{"role": "user", "content": "Previous question"},
{"role": "assistant", "content": "Previous answer"}
],
top_k=10, # Number of results
temperature=0.5, # LLM creativity
max_tokens=2000 # Response length
)
# Access detailed results
print("Response:", result.response)
print("Context used:", result.context_data)
print("Sources:", result.sources)
Use Cases and Examples
1. Research Paper Analysis
# Index academic papers
mkdir -p input/papers
cp research_papers/*.pdf input/papers/
graphrag index --root .
# Global query
graphrag query --method global \
--query "What are the main research themes across these papers?"
# Local query
graphrag query --method local \
--query "What methodologies does the Smith et al. paper use?"
2. Legal Document Processing
# Index legal contracts
mkdir -p input/contracts
cp contracts/*.docx input/contracts/
# Tune prompts for legal domain
graphrag prompt-tune --root . --domain "legal contracts"
# Index with legal-specific entities
graphrag index --root .
# Query
graphrag query --method local \
--query "What are the termination clauses in the Microsoft contracts?"
3. Customer Feedback Analysis
# Index customer feedback
mkdir -p input/feedback
cp feedback_*.txt input/feedback/
# Global themes
graphrag query --method global \
--query "What are the main customer pain points?"
# Specific product feedback
graphrag query --method local \
--query "What feedback relates to product X features?"
4. News Article Summarization
# Index news articles
mkdir -p input/news
cp articles/*.txt input/news/
graphrag index --root .
# Get comprehensive summary
graphrag query --method global \
--query "Summarize the key events and trends from these news articles"
# Entity-specific news
graphrag query --method local \
--query "What news relates to climate change initiatives?"
Advanced Features
1. Incremental Indexing
# Initial indexing
graphrag index --root .
# Add new documents
cp new_documents/*.txt input/
# Re-index only new content
graphrag index --root . --incremental
# Note: Full graph may need periodic rebuilding
2. Custom Entity Types
Edit prompts/entity_extraction.txt:
Entity Types:
- PRODUCT: Software products, services
- FEATURE: Product features and capabilities
- TECHNOLOGY: Technologies and frameworks
- METRIC: Performance metrics, KPIs
- INITIATIVE: Projects and strategic initiatives
- COMPETITOR: Competing products or companies
3. Multi-Language Support
# settings.yaml
input:
encoding: utf-8
language: es # Spanish
llm:
model: gpt-4o # Multilingual model
# Customize prompts in target language
4. Azure OpenAI Integration
llm:
type: azure_openai_chat
api_base: https://your-resource.openai.azure.com
api_version: "2024-02-15-preview"
deployment_name: gpt-4
api_key: ${AZURE_OPENAI_API_KEY}
embeddings:
type: azure_openai_embedding
api_base: https://your-resource.openai.azure.com
api_version: "2024-02-15-preview"
deployment_name: text-embedding-3-small
api_key: ${AZURE_OPENAI_API_KEY}
5. Local LLM Support (Ollama)
llm:
type: ollama
api_base: http://localhost:11434
model: llama3:70b
temperature: 0
embeddings:
type: ollama
api_base: http://localhost:11434
model: nomic-embed-text
Cost Management
Understanding Costs
GraphRAG uses LLM APIs which incur costs:
Indexing Phase (most expensive):
- Entity extraction: Multiple LLM calls per TextUnit
- Relationship extraction: Additional calls
- Community summarization: Calls per community
- Embedding generation: Per entity/TextUnit
Query Phase (less expensive):
- Context retrieval: Minimal LLM use
- Answer synthesis: Single LLM call per query
Cost Optimization Strategies
1. Reduce Chunk Size
chunks:
size: 600 # Smaller chunks = fewer tokens
overlap: 50
2. Limit Entity Extraction Passes
entity_extraction:
max_gleanings: 0 # 0 = single pass, 1 = two passes
3. Use Smaller Models
llm:
model: gpt-4o-mini # Cheaper than gpt-4o
embeddings:
model: text-embedding-3-small # Cheaper than large
4. Process Subset First
# Test on small sample
mkdir input/sample
cp input/full/*.txt input/sample/ | head -5
graphrag index --root . --input-dir input/sample
5. Cache Aggressively
cache:
type: file
base_dir: cache
Cost Estimation
# Estimate before indexing
from graphrag.index import estimate_index_cost
cost_estimate = estimate_index_cost(
input_dir="input/",
config_path="settings.yaml"
)
print(f"Estimated cost: ${cost_estimate.total_cost}")
print(f"Total tokens: {cost_estimate.total_tokens}")
print(f"Estimated time: {cost_estimate.estimated_hours} hours")
Best Practices
1. Start Small
# Test with 5-10 documents first
# Validate outputs before scaling
# Tune prompts on small sample
# Then scale to full dataset
2. Monitor Indexing Progress
# Use verbose mode
graphrag index --root . --verbose
# Check output files periodically
ls -lh output/*.parquet
# Monitor logs
tail -f output/reports/indexing.log
3. Version Control Configuration
# Track changes
git add settings.yaml prompts/
git commit -m "Update entity types for domain X"
# Tag successful configurations
git tag -a v1.0-config -m "Working config for dataset X"
4. Validate Outputs
import pandas as pd
# Check extracted entities
entities = pd.read_parquet("output/create_final_entities.parquet")
print(f"Total entities: {len(entities)}")
print(f"Entity types: {entities['type'].value_counts()}")
# Check relationships
relationships = pd.read_parquet("output/create_final_relationships.parquet")
print(f"Total relationships: {len(relationships)}")
print(f"Relationship types: {relationships['type'].value_counts()}")
# Check communities
communities = pd.read_parquet("output/create_final_communities.parquet")
print(f"Total communities: {len(communities)}")
print(f"Hierarchy levels: {communities['level'].value_counts()}")
5. Iterate on Prompts
# Run initial index
graphrag index --root .
# Evaluate quality
graphrag query --method global --query "Test query"
# If quality is poor:
# 1. Adjust entity types in prompts
# 2. Modify extraction instructions
# 3. Re-run indexing
# 4. Validate improvements
Troubleshooting
Common Issues
"API rate limit exceeded"
# Add delays between requests
parallelization:
stagger: 1.0 # Increase delay
num_threads: 2 # Reduce concurrency
llm:
max_retries: 20 # More retries
max_retry_wait: 60 # Longer backoff
"Out of memory during indexing"
# Reduce batch sizes
chunks:
size: 600 # Smaller chunks
parallelization:
num_threads: 2 # Less parallelism
"Poor quality entity extraction"
# Run prompt tuning
graphrag prompt-tune --root . --domain "your domain"
# Manually refine prompts
nano prompts/entity_extraction.txt
# Add domain-specific examples
# Specify expected entity types clearly
"Queries return irrelevant results"
# Check if indexing completed successfully
ls -lh output/*.parquet
# Validate extracted entities
python -c "import pandas as pd; print(pd.read_parquet('output/create_final_entities.parquet').head())"
# Try different query methods
graphrag query --method local --query "Your query"
graphrag query --method global --query "Your query"
"Version incompatibility after update"
# Reinitialize configuration
graphrag init --root . --force
# This updates settings.yaml to new schema
# Review and merge your customizations
Performance Optimization
Indexing Performance
# Optimize for speed
parallelization:
num_threads: 8 # Max concurrent workers
stagger: 0.1 # Minimal delay
chunks:
size: 1500 # Larger chunks (fewer API calls)
entity_extraction:
max_gleanings: 0 # Single pass only
Query Performance
# Cache query results
from functools import lru_cache
@lru_cache(maxsize=100)
def cached_query(query_text):
return searcher.search(query_text)
# Pre-load data structures
entities_df = pd.read_parquet("output/create_final_entities.parquet")
relationships_df = pd.read_parquet("output/create_final_relationships.parquet")
# Keep in memory for fast access
Storage Optimization
# Use compressed storage
storage:
type: file
compression: gzip # Or snappy, lz4
# Or use database storage
storage:
type: cosmosdb
connection_string: ${COSMOS_CONNECTION_STRING}
Integration Examples
LangChain Integration
from langchain.retrievers import GraphRAGRetriever
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
# Create GraphRAG retriever
retriever = GraphRAGRetriever(
index_path="output/",
search_method="local"
)
# Build QA chain
llm = ChatOpenAI(model="gpt-4o")
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever,
return_source_documents=True
)
# Query
result = qa_chain("What are the main themes?")
print(result["answer"])
FastAPI Service
from fastapi import FastAPI
from graphrag.query import LocalSearch, GlobalSearch
app = FastAPI()
# Initialize searchers
local_searcher = LocalSearch(...)
global_searcher = GlobalSearch(...)
@app.post("/query/local")
async def query_local(query: str):
result = await local_searcher.asearch(query)
return {"response": result.response, "sources": result.sources}
@app.post("/query/global")
async def query_global(query: str):
result = await global_searcher.asearch(query)
return {"response": result.response}
# Run: uvicorn main:app --reload
Streamlit UI
import streamlit as st
from graphrag.query import GlobalSearch
st.title("GraphRAG Query Interface")
# Query input
query = st.text_input("Enter your question:")
method = st.selectbox("Search method:", ["global", "local", "drift"])
if st.button("Search"):
with st.spinner("Searching..."):
# Run query
result = await searcher.asearch(query)
# Display results
st.write("### Answer")
st.write(result.response)
st.write("### Sources")
st.write(result.sources)
Comparison with Other Approaches
GraphRAG vs. Vector RAG
| Feature | Vector RAG | GraphRAG |
|---|---|---|
| Structure | Flat embeddings | Knowledge graph |
| Relationships | Implicit (similarity) | Explicit (edges) |
| Multi-hop | Poor | Excellent |
| Summarization | Difficult | Natural (communities) |
| Setup Cost | Low | High (indexing) |
| Query Cost | Low | Medium |
| Best For | Simple lookups | Complex reasoning |
When to Use GraphRAG
✅ Use GraphRAG when:
- Queries require connecting multiple pieces of information
- Need holistic understanding of document corpus
- Relationships between entities matter
- Multi-hop reasoning is important
- Domain has rich entity/relationship structure
❌ Use Vector RAG when:
- Simple semantic search is sufficient
- Low setup cost is priority
- Documents are independent
- Queries are straightforward lookups
- Budget is constrained
Resources
Documentation
- Official Docs: https://microsoft.github.io/graphrag/
- GitHub: https://github.com/microsoft/graphrag
- Research Paper: https://arxiv.org/abs/2404.16130
Community
- GitHub Discussions: https://github.com/microsoft/graphrag/discussions
- Issues: https://github.com/microsoft/graphrag/issues
Examples
- Notebooks: https://github.com/microsoft/graphrag/tree/main/examples
- Sample Configs: https://github.com/microsoft/graphrag/tree/main/examples/configs
Important Notes
⚠️ Not an Official Microsoft Product
"This codebase is a demonstration of graph-based RAG and not an officially supported Microsoft offering."
💰 Cost Considerations
- Indexing can be expensive (especially with GPT-4)
- Test on small samples first
- Monitor API costs closely
🔄 Version Management
- Configuration schemas change between versions
- Run
graphrag init --forceafter updates - Review migration guides for breaking changes
🎯 Prompt Tuning is Critical
- Out-of-box results may be suboptimal
- Domain-specific tuning significantly improves quality
- Invest time in prompt customization
License
Microsoft GraphRAG is released under the MIT License.
Note: This skill provides comprehensive guidance for using Microsoft GraphRAG. Always test on small datasets first, monitor costs, and tune prompts for your specific domain.