GraphRAG: From Data to Decisions
GraphRAG (Graph Retrieval-Augmented Generation) provides the AI question-answering interface that grounds LLM responses in evidence from your knowledge graph, enabling policy-relevant queries with traceable reasoning.
What This Document Covers:
- What GraphRAG is and why it matters
- How it prevents AI hallucinations
- Integration with knowledge graph and Bayesian networks
- Real policy question examples
- Why this beats regular ChatGPT for digital equity
Audience: Policymakers, navigators, anyone who needs evidence-based answers
The Core Problem: AI Hallucination
Regular LLM (like ChatGPT without RAG):
User: "Should Baraga County prioritize infrastructure or digital navigators?"
ChatGPT: "Baraga County should consider both infrastructure and digital
navigators as they complement each other. Infrastructure provides
the foundation while navigators help with adoption..."
Problems:
- ❌ Generic answer (could apply to ANY county)
- ❌ No data about Baraga County specifically
- ❌ No citation of research evidence
- ❌ Possible hallucination (making up “facts”)
- ❌ Can’t trace reasoning
GraphRAG (your system):
User: "Should Baraga County prioritize infrastructure or digital navigators?"
GraphRAG:
1. Queries knowledge graph for Baraga County data
2. Finds: Connectivity=0.58, Application=0.42, Skills=0.38
3. Retrieves relevant research: Hampton & Bauer "unclear value" finding
4. Retrieves theory: Appadurai's navigation capacity
5. Queries Bayesian network for predictions
6. Finds similar counties and their outcomes
Response: "PRIORITIZE DIGITAL NAVIGATORS for Baraga County. Here's why:
DATA: Baraga has moderate infrastructure (Connectivity: 0.58) but LOW
aspiration capacity (Application: 0.42). The gap is -0.16.
THEORY: Appadurai (2004) explains this as 'thin aspirational maps'—
infrastructure exists but people can't navigate toward digital futures.
EVIDENCE: Hampton & Bauer (2020, doi:10.25335/BZGY-3V91) found this
'unclear value' gap in Michigan K-12 schools. Infrastructure
alone didn't close performance gaps.
PREDICTION: Bayesian model predicts $2M navigator program will:
- Improve Application: 0.42 → 0.68 (+0.26)
- Improve Skills: 0.38 → 0.55 (+0.17)
- Increase P(Equity): 0.31 → 0.62 (+0.31)
COMPARABLE: Ontonagon County (similar profile) deployed navigators in 2022.
Result: Application improved 0.44 → 0.69 over 18 months.
RECOMMENDATION: Deploy $2M navigator program (2-year) alongside
existing infrastructure investments. Expected ROI:
+0.31 equity probability for $2M = 0.155 per $M.
CITATIONS: [Appadurai 2004, Hampton & Bauer 2020, Dagg et al. 2023]"
Differences:
- ✅ Data-specific (Baraga County Compass scores)
- ✅ Theory-grounded (Appadurai cited)
- ✅ Evidence-validated (Hampton & Bauer research)
- ✅ Prediction-informed (Bayesian network)
- ✅ Comparison-based (similar county outcomes)
- ✅ Fully traceable (can verify every claim)
This is the power of GraphRAG!
What is GraphRAG?
Simple definition:
GraphRAG = Retrieval-Augmented Generation using a knowledge graph
Breaking it down:
Retrieval: Find relevant information from your knowledge graph
Augmented: Add that information to the LLM prompt
Generation: LLM generates answer based on retrieved facts
Graph: Knowledge graph provides structured, interconnected data
The process:
1. User asks question
↓
2. Parse question to identify entities (Baraga County, navigators)
↓
3. Query knowledge graph for relevant data
↓
4. Retrieve: County scores, research, theory, predictions, comparisons
↓
5. Construct enriched prompt: Question + Retrieved context
↓
6. LLM generates answer grounded in retrieved facts
↓
7. Return answer with citations/sources
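The seven steps above can be sketched as a minimal pipeline. Everything here is illustrative: the entity vocabulary, the in-memory "graph," and the helper names are stand-ins for the real NLP, Cypher, and LLM components.

```python
# Minimal sketch of the seven-step GraphRAG pipeline (illustrative stand-ins only).

def extract_entities(question: str) -> list[str]:
    # Step 2: naive entity spotting against a known vocabulary
    known = ["Baraga County", "navigators", "infrastructure"]
    return [e for e in known if e.lower() in question.lower()]

def retrieve_context(entities: list[str]) -> dict:
    # Steps 3-4: stand-in for Cypher queries against the knowledge graph
    store = {"Baraga County": {"Connectivity": 0.58, "Application": 0.42}}
    return {e: store[e] for e in entities if e in store}

def build_prompt(question: str, context: dict) -> str:
    # Step 5: enrich the prompt with retrieved facts
    return f"QUESTION: {question}\nCONTEXT: {context}\nAnswer ONLY from context."

def answer(question: str) -> str:
    # Steps 6-7 would send the prompt to an LLM; here we return the grounded prompt
    entities = extract_entities(question)
    context = retrieve_context(entities)
    return build_prompt(question, context)

print(answer("Should Baraga County prioritize infrastructure or navigators?"))
```

The point of the sketch is that the LLM never sees the question alone; it only ever sees the question plus whatever the graph returned.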
How GraphRAG Prevents Hallucination:
Problem: LLMs Make Things Up
Example hallucination:
User: "What's Baraga County's digital literacy score?"
Regular LLM: "Baraga County's digital literacy score is approximately
3.2 out of 5, based on recent assessments."
Issue: LLM doesn’t KNOW this—it’s guessing based on patterns, might be completely wrong
Solution: Ground in Retrieved Facts
GraphRAG process:
# 1. Parse query
user_query = "What's Baraga County's digital literacy score?"
entities = extract_entities(user_query)
# entities = ["Baraga County", "digital literacy"]

# 2. Query knowledge graph
result = graph.query("""
    MATCH (c:County {name: 'Baraga County'})-[s:HAS_SCORE]->(comp:Component {name: 'Skills'})
    RETURN s.value, s.year, s.source
""")

# 3. Check if data exists
if result:
    context = f"Baraga County's Skills component score: {result['s.value']} (year: {result['s.year']}, source: {result['s.source']})"
else:
    context = "No digital literacy data found for Baraga County."

# 4. Augment prompt
prompt = f"""
User question: {user_query}

Relevant data from knowledge graph:
{context}

Instructions: Answer based ONLY on the provided data. If data is missing, say so. Do not guess.
"""

# 5. Generate answer
response = llm.generate(prompt)
GraphRAG response:
"Baraga County's Skills component score (which includes digital literacy)
is 0.38 out of 1.0, measured in 2024 using Dagg Compass methodology.
Source: Digital literacy assessment via MSU Quello Center."
Result:
- ✅ Factually accurate (from graph)
- ✅ Citable (source provided)
- ✅ Traceable (can verify in graph)
- ✅ No hallucination
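A simple way to enforce traceability is to check every numeric claim in a generated answer against the retrieved context. This sketch is illustrative (the regex and function name are assumptions, not part of the system): it flags any decimal number in the answer that does not appear in the grounding text.

```python
import re

# Illustrative traceability check: every numeric claim in the answer
# must also appear in the retrieved context, otherwise flag it.
def untraced_numbers(answer: str, context: str) -> list[str]:
    claimed = set(re.findall(r"\d+\.\d+", answer))
    grounded = set(re.findall(r"\d+\.\d+", context))
    return sorted(claimed - grounded)

context = "Baraga County Skills score: 0.38 (2024)"
grounded_answer = "The Skills score is 0.38 (2024)."
hallucinated_answer = "The Skills score is 3.2 out of 5."

assert untraced_numbers(grounded_answer, context) == []
assert untraced_numbers(hallucinated_answer, context) == ["3.2"]
```

A check like this can run as a post-generation guard: if any untraced numbers come back, the system can refuse the answer or re-prompt.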
Architecture of Your GraphRAG System:
┌─────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ "Should we invest in navigators for Baraga County?" │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ QUERY UNDERSTANDING │
│ - Parse question (NLP) │
│ - Extract entities: [Baraga County, navigators] │
│ - Identify question type: Investment decision │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH RETRIEVAL │
│ Cypher Query 1: Get county Compass scores │
│ Cypher Query 2: Get relevant theory (Appadurai) │
│ Cypher Query 3: Get evidence (Hampton & Bauer) │
│ Cypher Query 4: Get similar county outcomes │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ BAYESIAN NETWORK INFERENCE │
│ - Feed retrieved scores to Bayesian model │
│ - Predict: P(Equity | Current state) │
│ - Predict: P(Equity | With navigator intervention) │
│ - Calculate expected improvement │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ CONTEXT ASSEMBLY │
│ Combine: County data + Theory + Evidence + Prediction │
│ Structure: Data, Theory, Evidence, Prediction, Rec. │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LLM GENERATION │
│ Prompt: User question + Assembled context │
│ Generate: Evidence-grounded answer │
│ Format: With citations and reasoning │
└─────────────────────┬───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ RESPONSE WITH SOURCES │
│ Answer + Data citations + Theory references + │
│ Research DOIs + Prediction confidence │
└─────────────────────────────────────────────────────────┘
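The Query Understanding stage can be approximated with simple keyword rules; a production system would more likely use an LLM or a trained classifier. The keyword lists and intent labels below are assumptions for illustration only.

```python
# Illustrative intent classifier for the Query Understanding stage.
# Keyword lists are assumptions; a real system would use NLP or an LLM.
INTENT_KEYWORDS = {
    "investment": ["invest", "allocate", "budget", "prioritize"],
    "prediction": ["predict", "expected", "forecast"],
    "intervention": ["navigator", "training", "program"],
}

def classify_intent(question: str) -> str:
    q = question.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return "general"
```

The returned intent then selects which Cypher queries run in the retrieval stage (county scores, theory, evidence, or comparable outcomes).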
Real Examples: Policy Questions Answered
Example 1: Investment Decision
Question: “We have $5M for Upper Peninsula. How should we allocate it?”
GraphRAG retrieval:
// Get all UP counties' Compass scores
MATCH (c:County {region: 'Upper Peninsula'})-[s:HAS_SCORE]->(comp:Component)
RETURN c.name, comp.name, s.value
ORDER BY c.name, comp.name
// Get cost estimates for interventions
MATCH (i:Intervention)-[:TARGETS]->(stage:Stage)
RETURN i.type, stage.name, avg(i.cost_per_capita) AS avg_cost
// Get ROI from past interventions
MATCH (county:County)-[:RECEIVED_INTERVENTION]->(i:Intervention)
MATCH (county)-[:ACHIEVED_OUTCOME {year: i.year + 2}]->(outcome)
RETURN i.type, avg(outcome.improvement) AS avg_improvement, avg(i.budget) AS avg_cost
Bayesian prediction:
for county in UP_counties:
    scenarios = {
        'Infrastructure': predict_outcome(county, intervention='infra', budget=500_000),
        'Navigators': predict_outcome(county, intervention='nav', budget=200_000),
        'Training': predict_outcome(county, intervention='training', budget=150_000)
    }
    # Calculate ROI for each scenario
GraphRAG response:
"RECOMMENDED ALLOCATION for $5M across Upper Peninsula:
ANALYSIS:
- 14 UP counties assessed
- Connectivity average: 0.52 (moderate, rural infrastructure gaps)
- Application average: 0.38 (LOW—unclear value, thin aspiration maps)
- Skills average: 0.41 (moderate)
THEORY (Toyama 2015): Technology × Capacity = Outcome
Current: Infrastructure × Low capacity = Minimal impact
Need: Build capacity WHILE improving infrastructure
ALLOCATION:
$2.5M (50%): Fiber infrastructure (6 counties, lowest connectivity)
- Expected: Connectivity 0.45 → 0.68
- Addresses Sen's "resource" requirement
$1.5M (30%): Digital navigator programs (all 14 counties)
- Expected: Application 0.38 → 0.61 (+0.23)
- Addresses Appadurai's navigation capacity gap
$1.0M (20%): Training programs (focus: 8 counties, lowest skills)
- Expected: Skills 0.41 → 0.58 (+0.17)
- Addresses Dweck's growth mindset development
PREDICTED OUTCOME:
Regional P(Equity): 0.34 → 0.59 (+0.25)
Gini coefficient: 0.43 → 0.37 (-0.06, improvement)
EVIDENCE: Hampton & Bauer (2020) showed all three stages needed.
Infrastructure alone → minimal equity gain
Infrastructure + Navigators + Training → significant gain
CONFIDENCE: 78% (based on 12 similar rural regions, 2018-2024)
CITATIONS:
[Toyama 2015, Hampton & Bauer 2020 doi:10.25335/BZGY-3V91,
Dagg et al. 2023 Compass framework]"
Traceable: Every number, theory reference, and research citation verifiable in knowledge graph
Example 2: Theory Application
Question: “How does Appadurai’s theory apply to Marquette County?”
GraphRAG retrieval:
// Get Appadurai's theory
MATCH (a:Researcher {name: 'Arjun Appadurai'})-[:DEVELOPED]->(theory:Theory)
RETURN theory.name, theory.key_insight, theory.year
// Get Marquette County's relevant scores
MATCH (c:County {name: 'Marquette County'})-[s:HAS_SCORE]->(comp:Component)
WHERE comp.name IN ['Application', 'Connectivity']
RETURN comp.name, s.value
// Get evidence linking theory to stage
MATCH (theory)-[:GROUNDS]->(stage:Stage)<-[:VALIDATES]-(research:Research)
WHERE research.state = 'Michigan'
RETURN research.title, research.doi
GraphRAG response:
"Appadurai's 'Capacity to Aspire' (2004) is HIGHLY RELEVANT to Marquette County:
THEORY:
Aspiration is a navigational capacity learned through social interaction.
Wealthy people have 'thick' aspirational maps (many examples, much practice).
Poor people have 'thin' maps (few examples, limited practice).
Key insight: Can't navigate what you can't imagine.
MARQUETTE COUNTY DATA:
Connectivity: 0.71 (HIGH—infrastructure is good)
Application: 0.49 (MODERATE-LOW—use diversity limited)
Gap: -0.22 (infrastructure outpaces aspiration capacity)
APPADURAI'S EXPLANATION:
Marquette has access (Connectivity 0.71) BUT:
- Use cases limited (mostly entertainment, social media)
- Fewer models of transformative digital use
- 'Thin aspirational maps' for digital careers, telehealth, remote work
Result: Infrastructure underutilized because people can't NAVIGATE toward
valued digital futures.
HAMPTON & BAUER VALIDATION (2020):
Found this exact pattern in Michigan schools: infrastructure without
clear value perception = low engagement. This IS Appadurai's theory
empirically validated.
RECOMMENDATION:
Navigator program to 'thicken aspirational maps':
- Show relevant digital pathways (remote work, telehealth, online education)
- Provide community role models (people 'like me' succeeding)
- Support navigation practice (help people traverse digital pathways)
Expected: Application 0.49 → 0.68 over 18-24 months
(Based on similar interventions in Traverse City, Grand Rapids)
This is Appadurai operationalized for Marquette County.
CITATIONS:
[Appadurai 2004 'Capacity to Aspire', Hampton & Bauer 2020
doi:10.25335/BZGY-3V91, Dagg et al. 2023 Application component]"
Shows: Theory → Evidence → Data → Recommendation pipeline with full traceability
Example 3: Intervention Comparison
Question: “What worked better in similar counties: navigators or training programs?”
GraphRAG retrieval:
// Find rural Michigan counties with interventions
MATCH (c:County {rural: true, state: 'Michigan'})-[:RECEIVED_INTERVENTION]->(i:Intervention)
WHERE i.type IN ['Digital Navigator', 'Training Program']
// Get before/after scores
MATCH (c)-[before:HAS_SCORE {year: i.year}]->(comp:Component {name: 'Application'})
MATCH (c)-[after:HAS_SCORE {year: i.year + 1}]->(comp)
RETURN i.type,
c.name AS county,
i.budget,
before.value AS before_score,
after.value AS after_score,
(after.value - before.value) AS improvement
ORDER BY i.type, improvement DESC
Statistical analysis:
import numpy as np
import scipy.stats

navigator_improvements = query_results['Digital Navigator']
training_improvements = query_results['Training Program']

def cohen_d(a, b):
    # Pooled-standard-deviation effect size
    return (np.mean(a) - np.mean(b)) / np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)

# Compare distributions
t_stat, p_value = scipy.stats.ttest_ind(navigator_improvements, training_improvements)
effect_size = cohen_d(navigator_improvements, training_improvements)
GraphRAG response:
"DIGITAL NAVIGATORS outperformed Training Programs for Application component in rural Michigan counties:
DATA (2020-2024):
Digital Navigator programs (n=8 counties):
- Average improvement: +0.24 (Application component)
- Budget range: $1.5M - $2.5M
- Duration: 18-24 months
Training programs (n=6 counties):
- Average improvement: +0.08 (Application component)
- Budget range: $800K - $1.2M
- Duration: 12-18 months
Statistical significance: p < 0.01, effect size: 1.2 (large)
WHY NAVIGATORS > TRAINING (Theory explanation):
Appadurai (2004): Navigators build ASPIRATION CAPACITY
- Show pathways (thicken maps)
- Provide models (relevant examples)
- Support navigation (practice with guidance)
Training: Builds SKILLS without necessarily building aspiration
- Teaches HOW to use technology
- Doesn't always show WHY it matters
- Limited pathway visibility
Result: Navigators address the ROOT CAUSE (thin aspirational maps)
Training addresses SYMPTOM (skill gaps)
HAMPTON & BAUER EVIDENCE (2020):
Found 'unclear value' gap—this is aspiration capacity deficit.
Skills training alone didn't close gaps.
Need relevance demonstration (navigator role).
SPECIFIC EXAMPLES:
Ontonagon County (Navigator, $2.1M, 2022-2024):
Application: 0.44 → 0.71 (+0.27)
Mechanism: Navigators showed telehealth, remote work pathways
Schoolcraft County (Training, $950K, 2021-2023):
Application: 0.52 → 0.58 (+0.06)
Mechanism: Computer skills taught, but relevance unclear
RECOMMENDATION:
Prioritize navigator programs for Application/Aspiration gaps.
Use training programs for Skills gaps AFTER aspiration built.
Best: Combined approach (navigators show WHY, training teaches HOW)
CONFIDENCE: High (8 navigator programs, consistent results)
CITATIONS:
[Appadurai 2004, Hampton & Bauer 2020 doi:10.25335/BZGY-3V91,
intervention data from Michigan BEAD evaluations 2020-2024]"
Evidence-based: Direct comparison with statistical rigor, theory explanation, real county outcomes
How GraphRAG Learns Over Time:
Feedback Loop:
1. GraphRAG makes prediction: "Navigator program will improve Application by +0.24"
↓
2. County deploys navigator program
↓
3. Measure outcome after 18 months: Application improved +0.21
↓
4. Update knowledge graph:
CREATE (county)-[:ACHIEVED_OUTCOME {
predicted: 0.24,
actual: 0.21,
accuracy: 0.88
}]->(outcome)
↓
5. Update Bayesian network priors (learned: predictions slightly optimistic)
↓
6. Next prediction more accurate
Result: System gets SMARTER with each intervention
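The feedback loop can be sketched as a running calibration factor that damps future predictions by the observed accuracy. The smoothing rate, function name, and numbers below are illustrative assumptions, not the system's actual update rule.

```python
# Illustrative feedback update: compare predicted vs. actual improvement
# and keep an exponentially smoothed ratio of the two. The 0.9 smoothing
# rate is an assumption for the sketch.
def update_calibration(calibration: float, predicted: float, actual: float,
                       rate: float = 0.9) -> float:
    """Exponentially smoothed ratio of actual to predicted improvement."""
    ratio = actual / predicted if predicted else 1.0
    return rate * calibration + (1 - rate) * ratio

calibration = 1.0  # start by assuming predictions are unbiased
# Feedback from the example above: predicted +0.24, measured +0.21
calibration = update_calibration(calibration, predicted=0.24, actual=0.21)

# A future raw prediction of +0.24 would now be scaled down slightly:
adjusted = 0.24 * calibration
```

Over many interventions the calibration factor converges toward the system's true predicted-to-actual ratio, which is the "predictions slightly optimistic" correction described above.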
Comparison: GraphRAG vs Alternatives
| Approach | Data Grounding | Citations | Traceability | Learning | Cost |
|---|---|---|---|---|---|
| Regular ChatGPT | ❌ None (training data only) | ❌ No | ❌ Black box | ❌ Static | $ |
| ChatGPT + Web Search | ⚠️ Web (unreliable) | ⚠️ Sometimes | ⚠️ Limited | ❌ Static | $$ |
| Vector DB RAG | ✅ Your documents | ⚠️ Sometimes | ⚠️ Document-level | ❌ Static | $$ |
| GraphRAG (Your System) | ✅✅ Structured knowledge | ✅✅ Always | ✅✅ Full reasoning path | ✅✅ Continuous | $$$ |
Why GraphRAG wins for policy:
- Structured data: Not just documents, but relationships (theory → evidence → data)
- Full traceability: Every claim verified in graph
- Continuous learning: Updates with each intervention
- Relationship reasoning: Multi-hop queries (theory + evidence + data + prediction)
Implementation: Python Example
from neo4j import GraphDatabase
from openai import OpenAI
import json
class DigitalEquityGraphRAG:
    def __init__(self, graph_uri, graph_auth, openai_key):
        self.graph = GraphDatabase.driver(graph_uri, auth=graph_auth)
        self.llm = OpenAI(api_key=openai_key)

    def answer_question(self, question):
        """Main GraphRAG pipeline"""
        # 1. Parse question to identify entities and intent
        entities = self.extract_entities(question)
        intent = self.classify_intent(question)

        # 2. Query knowledge graph for relevant context
        context = self.retrieve_context(entities, intent)

        # 3. If prediction needed, query Bayesian network
        if intent == 'prediction':
            prediction = self.get_prediction(entities, context)
            context['prediction'] = prediction

        # 4. Construct enriched prompt
        prompt = self.build_prompt(question, context)

        # 5. Generate answer
        response = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a digital equity policy advisor. Answer based ONLY on provided context. Cite sources."},
                {"role": "user", "content": prompt}
            ]
        )

        return {
            'answer': response.choices[0].message.content,
            'sources': context['sources'],
            'confidence': context.get('prediction', {}).get('confidence')
        }
    def retrieve_context(self, entities, intent):
        """Query knowledge graph for relevant information"""
        context = {'sources': []}

        with self.graph.session() as session:
            # Get county data if county mentioned
            if 'county' in entities:
                county_data = session.run("""
                    MATCH (c:County {name: $county})-[s:HAS_SCORE]->(comp:Component)
                    RETURN comp.name, s.value, s.year, s.source
                """, county=entities['county'])
                context['county_scores'] = [dict(record) for record in county_data]
                context['sources'].append(f"Compass scores for {entities['county']}")

            # Get relevant theory
            if intent in ['investment', 'strategy']:
                theory = session.run("""
                    MATCH (t:Theory)-[:GROUNDS]->(s:Stage)
                    WHERE s.name IN ['Opportunity', 'Aspiration', 'Growth Mindset']
                    RETURN t.name, t.key_insight, t.year
                    LIMIT 3
                """)
                context['theory'] = [dict(record) for record in theory]
                context['sources'].extend([f"{t['t.name']} ({t['t.year']})" for t in context['theory']])

            # Get relevant evidence
            if intent in ['investment', 'validation']:
                evidence = session.run("""
                    MATCH (r:Research {state: 'Michigan'})-[:VALIDATES]->(s:Stage)
                    RETURN r.title, r.doi, r.year, s.name
                    LIMIT 2
                """)
                context['evidence'] = [dict(record) for record in evidence]
                context['sources'].extend([f"{e['r.title']} (doi:{e['r.doi']})" for e in context['evidence']])

            # Get similar county outcomes if intervention question
            if intent == 'intervention' and 'county' in entities:
                similar = session.run("""
                    MATCH (target:County {name: $county})
                    MATCH (similar:County)-[:SIMILAR_TO]->(target)
                    MATCH (similar)-[:RECEIVED_INTERVENTION]->(i:Intervention)
                    MATCH (similar)-[:ACHIEVED_OUTCOME]->(o:Outcome)
                    RETURN similar.name, i.type, i.budget, o.improvement
                    LIMIT 3
                """, county=entities['county'])
                context['similar_outcomes'] = [dict(record) for record in similar]
                context['sources'].append("Similar county intervention outcomes")

        return context
    def build_prompt(self, question, context):
        """Construct enriched prompt with retrieved context"""
        prompt = f"""
USER QUESTION:
{question}

RELEVANT DATA FROM KNOWLEDGE GRAPH:

County Scores:
{json.dumps(context.get('county_scores', []), indent=2)}

Theoretical Foundation:
{json.dumps(context.get('theory', []), indent=2)}

Empirical Evidence:
{json.dumps(context.get('evidence', []), indent=2)}

Similar County Outcomes:
{json.dumps(context.get('similar_outcomes', []), indent=2)}

Prediction (if applicable):
{json.dumps(context.get('prediction', {}), indent=2)}

INSTRUCTIONS:
1. Answer based ONLY on the provided data
2. Structure: Data → Theory → Evidence → Recommendation
3. Cite sources for every claim
4. If data is insufficient, say so explicitly
5. Include confidence level if prediction provided

Answer:
"""
        return prompt
# Usage
graphrag = DigitalEquityGraphRAG(
    graph_uri="bolt://localhost:7687",
    graph_auth=("neo4j", "password"),
    openai_key="your-key"
)

result = graphrag.answer_question(
    "Should Baraga County prioritize infrastructure or navigators?"
)

print(result['answer'])
print(f"\nSources: {result['sources']}")
print(f"Confidence: {result['confidence']}")
Advanced: Multi-Agent GraphRAG
For complex policy questions, use multiple specialized agents:
class MultiAgentGraphRAG:
    def __init__(self):
        self.theory_agent = TheoryAgent()          # Retrieves relevant theories
        self.evidence_agent = EvidenceAgent()      # Retrieves research findings
        self.data_agent = DataAgent()              # Retrieves county/state data
        self.prediction_agent = PredictionAgent()  # Runs Bayesian predictions
        self.synthesis_agent = SynthesisAgent()    # Combines all inputs

    def answer_complex_question(self, question):
        # Each agent retrieves its specialty
        theory = self.theory_agent.retrieve(question)
        evidence = self.evidence_agent.retrieve(question)
        data = self.data_agent.retrieve(question)
        prediction = self.prediction_agent.predict(data)

        # Synthesis agent combines all perspectives
        answer = self.synthesis_agent.synthesize(
            question, theory, evidence, data, prediction
        )
        return answer
Use case: “Design comprehensive digital equity strategy for Michigan”
Result: Coordinated retrieval across all knowledge domains, synthesized into cohesive plan
Bottom Line:
GraphRAG is the intelligent interface to your digital equity intelligence system.
What it provides:
✅ Evidence-grounded answers (not hallucinations)
✅ Full traceability (every claim verifiable)
✅ Theory + Evidence + Data integration
✅ Policy-relevant responses (actionable recommendations)
✅ Continuous learning (improves with feedback)
✅ Natural language interface (accessible to non-technical users)
Why it’s critical:
- Policymakers need answers, not data dumps
- Decisions require evidence, not guesses
- Interventions need confidence levels, not blind faith
- Stakeholders demand accountability, not black boxes
The complete pipeline:
Question
↓
GraphRAG retrieves from Knowledge Graph
↓
Bayesian Network provides predictions
↓
LLM synthesizes evidence-grounded answer
↓
Citable, traceable, actionable response
This is AI for evidence-based policymaking.
Next steps:
- TrainingCompassKnowledgeGraph.md - What data GraphRAG retrieves
- TrainingCompassBayesian.md - How predictions are generated
- TrainingCompassMetrics.md - What evidence is stored
Version: 1.0
Last Updated: November 2025
Part of: Project Compass (Merit Network) - Digital Opportunities Intelligence Network (DOIN) • Working draft