Grounded Accuracy

Given whatever context was sourced, did the answer correctly use it?

Metric	Description
Evidence Use Rate	% of answers whose claims are supported by the provided context
Citation Correctness	% of citations that truly back the claim (no fabricated refs)
Unsupported Claim Rate	% of answers containing claims not in the provided context
Hallucination Detection	Identifying fabricated information absent from source material

Manual Examples:

Response	Context	Result
All claims reference provided documents with correct citations	Context contains all referenced information	✅ PASS — fully grounded
Adds a claim about “write overhead” not present in context	Context only covers table scans and B-tree types	❌ FAIL — unsupported claim / hallucination

Code: examples/accuracy/grounded_accuracy/faithfulness_evaluator.py · examples/accuracy/grounded_accuracy/citation_accuracy_evaluator.py

from examples.accuracy.grounded_accuracy.faithfulness_evaluator import evaluate_faithfulness
from examples.accuracy.grounded_accuracy.citation_accuracy_evaluator import evaluate_citation_accuracy

# Hallucination / unsupported claims (checks against provided context, not ground truth)
result = evaluate_faithfulness(
    judge=judge,
    query="How do database indexes work?",
    generated_response="Indexes speed up retrieval... they also add write overhead...",
    context="Document 1: Indexes speed up data retrieval by avoiding full table scans..."
)

# Citation correctness
result = evaluate_citation_accuracy(
    judge=judge,
    query="What are the health benefits of exercise?",
    generated_response="Exercise improves heart health [Doc A] and reduces stress [Doc B]...",
    citations=[
        {"source": "Doc A", "content": "Regular exercise strengthens the cardiovascular system..."},
        {"source": "Doc B", "content": "Physical activity reduces cortisol levels..."},
    ]
)