2. Accuracy

Accuracy is the core of LLM evaluation: it measures whether the system produces correct, grounded, and complete outputs.


| Area | What It Covers | Sub-Page |
| --- | --- | --- |
| Response Quality | Task fulfillment, instruction following, factuality, consistency | Response Quality → |
| Context Sourcing | RAG recall/precision, API selection, parameter accuracy, query generation | Context Sourcing → |
| Grounded Accuracy | Faithfulness to context, citation correctness, hallucination detection | Grounded Accuracy → |
| Memory | Memory correctness, recall relevance, update correctness | Memory → |
| Agentic Evaluation | Tool calls, task adherence, trajectory quality, plan quality, safety constraints | Agentic → |
| Multi-Modal I/O | Text (streaming stability, responsiveness), Voice (ASR, turn-taking, barge-in), Vision (visual tasks, OCR), Cross-modal stability | Multi-Modal → |
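
As a concrete illustration of one metric named above (RAG recall/precision under Context Sourcing), the following is a minimal sketch, not the method used by the sub-pages. It assumes each query comes with a hand-labeled set of relevant document IDs; the function name `retrieval_precision_recall` and the document IDs are hypothetical.

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one query's retrieved documents.

    Assumption: relevance is binary and judged per document ID.
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)  # retrieved documents that are relevant

    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 labeled relevant, 2 overlap.
precision, recall = retrieval_precision_recall(
    ["doc1", "doc4", "doc7", "doc9"],
    ["doc1", "doc2", "doc4"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.50) and two of the three relevant documents were found (recall about 0.67). Averaging these values over a labeled query set gives a simple baseline for comparing retrieval configurations.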




Copyright © 2026 Emumba. Distributed under the MIT License.
