2. Accuracy
This section is the core of LLM evaluation. It measures whether the system produces outputs that are correct, grounded in the context it was given, and complete.
| Area | What It Covers | Sub-Page |
|---|---|---|
| Response Quality | Task fulfillment, instruction following, factuality, consistency | Response Quality → |
| Context Sourcing | RAG recall/precision, API selection, parameter accuracy, query generation | Context Sourcing → |
| Grounded Accuracy | Faithfulness to context, citation correctness, hallucination detection | Grounded Accuracy → |
| Memory | Memory correctness, recall relevance, update correctness | Memory → |
| Agentic Evaluation | Tool-call correctness, task adherence, trajectory quality, plan quality, safety constraints | Agentic → |
| Multi-Modal I/O | Text (streaming stability, responsiveness), Voice (ASR, turn-taking, barge-in), Vision (visual tasks, OCR), Cross-modal stability | Multi-Modal → |
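The Context Sourcing row lists retrieval recall and precision. These metrics are commonly computed at a cutoff k for each query and then averaged over an evaluation set. The sketch below shows one way to do this. It assumes that each query has a hand-labeled set of relevant document IDs and that the retriever returns document IDs in ranked order. The function names `precision_recall_at_k` and `mean_precision_recall` are hypothetical and do not come from any particular evaluation library.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for a single query.

    retrieved: document IDs in the ranked order the retriever returned them.
    relevant:  hand-labeled gold set of document IDs for this query (assumed).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Only the top-k results count; duplicates are collapsed via the set.
    top_k = set(retrieved[:k])
    hits = len(top_k & relevant)
    # Precision@k uses k as the denominator, even if fewer than k docs came back.
    precision = hits / k
    # Recall@k is the share of the gold set that appears in the top k.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(
    examples: Iterable[Tuple[Sequence[str], Set[str]]], k: int
) -> Tuple[float, float]:
    """Macro-average precision@k and recall@k over (retrieved, relevant) pairs."""
    scores = [precision_recall_at_k(r, g, k) for r, g in examples]
    if not scores:
        raise ValueError("need at least one example")
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2", "d9"}
    print(precision_recall_at_k(retrieved, relevant, k=4))  # (0.5, 0.666...)
```

At k=4, two of the three gold documents (`d1`, `d2`) appear in the top four results. That gives a precision of 2/4 and a recall of 2/3. When reporting these numbers, state which k was used. A small k favors precision, and a larger k favors recall.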