RAG Evaluation¶
IdeaWeaver provides comprehensive evaluation capabilities for RAG systems using the RAGAS framework. This allows you to assess the quality and performance of your knowledge bases.
Evaluation Metrics¶
The RAGAS framework evaluates several key aspects of your RAG system:
- Faithfulness: Measures whether the generated answer is factually consistent with the retrieved context (see the formula after this list)
- Answer Relevancy: Assesses if the answer is relevant to the question
- Context Relevancy: Evaluates if the retrieved context is relevant to the question
- Context Recall: Measures how well the system retrieves all relevant information
- Context Precision: Assesses the precision of the retrieved information
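For intuition, RAGAS scores faithfulness as the fraction of claims in the generated answer that can be inferred from the retrieved context:

$$\text{Faithfulness} = \frac{\text{number of claims in the answer supported by the retrieved context}}{\text{total number of claims in the answer}}$$

A score of 1.0 means every claim in the answer is grounded in the retrieved documents.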
Basic Evaluation¶
ideaweaver rag evaluate --kb my-kb
This will:
- Generate test questions
- Run the evaluation
- Display metrics
- Generate a report
Advanced Evaluation¶
Custom Test Questions¶
ideaweaver rag evaluate --kb my-kb --questions-file custom_questions.json
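The exact schema expected by `--questions-file` is not documented here, so the following is an illustrative sketch only: assuming a plain JSON list of question strings, you could create the file from the shell like this (the filename and questions are examples):

```bash
# Illustrative only: the actual schema expected by --questions-file may differ
cat > custom_questions.json <<'EOF'
[
  "What topics does the onboarding guide cover?",
  "Which configuration options control chunking?"
]
EOF
```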
Specific Metrics¶
ideaweaver rag evaluate --kb my-kb --metrics faithfulness,answer_relevancy
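Any subset of the metrics listed above can be selected this way. Assuming the identifiers follow the same snake_case convention as faithfulness and answer_relevancy, a retrieval-focused run might look like:

ideaweaver rag evaluate --kb my-kb --metrics context_recall,context_precision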
Compare Knowledge Bases¶
ideaweaver rag compare-kb --kb1 kb1 --kb2 kb2
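This is useful for A/B-testing indexing decisions, for example comparing a knowledge base against a re-chunked copy of itself (the KB names here are illustrative):

ideaweaver rag compare-kb --kb1 my-kb --kb2 my-kb-rechunked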
Generating Test Questions¶
ideaweaver rag generate-test-questions --kb my-kb
Options:

- `--num-questions`: Number of questions to generate
- `--output-file`: Save the generated questions to a file
- `--question-types`: Types of questions to generate
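For example, to generate ten questions and save them for later reuse with `--questions-file` (the values are illustrative):

ideaweaver rag generate-test-questions --kb my-kb --num-questions 10 --output-file custom_questions.json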
Example Output¶
🤖 Auto-generating 5 test questions
🧪 Starting RAGAS evaluation for KB: my-kb
📊 Metrics: faithfulness, answer_relevancy
📊 Evaluation Results:
----------------------------------------
Faithfulness: 0.85
Answer Relevancy: 0.92
Context Relevancy: 0.88
Context Recall: 0.90
Context Precision: 0.87
✅ Evaluation completed successfully!
📄 Report saved to: evaluation_report_my-kb.md
Best Practices¶
- Regular Evaluation: Evaluate your RAG system regularly as you add new documents
- Multiple Metrics: Use multiple metrics to get a complete picture
- Custom Questions: Create domain-specific test questions
- Compare Baselines: Compare against baseline or previous versions
- Monitor Changes: Track how metrics change over time (see the sketch after this list)
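A minimal way to monitor changes is to re-run the evaluation on a schedule and archive each report under a dated name. This sketch assumes the report filename shown in the example output above; the reports/ directory is just a convention:

```bash
# Re-run the evaluation and archive the dated report (sketch; paths are illustrative)
mkdir -p reports
DATE=$(date +%Y-%m-%d)
ideaweaver rag evaluate --kb my-kb
mv evaluation_report_my-kb.md "reports/${DATE}_evaluation_report_my-kb.md"
```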
Troubleshooting¶
Common issues and solutions:
**Low Faithfulness**
- Check document quality
- Review chunking strategy
- Adjust retrieval parameters
**Low Relevancy**
- Improve question understanding
- Optimize embedding model
- Review document preprocessing
**Low Recall**
- Increase chunk overlap
- Adjust chunk size
- Review retrieval strategy
Next Steps¶
- RAG Commands - Complete command reference
- RAG Overview - System architecture and features
- Enterprise RAG Guide - Production deployments