RAGAS Adapter
The RAGAS adapter integrates RAGAS (Retrieval Augmented Generation Assessment) with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.
Overview
Section titled “Overview”RAGAS is an open-source framework purpose-built for evaluating RAG pipeline quality. It provides metrics that directly measure whether a RAG system is correctly grounding its answers in retrieved documents: faithfulness, answer relevancy, context precision, context recall, and more.
RAG pipelines are among the most widely deployed AI application patterns in enterprise settings, yet faithfulness evaluation, verifying that model-generated answers match the retrieved context rather than hallucinating, remains one of the least commonly performed evaluation steps. The RAGAS adapter brings these metrics into EvalHub’s unified evaluation control plane.
Key Features
Section titled “Key Features”- RAG-specific metrics: Faithfulness, answer relevancy, context precision/recall, answer correctness, and more
- LLM-as-judge evaluation: Uses an LLM judge endpoint for metrics that require semantic understanding
- Embedding-based metrics: Supports separate embedding model configuration for similarity-based metrics
- Flexible data input: JSONL and JSON datasets with configurable column mapping
- Two benchmark suites: Default (4 core metrics) and Full (11 metrics) evaluation suites
- OpenAI-compatible: Works with any OpenAI-compatible model endpoint (vLLM, TGI, Ollama, etc.)
Supported Backends
Section titled “Supported Backends”- OpenAI-compatible chat completions endpoints (vLLM, Text Generation Inference, Ollama, etc.)
- OpenAI-compatible embeddings endpoints (for metrics that require embeddings)
Architecture
Section titled “Architecture”The adapter follows the eval-hub framework adapter pattern with automatic configuration:
Workflow:
- Settings-based configuration: Runtime settings loaded automatically from environment
- Automatic JobSpec loading: Job configuration auto-loaded from mounted ConfigMap
- Dataset resolution: Evaluation data loaded from
/test_data(S3 init container),/data, or an explicit path - LLM and embeddings setup: OpenAI-compatible wrappers configured for the RAGAS evaluation engine
- RAGAS evaluation: Metrics computed per sample using the RAGAS library
- Callback-based communication: Progress updates and artifacts sent to sidecar via callbacks
- OCI artifact persistence: Results persisted as OCI artifacts via the sidecar
- Structured results: Returns
JobResultswith per-metric scores and overall aggregate
Quick Start
Section titled “Quick Start”Building the Container
Section titled “Building the Container”make image-ragasRunning Locally
Section titled “Running Locally”# Set environment for local modeexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.json
# Run the adapterpython main.py# Start Ollama and pull a modelollama run qwen2.5:1.5b
# Run RAGAS evaluation against Ollamaexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.jsonpython main.pyContainer Image
Section titled “Container Image”# Pull from registrypodman pull quay.io/evalhub/community-ragas:latest
# Run with custom job specpodman run \ -e EVALHUB_MODE=local \ -e EVALHUB_JOB_SPEC_PATH=/meta/job.json \ -v $(pwd)/job.json:/meta/job.json:ro \ -v $(pwd)/data:/data:ro \ quay.io/evalhub/community-ragas:latestProvider Details
Section titled “Provider Details”| Field | Value |
|---|---|
| Provider ID | ragas |
| Default Benchmark | ragas_rag_default |
| Full Benchmark | ragas_rag_full |
Source
Section titled “Source”- Adapter: eval-hub-contrib/adapters/ragas
- Upstream: explodinggradients/ragas