RAGAS Adapter

The RAGAS adapter integrates RAGAS (Retrieval Augmented Generation Assessment) with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.

RAGAS is an open-source framework purpose-built for evaluating RAG pipeline quality. It provides metrics that directly measure whether a RAG system is correctly grounding its answers in retrieved documents: faithfulness, answer relevancy, context precision, context recall, and more.

RAG pipelines are among the most widely deployed AI application patterns in enterprise settings. Yet faithfulness evaluation, which verifies that model-generated answers are grounded in the retrieved context rather than hallucinated, remains one of the least commonly performed evaluation steps. The RAGAS adapter brings these metrics into EvalHub’s unified evaluation control plane.
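
To make the faithfulness idea concrete: RAGAS describes faithfulness as the fraction of claims in a generated answer that are supported by the retrieved context. The sketch below assumes the per-claim verdicts have already been produced (in practice by an LLM judge). The function name and the verdict list are hypothetical and are not part of the adapter or the RAGAS API.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of answer claims that are supported by the retrieved context."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verdicts for four claims extracted from one answer:
# three are backed by the retrieved context, one is not.
verdicts = [True, True, False, True]
score = faithfulness_score(verdicts)
print(f"faithfulness = {score:.2f}")
```

A score of 1.0 would mean every claim in the answer can be traced back to the retrieved documents.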

  • RAG-specific metrics: Faithfulness, answer relevancy, context precision/recall, answer correctness, and more
  • LLM-as-judge evaluation: Uses an LLM judge endpoint for metrics that require semantic understanding
  • Embedding-based metrics: Supports separate embedding model configuration for similarity-based metrics
  • Flexible data input: JSONL and JSON datasets with configurable column mapping
  • Two benchmark suites: Default (4 core metrics) and Full (11 metrics) evaluation suites
  • OpenAI-compatible: Works with any OpenAI-compatible chat completions endpoint (vLLM, Text Generation Inference (TGI), Ollama, etc.) and, for metrics that require embeddings, any OpenAI-compatible embeddings endpoint
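
As an illustration of the "configurable column mapping" idea for JSONL input, here is a minimal sketch. The source column names, the `COLUMN_MAP` structure, and the `load_jsonl` helper are all hypothetical; the target names (`user_input`, `response`, `retrieved_contexts`, `reference`) are assumed to match recent RAGAS single-turn sample fields, and the adapter's actual configuration keys may differ.

```python
import json

# Hypothetical mapping: dataset column name -> field name the evaluator expects.
COLUMN_MAP = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


def load_jsonl(text: str, column_map: dict[str, str]) -> list[dict]:
    """Parse JSONL text and rename columns according to column_map."""
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        # Columns without a mapping entry keep their original name.
        samples.append({column_map.get(key, key): value for key, value in row.items()})
    return samples


raw = (
    '{"question": "What is RAG?", '
    '"answer": "Retrieval-augmented generation.", '
    '"contexts": ["RAG combines retrieval with generation."], '
    '"ground_truth": "Retrieval-augmented generation."}\n'
)
samples = load_jsonl(raw, COLUMN_MAP)
print(sorted(samples[0]))
```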

The adapter follows the eval-hub framework adapter pattern with automatic configuration:

Workflow:

  1. Settings-based configuration: Runtime settings loaded automatically from the environment
  2. Automatic JobSpec loading: Job configuration auto-loaded from mounted ConfigMap
  3. Dataset resolution: Evaluation data loaded from /test_data (S3 init container), /data, or an explicit path
  4. LLM and embeddings setup: OpenAI-compatible wrappers configured for the RAGAS evaluation engine
  5. RAGAS evaluation: Metrics computed per sample using the RAGAS library
  6. Callback-based communication: Progress updates and artifacts sent to the sidecar via callbacks
  7. OCI artifact persistence: Results persisted as OCI artifacts via the sidecar
  8. Structured results: Returns JobResults with per-metric scores and overall aggregate
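
The dataset resolution step in the workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the adapter's code: the `resolve_dataset` helper is hypothetical, and both the search order (explicit path first, then /test_data, then /data) and the rule of picking the first `.jsonl` or `.json` file are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Directories named in the workflow; the order in which they are searched is an assumption.
DEFAULT_DIRS = (Path("/test_data"), Path("/data"))
DATA_SUFFIXES = {".jsonl", ".json"}


def resolve_dataset(explicit: Optional[str] = None, search_dirs=DEFAULT_DIRS) -> Path:
    """Return a dataset file, preferring an explicit path over the default directories."""
    if explicit:
        path = Path(explicit)
        if path.is_file():
            return path
        raise FileNotFoundError(f"dataset not found: {path}")
    for directory in search_dirs:
        if directory.is_dir():
            # Sort for a deterministic choice when several files are present.
            for candidate in sorted(directory.iterdir()):
                if candidate.is_file() and candidate.suffix in DATA_SUFFIXES:
                    return candidate
    raise FileNotFoundError("no JSONL or JSON dataset found")


# Demo against a temporary directory instead of the real mount points.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "eval.jsonl").write_text("{}\n")
    found = resolve_dataset(search_dirs=(Path(tmp),))
    print(found.name)
```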

make image-ragas

# Set environment for local mode
export EVALHUB_MODE=local
export EVALHUB_JOB_SPEC_PATH=meta/job.json

# Run the adapter
python main.py
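
Before a local run it can help to confirm that the two variables above are set and that the job spec file parses as JSON. The sketch below is a hypothetical preflight check, not part of the adapter or evalhub-sdk; the `check_local_settings` helper is invented for illustration, and the JSON written in the demo is a placeholder rather than the real JobSpec schema.

```python
import json
import os
import tempfile
from pathlib import Path


def check_local_settings(env) -> dict:
    """Validate the two variables used for a local run and parse the job spec file."""
    if env.get("EVALHUB_MODE") != "local":
        raise ValueError("EVALHUB_MODE must be 'local' for a local run")
    spec_path = env.get("EVALHUB_JOB_SPEC_PATH")
    if not spec_path:
        raise ValueError("EVALHUB_JOB_SPEC_PATH is not set")
    path = Path(spec_path)
    if not path.is_file():
        raise FileNotFoundError(f"job spec not found: {path}")
    return json.loads(path.read_text())


# Demo with a throwaway file; the content is a placeholder, not the real JobSpec schema.
with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "job.json"
    spec_file.write_text('{"placeholder": true}')
    spec = check_local_settings(
        {"EVALHUB_MODE": "local", "EVALHUB_JOB_SPEC_PATH": str(spec_file)}
    )
    print(spec)

# Against the real environment one would call:
# spec = check_local_settings(os.environ)
```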

# Pull from registry
podman pull quay.io/evalhub/community-ragas:latest

# Run with custom job spec
podman run \
  -e EVALHUB_MODE=local \
  -e EVALHUB_JOB_SPEC_PATH=/meta/job.json \
  -v "$(pwd)/job.json:/meta/job.json:ro" \
  -v "$(pwd)/data:/data:ro" \
  quay.io/evalhub/community-ragas:latest

Field              Value
Provider ID        ragas
Default Benchmark  ragas_rag_default
Full Benchmark     ragas_rag_full