Configuration Reference
Complete reference for RAGAS adapter configuration options.
JobSpec Structure
Section titled “JobSpec Structure”The RAGAS adapter uses a standardised JobSpec structure:
{ "id": "string", "provider_id": "ragas", "benchmark_id": "string", "model": { "name": "string", "url": "string" }, "parameters": { // RAGAS-specific configuration }, "num_examples": 0}Core Parameters
Section titled “Core Parameters”Required Parameters
Section titled “Required Parameters”| Parameter | Type | Description | Example |
|---|---|---|---|
id | string | Unique job identifier | "ragas-rag-eval-001" |
provider_id | string | Must be "ragas" | "ragas" |
benchmark_id | string | Benchmark identifier | "ragas_rag_default" |
model.name | string | Model name for the LLM judge | "Qwen/Qwen2.5-1.5B-Instruct" |
model.url | string | OpenAI-compatible API endpoint | "http://localhost:8000" |
Optional Parameters
Section titled “Optional Parameters”| Parameter | Type | Description | Default |
|---|---|---|---|
num_examples | integer | Limit the number of dataset samples to evaluate | All samples |
callback_url | string | EvalHub service callback URL | null |
Benchmark Configuration
Section titled “Benchmark Configuration”Two pre-defined benchmark suites are available:
ragas_rag_default: Default Suite
Section titled “ragas_rag_default: Default Suite”Runs the four core RAG evaluation metrics. Suitable for most use cases.
| Setting | Value |
|---|---|
| Metrics | answer_relevancy, context_precision, faithfulness, context_recall |
| Primary score | faithfulness |
| Pass threshold | 0.5 |
ragas_rag_full: Full Suite
Section titled “ragas_rag_full: Full Suite”Runs all 11 available metrics for comprehensive RAG evaluation.
| Setting | Value |
|---|---|
| Metrics | All 11 metrics (see Metrics reference) |
| Primary score | faithfulness |
| Pass threshold | 0.5 |
Parameters Reference
Section titled “Parameters Reference”All configuration is specified in the parameters object of the JobSpec.
Metric Selection
Section titled “Metric Selection”| Parameter | Type | Description | Default |
|---|---|---|---|
metrics | array | List of RAGAS metric names to evaluate | Benchmark default |
Available metrics: answer_relevancy, answer_similarity, context_precision, faithfulness, context_recall, context_entity_recall, nv_accuracy, nv_context_relevance, factual_correctness, noise_sensitivity, nv_response_groundedness.
See the Metrics reference for details on each metric.
LLM Configuration
Section titled “LLM Configuration”| Parameter | Type | Description | Default |
|---|---|---|---|
max_tokens | integer | Maximum tokens for LLM completions | null (server default) |
temperature | number | Sampling temperature for LLM completions | null (adapter default) |
Embedding Configuration
Section titled “Embedding Configuration”| Parameter | Type | Description | Default |
|---|---|---|---|
embedding_model | string | Model name for embeddings | Same as model.name |
embedding_url | string | Base URL for embeddings endpoint | Same as model.url |
Data Configuration
Section titled “Data Configuration”| Parameter | Type | Description | Default |
|---|---|---|---|
data_path | string | Explicit path to dataset file | Auto-resolved |
column_map | object | Map dataset column names to RAGAS expected names | null |
Column mapping: RAGAS expects columns named user_input, response, retrieved_contexts, and reference. If your dataset uses different names, provide a mapping:
{ "parameters": { "column_map": { "question": "user_input", "answer": "response", "contexts": "retrieved_contexts", "ground_truth": "reference" } }}Data resolution order (when data_path is not set):
/test_data/dataset.jsonl, populated by EvalHub’s S3 init container- First
.jsonlor.jsonfile in/test_data/ /data/dataset.jsonl- First
.jsonlor.jsonfile in/data/
Concurrency
Section titled “Concurrency”| Parameter | Type | Description | Default |
|---|---|---|---|
max_workers | integer | Parallel workers for RAGAS evaluation (1–10) | 1 |
LLM Judge Requirements
Section titled “LLM Judge Requirements”RAGAS uses an LLM as a judge for several metrics. The judge model receives structured prompts and must return parseable JSON responses.
Which Metrics Require a Judge
Section titled “Which Metrics Require a Judge”| Metric | LLM Judge | Embeddings |
|---|---|---|
faithfulness | Yes | No |
answer_relevancy | Yes | Yes |
context_precision | Yes | No |
context_recall | Yes | No |
answer_similarity | No | Yes |
context_entity_recall | No | No |
factual_correctness | Yes | No |
noise_sensitivity | Yes | No |
nv_accuracy | Yes | No |
nv_context_relevance | Yes | No |
nv_response_groundedness | Yes | No |
Cost and Latency Implications
Section titled “Cost and Latency Implications”- Each sample is evaluated independently per metric. For
Nsamples andMjudge-based metrics, expect roughlyN * MLLM calls. - Judge prompts are structured and can be lengthy, so set
max_tokensappropriately (512 is usually sufficient). - Use a lower
temperature(e.g.0.1) for more deterministic judge outputs. - The adapter uses chat completions (not legacy completions) to avoid truncation issues.
Environment Variables
Section titled “Environment Variables”The adapter reads runtime settings from environment variables:
| Variable | Description | Required | Default |
|---|---|---|---|
EVALHUB_MODE | Execution mode (k8s or local) | No | k8s |
EVALHUB_JOB_SPEC_PATH | Path to job spec JSON | Yes (local mode) | /meta/job.json |
SERVICE_URL | Eval-hub service URL | No | null |
REGISTRY_URL | OCI registry URL | No | null |
REGISTRY_USERNAME | Registry username | No | null |
REGISTRY_PASSWORD | Registry password | No | null |
REGISTRY_INSECURE | Allow insecure registry | No | false |
LOG_LEVEL | Logging level | No | INFO |
Complete Example
Section titled “Complete Example”{ "id": "ragas-rag-eval-001", "provider_id": "ragas", "benchmark_id": "ragas_rag_default", "benchmark_index": 0, "model": { "url": "http://127.0.0.1:8000", "name": "Qwen/Qwen2.5-1.5B-Instruct" }, "num_examples": 5, "parameters": { "metrics": [ "answer_relevancy", "context_precision", "faithfulness", "context_recall" ], "embedding_model": "all-MiniLM-L6-v2", "embedding_url": "http://127.0.0.1:8001", "max_tokens": 512, "temperature": 0.1 }, "callback_url": "http://localhost:8080"}