Resource Reference
MCP resources provide read-only access to EvalHub data using the evalhub:// URI scheme. All resources return JSON.
Providers
Section titled “Providers”List all providers
Section titled “List all providers”URI: evalhub://providers
Returns all registered evaluation providers.
[ { "resource": { "id": "garak", "created_at": "2026-05-01T12:00:00Z", "updated_at": "2026-05-10T08:30:00Z" }, "name": "garak", "title": "Garak", "description": "LLM vulnerability scanner and red-teaming framework", "agent": { "evaluates": ["safety", "security", "red_teaming", "toxicity"], "target_type": "model", "summary": "Red-team an LLM for safety vulnerabilities, toxicity, and OWASP risks", "complements": ["lm_evaluation_harness", "guidellm"], "hints": [ "The model endpoint must support OpenAI-compatible chat completions" ], "result_interpretation": [ "attack_success_rate measures how often the model was successfully exploited", "LOWER is better -- 0.0 means no attacks succeeded" ] } }, { "resource": { "id": "guidellm", "created_at": "2026-05-01T12:00:00Z", "updated_at": "2026-05-10T08:30:00Z" }, "name": "guidellm", "title": "GuideLLM", "description": "Performance and latency benchmarking", "agent": { "evaluates": ["performance", "throughput", "latency"], "target_type": "inference_server", "summary": "Benchmark LLM inference server throughput, latency, and scalability" } }]Get a provider by ID
Section titled “Get a provider by ID”URI: evalhub://providers/{id}
Example: evalhub://providers/garak
Returns details for a single provider including its available benchmarks and optional agent metadata.
Benchmarks
Section titled “Benchmarks”List all benchmarks
Section titled “List all benchmarks”URI: evalhub://benchmarks
Returns all benchmarks across all providers.
Get a benchmark by ID
Section titled “Get a benchmark by ID”URI: evalhub://benchmarks/{id}
Example: evalhub://benchmarks/mmlu
Returns details for a single benchmark including its provider, description, and configuration.
Filter benchmarks by label
Section titled “Filter benchmarks by label”URI: evalhub://benchmarks?label={tag}
Filter benchmarks by tag. The label parameter can be repeated for AND-style filtering — all specified labels must match.
Examples:
evalhub://benchmarks?label=safety— benchmarks tagged “safety”evalhub://benchmarks?label=rag&label=reasoning— benchmarks tagged both “rag” and “reasoning”
Collections
Section titled “Collections”List all collections
Section titled “List all collections”URI: evalhub://collections
Returns all pre-defined benchmark collections.
[ { "resource": { "id": "safety-and-fairness-v1", "created_at": "2026-05-01T12:00:00Z", "updated_at": "2026-05-10T08:30:00Z" }, "name": "Safety and Fairness v1", "description": "Safety and bias evaluation benchmarks", "category": "safety", "agent": { "evaluates": ["safety", "fairness", "bias", "toxicity", "ethics", "truthfulness"], "summary": "Comprehensive safety and fairness suite covering toxicity, bias, truthfulness, and ethics", "complements": ["garak", "toxicity-and-ethical-principles"], "hints": [ "Runs 6 benchmarks across truthfulness, toxicity, gender bias, social bias, and ethics", "Overall pass threshold is 0.758" ], "result_interpretation": [ "Aggregate score is a weighted average across all benchmarks, higher is better" ] } }, { "resource": { "id": "leaderboard-v2", "created_at": "2026-05-01T12:00:00Z", "updated_at": "2026-05-10T08:30:00Z" }, "name": "Leaderboard v2", "description": "Standard leaderboard benchmarks", "category": "general" }]Get a collection by ID
Section titled “Get a collection by ID”URI: evalhub://collections/{id}
Example: evalhub://collections/leaderboard-v2
Returns the collection with its full benchmark list and configuration.
List all jobs
Section titled “List all jobs”URI: evalhub://jobs
Returns all evaluation jobs. Supports pagination.
Query parameters:
| Parameter | Type | Description |
|---|---|---|
limit | integer | Maximum items to return (1–2000, default 100) |
offset | integer | Number of items to skip |
Example: evalhub://jobs?limit=20&offset=0
Filter jobs by status
Section titled “Filter jobs by status”URI: evalhub://jobs?status={status}
| Status | Description |
|---|---|
pending | Queued, waiting to start |
running | Currently executing |
completed | All benchmarks finished |
failed | One or more benchmarks failed |
cancelled | Cancelled by user |
partially_failed | Some benchmarks succeeded, others failed |
Examples:
evalhub://jobs?status=running— all currently running jobsevalhub://jobs?status=completed&limit=10— last 10 completed jobs
Get a job by ID
Section titled “Get a job by ID”URI: evalhub://jobs/{id}
Example: evalhub://jobs/job-a1b2c3d4
Returns full job details including state, progress, per-benchmark status, and timestamps.
Server metadata
Section titled “Server metadata”Server version
Section titled “Server version”URI: evalhub://server/version
Returns server version and build information.
{ "version": "0.4.0", "git_hash": "d2c6d42", "build_date": "2026-05-20T12:00:00Z", "go_version": "go1.25.9", "os": "linux", "arch": "amd64", "mcp_library": "github.com/modelcontextprotocol/go-sdk", "mcp_library_version": "0.2.0"}Autocompletion
Section titled “Autocompletion”The MCP server provides autocompletion for resource URI parameters. MCP clients that support completions will suggest valid IDs when you type resource URIs:
- Provider IDs — when typing
evalhub://providers/{id} - Benchmark IDs — when typing
evalhub://benchmarks/{id} - Collection IDs — when typing
evalhub://collections/{id} - Job IDs — when typing
evalhub://jobs/{id} - Status values — when typing
evalhub://jobs?status= - Labels — when typing
evalhub://benchmarks?label=
Completions are cached for 30 seconds and support partial matching.