
Resource Reference

MCP resources provide read-only access to EvalHub data using the evalhub:// URI scheme. All resources return JSON.

URI: evalhub://providers

Returns all registered evaluation providers.

```json
[
  {
    "resource": {
      "id": "garak",
      "created_at": "2026-05-01T12:00:00Z",
      "updated_at": "2026-05-10T08:30:00Z"
    },
    "name": "garak",
    "title": "Garak",
    "description": "LLM vulnerability scanner and red-teaming framework",
    "agent": {
      "evaluates": ["safety", "security", "red_teaming", "toxicity"],
      "target_type": "model",
      "summary": "Red-team an LLM for safety vulnerabilities, toxicity, and OWASP risks",
      "complements": ["lm_evaluation_harness", "guidellm"],
      "hints": [
        "The model endpoint must support OpenAI-compatible chat completions"
      ],
      "result_interpretation": [
        "attack_success_rate measures how often the model was successfully exploited",
        "LOWER is better -- 0.0 means no attacks succeeded"
      ]
    }
  },
  {
    "resource": {
      "id": "guidellm",
      "created_at": "2026-05-01T12:00:00Z",
      "updated_at": "2026-05-10T08:30:00Z"
    },
    "name": "guidellm",
    "title": "GuideLLM",
    "description": "Performance and latency benchmarking",
    "agent": {
      "evaluates": ["performance", "throughput", "latency"],
      "target_type": "inference_server",
      "summary": "Benchmark LLM inference server throughput, latency, and scalability"
    }
  }
]
```
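The `agent` block is optional metadata, and within it fields such as `hints` or `result_interpretation` may be absent (the `guidellm` entry above has neither), so client code should not assume every key is present. A minimal Python sketch, assuming the body of `evalhub://providers` has already been read into a string; the trimmed `PROVIDERS_JSON` sample and the `providers_for` helper are illustrative, not part of EvalHub:

```python
import json

# Trimmed copy of the example response above (illustrative sample data).
PROVIDERS_JSON = """
[
  {"resource": {"id": "garak"}, "name": "garak",
   "agent": {"evaluates": ["safety", "security", "red_teaming", "toxicity"],
             "target_type": "model"}},
  {"resource": {"id": "guidellm"}, "name": "guidellm",
   "agent": {"evaluates": ["performance", "throughput", "latency"],
             "target_type": "inference_server"}}
]
"""


def providers_for(providers: list, capability: str) -> list:
    """Return IDs of providers whose agent metadata lists `capability`.

    The `agent` block and its fields are optional, so missing keys are
    treated as "no capabilities" instead of raising KeyError.
    """
    matches = []
    for provider in providers:
        agent = provider.get("agent") or {}
        if capability in agent.get("evaluates", []):
            matches.append(provider["resource"]["id"])
    return matches


providers = json.loads(PROVIDERS_JSON)
print(providers_for(providers, "safety"))   # ['garak']
print(providers_for(providers, "latency"))  # ['guidellm']
```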

URI: evalhub://providers/{id}

Example: evalhub://providers/garak

Returns details for a single provider including its available benchmarks and optional agent metadata.
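MCP resource templates use RFC 6570 URI template syntax, in which a simple `{id}` expansion percent-encodes every character outside the unreserved set. The IDs on this page need no escaping, but a client that builds URIs from arbitrary strings can apply the same rule; whether the server accepts such encoded IDs is not covered here. A Python sketch with a hypothetical `expand` helper (the `my bench/v2` ID is made up to show the escaping):

```python
from urllib.parse import quote


def expand(template: str, **params: str) -> str:
    """Expand simple {name} placeholders in the style of RFC 6570 level 1.

    quote(..., safe="") leaves only unreserved characters
    (letters, digits, '-', '.', '_', '~') unescaped.
    """
    result = template
    for name, value in params.items():
        result = result.replace("{" + name + "}", quote(value, safe=""))
    return result


print(expand("evalhub://providers/{id}", id="garak"))
# evalhub://providers/garak
print(expand("evalhub://benchmarks/{id}", id="my bench/v2"))
# evalhub://benchmarks/my%20bench%2Fv2
```

The same helper works for the benchmark, collection, and job templates.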


URI: evalhub://benchmarks

Returns all benchmarks across all providers.

URI: evalhub://benchmarks/{id}

Example: evalhub://benchmarks/mmlu

Returns details for a single benchmark including its provider, description, and configuration.

URI: evalhub://benchmarks?label={tag}

Filters benchmarks by label. Repeat the label parameter for AND-style filtering: a benchmark is returned only if it carries every specified label.

Examples:

  • evalhub://benchmarks?label=safety — benchmarks tagged “safety”
  • evalhub://benchmarks?label=rag&label=reasoning — benchmarks tagged both “rag” and “reasoning”
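On the client side this is ordinary URL encoding with a repeated key. A Python sketch with hypothetical helper names that builds such URIs and restates the documented AND rule locally (for example, to double-check returned results); `matches_all` mirrors the rule above and is not EvalHub's server code:

```python
from urllib.parse import urlencode


def benchmarks_uri(*labels: str) -> str:
    """Build an evalhub://benchmarks URI with one label parameter per tag."""
    if not labels:
        return "evalhub://benchmarks"
    # doseq=True emits label=a&label=b instead of label=['a', 'b'].
    return "evalhub://benchmarks?" + urlencode({"label": list(labels)}, doseq=True)


def matches_all(benchmark_labels: set, wanted: set) -> bool:
    """AND semantics: every requested label must be present on the benchmark."""
    return wanted <= benchmark_labels


print(benchmarks_uri("rag", "reasoning"))
# evalhub://benchmarks?label=rag&label=reasoning
print(matches_all({"rag", "reasoning", "qa"}, {"rag", "reasoning"}))  # True
print(matches_all({"rag"}, {"rag", "reasoning"}))                    # False
```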

URI: evalhub://collections

Returns all pre-defined benchmark collections.

```json
[
  {
    "resource": {
      "id": "safety-and-fairness-v1",
      "created_at": "2026-05-01T12:00:00Z",
      "updated_at": "2026-05-10T08:30:00Z"
    },
    "name": "Safety and Fairness v1",
    "description": "Safety and bias evaluation benchmarks",
    "category": "safety",
    "agent": {
      "evaluates": ["safety", "fairness", "bias", "toxicity", "ethics", "truthfulness"],
      "summary": "Comprehensive safety and fairness suite covering toxicity, bias, truthfulness, and ethics",
      "complements": ["garak", "toxicity-and-ethical-principles"],
      "hints": [
        "Runs 6 benchmarks across truthfulness, toxicity, gender bias, social bias, and ethics",
        "Overall pass threshold is 0.758"
      ],
      "result_interpretation": [
        "Aggregate score is a weighted average across all benchmarks, higher is better"
      ]
    }
  },
  {
    "resource": {
      "id": "leaderboard-v2",
      "created_at": "2026-05-01T12:00:00Z",
      "updated_at": "2026-05-10T08:30:00Z"
    },
    "name": "Leaderboard v2",
    "description": "Standard leaderboard benchmarks",
    "category": "general"
  }
]
```
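As in the providers response, each entry wraps its identifier and UTC timestamps in a `resource` envelope, with fields such as `category` and `agent` alongside it. A Python sketch that groups collections by category and parses the `Z`-suffixed timestamps; the helper names and the `"uncategorized"` fallback are illustrative assumptions:

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Trimmed copy of the collections example above (illustrative sample data).
COLLECTIONS_JSON = """
[
  {"resource": {"id": "safety-and-fairness-v1",
                "updated_at": "2026-05-10T08:30:00Z"},
   "category": "safety"},
  {"resource": {"id": "leaderboard-v2",
                "updated_at": "2026-05-10T08:30:00Z"},
   "category": "general"}
]
"""


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into an aware UTC datetime.

    strptime with a literal 'Z' works on all Python 3 versions, whereas
    datetime.fromisoformat accepts the 'Z' suffix only from Python 3.11.
    """
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def ids_by_category(collections: list) -> dict:
    """Group collection IDs (taken from the resource envelope) by category."""
    groups = defaultdict(list)
    for item in collections:
        # Hypothetical fallback for entries without a category field.
        groups[item.get("category", "uncategorized")].append(item["resource"]["id"])
    return dict(groups)


collections = json.loads(COLLECTIONS_JSON)
print(ids_by_category(collections))
# {'safety': ['safety-and-fairness-v1'], 'general': ['leaderboard-v2']}
print(parse_utc(collections[0]["resource"]["updated_at"]).isoformat())
# 2026-05-10T08:30:00+00:00
```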

URI: evalhub://collections/{id}

Example: evalhub://collections/leaderboard-v2

Returns the collection with its full benchmark list and configuration.


URI: evalhub://jobs

Returns evaluation jobs. Results are paginated: each read returns at most limit jobs (100 by default).

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| limit | integer | Maximum items to return (1–2000, default 100) |
| offset | integer | Number of items to skip |

Example: evalhub://jobs?limit=20&offset=0
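Reading every job means advancing offset by the page size until a page comes back shorter than limit. A Python sketch of that loop; `read` is a hypothetical stand-in for whatever your MCP client uses to read a resource URI and decode its JSON array, and the 250-job fake backend exists only to make the example runnable:

```python
from typing import Callable, List
from urllib.parse import parse_qs, urlsplit

MAX_LIMIT = 2000  # documented upper bound for the limit parameter


def jobs_page_uri(limit: int = 100, offset: int = 0) -> str:
    """Build a paginated evalhub://jobs URI, rejecting out-of-range values."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    if offset < 0:
        raise ValueError(f"offset must be non-negative, got {offset}")
    return f"evalhub://jobs?limit={limit}&offset={offset}"


def read_all_jobs(read: Callable[[str], List], page_size: int = 100) -> List:
    """Collect every job by advancing offset until a short page comes back.

    `read` is a placeholder for your MCP client's resource read; it is
    not an EvalHub API.
    """
    jobs: List = []
    offset = 0
    while True:
        page = read(jobs_page_uri(page_size, offset))
        jobs.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we are done
            return jobs
        offset += page_size


# Fake backend for illustration: 250 jobs served by slicing on limit/offset.
FAKE_JOBS = [f"job-{n}" for n in range(250)]


def fake_read(uri: str) -> List[str]:
    query = parse_qs(urlsplit(uri).query)
    limit, offset = int(query["limit"][0]), int(query["offset"][0])
    return FAKE_JOBS[offset:offset + limit]


print(len(read_all_jobs(fake_read)))  # 250
```

Stopping on a short page costs one extra, empty read when the total is an exact multiple of the page size. Offset-based paging may also skip or repeat items if jobs are created while you page, so treat the result as a snapshot.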

URI: evalhub://jobs?status={status}

Filters jobs by status. Supported values:

| Status | Description |
|---|---|
| pending | Queued, waiting to start |
| running | Currently executing |
| completed | All benchmarks finished |
| failed | One or more benchmarks failed |
| cancelled | Cancelled by user |
| partially_failed | Some benchmarks succeeded, others failed |

Examples:

  • evalhub://jobs?status=running — all currently running jobs
  • evalhub://jobs?status=completed&limit=10 — last 10 completed jobs
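Validating the status client-side catches typos such as `canceled` (the documented value is `cancelled`) before a request is sent. A Python sketch with a hypothetical `jobs_by_status_uri` helper, using the values from the table above:

```python
from typing import Optional
from urllib.parse import urlencode

# Status values from the table above.
JOB_STATUSES = frozenset(
    {"pending", "running", "completed", "failed", "cancelled", "partially_failed"}
)


def jobs_by_status_uri(status: str, limit: Optional[int] = None) -> str:
    """Build an evalhub://jobs?status=... URI, rejecting unknown statuses early."""
    if status not in JOB_STATUSES:
        raise ValueError(f"unknown job status {status!r}; expected one of {sorted(JOB_STATUSES)}")
    params: dict = {"status": status}
    if limit is not None:
        params["limit"] = limit  # combines with the pagination parameters
    return "evalhub://jobs?" + urlencode(params)


print(jobs_by_status_uri("completed", limit=10))
# evalhub://jobs?status=completed&limit=10
```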

URI: evalhub://jobs/{id}

Example: evalhub://jobs/job-a1b2c3d4

Returns full job details including state, progress, per-benchmark status, and timestamps.
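A common pattern is to re-read this resource until the job finishes. The Python sketch below treats every status other than pending and running as final, which is an inference from the status table rather than documented behavior. `get_status` is a hypothetical callable that reads the job resource and returns its status string; this page does not name the field that holds it, so extracting it is left to the caller:

```python
import time
from typing import Callable

# Inferred from the status table: only these states can still change.
ACTIVE_STATUSES = frozenset({"pending", "running"})


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll evalhub://jobs/{id} until the job leaves pending/running.

    Returns the final status, or raises TimeoutError once `timeout`
    seconds have passed while the job is still active.
    """
    uri = f"evalhub://jobs/{job_id}"
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(uri)
        if status not in ACTIVE_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{uri} still {status} after {timeout} s")
        sleep(interval)


# Fake status sequence; sleep is a no-op so the example runs instantly.
statuses = iter(["pending", "running", "running", "partially_failed"])
print(wait_for_job(lambda uri: next(statuses), "job-a1b2c3d4", sleep=lambda s: None))
# partially_failed
```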


URI: evalhub://server/version

Returns server version and build information.

```json
{
  "version": "0.4.0",
  "git_hash": "d2c6d42",
  "build_date": "2026-05-20T12:00:00Z",
  "go_version": "go1.25.9",
  "os": "linux",
  "arch": "amd64",
  "mcp_library": "github.com/modelcontextprotocol/go-sdk",
  "mcp_library_version": "0.2.0"
}
```
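A client that relies on a feature from a particular release can gate on the version field. Comparing the strings directly is unreliable ("0.4.0" >= "0.10.0" is true as text), so this Python sketch compares numeric tuples instead. It assumes plain MAJOR.MINOR.PATCH versions without pre-release suffixes, and the minimum versions shown are hypothetical:

```python
import json

# Trimmed copy of the example response above.
VERSION_JSON = '{"version": "0.4.0", "git_hash": "d2c6d42"}'


def parse_version(text: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints for ordered comparison."""
    return tuple(int(part) for part in text.split("."))


def supports(server_version: str, minimum: str) -> bool:
    """True if the server version is at least `minimum`."""
    return parse_version(server_version) >= parse_version(minimum)


info = json.loads(VERSION_JSON)
print(supports(info["version"], "0.3.0"))   # True
print(supports(info["version"], "0.10.0"))  # False
```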

The MCP server provides autocompletion for resource URI parameters. MCP clients that support completions will suggest valid values as you type resource URIs:

  • Provider IDs — when typing evalhub://providers/{id}
  • Benchmark IDs — when typing evalhub://benchmarks/{id}
  • Collection IDs — when typing evalhub://collections/{id}
  • Job IDs — when typing evalhub://jobs/{id}
  • Status values — when typing evalhub://jobs?status=
  • Labels — when typing evalhub://benchmarks?label=

Completions are cached for 30 seconds and support partial matching.
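The caching can be pictured as a time-to-live cache in front of the ID lookup. One practical consequence is that a newly created ID, such as a freshly submitted job, may not appear in completions until the cache refreshes. A Python sketch of that idea, assuming "partial matching" means case-insensitive substring matching (this page does not say whether it is prefix or substring); `CompletionCache` and `load_ids` are hypothetical names, and this is not EvalHub's server code:

```python
import time
from typing import Callable, List, Optional

TTL_SECONDS = 30.0  # cache lifetime stated above


class CompletionCache:
    """Serve completion candidates from a list refreshed at most once per TTL."""

    def __init__(self, load_ids: Callable[[], List[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_ids = load_ids
        self._clock = clock
        self._ids: List[str] = []
        self._loaded_at: Optional[float] = None

    def complete(self, partial: str) -> List[str]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= TTL_SECONDS:
            self._ids = self._load_ids()  # refresh the empty or stale cache
            self._loaded_at = now
        needle = partial.lower()
        # Assumed semantics: case-insensitive substring match.
        return [i for i in self._ids if needle in i.lower()]


# Fake clock and ID source for illustration.
fake_now = [0.0]
ids = ["job-a1b2c3d4"]
cache = CompletionCache(lambda: list(ids), clock=lambda: fake_now[0])
print(cache.complete("a1b2"))  # ['job-a1b2c3d4']
ids.append("job-e5f6a7b8")     # a new job appears on the server
print(cache.complete("job-"))  # ['job-a1b2c3d4'] (still served from cache)
fake_now[0] = 31.0
print(cache.complete("job-"))  # ['job-a1b2c3d4', 'job-e5f6a7b8']
```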