EvalHub Server¶
Go REST API service for managing LLM evaluation workflows.
REST API¶
All endpoints are under /api/v1. Request and response bodies use JSON. The OpenAPI 3.1.0 specification is served at /openapi.yaml.
See https://eval-hub.github.io/eval-hub/ for the full specification.
Evaluation Jobs¶
POST /api/v1/evaluations/jobs # Submit evaluation
GET /api/v1/evaluations/jobs # List jobs
GET /api/v1/evaluations/jobs/{id} # Get job status and results
DELETE /api/v1/evaluations/jobs/{id} # Cancel job
POST /api/v1/evaluations/jobs/{id}/events # Status/result callback (adapter → server)
Providers¶
GET /api/v1/evaluations/providers # List providers
POST /api/v1/evaluations/providers # Register provider
GET /api/v1/evaluations/providers/{id} # Get provider
PUT /api/v1/evaluations/providers/{id} # Update provider
PATCH /api/v1/evaluations/providers/{id} # Patch provider
DELETE /api/v1/evaluations/providers/{id} # Delete provider
Query parameters: benchmarks=true|false (default true), scope=system|tenant (default is not set which means all providers).
Benchmarks are returned as part of the provider response. There is no separate /benchmarks endpoint.
Collections¶
GET /api/v1/evaluations/collections # List collections
POST /api/v1/evaluations/collections # Create collection
GET /api/v1/evaluations/collections/{id} # Get collection
PUT /api/v1/evaluations/collections/{id} # Update collection
PATCH /api/v1/evaluations/collections/{id} # Patch collection
DELETE /api/v1/evaluations/collections/{id} # Delete collection
Health and Metrics¶
GET /api/v1/health # Health check
GET /metrics # Prometheus metrics
GET /openapi.yaml # OpenAPI specification
GET /docs # Interactive API docs
Configuration¶
Configuration loads from config/config.yaml, with environment variable and file-based secret overrides.
Key Settings¶
| Setting | Env Var | Default | Description |
|---|---|---|---|
service.port |
PORT |
8080 |
API listen port |
database.driver |
- | sqlite |
sqlite or pgx |
database.url |
DB_URL |
SQLite in-memory | Connection string |
mlflow.tracking_uri |
MLFLOW_TRACKING_URI |
- | MLflow server URL |
prometheus.enabled |
- | true |
Enable /metrics |
otel.enabled |
- | false |
Enable OpenTelemetry |
Provider Configuration¶
Providers are loaded from YAML files in config/providers/. Built-in providers: lm_evaluation_harness (167 benchmarks), garak (8), guidellm (7), lighteval (24).
Custom providers can be added via YAML files or the POST /api/v1/evaluations/providers endpoint.
Runtimes¶
Kubernetes (default)¶
Creates a Kubernetes Job per benchmark with:
- ConfigMap: JobSpec mounted at
/meta/job.json - Adapter container: Runs the evaluation framework
- Sidecar container: Forwards status events to the server
- Volumes: OCI credentials, MLflow token, model auth secrets
Local¶
Spawns subprocesses (up to 5 workers) for each benchmark. Enabled with the -local flag. Useful for development without a cluster.
Deployment¶
The server is deployed by the TrustyAI Operator via the EvalHub custom resource. See OpenShift Setup for production deployment.
Next Steps¶
- Python SDK - Client reference
- Architecture - Adapter architecture
- OpenShift Setup - Production deployment