Local Mode

Local mode runs the full EvalHub evaluation pipeline on your workstation without a Kubernetes cluster. Activate it with the --local flag. The REST API is identical to cluster mode — the same endpoints, request bodies, and response schemas apply.

Local mode is useful for:

  • Developing and testing evaluation adapters before deploying to a cluster
  • Running evaluations against locally-served models (Ollama, llama.cpp, vLLM)
  • Iterating on benchmark configurations without infrastructure overhead
  • Debugging the end-to-end evaluation flow

For a hands-on walkthrough, see the Local Mode Tutorial.

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ eval-hub-server
eval-hub-server --version

The server requires a --configdir pointing to a directory containing a server config.yaml and a providers/ subdirectory with provider YAML files (see the Local Mode Tutorial for a step-by-step setup). The --local flag disables authentication and enables CORS. The server starts at http://localhost:8080 with SQLite in-memory storage.

eval-hub-server --local --configdir ./config

Custom port:

PORT=8090 eval-hub-server --local --configdir ./config

With MLflow tracking:

MLFLOW_TRACKING_URI=http://localhost:5000 eval-hub-server --local --configdir ./config

Local mode differs from cluster mode in the following ways:

| Aspect | Cluster mode | Local mode |
| --- | --- | --- |
| Job execution | Kubernetes Jobs (containers) | Host subprocesses (sh -c "<command>") |
| Authentication | Enabled (configurable) | Disabled automatically |
| Multi-tenancy | Tenant isolation via X-Tenant header | Single-tenant only |
| CORS | Disabled by default | Enabled |
| Sidecar proxy | Injected into job pods | Not used; adapters call services directly |
| Init container | Downloads test data to /test_data | Not used |
| Job scheduling (Kueue) | Supported via queue config | Ignored |
| Process isolation | Container sandbox per job | Shared host environment |
| Provider runtime config | runtime.k8s (image, entrypoint, resources) | runtime.local (command, env vars) |

When an evaluation job is submitted in local mode, for each benchmark the server:

  1. Writes a job specification (job.json) to /tmp/evalhub-jobs/<job_id>/<benchmark_index>/<provider_id>/<benchmark_id>/meta/
  2. Spawns the provider’s runtime.local.command as a shell process, passing the job spec path via the EVALHUB_JOB_SPEC_PATH environment variable
  3. Captures stdout/stderr to jobrun.log in the benchmark directory, one level above meta/
  4. Tracks subprocess PIDs for cancellation (kills the entire process group on cancel)
  5. Leaves the rest to the adapter, which reads the job spec, runs the benchmark, and reports results back via the callback URL

/tmp/evalhub-jobs/
└── <job_id>/
    └── <benchmark_index>/
        └── <provider_id>/
            └── <benchmark_id>/
                ├── meta/
                │   └── job.json    # Job specification for the adapter
                └── jobrun.log      # Stdout/stderr from the adapter process
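
The launch and cancel steps described above can be sketched in Python. This is only an illustration of the described behavior on a POSIX host, not EvalHub's actual implementation: the function names `launch_adapter` and `cancel_adapter`, the `base` and `extra_env` parameters, and the choice of SIGTERM are all assumptions made for the example.

```python
import json
import os
import signal
import subprocess
from pathlib import Path


def launch_adapter(base, job_id, benchmark_index, provider_id, benchmark_id,
                   spec, command, extra_env=None):
    """Write job.json, then start `command` via sh -c in its own process group.

    Hypothetical helper: it mirrors the documented directory layout only.
    """
    bench_dir = Path(base) / job_id / str(benchmark_index) / provider_id / benchmark_id
    meta_dir = bench_dir / "meta"
    meta_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: write the job specification under meta/.
    spec_path = meta_dir / "job.json"
    spec_path.write_text(json.dumps(spec, indent=2))

    # Step 2: pass the spec path to the adapter through the environment.
    env = dict(os.environ)
    env.update(extra_env or {})
    env["EVALHUB_JOB_SPEC_PATH"] = str(spec_path.resolve())

    # Step 3: send stdout and stderr to jobrun.log in the benchmark directory.
    log_path = bench_dir / "jobrun.log"
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            ["sh", "-c", command],
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, so the child leads its own process group
        )
    return proc, spec_path, log_path


def cancel_adapter(proc):
    """Step 4: signal the whole process group so child processes stop too."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # the process has already exited
```

Starting the adapter in its own session is what makes group-wide cancellation possible: a command such as `python main.py` may spawn workers of its own, and signalling only the shell would leave them running.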

The job specification is the same structure used in cluster mode (where it is mounted into the container as a ConfigMap). It contains all the information the adapter needs to run the benchmark:

{
  "id": "<job_id>",
  "provider_id": "<provider_id>",
  "benchmark_id": "<benchmark_id>",
  "benchmark_index": 0,
  "model": {
    "url": "http://localhost:11434/v1",
    "name": "llama3.2:3b-instruct-q4_K_M"
  },
  "num_examples": 10,
  "parameters": {},
  "experiment_name": "my-experiment",
  "tags": [
    { "key": "model", "value": "llama3.2:3b-instruct-q4_K_M" }
  ],
  "callback_url": "http://localhost:8080",
  "exports": {
    "oci": {
      "coordinates": {
        "oci_host": "localhost:5001",
        "oci_repository": "eval-results"
      }
    }
  }
}
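
On the adapter side, reading this file by hand might look like the sketch below. The helper names `load_job_spec` and `summarize` are hypothetical, and treating `num_examples` as optional is an assumption; a real adapter may rely on helpers from its framework instead.

```python
import json
import os


def load_job_spec(environ=None):
    """Read the job specification from the path in EVALHUB_JOB_SPEC_PATH."""
    environ = os.environ if environ is None else environ
    spec_path = environ.get("EVALHUB_JOB_SPEC_PATH")
    if not spec_path:
        raise RuntimeError("EVALHUB_JOB_SPEC_PATH is not set")
    with open(spec_path, encoding="utf-8") as f:
        return json.load(f)


def summarize(spec):
    """Collect the fields an adapter is likely to need first."""
    return {
        "benchmark": spec["benchmark_id"],
        "model_url": spec["model"]["url"],
        "model_name": spec["model"]["name"],
        # Assumption: num_examples may be absent, so read it defensively.
        "num_examples": spec.get("num_examples"),
        "callback_url": spec["callback_url"],
    }
```

Calling `summarize(load_job_spec())` at the top of an adapter entry point fails fast with a clear error if the process was started outside EvalHub.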

Each provider must have a runtime.local section specifying the adapter command and optional environment variables. The runtime.local.command is executed via sh -c "<command>".

id: my-provider
name: My Evaluation Provider
description: Custom evaluation framework adapter
runtime:
  local:
    command: "python main.py"
    env:
      - name: OCI_INSECURE
        value: "true"
benchmarks:
  - id: my_benchmark
    name: My benchmark
    description: Description of what this benchmark measures
    category: reasoning
    metrics:
      - acc
    primary_score:
      metric: acc
      lower_is_better: false
    pass_criteria:
      threshold: 0.25

A provider configuration can include both runtime.local and runtime.k8s sections, allowing the same definition to work in both modes.

The adapter process receives the following environment variables:

| Variable | Description |
| --- | --- |
| EVALHUB_JOB_SPEC_PATH | Absolute path to the job.json file |
| Custom env vars from runtime.local.env | Any additional variables defined in the provider config |
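
As a rough illustration of how these variables could be combined, the sketch below merges the name/value entries from runtime.local.env with the job spec path. The function `build_adapter_env` is hypothetical, and two details are assumptions rather than documented behavior: that the adapter inherits the host environment, and that EVALHUB_JOB_SPEC_PATH takes precedence over a provider entry of the same name.

```python
import os


def build_adapter_env(provider_env, job_spec_path, base_env=None):
    """Merge runtime.local.env entries and the job spec path onto a base environment."""
    # Assumption: the adapter inherits the host environment (shared host environment).
    env = dict(os.environ if base_env is None else base_env)
    for entry in provider_env or []:
        # Each entry has the shape {"name": ..., "value": ...}, as in the provider YAML.
        env[entry["name"]] = str(entry["value"])
    # Assumption: set last so that a provider entry cannot override it.
    env["EVALHUB_JOB_SPEC_PATH"] = os.path.abspath(job_spec_path)
    return env
```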

See System Overview — AdapterSettings for the complete list of adapter environment variables.

Adapters that need to work in both cluster and local mode should use a common pattern for resolving the output directory. In cluster mode the adapter writes results relative to its own directory; in local mode the job base path is available and results go under it:

if self.local_jobs_base_path is not None:
    output_dir = self.local_jobs_base_path / "results"
else:
    output_dir = Path(__file__).parent / "results"

See the LightEval adapter source for a working example.

Check the adapter process output in the job log file:

cat /tmp/evalhub-jobs/<job_id>/<benchmark_index>/<provider_id>/<benchmark_id>/jobrun.log

The server logs structured JSON to stderr. Look for local runtime messages:

  • local runtime job spec written — job spec was created successfully
  • local runtime process started — adapter process was launched with the logged PID and command
  • local runtime benchmark launch failed — adapter command failed to start
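
If the server's stderr has been redirected to a file, the messages listed above can be picked out with a small filter such as the one below. Because the exact field names in the log records are not documented here, the sketch checks every string value in each record instead of assuming a particular key; the function name `local_runtime_events` is hypothetical.

```python
import json


def local_runtime_events(lines):
    """Yield parsed JSON log records that mention the local runtime."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict):
            continue
        # Match on any string value, since the message field name is not assumed.
        if any(isinstance(v, str) and "local runtime" in v for v in record.values()):
            yield record
```

For example, after starting the server with `2> server.log` appended to the command, `for event in local_runtime_events(open("server.log")): print(event)` prints only the local runtime records.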

| Symptom | Cause | Fix |
| --- | --- | --- |
| Job fails immediately | Adapter command not found | Verify runtime.local.command path and that dependencies are installed |
| Job stays in running state | Adapter is not reporting back | Check the adapter logs in jobrun.log; verify the callback URL is reachable |
| provider has no local runtime configured | Missing runtime.local in provider YAML | Add a runtime.local.command to the provider configuration |
| MLflow experiment not created | MLflow not configured | Set MLFLOW_TRACKING_URI or mlflow.tracking_uri in config.yaml |
| OCI push fails | Registry not reachable or requires auth | Verify the registry is running and set OCI_INSECURE=true for local registries |
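
For the case where a job stays in the running state, one quick check is whether the callback_url from job.json accepts TCP connections from the host running the adapter. The sketch below does only that; `callback_reachable` is a hypothetical helper, and a successful connection shows that something is listening on that host and port, not that the callback request itself would succeed.

```python
import socket
from urllib.parse import urlparse


def callback_reachable(callback_url, timeout=2.0):
    """Return True if a TCP connection to the callback URL's host and port succeeds."""
    parsed = urlparse(callback_url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the default local setup, `callback_reachable("http://localhost:8080")` should return True while the server is running.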