Local Mode
Local mode runs the full EvalHub evaluation pipeline on your workstation without a Kubernetes cluster. Activate it with the `--local` flag. The REST API is identical to cluster mode — the same endpoints, request bodies, and response schemas apply.
Local mode is useful for:
- Developing and testing evaluation adapters before deploying to a cluster
- Running evaluations against locally-served models (Ollama, llama.cpp, vLLM)
- Iterating on benchmark configurations without infrastructure overhead
- Debugging the end-to-end evaluation flow
For a hands-on walkthrough, see the Local Mode Tutorial.
Starting the Server in Local Mode
```sh
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ eval-hub-server
eval-hub-server --version
```

The server requires a `--configdir` pointing to a directory containing a server `config.yaml` and a `providers/` subdirectory with provider YAML files (see the Local Mode Tutorial for a step-by-step setup). The `--local` flag disables authentication and enables CORS. The server starts at http://localhost:8080 with SQLite in-memory storage.
```sh
eval-hub-server --local --configdir ./config
```

Custom port:
```sh
PORT=8090 eval-hub-server --local --configdir ./config
```

With MLflow tracking:
```sh
MLFLOW_TRACKING_URI=http://localhost:5000 eval-hub-server --local --configdir ./config
```
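If you prefer to drive the server from a script (for example in integration tests), the same flag and environment variables apply. A minimal Python sketch:

```python
import os
import subprocess

# Start the server in local mode with a custom port and MLflow tracking;
# both settings are read from environment variables, as shown above
env = {
    **os.environ,
    "PORT": "8090",
    "MLFLOW_TRACKING_URI": "http://localhost:5000",
}
server = subprocess.Popen(
    ["eval-hub-server", "--local", "--configdir", "./config"],
    env=env,
)
# ... run requests against http://localhost:8090 ...
server.terminate()
```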
Differences from Cluster Mode
| Aspect | Cluster mode | Local mode |
|---|---|---|
| Job execution | Kubernetes Jobs (containers) | Host subprocesses (`sh -c "<command>"`) |
| Authentication | Enabled (configurable) | Disabled automatically |
| Multi-tenancy | Tenant isolation via `X-Tenant` header | Single-tenant only |
| CORS | Disabled by default | Enabled |
| Sidecar proxy | Injected into job pods | Not used; adapters call services directly |
| Init container | Downloads test data to `/test_data` | Not used |
| Job scheduling (Kueue) | Supported via queue config | Ignored |
| Process isolation | Container sandbox per job | Shared host environment |
| Provider runtime config | `runtime.k8s` (image, entrypoint, resources) | `runtime.local` (command, env vars) |
How Local Job Execution Works
When an evaluation job is submitted in local mode, the server does the following for each benchmark (see the sketch after this list):
- Writes a job specification (`job.json`) to `/tmp/evalhub-jobs/<job_id>/<benchmark_index>/<provider_id>/<benchmark_id>/meta/`
- Spawns the provider's `runtime.local.command` as a shell process, passing the job spec path via the `EVALHUB_JOB_SPEC_PATH` environment variable
- Captures stdout/stderr to `jobrun.log` alongside the job spec
- Tracks subprocess PIDs for cancellation (kills the entire process group on cancel)
- The adapter reads the job spec, runs the benchmark, and reports results back via the callback URL
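As an illustration, here is a minimal Python sketch of those steps, based only on the layout and environment variables documented on this page; the helper names are hypothetical and the server's actual internals may differ:

```python
import json
import os
import signal
import subprocess
from pathlib import Path

JOBS_BASE = Path("/tmp/evalhub-jobs")

def launch_local_job(job_spec: dict, command: str, extra_env: dict) -> subprocess.Popen:
    """Write the job spec to the meta/ directory, then spawn the adapter."""
    job_dir = (
        JOBS_BASE
        / job_spec["id"]
        / str(job_spec["benchmark_index"])
        / job_spec["provider_id"]
        / job_spec["benchmark_id"]
    )
    meta_dir = job_dir / "meta"
    meta_dir.mkdir(parents=True, exist_ok=True)
    spec_path = meta_dir / "job.json"
    spec_path.write_text(json.dumps(job_spec))

    # stdout/stderr are captured to jobrun.log alongside the job spec
    log_file = (job_dir / "jobrun.log").open("wb")
    return subprocess.Popen(
        ["sh", "-c", command],  # runtime.local.command, executed via sh -c
        env={**os.environ, **extra_env, "EVALHUB_JOB_SPEC_PATH": str(spec_path)},
        stdout=log_file,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # own process group, so cancel can kill everything
    )

def cancel_local_job(proc: subprocess.Popen) -> None:
    """Kill the entire process group, as the server does on cancellation."""
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
```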
Job file layout
```
/tmp/evalhub-jobs/
└── <job_id>/
    └── <benchmark_index>/
        └── <provider_id>/
            └── <benchmark_id>/
                ├── meta/
                │   └── job.json      # Job specification for the adapter
                └── jobrun.log        # Stdout/stderr from the adapter process
```
Job specification (job.json)
The job specification is the same structure used in cluster mode (where it is mounted into the container as a ConfigMap). It contains all the information the adapter needs to run the benchmark:
{ "id": "<job_id>", "provider_id": "<provider_id>", "benchmark_id": "<benchmark_id>", "benchmark_index": 0, "model": { "url": "http://localhost:11434/v1", "name": "llama3.2:3b-instruct-q4_K_M" }, "num_examples": 10, "parameters": {}, "experiment_name": "my-experiment", "tags": [ { "key": "model", "value": "llama3.2:3b-instruct-q4_K_M" } ], "callback_url": "http://localhost:8080", "exports": { "oci": { "coordinates": { "oci_host": "localhost:5001", "oci_repository": "eval-results" } } }}Provider Configuration for Local Mode
Provider Configuration for Local Mode
Each provider must have a `runtime.local` section specifying the adapter command and optional environment variables. The `runtime.local.command` is executed via `sh -c "<command>"`.
```yaml
id: my-provider
name: My Evaluation Provider
description: Custom evaluation framework adapter
runtime:
  local:
    command: "python main.py"
    env:
      - name: OCI_INSECURE
        value: "true"
benchmarks:
  - id: my_benchmark
    name: My benchmark
    description: Description of what this benchmark measures
    category: reasoning
    metrics:
      - acc
    primary_score:
      metric: acc
      lower_is_better: false
      pass_criteria:
        threshold: 0.25
```

A provider configuration can include both `runtime.local` and `runtime.k8s` sections, allowing the same definition to work in both modes.
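Because local mode fails with `provider has no local runtime configured` when `runtime.local` is missing (see Common issues below), it can be worth sanity-checking provider YAML before starting the server. A minimal sketch assuming PyYAML is installed; the server's own validation may differ:

```python
import sys
import yaml  # PyYAML, an assumed dependency for this check

with open(sys.argv[1]) as f:
    provider = yaml.safe_load(f)

command = ((provider.get("runtime") or {}).get("local") or {}).get("command")
if not command:
    sys.exit(f"{provider.get('id', '<unknown>')}: missing runtime.local.command")
print(f"{provider['id']}: local command is {command!r}")
```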
Environment variables
The adapter process receives the following environment variables:
| Variable | Description |
|---|---|
| `EVALHUB_JOB_SPEC_PATH` | Absolute path to the `job.json` file |
| Custom env vars from `runtime.local.env` | Any additional variables defined in the provider config |
See System Overview — AdapterSettings for the complete list of adapter environment variables.
Writing Adapters for Both Modes
Adapters that need to work in both cluster and local mode should use a common pattern for resolving the output directory. In cluster mode the adapter writes results relative to its own directory; in local mode the job base path is available and results go under it:
```python
from pathlib import Path

# self.local_jobs_base_path is set only when running in local mode
if self.local_jobs_base_path is not None:
    output_dir = self.local_jobs_base_path / "results"
else:
    output_dir = Path(__file__).parent / "results"
```

See the LightEval adapter source for a working example.
Troubleshooting
Adapter process logs
Check the adapter process output in the job log file:
```sh
cat /tmp/evalhub-jobs/<job_id>/<benchmark_index>/<provider_id>/<benchmark_id>/jobrun.log
```
Server logs
The server logs structured JSON to stderr. Look for local runtime messages:
- `local runtime job spec written` — job spec was created successfully
- `local runtime process started` — adapter process was launched with the logged PID and command
- `local runtime benchmark launch failed` — adapter command failed to start
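To filter these programmatically, a minimal sketch that scans the JSON log stream follows; the `message` field name is an assumption, so adjust it to whatever key your log records actually use:

```python
import json
import sys

# Pipe server stderr into this script, e.g.:
#   eval-hub-server --local --configdir ./config 2>&1 | python filter_logs.py
for line in sys.stdin:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON output
    if str(record.get("message", "")).startswith("local runtime"):
        print(record)
```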
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Job fails immediately | Adapter command not found | Verify the `runtime.local.command` path and that dependencies are installed |
| Job stays in `running` state | Adapter is not reporting back | Check the adapter logs in `jobrun.log`; verify the callback URL is reachable |
| `provider has no local runtime configured` | Missing `runtime.local` in provider YAML | Add a `runtime.local.command` to the provider configuration |
| MLflow experiment not created | MLflow not configured | Set `MLFLOW_TRACKING_URI` or `mlflow.tracking_uri` in `config.yaml` |
| OCI push fails | Registry not reachable or requires auth | Verify the registry is running and set `OCI_INSECURE=true` for local registries |
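For the OCI case, a quick reachability check against the registry's standard `/v2/` base endpoint (part of the OCI Distribution API) can rule out networking problems. A minimal sketch using the registry host from the job spec example above:

```python
import requests

# A 200 (or 401 on auth-protected registries) means the registry is up
resp = requests.get("http://localhost:5001/v2/", timeout=5)
print(f"registry status: {resp.status_code}")
```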