Local Mode Tutorial
End-to-end walkthrough: run a LightEval evaluation locally with EvalHub, MLflow experiment tracking, and OCI artifact storage — no Kubernetes required.
This tutorial is based on the local-lighteval example in the EvalHub repository. For background on how local mode works, see the Local Mode guide.
Prerequisites
Install the following tools before starting:
- uv — Python package manager
- podman (or Docker) — for running the OCI registry
- ollama (or any OpenAI-compatible LLM server) — for serving a local model
Get the example
Download the example directory:
```bash
mkdir local-lighteval && cd local-lighteval
curl -sL https://github.com/eval-hub/eval-hub/archive/refs/heads/main.tar.gz \
  | tar xz --strip-components=3 eval-hub-main/examples/local-lighteval
```
Download the LightEval adapter
Download the adapter driver and its requirements from eval-hub-contrib:
```bash
curl -o main.py https://raw.githubusercontent.com/eval-hub/eval-hub-contrib/main/adapters/lighteval/main.py
curl -o requirements.txt https://raw.githubusercontent.com/eval-hub/eval-hub-contrib/main/adapters/lighteval/requirements.txt
```
Install dependencies
Install the project packages and the LightEval adapter requirements:
```bash
uv sync --extra demo
uv pip install -r requirements.txt
```
This installs eval-hub-server, MLflow, notebook dependencies, and the LightEval adapter runtime.
Start MLflow (optional)
Only needed if you want experiment tracking. Skip this step if you just want to run evaluations.
Activate the venv and start the MLflow server:
```bash
source .venv/bin/activate
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --host localhost \
  --port 5000
```
Verify from another terminal:
```bash
curl http://localhost:5000/health
```
The MLflow UI dashboard is accessible at http://localhost:5000.
Start the OCI registry (optional)
Only needed if you want to persist evaluation artifacts to an OCI registry. Skip this step if you just want to run evaluations.
In another terminal, pull the registry image and start it on localhost:5001:
```bash
podman pull docker.io/library/registry:2
podman run -d -p 5001:5000 \
  --name eval-hub-oci-registry \
  -e REGISTRY_STORAGE_DELETE_ENABLED=true \
  docker.io/library/registry:2
```
Start the LLM server
Pull a model (if the Ollama daemon is not already running in the background, start it first with ollama serve):
```bash
ollama pull llama3.2:3b-instruct-q4_K_M
```
Verify it’s running:
```bash
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b-instruct-q4_K_M",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 100
  }'
```
Configure eval-hub-server
Download the template config.yaml from the eval-hub repository:
```bash
mkdir -p config
curl -o config/config.yaml https://raw.githubusercontent.com/eval-hub/eval-hub/main/config/config.yaml
```
Create the provider configuration at config/providers/lighteval.yaml:
```bash
mkdir -p config/providers
cat > config/providers/lighteval.yaml << 'EOF'
id: lighteval
name: LightEval
description: LightEval for evaluation framework
runtime:
  local:
    command: "python main.py"
    env:
      - name: OCI_INSECURE
        value: "true"
benchmarks:
  - id: gsm8k
    name: Grade-school math word problems
    description: |-
      Multi-step arithmetic word problems requiring 2-8 reasoning steps
      (8-shot, 1,319 examples).
    category: math
    metrics:
      - exact_match
      - acc
    num_few_shot: 8
    dataset_size: 1319
    tags:
      - math
      - reasoning
      - lighteval
    primary_score:
      metric: acc
      lower_is_better: false
    pass_criteria:
      threshold: 0.25
EOF
```
See the Local Mode guide — Provider Configuration for details on the runtime.local section.
Start eval-hub-server
In a terminal with the venv activated, start the server:
```bash
source .venv/bin/activate
eval-hub-server --local --configdir ./config
```
If MLflow is running, pass the tracking URI:
```bash
MLFLOW_TRACKING_URI=http://localhost:5000 \
  eval-hub-server --local --configdir ./config
```
Verify from another terminal:
```bash
curl http://localhost:8080/api/v1/health
```
Run an evaluation
With all services running, submit a job using the CLI:
```bash
source .venv/bin/activate
uv pip install "eval-hub-sdk[cli]"
evalhub config set base_url http://localhost:8080

evalhub eval run \
  --name local-lighteval-demo \
  --model-url http://localhost:11434/v1 \
  --model-name llama3.2:3b-instruct-q4_K_M \
  --provider lighteval \
  --benchmark gsm8k \
  --param num_examples=10 \
  --param num_few_shot=0 \
  --wait
```
Check results:
```bash
evalhub eval results <job-id>
```
With MLflow experiments and OCI artifact storage
MLflow experiment tracking and OCI artifact export are optional. If you started MLflow and the OCI registry in the setup steps, add the --experiment and --oci-* flags:
```bash
evalhub eval run \
  --name local-lighteval-demo \
  --model-url http://localhost:11434/v1 \
  --model-name llama3.2:3b-instruct-q4_K_M \
  --provider lighteval \
  --benchmark gsm8k \
  --param num_examples=10 \
  --param num_few_shot=0 \
  --experiment my-local-experiment \
  --oci-host localhost:5001 \
  --oci-repository local-eval-results \
  --wait

evalhub eval results <job-id> --format json
```
Using the Python SDK
The included evalhub-client.ipynb notebook demonstrates the full evaluation lifecycle using the eval-hub-sdk Python client — submitting jobs, polling status, and retrieving results programmatically.
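If you prefer a plain script to the notebook, the workflow has roughly the shape sketched below. This is only an illustrative sketch: the client class and method names (EvalHubClient, submit, wait_for_completion, get_results) are placeholders, not the SDK's documented API, so consult evalhub-client.ipynb for the real calls.

```python
# Illustrative sketch only: the class and method names below are placeholders,
# not the documented eval-hub-sdk API. See evalhub-client.ipynb for the real client.
from eval_hub_sdk import EvalHubClient  # hypothetical import path

client = EvalHubClient(base_url="http://localhost:8080")

# Submit the same gsm8k job as the CLI example above.
job = client.submit(
    name="local-lighteval-demo",
    model_url="http://localhost:11434/v1",
    model_name="llama3.2:3b-instruct-q4_K_M",
    provider="lighteval",
    benchmark="gsm8k",
    params={"num_examples": 10, "num_few_shot": 0},
)

# Poll until the job finishes, then fetch the results payload.
job.wait_for_completion()
print(client.get_results(job.id))
```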
What’s running
After completing setup, you have four services on localhost:
| Service | URL | Purpose |
|---|---|---|
| eval-hub-server | http://localhost:8080 | Evaluation orchestration |
| MLflow | http://localhost:5000 | Experiment tracking dashboard |
| OCI registry | http://localhost:5001 | Artifact storage |
| Ollama | http://localhost:11434 | LLM inference |
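If you want a single sanity check across all four, the short script below probes each endpoint once. The /health paths for eval-hub-server and MLflow come from the verification steps above; the registry and Ollama URLs (/v2/ and /v1/models) are just standard reachability probes for those servers.

```python
# Probe each local service once and report whether it responds.
from urllib.request import urlopen
from urllib.error import URLError

services = {
    "eval-hub-server": "http://localhost:8080/api/v1/health",
    "MLflow": "http://localhost:5000/health",
    "OCI registry": "http://localhost:5001/v2/",
    "Ollama": "http://localhost:11434/v1/models",
}

for name, url in services.items():
    try:
        status = urlopen(url, timeout=3).status
        print(f"{name:16s} OK (HTTP {status})")
    except URLError as exc:
        print(f"{name:16s} unreachable: {exc}")
```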
Next steps
- Browse the MLflow UI at http://localhost:5000 to see experiment metrics
- Read the Local Mode guide for provider configuration details and troubleshooting
- Try other adapters by adding provider YAML files to config/providers/