
Quick Start

Run your first evaluation with EvalHub.

  1. Start a Model Server

    You need an OpenAI-compatible model endpoint. One option is to deploy vLLM on Kubernetes:

    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vllm-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: vllm
      template:
        metadata:
          labels:
            app: vllm
        spec:
          containers:
          - name: vllm
            image: vllm/vllm-openai:latest
            args: [--model, meta-llama/Llama-3.2-1B-Instruct, --port, "8000"]
            ports:
            - containerPort: 8000
    EOF
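
    Before submitting jobs, it is worth sanity-checking the endpoint. A quick check (assuming the pod is Ready) is to port-forward the Deployment and query the OpenAI-compatible models route:

    # Forward the vLLM port to localhost (leave running in a second terminal)
    kubectl port-forward deployment/vllm-server 8000:8000
    # The endpoint should list the served model
    curl -s http://localhost:8000/v1/models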
  2. Install a Client

    pip install "eval-hub-sdk[cli]"
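
    To confirm the install, ask pip for the package metadata:

    pip show eval-hub-sdk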
  3. Submit an Evaluation

    Submit a job against your endpoint. This example targets a local server on port 11434 (Ollama's default); substitute your own --model-url and --model-name:

    evalhub eval run \
    --name quickstart-eval \
    --model-url http://localhost:11434/v1 \
    --model-name qwen2.5:1.5b \
    --provider guidellm \
    --benchmark quick_perf_test
    # Job submitted: eval-a1b2c3d4
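
    If you are scripting the follow-up commands, you can capture the job ID from the submission output (assuming the "Job submitted: <id>" line shown above):

    JOB_ID=$(evalhub eval run \
      --name quickstart-eval \
      --model-url http://localhost:11434/v1 \
      --model-name qwen2.5:1.5b \
      --provider guidellm \
      --benchmark quick_perf_test | awk '/Job submitted/ {print $NF}')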
  4. Wait for Results

    # Block until the job completes (--wait works with flag- or config-based submissions)
    evalhub eval run --config eval.yaml --wait
    # Or check status separately
    evalhub eval status eval-a1b2c3d4
    # View results
    evalhub eval results eval-a1b2c3d4
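
    If you prefer polling over blocking, standard shell tools work, e.g. re-running the status check on an interval:

    # Re-check status every 30 seconds (Ctrl-C to stop)
    watch -n 30 evalhub eval status eval-a1b2c3d4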
To see which evaluation providers are available, list and inspect them:

evalhub providers list
evalhub providers describe lm_evaluation_harness

You can also run a curated collection of benchmarks in one command:

evalhub collections run healthcare_safety_v1 \
--model-url http://localhost:11434/v1 \
--model-name qwen2.5:1.5b \
--name qwen-healthcare-eval \
--wait

To track runs in MLflow, pass an experiment name when submitting:

evalhub eval run \
--name qwen-mlflow-eval \
--model-url http://localhost:11434/v1 \
--model-name qwen2.5:1.5b \
--provider lm_evaluation_harness \
--benchmark mmlu \
--experiment my-experiment

Results can be exported in machine-readable formats:

# JSON
evalhub eval results eval-a1b2c3d4 --format json > results.json
# CSV
evalhub eval results eval-a1b2c3d4 --format csv > results.csv
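
Once exported, the JSON works with standard tooling; for example, jq can show the top-level structure without assuming anything about the result schema:

# Inspect the top-level keys of the exported results
jq 'keys' results.json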

EvalHub can persist evaluation result files as OCI Artifacts to an OCI Registry. Each result file is stored as a separate layer in the OCI Artifact, allowing consumers to selectively pull only the content they need (e.g. a summary JSON, individual adapter output files, or all files).

First, create a Kubernetes Secret with your registry credentials (aka “OCI Connection”):

kind: Secret
apiVersion: v1
type: kubernetes.io/dockerconfigjson
metadata:
  name: my-oci-credentials
data:
  .dockerconfigjson: <base64-encoded docker config>
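
Equivalently, kubectl can generate the same Secret directly from your registry login:

kubectl create secret docker-registry my-oci-credentials \
  --docker-server=quay.io \
  --docker-username=<user> \
  --docker-password=<password>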

Then reference it in the job submission; the --oci-* flags below populate the job's exports.oci configuration:

evalhub eval run \
--name my-eval-job1 \
--model-url http://vllm-llama3-8b-instruct-svc.evalhub-test.svc.cluster.local:8000/v1 \
--model-name meta-llama/Llama-3.2-1B-Instruct \
--provider demo \
--benchmark demo_benchmark \
--oci-host quay.io \
--oci-repository myorg/myartifact \
--oci-connection my-oci-credentials

When the evaluation completes, the adapter pushes an OCI Artifact to the specified repository (e.g. quay.io/myorg/myartifact). The --oci-connection flag (the job's k8s.connection field) references the name of the Kubernetes Secret containing the registry credentials.
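
Because each result file is a separate layer, an OCI-aware client such as the ORAS CLI can pull content selectively. A sketch (the tag and layer digest are placeholders for whatever your job actually pushed):

# List the layers (one per result file) in the artifact manifest
oras manifest fetch quay.io/myorg/myartifact:<tag>
# Fetch a single layer by digest instead of pulling everything
oras blob fetch quay.io/myorg/myartifact@sha256:<digest> --output summary.json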

Troubleshooting

Job stuck in pending: Check the server logs with kubectl logs deployment/evalhub-server, or run the server locally with debug logging enabled.

Model server not responding: Verify the model endpoint is reachable from the adapter pod (curl http://model-server:8000/v1/models).