# Installation

## Prerequisites
- Python 3.11+
- Kubernetes/OpenShift cluster (for production deployment)
## Client SDK

```python
from evalhub import SyncEvalHubClient
from evalhub.models.api import ModelConfig, BenchmarkConfig, JobSubmissionRequest

# Connect to a locally running EvalHub server
client = SyncEvalHubClient(base_url="http://localhost:8080")

# List the evaluation providers the server knows about
providers = client.providers.list()
```
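The `ModelConfig`, `BenchmarkConfig`, and `JobSubmissionRequest` imports above hint at the job-submission flow. As a rough sketch of what such a request body might look like as plain JSON — every field name below is an assumption for illustration, not the documented schema:

```python
import json

# Hypothetical job-submission payload; field names are assumptions --
# consult the ModelConfig/BenchmarkConfig/JobSubmissionRequest docs
# for the real schema.
payload = {
    "model": {
        "name": "meta-llama/Llama-3.2-1B-Instruct",
        "url": "http://vllm-server:8000/v1",
    },
    "benchmarks": [
        {"provider": "guidellm", "benchmark": "sweep"},
    ],
}

# Round-trip through JSON, as a client would when talking to the server
body = json.dumps(payload)
print(json.loads(body)["benchmarks"][0]["provider"])  # guidellm
```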
## Server Deployment

Install using the TrustyAI Operator:

```bash
kubectl apply -f https://github.com/trustyai-explainability/trustyai-service-operator/releases/latest/download/trustyai-operator.yaml
```
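Before creating an `EvalHub` resource, it can be worth confirming that the operator pod is up. The exact namespace depends on your install, so a broad search is sketched here:

```bash
# Locate the operator pod; the namespace varies by installation method
kubectl get pods --all-namespaces | grep trustyai
```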
For SQLite (development/testing), no external database is needed:

```bash
kubectl apply -f - <<EOF
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: EvalHub
metadata:
  name: evalhub
  namespace: evalhub
spec:
  replicas: 1
  database:
    type: sqlite
  providers:
    - lm-evaluation-harness
    - garak
    - guidellm
  collections:
    - leaderboard-v2
EOF
```
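Once applied, the custom resource and its pods should appear in the `evalhub` namespace. A quick check (the resource short name may differ by operator version):

```bash
kubectl get evalhub -n evalhub
kubectl get pods -n evalhub
```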
For PostgreSQL (production), create the credentials secret first, then the CR:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: evalhub-db-credentials
  namespace: evalhub
type: Opaque
stringData:
  db-url: "postgres://user:password@db-host:5432/evalhub"
EOF
```
```bash
kubectl apply -f - <<EOF
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: EvalHub
metadata:
  name: evalhub
  namespace: evalhub
spec:
  replicas: 1
  database:
    type: postgresql
    secret: evalhub-db-credentials
  providers:
    - lm-evaluation-harness
    - garak
    - guidellm
  collections:
    - leaderboard-v2
EOF
```
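If the pods fail to start with the PostgreSQL configuration, a common first check is that the secret exists and holds the `db-url` key:

```bash
kubectl get secret evalhub-db-credentials -n evalhub -o jsonpath='{.data.db-url}' | base64 -d
```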
See OpenShift Setup for a full description of all spec fields.
## Provider Configuration
Providers are defined in YAML files under config/providers/. Each provider specifies a container image, resource requirements, and available benchmarks.
```yaml
id: guidellm
name: GuideLLM
description: Performance benchmarking framework
type: builtin
runtime:
  k8s:
    image: quay.io/eval-hub/community-guidellm:latest
    entrypoint: [python, main.py]
    cpu_request: 100m
    memory_request: 128Mi
    cpu_limit: 500m
    memory_limit: 1Gi
benchmarks:
  - id: sweep
    name: Rate Sweep
    category: performance
    metrics: [requests_per_second, mean_ttft_ms, mean_itl_ms]
```
Custom providers can also be created via the REST API:
```bash
curl -X POST http://localhost:8080/api/v1/evaluations/providers \
  -H "Content-Type: application/json" \
  -d @my-provider.json
```
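The JSON body mirrors the YAML provider definition. As a sketch, a `my-provider.json` could be generated from the same fields shown above — whether every field is required by the REST API is an assumption, so validate against your server's schema:

```python
import json

# JSON mirror of the YAML provider definition; which fields the REST API
# actually requires is an assumption -- check the server's schema.
provider = {
    "id": "guidellm",
    "name": "GuideLLM",
    "description": "Performance benchmarking framework",
    "type": "builtin",
    "runtime": {
        "k8s": {
            "image": "quay.io/eval-hub/community-guidellm:latest",
            "entrypoint": ["python", "main.py"],
            "cpu_request": "100m",
            "memory_request": "128Mi",
            "cpu_limit": "500m",
            "memory_limit": "1Gi",
        }
    },
    "benchmarks": [
        {
            "id": "sweep",
            "name": "Rate Sweep",
            "category": "performance",
            "metrics": ["requests_per_second", "mean_ttft_ms", "mean_itl_ms"],
        }
    ],
}

# Write the file that the curl command above posts to the server
with open("my-provider.json", "w") as f:
    json.dump(provider, f, indent=2)
```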
## Model Serving

Evaluations require a model endpoint compatible with the OpenAI API. For example, serve a model with vLLM:
```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: [--model, meta-llama/Llama-3.2-1B-Instruct, --port, "8000"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
EOF
```
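To confirm the endpoint speaks the OpenAI API, port-forward the deployment and hit the `/v1/models` route (a quick check, assuming the port from the manifest above):

```bash
kubectl port-forward deployment/vllm-server 8000:8000 &
curl -s http://localhost:8000/v1/models
```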
## Verification
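A minimal smoke test is to port-forward the EvalHub service and list the registered providers over the REST API. The service name and port below are assumptions based on the examples in this guide:

```bash
kubectl port-forward -n evalhub svc/evalhub 8080:8080 &
curl -s http://localhost:8080/api/v1/evaluations/providers
```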
## Next Steps
- Quick Start - Run your first evaluation
- OpenShift Setup - Production deployment guide