Skip to content

Using Custom Data

Some benchmarks require external test datasets. EvalHub can load this data from S3-compatible storage (AWS S3, MinIO, or any S3 API-compatible service) before the evaluation job runs. Data is downloaded by an init container and mounted at /test_data for the adapter.

This guide gives a minimal, reproducible setup using MinIO in Kubernetes.

  1. You configure a benchmark with a test data reference (bucket, key prefix, and a Kubernetes Secret name for credentials).
  2. When the evaluation job starts, an init container runs first: it uses the secret to connect to S3/MinIO, lists objects under the given key prefix, and downloads them to a shared volume.
  3. The adapter container runs with /test_data populated; the benchmark can read files from that path.

The init container expects credentials from a Kubernetes Secret mounted at /var/run/secrets/test-data with these keys (as filenames):

KeyDescription
AWS_ACCESS_KEY_IDS3 access key
AWS_SECRET_ACCESS_KEYS3 secret key
AWS_DEFAULT_REGIONRegion (e.g. us-east-1; required even for MinIO)
AWS_S3_ENDPOINTFull endpoint URL (e.g. http://minio:9000)
  • Kubernetes cluster with EvalHub installed (see Installation).
  • kubectl configured to use that cluster.
  1. Deploy MinIO

    Create a namespace and deploy MinIO (API on port 9000, console on 9001):

    minio.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
    name: evalhub-demo
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
    name: minio
    namespace: evalhub-demo
    spec:
    replicas: 1
    selector:
    matchLabels:
    app: minio
    template:
    metadata:
    labels:
    app: minio
    spec:
    containers:
    - name: minio
    image: quay.io/minio/minio:latest
    args:
    - server
    - /data
    - --console-address
    - ":9001"
    env:
    - name: MINIO_ROOT_USER
    value: "minioadmin"
    - name: MINIO_ROOT_PASSWORD
    value: "minioadmin"
    ports:
    - containerPort: 9000
    name: api
    - containerPort: 9001
    name: console
    volumeMounts:
    - name: data
    mountPath: /data
    readinessProbe:
    httpGet:
    path: /minio/health/ready
    port: 9000
    initialDelaySeconds: 5
    periodSeconds: 10
    volumes:
    - name: data
    emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
    name: minio
    namespace: evalhub-demo
    spec:
    selector:
    app: minio
    ports:
    - name: api
    port: 9000
    targetPort: 9000
    - name: console
    port: 9001
    targetPort: 9001
    Terminal window
    kubectl apply -f minio.yaml
    kubectl wait -n evalhub-demo deployment/minio --for=condition=Available --timeout=120s
  2. Create bucket and upload test data

    Use the MinIO client or the MinIO web console (port 9001) to create a bucket and upload files. Example using a temporary pod:

    Terminal window
    # Create bucket and upload a sample file from a one-off pod
    kubectl run -n evalhub-demo mc --rm -i --restart=Never --image=quay.io/minio/mc:latest -- \
    sh -c "
    mc alias set myminio http://minio.evalhub-demo.svc.cluster.local:9000 minioadmin minioadmin
    mc mb myminio/evalhub-test --ignore-existing
    echo '{\"question\": \"What is 2+2?\", \"answer\": \"4\"}' | mc pipe myminio/evalhub-test/dataset/sample.json
    mc ls myminio/evalhub-test/dataset/
    "

    This creates bucket evalhub-test and key prefix dataset/ with one file. Adjust bucket name and prefix to match your benchmark.

  3. Create the credentials Secret

    Create a Secret in the same namespace where EvalHub runs evaluation jobs. The init container reads these keys from the mounted secret volume:

    evalhub-s3-credentials.yaml
    apiVersion: v1
    kind: Secret
    metadata:
    name: evalhub-s3-credentials
    namespace: evalhub-demo # use the namespace where EvalHub creates jobs
    type: Opaque
    stringData:
    AWS_ACCESS_KEY_ID: "minioadmin"
    AWS_SECRET_ACCESS_KEY: "minioadmin"
    AWS_DEFAULT_REGION: "us-east-1"
    AWS_S3_ENDPOINT: "http://minio.evalhub-demo.svc.cluster.local:9000"

    Use the in-cluster MinIO service DNS name so the job pod can reach MinIO. If EvalHub runs in a different namespace, create this secret in that namespace and set AWS_S3_ENDPOINT to http://minio.evalhub-demo.svc.cluster.local:9000 (or the correct service FQDN).

    Terminal window
    kubectl apply -f evalhub-s3-credentials.yaml
  4. Submit a job with S3 test data

    Attach the test data reference to the benchmark via test_data_ref.s3: bucket, key (prefix), and secret_ref (Secret name).

    Terminal window
    evalhub eval run \
    --name s3-data-eval \
    --model-url http://your-model:8000/v1 \
    --model-name my-model \
    --provider your_provider_id \
    --benchmark your_benchmark_id \
    --test-data-s3-bucket evalhub-test \
    --test-data-s3-key dataset/ \
    --test-data-s3-secret evalhub-s3-credentials
    • bucket: MinIO bucket name (e.g. evalhub-test).
    • key: Object key prefix; all objects under this prefix are downloaded (e.g. dataset/ or dataset).
    • secret_ref: Name of the Kubernetes Secret that holds the four keys above. The secret must exist in the namespace where the job is created.

    The init container downloads every object under s3://bucket/key into /test_data, preserving relative paths. The adapter can then read files from /test_data.

  • Init container fails (list or get object) Check that the job pod runs in a namespace where it can resolve the MinIO service (e.g. minio.evalhub-demo.svc.cluster.local) and that AWS_S3_ENDPOINT uses that URL. For MinIO, use http:// (or https:// if TLS is configured).

  • No objects found Ensure the bucket and key prefix exist and contain at least one object (the init container requires at least one file under the prefix).

  • Secret not found Create the Secret in the same namespace where EvalHub creates evaluation jobs, and use that exact name in secret_ref.