# GuideLLM Adapter
The GuideLLM adapter integrates GuideLLM with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.
## Overview

GuideLLM is a performance benchmarking platform designed to evaluate language model inference servers under realistic production conditions.
## Key Features

- Multiple execution profiles: sweep, throughput, concurrent, constant, poisson, synchronous
- Comprehensive metrics: Time to First Token (TTFT), Inter-Token Latency (ITL), end-to-end latency, throughput
- Flexible data sources: Synthetic data generation, HuggingFace datasets, local files
- Rich reporting: JSON, CSV, HTML, and YAML output formats with detailed visualisations
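To make the latency metrics concrete, here is a minimal sketch (not GuideLLM's implementation; the function name and field names are illustrative) of how TTFT, ITL, and end-to-end latency can be derived from the timestamps at which streamed tokens arrive:

```python
from statistics import mean


def summarize_stream(request_start: float, token_times: list[float]) -> dict:
    """Derive latency metrics from token arrival timestamps (seconds).

    - TTFT: delay until the first token arrives
    - ITL: gap between consecutive tokens
    - end-to-end latency: delay until the final token arrives
    """
    ttft = token_times[0] - request_start
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "ttft_s": ttft,
        "itl_s_mean": mean(itls) if itls else 0.0,
        "e2e_latency_s": token_times[-1] - request_start,
        "output_tokens": len(token_times),
    }


# Example: request sent at t=0.0 s, four tokens streamed back
summary = summarize_stream(0.0, [0.25, 0.30, 0.35, 0.40])
```

In this example the TTFT is 0.25 s and the mean ITL is 0.05 s; GuideLLM reports distributions (percentiles) of these quantities across many requests, not just means.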
## Supported Backends

- OpenAI-compatible endpoints (vLLM, Text Generation Inference, etc.)
- Any HTTP API following OpenAI’s chat completions or completions format
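"OpenAI's chat completions format" means a JSON body POSTed to a `/v1/chat/completions`-style endpoint. As an illustration (this helper is hypothetical, not part of the adapter), the request body any compatible backend must accept looks like this:

```python
import json


def chat_completion_body(model: str, prompt: str, stream: bool = True) -> str:
    """Build a request body in OpenAI's chat completions format.

    Any HTTP server accepting this shape (vLLM, Text Generation
    Inference, etc.) can be benchmarked.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # streaming responses are needed to measure TTFT/ITL
        "max_tokens": 128,
    }
    return json.dumps(body)


payload = chat_completion_body("qwen2.5:1.5b", "Hello!")
```

Note that `stream=True` matters for benchmarking: without streaming, per-token timing (TTFT, ITL) cannot be observed.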
## Architecture

The adapter follows the eval-hub framework adapter pattern with automatic configuration:
Workflow:

- Settings-based configuration: runtime settings loaded automatically from the environment
- Automatic JobSpec loading: job configuration auto-loaded from the mounted ConfigMap
- Callback-based communication: progress updates and artifacts sent to the sidecar via callbacks
- Synchronous execution: the entire job lifetime is defined by the `run_benchmark_job()` method
- OCI artifact persistence: results persisted as OCI artifacts via the sidecar
- Structured results: returns `JobResults` with standardised performance metrics
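The workflow above can be sketched as follows. This is an illustrative outline only: `run_benchmark_job` and `JobResults` are named in the doc, but their signatures and fields here are assumptions, and the benchmark invocation itself is elided.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-in for the evalhub-sdk result type; the real
# fields are defined by the SDK.
@dataclass
class JobResults:
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)


def run_benchmark_job(job_spec: dict,
                      on_progress: Callable[[float], None]) -> JobResults:
    """Synchronous entry point: the job lives and dies inside this call,
    reporting progress to the sidecar via the callback."""
    results = JobResults()
    profiles = job_spec.get("profiles", ["synchronous"])
    for i, profile in enumerate(profiles, start=1):
        # ... run GuideLLM with this execution profile (omitted) ...
        results.metrics[profile] = {"ttft_ms": None, "itl_ms": None}
        on_progress(i / len(profiles))  # callback toward the sidecar
    return results


updates: list[float] = []
res = run_benchmark_job({"profiles": ["sweep", "throughput"]}, updates.append)
```

Because execution is synchronous, the caller (the container entry point) simply blocks on this one call and persists the returned `JobResults` afterwards.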
## Quick Start

### Building the Container

```sh
make image-guidellm
```

### Running Locally
Section titled “Running Locally”For local testing without Kubernetes:
# Set environment for local modeexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.jsonexport SERVICE_URL=http://localhost:8080 # Optional: if mock service is running
# Run the adapterpython main.py# Start Ollama and pull a modelollama run qwen2.5:1.5b
# Run benchmark against Ollamaexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.jsonpython main.pyContainer Image
```sh
# Pull from registry
podman pull quay.io/evalhub/community-guidellm:latest

# Run with custom job spec
podman run \
  -e EVALHUB_MODE=local \
  -e EVALHUB_JOB_SPEC_PATH=/meta/job.json \
  -v $(pwd)/job.json:/meta/job.json:ro \
  quay.io/evalhub/community-guidellm:latest
```
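The actual JobSpec schema is defined by evalhub-sdk; every field below is a hypothetical placeholder, shown only to illustrate the kind of configuration a `job.json` mounted at `EVALHUB_JOB_SPEC_PATH` might carry:

```json
{
  "model": "qwen2.5:1.5b",
  "target": "http://localhost:11434/v1",
  "profile": "sweep",
  "max_seconds": 60
}
```

Consult the evalhub-sdk documentation for the authoritative field names and types before writing a real job spec.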