# GuideLLM Adapter
The GuideLLM adapter integrates GuideLLM with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.
## Overview

GuideLLM is a performance benchmarking platform designed to evaluate language model inference servers under realistic production conditions.
## Key Features

- Multiple execution profiles: sweep, throughput, concurrent, constant, poisson, synchronous
- Comprehensive metrics: Time to First Token (TTFT), Inter-Token Latency (ITL), end-to-end latency, throughput
- Flexible data sources: Synthetic data generation, HuggingFace datasets, local files
- Rich reporting: JSON, CSV, HTML, and YAML output formats with detailed visualisations
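To make the latency metrics concrete, here is a minimal sketch (not GuideLLM's implementation; the function name and field names are illustrative) of how TTFT, ITL, and end-to-end latency can be derived from the timestamps at which streamed tokens arrive:

```python
from statistics import mean


def summarize_stream(request_start: float, token_times: list[float]) -> dict:
    """Derive latency metrics from token arrival timestamps (seconds).

    - TTFT: delay until the first token arrives
    - ITL: gap between consecutive tokens
    - end-to-end latency: delay until the final token arrives
    """
    ttft = token_times[0] - request_start
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    return {
        "ttft_s": ttft,
        "itl_s_mean": mean(itls) if itls else 0.0,
        "e2e_latency_s": token_times[-1] - request_start,
        "output_tokens": len(token_times),
    }


# Example: request sent at t=0.0 s, four tokens streamed back
summary = summarize_stream(0.0, [0.25, 0.30, 0.35, 0.40])
```

In this example the TTFT is 0.25 s and the mean ITL is 0.05 s; GuideLLM reports distributions (percentiles) of these quantities across many requests, not just means.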
## Supported Backends

- OpenAI-compatible endpoints (vLLM, Text Generation Inference, etc.)
- Any HTTP API following OpenAI’s chat completions or completions format
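"OpenAI's chat completions format" means a JSON body POSTed to a `/v1/chat/completions`-style endpoint. As an illustration (this helper is hypothetical, not part of the adapter), the request body any compatible backend must accept looks like this:

```python
import json


def chat_completion_body(model: str, prompt: str, stream: bool = True) -> str:
    """Build a request body in OpenAI's chat completions format.

    Any HTTP server accepting this shape (vLLM, Text Generation
    Inference, etc.) can be benchmarked.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # streaming responses are needed to measure TTFT/ITL
        "max_tokens": 128,
    }
    return json.dumps(body)


payload = chat_completion_body("qwen2.5:1.5b", "Hello!")
```

Note that `stream=True` matters for benchmarking: without streaming, per-token timing (TTFT, ITL) cannot be observed.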
## Architecture

The adapter follows the eval-hub framework adapter pattern with automatic configuration:
Workflow:

- Settings-based configuration: runtime settings loaded automatically from the environment
- Automatic JobSpec loading: job configuration auto-loaded from the mounted ConfigMap
- Callback-based communication: progress updates and artifacts sent to the sidecar via callbacks
- Synchronous execution: the entire job lifetime is defined by the `run_benchmark_job()` method
- OCI artifact persistence: results persisted as OCI artifacts via the sidecar
- Structured results: returns `JobResults` with standardised performance metrics
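The workflow above can be sketched as follows. This is an illustrative outline only: `run_benchmark_job` and `JobResults` are named in the doc, but their signatures and fields here are assumptions, and the benchmark invocation itself is elided.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-in for the evalhub-sdk result type; the real
# fields are defined by the SDK.
@dataclass
class JobResults:
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)


def run_benchmark_job(job_spec: dict,
                      on_progress: Callable[[float], None]) -> JobResults:
    """Synchronous entry point: the job lives and dies inside this call,
    reporting progress to the sidecar via the callback."""
    results = JobResults()
    profiles = job_spec.get("profiles", ["synchronous"])
    for i, profile in enumerate(profiles, start=1):
        # ... run GuideLLM with this execution profile (omitted) ...
        results.metrics[profile] = {"ttft_ms": None, "itl_ms": None}
        on_progress(i / len(profiles))  # callback toward the sidecar
    return results


updates: list[float] = []
res = run_benchmark_job({"profiles": ["sweep", "throughput"]}, updates.append)
```

Because execution is synchronous, the caller (the container entry point) simply blocks on this one call and persists the returned `JobResults` afterwards.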
## Quick Start

### Building the Container

```sh
make image-guidellm
```

### Running Locally
Section titled “Running Locally”For local testing without Kubernetes:
# Set environment for local modeexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.jsonexport SERVICE_URL=http://localhost:8080 # Optional: if mock service is running
# Run the adapterpython main.py# Start Ollama and pull a modelollama run qwen2.5:1.5b
# Run benchmark against Ollamaexport EVALHUB_MODE=localexport EVALHUB_JOB_SPEC_PATH=meta/job.jsonpython main.pyContainer Image
```sh
# Pull from registry
podman pull quay.io/evalhub/community-guidellm:latest

# Run with custom job spec
podman run \
  -e EVALHUB_MODE=local \
  -e EVALHUB_JOB_SPEC_PATH=/meta/job.json \
  -v $(pwd)/job.json:/meta/job.json:ro \
  quay.io/evalhub/community-guidellm:latest
```
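The actual JobSpec schema is defined by evalhub-sdk; every field below is a hypothetical placeholder, shown only to illustrate the kind of configuration a `job.json` mounted at `EVALHUB_JOB_SPEC_PATH` might carry:

```json
{
  "model": "qwen2.5:1.5b",
  "target": "http://localhost:11434/v1",
  "profile": "sweep",
  "max_seconds": 60
}
```

Consult the evalhub-sdk documentation for the authoritative field names and types before writing a real job spec.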