Complete reference for GuideLLM adapter configuration options.
The GuideLLM adapter uses a standardised JobSpec structure:

```json
{
  "id": "string",
  "benchmark_id": "string",
  "model": {
    "name": "string",
    "url": "string"
  },
  "parameters": {
    // GuideLLM-specific configuration
  },
  "experiment_name": "string",
  "tags": {}
}
```
| Parameter | Type | Description | Example |
|---|---|---|---|
| `id` | string | Unique job identifier | `"guidellm-001"` |
| `benchmark_id` | string | Benchmark identifier | `"performance_sweep"` |
| `model.name` | string | Model name | `"Qwen/Qwen2.5-1.5B-Instruct"` |
| `model.url` | string | OpenAI-compatible API endpoint | `"http://localhost:8000/v1"` |
| Parameter | Type | Description | Default |
|---|---|---|---|
| `experiment_name` | string | Experiment identifier | `null` |
| `tags` | object | Free-form metadata tags | `{}` |
All configuration is specified in the `parameters` object.
| Parameter | Type | Description | Options |
|---|---|---|---|
| `profile` | string | Execution profile | `sweep`, `throughput`, `concurrent`, `constant`, `poisson`, `synchronous` |
See Execution Profiles for detailed information on each profile type.
| Parameter | Type | Description | Notes |
|---|---|---|---|
| `rate` | number or array | Request rate configuration | Varies by profile |
Profile-specific behaviour:
- `sweep`: Not used (automatically determined)
- `throughput`: Not used (maximum speed)
- `concurrent`: Number of concurrent requests
- `constant`: Requests per second
- `poisson`: Average requests per second
- `synchronous`: Not used (sequential)
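For the rate-driven profiles, the same `rate` field is interpreted differently. As a sketch (the values below are illustrative, not recommendations):

```json
// constant: a steady 10 requests per second
{ "profile": "constant", "rate": 10 }

// poisson: an average of 10 requests per second
{ "profile": "poisson", "rate": 10 }

// concurrent: 32 simultaneous in-flight requests
{ "profile": "concurrent", "rate": 32 }
```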
| Parameter | Type | Description | Default |
|---|---|---|---|
| `max_seconds` | number | Maximum duration in seconds | None (unlimited) |
| `max_requests` | number | Maximum number of requests | None (unlimited) |
| `max_errors` | number | Error threshold before stopping | None (unlimited) |
| `max_error_rate` | number | Error rate threshold (0-1) | None |
| `max_global_error_rate` | number | Global error rate threshold | None |
| Parameter | Type | Description | Example |
|---|---|---|---|
| `warmup` | string or number | Warmup period to exclude | `"5%"` or `10` |
| `cooldown` | string or number | Cooldown period to exclude | `"5%"` or `10` |
Format:
- Percentage: `"5%"` excludes the first/last 5% of requests
- Absolute: `10` excludes the first/last 10 seconds
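As a fragment of the `parameters` object, warmup and cooldown can be combined; the mix of a percentage warmup with an absolute cooldown here is purely illustrative:

```json
"warmup": "5%",
"cooldown": 10
```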
| Parameter | Type | Description | Default |
|---|---|---|---|
| `detect_saturation` | boolean | Enable over-saturation detection | `false` |
| `over_saturation` | number | Saturation threshold multiplier | `1.5` |
When enabled, automatically detects when the server is saturated and adjusts testing accordingly.
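Enabling detection in the `parameters` object might look like this (the threshold shown is the documented default, repeated only for illustration):

```json
"detect_saturation": true,
"over_saturation": 1.5
```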
Generate synthetic requests with specified token counts:

```json
"data": "prompt_tokens=50,output_tokens=20"
```

Format: `prompt_tokens=N,output_tokens=M`
Use datasets from HuggingFace:

```json
"data": "hf:abisee/cnn_dailymail",
"data_args": {"name": "3.0.0"},
"data_column_mapper": {"text_column": "article"}
```
| Parameter | Type | Description |
|---|---|---|
| `data` | string | Dataset identifier (prefix with `hf:`) |
| `data_args` | object | Dataset loading arguments |
| `data_column_mapper` | object | Column name mappings |
| `data_samples` | number | Maximum samples to use |
Use local data files:

```json
"data": "file:///path/to/prompts.jsonl"
```

Supported formats: JSON, JSONL, CSV
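A local JSONL file holds one record per line; the `text` field name below is only an assumption, and `data_column_mapper` can map whatever field your file actually uses:

```json
{"text": "Summarise the latest quarterly report in two sentences."}
```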
| Parameter | Type | Description | Example |
|---|---|---|---|
| `processor` | string | Tokeniser for synthetic data | `"gpt2"` |
| `processor_args` | array | Processor arguments | `[]` |
| `data_num_workers` | number | Parallel workers for data loading | `1` |
| Parameter | Type | Description | Options |
|---|---|---|---|
| `request_type` | string | API endpoint type | `chat_completions`, `completions`, `audio_transcription`, `audio_translation` |

Default: `chat_completions`
| Parameter | Type | Description | Options |
|---|---|---|---|
| `data_request_formatter` | string | Request format | `chat_completions`, `completions` |
| `data_collator` | string | Data collation strategy | `generative` |
| Parameter | Type | Description | Default |
|---|---|---|---|
| `outputs` | array | Output formats | `["json", "csv", "html", "yaml"]` |
| `output_dir` | string | Output directory | `/tmp/guidellm_results_*` |
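A `parameters` fragment selecting a subset of output formats might look like this (the directory path is illustrative):

```json
"outputs": ["json", "html"],
"output_dir": "/tmp/guidellm_results"
```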
| Parameter | Type | Description | Default |
|---|---|---|---|
| `random_seed` | number | Random seed for reproducibility | `42` |
| Parameter | Type | Description | Default |
|---|---|---|---|
| `backend` | string | Backend type | `openai_http` |
| `backend_kwargs` | object | Additional backend arguments | `null` |
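Extra arguments can be passed through `backend_kwargs`. The accepted keys depend on the backend, so the `timeout` key below is an assumption rather than a documented option:

```json
"backend": "openai_http",
"backend_kwargs": {"timeout": 300}
```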
The adapter reads runtime settings from environment variables:
| Variable | Description | Required | Default |
|---|---|---|---|
| `EVALHUB_MODE` | Execution mode | No | `k8s` |
| `EVALHUB_JOB_SPEC_PATH` | Path to job spec JSON | Yes (local mode) | `/meta/job.json` (k8s), `meta/job.json` (local) |
| `SERVICE_URL` | Eval-hub service URL | No | `null` |
| `REGISTRY_URL` | OCI registry URL | No | `null` |
| `REGISTRY_USERNAME` | Registry username | No | `null` |
| `REGISTRY_PASSWORD` | Registry password | No | `null` |
| `REGISTRY_INSECURE` | Allow insecure registry | No | `false` |
"id": "guidellm-production-001",
"benchmark_id": "performance_sweep",
"name": "Qwen/Qwen2.5-1.5B-Instruct",
"url": "http://127.0.0.1:8000/v1"
"data": "prompt_tokens=256,output_tokens=128",
"request_type": "chat_completions",
"detect_saturation": true,
"experiment_name": "qwen-load-test",
"evaluation_type": "performance"