
# Configuration Reference

Complete reference for GuideLLM adapter configuration options.

The GuideLLM adapter uses a standardised JobSpec structure:

```json
{
  "id": "string",
  "benchmark_id": "string",
  "model": {
    "name": "string",
    "url": "string"
  },
  "parameters": {
    // GuideLLM-specific configuration
  },
  "experiment_name": "string",
  "tags": {}
}
```
| Parameter | Type | Description | Example |
|---|---|---|---|
| `id` | string | Unique job identifier | `"guidellm-001"` |
| `benchmark_id` | string | Benchmark identifier | `"performance_sweep"` |
| `model.name` | string | Model name | `"Qwen/Qwen2.5-1.5B-Instruct"` |
| `model.url` | string | OpenAI-compatible API endpoint | `"http://localhost:8000/v1"` |

| Parameter | Type | Description | Default |
|---|---|---|---|
| `experiment_name` | string | Experiment identifier | `null` |
| `tags` | object | Free-form metadata tags | `{}` |

All benchmark configuration is specified in the `parameters` object.

| Parameter | Type | Description | Options |
|---|---|---|---|
| `profile` | string | Execution profile | `sweep`, `throughput`, `concurrent`, `constant`, `poisson`, `synchronous` |

See Execution Profiles for detailed information on each profile type.

| Parameter | Type | Description | Default |
|---|---|---|---|
| `rate` | number or array | Request rate configuration | Profile-dependent |

Profile-specific behaviour:

  • sweep: Not used (automatically determined)
  • throughput: Not used (maximum speed)
  • concurrent: Number of concurrent requests
  • constant: Requests per second
  • poisson: Average requests per second
  • synchronous: Not used (sequential)
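
For example, the same `rate` field sets the number of simultaneous requests under the `concurrent` profile, rather than a requests-per-second target (the value here is illustrative):

```json
{
  "parameters": {
    "profile": "concurrent",
    "rate": 8
  }
}
```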
| Parameter | Type | Description | Default |
|---|---|---|---|
| `max_seconds` | number | Maximum duration in seconds | None (unlimited) |
| `max_requests` | number | Maximum number of requests | None (unlimited) |
| `max_errors` | number | Error threshold before stopping | None (unlimited) |
| `max_error_rate` | number | Error rate threshold (0–1) | None |
| `max_global_error_rate` | number | Global error rate threshold | None |
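
Criteria can be combined, in which case the run stops as soon as the first limit is reached. A sketch with illustrative values:

```json
{
  "parameters": {
    "max_seconds": 120,
    "max_requests": 1000,
    "max_error_rate": 0.05
  }
}
```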
| Parameter | Type | Description | Example |
|---|---|---|---|
| `warmup` | string or number | Warmup period to exclude | `"5%"` or `10` |
| `cooldown` | string or number | Cooldown period to exclude | `"5%"` or `10` |

Format:

  • Percentage: "5%" - exclude first/last 5% of requests
  • Absolute: 10 - exclude first/last 10 seconds
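
The two forms can be mixed in one job, e.g. a percentage-based warmup with an absolute cooldown (values illustrative):

```json
{
  "parameters": {
    "warmup": "5%",
    "cooldown": 10
  }
}
```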
| Parameter | Type | Description | Default |
|---|---|---|---|
| `detect_saturation` | boolean | Enable over-saturation detection | `false` |
| `over_saturation` | number | Saturation threshold multiplier | `1.5` |

When enabled, the adapter automatically detects when the server is saturated and adjusts testing accordingly.
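
A throughput run with saturation detection enabled might look like this (values illustrative):

```json
{
  "parameters": {
    "profile": "throughput",
    "detect_saturation": true,
    "over_saturation": 1.5
  }
}
```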

Generate synthetic requests with specified token counts:

```json
{
  "parameters": {
    "data": "prompt_tokens=50,output_tokens=20"
  }
}
```

Format: `prompt_tokens=N,output_tokens=M`
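
The spec string is simple enough to validate before submitting a job. A minimal parsing sketch — the helper name is ours, not part of the adapter:

```python
def parse_synthetic_spec(spec: str) -> dict:
    """Parse a 'prompt_tokens=N,output_tokens=M' style spec into ints."""
    tokens = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        tokens[key.strip()] = int(value)
    return tokens

print(parse_synthetic_spec("prompt_tokens=50,output_tokens=20"))
# → {'prompt_tokens': 50, 'output_tokens': 20}
```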

Use datasets from HuggingFace:

```json
{
  "parameters": {
    "data": "hf:abisee/cnn_dailymail",
    "data_args": {"name": "3.0.0"},
    "data_column_mapper": {"text_column": "article"},
    "data_samples": 100
  }
}
```
| Parameter | Type | Description |
|---|---|---|
| `data` | string | Dataset identifier (prefix with `hf:`) |
| `data_args` | object | Dataset loading arguments |
| `data_column_mapper` | object | Column name mappings |
| `data_samples` | number | Maximum samples to use |

Use local data files:

```json
{
  "parameters": {
    "data": "file:///path/to/prompts.jsonl",
    "data_samples": 500
  }
}
```

Supported formats: JSON, JSONL, CSV
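
A minimal `prompts.jsonl` holds one JSON object per line; the `prompt` field name below is illustrative, and a different column name can be mapped via `data_column_mapper`:

```json
{"prompt": "Summarise the following article in two sentences."}
{"prompt": "Translate this sentence into French."}
```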

| Parameter | Type | Description | Example |
|---|---|---|---|
| `processor` | string | Tokeniser for synthetic data | `"gpt2"` |
| `processor_args` | array | Processor arguments | `[]` |
| `data_num_workers` | number | Parallel workers for data loading | `1` |
| Parameter | Type | Description | Options |
|---|---|---|---|
| `request_type` | string | API endpoint type | `chat_completions`, `completions`, `audio_transcription`, `audio_translation` |

Default: `chat_completions`

| Parameter | Type | Description | Options |
|---|---|---|---|
| `data_request_formatter` | string | Request format | `chat_completions`, `completions` |
| `data_collator` | string | Data collation strategy | `generative` |

| Parameter | Type | Description | Default |
|---|---|---|---|
| `outputs` | array | Output formats | `["json", "csv", "html", "yaml"]` |
| `output_dir` | string | Output directory | `/tmp/guidellm_results_*` |

| Parameter | Type | Description | Default |
|---|---|---|---|
| `random_seed` | number | Random seed for reproducibility | `42` |

| Parameter | Type | Description | Default |
|---|---|---|---|
| `backend` | string | Backend type | `openai_http` |
| `backend_kwargs` | object | Additional backend arguments | `null` |

The adapter reads runtime settings from environment variables:

| Variable | Description | Required | Default |
|---|---|---|---|
| `EVALHUB_MODE` | Execution mode | No | `k8s` |
| `EVALHUB_JOB_SPEC_PATH` | Path to job spec JSON | Yes (local mode) | `/meta/job.json` (k8s), `meta/job.json` (local) |
| `SERVICE_URL` | Eval-hub service URL | No | `null` |
| `REGISTRY_URL` | OCI registry URL | No | `null` |
| `REGISTRY_USERNAME` | Registry username | No | `null` |
| `REGISTRY_PASSWORD` | Registry password | No | `null` |
| `REGISTRY_INSECURE` | Allow insecure registry | No | `false` |
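
The mode-dependent defaults above can be resolved with a few lines of Python; this is a sketch of the documented behaviour, and the function name and returned dict shape are ours, not the adapter's API:

```python
import os

def load_runtime_settings(env=None) -> dict:
    """Resolve adapter runtime settings from environment variables,
    applying the defaults documented above."""
    env = os.environ if env is None else env
    mode = env.get("EVALHUB_MODE", "k8s")
    # The job-spec default depends on the execution mode.
    default_spec = "/meta/job.json" if mode == "k8s" else "meta/job.json"
    return {
        "mode": mode,
        "job_spec_path": env.get("EVALHUB_JOB_SPEC_PATH", default_spec),
        "service_url": env.get("SERVICE_URL"),
        "registry_url": env.get("REGISTRY_URL"),
        "registry_insecure": env.get("REGISTRY_INSECURE", "false").lower() == "true",
    }

print(load_runtime_settings(env={"EVALHUB_MODE": "local"}))
```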
```json
{
  "id": "guidellm-production-001",
  "benchmark_id": "performance_sweep",
  "model": {
    "name": "Qwen/Qwen2.5-1.5B-Instruct",
    "url": "http://127.0.0.1:8000/v1"
  },
  "parameters": {
    "profile": "constant",
    "rate": 5,
    "max_seconds": 60,
    "max_requests": 100,
    "data": "prompt_tokens=256,output_tokens=128",
    "request_type": "chat_completions",
    "warmup": "5%",
    "detect_saturation": true,
    "random_seed": 42
  },
  "experiment_name": "qwen-load-test",
  "tags": {
    "framework": "guidellm",
    "model_size": "small",
    "evaluation_type": "performance"
  }
}
```