Execution Profiles

GuideLLM supports multiple load patterns for different testing scenarios.

Sweep

Automatically explore different request rates to find safe operating ranges.

Use case: Discovery - find optimal request rates for your deployment

Configuration:

{
  "parameters": {
    "profile": "sweep",
    "max_seconds": 30,
    "detect_saturation": true
  }
}

Behaviour: Incrementally increases the request rate until saturation is detected or a configured limit is reached.
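The stepping behaviour can be illustrated with a minimal sketch. This is not GuideLLM's implementation: `sweep_rates`, `run_sweep`, and the `is_saturated` callback are hypothetical names, and evenly spaced steps are an assumption made for illustration.

```python
def sweep_rates(min_rate: float, max_rate: float, steps: int) -> list[float]:
    """Return `steps` evenly spaced request rates from min_rate to max_rate."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if not 0 < min_rate < max_rate:
        raise ValueError("require 0 < min_rate < max_rate")
    stride = (max_rate - min_rate) / (steps - 1)
    return [min_rate + i * stride for i in range(steps)]


def run_sweep(rates, is_saturated):
    """Try each rate in order and stop at the first one reported as saturated.

    `is_saturated` is a hypothetical callback standing in for a real benchmark
    run plus a saturation check.
    """
    tested = []
    for rate in rates:
        tested.append(rate)
        if is_saturated(rate):
            break
    return tested
```

For example, `run_sweep(sweep_rates(1.0, 5.0, 5), lambda r: r >= 3.0)` tests rates 1, 2, and 3 req/s and then stops, because the callback reports saturation at 3 req/s.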


Throughput

Maximum capacity testing to identify performance limits.

Use case: Stress testing - find the breaking point

Configuration:

{
  "parameters": {
    "profile": "throughput",
    "max_seconds": 60,
    "max_requests": 1000
  }
}

Behaviour: Sends requests as fast as possible to saturate the server.


Concurrent

Simulate parallel users with a fixed concurrency level.

Use case: User simulation - test with realistic concurrent load

Configuration:

{
  "parameters": {
    "profile": "concurrent",
    "rate": 10,
    "max_requests": 100
  }
}

Behaviour: Maintains exactly N concurrent requests at all times, where N is the value of "rate" (10 in the configuration above).
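A fixed concurrency level is commonly enforced with a semaphore. The sketch below shows the idea using Python's `asyncio`. It is an illustration only, not GuideLLM's code: `run_concurrent` and `fake_send` are hypothetical names, and `fake_send` stands in for a real request.

```python
import asyncio


async def run_concurrent(n_requests: int, concurrency: int, send) -> int:
    """Run n_requests calls to `send`, never more than `concurrency` at once.

    Returns the peak number of requests that were in flight simultaneously.
    """
    limiter = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(i: int) -> None:
        nonlocal in_flight, peak
        async with limiter:  # blocks while `concurrency` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await send(i)
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker(i) for i in range(n_requests)))
    return peak


async def fake_send(i: int) -> None:
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
```

As soon as one request finishes, a waiting worker acquires the semaphore, so the number of in-flight requests stays at the limit while work remains.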


Constant

Fixed requests per second for steady-state testing.

Use case: Baseline measurement - consistent, predictable load

Configuration:

{
  "parameters": {
    "profile": "constant",
    "rate": 5,
    "max_seconds": 10,
    "max_requests": 20
  }
}

Behaviour: Sends requests at a fixed rate (e.g., 5 req/s).
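To see how the two limits in the configuration above interact, here is a small sketch that computes send times for a fixed rate. It assumes the run ends at whichever limit is reached first, which the text above does not state explicitly. `constant_schedule` is a hypothetical helper, not part of GuideLLM.

```python
from typing import Optional


def constant_schedule(rate: float,
                      max_seconds: Optional[float] = None,
                      max_requests: Optional[int] = None) -> list[float]:
    """Return send-time offsets in seconds for a fixed-rate load.

    Assumption: the run stops as soon as either limit is reached.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if max_seconds is None and max_requests is None:
        raise ValueError("set max_seconds or max_requests so the run ends")
    offsets = []
    i = 0
    while True:
        t = i / rate  # request i is sent i/rate seconds after the start
        if max_seconds is not None and t >= max_seconds:
            break
        if max_requests is not None and i >= max_requests:
            break
        offsets.append(t)
        i += 1
    return offsets
```

Under that assumption, `constant_schedule(5, max_seconds=10, max_requests=20)` yields 20 send times spaced 0.2 s apart, ending at about 3.8 s: at 5 req/s the 20-request cap is hit well before the 10-second limit, which alone would allow 50 requests.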


Poisson

Randomised request arrivals following a Poisson process.

Use case: Realistic simulation - natural traffic patterns

Configuration:

{
  "parameters": {
    "profile": "poisson",
    "rate": 5,
    "max_seconds": 30
  }
}

Behaviour: Sends requests at random intervals whose average matches the specified rate.
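In a Poisson process with rate r, the gaps between consecutive arrivals are exponentially distributed with mean 1/r. The sketch below generates such a schedule with the standard library. `poisson_schedule` is a hypothetical helper for illustration and does not reflect how GuideLLM schedules requests internally.

```python
import random
from typing import Optional


def poisson_schedule(rate: float, max_seconds: float,
                     seed: Optional[int] = None) -> list[float]:
    """Return send-time offsets in seconds for Poisson arrivals.

    Gaps between requests are drawn from an exponential distribution
    with mean 1/rate, so the long-run average is `rate` requests per second.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)  # seeded for reproducible schedules
    offsets = []
    t = rng.expovariate(rate)
    while t < max_seconds:
        offsets.append(t)
        t += rng.expovariate(rate)
    return offsets
```

With `rate=5` and `max_seconds=30`, the schedule contains about 150 requests on average, but individual gaps vary, producing the bursts and lulls typical of real traffic.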


Synchronous

Sequential requests for baseline measurements.

Use case: Single-user testing - minimum latency baseline

Configuration:

{
  "parameters": {
    "profile": "synchronous",
    "max_requests": 50
  }
}

Behaviour: Waits for each request to complete before sending the next.

| Scenario | Recommended Profile | Why |
| --- | --- | --- |
| First-time testing | sweep | Automatically finds safe operating range |
| Load testing | constant | Predictable, repeatable results |
| Capacity planning | throughput | Find maximum capacity |
| User simulation | concurrent | Realistic concurrent load |
| Production-like traffic | poisson | Natural traffic patterns |
| Baseline latency | synchronous | Minimum possible latency |

All profiles support these common parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| max_seconds | Maximum duration in seconds | None |
| max_requests | Maximum number of requests | None |
| max_errors | Error threshold before stopping | None |
| warmup | Warmup period to exclude (% or absolute) | None |
| cooldown | Cooldown period to exclude (% or absolute) | None |
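The "% or absolute" notation for warmup and cooldown can be read as follows. This sketch rests on two assumptions that the table does not confirm: a percentage is taken relative to the total run duration, and an absolute value is a number of seconds. `excluded_seconds` is a hypothetical helper, not a GuideLLM function.

```python
from typing import Optional


def excluded_seconds(value: Optional[str], total_seconds: float) -> float:
    """Interpret a warmup or cooldown setting as seconds to exclude.

    Assumptions: "5%" means 5% of the total run duration, and a plain
    number such as "2" means an absolute number of seconds.
    """
    if value is None:
        return 0.0  # default: exclude nothing
    text = value.strip()
    if text.endswith("%"):
        return total_seconds * float(text[:-1]) / 100.0
    return float(text)
```

Under these assumptions, a warmup of "5%" on a 300-second run excludes the first 15 seconds of results, while "0" excludes nothing.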

Fast test with minimal samples:

{
  "parameters": {
    "profile": "constant",
    "rate": 5,
    "max_seconds": 10,
    "max_requests": 20,
    "warmup": "0"
  }
}

Realistic production simulation:

{
  "parameters": {
    "profile": "poisson",
    "rate": 50,
    "max_seconds": 300,
    "warmup": "5%",
    "detect_saturation": true
  }
}

Find maximum throughput:

{
  "parameters": {
    "profile": "throughput",
    "max_seconds": 60,
    "max_requests": 5000,
    "warmup": "10%"
  }
}