LightEval Adapter

The LightEval adapter integrates LightEval with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.

LightEval is a lightweight evaluation framework for language models that supports multiple model providers and a wide range of benchmarks.

Key features:

  • Multiple model providers: Transformers, vLLM, OpenAI, Anthropic, and custom endpoints
  • Wide range of benchmarks: HellaSwag, ARC, MMLU, TruthfulQA, GSM8K, and many more
  • Few-shot evaluation: configurable number of few-shot examples
  • Efficient evaluation: optimised for speed and resource usage

Supported backends:

  • transformers: Hugging Face Transformers models
  • vllm: vLLM inference engine
  • openai: OpenAI API
  • anthropic: Anthropic API
  • endpoint: custom OpenAI-compatible endpoints
  • litellm: LiteLLM proxy
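A minimal sketch of how the adapter might validate a requested backend against this list. The function name and the idea of a `backend` field in the job spec are assumptions for illustration, not the actual evalhub-sdk API:

```python
# Backends listed in this document; the validation helper is hypothetical.
SUPPORTED_BACKENDS = {"transformers", "vllm", "openai", "anthropic", "endpoint", "litellm"}


def validate_backend(name: str) -> str:
    """Reject job specs that request a backend the adapter cannot serve."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend: {name!r}")
    return name


print(validate_backend("vllm"))  # vllm
```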

The adapter follows the eval-hub framework adapter pattern:

(Diagram: LightEval adapter architecture)
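The pattern can be sketched as a small base class that each framework adapter implements. The class and method names below are hypothetical placeholders, not the real evalhub-sdk interface:

```python
from abc import ABC, abstractmethod


class FrameworkAdapter(ABC):
    """Hypothetical base class: translates an eval-hub job spec into
    a framework-specific evaluation run and returns results."""

    @abstractmethod
    def run(self, job_spec: dict) -> dict:
        ...


class LightEvalAdapter(FrameworkAdapter):
    def run(self, job_spec: dict) -> dict:
        # The real adapter would invoke LightEval here; this stub just
        # echoes the requested tasks to illustrate the call shape.
        tasks = job_spec.get("tasks", [])
        return {"status": "completed", "tasks_evaluated": len(tasks)}


adapter = LightEvalAdapter()
result = adapter.run({"tasks": ["hellaswag", "gsm8k"]})
print(result)  # {'status': 'completed', 'tasks_evaluated': 2}
```

The service only ever talks to the `FrameworkAdapter`-shaped interface, so new frameworks can be added without touching the service side.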

Build the adapter container image:

```shell
make image-lighteval
```

Run the adapter in local mode:

```shell
# Set environment for local mode
export EVALHUB_MODE=local
export EVALHUB_JOB_SPEC_PATH=meta/job.json
export SERVICE_URL=http://localhost:8080  # Optional

# Run the adapter
python main.py
```
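In local mode the adapter reads its job spec from the file named by `EVALHUB_JOB_SPEC_PATH` rather than fetching it from the service. A sketch of that load step, with illustrative field names (the real spec schema may differ):

```python
import json
import os
import tempfile


def load_job_spec() -> dict:
    """Read the job spec the way a local-mode adapter might.

    EVALHUB_JOB_SPEC_PATH comes from the setup above; the spec's
    field names ("model", "tasks") are assumptions for this sketch.
    """
    path = os.environ.get("EVALHUB_JOB_SPEC_PATH", "meta/job.json")
    with open(path) as f:
        return json.load(f)


# Example: point the adapter at a throwaway spec file.
spec_path = os.path.join(tempfile.gettempdir(), "job.json")
with open(spec_path, "w") as f:
    json.dump({"model": "demo-model", "tasks": ["hellaswag"]}, f)
os.environ["EVALHUB_JOB_SPEC_PATH"] = spec_path

spec = load_job_spec()
print(spec["model"], spec["tasks"])  # demo-model ['hellaswag']
```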

The adapter supports all LightEval tasks, organised by category:

Reasoning and common sense:

  • HellaSwag
  • WinoGrande
  • OpenBookQA
  • ARC Easy
  • ARC Challenge
  • PIQA

Truthfulness:

  • TruthfulQA (multiple choice)
  • TruthfulQA (generation)

Mathematics:

  • GSM8K
  • MATH (various subcategories)

Knowledge and language understanding:

  • MMLU
  • TriviaQA
  • GLUE benchmarks
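LightEval commonly identifies a task with a pipe-delimited string of the form `suite|task|few_shot|truncate`; a tiny helper to build such identifiers might look like the following. Both the string format and the default values are assumptions based on common LightEval usage, so check the LightEval documentation for the exact format your version expects:

```python
def task_string(suite: str, task: str, few_shot: int = 0, truncate: int = 0) -> str:
    """Build a LightEval-style task identifier (format is an assumption:
    suite|task|few_shot|truncate)."""
    return f"{suite}|{task}|{few_shot}|{truncate}"


print(task_string("leaderboard", "hellaswag"))         # leaderboard|hellaswag|0|0
print(task_string("leaderboard", "gsm8k", few_shot=5))
```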