# LightEval Adapter
The LightEval adapter integrates LightEval with the eval-hub evaluation service using the evalhub-sdk framework adapter pattern.
## Overview

LightEval is a lightweight evaluation framework for language models that supports multiple model providers and a wide range of benchmarks.
## Key Features

- Multiple model providers: Transformers, vLLM, OpenAI, Anthropic, custom endpoints
- Wide range of benchmarks: HellaSwag, ARC, MMLU, TruthfulQA, GSM8K, and many more
- Few-shot evaluation: Configurable number of few-shot examples
- Efficient evaluation: Optimised for speed and resource usage
## Supported Providers

- `transformers`: HuggingFace Transformers models
- `vllm`: vLLM inference engine
- `openai`: OpenAI API
- `anthropic`: Anthropic API
- `endpoint`: Custom OpenAI-compatible endpoints
- `litellm`: LiteLLM proxy
## Architecture

The adapter follows the eval-hub framework adapter pattern: it translates an eval-hub job specification into a LightEval run and reports results back to the evaluation service.
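The concrete interfaces live in evalhub-sdk; purely as an illustration of the pattern, a framework adapter has roughly this shape (all class, field, and argument names below are hypothetical, not the real SDK API):

```python
# Hypothetical sketch of the framework adapter pattern used by eval-hub.
# The real evalhub-sdk interfaces and names may differ.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Minimal stand-in for an eval-hub job specification."""
    model: str
    provider: str
    tasks: list = field(default_factory=list)


class FrameworkAdapter(ABC):
    """Common interface the evaluation service calls for every framework."""

    @abstractmethod
    def run(self, spec: JobSpec) -> dict:
        ...


class LightEvalAdapter(FrameworkAdapter):
    """Translates a job spec into a LightEval invocation (stubbed here)."""

    def run(self, spec: JobSpec) -> dict:
        # A real adapter would launch LightEval with the selected
        # provider and report results back; here we only build the
        # argument list to show the translation step.
        args = ["--model", spec.model, "--provider", spec.provider]
        for task in spec.tasks:
            args += ["--task", task]
        return {"framework": "lighteval", "args": args}


spec = JobSpec(model="gpt2", provider="transformers", tasks=["hellaswag"])
result = LightEvalAdapter().run(spec)
print(result["args"])
```

The value of the pattern is that the service only ever sees the `FrameworkAdapter` interface, so new frameworks can be added without changing the service itself.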
## Quick Start

### Building the Container

```shell
make image-lighteval
```

### Running Locally

```shell
# Set environment for local mode
export EVALHUB_MODE=local
export EVALHUB_JOB_SPEC_PATH=meta/job.json
export SERVICE_URL=http://localhost:8080  # Optional

# Run the adapter
python main.py
```
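In local mode the adapter reads the job specification from `EVALHUB_JOB_SPEC_PATH`. The actual schema is defined by evalhub-sdk; as a rough sketch only, a minimal spec might be written like this (the field names are illustrative assumptions, not the documented schema):

```python
# Illustrative only: the real job-spec schema is defined by evalhub-sdk,
# and these field names are assumptions.
import json
from pathlib import Path

job_spec = {
    "model": "openai-community/gpt2",  # hypothetical field names
    "provider": "transformers",
    "tasks": ["hellaswag", "arc_easy"],
    "num_fewshot": 5,
}

path = Path("meta/job.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(job_spec, indent=2))
print(path.read_text())
```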
## Supported Benchmarks

The adapter supports all LightEval tasks, organised by category:
### Commonsense Reasoning

- HellaSwag
- WinoGrande
- OpenBookQA
### Scientific Reasoning

- ARC Easy
- ARC Challenge
### Physical Commonsense

- PIQA
### Truthfulness

- TruthfulQA (multiple choice)
- TruthfulQA (generation)
### Mathematics

- GSM8K
- MATH (various subcategories)
### Knowledge

- MMLU
- TriviaQA
### Language Understanding

- GLUE benchmarks
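LightEval commonly addresses tasks with pipe-delimited identifiers of the form `suite|task|num_fewshot|truncate_fewshot`; the available suites and task names vary by LightEval version, so treat the names below as illustrative and consult the LightEval task list for your release. A tiny helper to assemble such identifiers:

```python
# Builds LightEval-style task identifiers of the form
# "suite|task|num_fewshot|truncate_fewshot". Suite and task names here
# are illustrative; check your LightEval version for the canonical list.
def task_id(suite: str, task: str, num_fewshot: int = 0, truncate: int = 0) -> str:
    return f"{suite}|{task}|{num_fewshot}|{truncate}"


tasks = [
    task_id("leaderboard", "hellaswag", 10),
    task_id("leaderboard", "arc:challenge", 25),
    task_id("lighteval", "gsm8k", 5),
]
print(",".join(tasks))
```

The `num_fewshot` slot is how the configurable few-shot evaluation mentioned under Key Features is expressed per task.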