LightEval Benchmarks
Overview of supported benchmarks, grouped by category.
Coming Soon
Detailed benchmark documentation is in progress.
Benchmark Categories
- Commonsense Reasoning: HellaSwag, WinoGrande, OpenBookQA
- Scientific Reasoning: ARC Easy, ARC Challenge
- Physical Commonsense: PIQA
- Truthfulness: TruthfulQA
- Mathematics: GSM8K, MATH
- Knowledge: MMLU, TriviaQA
- Language Understanding: GLUE benchmark tasks
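When scripting evaluations one category at a time, the category list can be mirrored in a plain Python mapping. The sketch below is hypothetical. `BENCHMARK_CATEGORIES`, `category_of`, and `benchmarks_in` are illustrative names and are not part of LightEval. The strings are display names, not LightEval task identifiers, so check the LightEval README for the exact task names before passing anything to the library.

```python
from typing import Dict, List, Optional

# Hypothetical mapping that mirrors the category list above.
# This is NOT a LightEval data structure; names are display labels only.
BENCHMARK_CATEGORIES: Dict[str, List[str]] = {
    "Commonsense Reasoning": ["HellaSwag", "WinoGrande", "OpenBookQA"],
    "Scientific Reasoning": ["ARC Easy", "ARC Challenge"],
    "Physical Commonsense": ["PIQA"],
    "Truthfulness": ["TruthfulQA"],
    "Mathematics": ["GSM8K", "MATH"],
    "Knowledge": ["MMLU", "TriviaQA"],
    "Language Understanding": ["GLUE"],
}


def category_of(benchmark: str) -> Optional[str]:
    """Return the category a benchmark is listed under, or None if absent."""
    for category, benchmarks in BENCHMARK_CATEGORIES.items():
        if benchmark in benchmarks:
            return category
    return None


def benchmarks_in(*categories: str) -> List[str]:
    """Collect benchmarks from the given categories, preserving list order.

    Unknown category names contribute nothing rather than raising an error.
    """
    selected: List[str] = []
    for category in categories:
        selected.extend(BENCHMARK_CATEGORIES.get(category, []))
    return selected


if __name__ == "__main__":
    print(benchmarks_in("Mathematics", "Knowledge"))  # ['GSM8K', 'MATH', 'MMLU', 'TriviaQA']
    print(category_of("PIQA"))  # Physical Commonsense
```

Keeping the grouping as data rather than hard-coding benchmark names in each script means a category can be updated in one place as the supported list grows.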
For complete documentation, see the LightEval README.