Browse community-contributed evaluation providers from eval-hub-contrib. Each provider ships a provider.yaml that defines benchmarks, runtime settings, and container image references.
The catalog is synced from the contrib repository at site build time. Use the filters below to find a provider, then download either the raw provider.yaml or a ready-to-apply Kubernetes ConfigMap.
Showing 2 of 2 providers
IBM CLEAR (ibm-clear)
IBM CLEAR evaluation framework for error analysis and reporting on agent traces
3 benchmarks: error-analysis
MTEB (mteb)
Massive Text Embedding Benchmark - comprehensive evaluation for text embedding models
5 benchmarks: classification, clustering, reranking
IBM CLEAR evaluation framework for error analysis and reporting on agent traces
Runtime: quay.io/evalhub/community-ibm-clear:latest
Benchmarks (3)
| ID | Name | Category |
| --- | --- | --- |
| agentic-evaluation | Agentic Evaluation | error-analysis |
| agentic-evaluation-custom-criteria | Agentic evaluation (custom criteria) | error-analysis |
| agentic-evaluation-predefined-issues | Agentic evaluation (predefined issues) | error-analysis |
Preview ConfigMap YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: evalhub-provider-ibm-clear
  namespace: opendatahub
  labels:
    trustyai.opendatahub.io/evalhub-provider-type: system
    trustyai.opendatahub.io/evalhub-provider-name: ibm-clear
data:
  ibm-clear.yaml: |
    # IBM CLEAR - provider definition (same shape as eval-hub config/providers/*.yaml).
    # Mount into eval-hub or sync via your operator’s provider sync process.
    #
    # Discovery: eval-hub may surface `parameters` in UI. The adapter validates required
    # inputs early in run_benchmark_job (`_validate_config`). Benchmark-specific keys are
    # enforced in `_validate_benchmark_contract` (see benchmarks below).
    # MLflow: set MLFLOW_TRACKING_URI and job experiment_name (or parameters.mlflow_experiment_name).
    id: ibm-clear
    name: IBM CLEAR
    description: IBM CLEAR evaluation framework for error analysis and reporting on agent traces
    type: builtin
    runtime:
      k8s:
        image: quay.io/evalhub/community-ibm-clear:latest
        image_pull_policy: Always
        entrypoint:
          - python
          - main.py
        cpu_request: 100m
        memory_request: 128Mi
        cpu_limit: 500m
        memory_limit: 1Gi
      local: null
    benchmarks:
      - id: agentic-evaluation
        name: Agentic Evaluation
        description: >
          Default agentic CLEAR run: clusters recurring failure patterns across agents using the
          standard agent-mode judge criteria (correctness, completeness, clarity, etc.). Use this
          when you want IBM CLEAR’s baseline “what went wrong and how often.”
        category: error-analysis
        metrics:
          - total_interactions
          - total_issues
          - interactions_with_issues
          - interactions_no_issues
          - total_agents
          - pct_interactions_with_issues
          - issues_per_interaction
          - average_score
        tags:
          - erroranalysis
          - ibmclear
      - id: agentic-evaluation-custom-criteria
        name: Agentic evaluation (custom criteria)
        description: >
          Caller supplies parameters.evaluation_criteria (criterion name -> description). Required for this benchmark id.
        category: error-analysis
        metrics:
          - total_interactions
          - total_issues
          - interactions_with_issues
          - interactions_no_issues
          - total_agents
          - pct_interactions_with_issues
          - issues_per_interaction
          - average_score
        tags:
          - ibmclear
          - custom-rubric
      - id: agentic-evaluation-predefined-issues
        name: Agentic evaluation (predefined issues)
        description: >
          Caller supplies parameters.predefined_issues (list of issue strings). CLEAR skips automatic issue discovery.
        category: error-analysis
        metrics:
          - total_interactions
          - total_issues
          - interactions_with_issues
          - interactions_no_issues
          - total_agents
          - pct_interactions_with_issues
          - issues_per_interaction
          - average_score
        tags:
          - ibmclear
          - predefined-issues
    parameters:
      - name: data_dir
        type: string
        description: >
          Directory of trace JSON files for CLEAR (preferred). Required unless Eval Hub mounts
          traces under /test_data or /data in the job pod, or you set traces_input_dir instead.
      - name: traces_input_dir
        type: string
        description: >
          Alternate name for the trace directory (same semantics as data_dir).
      - name: eval_model_name
        type: string
        description: >
          Required. CLEAR judge model id (e.g. openai/your-model) passed through to the agentic pipeline.
      - name: provider
        type: string
        description: >
          Required. LiteLLM/OpenAI-style provider name for the judge (e.g. openai, azure).
      - name: inference_backend
        type: string
        default: litellm
        description: >
          CLEAR inference backend. Use "endpoint" with model.url or parameters.endpoint_url (or inference_url)
          for a fixed HTTP endpoint; default litellm uses the usual CLEAR LiteLLM path.
      - name: endpoint_url
        type: string
        description: >
          When inference_backend is "endpoint", HTTP base URL for the judge (inference_url is accepted as an alias).
      - name: inference_url
        type: string
        description: >
          Same role as endpoint_url when inference_backend is "endpoint".
      - name: evaluation_criteria
        type: object
        description: >
          Required when benchmark_id is agentic-evaluation-custom-criteria: map of criterion name -> description.
      - name: predefined_issues
        type: array
        description: >
          Required when benchmark_id is agentic-evaluation-predefined-issues: non-empty list of issue strings.
      - name: mlflow_experiment_name
        type: string
        description: >
          Optional fallback experiment name for MLflow artifact upload if JobSpec.experiment_name is unset.
      - name: clear_dashboard_theme
        type: string
        default: red_hat
        description: >
          Dashboard theme for the generated static CLEAR HTML report.
          Default (omit / "red_hat" / "redhat"): apply the branded theme.
          Opt out ("clear", "default", "original", "ibm", "none", "false", "0", "off"): keep CLEAR's stock HTML.
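The parameter descriptions above imply a small contract: eval_model_name and provider are always required, and two of the benchmark ids each require one extra key. As a rough illustration only, a pre-submission check could look like the sketch below. The function name validate_clear_parameters is hypothetical and this is not the adapter's actual `_validate_config` or `_validate_benchmark_contract` code; treating an empty map as missing is also an assumption. The data_dir requirement is left out because it depends on what Eval Hub mounts in the job pod.

```python
from typing import Any, Dict, List

# Benchmark ids taken from the provider definition above.
CUSTOM_CRITERIA = "agentic-evaluation-custom-criteria"
PREDEFINED_ISSUES = "agentic-evaluation-predefined-issues"


def validate_clear_parameters(benchmark_id: str, params: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means the input looks valid)."""
    problems: List[str] = []

    # Judge settings marked "Required" in the parameter descriptions.
    for key in ("eval_model_name", "provider"):
        if not params.get(key):
            problems.append(f"missing required parameter: {key}")

    # Benchmark-specific keys.
    if benchmark_id == CUSTOM_CRITERIA:
        criteria = params.get("evaluation_criteria")
        # Assumption: an empty map is treated the same as a missing one.
        if not isinstance(criteria, dict) or not criteria:
            problems.append(
                "evaluation_criteria must be a non-empty map of criterion name -> description"
            )
    elif benchmark_id == PREDEFINED_ISSUES:
        issues = params.get("predefined_issues")
        if not isinstance(issues, list) or not issues:
            problems.append("predefined_issues must be a non-empty list of issue strings")

    return problems


if __name__ == "__main__":
    job_params = {"eval_model_name": "openai/your-model", "provider": "openai"}
    for problem in validate_clear_parameters(CUSTOM_CRITERIA, job_params):
        print(problem)
```

Returning a list rather than raising on the first problem lets a caller report every missing key at once before the job is submitted.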
Massive Text Embedding Benchmark - comprehensive evaluation for text embedding models
Runtime: quay.io/evalhub/community-mteb:latest
Benchmarks (5)
| ID | Name | Category |
| --- | --- | --- |
| mteb_sts | Semantic Textual Similarity Suite | semantic_similarity |
| mteb_retrieval | Retrieval Suite | retrieval |
| mteb_classification | Classification Suite | classification |
| mteb_clustering | Clustering Suite | clustering |
| mteb_reranking | Reranking Suite | reranking |
Preview ConfigMap YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: evalhub-provider-mteb
  namespace: opendatahub
  labels:
    trustyai.opendatahub.io/evalhub-provider-type: system
    trustyai.opendatahub.io/evalhub-provider-name: mteb
data:
  mteb.yaml: |
    # MTEB Provider Configuration
    # Massive Text Embedding Benchmark - comprehensive evaluation for text embedding models
    id: mteb
    name: MTEB
    description: Massive Text Embedding Benchmark - comprehensive evaluation for text embedding models
    type: builtin
    runtime:
      k8s:
        image: quay.io/evalhub/community-mteb:latest
        entrypoint:
          - python
          - main.py
        cpu_request: 100m
        memory_request: 128Mi
        cpu_limit: 1000m
        memory_limit: 2Gi
      local:
        # Reserved for local runtime configuration
    benchmarks:
      # Preset benchmark suites
      - id: mteb_sts
        name: Semantic Textual Similarity Suite
        description: STS benchmark suite covering STS12-17, STSBenchmark, and SICK-R
        category: semantic_similarity
        metrics:
          - main_score
          - cosine_spearman
          - cosine_pearson
        tags:
          - embedding
          - sts
          - similarity
          - mteb
          - suite
      - id: mteb_retrieval
        name: Retrieval Suite
        description: Information retrieval benchmark suite with NFCorpus, SciFact, ArguAna, TRECCOVID, and Touche2020
        category: retrieval
        metrics:
          - main_score
          - ndcg_at_10
          - map_at_10
        tags:
          - embedding
          - retrieval
          - search
          - mteb
          - suite
      - id: mteb_classification
        name: Classification Suite
        description: Text classification benchmark suite with AmazonReviewsClassification, Banking77Classification, and EmotionClassification
        category: classification
        metrics:
          - main_score
          - accuracy
        tags:
          - embedding
          - classification
          - mteb
          - suite
      - id: mteb_clustering
        name: Clustering Suite
        description: Document clustering benchmark suite with ArxivClusteringP2P, ArxivClusteringS2S, and BiorxivClusteringP2P
        category: clustering
        metrics:
          - main_score
          - v_measure
        tags:
          - embedding
          - clustering
          - mteb
          - suite
      - id: mteb_reranking
        name: Reranking Suite
        description: Passage reranking benchmark suite with AskUbuntuDupQuestions, MindSmallReranking, and SciDocsRR
        category: reranking
        metrics:
          - main_score
          - map
        tags:
          - embedding
          - reranking
          - mteb
          - suite
    parameters:
      # Common parameters for all benchmarks
      - name: batch_size
        type: integer
        default: 32
        description: Batch size for encoding
      - name: device
        type: string
        default: null
        description: Device override (cuda, cpu, mps, cuda:0)
      - name: languages
        type: array
        default:
          - eng
        description: Language codes to include (ISO 639-3)
      - name: verbosity
        type: integer
        default: 2
        description: MTEB verbosity level (0-3)
      - name: co2_tracker
        type: boolean
        default: false
        description: Enable CO2 emissions tracking
      - name: tasks
        type: array
        default: null
        description: Explicit list of MTEB task names (overrides benchmark preset)
      - name: task_types
        type: array
        default: null
        description: Filter by task type (STS, Retrieval, Classification, etc.)
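The MTEB parameters all carry defaults, and an explicit tasks list is described as overriding the benchmark preset. The sketch below shows one way a caller might resolve job parameters against those defaults. The names resolve_mteb_parameters and select_tasks are hypothetical, and rejecting unknown keys and treating an empty tasks list as unset are assumptions; the MTEB adapter's real handling may differ.

```python
import copy
from typing import Any, Dict, List, Optional

# Defaults copied from the parameters list in mteb.yaml above.
MTEB_DEFAULTS: Dict[str, Any] = {
    "batch_size": 32,
    "device": None,
    "languages": ["eng"],
    "verbosity": 2,
    "co2_tracker": False,
    "tasks": None,
    "task_types": None,
}


def resolve_mteb_parameters(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge caller overrides onto the documented defaults."""
    overrides = overrides or {}
    # Assumption: unknown keys are rejected rather than passed through.
    unknown = sorted(set(overrides) - set(MTEB_DEFAULTS))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    # Deep copy so that mutating the result never changes the shared defaults.
    resolved = copy.deepcopy(MTEB_DEFAULTS)
    resolved.update(overrides)
    if not 0 <= resolved["verbosity"] <= 3:
        raise ValueError("verbosity must be in the range 0-3")
    return resolved


def select_tasks(preset_tasks: List[str], resolved: Dict[str, Any]) -> List[str]:
    """An explicit `tasks` list overrides the benchmark preset."""
    explicit = resolved.get("tasks")
    # Assumption: an empty list is treated as "not set".
    return list(explicit) if explicit else list(preset_tasks)


if __name__ == "__main__":
    # Preset taken from the mteb_clustering description above.
    clustering_preset = ["ArxivClusteringP2P", "ArxivClusteringS2S", "BiorxivClusteringP2P"]
    params = resolve_mteb_parameters({"batch_size": 64, "tasks": ["ArxivClusteringS2S"]})
    print(select_tasks(clustering_preset, params))
```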
After downloading a ConfigMap YAML, apply it to your cluster:
oc apply -f evalhub-provider-<id>.yaml
For full OpenShift deployment steps, see OpenShift Setup.
The EvalHub operator discovers providers via ConfigMap labels:
trustyai.opendatahub.io/evalhub-provider-type
trustyai.opendatahub.io/evalhub-provider-name
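To make the label-based discovery concrete, the sketch below checks whether a parsed manifest carries both labels and, if so, returns the provider type and name. It only illustrates the labeling convention shown in the ConfigMaps on this page; the function provider_identity is hypothetical and the operator's actual selection logic is not documented here.

```python
from typing import Any, Dict, Optional, Tuple

TYPE_LABEL = "trustyai.opendatahub.io/evalhub-provider-type"
NAME_LABEL = "trustyai.opendatahub.io/evalhub-provider-name"


def provider_identity(manifest: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (provider_type, provider_name), or None if not a provider."""
    if manifest.get("kind") != "ConfigMap":
        return None
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    provider_type = labels.get(TYPE_LABEL)
    provider_name = labels.get(NAME_LABEL)
    # Assumption: both labels must be present for the ConfigMap to count as a provider.
    if not provider_type or not provider_name:
        return None
    return provider_type, provider_name


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "evalhub-provider-mteb",
            "namespace": "opendatahub",
            "labels": {TYPE_LABEL: "system", NAME_LABEL: "mteb"},
        },
    }
    print(provider_identity(manifest))
```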
You can customize namespace and label values in the detail panel before downloading.
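If you start from a raw provider.yaml instead of the generated download, the same wrapper can be assembled by hand. The following sketch mirrors the shape of the ConfigMaps previewed on this page and emits JSON, which `oc apply -f` accepts as well as YAML. The helper build_provider_configmap and its argument names are hypothetical, not part of EvalHub.

```python
import json
from typing import Any, Dict

TYPE_LABEL = "trustyai.opendatahub.io/evalhub-provider-type"
NAME_LABEL = "trustyai.opendatahub.io/evalhub-provider-name"


def build_provider_configmap(
    provider_id: str,
    provider_yaml: str,
    namespace: str = "opendatahub",
    provider_type: str = "system",
) -> Dict[str, Any]:
    """Hypothetical helper: wrap raw provider.yaml text in a labeled ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            # Naming follows the previews above: evalhub-provider-<id>.
            "name": f"evalhub-provider-{provider_id}",
            "namespace": namespace,
            "labels": {TYPE_LABEL: provider_type, NAME_LABEL: provider_id},
        },
        # The data key matches the previews above: <id>.yaml.
        "data": {f"{provider_id}.yaml": provider_yaml},
    }


if __name__ == "__main__":
    raw_yaml = "id: mteb\nname: MTEB\ntype: builtin\n"  # shortened for the example
    manifest = build_provider_configmap("mteb", raw_yaml, namespace="my-namespace")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `oc apply -f <file>` should have the same effect as applying the downloaded YAML, assuming the target namespace exists.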