
AI: post transformers

MTEB & MMTEB: The Massive Text Embedding Benchmark

05 Sep 2025

Description

These papers introduce and detail the Massive Multilingual Text Embedding Benchmark (MMTEB), a comprehensive evaluation framework for text embedding models. MMTEB expands on existing benchmarks, offering over 500 tasks across 250+ languages and a variety of domains, significantly increasing the diversity and scale of evaluation. It incorporates optimizations such as downsampling and caching to reduce computational cost, making the benchmark more accessible, especially for low-resource languages. The papers also evaluate a range of models, including large language models (LLMs) and smaller multilingual models, finding that instruction-tuned models often perform better and that smaller models can outperform much larger LLMs in highly multilingual or low-resource settings. Ultimately, MMTEB aims to provide a robust and extensive platform for assessing and advancing text embedding capabilities across a wide spectrum of linguistic and thematic challenges.

Sources:
- MMTEB: Massive Multilingual Text Embedding Benchmark (June 2025): https://arxiv.org/pdf/2502.13595
- MTEB: Massive Text Embedding Benchmark (March 2023): https://arxiv.org/pdf/2210.07316
- https://github.com/embeddings-benchmark/mteb
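The caching optimization mentioned above can be sketched in a few lines: if the same texts recur across tasks, their embeddings only need to be computed once. This is an illustrative sketch, not code from the papers or the mteb library; the `embed` function is a hypothetical stand-in for a real embedding model.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Hypothetical embedding model: derives a fake 4-dim vector from a hash.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:4]]

class CachingEncoder:
    """Wraps an embedding function and memoizes results per unique text."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.cache: dict[str, list[float]] = {}
        self.calls = 0  # counts actual (uncached) model invocations

    def encode(self, texts: list[str]) -> list[list[float]]:
        out = []
        for t in texts:
            if t not in self.cache:
                self.cache[t] = self.encode_fn(t)
                self.calls += 1
            out.append(self.cache[t])
        return out

encoder = CachingEncoder(embed)
task_a = ["hello world", "text embeddings"]
task_b = ["text embeddings", "multilingual benchmark"]  # overlaps with task_a
encoder.encode(task_a)
encoder.encode(task_b)
print(encoder.calls)  # 3 unique texts encoded, not 4
```

Across hundreds of tasks the savings compound, which is one way a benchmark of this scale can stay computationally affordable.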
