Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

TL;DR

IBM has released two new multilingual embedding models, Granite Embedding Multilingual R2, under Apache 2.0. The models support over 200 languages, handle contexts of up to 32,768 tokens, and outperform previous open models on retrieval benchmarks. They target multilingual and code retrieval for enterprise and developer use.

IBM has announced the release of two open-source multilingual embedding models, Granite Embedding Multilingual R2, under the Apache 2.0 license, designed to support over 200 languages with enhanced retrieval capabilities.

The models include a compact 97 million-parameter version and a full-size 311 million-parameter version, both built on the ModernBERT architecture. They support 200+ languages, with explicit training for 52 of them, and can handle context lengths up to 32,768 tokens, a 64-fold increase over previous versions (which implies a 512-token limit for those versions).
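To put the context-length figure in perspective, the short sketch below works through the arithmetic. It assumes a simple fixed-size split with no overlap, which the article does not describe; `chunks_needed` and the 20,000-token document length are hypothetical and used only for illustration.

```python
import math

R2_CONTEXT_TOKENS = 32_768   # stated in the article
INCREASE_FACTOR = 64         # stated in the article
# Implied by the two figures above, not stated directly in the article.
PREVIOUS_CONTEXT_TOKENS = R2_CONTEXT_TOKENS // INCREASE_FACTOR


def chunks_needed(doc_tokens: int, window: int) -> int:
    """Number of fixed-size, non-overlapping chunks needed to cover a document."""
    if doc_tokens < 0 or window <= 0:
        raise ValueError("doc_tokens must be >= 0 and window must be > 0")
    return math.ceil(doc_tokens / window)


if __name__ == "__main__":
    doc_tokens = 20_000  # illustrative length, not from the article
    print(PREVIOUS_CONTEXT_TOKENS)                             # 512
    print(chunks_needed(doc_tokens, PREVIOUS_CONTEXT_TOKENS))  # 40
    print(chunks_needed(doc_tokens, R2_CONTEXT_TOKENS))        # 1
```

Under these assumptions, a document that would have to be split into 40 pieces for a 512-token window fits in a single embedding call at 32,768 tokens.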

Both models are optimized for enterprise deployment, compatible with frameworks like sentence-transformers, LangChain, LlamaIndex, and others, requiring minimal integration effort. They include ONNX and OpenVINO weights for CPU inference, making them suitable for large-scale, real-time applications.
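As a rough illustration of what retrieval through sentence-transformers might look like, here is a minimal sketch. The ranking logic is plain Python; the encoder is passed in as a function so the same code works with any embedding model. The article does not give the models' Hugging Face identifiers, so `model_id` is left as a required argument, and the helper names (`cosine`, `rank_documents`, `search`, `load_sentence_transformer_encoder`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Vector, doc_vecs: Sequence[Vector]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def search(encode: Encoder, query: str, documents: List[str]) -> List[Tuple[str, float]]:
    """Embed the query and documents together, then rank the documents."""
    # No task-specific instruction prefix is added, in line with the
    # article's statement that none is required.
    vectors = encode([query] + list(documents))
    ranked = rank_documents(vectors[0], vectors[1:])
    return [(documents[i], score) for i, score in ranked]


def load_sentence_transformer_encoder(model_id: str) -> Encoder:
    """Wrap a sentence-transformers model as an Encoder.

    Assumption: model_id is the Hugging Face ID of one of the R2 models;
    the exact ID is not given in the article.
    """
    # Third-party dependency, imported lazily so the ranking code above
    # can be used without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return lambda texts: model.encode(texts).tolist()
```

With a real model, usage would be along the lines of `search(load_sentence_transformer_encoder(model_id), "query text", documents)`, where the query and the documents may be in different languages.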

Why It Matters

This release marks a significant advance in open multilingual embedding models. It offers high retrieval quality across many languages at a smaller model size, which benefits cross-lingual search, retrieval-augmented generation, and code retrieval for international teams. By narrowing the trade-off between retrieval quality and efficiency, it could enable broader adoption in enterprise AI solutions.

Background

Earlier models such as XLM-RoBERTa were limited by smaller context windows and weaker coverage of many languages. The R2 models are a ground-up rebuild that draws on recent transformer innovations and more carefully curated training data, including programming code, to improve multilingual retrieval performance. The release follows a broader trend toward more capable open models for enterprise AI applications.

“The Granite Embedding Multilingual R2 models set new benchmarks for open multilingual retrieval, supporting over 200 languages with enterprise-ready performance.”

— IBM AI Research

“Our models are designed to be easily integrated into existing frameworks, requiring no task-specific instructions and supporting long document contexts.”

— IBM Data Science Team

What Remains Unclear

It is not yet clear how these models perform in real-world enterprise deployments across diverse use cases or how they compare to proprietary models in production environments. Further benchmarks and user feedback are pending.

What’s Next

Next steps include broader adoption by developers and enterprises, integration into AI pipelines, and ongoing benchmarking. IBM may also release updated versions or extensions supporting more languages and tasks.

Key Questions

What are the main differences between the R1 and R2 models?

The R2 models are built on ModernBERT with a larger context window, improved architecture, and better training data, resulting in higher retrieval scores and support for longer texts.

Can these models be used for code retrieval?

Yes, both models support cross-lingual code retrieval across nine programming languages, making them suitable for software development and related tasks.

Are these models suitable for enterprise deployment?

Yes, they are designed for enterprise use, with compatibility for popular frameworks, CPU-optimized inference options, and compliance with governance standards.

How many languages do these models support?

They support over 200 languages, with explicit training and retrieval optimization for 52 languages.

What is the licensing for these models?

Both models are released under the Apache 2.0 license, allowing open use and modification.
