Which embedding model should I pick in 2026?

Start with OpenAI text-embedding-3-small for cost-friendly general-purpose work, or Voyage voyage-3 for the best quality on retrieval. For self-hosted, BGE-M3 or Nomic Embed v2 are the open-source leaders. Always evaluate on your own data — MTEB rankings don't perfectly transfer.

Can I use a smaller embedding dimension?

Yes. Both OpenAI and Voyage support Matryoshka-style truncation: for example, embed at 3072 dim and store at 768 with minimal quality loss. This cuts vector storage and HNSW index size significantly.

Should I switch embedding models in production?

Switching means re-embedding the entire corpus. Plan it as a project: dual-write during migration, A/B compare retrieval quality on real queries, cut over only when the new model is measurably better on your data.

Embedding Models in 2026 — OpenAI, Voyage, Cohere, BGE, and How to Pick

The embedding model is the most consequential choice in any RAG / semantic-search system. Get it wrong and no amount of prompt engineering downstream rescues you. In 2026 the landscape has matured; this post is the working comparison and decision guide.

The contenders

Model	Provider	Dim	Cost / 1M tokens	Open?
text-embedding-3-small	OpenAI	1536 (or trunc)	$0.02	No
text-embedding-3-large	OpenAI	3072 (or trunc)	$0.13	No
voyage-3	Voyage	1024	$0.06	No
voyage-3-large	Voyage	2048	$0.18	No
voyage-code-3	Voyage	1024	$0.18	No
Cohere embed-v4	Cohere	1536	$0.05	No
BGE-M3	BAAI	1024	self-hosted	Yes
Nomic Embed v2	Nomic	768	self-hosted	Yes
jina-embeddings-v3	Jina	1024	self-hosted or API	Partial
Snowflake arctic-embed-l	Snowflake	1024	self-hosted	Yes

Quality (rough MTEB ranking)

Approximate order by average MTEB score, best first (rankings shift as models and the benchmark update):

1. voyage-3-large
2. voyage-3
3. text-embedding-3-large
4. cohere embed-v4
5. BGE-M3
6. text-embedding-3-small
7. Nomic Embed v2

Real-world: differences between the top 5 are small. Differences on your data can flip the order. Always evaluate.

How to pick

1. Start with general purpose

For a typical RAG over English text:

Hosted, cost-aware: text-embedding-3-small. Cheap, good, ubiquitous.
Hosted, quality-first: voyage-3 or voyage-3-large. Top of MTEB, especially for retrieval.
Self-hosted: BGE-M3 (multilingual) or Nomic Embed v2.

2. Domain-specialized when possible

Code: voyage-code-3 (or BGE-Code-V1.5). Trained on code; significantly better than generic models for code search.
Multilingual: BGE-M3 covers 100+ languages. Voyage and Cohere have multilingual variants too.
Legal / medical: Specialized embeddings exist; benchmark them against a general-purpose model. Often the gain is small once you have a good reranker.

3. Match the task

Two RAG sub-tasks:

Symmetric similarity (find similar items). Most general embeddings work.
Asymmetric retrieval (find documents matching a question). Use voyage-3, cohere embed-v4, or models with explicit asymmetric instruction prefixes.

For Build a RAG App, asymmetric is what you want.
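As a rough illustration of the asymmetric case, queries and documents are tagged differently before they are embedded. The sketch below uses illustrative prefix strings and a hypothetical `prepare_inputs` helper; the exact prefixes (or an equivalent input-type parameter) vary by model, so check the model's documentation before copying these.

```python
# Illustrative prefixes only; each model documents its own convention.
QUERY_PREFIX = "query: "
DOCUMENT_PREFIX = "passage: "

def prepare_inputs(texts, role):
    """Prepend the role-specific prefix so the model can tell queries from documents."""
    if role == "query":
        prefix = QUERY_PREFIX
    elif role == "document":
        prefix = DOCUMENT_PREFIX
    else:
        raise ValueError(f"unknown role: {role!r}")
    return [prefix + t.strip() for t in texts]

doc_inputs = prepare_inputs(["Rotate keys from the dashboard under Settings."], "document")
query_inputs = prepare_inputs(["how do I rotate my API key?"], "query")
```

The same text embedded with the wrong role usually still works, just worse, which is why this mistake tends to go unnoticed without an eval set.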

Dimensions and Matryoshka

Smaller dimensions = smaller index, faster search, less storage.

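# OpenAI: ask for a shorter dimension directly
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
texts = ["How do I rotate an API key?"]
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
    dimensions=1024,           # Matryoshka: quality drops gracefully
)
vectors = [item.embedding for item in response.data]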

Quality loss at:

3072 → 2048: ~1%
3072 → 1024: ~2–3%
3072 → 512: ~5%

For most RAG, 1024 dim is the sweet spot — much smaller index, marginal quality cost. See pgvector Deep Dive for storage math.
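If full-size vectors are already stored, or the API has no dimension parameter, truncation can also be done client-side. The sketch below assumes the model was trained Matryoshka-style, so that leading components carry most of the signal; `truncate_embedding` is a hypothetical helper name. It keeps the first components and rescales to unit length so cosine and dot-product scores stay comparable.

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if dim > len(vector):
        raise ValueError("target dim exceeds the vector's length")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]

full = [0.6, 0.0, 0.8, 0.0]          # toy 4-dim unit vector
short = truncate_embedding(full, 2)  # keep the first 2 components
```

Truncate queries and documents to the same size; mixing lengths breaks the distance computation.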

Self-hosting embeddings

For high-volume or privacy-critical workloads:

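BGE-M3 on a single L4 GPU does ~1k embed/sec, for roughly $200/month. At the list prices in the table above, a billion-token-per-day pipeline (about 30B tokens/month) costs about $600/month on text-embedding-3-small and about $3,900/month on text-embedding-3-large.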
Serve them with an inference server such as vLLM or TEI (Text Embeddings Inference), or directly with the sentence-transformers library.

Tradeoff: operational overhead, plus a fixed GPU cost that is hard to justify at low volume. Below 100M tokens/month, hosted APIs win.

Evaluating on your data

Don’t trust leaderboards. Build your eval:

30 queries representative of real usage.
For each, gold documents that should be retrieved.
Recall@k for each candidate model.

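def recall_at_k(model, queries, k=10):
    # Mean recall@k: share of each query's gold documents found in the top k.
    # `retrieve` is your own search function; it returns objects with an `.id`.
    total = 0.0
    for q in queries:
        docs = retrieve(model, q.text, k=k)
        retrieved_ids = {d.id for d in docs}
        gold_ids = set(q.gold_ids)
        total += len(retrieved_ids & gold_ids) / len(gold_ids)
    return total / len(queries)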

A 5% recall@10 difference on real data is worth more than a 0.5% MTEB difference.
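To make the comparison concrete, here is a toy, self-contained run over two candidate models. The `Query` dataclass, the canned `RESULTS` table, and the names `model_a` / `model_b` are all stand-ins; in practice the retrieved IDs would come from your vector store. This version scores the share of gold documents found per query, which is stricter than counting a query as correct when any single gold document appears.

```python
from dataclasses import dataclass

@dataclass
class Query:
    text: str
    gold_ids: set

# Canned ranked results per (model, query); stand-ins for real vector search.
RESULTS = {
    ("model_a", "reset password"): ["d1", "d7", "d3"],
    ("model_a", "export invoices"): ["d9", "d2", "d4"],
    ("model_b", "reset password"): ["d1", "d3", "d8"],
    ("model_b", "export invoices"): ["d5", "d6", "d2"],
}

def mean_recall_at_k(model, queries, k):
    """Average, over queries, of the share of gold documents found in the top k."""
    total = 0.0
    for q in queries:
        top_k = set(RESULTS[(model, q.text)][:k])
        total += len(top_k & q.gold_ids) / len(q.gold_ids)
    return total / len(queries)

queries = [
    Query("reset password", {"d1", "d3"}),
    Query("export invoices", {"d2"}),
]
scores = {m: mean_recall_at_k(m, queries, k=2) for m in ("model_a", "model_b")}
```

With only two queries the numbers are noise; the point is the shape of the harness, which scales unchanged to the 30-query set described above.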

For a deeper eval framework see LLM Evaluations .

Migration

Switching embedding models means re-embedding everything. Plan:

Dual-store the new column: ALTER TABLE chunks ADD COLUMN embedding_v2 vector(1024).
Backfill in batches with the new model.
A/B compare retrieval on a held-out query set.
Cut over reads to the new column.
Drop old column after a release cycle of stability.

Don’t try to do this in-place. Keep both for at least one release.
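The backfill step might look roughly like the loop below. It is an in-memory sketch: `rows` stands in for the `chunks` table, `embed_batch` is a hypothetical placeholder for the new model's API call, and the batch size is arbitrary. Only rows still missing `embedding_v2` are touched, so the job can be stopped and resumed.

```python
def embed_batch(texts):
    """Hypothetical stand-in for the new model; returns one fake vector per text."""
    return [[float(len(t)), 0.0] for t in texts]

def backfill(rows, batch_size=2):
    """Fill embedding_v2 for rows that lack it, one batch at a time."""
    pending = [r for r in rows if r.get("embedding_v2") is None]
    batches = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        vectors = embed_batch([r["text"] for r in batch])
        for row, vec in zip(batch, vectors):
            row["embedding_v2"] = vec  # the old "embedding" value is left untouched
        batches += 1
    return batches

rows = [
    {"id": 1, "text": "alpha", "embedding": [0.1], "embedding_v2": None},
    {"id": 2, "text": "beta", "embedding": [0.2], "embedding_v2": [4.0, 0.0]},
    {"id": 3, "text": "gamma", "embedding": [0.3], "embedding_v2": None},
    {"id": 4, "text": "delta", "embedding": [0.4], "embedding_v2": None},
]
batches_run = backfill(rows)
```

Against a real database the same loop would select rows where `embedding_v2 IS NULL` and commit per batch, so a failure loses at most one batch of work.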

Cost in production

For a SaaS embedding 10M docs/month + 1M queries/month:

OpenAI 3-small: ~$2k/month total.
Voyage 3: ~$5k/month.
BGE-M3 self-hosted on 1× L4: ~$300/month + ops.

For most early-stage products, hosted APIs win on TCO. Self-host when volume justifies it.
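The monthly totals above depend heavily on average document and query length, which vary by product. A small calculator makes that assumption explicit. The per-million-token prices come from the table earlier in this post; the token counts per document and per query below are illustrative guesses, not measurements.

```python
PRICE_PER_MILLION_TOKENS = {   # USD, from the comparison table
    "text-embedding-3-small": 0.02,
    "voyage-3": 0.06,
}

def monthly_api_cost(model, docs, tokens_per_doc, queries, tokens_per_query):
    """Hosted embedding cost in USD for one month of documents plus queries."""
    total_tokens = docs * tokens_per_doc + queries * tokens_per_query
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Assumed workload: 10M long docs of ~10,000 tokens, 1M queries of ~20 tokens.
small = monthly_api_cost("text-embedding-3-small", 10_000_000, 10_000, 1_000_000, 20)
voyage = monthly_api_cost("voyage-3", 10_000_000, 10_000, 1_000_000, 20)
```

Under these assumptions text-embedding-3-small lands near $2,000/month. Document tokens dominate the bill, so shorter documents scale it down almost proportionally, and query volume barely registers.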

Hybrid is mandatory

No matter the embedding model, hybrid retrieval (vector + BM25) outperforms vector-only. See Build a RAG App with pgvector for the RRF fusion pattern.
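For reference, Reciprocal Rank Fusion itself is only a few lines. The sketch below is a generic version, not the code from the linked post: `rrf_fuse` is a hypothetical name, the two input rankings are toy data, and `k=60` is the commonly used default constant.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["d3", "d1", "d2"]   # toy vector-search ranking
bm25_hits = ["d1", "d4", "d3"]     # toy BM25 ranking
fused = rrf_fuse([vector_hits, bm25_hits])
```

RRF uses only ranks, never raw scores, so cosine similarities and BM25 scores do not need to be put on a common scale first.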

Common mistakes

1. Picking by leaderboard, not eval

MTEB averages many tasks. Your task may not match the average. Eval.

2. Forgetting to truncate

3072-dim float32 vectors are 12 KB each (3072 × 4 bytes). For 10M chunks that’s about 120 GB. Truncate to 1024 dim ⇒ about 40 GB. Often imperceptible quality loss; massive infra savings.
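The arithmetic is simple enough to keep as a helper. This sketch assumes 4-byte float32 values and counts only the raw vector payload; index structures such as HNSW and per-row overhead come on top. `vector_storage_gb` is a hypothetical name.

```python
def vector_storage_gb(num_vectors, dim, bytes_per_value=4):
    """Raw vector payload in GB (10^9 bytes); excludes index and row overhead."""
    return num_vectors * dim * bytes_per_value / 1e9

full = vector_storage_gb(10_000_000, 3072)       # about 122.9 GB
truncated = vector_storage_gb(10_000_000, 1024)  # about 41.0 GB
```

Passing `bytes_per_value=2` gives the figure for half-precision storage, if your vector store supports it.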

3. Mixing dimensions

embedding vector(1536)        -- table schema (SQL)
client.embeddings.create(model="text-embedding-3-small", input=texts)  # 1536 ✓ matches the column
client.embeddings.create(model="text-embedding-3-large", input=texts)  # 3072 ✗ wrong size for the column

Pick a model and stick with it across reads and writes.

4. No periodic re-eval

Embedding models update. Run your eval set when a new version drops. Decide before migrating.

5. Treating embeddings as the whole solution

Embeddings + reranker > embeddings alone. After top-30 from vector search, rerank with Cohere Rerank or BGE-Reranker → top-10. Often a bigger win than chasing better embeddings.
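The two-stage flow can be sketched as follows. The `score_pair` function here is a toy word-overlap scorer standing in for a real reranker such as Cohere Rerank or BGE-Reranker; it is not the API of either, and the candidate list is made up. Only the retrieve-then-reorder structure is meant to carry over.

```python
def score_pair(query, text):
    """Toy relevance score: fraction of query words present in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0

def rerank(query, candidates, top_n):
    """Re-order first-stage candidates by the reranker score and keep top_n."""
    scored = sorted(candidates, key=lambda c: score_pair(query, c["text"]), reverse=True)
    return scored[:top_n]

candidates = [  # pretend these came back from vector search, in this order
    {"id": "d1", "text": "billing overview and plans"},
    {"id": "d2", "text": "how to rotate an api key"},
    {"id": "d3", "text": "api rate limits"},
]
top = rerank("rotate api key", candidates, top_n=2)
```

A real reranker reads the query and document together, which is slower per pair than comparing embeddings; that cost is why it runs only on the top-30 candidates rather than the whole corpus.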
