
Date: November 21, 2025
Dataset: Micropajama Truncated-512
Executive Summary
This report presents a comprehensive analysis of eight embedding models from five providers. We tested 160 samples across two token counts (256 and 512) and two batch sizes (1 and 10). The analysis reveals substantial performance differences between providers, with DS1 standing out for its exceptional throughput and latency.
The models tested are OpenAI's text-embedding-3-small and text-embedding-3-large, Cohere's embed-v4.0, VoyageAI's voyage-3, voyage-3-large, and voyage-3-lite, Takara's ds1-en-v1, and AWS's amazon.titan-embed-text-v2:0. The AWS model, now 18 months old, was included as a comparative "elder" to illustrate the rapid pace of innovation in this space.
Key Findings
DS1 dominates: 0.0284s latency, 17,912.4 tokens/sec throughput (~8x faster than competitors)
Batch scaling challenge: AWS exhibits 898% latency increase; Cohere +103%; DS1 improves by 13.5%
Token count impact: 512 tokens add 25.8% latency compared to 256 tokens
Best alternative: Cohere embed-v4.0 (0.1936s latency, consistent performance)
Worst performer: AWS shows severe batch processing degradation
1. Provider Performance Overview
[Figure: embedding_benchmark_analysis_1.png — provider performance overview]
Overall, the results show DS1 as the clear performance leader, with significantly lower latency (0.03-0.04 seconds) and exceptional throughput (17,500+ tokens/second). Cohere's embed-v4.0 and VoyageAI's voyage-3-lite follow as fast options with moderate latency (0.17-0.19 seconds).

2. Performance by Token Count
256 Tokens (80 samples)
| Provider | Latency | Throughput |
|---|---|---|
| ds1 | 0.0343s | 11,172 tokens/sec |
| cohere | 0.1844s | 1,559 tokens/sec |
| voyageai | 0.3780s | 1,264 tokens/sec |
| aws | 0.5374s | 1,389 tokens/sec |
| openai | 0.5293s | 685 tokens/sec |
512 Tokens (80 samples)
| Provider | Latency | Throughput |
|---|---|---|
| ds1 | 0.0226s | 24,653 tokens/sec |
| cohere | 0.2027s | 3,040 tokens/sec |
| voyageai | 0.4700s | 1,555 tokens/sec |
| aws | 0.6635s | 2,270 tokens/sec |
| openai | 0.5549s | 3,000 tokens/sec |
Key Insight: Moving from 256 to 512 tokens increases latency by 25.8% overall. DS1 actually improves performance with more tokens, while VoyageAI shows the largest degradation.
3. Performance by Batch Size
Batch Size 1 (80 samples)
Average Latency: 0.2018s
Average Throughput: 4,914.2 tokens/sec
Best Provider: DS1 (0.0305s)
Worst Provider: OpenAI (0.4622s)
Batch Size 10 (80 samples)
Average Latency: 0.6307s
Average Throughput: 2,572.6 tokens/sec
Best Provider: DS1 (0.0264s)
Worst Provider: VoyageAI (0.8975s)
Key Finding: Batch size 10 incurs 212.6% higher latency than batch 1 on average.
Batch Scaling Performance by Provider
| Provider | Latency Increase |
|---|---|
| DS1 | -13.5% (improves) |
| OpenAI | +16.2% |
| Cohere | +103.0% |
| AWS | +897.6% |
| VoyageAI | +523.4% |
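The percentage figures above are straightforward deltas between the mean latency at batch size 1 and at batch size 10. A minimal sketch of that calculation, assuming the raw results are a list of per-request records (the field names are illustrative, not the actual benchmark schema):

```python
from statistics import mean

def batch_scaling_change(records):
    """Percent change in mean latency from batch size 1 to batch size 10,
    per provider. Each record is a dict with illustrative keys:
    'provider', 'batch_size', 'latency_s'."""
    changes = {}
    for provider in {r["provider"] for r in records}:
        b1 = mean(r["latency_s"] for r in records
                  if r["provider"] == provider and r["batch_size"] == 1)
        b10 = mean(r["latency_s"] for r in records
                   if r["provider"] == provider and r["batch_size"] == 10)
        changes[provider] = 100.0 * (b10 - b1) / b1
    return changes

# A negative value (as reported for DS1) means batch 10 was faster on average.
```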
4. Model Rankings
By Latency (Lower is Better)
ds1-en-v1 (DS1) - 0.0284s ⭐
voyage-3-lite (VoyageAI) - 0.1751s
embed-v4.0 (Cohere) - 0.1936s
text-embedding-3-large (OpenAI) - 0.3510s
amazon.titan-embed-text-v2:0 (AWS) - 0.5462s
voyage-3-large (VoyageAI) - 0.6085s
text-embedding-3-small (OpenAI) - 0.6484s
voyage-3 (VoyageAI) - 0.7786s
By Throughput (Higher is Better)
ds1-en-v1 (DS1) - 17,912.4 tokens/sec ⭐
embed-v4.0 (Cohere) - 2,299.6 tokens/sec
voyage-3-lite (VoyageAI) - 2,296.3 tokens/sec
amazon.titan-embed-text-v2:0 (AWS) - 2,194.4 tokens/sec
voyage-3 (VoyageAI) - 1,647.6 tokens/sec
voyage-3-large (VoyageAI) - 1,356.5 tokens/sec
text-embedding-3-large (OpenAI) - 1,227.9 tokens/sec
text-embedding-3-small (OpenAI) - 1,012.2 tokens/sec
5. Efficiency Analysis
Efficiency Score (Throughput/Latency Ratio): this metric identifies providers delivering the best throughput relative to latency.
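A minimal sketch of the score, using the aggregate latency and throughput figures reported in this document (the function itself is illustrative, not the benchmark code):

```python
def efficiency_score(throughput_tokens_per_s: float, latency_s: float) -> float:
    """Efficiency score as defined above: throughput divided by latency.
    Higher is better; it rewards providers that are both fast and high-throughput."""
    return throughput_tokens_per_s / latency_s

# Using DS1's aggregate figures from the rankings in section 4:
# efficiency_score(17_912.4, 0.0284) ≈ 630,700, matching the reported score
# to within rounding of the aggregated inputs.
```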
Performance Ranking
DS1 - Score: 630,781
Cohere - Score: 11,880
AWS - Score: 4,019
VoyageAI - Score: 3,390
OpenAI - Score: 2,242
6. Consistency Analysis
Coefficient of Variation (CV - Lower is Better)
| Provider | CV Score | Assessment |
|---|---|---|
| Cohere | 0.38 | Highly consistent |
| OpenAI | 0.64 | Consistent |
| AWS | 0.84 | Consistent |
| DS1 | 1.01 | Moderate |
| VoyageAI | 1.34 | Variable |
The CV is calculated as standard deviation divided by mean.
Key Findings
Cohere delivers the most predictable performance (CV: 0.38)
VoyageAI shows highest variability (CV: 1.34) with occasional severe slowdowns
DS1 maintains acceptable consistency despite highest throughput demands
DS1's higher CV stems from a single outlier (0.1476s) among otherwise very fast times (0.014-0.029s), which inflates the relative variance. This is likely due to the burstable performance of the t2 instance; when a c5 instance is used instead, the score drops to 0.77.
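To illustrate how a single outlier can dominate this metric, here is a toy calculation on hypothetical latencies in the same range as DS1's observations (these values are made up for illustration and are not the benchmark data):

```python
from statistics import mean, pstdev

def cv(latencies):
    """Coefficient of variation: standard deviation divided by mean."""
    return pstdev(latencies) / mean(latencies)

# Hypothetical latencies (seconds): mostly fast responses plus one slow outlier.
fast = [0.015, 0.018, 0.020, 0.022, 0.025, 0.028] * 3
print(round(cv(fast), 2))             # low CV for the fast-only samples (~0.2)
print(round(cv(fast + [0.1476]), 2))  # one outlier roughly quintuples the CV
```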
7. Scaling Characteristics
Token Count Scaling (256 → 512 tokens)
| Provider | Latency Change | Assessment |
|---|---|---|
| DS1 | -34.1% | Improves |
| OpenAI | -11.3% | Improves |
| Cohere | +9.9% | Slight increase |
| AWS | +3.2% | Stable |
| VoyageAI | +75.7% | Significant increase |
Batch Scaling (Batch 1 → Batch 10)
| Provider | Latency Change | Assessment |
|---|---|---|
| DS1 | -13.5% | Exceptional |
| OpenAI | +16.2% | Excellent |
| Cohere | +103.0% | Acceptable |
| AWS | +897.6% | Unacceptable |
| VoyageAI | +523.4% | Poor |
8. Benchmark Methodology
Infrastructure & Setup
To ensure a rigorous and fair evaluation, we conducted this benchmark across a standardized infrastructure environment. All API calls were initiated from an EC2 instance (t3.small) located in us-east-1, providing a consistent baseline for network latency and performance measurement across all providers.
Provider Endpoints:
Third-party providers (Bedrock, Cohere, OpenAI, Voyage AI): Public APIs
DS1: SageMaker endpoint (ml.t2.medium CPU-only instance, us-east-1)
We specifically chose us-east-1 based on historical observations showing that most embedding providers deliver optimal performance from US regions. This geographic selection minimizes latency variance caused by regional differences, though we acknowledge this benefits providers with strong US infrastructure.
Dataset & Scope
We evaluated embedding performance using the Micropajama Truncated-512 dataset from Hugging Face. This dataset provides diverse, real-world text samples suitable for comprehensive embedding evaluation.
Importantly: This benchmark focuses exclusively on embedding generation performance—we measured latency and throughput for the embedding operation itself. All generated embeddings were discarded immediately after measurement; no vector stores or storage layers were involved. This means retrieval performance was explicitly out of scope for this analysis. Our results reflect raw embedding API performance only, free from confounding factors like indexing strategy, storage overhead, or retrieval complexity.
Measurement & Repeatability
To account for natural performance variance and transient network fluctuations, each embedding test was executed 5 times and the results aggregated. This smoothing technique provides more reliable performance baselines than single-run measurements. No rate limits were encountered during testing, confirming that our benchmark operated well within each provider's operational parameters.
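For context, the shape of such a measurement loop can be sketched as below. This is an illustrative harness rather than the actual benchmark code; `embed` stands in for whichever provider SDK call is being timed.

```python
import time
from statistics import mean

def timed_embedding_runs(embed, texts, tokens_per_text, repeats=5):
    """Time `repeats` calls of a provider's embedding function on the same batch,
    then aggregate into average latency (s) and throughput (tokens/sec).
    Embeddings are discarded; only the timing is kept."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        embed(texts)                      # provider SDK call under test
        latencies.append(time.perf_counter() - start)
    avg_latency = mean(latencies)
    throughput = (tokens_per_text * len(texts)) / avg_latency
    return avg_latency, throughput

# e.g. latency, tps = timed_embedding_runs(client_embed, batch_of_10, tokens_per_text=512)
```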
Caveat: DS1 Public Availability
As DS1 is currently only available as a SageMaker endpoint rather than a public API, this introduces a technical asymmetry in our testing environment. DS1 was tested from a dedicated SageMaker instance, while other providers were tested via their managed APIs. However, our extensive internal testing suggests that equivalent DS1 performance would be observed if a public endpoint were available—the ml.t2.medium instance backend provides a realistic proxy for DS1's performance characteristics. We plan to rerun this benchmark once DS1 has a public managed API available to provide a more architecturally uniform comparison.
9. Statistical Summary
| Metric | Value |
|---|---|
| Total Records | 160 |
| Providers Tested | 5 |
| Models Tested | 8 |
| Date | Nov 21, 2025 |
| Token Counts | 256, 512 |
| Batch Sizes | 1, 10 |
| Requests Successful | 100% |
This benchmark delivers a pragmatic evaluation of embedding performance under controlled conditions, capturing API latency and throughput without the added complexity of downstream retrieval or storage systems. The methodology emphasizes consistency and repeatability, while acknowledging the inherent limitations of comparing a SageMaker endpoint with fully managed public APIs.
Conclusion
DS1 clearly leads on performance, with the lowest latency (0.0284s), the highest throughput (17,912.4 tokens/sec), and strong scaling characteristics. It holds up across varying token counts and batch sizes, making it an ideal choice for applications where performance is paramount.
Our review has been purely performance-focused, but two other essential factors for embedding models, quality and cost, also demand attention and must be weighed against performance for your specific use case. Cost is usually straightforward to measure, expressed in USD per million tokens. Quality requires a different measurement methodology; we aim to publish supplementary benchmarks covering both in the near future.
For enterprises that prioritize real-time embedding generation, DS1 offers a significant edge over its competitors. Cohere's embed-v4.0 is the strongest alternative, combining solid speed with the most consistent performance, whereas AWS Titan should be avoided for performance-intensive applications.
