
Date: November 21, 2025
Dataset: Micropajama Truncated-512
Executive Summary
This report presents a comprehensive analysis of eight embedding models from five providers. We tested 160 samples across two token counts (256 and 512) and two batch sizes (1 and 10). The analysis reveals substantial performance differences between providers, with DS1 standing out for its exceptional throughput and latency.
The models tested are OpenAI's text-embedding-3-small and text-embedding-3-large, Cohere's embed-v4.0, VoyageAI's voyage-3, voyage-3-large, and voyage-3-lite, Takara's ds1-en-v1, and AWS's amazon.titan-embed-text-v2:0. The AWS model, now 18 months old, was included as a comparative "elder" to illustrate the rapid pace of innovation in this space.
Key Findings
DS1 dominates: 0.0284s latency, 17,912.4 tokens/sec throughput (~8x faster than competitors)
Batch scaling challenge: AWS exhibits 898% latency increase; Cohere +103%; DS1 improves by 13.5%
Token count impact: 512 tokens add 25.8% latency compared to 256 tokens
Best alternative: Cohere embed-v4.0 (0.1936s latency, consistent performance)
Worst performer: AWS shows severe batch processing degradation
1. Provider Performance Overview
[Figure: embedding_benchmark_analysis_1.png — provider performance overview]
Overall, the results show DS1 as the clear performance leader, with significantly lower latency (0.03-0.04 seconds) and exceptional throughput (17,500+ tokens/second). Cohere's embed-v4.0 and VoyageAI's voyage-3-lite follow as fast options with moderate latency (0.17-0.19 seconds).

2. Performance by Token Count
256 Tokens (80 samples)
| Provider | Latency | Throughput |
|---|---|---|
| ds1 | 0.0343s | 11,172 tokens/sec |
| cohere | 0.1844s | 1,559 tokens/sec |
| voyageai | 0.3780s | 1,264 tokens/sec |
| aws | 0.5374s | 1,389 tokens/sec |
| openai | 0.5293s | 685 tokens/sec |
512 Tokens (80 samples)
| Provider | Latency | Throughput |
|---|---|---|
| ds1 | 0.0226s | 24,653 tokens/sec |
| cohere | 0.2027s | 3,040 tokens/sec |
| voyageai | 0.4700s | 1,555 tokens/sec |
| aws | 0.6635s | 2,270 tokens/sec |
| openai | 0.5549s | 3,000 tokens/sec |
Key Insight: Moving from 256 to 512 tokens increases latency by 25.8% overall. DS1 actually improves performance with more tokens, while VoyageAI shows the largest degradation.
3. Performance by Batch Size
Batch Size 1 (80 samples)
Average Latency: 0.2018s
Average Throughput: 4,914.2 tokens/sec
Best Provider: DS1 (0.0305s)
Worst Provider: OpenAI (0.4622s)
Batch Size 10 (80 samples)
Average Latency: 0.6307s
Average Throughput: 2,572.6 tokens/sec
Best Provider: DS1 (0.0264s)
Worst Provider: VoyageAI (0.8975s)
Key Finding: Batch size 10 incurs 212.6% higher latency than batch 1 on average.
Batch Scaling Performance by Provider
| Provider | Latency Increase |
|---|---|
| DS1 | -13.5% (improves) |
| OpenAI | +16.2% |
| Cohere | +103.0% |
| AWS | +897.6% |
| VoyageAI | +523.4% |
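The percentage figures above are straightforward deltas between the mean latency at batch size 1 and at batch size 10. A minimal sketch of that calculation, assuming the raw results are a list of per-request records (the field names are illustrative, not the actual benchmark schema):

```python
from statistics import mean

def batch_scaling_change(records):
    """Percent change in mean latency from batch size 1 to batch size 10,
    per provider. Each record is a dict with illustrative keys:
    'provider', 'batch_size', 'latency_s'."""
    changes = {}
    for provider in {r["provider"] for r in records}:
        b1 = mean(r["latency_s"] for r in records
                  if r["provider"] == provider and r["batch_size"] == 1)
        b10 = mean(r["latency_s"] for r in records
                   if r["provider"] == provider and r["batch_size"] == 10)
        changes[provider] = 100.0 * (b10 - b1) / b1
    return changes

# A negative value (as reported for DS1) means batch 10 was faster on average.
```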
4. Model Rankings
By Latency (Lower is Better)
ds1-en-v1 (DS1) - 0.0284s ⭐
voyage-3-lite (VoyageAI) - 0.1751s
embed-v4.0 (Cohere) - 0.1936s
text-embedding-3-large (OpenAI) - 0.3510s
amazon.titan-embed-text-v2:0 (AWS) - 0.5462s
voyage-3-large (VoyageAI) - 0.6085s
text-embedding-3-small (OpenAI) - 0.6484s
voyage-3 (VoyageAI) - 0.7786s
By Throughput (Higher is Better)
ds1-en-v1 (DS1) - 17,912.4 tokens/sec ⭐
embed-v4.0 (Cohere) - 2,299.6 tokens/sec
voyage-3-lite (VoyageAI) - 2,296.3 tokens/sec
amazon.titan-embed-text-v2:0 (AWS) - 2,194.4 tokens/sec
voyage-3 (VoyageAI) - 1,647.6 tokens/sec
voyage-3-large (VoyageAI) - 1,356.5 tokens/sec
text-embedding-3-large (OpenAI) - 1,227.9 tokens/sec
text-embedding-3-small (OpenAI) - 1,012.2 tokens/sec
5. Efficiency Analysis
Efficiency Score (Throughput/Latency Ratio): this metric identifies providers delivering the best throughput relative to latency.
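A minimal sketch of the score, using the aggregate latency and throughput figures reported in this document (the function itself is illustrative, not the benchmark code):

```python
def efficiency_score(throughput_tokens_per_s: float, latency_s: float) -> float:
    """Efficiency score as defined above: throughput divided by latency.
    Higher is better; it rewards providers that are both fast and high-throughput."""
    return throughput_tokens_per_s / latency_s

# Using DS1's aggregate figures from the rankings in section 4:
# efficiency_score(17_912.4, 0.0284) ≈ 630,700, matching the reported score
# to within rounding of the aggregated inputs.
```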
Performance Ranking
DS1 - Score: 630,781
Cohere - Score: 11,880
AWS - Score: 4,019
VoyageAI - Score: 3,390
OpenAI - Score: 2,242
6. Consistency Analysis
Coefficient of Variation (CV - Lower is Better)
| Provider | CV Score | Assessment |
|---|---|---|
| Cohere | 0.38 | Highly consistent |
| OpenAI | 0.64 | Consistent |
| AWS | 0.84 | Consistent |
| DS1 | 1.01 | Moderate |
| VoyageAI | 1.34 | Variable |
The CV is calculated as standard deviation divided by mean.
Key Findings
Cohere delivers the most predictable performance (CV: 0.38)
VoyageAI shows highest variability (CV: 1.34) with occasional severe slowdowns
DS1 maintains acceptable consistency despite highest throughput demands
DS1's higher CV stems from a single outlier (0.1476s) among otherwise very fast times (0.014-0.029s), which inflates the relative variance. This is likely due to the burstable performance of the t2 instance; when a c5 instance is used instead, the score drops to 0.77.
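To illustrate how a single outlier can dominate this metric, here is a toy calculation on hypothetical latencies in the same range as DS1's observations (these values are made up for illustration and are not the benchmark data):

```python
from statistics import mean, pstdev

def cv(latencies):
    """Coefficient of variation: standard deviation divided by mean."""
    return pstdev(latencies) / mean(latencies)

# Hypothetical latencies (seconds): mostly fast responses plus one slow outlier.
fast = [0.015, 0.018, 0.020, 0.022, 0.025, 0.028] * 3
print(round(cv(fast), 2))             # low CV for the fast-only samples (~0.2)
print(round(cv(fast + [0.1476]), 2))  # one outlier roughly quintuples the CV
```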
7. Scaling Characteristics
Token Count Scaling (256 → 512 tokens)
| Provider | Latency Change | Assessment |
|---|---|---|
| DS1 | -34.1% | Improves |
| OpenAI | -11.3% | Improves |
| Cohere | +9.9% | Slight increase |
| AWS | +3.2% | Stable |
| VoyageAI | +75.7% | Significant increase |
Batch Scaling (Batch 1 → Batch 10)
| Provider | Latency Change | Assessment |
|---|---|---|
| DS1 | -13.5% | Exceptional |
| OpenAI | +16.2% | Excellent |
| Cohere | +103.0% | Acceptable |
| AWS | +897.6% | Unacceptable |
| VoyageAI | +523.4% | Poor |
8. Benchmark Methodology
Infrastructure & Setup
To ensure a rigorous and fair evaluation, we conducted this benchmark across a standardized infrastructure environment. All API calls were initiated from an EC2 instance (t3.small) located in us-east-1, providing a consistent baseline for network latency and performance measurement across all providers.
Provider Endpoints:
Third-party providers (Bedrock, Cohere, OpenAI, Voyage AI): Public APIs
DS1: SageMaker endpoint (ml.t2.medium CPU-only instance, us-east-1)
We specifically chose us-east-1 based on historical observations showing that most embedding providers deliver optimal performance from US regions. This geographic selection minimizes latency variance caused by regional differences, though we acknowledge this benefits providers with strong US infrastructure.
Dataset & Scope
We evaluated embedding performance using the Micropajama Truncated-512 dataset from Hugging Face. This dataset provides diverse, real-world text samples suitable for comprehensive embedding evaluation.
Importantly: This benchmark focuses exclusively on embedding generation performance—we measured latency and throughput for the embedding operation itself. All generated embeddings were discarded immediately after measurement; no vector stores or storage layers were involved. This means retrieval performance was explicitly out of scope for this analysis. Our results reflect raw embedding API performance only, free from confounding factors like indexing strategy, storage overhead, or retrieval complexity.
Measurement & Repeatability
To account for natural performance variance and transient network fluctuations, each embedding test was executed 5 times and the results aggregated. This smoothing technique provides more reliable performance baselines than single-run measurements. No rate limits were encountered during testing, confirming that our benchmark operated well within each provider's operational parameters.
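For context, the shape of such a measurement loop can be sketched as below. This is an illustrative harness rather than the actual benchmark code; `embed` stands in for whichever provider SDK call is being timed.

```python
import time
from statistics import mean

def timed_embedding_runs(embed, texts, tokens_per_text, repeats=5):
    """Time `repeats` calls of a provider's embedding function on the same batch,
    then aggregate into average latency (s) and throughput (tokens/sec).
    Embeddings are discarded; only the timing is kept."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        embed(texts)                      # provider SDK call under test
        latencies.append(time.perf_counter() - start)
    avg_latency = mean(latencies)
    throughput = (tokens_per_text * len(texts)) / avg_latency
    return avg_latency, throughput

# e.g. latency, tps = timed_embedding_runs(client_embed, batch_of_10, tokens_per_text=512)
```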
Caveat: DS1 Public Availability
As DS1 is currently only available as a SageMaker endpoint rather than a public API, this introduces a technical asymmetry in our testing environment. DS1 was tested from a dedicated SageMaker instance, while other providers were tested via their managed APIs. However, our extensive internal testing suggests that equivalent DS1 performance would be observed if a public endpoint were available—the ml.t2.medium instance backend provides a realistic proxy for DS1's performance characteristics. We plan to rerun this benchmark once DS1 has a public managed API available to provide a more architecturally uniform comparison.
9. Statistical Summary
| Metric | Value |
|---|---|
| Total Records | 160 |
| Providers Tested | 5 |
| Models Tested | 8 |
| Date | Nov 21, 2025 |
| Token Counts | 256, 512 |
| Batch Sizes | 1, 10 |
| Requests Successful | 100% |
This benchmark delivers a pragmatic evaluation of embedding performance under controlled conditions, capturing API latency and throughput without the added complexity of downstream retrieval or storage systems. The methodology emphasizes consistency and repeatability, while acknowledging the inherent limitations of comparing a SageMaker endpoint with fully managed public APIs.
Conclusion
DS1 clearly leads on performance, with the lowest latency (0.0284s), the highest throughput (17,912.4 tokens/sec), and strong scaling characteristics. It holds up across varying token counts and batch sizes, making it an ideal choice for applications where performance is paramount.
Our review has been purely performance-focused, but two other essential factors for embedding models, quality and cost, also demand attention and must be weighed against performance for your specific use case. Cost is usually straightforward to measure, expressed in USD per million tokens. Quality requires a different measurement methodology; we aim to publish supplementary benchmarks covering both in the near future.
For enterprises that prioritize real-time embedding generation, DS1 offers a significant edge over its competitors. Cohere's embed-v4.0 is the strongest alternative, combining solid speed with the most consistent performance, whereas AWS Titan should be avoided for performance-intensive applications.
