DS 1 - Embedding Benchmark Report
Nov 25, 2025

Date: November 21, 2025
Dataset: Micropajama Truncated-512

Executive Summary

This report presents a comprehensive analysis of eight embedding models from five providers. We tested 160 samples across two token counts (256 and 512) and two batch sizes (1 and 10). The analysis reveals notable performance disparities between providers, with DS1 standing out for its exceptional throughput and latency.

The models tested were: text-embedding-3-small and text-embedding-3-large from OpenAI; Cohere's embed-v4.0; VoyageAI's voyage-3, voyage-3-large, and voyage-3-lite; Takara's ds1-en-v1; and AWS's titan-embed-text-v2.0. The AWS model, now 18 months old, was included as a comparative "elder" model, a testament to the rapid pace of innovation in this space.

Key Findings

  • DS1 dominates: 0.0284s latency, 17,912.4 tokens/sec throughput (~8x faster than competitors)

  • Batch scaling challenge: AWS exhibits 898% latency increase; Cohere +103%; DS1 improves by 13.5%

  • Token count impact: 512 tokens add 25.8% latency compared to 256 tokens

  • Best alternative: Cohere embed-v4.0 (0.1936s latency, consistent performance)

  • Worst performer: AWS shows severe batch processing degradation


1. Provider Performance Overview

Figure: embedding_benchmark_analysis_1.png (provider performance overview)

Overall, the results indicate that DS1 is the clear performance leader, with significantly lower latency (0.03-0.04 seconds) and exceptional throughput (17,500+ tokens/second), making it the standout choice for embedding tasks. Both embed-v4.0 and voyage-3-lite are fast models with moderate latency (0.17-0.19 seconds).


2. Performance by Token Count

256 Tokens (80 samples)

| Provider | Latency | Throughput |
|----------|---------|------------|
| ds1 | 0.0343s | 11,172 tokens/sec |
| cohere | 0.1844s | 1,559 tokens/sec |
| voyageai | 0.3780s | 1,264 tokens/sec |
| aws | 0.5374s | 1,389 tokens/sec |
| openai | 0.5293s | 685 tokens/sec |

512 Tokens (80 samples)

| Provider | Latency | Throughput |
|----------|---------|------------|
| ds1 | 0.0226s | 24,653 tokens/sec |
| cohere | 0.2027s | 3,040 tokens/sec |
| voyageai | 0.4700s | 1,555 tokens/sec |
| aws | 0.6635s | 2,270 tokens/sec |
| openai | 0.5549s | 3,000 tokens/sec |

Key Insight: Moving from 256 to 512 tokens increases latency by 25.8% overall. DS1 actually improves performance with more tokens, while VoyageAI shows the largest degradation.


3. Performance by Batch Size

Batch Size 1 (80 samples)

  • Average Latency: 0.2018s

  • Average Throughput: 4,914.2 tokens/sec

  • Best Provider: DS1 (0.0305s)

  • Worst Provider: OpenAI (0.4622s)

Batch Size 10 (80 samples)

  • Average Latency: 0.6307s

  • Average Throughput: 2,572.6 tokens/sec

  • Best Provider: DS1 (0.0264s)

  • Worst Provider: VoyageAI (0.8975s)

Key Finding: Batch size 10 incurs 212.6% higher latency than batch 1 on average.

Batch Scaling Performance by Provider

| Provider | Latency Increase |
|----------|------------------|
| DS1 | -13.5% (improves) |
| OpenAI | +16.2% |
| Cohere | +103.0% |
| AWS | +897.6% |
| VoyageAI | +523.4% |
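The batch-scaling figures above are simple percent changes in mean latency between the two batch sizes. A minimal sketch of that calculation, using DS1's per-batch means from Section 3 (small rounding differences from the reported -13.5% are expected):

```python
# Percent change in mean latency when moving from batch size 1 to batch
# size 10. The DS1 means below are taken from Section 3 of this report.
def latency_change_pct(batch_1_mean: float, batch_10_mean: float) -> float:
    """Positive = slower at batch 10, negative = faster."""
    return (batch_10_mean - batch_1_mean) / batch_1_mean * 100

ds1_change = latency_change_pct(0.0305, 0.0264)  # roughly -13.4%
print(f"DS1: {ds1_change:+.1f}%")
```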


4. Model Rankings

By Latency (Lower is Better)

  1. ds1-en-v1 (DS1) - 0.0284s ⭐

  2. voyage-3-lite (VoyageAI) - 0.1751s

  3. embed-v4.0 (Cohere) - 0.1936s

  4. text-embedding-3-large (OpenAI) - 0.3510s

  5. amazon.titan-embed-text-v2:0 (AWS) - 0.5462s

  6. voyage-3-large (VoyageAI) - 0.6085s

  7. text-embedding-3-small (OpenAI) - 0.6484s

  8. voyage-3 (VoyageAI) - 0.7786s

By Throughput (Higher is Better)

  1. ds1-en-v1 (DS1) - 17,912.4 tokens/sec ⭐

  2. embed-v4.0 (Cohere) - 2,299.6 tokens/sec

  3. voyage-3-lite (VoyageAI) - 2,296.3 tokens/sec

  4. amazon.titan-embed-text-v2:0 (AWS) - 2,194.4 tokens/sec

  5. voyage-3 (VoyageAI) - 1,647.6 tokens/sec

  6. voyage-3-large (VoyageAI) - 1,356.5 tokens/sec

  7. text-embedding-3-large (OpenAI) - 1,227.9 tokens/sec

  8. text-embedding-3-small (OpenAI) - 1,012.2 tokens/sec

5. Efficiency Analysis

Efficiency Score (Throughput/Latency Ratio). This metric identifies providers delivering the best throughput relative to latency.

Performance Ranking

  1. DS1 - Score: 630,781

  2. Cohere - Score: 11,880

  3. AWS - Score: 4,019

  4. VoyageAI - Score: 3,390

  5. OpenAI - Score: 2,242
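The efficiency scores above can be reproduced, approximately, as throughput divided by latency. A sketch using the rounded per-provider figures from the Section 4 rankings (the rounding means the reproduced scores differ slightly from the exact values above):

```python
# Efficiency score = mean throughput / mean latency. The (latency, throughput)
# pairs are the rounded figures from the model rankings in Section 4.
ranking = {
    "DS1": (0.0284, 17_912.4),
    "Cohere": (0.1936, 2_299.6),
}

def efficiency_score(latency_s: float, throughput_tps: float) -> float:
    return throughput_tps / latency_s

for provider, (lat, tps) in ranking.items():
    print(f"{provider}: {efficiency_score(lat, tps):,.0f}")
```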


6. Consistency Analysis

Coefficient of Variation (CV - Lower is Better)

| Provider | CV Score | Assessment |
|----------|----------|------------|
| Cohere | 0.38 | Highly consistent |
| AWS | 0.84 | Consistent |
| OpenAI | 0.64 | Consistent |
| DS1 | 1.01 | Moderate |
| VoyageAI | 1.34 | Variable |

The CV is calculated as standard deviation divided by mean.
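As a sketch, the CV can be computed with Python's standard library. The latencies below are illustrative values in DS1's reported range, not the raw benchmark data:

```python
import statistics

# CV = standard deviation / mean of per-request latencies. Illustrative
# sample only; the actual benchmark latencies are not reproduced here.
latencies = [0.014, 0.018, 0.022, 0.029, 0.1476]  # seconds

cv = statistics.pstdev(latencies) / statistics.mean(latencies)
print(f"CV: {cv:.2f}")  # a single outlier inflates the ratio
```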

Key Findings

  • Cohere delivers the most predictable performance (CV: 0.38)

  • VoyageAI shows highest variability (CV: 1.34) with occasional severe slowdowns

  • DS1 maintains acceptable consistency despite highest throughput demands

DS1's higher CV stems from a single outlier (0.1476s) among otherwise very fast times (0.014-0.029s), which inflates the relative variance. This is likely due to the use of a t2 instance with burstable performance; on a c5 instance the score drops to 0.77.


7. Scaling Characteristics

Token Count Scaling (256 → 512 tokens)

| Provider | Latency Change | Assessment |
|----------|----------------|------------|
| DS1 | -34.1% | Improves |
| OpenAI | -11.3% | Improves |
| Cohere | +9.9% | Slight increase |
| AWS | +3.2% | Stable |
| VoyageAI | +75.7% | Significant increase |

Batch Scaling (Batch 1 → Batch 10)

| Provider | Latency Change | Assessment |
|----------|----------------|------------|
| DS1 | -13.5% | Exceptional |
| OpenAI | +16.2% | Excellent |
| Cohere | +103.0% | Acceptable |
| AWS | +897.6% | Unacceptable |
| VoyageAI | +523.4% | Poor |


8. Benchmark Methodology

Infrastructure & Setup

To ensure a rigorous and fair evaluation, we conducted this benchmark across a standardized infrastructure environment. All API calls were initiated from an EC2 instance (t3.small) located in us-east-1, providing a consistent baseline for network latency and performance measurement across all providers.

Provider Endpoints:

  • Third-party providers (Bedrock, Cohere, OpenAI, Voyage AI): Public APIs

  • DS1: SageMaker endpoint (ml.t2.medium CPU-only instance, us-east-1)

We specifically chose us-east-1 based on historical observations showing that most embedding providers deliver optimal performance from US regions. This geographic selection minimizes latency variance caused by regional differences, though we acknowledge this benefits providers with strong US infrastructure.

Dataset & Scope

We evaluated embedding performance using the Micropajama Truncated-512 dataset from Hugging Face. This dataset provides diverse, real-world text samples suitable for comprehensive embedding evaluation.

Importantly: This benchmark focuses exclusively on embedding generation performance—we measured latency and throughput for the embedding operation itself. All generated embeddings were discarded immediately after measurement; no vector stores or storage layers were involved. This means retrieval performance was explicitly out of scope for this analysis. Our results reflect raw embedding API performance only, free from confounding factors like indexing strategy, storage overhead, or retrieval complexity.

Measurement & Repeatability

To account for natural performance variance and transient network fluctuations, each embedding test was executed 5 times and the results aggregated. This smoothing technique provides more reliable performance baselines than single-run measurements. No rate limits were encountered during testing, confirming that our benchmark operated well within each provider's operational parameters.
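The measurement loop can be sketched as follows. `embed` here is a stand-in for any provider's client call and is an assumption for illustration, not the benchmark's actual harness:

```python
import time
from statistics import mean

RUNS = 5  # each configuration is timed 5 times and the results aggregated

def benchmark(embed, texts, tokens_per_text):
    """Time repeated embedding calls and derive average latency/throughput."""
    latencies = []
    for _ in range(RUNS):
        start = time.perf_counter()
        embed(texts)  # the embedding call under test; result is discarded
        latencies.append(time.perf_counter() - start)
    avg_latency = mean(latencies)
    total_tokens = tokens_per_text * len(texts)
    return {"latency_s": avg_latency, "throughput_tps": total_tokens / avg_latency}

# Usage with a stub embedder (substitute a real provider client):
result = benchmark(lambda texts: [[0.0] * 8 for _ in texts], ["sample"] * 10, 512)
```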

Caveat: DS1 Public Availability

As DS1 is currently only available as a SageMaker endpoint rather than a public API, this introduces a technical asymmetry in our testing environment. DS1 was tested from a dedicated SageMaker instance, while other providers were tested via their managed APIs. However, our extensive internal testing suggests that equivalent DS1 performance would be observed if a public endpoint were available—the ml.t2.medium instance backend provides a realistic proxy for DS1's performance characteristics. We plan to rerun this benchmark once DS1 has a public managed API available to provide a more architecturally uniform comparison.

9. Statistical Summary

| Metric | Value |
|--------|-------|
| Total Records | 160 |
| Providers Tested | 5 |
| Models Tested | 8 |
| Date | Nov 21, 2025 |
| Token Counts | 256, 512 |
| Batch Sizes | 1, 10 |
| Requests Successful | 100% |

This benchmark provides a pragmatic evaluation of embedding performance under controlled conditions, capturing API latency and throughput without the added complexity of downstream retrieval or storage systems. The methodology emphasizes consistency and repeatability while acknowledging the inherent limitations of comparing a SageMaker endpoint against fully managed public APIs.

Conclusion

DS1 is the clear performance leader, with the lowest average latency (0.0284s), the highest throughput (17,912.4 tokens/sec), and impressive scaling characteristics. It maintains consistent performance across varying token counts and batch sizes, making it an ideal selection for applications where performance is paramount.

Our review has been solely performance-centric; two additional essential factors for embedding models, quality and cost, also demand attention and must be weighed against performance for your specific use case. Cost is straightforward to measure, typically expressed in USD per million tokens. Quality requires a distinct measurement methodology; we aim to publish supplementary benchmarks covering these factors in the near future.

For enterprises that prioritize real-time embedding generation, DS1 offers a significant edge over its competitors. Cohere's embed-v4.0 presents a viable alternative with solid, consistent performance, whereas AWS Titan should be avoided for performance-intensive applications.

