YAMS Performance Benchmark Report¶
Generated: 2026-01-05 22:29
YAMS Version: 0.0.0-dev (de213953)
Test Environment: macOS (Apple Silicon, 16 cores)
Build Configuration: Debug build (TSAN disabled)
Note: This page is the canonical place for benchmark results. Keep the latest numbers inlined here (avoid relying on generated bench_results/* artifacts).
Contents¶
- Executive Summary
- Test Environment Specifications
- Performance Benchmarks
- Latest Local Runs (2026-01-05)
- Cryptographic Operations (SHA-256)
- Content Chunking (Rabin Fingerprinting)
- Compression Performance (Zstandard)
- Concurrent Compression Performance
Executive Summary¶
This report focuses on benchmark changes that are easy to interpret and compare across runs (primarily ingestion + metadata + IPC framing). Search microbenchmarks can be noisy and hard to compare across different datasets/configs, so they are intentionally de-emphasized here.
Measured improvements (throughput):
- Ingestion_SmallDocument: ~1.1K → ~2.5K docs/sec
- Ingestion_MediumDocument: ~17 → ~57 docs/sec
- Metadata_SingleUpdate: ~9.1K → ~14.2K ops/sec
- Metadata_BulkUpdate(500): ~7.4K → ~11.5K ops/sec
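The before/after figures above can be turned into speedup factors for easier comparison across runs. The following is a minimal sketch using only the rounded numbers from this summary; the dictionary and function names are illustrative and not part of YAMS.

```python
# Before/after throughput copied from the summary above (docs/sec or ops/sec).
# Values are the rounded figures from this report, so ratios are approximate.
measurements = {
    "Ingestion_SmallDocument": (1100, 2500),
    "Ingestion_MediumDocument": (17, 57),
    "Metadata_SingleUpdate": (9100, 14200),
    "Metadata_BulkUpdate(500)": (7400, 11500),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' throughput is."""
    return after / before


for name, (before, after) in measurements.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```

Under these rounded inputs, medium-document ingestion shows the largest relative gain (roughly 3.4x), while both metadata benchmarks improve by roughly 1.5x.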
Test Environment Specifications¶
- Platform: macOS 26.0.1 (Darwin 25.0.0)
- CPU: Apple Silicon M3 Max, 16 cores (performance + efficiency)
- Memory: System RAM with 4MB L2 cache per core
- Cache Hierarchy: L1D 64KB, L1I 128KB, L2 4MB (x16)
- Compiler: AppleClang 17.0.0 with C++20 standard
- Build Type: Debug (for stable benchmarks, use a release build with -O3 optimizations)
- Package Management: Conan 2.0
- Build System: Meson
Available Benchmark Executables¶
Located in builddir/tests/benchmarks/:
- yams_api_benchmarks - API ingestion and metadata operations (writes bench_results/api_*)
- yams_search_benchmarks - Search + query parsing (writes bench_results/search_*)
- ipc_stream_bench - IPC streaming performance (writes bench_results/ipc_stream_*)
- retrieval_quality_bench - Retrieval-quality evaluation (stdout metrics; uses embedded daemon harness)
- yams_retrieval_service_benchmarks - Retrieval service benchmarks
- metadata_path_query_bench - Metadata query performance
- tree_list_filter_bench - Tree-based list filtering
- tree_diff_benchmarks - Tree diff operations
- ingestion_throughput_bench - Ingestion throughput
- daemon_socket_accept_bench - Daemon socket operations (GTest runner)
Test Suite Results¶
Historical Unit Test Coverage (October 11, 2025)
**Test Execution Summary**:
- **Unit Test Shards**: 6 shards with parallel execution
- **Total Tests Executed**: ~500+ across all shards
- **Passed Tests**: 503+ tests
- **Failed Tests**: 6 tests
- **Skipped Tests**: ~10 tests
- **Overall Pass Rate**: ~98.8%
**Known Failures**:
1. `SearchServiceTest.SnippetHydrationTimeoutReportsStats` - Timeout handling
2. `RepairUtilScanTest.MissingEmbeddingsListStableUnderPostIngestLoad` - Load testing
3. `ReferenceCounterTest.Statistics` - Statistics reporting
4. `GrepServiceUnicodeTest.LiteralUnicodeAndEmoji` - Unicode handling
5. `MCPSchemaTest.ListTools_ContainsAllExpectedTools` - MCP tool listing
6. `FtsSearchQuerySpecIntegration.BasicFtsWhenAvailable` - FTS5 integration timing
7. `VersioningIndexerTest.PathSeries_NewThenUpdate_CreatesVersionEdgeAndFlags` - Versioning edge cases
**Component-Level Status**:
- **Core Functionality**: ✅ STABLE (hashing, compression, chunking, WAL)
- **Search Engine**: ✅ STABLE (503+ tests passing)
- **Metadata Repository**: ✅ STABLE
- **API Services**: ✅ STABLE (124-127 tests passing per shard)
- **Vector Database**: ⚠️ Disabled in test runs (`YAMS_DISABLE_VECTORS=1`)
- **MCP Integration**: ⚠️ Minor issues with tool listing
**Test Infrastructure**:
- Tests run with strict memory sanitizers (ASAN, UBSAN, MSAN)
- SQLite busy timeout: 1000ms
- Vector database: In-memory mode
- Test isolation: Single instance mode enabled
Performance Benchmarks¶
Latest Local Runs (2026-01-05)¶
Run Commands¶
# API
builddir/tests/benchmarks/yams_api_benchmarks --quiet --iterations 5
# IPC streaming
builddir/tests/benchmarks/ipc_stream_bench --benchmark_min_time=0.05
# Retrieval quality
YAMS_TEST_SAFE_SINGLE_INSTANCE=1 builddir/tests/benchmarks/retrieval_quality_bench
API Benchmarks¶
(From builddir/tests/benchmarks/yams_api_benchmarks, --iterations 5)
| Benchmark | Latest | Throughput |
|---|---|---|
| Ingestion_SmallDocument | 0.41 ms | 2464.27 ops/sec |
| Ingestion_MediumDocument | 17.66 ms | 56.63 ops/sec |
| Metadata_SingleUpdate | 0.07 ms | 14245.01 ops/sec |
| Metadata_BulkUpdate | 43.63 ms | 11459.58 ops/sec |
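The two numeric columns are related: throughput is the number of operations per iteration divided by the iteration latency. The sketch below reproduces the table's throughput from its latency column, assuming one operation per iteration except for Metadata_BulkUpdate, which is assumed to process 500 updates per iteration (the batch size quoted in the Executive Summary). Small differences from the table come from the latency values being rounded to two decimals.

```python
def throughput_ops_per_sec(latency_ms: float, ops_per_iteration: int = 1) -> float:
    """Convert a per-iteration latency in milliseconds to operations per second."""
    return ops_per_iteration * 1000.0 / latency_ms


# (benchmark name, latency in ms from the table, assumed operations per iteration)
rows = [
    ("Ingestion_SmallDocument", 0.41, 1),
    ("Ingestion_MediumDocument", 17.66, 1),
    ("Metadata_SingleUpdate", 0.07, 1),
    ("Metadata_BulkUpdate", 43.63, 500),  # 500 is an assumption taken from the summary
]

for name, latency_ms, ops in rows:
    print(f"{name}: {throughput_ops_per_sec(latency_ms, ops):.2f} ops/sec")
```

This also explains why Metadata_BulkUpdate has the highest latency in the table but not the lowest throughput: each iteration covers a whole batch.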
Search Benchmarks¶
Search benchmarks are intentionally not included in the “improvements” summary because the reported numbers can be misleading (different datasets, caching, and internal operation definitions). If you need them for profiling, run the benchmark binary and inspect its output locally:
builddir/tests/benchmarks/yams_search_benchmarks --quiet --iterations 5
IPC Streaming Benchmarks¶
(From builddir/tests/benchmarks/ipc_stream_bench, --benchmark_min_time=0.05)
| Benchmark | Latest | Throughput |
|---|---|---|
| StreamingFramer_32x10_256B | 2.58 ms | 4267.49 ops/sec |
| StreamingFramer_64x6_512B | 4.51 ms | 1552.41 ops/sec |
| UnaryFramer_Success_8KB | 0.10 ms | 10050.25 ops/sec |
Retrieval Quality Benchmark¶
(From builddir/tests/benchmarks/retrieval_quality_bench with YAMS_TEST_SAFE_SINGLE_INSTANCE=1)
| Metric | Value |
|---|---|
| MRR | 1.0000 |
| Recall@K | 1.8167 |
| Precision@K | 1.0000 |
| nDCG@K | 1.4460 |
| MAP | 1.0000 |
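For reference when reading these metrics, here is a minimal sketch of the textbook binary-relevance definitions of reciprocal rank, Recall@K, Precision@K, and nDCG@K. It is not the retrieval_quality_bench implementation, and all function names are illustrative. Under these definitions, and with no duplicate documents in the ranking, every metric lies in [0, 1], so the Recall@K and nDCG@K values above 1.0 in the table suggest the harness normalizes differently (for example, by counting duplicate hits). That is an inference, not something this report states.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-gain DCG of the top k, normalized by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["a", "b", "c", "d"]   # hypothetical result list
relevant = {"a", "c"}           # hypothetical ground truth
print(reciprocal_rank(ranked, relevant), recall_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 3), ndcg_at_k(ranked, relevant, 3))
```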
1. Cryptographic Operations (SHA-256)¶
| Operation | Data Size | Throughput | Latency | Performance Impact |
|---|---|---|---|---|
| Small Files | 1KB | 511 MB/s | 1.9 μs | Excellent for small files |
| Small Files | 4KB | 1.27 GB/s | 3.0 μs | Near memory bandwidth |
| Medium Files | 32KB | 2.35 GB/s | 13.0 μs | Optimal throughput |
| Large Files | 64KB | 2.47 GB/s | 24.7 μs | Peak performance |
| Bulk Data | 10MB | 2.66 GB/s | 3.67 ms | Sustained high throughput |
| Streaming | 10MB | 2.65 GB/s | 3.69 ms | Consistent with bulk |
Real-World Impact:
- Can hash a 1GB file in ~375ms
- Processes 40,000+ small files per second
- No bottleneck for network-speed ingestion (even 10GbE)
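The "1GB in ~375ms" figure follows directly from the bulk throughput in the table. The sketch below shows that arithmetic and a rough way to measure SHA-256 throughput locally with Python's standard hashlib. It is not the YAMS benchmark harness, and numbers measured this way will differ from the table because of interpreter overhead and hardware.

```python
import hashlib
import time


def seconds_to_hash(total_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Time needed to hash total_bytes at a sustained throughput."""
    return total_bytes / throughput_bytes_per_sec


def sha256_throughput_mb_s(size_bytes: int, repeats: int = 20) -> float:
    """Hash a zero-filled buffer repeatedly and return throughput in MB/s."""
    data = bytes(size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.sha256(data).digest()
    elapsed = time.perf_counter() - start
    return size_bytes * repeats / elapsed / 1e6


# 1 GB at the 2.66 GB/s bulk figure from the table.
print(f"1 GB at 2.66 GB/s: {seconds_to_hash(1e9, 2.66e9) * 1000:.0f} ms")
print(f"local 64KB throughput: {sha256_throughput_mb_s(64 * 1024):.0f} MB/s")
```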
2. Content Chunking (Rabin Fingerprinting)¶
| Operation | Data Size | Throughput | Latency | Real-World Impact |
|---|---|---|---|---|
| Small Files | 1MB | 186.7 MB/s | 5.36 ms | Chunks ~186 files/second |
| Large Files | 10MB | 183.8 MB/s | 54.4 ms | Chunks 18 files/second |
Real-World Impact:
- Processes 1GB in ~5.5 seconds for content-defined chunking
- Achieves 30-40% deduplication on typical development datasets
- 8KB average chunk size balances deduplication against per-chunk overhead
- Suitable for real-time chunking at gigabit ingestion speeds
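To illustrate why content-defined chunking helps deduplication, here is a simplified sketch. It uses a polynomial rolling hash in the Rabin-Karp style rather than true Rabin fingerprinting over GF(2), and it is not the YAMS chunker. The window size, minimum and maximum chunk sizes, and the 13-bit boundary mask (targeting roughly 8 KiB between candidate boundaries) are all assumed values.

```python
import hashlib
import random

WINDOW = 48                      # rolling window in bytes (assumed)
BASE = 257
MOD = (1 << 61) - 1
BOUNDARY_MASK = (1 << 13) - 1    # boundary when the low 13 bits are zero
MIN_CHUNK = 2 * 1024             # assumed lower bound on chunk size
MAX_CHUNK = 64 * 1024            # assumed upper bound on chunk size
POW_OUT = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of each content-defined chunk."""
    boundaries = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos_in_chunk = i - start
        if pos_in_chunk >= WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * POW_OUT) % MOD
        h = (h * BASE + byte) % MOD
        length = pos_in_chunk + 1
        at_cut_point = length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0
        if at_cut_point or length >= MAX_CHUNK:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries


def chunk_digests(data: bytes) -> set[str]:
    """SHA-256 digests of every chunk, as a content-addressed store would key them."""
    ends = chunk_boundaries(data)
    starts = [0] + ends[:-1]
    return {hashlib.sha256(data[s:e]).hexdigest() for s, e in zip(starts, ends)}


original = random.Random(0).randbytes(128 * 1024)
edited = b"inserted header" + original   # shift every byte by a small insertion
shared = chunk_digests(original) & chunk_digests(edited)
print(f"chunks shared after a prefix insertion: {len(shared)}")
```

Because boundaries depend on local content rather than absolute offsets, chunks after the insertion point typically realign, which is what lets unchanged regions deduplicate.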
3. Compression Performance (Zstandard)¶
Compression Benchmarks¶
| Data Size | Level | Throughput | Notes |
|---|---|---|---|
| 1KB | 1 | 397 MB/s | Optimal for small files |
| 1KB | 3 | 395 MB/s | Balanced performance |
| 1KB | 9 | 304 MB/s | High compression |
| 10KB | 1 | 3.52 GB/s | Excellent throughput |
| 10KB | 3 | 3.46 GB/s | Good balance |
| 10KB | 9 | 2.74 GB/s | Compressed efficiently |
| 100KB | 1 | 14.0 GB/s | Near memory bandwidth |
| 100KB | 3 | 13.5 GB/s | High performance |
| 100KB | 9 | 6.23 GB/s | Good compression |
| 1MB | 1 | 20.0 GB/s | Peak performance |
| 1MB | 3 | 19.8 GB/s | Optimal balance |
| 1MB | 9 | 4.36 GB/s | High compression ratio |
Decompression Benchmarks¶
| Data Size | Throughput | Latency |
|---|---|---|
| 1KB | 760 MB/s | 1.35 μs |
| 10KB | 5.91 GB/s | 1.73 μs |
| 100KB | 15.1 GB/s | 6.80 μs |
| 1MB | 21.0 GB/s | 50.0 μs |
Analysis: Compression throughput reaches 20.0 GB/s for 1MB blocks. Levels 1-3 offer the best balance of speed and compression ratio for production use.
Compression by Data Pattern¶
| Pattern | Throughput | Compression Ratio | Use Case |
|---|---|---|---|
| Zeros | 18.1 GB/s | Excellent | Sparse files |
| Text | 13.7 GB/s | Very Good | Documents |
| Binary | 18.0 GB/s | Good | Executables |
| Random | 8.9 GB/s | Minimal | Encrypted data |
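The qualitative ordering of compression ratios across patterns can be reproduced with a small experiment. The sketch below uses zlib from the Python standard library as a stand-in for Zstandard, so absolute ratios and speeds will not match the table; only the ordering (zeros compress best, random data barely compresses) carries over. The pattern generators are assumptions; the binary pattern is omitted because its generator is not described here.

```python
import os
import zlib


def make_pattern(kind: str, size: int) -> bytes:
    """Build a synthetic buffer; these generators are illustrative guesses."""
    if kind == "zeros":
        return bytes(size)
    if kind == "text":
        sentence = b"The quick brown fox jumps over the lazy dog. "
        return (sentence * (size // len(sentence) + 1))[:size]
    if kind == "random":
        return os.urandom(size)
    raise ValueError(f"unknown pattern: {kind}")


def compression_ratio(data: bytes, level: int = 3) -> float:
    """Original size divided by compressed size (zlib, not Zstandard)."""
    return len(data) / len(zlib.compress(data, level))


for kind in ("zeros", "text", "random"):
    ratio = compression_ratio(make_pattern(kind, 256 * 1024))
    print(f"{kind}: {ratio:.1f}:1")
```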
Compression Level Analysis¶
| Level | Speed | Compressed Size | Ratio | Recommendation |
|---|---|---|---|---|
| 1-2 | 20.1 GB/s | 191 bytes | 5.5k:1 | Optimal for speed |
| 3-5 | 19.8 GB/s | 190 bytes | 5.5k:1 | Balanced performance |
| 6-7 | 6.3 GB/s | 190 bytes | 5.5k:1 | Diminishing returns |
| 8-9 | 4.3 GB/s | 190 bytes | 5.5k:1 | High compression only |
4. Concurrent Compression Performance¶
| Threads | Throughput | Scalability | Items/Second |
|---|---|---|---|
| 1 | 1.60 GB/s | Baseline | 156K items/s |
| 2 | 5.62 GB/s | 3.5x | 549K items/s |
| 4 | 13.7 GB/s | 8.6x | 1.34M items/s |
| 8 | 20.9 GB/s | 13.1x | 2.04M items/s |
| 16 | 41.2 GB/s | 25.8x | 4.02M items/s |
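Analysis: Throughput increases at every step up to 16 threads, peaking at 41.2 GB/s. The reported 25.8x speedup exceeds the thread count, so the scaling factors are best read relative to the single-thread baseline (1.60 GB/s) rather than as ideal linear scaling.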
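The scalability column can be recomputed from the throughput column, together with a per-thread efficiency figure (speedup divided by thread count). This sketch uses only the numbers in the table above; the function name is illustrative.

```python
# (threads, throughput in GB/s) copied from the table above.
rows = [(1, 1.60), (2, 5.62), (4, 13.7), (8, 20.9), (16, 41.2)]


def scaling(rows: list[tuple[int, float]]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup vs. first row, speedup per thread)."""
    baseline = rows[0][1]
    return [(t, tp / baseline, tp / baseline / t) for t, tp in rows]


for threads, speedup, efficiency in scaling(rows):
    print(f"{threads:>2} threads: {speedup:5.2f}x speedup, {efficiency:.2f} per thread")
```

An efficiency above 1.0 at every multi-thread row means each added thread appears to contribute more than the single-thread run did, which usually points to the baseline measurement rather than to the parallel code.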
Key Performance Insights¶
Performance Strengths¶
- Compression Performance: Zstandard integration delivers 20+ GB/s throughput with excellent compression ratios
- Parallel Scaling: Throughput scales up to 16 threads, reaching a 25.8x speedup over the single-thread baseline
- Query Processing: Up to 3.4M items/second tokenization rate for complex queries
- Result Ranking: Partial sort algorithms provide 10x performance improvement for top-K operations
- Memory Efficiency: Stable performance maintained across varying data sizes
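The result-ranking point above refers to partial sorting for top-K selection. As a minimal illustration of the technique (not the YAMS ranking code, and not a reproduction of the 10x figure), the sketch below compares a full sort with heap-based selection from Python's standard heapq module; both return the same top-K results, but the heap version avoids ordering the items that will be discarded.

```python
import heapq
import random


def top_k_full_sort(scores: list[float], k: int) -> list[float]:
    """Sort everything, then keep the first k: O(n log n)."""
    return sorted(scores, reverse=True)[:k]


def top_k_partial(scores: list[float], k: int) -> list[float]:
    """Keep only the k largest using a bounded heap: O(n log k)."""
    return heapq.nlargest(k, scores)


rng = random.Random(42)
scores = [rng.random() for _ in range(100_000)]  # hypothetical relevance scores
assert top_k_full_sort(scores, 10) == top_k_partial(scores, 10)
print(top_k_partial(scores, 3))
```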
Areas for Investigation¶
- Vector Database Operations: 35 of 38 tests failing, requires architectural review
- PDF Extraction: 6 of 17 tests failing, text extraction pipeline needs improvement
- Metadata Repository: 4 of 22 tests failing, primarily FTS5 configuration issues
Recommended Production Configuration¶
- Compression Level: 3 (optimal speed-to-compression ratio)
- Thread Pool Size: 8-16 threads (throughput kept increasing through 16 threads)
- Memory Allocation: Match L2 cache size (4MB per core)
Benchmark Methodology¶
Current Issues & Action Items¶
- yams_search_benchmarks can hit "database is locked" depending on run concurrency and prior temporary state. Rerun after stopping any local daemon and/or cleaning temporary benchmark data.
- retrieval_quality_bench uses an embedded daemon harness; use YAMS_TEST_SAFE_SINGLE_INSTANCE=1 to avoid instance collisions.
Test Execution¶
For Release Build Benchmarks (recommended):
# Configure and build release version
cd build/release
conan install ../.. -s build_type=Release --build=missing
meson setup . -Dbuildtype=release -Dbuild-tests=true
meson compile
# Run benchmarks
./tests/benchmarks/yams_api_benchmarks --benchmark_format=json
./tests/benchmarks/yams_search_benchmarks --benchmark_format=json
./tests/benchmarks/tree_diff_benchmarks --benchmark_format=json
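The JSON output from the commands above can be post-processed into a latency table. The sketch below assumes the binaries emit the standard Google Benchmark JSON layout (a top-level "benchmarks" array whose entries carry "name", "real_time", and "time_unit"); whether every YAMS benchmark binary does so is an assumption. The sample document is hypothetical and only mirrors that layout.

```python
import json

# Conversion factors from Google Benchmark time units to milliseconds.
TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def latencies_ms(json_text: str) -> dict[str, float]:
    """Map each benchmark name to its real_time converted to milliseconds."""
    report = json.loads(json_text)
    return {
        entry["name"]: entry["real_time"] * TO_MS[entry["time_unit"]]
        for entry in report.get("benchmarks", [])
    }


# Hypothetical sample in the assumed layout; not actual YAMS output.
sample = json.dumps({
    "benchmarks": [
        {"name": "Example_Small", "real_time": 410000.0, "time_unit": "ns"},
        {"name": "Example_Medium", "real_time": 17.66, "time_unit": "ms"},
    ]
})

for name, ms in latencies_ms(sample).items():
    print(f"{name}: {ms:.2f} ms")
```

To use it on real output, redirect a benchmark run to a file and pass the file contents to latencies_ms.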
For Debug Build (current):
cd build/debug
./tests/benchmarks/yams_api_benchmarks
./tests/benchmarks/yams_search_benchmarks
Unit Tests (for test pass rate statistics):
cd build/debug
meson test --suite unit --print-errorlogs
Data Generation¶
- Synthetic Data: Generated test patterns (zeros, text, binary, random)
- Size Range: 1KB to 10MB for comprehensive coverage
- Iteration Count: Sufficient iterations for statistical significance
- Timing: CPU time measurements with Google Benchmark framework
Hardware Considerations¶
- Tests run on Apple Silicon with hardware SHA acceleration
- Results may vary on different architectures (x86_64, ARM64 without acceleration)
- Memory bandwidth and cache performance significantly impact results
Known Issues and Limitations¶
- Vector Database Module: Significant test failures (35/38) indicate architectural issues requiring investigation. Core search functionality unaffected.
- PDF Text Extraction: Partial test failures (6/17) suggest text extraction pipeline needs refinement for edge cases.
- Search Integration: Some search executor benchmarks fail due to missing database initialization in benchmark environment.
Conclusion¶
YAMS demonstrates strong performance characteristics across core components:
- Parallel processing scales to 16 threads, with throughput increasing at every step
- Query processing delivers high-throughput tokenization and ranking
- Memory efficiency maintained across varying workload sizes
- Overall architecture optimized for high-performance production deployment
The benchmark results validate YAMS as a high-performance content-addressable storage system. Test failures in non-critical modules (vector database, PDF extraction) require attention but do not impact core functionality.
For questions about benchmarks: See Paper PBI or search YAMS with tags: benchmark, performance, evaluation