# Newsletter
Product updates and occasional deep-dives.
## Subscribe
- SourceHut list: https://lists.sr.ht/~trvon/newsletter
- RSS (GitHub releases): https://github.com/trvon/yams/releases.atom
## What you’ll get
- Release notes and breaking changes
- Benchmark updates
- Design notes (search, storage, plugins)
## 2026-04-25: Simeon Is Now The Default Embedding Backend
Post: https://manta.black/posts/2026-04-25-simeon/
YAMS is switching its default retrieval embeddings backend to Simeon — training-free, SIMD/NEON, no model downloads. The short version: 7,000 docs/s encode throughput at the quality-matching tier vs MiniLM-L6’s 322 docs/s on the same CPU. That speed plus byte-identical determinism is why it’s the new default.
BEIR scifact — speed/quality Pareto (M-series CPU, single-threaded):
| Configuration | nDCG@10 | enc docs/s | qps |
|---|---|---|---|
| bm25_only | 0.633 | — | 31,800 |
| bm25_pool500_linear_alpha075_4096_384 | 0.638 | 7,000 | 5,800 |
| bm25_pool500_linear_alpha075_4096_768 | 0.638 | 3,700 | 4,900 |
| router_default_4096_768 | 0.654 | 3,600 | 1,500 |
| MiniLM-L6 (384-d float32, CPU) | 0.654 | 322 | 1,829 |
router_default_4096_768 matches MiniLM-L6 on nDCG@10 (0.654) and beats it on MRR@10 (0.626 vs 0.607). The router picks among Bm25Atire, Bm25SabSmooth, and CascadeLinearAlpha per query. For medium/long semantic queries it dispatches to PHSS (Per-Hit Score Smoothing) — fragment geometry scoring that cuts the similarity graph at its largest gap.
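The largest-gap cut that PHSS applies can be illustrated with a small sketch (a hypothetical helper, not the YAMS implementation, which scores fragment geometry over a similarity graph): sort similarity scores descending, find the biggest drop between adjacent scores, and keep only what lies above it.

```python
def largest_gap_cut(scores):
    """Keep the scores above the largest adjacent gap.

    Illustrative sketch of the largest-gap idea only; the real PHSS
    scorer in YAMS operates on fragment similarity graphs.
    """
    ordered = sorted(scores, reverse=True)
    if len(ordered) < 2:
        return ordered
    # Index of the largest drop between consecutive scores.
    cut = max(range(len(ordered) - 1),
              key=lambda i: ordered[i] - ordered[i + 1])
    return ordered[:cut + 1]

print(largest_gap_cut([0.91, 0.89, 0.88, 0.42, 0.40]))  # [0.91, 0.89, 0.88]
```

The LargestGapApprox variant used in the numbers below trades an exhaustive search for an approximation of this cut point, which is where the ~2× query-throughput win comes from.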
PHSS cross-corpus (LargestGapApprox, richcov builder t=8):
| Corpus | BM25 | PHSS | Δ |
|---|---|---|---|
| scifact | 0.6188 | 0.6188 | +0.000 |
| NFCorpus | 0.2521 | 0.2544 | +0.002 |
| FiQA | 0.2053 | 0.2089 | +0.004 |
BM25 parity on scifact, small consistent lifts on the other two — at ~2× the query throughput of the heavy LargestGap variant.
Transfer caveat: scifact parity with MiniLM does not transfer. On FiQA the scifact-tuned router lands at 0.202 vs MiniLM’s 0.359. Simeon is a lexical/topical backend that composes well with BM25; it is not a general learned-embedding replacement.
Fusion note: per-component nDCG instrumentation found cases where vector-only ordering was higher than the fused result — fusion was destroying signal, not the embeddings. Linear-α z-scored fusion is the only strategy that consistently beats BM25 alone on top-10 ranking.
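A sketch of what linear-α z-scored fusion means here (function and parameter names are illustrative; α = 0.75 is read off the `alpha075` config labels above): z-normalize each component's scores over the shared candidate pool, then blend them linearly.

```python
import statistics

def zscore(xs):
    """Z-normalize a list of scores; guard against zero variance."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs) or 1.0
    return [(x - mu) / sigma for x in xs]

def fuse_linear_alpha(bm25_scores, vec_scores, alpha=0.75):
    """Blend z-scored components: alpha * bm25 + (1 - alpha) * vector."""
    zb, zv = zscore(bm25_scores), zscore(vec_scores)
    return [alpha * b + (1 - alpha) * v for b, v in zip(zb, zv)]

# Hypothetical candidate pool: raw BM25 and cosine scores per document.
fused = fuse_linear_alpha([12.1, 9.8, 3.2], [0.71, 0.66, 0.34])
```

Z-scoring first puts BM25's unbounded scores and the vector backend's cosine similarities on a comparable scale, which is what lets a fixed α avoid the signal destruction the instrumentation flagged.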
Embedding backend A/B latency on the daemon path:

| Requested backend | Observed provider | Dim | p50 | p95 | Result |
|---|---|---|---|---|---|
| simeon | plugin:Simeon / Simeon | 1024 | 11.075 ms | 11.094 ms | pass |
| onnxruntime | plugin:ABIModelProvider / all-MiniLM-L6-v2 | 384 | 10.669 ms | 21.259 ms | pass |
The same benchmark now sweeps 100- and 1,000-document batches (`--benchmark_filter=EmbeddingBackendAB_Generate`, 3 iterations per row). Current daemon-path throughput:
| Requested backend | Docs/batch | Avg latency | p50 throughput | p95 throughput | Avg throughput |
|---|---|---|---|---|---|
| simeon | 100 | 10.887 ms | 9,032 docs/s | 8,803 docs/s | 9,185 docs/s |
| onnxruntime | 100 | 230.831 ms | 463 docs/s | 381 docs/s | 433 docs/s |
| simeon | 1,000 | 61.368 ms | 30,867 docs/s | 8,363 docs/s | 16,295 docs/s |
| onnxruntime | 1,000 | 2,579.397 ms | 386 docs/s | 369 docs/s | 388 docs/s |
Links:
- Simeon repo: https://github.com/trvon/simeon
- Benchmarks and research notes: third_party/simeon/docs/research/