Changelog Archive: v0.8.x Series¶
[v0.8.3] - Unreleased¶
Added¶
- Search/grep CLI guidance for related results and improved graph suggestions in search.
- Graph query efficiency improvements to support expanded search suggestions.
Fixed¶
- Release-please version pinned back to the current release.
- Reranker timeout handling now uses a retry guard, so transient failures no longer surface as errors.
- MCP large-graph responses are more reliable, with improved relational graph migrations and repair-service health/auto-repair logic.
- Repair now fixes graph data issues and improves response limit handling in MCP.
- IPC retry logic hardened for better resilience under transient failures.
- MCP download retrievability contract: `download` no longer returns a non-retrievable hash when post-index ingest fails. On ingest failure, the response now sets `indexed=false` and clears `hash` to prevent follow-up `get`/`cat` calls from hammering the daemon CAS with repeated "File not found" lookups.
- Stuck-doc recovery enqueue: Fixed repair stuck-document recovery enqueueing `PostIngestTask`s to an unused `InternalEventBus` channel. Recovery now enqueues to `post_ingest` (consumed by `PostIngestQueue`), so re-extraction actually runs.
- Doctor FTS5 reindex SQLite TOOBIG: Added a best-effort truncation retry for oversized extracted text; the output now reports `truncated=<n>`.
- Embedding plugin output-shape compatibility: ONNX provider output handling now accepts provider-padded batch dimensions (`output_B >= requested_B`) and uses the requested prefix rows, preventing false shape-mismatch failures under MIGraphX/accelerated providers.
- Embedding dim hint precedence: The ONNX plugin no longer lets stale config hints override graph-inferred embedding dimensions when model metadata is available, reducing startup/runtime mismatches.
- GPU escape hatch for stability triage: Added ONNX env controls to force CPU execution (`YAMS_ONNX_FORCE_CPU=1` / `YAMS_ONNX_DISABLE_GPU=1`) for benchmark/repro stability when GPU EPs are unstable.
- Cross-process MIGraphX execution guard: ONNX MIGraphX now uses a process-level device lock (default: enabled) so multiple daemon processes do not concurrently execute on the same GPU and trigger ROCm/HSA instability. Configure with `YAMS_MIGRAPHX_PROCESS_LOCK` (`0` to disable) and optional `YAMS_MIGRAPHX_LOCK_PATH`.
- Gradient limiter clamp safety: Prevented invalid limiter bounds (`minLimit > maxLimit`) in post-ingest stage limiter setup when a stage cap resolves to unset/zero, fixing debug-build aborts from `std::clamp` assertions during title stage completion.
- Snapshot propagation consistency across add paths: `yams add` now propagates one effective snapshot ID per invocation across the file, stdin, and directory ingest paths (daemon + local fallback), and preserves the snapshot label/id when daemon directory ingest is translated to indexing requests.
- Non-directory snapshot persistence parity: File/stdin document stores now upsert first-class `tree_snapshots` records (with label/collection metadata), so snapshot timelines no longer depend only on per-document tags.
- ONNX dynamic tensor padding: Pads input tensors to the aligned actual token length instead of the model's `max_seq_len` (512). Short texts now skip hundreds of zero-padded positions, yielding a ~70-85x throughput improvement for short inputs (93 texts/sec vs 1.6 texts/sec on CPU with `nomic-embed-text-v1.5`). Enabled by default; disable with `YAMS_ONNX_DYNAMIC_PADDING=0`. Sequence lengths are 8-byte aligned to balance SIMD efficiency with padding reduction.
- ROCm / MIGraphX compiled-model caching: Enables save/load of compiled MIGraphX artifacts to avoid paying multi-minute `compile_program` costs when sessions are recreated (e.g., after scale-down/eviction). Defaults to caching hashed `*.mxr` artifacts under the model directory. Configure with `YAMS_MIGRAPHX_COMPILED_PATH` (a directory; if a `.mxr` path is provided, its parent directory is used), `YAMS_MIGRAPHX_SAVE_COMPILED`, and `YAMS_MIGRAPHX_LOAD_COMPILED`.
- ONNX batch warmup and auto-tuning: `warmupModel()` runs a 4-text batch embedding immediately after model load, pre-warming the ONNX Runtime session and measuring throughput. When `YAMS_EMBED_DOC_CAP` is unset, an auto-tuning sweep (batch sizes 4→8→16→32→64) selects the batch size with peak texts/sec as the default cap via `TuneAdvisor::setEmbedDocCap()`. Disable auto-tuning with `YAMS_EMBED_AUTOTUNE=0`.
- ONNX session reuse: Pool-managed sessions now show a 25x latency reduction on warm cycles (259ms cold → 10ms warm), confirming that session creation cost is amortized after the first inference.
- Hardware-adaptive pool sizing: `TuneAdvisor::onnxSessionsPerModel()` dynamically sizes the session pool based on available CPU cores and GPU state, preventing over-subscription on constrained hardware.
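Taken together, the ONNX/MIGraphX environment controls introduced above form a triage toolkit when GPU execution is unstable. A minimal launch-environment sketch (variable names come from the entries above; the values and lock path are illustrative, and the restart step depends on how you run the daemon):

```shell
# Stability-triage environment for ONNX/MIGraphX (names from the entries above).
export YAMS_ONNX_FORCE_CPU=1            # force CPU execution when GPU EPs misbehave
export YAMS_ONNX_DYNAMIC_PADDING=0      # revert to fixed max_seq_len padding
export YAMS_EMBED_AUTOTUNE=0            # skip the batch-size auto-tuning sweep
export YAMS_MIGRAPHX_PROCESS_LOCK=1     # cross-process GPU device lock (default: on)
export YAMS_MIGRAPHX_LOCK_PATH=/tmp/yams-migraphx.lock  # illustrative lock location
# Restart the daemon afterwards so the new environment takes effect.
```

Once a stable CPU-only baseline reproduces cleanly, re-enable the GPU path one variable at a time to isolate the offending setting.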
Diagnostics¶
- ONNX ROCm diagnostic output: GPU diagnostic now reports cold vs warm embedding timing to distinguish first-run compilation from steady-state inference.
Changed¶
- Balanced/base daemon scaling defaults: Updated no-env defaults for multi-client stability and fairness: `YAMS_IO_THREADS` default 2 → 6, `YAMS_CONN_SLOTS_MIN` default 64 → 256, `YAMS_SERVER_MAX_INFLIGHT` default 2048 → 256, `YAMS_SERVER_QUEUE_BYTES_CAP` default 256MiB → 128MiB, `YAMS_IPC_TIMEOUT_MS` default 5000 → 15000, `YAMS_MAX_IDLE_TIMEOUTS` default 3 → 12; the server writer budget fallback now defaults to 8MiB per turn.
- MCP graph tool consolidation: Unified KG ingest under `graph` via `action="ingest"` and removed the separate `kg_ingest` tool exposure to keep the MCP tool surface smaller and cleaner.
- Benchmark reporting terminology: Normalized benchmark docs to use neutral retrieval-result wording (instead of subjective "significant" labels), and explicitly tagged the current baseline as the SciFact benchmark result.
- Post-ingest embedding fan-out (phased rollout):
  - Phase 1: Added a centralized embedding selection policy (strategy + mode + caps/boosts) via `ConfigResolver` with config/env precedence.
  - Phase 2: Moved embed preparation upstream so post-ingest can queue prepared embed payloads; the embedding service consumes a prepared-doc fast path.
  - Phase 3: Added an extraction utility that returns both extracted text and optional content bytes, enabling downstream stage reuse and reducing duplicate content-store reads.
  - Phase 4: Added a selection strategy toggle (`ranked` vs `intro_headings`) and queue observability counters for prepared-doc/chunk vs hash-only embed dispatch.
- Embedding selection defaults codified: default strategy `ranked`, mode `budgeted`, `max_chunks_per_doc=8`, `max_chars_per_doc=24000`, `heading_boost=1.25`, `intro_boost=0.75`.
- Embedding chunking defaults codified: default chunk strategy `paragraph`; the default config favors sentence-boundary flexibility in the embedding pipeline (`preserve_sentences=false`, `use_token_count=false` unless overridden).
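The scaling-default changes above can be rolled back per deployment for comparison runs by pinning the previous values explicitly. A sketch (variable names and old defaults are taken from the entry above):

```shell
# Pin the pre-0.8.3 daemon scaling defaults (values from the changelog entry).
export YAMS_IO_THREADS=2
export YAMS_CONN_SLOTS_MIN=64
export YAMS_SERVER_MAX_INFLIGHT=2048
export YAMS_IPC_TIMEOUT_MS=5000
export YAMS_MAX_IDLE_TIMEOUTS=3
```

Explicit env values always win over the new built-in defaults, so this is a safe way to A/B the two profiles under the same workload.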
ONNX Embedding Benchmarks (CPU-only, macOS M3 Max, nomic-embed-text-v1.5 768-dim)¶
| Benchmark | Metric | Result |
|---|---|---|
| Batch size sweep (64 texts) | Peak throughput | 55.9 texts/sec @ batch=16 |
| Dynamic padding ON vs OFF (short texts) | Speedup | ~70-85x (111 vs 1.5 texts/sec) |
| Session reuse (5 cycles) | Cold vs warm | 259ms → 10ms (25.9x) |
| Concurrent inference (4 threads) | Throughput | 83.7 texts/sec |
[v0.8.2] - February 2, 2026¶
Fixed¶
- MCP stdio: Improved OpenCode compatibility during handshake/tool discovery (initialize capabilities schema, strict JSON-RPC batch responses, more robust NDJSON parsing).
- MCP tools: Fixed the `mcp.echo` tool `inputSchema` shape so tool schemas validate correctly.
- FTS5 orphan detection: Fixed a bug where the orphan scan used synthetic hashes (e.g., `orphan_id_12345`) that never matched actual documents. Orphans were detected but never removed because the removal query used non-existent hashes. The scan now passes document rowids directly via `Fts5Job.ids` and calls `removeFromIndex(docId)`, which deletes by FTS5 rowid.
Added¶
- MCP grep tag filtering: The `grep` tool now accepts `tags` and `match_all_tags` parameters, aligning MCP grep with the CLI `yams grep --tags` capabilities.
- Blackboard search tools: Three new OpenCode blackboard tools for discovering content:
  - `bb_search_tasks`: semantic search for tasks (mirrors `bb_search_findings`).
  - `bb_search`: unified cross-entity semantic search returning both findings and tasks.
  - `bb_grep`: regex/pattern search across all blackboard content with optional entity filtering.
Performance¶
- Stdin routed through daemon: `yams add -` (stdin) now sends content directly to the daemon via inline content support instead of falling back to local service initialization (~40s startup penalty eliminated). The Blackboard plugin also drops the unnecessary `--sync` flag for a further latency reduction.
- Adaptive sync polling: `--sync` extraction polling now uses exponential backoff (5ms → 100ms) instead of fixed 100ms intervals. Small documents that extract via the fast track are detected on the first poll (~5ms) instead of waiting a full 100ms cycle.
- Unified async add pipeline with parallel batching: `yams add` with multiple files now processes up to 4 files concurrently via `addBatch()` instead of sequentially. A single shared `DaemonClient` is reused across all add operations (CLI and MCP), eliminating per-file client construction overhead.
- `addViaDaemonAsync` coroutine: New async entry point for all add operations. Replaces the promise/future-per-attempt pattern with a direct `co_await`, reducing overhead. MCP `handleStoreDocument`, `handleAddDirectory`, and download post-index all route through this single path.
- Batch FTS5 orphan removal: New `removeFromIndexByHashBatch()` wraps all per-hash SELECT+DELETE operations in a single transaction with cached prepared statements. Replaces N individual autocommit transactions with 1 transaction for N hashes. Eliminates prolonged DB lock contention during orphan scans (~29k orphans previously caused ~58k individual transactions), which blocked CLI requests (`yams stats`, `yams list`) and caused timeouts/segfaults.
- `yams update` skips CLI-side name resolution on the daemon path: `yams update --name` no longer resolves names to hashes CLI-side before sending to the daemon. Eliminates the ~40s `ensureStorageInitialized()` penalty and 1-4 redundant daemon round-trips. Name resolution now occurs only in the local fallback path.
[v0.8.1] - January 31, 2026¶
Added¶
- `yams list --metadata-values` for showing unique metadata values with counts (useful for PBI discovery).
- Post-ingest file/directory tracking: New metrics for files and directories added/processed through the ingestion pipeline (`filesAdded`, `directoriesAdded`, `filesProcessed`, `directoriesProcessed`).
- OpenCode Blackboard Plugin (`external/opencode-blackboard/`): Multi-agent blackboard architecture plugin for OpenCode using YAMS as shared memory. Enables agent-to-agent communication through structured findings, task coordination, and context grouping. Requires YAMS v0.8.1+.
- Per-stage queue depth exposure: Real-time queue depth metrics for the KG, symbol, entity, and title extraction stages, accessible via daemon status.
- Progress bars in CLI: Visual progress bars for queue utilization, worker pool, memory pressure, and pipeline stages in the `yams daemon status` and `yams status` commands.
- Unified status UI: The `yams status` daemon-connected display now uses consistent section headers, row rendering, and status indicators matching `yams daemon status`.
- Unique PBI selection guidance in the AGENTS workflow (metadata search + list values).
- Data-dir single-instance enforcement: Prevents multiple daemons from sharing the same data directory via a flock-based `.yams-lock` file. A newer daemon requests shutdown of the existing daemon and takes over, enabling seamless upgrades/restarts.
Performance¶
- Reduced status tick interval: Governor metrics now update every 50ms (was 250ms) for more responsive CLI status output.
- Batch snapshot info API: New `batchGetSnapshotInfo()` method eliminates the N+1 query pattern in `yams list --snapshots`. Reduces 3N queries to 1 query for N unknown snapshots.
- Enumeration query caching: `getSnapshots()`, `getSnapshotLabels()`, `getCollections()`, and `getAllTags()` now use a 60-second TTL cache with signal-based invalidation. Repeated calls return cached results, reducing database scans.
- CPU-aware throttling: ResourceGovernor now monitors CPU usage alongside memory pressure. Admission control rejects new work when CPU exceeds a threshold (default 70%, configurable via `YAMS_CPU_HIGH_PCT`). Prevents CPU saturation during large batch adds.
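The CPU admission threshold above is tunable per deployment. A sketch, assuming only the `YAMS_CPU_HIGH_PCT` variable named in the entry (the value shown is an illustrative override, not a recommendation):

```shell
# Allow batch adds to use more CPU before admission control rejects new work.
# Default threshold is 70 (percent); 85 is an illustrative override.
export YAMS_CPU_HIGH_PCT=85
```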
Fixed¶
- Post-ingest tuning reconciles per-stage concurrency targets to the overall budget.
- Post-ingest stage throttling now respects pause states and stage availability when computing TuneAdvisor budgets.
- Post-ingest pollers back off when a stage is paused or has a zero concurrency cap to avoid runaway CPU.
- Added a post-ingest stage snapshot log (active/paused/limits) at startup for easier tuning verification.
- Grep integration tests create the ingest directory before daemon startup to avoid missing queue paths.
- Post-ingest jobs reuse content bytes for KG/symbol/entity stages to avoid repeated content loads.
- Post-ingest KG stage no longer triggers duplicate symbol extraction when the symbol pipeline is active.
- External entity extraction reuses a single base64 payload per document across batches.
- CLI snippet formatting is now shared between search and grep for consistent output.
- `yams list` now uses the shared snippet formatter for previews.
- `yams grep` honors `--ext` filters, accepts `--cwd` with an optional path, and treats `**/*` patterns as matching direct children.
- `yams list --metadata-values` now aggregates counts in the database and respects list filters, avoiding large client-side scans.
- Added metadata aggregation indexes to speed up key/value count queries.
Documentation¶
- Updated YAMS skill guide with unique PBI discovery and tagged search examples.
[v0.8.0] - January 24, 2026¶
Breaking¶
- Vector database migration required: The sqlite-vec-cpp HNSW rewrite invalidates existing vector indices. After upgrading, run:
  `yams doctor repair --embeddings  # Regenerate all embeddings`
  Without this, search will fall back to FTS5-only (no semantic search).
- sqlite-vec-cpp submodule: HNSW API changes and third-party library removal (soft deletion, multi-threading, fp16 quantization, incremental persistence, pre-filtering).
Added¶
- MCP `graph` tool for knowledge graph queries (parity with CLI `yams graph`).
- Graph: snapshot-scoped version nodes, `contains` edges for file→symbol, `--dead-code-report`.
- Graph prune policy (`daemon.graph_prune`) to keep latest snapshot versions.
- Download CLI: progress streaming (human/json) via DownloadService callbacks.
- Symbol-aware search ranking: definitions rank higher than usages (`YAMS_SYMBOL_WEIGHT`).
- Zig language support: functions, structs, enums, unions, fields, imports, calls.
- ColBERT MaxSim reranking when the preferred model is a ColBERT variant.
- Added support for the mxbai-edge-colbert-v0-17m model (embedding + MaxSim reranking, max-pooled and L2-normalized embeddings).
- Vector DB auto-rebuild on embedding dimension mismatch (`daemon.auto_rebuild_on_dim_mismatch`).
- Init now prompts for a tuning profile (efficient/balanced/aggressive) and writes `tuning.profile`.
- Search config supports a dedicated reranker model (`search.reranker_model`) with CLI helpers (`yams config search reranker`).
- WEIGHTED_MAX fusion strategy: Takes the maximum weighted score per document instead of the sum. Prevents "hub" documents from dominating via a multi-component consensus boost. Used by the SCIENTIFIC tuning profile for benchmark corpora.
Performance¶
IPC & Daemon¶
- IPC latency reduced from ~8-28ms to ~2-5ms (connection pooling, async timers, cached config).
- Daemon startup throttling: PathTreeRepair via RepairCoordinator, Fts5Job startup delay (2s), reduced batch sizes (1000→100).
Ingestion & Storage¶
- Post-ingest throughput: dedicated worker pool, adaptive backoff, batched directory ingests.
- In-memory chunking for `storeBytes()` avoids temp-file I/O for large documents.
Database & Metadata¶
- KGWriteQueue: Batched, serialized writes to KnowledgeGraphStore via async writer coroutine. Eliminates “database is locked” errors during high-throughput ingestion by queueing KG operations (nodes, edges, aliases, doc entities) and committing in batches. Both symbol extraction and NL entity extraction now use deferred batching with nodeKey→nodeId resolution at commit time.
- Prepared-statement caching for SQLite queries reduces SQL compilation overhead on repeated operations. Cached methods: `setMetadata`, `setMetadataBatch`, `getMetadata`, `getAllMetadata`, `getContent`, `getDocument`, `getDocumentByHash`, `updateDocument`, `deleteDocument`, `insertContent`.
- `setMetadataBatch()` API for bulk metadata updates: 4x faster than individual calls.
Search & Retrieval¶
- Batch vector/KG lookups, flat_map for cache-friendly access, branch hints, memory pre-allocation.
- Concept boost post-processing now caps scan count and uses SIMD-accelerated matching with CPU feature auto-detect (fallback to scalar), reducing latency for large result sets.
Throughput Benchmarks (Debug, macOS M3 Max)¶
| Benchmark | Oct 2025 | Jan 2026 | Change |
|---|---|---|---|
| Ingestion_SmallDocument | 2,771 ops/s | 2,821 ops/s | ~same |
| Ingestion_MediumDocument | 56 ops/s | 57 ops/s | ~same |
| Ingestion_E2E (100 docs) | - | 9.2 docs/s | new (KGWriteQueue) |
| Metadata_SingleUpdate | 10,537 ops/s | 17,794 ops/s | +69% |
| Metadata_BulkUpdate(500) | 7,823 ops/s | 50,473 ops/s | +6.5x |
| IPC_StreamingFramer | - | 3,732 ops/s | new |
| IPC_UnaryFramer | - | 10,088 ops/s | new |
Experimental¶
- libSQL backend: Default database backend with concurrent write support via MVCC.
Enables up to 4x write throughput during heavy indexing. Configure with meson option
`database-backend` (choices: `libsql` [default], `sqlite`).
Installation: If a Rust toolchain is available, libsql builds automatically from source via the meson subproject; otherwise the build falls back to SQLite.
# Ensure Rust is installed (for automatic build)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Or disable libsql to use standard SQLite
meson configure -Ddatabase-backend=sqlite
Documentation¶
- Embedding model recommendations: Added a model comparison table to the README. 384-dim models (e.g., `all-MiniLM-L6-v2`) are recommended for the best speed/quality tradeoff.
Changed¶
- Reranking: Score-based reranking is now the default. It uses the geometric mean of text and vector scores to boost documents with multi-component consensus; no external model is needed. Cross-encoder model reranking is opt-in via the `enableModelReranking` config option.
- Tuning profile multipliers updated: efficient 0.5x, balanced 1.0x, aggressive 1.5x.
Fixed¶
- FTS5 natural language queries: OR fallback now correctly triggers when AND query returns zero results. Previously, long queries like scientific abstracts would fail because the AND query returned nothing and the OR fallback condition was never met.
- ONNX multi-threading on Linux/macOS: Removed forced single-threaded execution that was only needed for Windows. Non-Windows platforms now use `intra_op_threads=4` by default, improving inference speed for 768-dim and larger models by 2-4x.
- Hybrid search fusion: Falls back to a non-empty `filePath` when vector results have empty paths (hash→path lookup failures no longer cause result mismatches).
- TSAN race in `daemon_search()`: Pass `DaemonSearchOptions` by value to avoid a stack reference escaping to the coroutine thread.
- TSAN race in `handle_streaming_request()`: Check `connection_closing_` before `socket.is_open()` to avoid a race with `handle_connection` closing the socket.
- Compression stats now persist across daemon restarts (`Storage Logical Bytes` vs `CAS Unique Raw Bytes` now show correct values).
- CLI rejects ambiguous subcommands (e.g., `yams search graph` → use `--query`).
- `--paths-only` search now returns results correctly.
- `yams watch` waits for daemon readiness and always ignores `.git` contents.
- Expanded prune patterns for build artifacts and language caches.
- Fixed ONNX model loading deadlock on Windows (single-flight pattern, recursive mutex).
- Streaming: 30s chunk timeout, backpressure stops producer on queue overflow.
- `yams add` returns immediately; the hash is computed asynchronously during ingestion.
- Replaced experimental Boost.Asio APIs with stable `async_initiate` (fixes TSAN races).
- File history now records snapshot metadata for single-file adds, not just directory snapshots.