Embeddings

Embeddings & FTS5: Auto Mode and Repair

Overview - YAMS can automatically generate embeddings and index document content on add. - Embeddings are stored in the vector database; text content is indexed in SQLite FTS5. - Both paths are best‑effort and non‑blocking to keep ingestion fast.

Auto Embeddings on Add - Toggle via config: set in ~/.config/yams/config.toml. - Section [embeddings] keys: - auto_on_add = "true|false" — when true, new adds queue background embedding. - preferred_model and model_path optionally control model selection/loading. - Behavior: - yams add and daemon directory indexing call embeddings asynchronously when enabled. - Extraction + FTS5 indexing still run immediately during add (for supported formats).

FTS5 Indexing - When available in the SQLite build, YAMS indexes extracted text into documents_fts. - DocumentService indexes FTS5 during add; directory add uses the same storage path. - Searching uses hybrid keyword (FTS5) + semantic (vector) ranking when configured.

Repair and Rebuild - CLI supports targeted repair flows: - yams repair --embeddings — generates missing embeddings for stored documents. - yams repair --fts5 — rebuilds FTS5 entries (delete/insert) using robust extraction. - Daemon RepairCoordinator also performs a best‑effort FTS5 reindex for documents it fixes embeddings for.

Notes - Embedding and FTS5 operations degrade gracefully: failures are logged and skipped. - Batch sizes and retries are conservative to avoid blocking foreground operations.

Embedding Dimension Source of Truth - Single key: set embeddings.embedding_dim in ~/.config/yams/config.toml. - Runtime precedence: config > env (YAMS_EMBED_DIM) > generator > heuristic. - The daemon derives vector DB schema and in-memory index dimensions from this single value to prevent drift.