YAMS Monitoring (Operations)¶
Monitor YAMS in production with the checks below.
TL;DR¶
- Service health:
- systemd:
systemctl is-active yams && echo OK || echo FAIL
- Logs (last 100):
journalctl -u yams -n 100 --no-pager
- Storage health:
- Stats:
yams stats --json | jq
- Capacity:
du -sh "${YAMS_STORAGE:-$HOME/.local/share/yams}"
- TCP (WebSocket transport):
nc -z 127.0.0.1 8080 || exit 1
- Alert quickly on:
- Service down
- Disk usage > 85% on $YAMS_STORAGE
- Error spikes in logs
- Stats report errors or unhealthy state
What to monitor¶
1) Process and service - yams process present and running - systemd unit active and restarting rarely
2) Storage and capacity
- Disk space under $YAMS_STORAGE
- File descriptor limits (if you expect many connections)
- yams stats outputs “healthy”
3) Logs and errors - Rate of ERROR/WARN lines - Repeated migrations or init attempts (misconfiguration) - Transport failures (WebSocket binds/connects)
4) Query/search health (optional) - yams search timing in app logs (if you instrument upstream) - Rate of empty/errored searches in your integration
Quick checks¶
# Service
systemctl is-active yams && echo "yams: active" || (echo "yams: down" && exit 1)
# Logs (errors in the last 5 minutes)
journalctl -u yams --since "5 min ago" --no-pager | grep -E "ERROR|Error|Failed" || true
# Storage capacity
du -sh "${YAMS_STORAGE:-$HOME/.local/share/yams}"
# Basic stats JSON sanity
yams stats --json | jq '{ok: (has("error")|not), total: .total_documents, size_bytes: .total_size_bytes}'
# TCP reachability (WebSocket)
nc -z 127.0.0.1 8080 || echo "port 8080 not reachable"
Prometheus exporter (textfile)¶
If you run node_exporter with the textfile collector, translate yams stats --json
to a .prom
file.
#!/usr/bin/env bash
# /usr/local/bin/yams_stats_exporter.sh
# Requires: jq, yams, and node_exporter textfile collector enabled
set -euo pipefail
OUT_DIR="/var/lib/node_exporter/textfile_collector"
OUT_FILE="${OUT_DIR}/yams.prom"
TMP=$(mktemp)
trap 'rm -f "$TMP"' EXIT
STATS_JSON=$(yams stats --json || echo '{}')
# Extract fields (adjust keys to match your yams stats payload)
total_docs=$(echo "$STATS_JSON" | jq -r '.total_documents // 0')
total_bytes=$(echo "$STATS_JSON" | jq -r '.total_size_bytes // 0')
unique_blocks=$(echo "$STATS_JSON" | jq -r '.unique_blocks // 0')
compression_ratio=$(echo "$STATS_JSON" | jq -r '.compression_ratio // 0')
health=$(echo "$STATS_JSON" | jq -r 'if .error then 0 else 1 end')
cat > "$TMP" <<EOF
# HELP yams_total_documents Total number of documents known to YAMS.
# TYPE yams_total_documents gauge
yams_total_documents $total_docs
# HELP yams_total_size_bytes Total size of stored data in bytes.
# TYPE yams_total_size_bytes gauge
yams_total_size_bytes $total_bytes
# HELP yams_unique_blocks Number of unique storage blocks.
# TYPE yams_unique_blocks gauge
yams_unique_blocks $unique_blocks
# HELP yams_compression_ratio Effective compression ratio (>= 1.0 means compressed).
# TYPE yams_compression_ratio gauge
yams_compression_ratio $compression_ratio
# HELP yams_health Overall YAMS health (1=ok, 0=error).
# TYPE yams_health gauge
yams_health $health
EOF
mv "$TMP" "$OUT_FILE"
Systemd timer to run every minute:
# /etc/systemd/system/yams-stats-exporter.service
[Unit]
Description=YAMS Stats Exporter
[Service]
Type=oneshot
Environment=YAMS_STORAGE=/var/lib/yams
ExecStart=/usr/local/bin/yams_stats_exporter.sh
User=node-exporter
Group=node-exporter
# /etc/systemd/system/yams-stats-exporter.timer
[Unit]
Description=Run YAMS Stats Exporter every minute
[Timer]
OnCalendar=*:0/1
AccuracySec=10s
Persistent=true
[Install]
WantedBy=timers.target
Enable:
systemctl daemon-reload
systemctl enable --now yams-stats-exporter.timer
Suggested alerts¶
- Service down:
- yams_active == 0 for 2m
- Health flag:
- yams_health == 0 for 1m
- Disk capacity:
- node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.15 on $YAMS_STORAGE mount
- Error rate:
- “ERROR|Failed” lines > threshold in last 5m (via log pipeline or Loki)
- Exporter stale:
time() - yams_stats_exporter_timestamp_seconds > 180
Tune thresholds to your environment.
Logs and levels¶
- Default logging goes to stdout/stderr (systemd → journald).
- Tail:
journalctl -u yams -f -o short-iso
- Grep errors:
journalctl -u yams --since "1 hour ago" | grep -E "ERROR|Error|Failed"
- Consider a log shipper (journald → syslog, Loki, etc.) for centralization.
Health checks (app + infra)¶
- App CLI:
yams stats
non‑zero exit or missing fields → unhealthy- TCP:
nc -z 127.0.0.1 8080
- Systemd:
- Restart count > N per hour → investigate flapping
Dashboards (examples)¶
- Storage
- Documents over time, total bytes, compression ratio, unique blocks
- Health
- yams_health, restart count, error rate
- Capacity
- Filesystem usage for
$YAMS_STORAGE
- Queries (if you instrument upstream)
- Search latency p50/p95, result counts
Troubleshooting¶
- Service won’t start:
journalctl -u yams -n 200 --no-pager
- Verify
Environment=YAMS_STORAGE=...
and directory permissions - Stats empty or failing:
- Run
yams stats --json
manually with the same user/env as the service - Port conflicts:
ss -lntp | grep 8080
(Linux); adjust--port
or stop the conflicting service- Disk full:
- Expand or rotate; keep >15% free headroom
See also¶
- Deployment:
./deployment.md
- Backup & Recovery:
./backup.md
(coming soon) - Troubleshooting:
./troubleshooting.md
(coming soon) - Performance Tuning:
./performance.md
(coming soon) - CLI Reference:
../user_guide/cli.md
- Admin Configuration:
../admin/configuration.md