# Memory Vector System **Status:** ✅ Supermonkey-Powered **Created:** 2026-03-02 **Replaces:** Supermemory cloud embedding API --- ## Overview Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline. ## Architecture ### File-Based Pipeline ``` Memory Files (markdown) ↓ memory_embedding_worker.py ↓ Ollama (nomic-embed-text) → 768-dim vectors ↓ SQLite + sqlite-vector extension ↓ Cosine similarity search ``` ### Real-Time Session Pipeline (NEW) ``` Discord/Chat Messages ↓ OpenClaw Session Transcript (.jsonl) ↓ session_monitor.py (cron every 2 min) ↓ Count messages → At 15: summarize ↓ Ollama (nomic-embed-text) ↓ SQLite + sqlite-vector ``` **Innovation:** We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking! ## Components ### Core Module **File:** `memory_vector.py` - `MemoryVectorDB` class — database wrapper - `store_memory()` — save embedding - `search_memories()` — semantic search - `setup_memory_vectors()` — one-time init ### Worker Scripts **File:** `tools/memory_embedding_worker.py` - Daily/batch processing for memory.md files - Processes section-by-section - Called by cron at 3 AM **File:** `tools/session_monitor.py` ⭐ NEW - Reads OpenClaw session transcripts live - Tracks message counts automatically - Creates snapshots every 15 messages - Cron: every 2 minutes - Database: `session_tracking` table **File:** `tools/session_snapshotter.py` - Manual session capture (legacy) - Use session_monitor.py for auto-tracking **File:** `tools/search_memories.py` - CLI tool for manual searches - Interactive or one-shot mode **File:** `tools/bulk_memory_loader.py` - One-time historical import - Processed 1,186 embeddings on first run **File:** `scripts/memory-embeddings-cron.ps1` - PowerShell wrapper for daily cron - Checks Ollama availability ## Database Schema ### memory_embeddings | Column | Type | Description | |--------|------|-------------| | id | INTEGER | Primary key | | source_type | TEXT | "daily", "memory_md", "project", "session_snapshot", "auto_session" | | source_path | TEXT | File path + section or timestamp | | content_text | TEXT | First 500 chars (searchable preview) | | embedding | BLOB | 768-dim Float32 vector (3,072 bytes) | | created_at | TIMESTAMP | Auto-set | ### session_tracking ⭐ NEW | Column | Type | Description | |--------|------|-------------| | session_id | TEXT | OpenClaw session UUID | | channel_key | TEXT | discord:channel_id | | transcript_path | TEXT | Path to .jsonl file | | last_message_index | INTEGER | Last processed line | | messages_since_snapshot | INTEGER | Counter since last embed | | last_checkpoint_time | TIMESTAMP | Last check | | is_active | BOOLEAN | Session still exists? | ## Cron Schedule | Job | Schedule | What It Does | |-----|----------|--------------| | **Memory Embeddings Daily** | 3:00 AM | Process yesterday's memory file | | **Session Monitor** ⭐ NEW | Every 2 min | Reads transcripts, auto-snapshots at 15 msgs | | Session Snapshots (legacy) | Manual | Manual capture via script | **How Session Monitor Works:** 1. Reads `.openclaw/agents/main/sessions/*.jsonl` 2. Tracks `last_message_index` per session 3. Counts new user messages 4. At 15 messages: summarize → embed → store 5. Updates checkpoint in `session_tracking` table ## Usage ### Search ```powershell # Search by query python tools/search_memories.py "home assistant automation" # Interactive mode python tools/search_memories.py --interactive ``` ### Manual Snapshot ```powershell python tools/session_snapshotter.py "Summary of important discussion" ``` ### From Python ```python from memory_vector import search_memories # Generate query embedding with Ollama # Then search results = search_memories(query_embedding, k=5) # Returns: [(source_path, content_text, distance), ...] ``` ## Stats | Metric | Value | |--------|-------| | Total embeddings | **1,623** | | Daily notes | 818 | | Project files | 332 | | MEMORY.md | 33 | | Manual session snapshots | 2 | | **Auto session snapshots** ⭐ | **27** | | Tracked sessions | 245 | | Active sessions | 243 | | Database size | ~5 MB | **Live Stats Query:** ```powershell python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()" ``` **Check Session Monitor Status:** ```powershell # See last run python tools/session_monitor.py # Check if cron is running openclaw cron list | findstr "Session Monitor" ``` ## The Innovation ⭐ **Problem:** How to automatically capture live conversations without manual tracking? **Solution:** Read OpenClaw's own session transcripts! OpenClaw stores every session in `.openclaw/agents/main/sessions/[session-id].jsonl`. We discovered we can: 1. **Monitor these files live** — cron job every 2 minutes 2. **Track line position** — `last_message_index` checkpoint 3. **Count user messages** — parse JSONL for `role: "user"` 4. **Auto-snapshot at threshold** — 15 messages → summarize → embed **Why this matters:** - ✅ No manual message counting - ✅ Survives session restarts - ✅ Multi-channel aware (each channel = separate session file) - ✅ No OpenClaw hooks required (we read their existing data) **Credit:** Corey's genius idea 💡 *(Corey, 2026-03-03)* --- ## Comparison: Old vs New | Feature | Supermemory | SQLite-Vector | |---------|-------------|---------------| | Cloud dependency | Required | None | | API limits | Yes | No | | Offline use | No | Yes | | Embeddings stored | Cloud | Local | | Search speed | Network latency | <100ms local | | Reliability | Crashing | Stable | | Cost | API-based | Free | ## Troubleshooting ### Ollama not running ```powershell # Check status Invoke-RestMethod -Uri "http://localhost:11434/api/tags" # Start Ollama ollama serve ``` ### Missing model ```powershell ollama pull nomic-embed-text ``` ### Database locked Close any GUI tools (DB Browser) before running scripts. ## Future Enhancements - [ ] Keyword filtering alongside vector search - [ ] Date range queries - [ ] Source type filtering (e.g., only projects) - [ ] Embedding quality scoring - [ ] Auto-summarization improvements --- **Status:** Fully operational **Last updated:** 2026-03-02