Fresh start - excluded large ROM JSON files

2026-04-11 09:45:12 -05:00
commit 5deb387aa6
395 changed files with 47744 additions and 0 deletions
--- a/memory/projects/memory-vector-system.md
+++ b/memory/projects/memory-vector-system.md
@@ -0,0 +1,240 @@
+# Memory Vector System
+
+**Status:** ✅ Supermonkey-Powered  
+**Created:** 2026-03-02  
+**Replaces:** Supermemory cloud embedding API
+
+---
+
+## Overview
+
+Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline.
+
+## Architecture
+
+### File-Based Pipeline
+```
+Memory Files (markdown)
+    ↓
+memory_embedding_worker.py
+    ↓
+Ollama (nomic-embed-text) → 768-dim vectors
+    ↓
+SQLite + sqlite-vector extension
+    ↓
+Cosine similarity search
+```
+
+### Real-Time Session Pipeline (NEW)
+```
+Discord/Chat Messages
+    ↓
+OpenClaw Session Transcript (.jsonl)
+    ↓
+session_monitor.py (cron every 2 min)
+    ↓
+Count messages → At 15: summarize
+    ↓
+Ollama (nomic-embed-text)
+    ↓
+SQLite + sqlite-vector
+```
+
+**Innovation:** We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking!
+
+## Components
+
+### Core Module
+**File:** `memory_vector.py`
+- `MemoryVectorDB` class — database wrapper
+- `store_memory()` — save embedding
+- `search_memories()` — semantic search
+- `setup_memory_vectors()` — one-time init
+
+### Worker Scripts
+**File:** `tools/memory_embedding_worker.py`
+- Daily/batch processing for memory.md files
+- Processes section-by-section
+- Called by cron at 3 AM
+
+**File:** `tools/session_monitor.py` ⭐ NEW
+- Reads OpenClaw session transcripts live
+- Tracks message counts automatically
+- Creates snapshots every 15 messages
+- Cron: every 2 minutes
+- Database: `session_tracking` table
+
+**File:** `tools/session_snapshotter.py`
+- Manual session capture (legacy)
+- Use session_monitor.py for auto-tracking
+
+**File:** `tools/search_memories.py`
+- CLI tool for manual searches
+- Interactive or one-shot mode
+
+**File:** `tools/bulk_memory_loader.py`
+- One-time historical import
+- Processed 1,186 embeddings on first run
+
+**File:** `scripts/memory-embeddings-cron.ps1`
+- PowerShell wrapper for daily cron
+- Checks Ollama availability
+
+## Database Schema
+
+### memory_embeddings
+| Column | Type | Description |
+|--------|------|-------------|
+| id | INTEGER | Primary key |
+| source_type | TEXT | "daily", "memory_md", "project", "session_snapshot", "auto_session" |
+| source_path | TEXT | File path + section or timestamp |
+| content_text | TEXT | First 500 chars (searchable preview) |
+| embedding | BLOB | 768-dim Float32 vector (3,072 bytes) |
+| created_at | TIMESTAMP | Auto-set |
+
+### session_tracking ⭐ NEW
+| Column | Type | Description |
+|--------|------|-------------|
+| session_id | TEXT | OpenClaw session UUID |
+| channel_key | TEXT | discord:channel_id |
+| transcript_path | TEXT | Path to .jsonl file |
+| last_message_index | INTEGER | Last processed line |
+| messages_since_snapshot | INTEGER | Counter since last embed |
+| last_checkpoint_time | TIMESTAMP | Last check |
+| is_active | BOOLEAN | Session still exists? |
+
+## Cron Schedule
+
+| Job | Schedule | What It Does |
+|-----|----------|--------------|
+| **Memory Embeddings Daily** | 3:00 AM | Process yesterday's memory file |
+| **Session Monitor** ⭐ NEW | Every 2 min | Reads transcripts, auto-snapshots at 15 msgs |
+| Session Snapshots (legacy) | Manual | Manual capture via script |
+
+**How Session Monitor Works:**
+1. Reads `.openclaw/agents/main/sessions/*.jsonl`
+2. Tracks `last_message_index` per session
+3. Counts new user messages
+4. At 15 messages: summarize → embed → store
+5. Updates checkpoint in `session_tracking` table
+
+## Usage
+
+### Search
+```powershell
+# Search by query
+python tools/search_memories.py "home assistant automation"
+
+# Interactive mode
+python tools/search_memories.py --interactive
+```
+
+### Manual Snapshot
+```powershell
+python tools/session_snapshotter.py "Summary of important discussion"
+```
+
+### From Python
+```python
+from memory_vector import search_memories
+
+# Generate query embedding with Ollama
+# Then search
+results = search_memories(query_embedding, k=5)
+# Returns: [(source_path, content_text, distance), ...]
+```
+
+## Stats
+
+| Metric | Value |
+|--------|-------|
+| Total embeddings | **1,623** |
+| Daily notes | 818 |
+| Project files | 332 |
+| MEMORY.md | 33 |
+| Manual session snapshots | 2 |
+| **Auto session snapshots** ⭐ | **27** |
+| Tracked sessions | 245 |
+| Active sessions | 243 |
+| Database size | ~5 MB |
+
+**Live Stats Query:**
+```powershell
+python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()"
+```
+
+**Check Session Monitor Status:**
+```powershell
+# See last run
+python tools/session_monitor.py
+
+# Check if cron is running
+openclaw cron list | findstr "Session Monitor"
+```
+
+## The Innovation ⭐
+
+**Problem:** How to automatically capture live conversations without manual tracking?
+
+**Solution:** Read OpenClaw's own session transcripts!
+
+OpenClaw stores every session in `.openclaw/agents/main/sessions/[session-id].jsonl`. We discovered we can:
+
+1. **Monitor these files live** — cron job every 2 minutes
+2. **Track line position** — `last_message_index` checkpoint
+3. **Count user messages** — parse JSONL for `role: "user"`
+4. **Auto-snapshot at threshold** — 15 messages → summarize → embed
+
+**Why this matters:**
+- ✅ No manual message counting
+- ✅ Survives session restarts
+- ✅ Multi-channel aware (each channel = separate session file)
+- ✅ No OpenClaw hooks required (we read their existing data)
+
+**Credit:** Corey's genius idea 💡 *(Corey, 2026-03-03)*
+
+---
+
+## Comparison: Old vs New
+
+| Feature | Supermemory | SQLite-Vector |
+|---------|-------------|---------------|
+| Cloud dependency | Required | None |
+| API limits | Yes | No |
+| Offline use | No | Yes |
+| Embeddings stored | Cloud | Local |
+| Search speed | Network latency | <100ms local |
+| Reliability | Crashing | Stable |
+| Cost | API-based | Free |
+
+## Troubleshooting
+
+### Ollama not running
+```powershell
+# Check status
+Invoke-RestMethod -Uri "http://localhost:11434/api/tags"
+
+# Start Ollama
+ollama serve
+```
+
+### Missing model
+```powershell
+ollama pull nomic-embed-text
+```
+
+### Database locked
+Close any GUI tools (DB Browser) before running scripts.
+
+## Future Enhancements
+
+- [ ] Keyword filtering alongside vector search
+- [ ] Date range queries
+- [ ] Source type filtering (e.g., only projects)
+- [ ] Embedding quality scoring
+- [ ] Auto-summarization improvements
+
+---
+
+**Status:** Fully operational  
+**Last updated:** 2026-03-02