Memory Vector System
Status: ✅ Supermonkey-Powered
Created: 2026-03-02
Replaces: Supermemory cloud embedding API
Overview
Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline.
Architecture
File-Based Pipeline
Memory Files (markdown)
↓
memory_embedding_worker.py
↓
Ollama (nomic-embed-text) → 768-dim vectors
↓
SQLite + sqlite-vector extension
↓
Cosine similarity search
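The embedding step can be sketched in a few lines. This assumes Ollama runs at its default local address and exposes the `/api/embed` endpoint, which takes `{"model", "input"}` and returns `{"embeddings": [[...]]}`. The helper names `build_embed_request` and `embed` are hypothetical and are not taken from `memory_embedding_worker.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address
MODEL = "nomic-embed-text"


def build_embed_request(text: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> list[float]:
    """Return one embedding vector (768 floats for nomic-embed-text)."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as resp:
        body = json.loads(resp.read())
    # /api/embed returns a list of vectors, one per input string
    return body["embeddings"][0]
```

With Ollama running and the model pulled, `embed("home assistant automation")` should return a 768-element list.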
Real-Time Session Pipeline (NEW)
Discord/Chat Messages
↓
OpenClaw Session Transcript (.jsonl)
↓
session_monitor.py (cron every 2 min)
↓
Count new user messages → At 15: summarize
↓
Ollama (nomic-embed-text)
↓
SQLite + sqlite-vector
Innovation: We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking!
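Here is a minimal sketch of the incremental read. It assumes each transcript line is a JSON object whose role appears either at the top level or under a nested `message` object; the exact OpenClaw transcript schema is not documented here. It also assumes `last_message_index` stores the number of lines already processed. The function names are hypothetical.

```python
import json
from collections.abc import Iterable


def _role(record: dict):
    """Message role, top-level or nested under "message" (schema assumed)."""
    if "role" in record:
        return record["role"]
    message = record.get("message")
    return message.get("role") if isinstance(message, dict) else None


def read_new_user_messages(lines: Iterable[str], last_index: int) -> tuple[int, list[dict]]:
    """Scan transcript lines after last_index; return (new_index, user_records).

    new_index counts lines fully processed, so the next pass resumes there.
    """
    user_records: list[dict] = []
    index = last_index
    for line_no, line in enumerate(lines):
        if line_no < last_index:
            continue  # already handled on an earlier cron pass
        text = line.strip()
        if text:
            try:
                record = json.loads(text)
            except json.JSONDecodeError:
                break  # probably a line still being written; retry next pass
            if isinstance(record, dict) and _role(record) == "user":
                user_records.append(record)
        index = line_no + 1
    return index, user_records
```

A caller would open the transcript with `open(path, encoding="utf-8")` and pass the file object as `lines`. The function stops at an unparseable line instead of skipping it. A line that OpenClaw is still writing is therefore retried on the next 2-minute pass rather than lost.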
Components
Core Module
File: memory_vector.py
- MemoryVectorDB class: database wrapper
- store_memory(): save an embedding
- search_memories(): semantic search
- setup_memory_vectors(): one-time init
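As a rough illustration of what `search_memories()` returns, the sketch below ranks stored vectors by cosine distance (1 - cosine similarity) in plain Python. The function names are hypothetical. In the real module the sqlite-vector extension does this work, and it is an assumption that its distance values match 1 - cosine exactly.

```python
import math
from collections.abc import Iterable, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          rows: Iterable[tuple[str, str, Sequence[float]]],
          k: int = 5) -> list[tuple[str, str, float]]:
    """rows yields (source_path, content_text, vector); closest k come first."""
    scored = [(path, text, cosine_distance(query, vec)) for path, text, vec in rows]
    scored.sort(key=lambda row: row[2])
    return scored[:k]
```

A linear scan like this touches every stored vector on each query. That cost is one reason to delegate the work to the extension.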
Worker Scripts
File: tools/memory_embedding_worker.py
- Daily/batch processing for memory.md files
- Processes section-by-section
- Called by cron at 3 AM
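Section-by-section processing could look like the following sketch. It assumes sections are delimited by markdown ATX headings (`#` to `######`). It builds `source_path` as `<file>#<heading>` to match the "file path + section" description in the schema. `split_sections` is a hypothetical name, and the worker's actual splitting rules are not documented here.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the title text
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_sections(markdown: str, file_label: str) -> list[tuple[str, str]]:
    """Return (source_path, section_text) pairs, one per non-empty section.

    Text before the first heading is labelled "(preamble)".
    """
    sections: list[tuple[str, str]] = []
    title, buf, in_fence = "(preamble)", [], False

    def flush():
        body = "\n".join(buf).strip()
        if body:  # skip headings with nothing under them
            sections.append((f"{file_label}#{title}", body))

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, buf = match.group(2), []
        else:
            buf.append(line)
    flush()
    return sections
```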
File: tools/session_monitor.py ⭐ NEW
- Reads OpenClaw session transcripts live
- Tracks message counts automatically
- Creates snapshots every 15 messages
- Cron: every 2 minutes
- Database: session_tracking table
File: tools/session_snapshotter.py
- Manual session capture (legacy)
- Use session_monitor.py for auto-tracking
File: tools/search_memories.py
- CLI tool for manual searches
- Interactive or one-shot mode
File: tools/bulk_memory_loader.py
- One-time historical import
- Processed 1,186 embeddings on first run
File: scripts/memory-embeddings-cron.ps1
- PowerShell wrapper for daily cron
- Checks Ollama availability
Database Schema
memory_embeddings
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| source_type | TEXT | "daily", "memory_md", "project", "session_snapshot", "auto_session" |
| source_path | TEXT | File path + section or timestamp |
| content_text | TEXT | First 500 chars (searchable preview) |
| embedding | BLOB | 768-dim Float32 vector (3,072 bytes) |
| created_at | TIMESTAMP | Auto-set |
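The table above maps onto DDL like the following. The constraints, defaults, and little-endian float32 byte order are assumptions. The 3,072-byte size follows from 768 values × 4 bytes each. The helper names are hypothetical.

```python
import sqlite3
import struct

DIM = 768  # nomic-embed-text output size

# DDL inferred from the table above; NOT NULL constraints are assumptions
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_embeddings (
    id           INTEGER PRIMARY KEY,
    source_type  TEXT NOT NULL,
    source_path  TEXT NOT NULL,
    content_text TEXT,
    embedding    BLOB NOT NULL,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def to_blob(vector) -> bytes:
    """Pack floats as little-endian float32: 768 x 4 bytes = 3,072 bytes."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(vector)}")
    return struct.pack(f"<{DIM}f", *vector)


def from_blob(blob: bytes) -> tuple:
    """Unpack a stored BLOB back into a tuple of 768 floats."""
    return struct.unpack(f"<{DIM}f", blob)


def store(db: sqlite3.Connection, source_type, source_path, text, vector):
    db.execute(
        "INSERT INTO memory_embeddings (source_type, source_path, content_text, embedding)"
        " VALUES (?, ?, ?, ?)",
        (source_type, source_path, text[:500], to_blob(vector)),  # 500-char preview
    )
    db.commit()
```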
session_tracking ⭐ NEW
| Column | Type | Description |
|---|---|---|
| session_id | TEXT | OpenClaw session UUID |
| channel_key | TEXT | discord:channel_id |
| transcript_path | TEXT | Path to .jsonl file |
| last_message_index | INTEGER | Last processed line |
| messages_since_snapshot | INTEGER | Counter since last embed |
| last_checkpoint_time | TIMESTAMP | Last check |
| is_active | BOOLEAN | Session still exists? |
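Here is a sketch of how the monitor could persist its checkpoint. The schema above does not name a key, so this assumes `session_id` is the primary key. That lets SQLite's `INSERT ... ON CONFLICT DO UPDATE` upsert (SQLite 3.24+) update an existing row. The function names are hypothetical.

```python
import sqlite3

# Columns follow the table above; PRIMARY KEY on session_id is an assumption
SESSION_SCHEMA = """
CREATE TABLE IF NOT EXISTS session_tracking (
    session_id              TEXT PRIMARY KEY,
    channel_key             TEXT,
    transcript_path         TEXT NOT NULL,
    last_message_index      INTEGER NOT NULL DEFAULT 0,
    messages_since_snapshot INTEGER NOT NULL DEFAULT 0,
    last_checkpoint_time    TIMESTAMP,
    is_active               BOOLEAN NOT NULL DEFAULT 1
)
"""


def save_checkpoint(db, session_id, transcript_path, last_index, since_snapshot, channel_key=None):
    """Insert or update one session's checkpoint row."""
    db.execute(
        """INSERT INTO session_tracking
               (session_id, channel_key, transcript_path, last_message_index,
                messages_since_snapshot, last_checkpoint_time, is_active)
           VALUES (?, ?, ?, ?, ?, CURRENT_TIMESTAMP, 1)
           ON CONFLICT(session_id) DO UPDATE SET
               last_message_index = excluded.last_message_index,
               messages_since_snapshot = excluded.messages_since_snapshot,
               last_checkpoint_time = excluded.last_checkpoint_time,
               is_active = 1""",
        (session_id, channel_key, transcript_path, last_index, since_snapshot),
    )
    db.commit()


def load_checkpoint(db, session_id):
    """Return (last_message_index, messages_since_snapshot), or (0, 0) if unseen."""
    row = db.execute(
        "SELECT last_message_index, messages_since_snapshot"
        " FROM session_tracking WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return (row[0], row[1]) if row else (0, 0)
```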
Cron Schedule
| Job | Schedule | What It Does |
|---|---|---|
| Memory Embeddings Daily | 3:00 AM | Process yesterday's memory file |
| Session Monitor ⭐ NEW | Every 2 min | Reads transcripts, auto-snapshots at 15 msgs |
| Session Snapshots (legacy) | Manual | Manual capture via script |
How Session Monitor Works:
- Reads .openclaw/agents/main/sessions/*.jsonl
- Tracks last_message_index per session
- Counts new user messages
- At 15 messages: summarize → embed → store
- Updates the checkpoint in the session_tracking table
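The threshold step can be sketched as a pure function over one session's checkpoint. `Checkpoint`, `process_pass`, and the `take_snapshot` callback are hypothetical names; the callback stands in for the summarize, embed, and store steps. One design choice is worth making explicit: the checkpoint only advances after the snapshot succeeds. An Ollama outage then causes a retry on the next pass instead of silently dropping messages.

```python
from collections.abc import Callable
from dataclasses import dataclass

SNAPSHOT_THRESHOLD = 15  # user messages per snapshot, per the description above


@dataclass
class Checkpoint:
    last_message_index: int = 0
    messages_since_snapshot: int = 0


def process_pass(cp: Checkpoint, new_index: int, new_user_messages: int,
                 take_snapshot: Callable[[], None],
                 threshold: int = SNAPSHOT_THRESHOLD) -> bool:
    """Apply one cron pass to a session; return True if a snapshot was taken.

    If take_snapshot() raises, cp is left unchanged, so the next pass rereads
    the same lines and tries again.
    """
    pending = cp.messages_since_snapshot + new_user_messages
    snapped = False
    if pending >= threshold:
        take_snapshot()  # may raise (e.g. Ollama down); nothing below runs then
        pending = 0      # the snapshot covers everything since the previous one
        snapped = True
    cp.last_message_index = new_index
    cp.messages_since_snapshot = pending
    return snapped
```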
Usage
Search
# Search by query
python tools/search_memories.py "home assistant automation"
# Interactive mode
python tools/search_memories.py --interactive
Manual Snapshot
python tools/session_snapshotter.py "Summary of important discussion"
From Python
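import json
import urllib.request
from memory_vector import search_memories
# Generate the query embedding with Ollama's /api/embed endpoint (default local address)
payload = json.dumps({"model": "nomic-embed-text", "input": "home assistant automation"}).encode()
req = urllib.request.Request("http://localhost:11434/api/embed", data=payload, headers={"Content-Type": "application/json"})
query_embedding = json.loads(urllib.request.urlopen(req).read())["embeddings"][0]
# Then search (assumes search_memories accepts a plain list of floats)
results = search_memories(query_embedding, k=5)
# Returns: [(source_path, content_text, distance), ...]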
Stats
| Metric | Value |
|---|---|
| Total embeddings | 1,623 |
| Daily notes | 818 |
| Project files | 332 |
| MEMORY.md | 33 |
| Manual session snapshots | 2 |
| Auto session snapshots ⭐ | 27 |
| Tracked sessions | 245 |
| Active sessions | 243 |
| Database size | ~5 MB |
Live Stats Query:
python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()"
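The one-liner above can also be written as a small function that breaks the total down by `source_type`. `memory_stats` is a hypothetical helper. It takes an open connection, so it can be pointed at the real database or at a test copy.

```python
import sqlite3


def memory_stats(db: sqlite3.Connection) -> dict:
    """Embedding counts per source_type plus session-tracking totals."""
    by_type = dict(db.execute(
        "SELECT source_type, COUNT(*) FROM memory_embeddings GROUP BY source_type"
    ).fetchall())
    tracked, active = db.execute(
        # SUM over an empty table is NULL, hence COALESCE
        "SELECT COUNT(*), COALESCE(SUM(is_active = 1), 0) FROM session_tracking"
    ).fetchone()
    return {
        "total": sum(by_type.values()),
        "by_source_type": by_type,
        "tracked_sessions": tracked,
        "active_sessions": active,
    }
```

For example: `memory_stats(sqlite3.connect(r"C:\Users\admin\.openclaw\memory.db"))`.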
Check Session Monitor Status:
# Run one monitor pass manually and review its output
python tools/session_monitor.py
# Check that the cron job is registered (/C: matches the phrase literally)
openclaw cron list | findstr /C:"Session Monitor"
The Innovation ⭐
Problem: How to automatically capture live conversations without manual tracking?
Solution: Read OpenClaw's own session transcripts!
OpenClaw stores every session in .openclaw/agents/main/sessions/[session-id].jsonl. We discovered we can:
- Monitor these files live: cron job every 2 minutes
- Track line position: last_message_index checkpoint
- Count user messages: parse JSONL for role: "user"
- Auto-snapshot at threshold: 15 messages → summarize → embed
Why this matters:
- ✅ No manual message counting
- ✅ Survives session restarts
- ✅ Multi-channel aware (each channel = separate session file)
- ✅ No OpenClaw hooks required (we read their existing data)
Credit: Corey's genius idea 💡 (Corey, 2026-03-03)
Comparison: Old vs New
| Feature | Supermemory | SQLite-Vector |
|---|---|---|
| Cloud dependency | Required | None |
| API limits | Yes | No |
| Offline use | No | Yes |
| Embeddings stored | Cloud | Local |
| Search speed | Network latency | <100ms local |
| Reliability | Crashing | Stable |
| Cost | API-based | Free |
Troubleshooting
Ollama not running
# Check status
Invoke-RestMethod -Uri "http://localhost:11434/api/tags"
# Start Ollama
ollama serve
Missing model
ollama pull nomic-embed-text
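The two checks above can be combined in Python. This sketch suggests what an availability check like the one in `memory-embeddings-cron.ps1` might do, though that script's actual logic is not shown here. It uses Ollama's `GET /api/tags` model listing at the default local address. `ollama_status` is a hypothetical name.

```python
import json
import urllib.request


def ollama_status(base_url: str = "http://localhost:11434",
                  model: str = "nomic-embed-text",
                  timeout: float = 3.0) -> tuple[bool, bool]:
    """Return (server_reachable, model_pulled) using GET /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.loads(resp.read())
    except (OSError, ValueError):
        # URLError and socket timeouts subclass OSError; ValueError covers bad JSON
        return False, False
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Pulled models are usually listed with a tag, e.g. "nomic-embed-text:latest"
    has_model = any(n == model or n.startswith(model + ":") for n in names)
    return True, has_model
```

If the first value is False, start the server with `ollama serve`. If only the second is False, run `ollama pull nomic-embed-text`.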
Database locked
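Close GUI tools that have the database open (e.g., DB Browser for SQLite) before running the scripts; uncommitted edits in those tools hold a lock that blocks other writers.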
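Scripts can also wait briefly for a lock instead of failing at once. Python's `sqlite3.connect()` takes a `timeout` in seconds (default 5.0). During that time SQLite retries a locked database before raising `OperationalError: database is locked`. The 10-second value below is an arbitrary choice, and `connect` is a hypothetical wrapper name. A tool that holds a transaction open indefinitely will outlast any timeout, so closing it remains the reliable fix.

```python
import sqlite3


def connect(db_path: str, busy_timeout_s: float = 10.0) -> sqlite3.Connection:
    """Open the memory database, retrying for up to busy_timeout_s seconds
    while another connection (e.g. a GUI tool) holds a conflicting lock."""
    return sqlite3.connect(db_path, timeout=busy_timeout_s)
```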
Future Enhancements
- Keyword filtering alongside vector search
- Date range queries
- Source type filtering (e.g., only projects)
- Embedding quality scoring
- Auto-summarization improvements
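One possible approach, not a planned design, is to layer date-range, source-type, and keyword filtering in front of vector ranking. SQL would first narrow the candidate rows, and vector ranking would then run only on those. `candidate_rows` is a hypothetical name. The date comparison assumes `created_at` holds SQLite's default `YYYY-MM-DD HH:MM:SS` text.

```python
import sqlite3


def candidate_rows(db: sqlite3.Connection, source_types=None, since=None,
                   until=None, keyword=None) -> list[tuple]:
    """Return (source_path, content_text, embedding) rows, newest first, that
    pass every supplied filter; vector ranking would then run on these rows."""
    clauses, params = [], []
    if source_types:
        clauses.append(f"source_type IN ({', '.join('?' for _ in source_types)})")
        params.extend(source_types)
    if since:
        clauses.append("created_at >= ?")
        params.append(since)
    if until:
        clauses.append("created_at < ?")
        params.append(until)
    if keyword:
        # Escape LIKE wildcards so '%' and '_' in the keyword match literally
        escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        clauses.append("content_text LIKE ? ESCAPE '\\'")
        params.append(f"%{escaped}%")
    sql = "SELECT source_path, content_text, embedding FROM memory_embeddings"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY created_at DESC"
    return db.execute(sql, params).fetchall()
```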
Status: Fully operational
Last updated: 2026-03-02