Fresh start - excluded large ROM JSON files
240
memory/projects/memory-vector-system.md
Normal file
@@ -0,0 +1,240 @@
# Memory Vector System

**Status:** ✅ Supermonkey-Powered
**Created:** 2026-03-02
**Replaces:** Supermemory cloud embedding API

---

## Overview

Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline.

## Architecture

### File-Based Pipeline

```
Memory Files (markdown)
↓
memory_embedding_worker.py
↓
Ollama (nomic-embed-text) → 768-dim vectors
↓
SQLite + sqlite-vector extension
↓
Cosine similarity search
```

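The pipeline above can be sketched end to end in a few functions. This is a minimal illustration, not the code in `memory_embedding_worker.py`: the `embed` callable stands in for the Ollama call, the `demo_embeddings` table and all function names are hypothetical, and cosine distance is computed in plain Python instead of through the sqlite-vector extension.

```python
import math
import sqlite3
import struct
from typing import Callable, Iterable, List, Sequence, Tuple

# Any function that maps text to a vector; in the real system this would call Ollama.
Embedder = Callable[[str], Sequence[float]]


def pack_vector(vec: Sequence[float]) -> bytes:
    """Serialize a vector as little-endian float32 (4 bytes per dimension)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob: bytes) -> Tuple[float, ...]:
    """Inverse of pack_vector."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def index_sections(
    db: sqlite3.Connection,
    sections: Iterable[Tuple[str, str]],
    embed: Embedder,
) -> None:
    """Embed (source_path, text) pairs and store them with a 500-char preview."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS demo_embeddings "
        "(source_path TEXT, content_text TEXT, embedding BLOB)"
    )
    for source_path, text in sections:
        db.execute(
            "INSERT INTO demo_embeddings VALUES (?, ?, ?)",
            (source_path, text[:500], pack_vector(embed(text))),
        )
    db.commit()


def search(
    db: sqlite3.Connection, query: str, embed: Embedder, k: int = 5
) -> List[Tuple[str, str, float]]:
    """Brute-force scan: score every stored vector and return the k nearest."""
    q = embed(query)
    rows = db.execute(
        "SELECT source_path, content_text, embedding FROM demo_embeddings"
    ).fetchall()
    scored = [(p, t, cosine_distance(q, unpack_vector(b))) for p, t, b in rows]
    return sorted(scored, key=lambda r: r[2])[:k]
```

A full scan like this is only reasonable for small tables. The vector extension named in the diagram is what would normally take over the distance computation inside SQLite.
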
### Real-Time Session Pipeline (NEW)

```
Discord/Chat Messages
↓
OpenClaw Session Transcript (.jsonl)
↓
session_monitor.py (cron every 2 min)
↓
Count messages → At 15: summarize
↓
Ollama (nomic-embed-text)
↓
SQLite + sqlite-vector
```

**Innovation:** We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking!

## Components

### Core Module
**File:** `memory_vector.py`
- `MemoryVectorDB` class — database wrapper
- `store_memory()` — save embedding
- `search_memories()` — semantic search
- `setup_memory_vectors()` — one-time init

### Worker Scripts
**File:** `tools/memory_embedding_worker.py`
- Daily/batch processing for memory.md files
- Processes section-by-section
- Called by cron at 3 AM

**File:** `tools/session_monitor.py` ⭐ NEW
- Reads OpenClaw session transcripts live
- Tracks message counts automatically
- Creates snapshots every 15 messages
- Cron: every 2 minutes
- Database: `session_tracking` table

**File:** `tools/session_snapshotter.py`
- Manual session capture (legacy)
- Use `session_monitor.py` for auto-tracking

**File:** `tools/search_memories.py`
- CLI tool for manual searches
- Interactive or one-shot mode

**File:** `tools/bulk_memory_loader.py`
- One-time historical import
- Processed 1,186 embeddings on first run

**File:** `scripts/memory-embeddings-cron.ps1`
- PowerShell wrapper for daily cron
- Checks Ollama availability

## Database Schema

### memory_embeddings
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| source_type | TEXT | "daily", "memory_md", "project", "session_snapshot", "auto_session" |
| source_path | TEXT | File path + section or timestamp |
| content_text | TEXT | First 500 chars (searchable preview) |
| embedding | BLOB | 768-dim Float32 vector (3,072 bytes) |
| created_at | TIMESTAMP | Auto-set |

### session_tracking ⭐ NEW
| Column | Type | Description |
|--------|------|-------------|
| session_id | TEXT | OpenClaw session UUID |
| channel_key | TEXT | discord:channel_id |
| transcript_path | TEXT | Path to .jsonl file |
| last_message_index | INTEGER | Last processed line |
| messages_since_snapshot | INTEGER | Counter since last embed |
| last_checkpoint_time | TIMESTAMP | Last check |
| is_active | BOOLEAN | Session still exists? |

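For reference, the two tables above could be created with DDL along these lines. This is a sketch reconstructed from the column lists only: the defaults, the `PRIMARY KEY` on `session_id`, and the `create_schema` helper are assumptions, and the real `setup_memory_vectors()` may differ. The demo row also confirms the size arithmetic for the `embedding` column (768 float32 values at 4 bytes each).

```python
import sqlite3
import struct

# Column names and types follow the tables above; constraints and defaults are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_embeddings (
    id           INTEGER PRIMARY KEY,
    source_type  TEXT,
    source_path  TEXT,
    content_text TEXT,
    embedding    BLOB,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS session_tracking (
    session_id              TEXT PRIMARY KEY,
    channel_key             TEXT,
    transcript_path         TEXT,
    last_message_index      INTEGER DEFAULT 0,
    messages_since_snapshot INTEGER DEFAULT 0,
    last_checkpoint_time    TIMESTAMP,
    is_active               BOOLEAN DEFAULT 1
);
"""


def create_schema(db: sqlite3.Connection) -> None:
    """Create both tables if they do not exist yet."""
    db.executescript(SCHEMA)


db = sqlite3.connect(":memory:")
create_schema(db)

# Store one all-zero 768-dim vector as little-endian float32.
blob = struct.pack("<768f", *([0.0] * 768))
db.execute(
    "INSERT INTO memory_embeddings (source_type, source_path, content_text, embedding) "
    "VALUES (?, ?, ?, ?)",
    ("project", "memory/projects/memory-vector-system.md#Overview", "Local semantic memory search", blob),
)
row = db.execute("SELECT length(embedding), created_at FROM memory_embeddings").fetchone()
print(row[0])  # 3072 (768 dims * 4 bytes)
```
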
## Cron Schedule

| Job | Schedule | What It Does |
|-----|----------|--------------|
| **Memory Embeddings Daily** | 3:00 AM | Process yesterday's memory file |
| **Session Monitor** ⭐ NEW | Every 2 min | Reads transcripts, auto-snapshots at 15 msgs |
| Session Snapshots (legacy) | Manual | Manual capture via script |

**How Session Monitor Works:**
1. Reads `.openclaw/agents/main/sessions/*.jsonl`
2. Tracks `last_message_index` per session
3. Counts new user messages
4. At 15 messages: summarize → embed → store
5. Updates checkpoint in `session_tracking` table

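Steps 2 to 4 can be sketched as a checkpointed read of the transcript. This is an illustration, not the code in `session_monitor.py`: the function names are hypothetical, and it assumes each JSONL line is an object with a top-level `role` field, which may not match the real transcript layout.

```python
import json
from pathlib import Path
from typing import Tuple

SNAPSHOT_THRESHOLD = 15  # from the document: snapshot every 15 messages


def count_new_user_messages(transcript: Path, last_message_index: int) -> Tuple[int, int]:
    """Return (new_user_messages, new_last_index) for lines after the checkpoint.

    Line numbers are 1-based, so a checkpoint of 0 means nothing was processed yet.
    """
    new_user = 0
    index = last_message_index
    with transcript.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no <= last_message_index:
                continue  # already processed in an earlier run
            stripped = line.strip()
            if stripped:
                try:
                    record = json.loads(stripped)
                except json.JSONDecodeError:
                    # Probably a line still being written; stop here without
                    # advancing the checkpoint so the next run retries it.
                    break
                if isinstance(record, dict) and record.get("role") == "user":
                    new_user += 1
            index = line_no
    return new_user, index


def should_snapshot(messages_since_snapshot: int, new_user: int) -> Tuple[bool, int]:
    """Return (take_snapshot, updated_counter); the counter resets after a snapshot."""
    total = messages_since_snapshot + new_user
    if total >= SNAPSHOT_THRESHOLD:
        return True, 0
    return False, total
```

Stopping at the first undecodable line is a deliberate choice in this sketch: with a cron job reading files that are still being appended to, it avoids losing a message that was only half written during the previous run.
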
## Usage

### Search
```powershell
# Search by query
python tools/search_memories.py "home assistant automation"

# Interactive mode
python tools/search_memories.py --interactive
```

### Manual Snapshot
```powershell
python tools/session_snapshotter.py "Summary of important discussion"
```

### From Python
```python
from memory_vector import search_memories

# Generate query embedding with Ollama
# Then search
results = search_memories(query_embedding, k=5)
# Returns: [(source_path, content_text, distance), ...]
```

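The `query_embedding` argument has to come from the same model that produced the stored vectors. One way to obtain it, sketched with the standard library only, is Ollama's HTTP embeddings endpoint. The URL below assumes a default local install, and `build_embedding_request` and `embed_query` are hypothetical helper names, not part of `memory_vector.py`.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumed default local endpoint
MODEL = "nomic-embed-text"


def build_embedding_request(text: str) -> urllib.request.Request:
    """Build the POST request that asks Ollama to embed one piece of text."""
    payload = json.dumps({"model": MODEL, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_query(text: str, timeout: float = 30.0) -> List[float]:
    """Send the request and return the vector from the JSON response."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]


# Usage with the module from this document (requires a running Ollama server):
#   from memory_vector import search_memories
#   results = search_memories(embed_query("home assistant automation"), k=5)
```
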
## Stats

| Metric | Value |
|--------|-------|
| Total embeddings | **1,623** |
| Daily notes | 818 |
| Project files | 332 |
| MEMORY.md | 33 |
| Manual session snapshots | 2 |
| **Auto session snapshots** ⭐ | **27** |
| Tracked sessions | 245 |
| Active sessions | 243 |
| Database size | ~5 MB |

**Live Stats Query:**
```powershell
python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()"
```

**Check Session Monitor Status:**
```powershell
# See last run
python tools/session_monitor.py

# Check if cron is running
# (/C: matches the whole phrase; without it, findstr treats "Session" and "Monitor" as separate search terms)
openclaw cron list | findstr /C:"Session Monitor"
```

## The Innovation ⭐

**Problem:** How to automatically capture live conversations without manual tracking?

**Solution:** Read OpenClaw's own session transcripts!

OpenClaw stores every session in `.openclaw/agents/main/sessions/[session-id].jsonl`. We discovered we can:

1. **Monitor these files live** — cron job every 2 minutes
2. **Track line position** — `last_message_index` checkpoint
3. **Count user messages** — parse JSONL for `role: "user"`
4. **Auto-snapshot at threshold** — 15 messages → summarize → embed

**Why this matters:**
- ✅ No manual message counting
- ✅ Survives session restarts
- ✅ Multi-channel aware (each channel = separate session file)
- ✅ No OpenClaw hooks required (we read their existing data)

**Credit:** Corey's genius idea 💡 *(Corey, 2026-03-03)*

---

## Comparison: Old vs New

| Feature | Supermemory | SQLite-Vector |
|---------|-------------|---------------|
| Cloud dependency | Required | None |
| API limits | Yes | No |
| Offline use | No | Yes |
| Embeddings stored | Cloud | Local |
| Search speed | Network latency | <100 ms local |
| Reliability | Crashing | Stable |
| Cost | API-based | Free |

## Troubleshooting

### Ollama not running
```powershell
# Check status
Invoke-RestMethod -Uri "http://localhost:11434/api/tags"

# Start Ollama
ollama serve
```

### Missing model
```powershell
ollama pull nomic-embed-text
```

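The same `/api/tags` endpoint used in the status check also shows whether the model has been pulled, so a script can verify both before embedding. The helpers below are a hedged sketch: `has_model` and `fetch_tags` are hypothetical names, and the base URL assumes a default local install.

```python
import json
import urllib.request
from typing import Any, Dict


def has_model(tags_response: Dict[str, Any], model: str = "nomic-embed-text") -> bool:
    """True if the /api/tags payload lists the model, with or without a tag suffix."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch the list of locally available models; raises URLError if Ollama is down."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read())


# Usage (requires a running Ollama server):
#   if not has_model(fetch_tags()):
#       print("Run: ollama pull nomic-embed-text")
```
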
### Database locked
Close any GUI tools (DB Browser) before running scripts.

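If a script still hits a lock (for example while the two-minute monitor job is writing), a longer busy timeout lets the connection wait for the lock to clear before failing. This is a sketch of that option using the standard `sqlite3` module; `connect_with_timeout` is a hypothetical helper, and whether the project's scripts already set a timeout is not stated here.

```python
import sqlite3


def connect_with_timeout(path: str, timeout_s: float = 10.0) -> sqlite3.Connection:
    """Open the database and wait up to timeout_s for locks held by other connections.

    If the lock is still held after the timeout, the statement raises
    sqlite3.OperationalError ("database is locked").
    """
    return sqlite3.connect(path, timeout=timeout_s)
```

A timeout only helps with short-lived locks. A GUI tool holding an open write transaction will still block writers until it is closed or its changes are committed.
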
## Future Enhancements

- [ ] Keyword filtering alongside vector search
- [ ] Date range queries
- [ ] Source type filtering (e.g., only projects)
- [ ] Embedding quality scoring
- [ ] Auto-summarization improvements

---

**Status:** Fully operational
**Last updated:** 2026-03-02