openclaw-workspace/memory/projects/memory-vector-system.md
2026-04-11 09:45:12 -05:00

Memory Vector System

Status: Supermonkey-Powered
Created: 2026-03-02
Replaces: Supermemory cloud embedding API


Overview

Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline.

Architecture

File-Based Pipeline

Memory Files (markdown)
    ↓
memory_embedding_worker.py
    ↓
Ollama (nomic-embed-text) → 768-dim vectors
    ↓
SQLite + sqlite-vector extension
    ↓
Cosine similarity search
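
The final stage of this pipeline, cosine similarity search, can be illustrated in plain Python. This is a minimal sketch of the ranking idea only, not how the sqlite-vector extension works internally; `cosine_similarity` and `top_k` are hypothetical names, and the toy 3-dim vectors stand in for the 768-dim embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query, items, k=5):
    """Rank (label, vector) pairs by similarity to the query, best first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dim vectors standing in for 768-dim embeddings
memories = [
    ("lights", [1.0, 0.0, 0.0]),
    ("thermostat", [0.9, 0.1, 0.0]),
    ("recipes", [0.0, 0.0, 1.0]),
]
print(top_k([1.0, 0.0, 0.0], memories, k=2))
```

A brute-force scan like this is linear in the number of stored vectors, which is usually acceptable for a few thousand embeddings.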

Real-Time Session Pipeline (NEW)

Discord/Chat Messages
    ↓
OpenClaw Session Transcript (.jsonl)
    ↓
session_monitor.py (cron every 2 min)
    ↓
Count messages → At 15: summarize
    ↓
Ollama (nomic-embed-text)
    ↓
SQLite + sqlite-vector

Innovation: We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking!

Components

Core Module

File: memory_vector.py

  • MemoryVectorDB class — database wrapper
  • store_memory() — save embedding
  • search_memories() — semantic search
  • setup_memory_vectors() — one-time init

Worker Scripts

File: tools/memory_embedding_worker.py

  • Daily/batch processing for memory.md files
  • Processes section-by-section
  • Called by cron at 3 AM
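
Section-by-section processing could look roughly like the following. It is a sketch that assumes sections are delimited by markdown headings (lines starting with `#`); the worker's actual splitting rule is not shown in this document, and `split_sections` is a hypothetical name.

```python
def split_sections(markdown_text):
    """Split markdown into (heading, body) pairs, one per heading."""
    sections = []
    heading, body = "(preamble)", []
    for line in markdown_text.splitlines():
        if line.startswith("#"):
            # Close the previous section before starting a new one
            if body or heading != "(preamble)":
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    # Drop sections with no text; there is nothing to embed in them
    return [(h, b) for h, b in sections if b]

sample = "# Monday\nFixed the thermostat.\n\n## Ideas\nTry local embeddings.\n"
for heading, body in split_sections(sample):
    print(heading, "->", body)
```

Note that this simple rule also treats a `#` comment inside a code block as a heading, so a real worker may need something stricter.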

File: tools/session_monitor.py (NEW)

  • Reads OpenClaw session transcripts live
  • Tracks message counts automatically
  • Creates snapshots every 15 messages
  • Cron: every 2 minutes
  • Database: session_tracking table

File: tools/session_snapshotter.py

  • Manual session capture (legacy)
  • Use session_monitor.py for auto-tracking

File: tools/search_memories.py

  • CLI tool for manual searches
  • Interactive or one-shot mode

File: tools/bulk_memory_loader.py

  • One-time historical import
  • Processed 1,186 embeddings on first run

File: scripts/memory-embeddings-cron.ps1

  • PowerShell wrapper for daily cron
  • Checks Ollama availability

Database Schema

memory_embeddings

| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| source_type | TEXT | "daily", "memory_md", "project", "session_snapshot", "auto_session" |
| source_path | TEXT | File path + section or timestamp |
| content_text | TEXT | First 500 chars (searchable preview) |
| embedding | BLOB | 768-dim Float32 vector (3,072 bytes) |
| created_at | TIMESTAMP | Auto-set |
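
For illustration, the columns above map onto SQLite roughly as follows. The DDL is inferred from the column list, not copied from memory_vector.py, and the sketch uses only the standard-library `sqlite3` and `struct` modules (no sqlite-vector extension). `pack_vector` and `unpack_vector` are hypothetical helpers, little-endian byte order is an assumption, and the source path is a made-up example.

```python
import sqlite3
import struct

DIM = 768  # nomic-embed-text output size

def pack_vector(vec):
    """Serialize floats as little-endian Float32 (4 bytes each)."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack_vector(blob):
    """Inverse of pack_vector."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")  # in-memory DB for the demo
conn.execute("""
    CREATE TABLE memory_embeddings (
        id INTEGER PRIMARY KEY,
        source_type TEXT,
        source_path TEXT,
        content_text TEXT,
        embedding BLOB,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

text = "Example note about home automation. " * 30
blob = pack_vector([0.5] * DIM)  # placeholder vector, not a real embedding
conn.execute(
    "INSERT INTO memory_embeddings (source_type, source_path, content_text, embedding) "
    "VALUES (?, ?, ?, ?)",
    ("daily", "memory/2026-03-02.md#Notes", text[:500], blob),  # example path
)
conn.commit()
print(len(blob))  # 768 floats * 4 bytes each
```

Storing only the first 500 characters keeps rows small while still giving search results a readable preview.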

session_tracking (NEW)

| Column | Type | Description |
|---|---|---|
| session_id | TEXT | OpenClaw session UUID |
| channel_key | TEXT | discord:channel_id |
| transcript_path | TEXT | Path to .jsonl file |
| last_message_index | INTEGER | Last processed line |
| messages_since_snapshot | INTEGER | Counter since last embed |
| last_checkpoint_time | TIMESTAMP | Last check |
| is_active | BOOLEAN | Session still exists? |
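
A checkpoint write against this table might look like the sketch below. The DDL is inferred from the column list; making `session_id` the primary key is an assumption (one row per session), and `save_checkpoint` is a hypothetical helper, not a function from session_monitor.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for the demo
conn.execute("""
    CREATE TABLE session_tracking (
        session_id TEXT PRIMARY KEY,   -- assumption: one row per session
        channel_key TEXT,
        transcript_path TEXT,
        last_message_index INTEGER DEFAULT 0,
        messages_since_snapshot INTEGER DEFAULT 0,
        last_checkpoint_time TIMESTAMP,
        is_active BOOLEAN DEFAULT 1
    )
""")

def save_checkpoint(conn, session_id, channel_key, path, last_index, pending):
    """Insert or overwrite the checkpoint row for one session."""
    conn.execute(
        "INSERT OR REPLACE INTO session_tracking "
        "(session_id, channel_key, transcript_path, last_message_index, "
        " messages_since_snapshot, last_checkpoint_time, is_active) "
        "VALUES (?, ?, ?, ?, ?, CURRENT_TIMESTAMP, 1)",
        (session_id, channel_key, path, last_index, pending),
    )
    conn.commit()

# Two runs of the monitor for the same (made-up) session
save_checkpoint(conn, "demo-session", "discord:123", "demo-session.jsonl", 40, 7)
save_checkpoint(conn, "demo-session", "discord:123", "demo-session.jsonl", 52, 12)
print(conn.execute(
    "SELECT last_message_index, messages_since_snapshot FROM session_tracking"
).fetchall())
```

Because the second call replaces the first row, the table always holds the latest position per session.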

Cron Schedule

| Job | Schedule | What It Does |
|---|---|---|
| Memory Embeddings | Daily 3:00 AM | Process yesterday's memory file |
| Session Monitor (NEW) | Every 2 min | Reads transcripts, auto-snapshots at 15 msgs |
| Session Snapshots (legacy) | Manual | Manual capture via script |

How Session Monitor Works:

  1. Reads .openclaw/agents/main/sessions/*.jsonl
  2. Tracks last_message_index per session
  3. Counts new user messages
  4. At 15 messages: summarize → embed → store
  5. Updates checkpoint in session_tracking table
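
The steps above can be sketched as two small functions. This is an illustration under stated assumptions, not the code in session_monitor.py: it assumes each JSONL record carries a top-level "role" field, that `last_message_index` is a 1-based line number, and that the counter resets to zero after a snapshot. Both function names are hypothetical.

```python
import json

SNAPSHOT_THRESHOLD = 15  # user messages between snapshots

def count_new_user_messages(transcript_path, last_index):
    """Return (new_last_index, user_message_count) for lines after last_index."""
    user_messages = 0
    new_index = last_index
    with open(transcript_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no <= last_index:
                continue  # already processed on an earlier run
            new_index = line_no
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip blank or malformed lines
            # Assumption: each record carries a top-level "role" field
            if isinstance(record, dict) and record.get("role") == "user":
                user_messages += 1
    return new_index, user_messages

def should_snapshot(pending, new_messages, threshold=SNAPSHOT_THRESHOLD):
    """Return (snapshot_now, new_pending_count)."""
    total = pending + new_messages
    if total >= threshold:
        return True, 0  # caller would summarize, embed, store, then reset
    return False, total
```

One caveat of this sketch: a line that is only half written when the monitor runs is skipped and never revisited, so a stricter version might stop at the first undecodable line instead of advancing past it.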

Usage

# Search by query
python tools/search_memories.py "home assistant automation"

# Interactive mode
python tools/search_memories.py --interactive

Manual Snapshot

python tools/session_snapshotter.py "Summary of important discussion"

From Python

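import requests
from memory_vector import search_memories

# Generate the query embedding with Ollama (default local endpoint assumed)
resp = requests.post("http://localhost:11434/api/embeddings",
                     json={"model": "nomic-embed-text", "prompt": "home assistant automation"})
query_embedding = resp.json()["embedding"]  # 768 floats

# Then search (assumes search_memories accepts a list of floats)
results = search_memories(query_embedding, k=5)
# Returns: [(source_path, content_text, distance), ...]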

Stats

| Metric | Value |
|---|---|
| Total embeddings | 1,623 |
| Daily notes | 818 |
| Project files | 332 |
| MEMORY.md | 33 |
| Manual session snapshots | 2 |
| Auto session snapshots | 27 |
| Tracked sessions | 245 |
| Active sessions | 243 |
| Database size | ~5 MB |

Live Stats Query:

python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()"
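
The same three counts are easier to read and extend as a small function. This is a sketch, with `live_stats` as a hypothetical name; the demo runs against a throwaway in-memory database with both tables stubbed out, so it does not touch the real memory.db.

```python
import sqlite3

def live_stats(conn):
    """Return the three counts from the one-liner as a dict."""
    def scalar(sql, params=()):
        return conn.execute(sql, params).fetchone()[0]
    return {
        "total": scalar("SELECT COUNT(*) FROM memory_embeddings"),
        "auto_snapshots": scalar(
            "SELECT COUNT(*) FROM memory_embeddings WHERE source_type = ?",
            ("auto_session",),
        ),
        "active_sessions": scalar(
            "SELECT COUNT(*) FROM session_tracking WHERE is_active = 1"
        ),
    }

# Demo data only: stub tables with just the columns the queries need
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE memory_embeddings (source_type TEXT)")
demo.execute("CREATE TABLE session_tracking (is_active BOOLEAN)")
demo.executemany(
    "INSERT INTO memory_embeddings VALUES (?)",
    [("daily",), ("auto_session",), ("auto_session",)],
)
demo.execute("INSERT INTO session_tracking VALUES (1)")
print(live_stats(demo))
```

To query the real database, pass `sqlite3.connect(...)` opened on the memory.db path from the one-liner instead of `demo`.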

Check Session Monitor Status:

# See last run
python tools/session_monitor.py

# Check if cron is running
openclaw cron list | findstr /C:"Session Monitor"

The Innovation

Problem: How to automatically capture live conversations without manual tracking?

Solution: Read OpenClaw's own session transcripts!

OpenClaw stores every session in .openclaw/agents/main/sessions/[session-id].jsonl. We discovered we can:

  1. Monitor these files live — cron job every 2 minutes
  2. Track line position via the last_message_index checkpoint
  3. Count user messages — parse JSONL for role: "user"
  4. Auto-snapshot at threshold — 15 messages → summarize → embed

Why this matters:

  • No manual message counting
  • Survives session restarts
  • Multi-channel aware (each channel = separate session file)
  • No OpenClaw hooks required (we read their existing data)

Credit: Corey's genius idea 💡 (Corey, 2026-03-03)


Comparison: Old vs New

| Feature | Supermemory | SQLite-Vector |
|---|---|---|
| Cloud dependency | Required | None |
| API limits | Yes | No |
| Offline use | No | Yes |
| Embeddings stored | Cloud | Local |
| Search speed | Network latency | <100ms local |
| Reliability | Crashing | Stable |
| Cost | API-based | Free |

Troubleshooting

Ollama not running

# Check status
Invoke-RestMethod -Uri "http://localhost:11434/api/tags"

# Start Ollama
ollama serve
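
The same availability check can be done from Python before a script starts embedding. This is a minimal sketch using only the standard library; `ollama_is_up` is a hypothetical helper, and the default URL is the local endpoint used in the PowerShell check above.

```python
import urllib.request

def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if GET {base_url}/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts derive from OSError
        return False

print("Ollama reachable:", ollama_is_up())
```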

Missing model

ollama pull nomic-embed-text

Database locked

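SQLite allows only one writer at a time, so a GUI tool (such as DB Browser) with uncommitted changes can cause "database is locked" errors. Close any such tools before running scripts.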
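
When a brief lock is unavoidable, a script can also wait for it to clear instead of failing immediately. The sketch below shows the standard `sqlite3` timeout settings; whether the project's scripts already set them is not stated here, and `open_db` is a hypothetical helper.

```python
import sqlite3

def open_db(db_path, wait_seconds=30):
    """Open the database and wait for locks instead of failing at once."""
    conn = sqlite3.connect(db_path, timeout=wait_seconds)
    # Same setting expressed in SQLite's own terms (milliseconds)
    conn.execute(f"PRAGMA busy_timeout = {int(wait_seconds * 1000)}")
    return conn

conn = open_db(":memory:")  # in-memory DB for the demo
print(conn.execute("PRAGMA busy_timeout").fetchone()[0])
```

A timeout only helps with short-lived locks; a GUI tool that holds a write transaction open indefinitely still has to be closed.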

Future Enhancements

  • Keyword filtering alongside vector search
  • Date range queries
  • Source type filtering (e.g., only projects)
  • Embedding quality scoring
  • Auto-summarization improvements

Status: Fully operational
Last updated: 2026-03-02