admin/openclaw-workspace

OpenClaw Agent 5deb387aa6 Fresh start - excluded large ROM JSON files

2026-04-11 09:45:12 -05:00

Memory Vector System

Status: ✅ Supermonkey-Powered
Created: 2026-03-02
Replaces: Supermemory cloud embedding API

Overview

Local semantic memory search using SQLite + Ollama embeddings. No cloud dependency, no API limits, works offline.

Architecture

File-Based Pipeline

Memory Files (markdown)
    ↓
memory_embedding_worker.py
    ↓
Ollama (nomic-embed-text) → 768-dim vectors
    ↓
SQLite + sqlite-vector extension
    ↓
Cosine similarity search
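
As a rough illustration of the last stage of this pipeline, the sketch below ranks stored Float32 blobs by cosine distance in plain Python. It does not use the sqlite-vector extension, whose API is not shown in this document; `pack_vector`, `unpack_vector`, `cosine_distance`, and `rank` are hypothetical helper names, the file labels are made up, and 3-dim vectors stand in for the real 768-dim ones.

```python
import math
import struct


def pack_vector(values):
    """Serialize floats as little-endian Float32 (4 bytes per dimension)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: bytes back to a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def rank(query, stored, k=5):
    """Return the k (distance, label) pairs closest to the query."""
    scored = [(cosine_distance(query, unpack_vector(blob)), label)
              for label, blob in stored]
    return sorted(scored)[:k]


# Hypothetical stored rows: (label, embedding blob)
stored = [
    ("notes/a.md", pack_vector([1.0, 0.0, 0.0])),
    ("notes/b.md", pack_vector([0.0, 1.0, 0.0])),
    ("notes/c.md", pack_vector([0.9, 0.1, 0.0])),
]
top = rank([1.0, 0.0, 0.0], stored, k=2)
print(top)
```

A brute-force scan like this is only meant to show what "cosine similarity search" computes; the extension presumably does the equivalent work inside SQLite.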

Real-Time Session Pipeline (NEW)

Discord/Chat Messages
    ↓
OpenClaw Session Transcript (.jsonl)
    ↓
session_monitor.py (cron every 2 min)
    ↓
Count messages → At 15: summarize
    ↓
Ollama (nomic-embed-text)
    ↓
SQLite + sqlite-vector

Innovation: We read OpenClaw's own session transcripts to auto-capture conversations without manual tracking!

Components

Core Module

File: memory_vector.py

MemoryVectorDB class — database wrapper
store_memory() — save embedding
search_memories() — semantic search
setup_memory_vectors() — one-time init

Worker Scripts

File: tools/memory_embedding_worker.py

Daily/batch processing for memory.md files
Processes section-by-section
Called by cron at 3 AM

File: tools/session_monitor.py ⭐ NEW

Reads OpenClaw session transcripts live
Tracks message counts automatically
Creates snapshots every 15 messages
Cron: every 2 minutes
Database: session_tracking table

File: tools/session_snapshotter.py

Manual session capture (legacy)
Use session_monitor.py for auto-tracking

File: tools/search_memories.py

CLI tool for manual searches
Interactive or one-shot mode

File: tools/bulk_memory_loader.py

One-time historical import
Processed 1,186 embeddings on first run

File: scripts/memory-embeddings-cron.ps1

PowerShell wrapper for daily cron
Checks Ollama availability

Database Schema

memory_embeddings

Column	Type	Description
id	INTEGER	Primary key
source_type	TEXT	"daily", "memory_md", "project", "session_snapshot", "auto_session"
source_path	TEXT	File path + section or timestamp
content_text	TEXT	First 500 chars (searchable preview)
embedding	BLOB	768-dim Float32 vector (3,072 bytes)
created_at	TIMESTAMP	Auto-set
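
One way the table above could be declared, as a sketch: the column names and types come from the table, while the `NOT NULL` constraints, the `DEFAULT CURRENT_TIMESTAMP` clause, the `store_preview` helper, and the example source path are assumptions rather than the contents of memory_vector.py. An in-memory database is used so the example leaves the real database untouched.

```python
import sqlite3
from array import array

SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_embeddings (
    id           INTEGER PRIMARY KEY,
    source_type  TEXT NOT NULL,
    source_path  TEXT NOT NULL,
    content_text TEXT,
    embedding    BLOB NOT NULL,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def store_preview(db, source_type, source_path, text, vector):
    """Insert one row, keeping only the first 500 chars as the preview."""
    blob = array("f", vector).tobytes()  # Float32: 4 bytes per dimension
    cur = db.execute(
        "INSERT INTO memory_embeddings "
        "(source_type, source_path, content_text, embedding) "
        "VALUES (?, ?, ?, ?)",
        (source_type, source_path, text[:500], blob),
    )
    db.commit()
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
row_id = store_preview(db, "daily", "memory/example.md#Notes",
                       "x" * 800, [0.0] * 768)
length, blob_size = db.execute(
    "SELECT length(content_text), length(embedding) "
    "FROM memory_embeddings WHERE id = ?", (row_id,)
).fetchone()
print(length, blob_size)  # 500 3072
```

The 3,072-byte figure in the table follows directly from 768 dimensions × 4 bytes per Float32 value.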

session_tracking ⭐ NEW

Column	Type	Description
session_id	TEXT	OpenClaw session UUID
channel_key	TEXT	discord:channel_id
transcript_path	TEXT	Path to .jsonl file
last_message_index	INTEGER	Last processed line
messages_since_snapshot	INTEGER	Counter since last embed
last_checkpoint_time	TIMESTAMP	Last check
is_active	BOOLEAN	Session still exists?
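
A possible shape for this table and its checkpoint update, sketched under assumptions: the columns follow the table above, but the primary key on `session_id`, the default values, and the `save_checkpoint` helper are hypothetical, not taken from session_monitor.py. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS session_tracking (
    session_id              TEXT PRIMARY KEY,
    channel_key             TEXT,
    transcript_path         TEXT,
    last_message_index      INTEGER DEFAULT 0,
    messages_since_snapshot INTEGER DEFAULT 0,
    last_checkpoint_time    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active               BOOLEAN DEFAULT 1
)
"""


def save_checkpoint(db, session_id, channel_key, path, last_index, pending):
    """Insert the session row, or update its counters if it already exists."""
    db.execute(
        """
        INSERT INTO session_tracking
            (session_id, channel_key, transcript_path,
             last_message_index, messages_since_snapshot)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(session_id) DO UPDATE SET
            last_message_index      = excluded.last_message_index,
            messages_since_snapshot = excluded.messages_since_snapshot,
            last_checkpoint_time    = CURRENT_TIMESTAMP
        """,
        (session_id, channel_key, path, last_index, pending),
    )
    db.commit()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
# Two monitor runs for the same (made-up) session: the second updates in place.
save_checkpoint(db, "example-session", "discord:123", "example-session.jsonl", 10, 4)
save_checkpoint(db, "example-session", "discord:123", "example-session.jsonl", 22, 9)
rows = db.execute(
    "SELECT last_message_index, messages_since_snapshot FROM session_tracking"
).fetchall()
print(rows)  # [(22, 9)]
```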

Cron Schedule

Job	Schedule	What It Does
Memory Embeddings Daily	3:00 AM	Process yesterday's memory file
Session Monitor ⭐ NEW	Every 2 min	Reads transcripts, auto-snapshots at 15 msgs
Session Snapshots (legacy)	Manual	Manual capture via script

How Session Monitor Works:

Reads .openclaw/agents/main/sessions/*.jsonl
Tracks last_message_index per session
Counts new user messages
At 15 messages: summarize → embed → store
Updates checkpoint in session_tracking table
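
The steps above can be sketched roughly as follows. This is not the code in session_monitor.py: `count_new_user_messages` is a hypothetical helper, and it assumes each transcript line is a JSON object with a top-level `role` field, which may not match the real transcript layout. The demo writes a throwaway transcript rather than reading real session files.

```python
import json
import os
import tempfile

SNAPSHOT_THRESHOLD = 15  # user messages per snapshot, as described above


def count_new_user_messages(transcript_path, last_index):
    """Scan lines after last_index; return (new_last_index, user_count)."""
    user_count = 0
    new_last_index = last_index
    with open(transcript_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no <= last_index:
                continue  # already processed on an earlier run
            new_last_index = line_no
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if isinstance(record, dict) and record.get("role") == "user":
                user_count += 1
    return new_last_index, user_count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example-session.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for i in range(20):
            role = "user" if i % 2 == 0 else "assistant"
            fh.write(json.dumps({"role": role, "content": f"msg {i}"}) + "\n")

    # Pretend the first 4 lines were handled by a previous run.
    last_index, new_users = count_new_user_messages(path, last_index=4)

pending = 8 + new_users  # 8 = hypothetical counter carried over from earlier runs
snapshot_due = pending >= SNAPSHOT_THRESHOLD
print(last_index, new_users, snapshot_due)  # 20 8 True
```

When `snapshot_due` is true, the real monitor would summarize, embed, and store the batch, then reset the counter and save the new `last_message_index`.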

Usage

Search

# Search by query
python tools/search_memories.py "home assistant automation"

# Interactive mode
python tools/search_memories.py --interactive

Manual Snapshot

python tools/session_snapshotter.py "Summary of important discussion"

From Python

from memory_vector import search_memories

# Generate query embedding with Ollama
# Then search
results = search_memories(query_embedding, k=5)
# Returns: [(source_path, content_text, distance), ...]
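
The snippet above leaves the query embedding step as a comment. A minimal sketch of that step, assuming Ollama's local HTTP API on its default port and its `/api/embeddings` endpoint (which takes `model` and `prompt` and returns an `embedding` list): `build_embedding_request` and `embed_query` are hypothetical helper names, and whether `search_memories` expects a plain list or a packed blob is not stated here, so check memory_vector.py before wiring them together.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local port


def build_embedding_request(text, model="nomic-embed-text"):
    """Build the POST request for one embedding (no network access yet)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_query(text, timeout=30):
    """Send the request to a running Ollama server and return the vector."""
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embedding"]


request = build_embedding_request("home assistant automation")
print(request.get_method(), request.full_url)

# With Ollama running and the model pulled:
# query_embedding = embed_query("home assistant automation")
# results = search_memories(query_embedding, k=5)
```

The query must be embedded with the same model as the stored memories (nomic-embed-text), otherwise the distances are not comparable.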

Stats

Metric	Value
Total embeddings	1,623
Daily notes	818
Project files	332
MEMORY.md	33
Manual session snapshots	2
Auto session snapshots ⭐	27
Tracked sessions	245
Active sessions	243
Database size	~5 MB

Live Stats Query:

python -c "import sqlite3; db=sqlite3.connect(r'C:\Users\admin\.openclaw\memory.db'); c=db.cursor(); c.execute('SELECT COUNT(*) FROM memory_embeddings'); print('Total:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM memory_embeddings WHERE source_type=\'auto_session\''); print('Auto snapshots:', c.fetchone()[0]); c.execute('SELECT COUNT(*) FROM session_tracking WHERE is_active=1'); print('Active sessions:', c.fetchone()[0]); db.close()"
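
For a per-source breakdown like the Stats table, a small script is easier to read than the one-liner. This is a sketch: `embedding_counts` is a hypothetical helper, and the demo builds a throwaway in-memory table with only the columns the query needs.

```python
import sqlite3


def embedding_counts(db):
    """Return {source_type: row count} for the memory_embeddings table."""
    rows = db.execute(
        "SELECT source_type, COUNT(*) FROM memory_embeddings "
        "GROUP BY source_type ORDER BY source_type"
    )
    return dict(rows.fetchall())


# Demo data only; pass the real memory.db path to sqlite3.connect()
# to get live numbers.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memory_embeddings (id INTEGER PRIMARY KEY, source_type TEXT)"
)
db.executemany(
    "INSERT INTO memory_embeddings (source_type) VALUES (?)",
    [("daily",), ("daily",), ("auto_session",)],
)
counts = embedding_counts(db)
print(counts)
db.close()
```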

Check Session Monitor Status:

# Run the monitor once by hand and check its output
python tools/session_monitor.py

# Check if cron is running
openclaw cron list | findstr /C:"Session Monitor"

The Innovation ⭐

Problem: How to automatically capture live conversations without manual tracking?

Solution: Read OpenClaw's own session transcripts!

OpenClaw stores every session in .openclaw/agents/main/sessions/[session-id].jsonl. We discovered we can:

Monitor these files live — cron job every 2 minutes
Track line position — last_message_index checkpoint
Count user messages — parse JSONL for role: "user"
Auto-snapshot at threshold — 15 messages → summarize → embed

Why this matters:

✅ No manual message counting
✅ Survives session restarts
✅ Multi-channel aware (each channel = separate session file)
✅ No OpenClaw hooks required (we read their existing data)

Credit: Corey's genius idea 💡 (2026-03-03)

Comparison: Old vs New

Feature	Supermemory	SQLite-Vector
Cloud dependency	Required	None
API limits	Yes	No
Offline use	No	Yes
Embeddings stored	Cloud	Local
Search speed	Network latency	<100ms local
Reliability	Crashing	Stable
Cost	API-based	Free

Troubleshooting

Ollama not running

# Check status
Invoke-RestMethod -Uri "http://localhost:11434/api/tags"

# Start Ollama
ollama serve

Missing model

ollama pull nomic-embed-text

Database locked

Close any GUI tools (DB Browser) before running scripts.
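
If the lock is only brief (for example, two cron jobs overlapping), a longer busy timeout on the connection may also help. This is a sketch, not what the scripts currently do: `connect_patiently` is a hypothetical helper. Python's `sqlite3.connect` accepts a `timeout` in seconds (default 5.0) that controls how long it waits for a lock before raising "database is locked".

```python
import sqlite3


def connect_patiently(path, wait_seconds=30.0):
    """Open the database, waiting up to wait_seconds for locks to clear."""
    return sqlite3.connect(path, timeout=wait_seconds)


# Demo on an in-memory database; use the real memory.db path in practice.
db = connect_patiently(":memory:")
ok = db.execute("SELECT 1").fetchone()
print(ok)
db.close()
```

A GUI tool holding a long write transaction will still block the scripts, so closing it remains the first thing to try.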

Future Enhancements

Keyword filtering alongside vector search
Date range queries
Source type filtering (e.g., only projects)
Embedding quality scoring
Auto-summarization improvements

Status: Fully operational
Last updated: 2026-03-02
