NL2SQL — Natural Language Search for Historian Data
This document captures the ongoing work to add conversational, open-ended search capabilities to the LASUCA controls dashboard. The goal is to let operators ask questions in plain English (e.g., "Show me Boiler 3 steam for the last 24 hours") and receive historian data without needing to know the underlying SQL.
1. Background
Current state
| Layer | Description |
|---|---|
| Data store | SQL Server historian on 192.168.0.13\SQLEXPRESS, database history. Core tables: dbo.archive (timestamped values), dbo.id_names (tag metadata). |
| Search UI | PHP forms (opcsearch.php, trend pages) with pre-built queries driven by dropdowns (tag picker, date range). |
| RAG infrastructure | Vector DB already in use for memory/context; SQL agents consume context files to help generate queries. |
Objective
Replace or augment the rigid form-based search with a conversational interface that:
- Accepts natural-language questions.
- Translates them to valid SQL via an LLM agent.
- Executes the query safely (read-only, row-limited).
- Returns results in a friendly format (table, chart link, plain-English summary).
2. Architecture overview
┌──────────────┐      ┌───────────────────┐      ┌─────────────────┐
│   Chat UI    │─────▶│   nl_query.php    │─────▶│    SQL Agent    │
│  (browser)   │      │  (orchestrator)   │      │  (Python/LLM)   │
└──────────────┘      └───────────────────┘      └────────┬────────┘
                                                          │
        ┌─────────────────────────────────────────────────┘
        ▼
┌────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Schema context │     │ Few-shot store  │     │    Tag index    │
│  (schema.md)   │     │   (vector DB)   │     │ (id_names embed)│
└────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        └───────────────────────┴───────────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   SQL Server    │
                       │  (history DB)   │
                       └─────────────────┘
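The flow in the diagram can be sketched as a single function that receives each component as a callable. This is only an illustration of the intended order of operations: `Answer`, `answer_question`, and every parameter name are hypothetical, and the stub wiring at the bottom stands in for the real tag index, LLM call, validator, and database connection.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    sql: str       # the validated SQL that was executed
    rows: list     # raw result rows
    summary: str   # short plain-English description


def answer_question(
    question: str,
    resolve_tags: Callable[[str], list],
    build_prompt: Callable[[str, list], str],
    generate_sql: Callable[[str], str],
    validate_sql: Callable[[str], str],
    run_query: Callable[[str], list],
) -> Answer:
    tags = resolve_tags(question)              # tag index lookup
    prompt = build_prompt(question, tags)      # schema context + few-shot examples
    sql = validate_sql(generate_sql(prompt))   # guardrails run before execution
    rows = run_query(sql)                      # read-only connection
    return Answer(sql, rows, f"{len(rows)} row(s) returned")


# Stub wiring, for illustration only (tag name and SQL are made up).
demo = answer_question(
    "Show me Boiler 3 steam for the last 24 hours",
    resolve_tags=lambda q: ["BOILER3_STEAM"],
    build_prompt=lambda q, tags: f"{q} | tags={tags}",
    generate_sql=lambda prompt: "SELECT TOP 1000 * FROM dbo.archive",
    validate_sql=lambda sql: sql,
    run_query=lambda sql: [],
)
print(demo.summary)
```

Passing the components in as callables keeps the orchestrator testable without a live LLM or database, whichever of PHP or Python ends up hosting it.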
Key components
| Component | Purpose | Status |
|---|---|---|
| Schema context | Compact description of tables, columns, types, relationships, and sample rows. Gives the LLM structural knowledge. | Export created; cleanup in progress. |
| Few-shot store | Curated pairs of (user_query, sql_template) embedded in vector DB. Retriever surfaces the most relevant examples at query time. | Seeding strategy defined; logging table created. |
| Tag index | id_names rows embedded so fuzzy tag references resolve to exact names. | Planned. |
| Query log | Captures form-based searches to bootstrap few-shot examples and track usage patterns. | Table schema created (dbo.controls_query_log). |
| Orchestrator endpoint | PHP (or Python) service that accepts NL prompt, calls agent, validates SQL, executes, returns results. | Not started. |
| Chat UI | Lightweight front-end alongside existing forms. | Not started. |
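As a rough sketch of how the schema context and few-shot store could combine at query time, the snippet below retrieves the closest examples and assembles a prompt. It makes several assumptions: a bag-of-words cosine similarity stands in for the vector DB's embedding search, `FEW_SHOTS`, `retrieve`, and `build_prompt` are hypothetical names, and the SQL bodies are deliberately elided because the column layout of dbo.archive is not described here.

```python
import math
import re
from collections import Counter

# Hypothetical few-shot pairs; real ones would be exported from the query log.
FEW_SHOTS = [
    {"user_query": "Show steam flow for boiler 3 over the last 24 hours",
     "sql_template": "SELECT TOP 1000 ... /* :tag, :start, :end */"},
    {"user_query": "List all tags whose name contains mill",
     "sql_template": "SELECT TOP 1000 ... FROM dbo.id_names /* :pattern */"},
    {"user_query": "Average boiler pressure per hour yesterday",
     "sql_template": "SELECT TOP 1000 ... /* :tag, :start, :end */"},
]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list:
    """Return the k stored examples most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(FEW_SHOTS,
                    key=lambda ex: cosine(q, bag_of_words(ex["user_query"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, schema_context: str,
                 resolved_tags: list, k: int = 2) -> str:
    """Assemble schema, examples, resolved tags, and the question."""
    parts = ["### Schema", schema_context.strip(), "### Examples"]
    for ex in retrieve(question, k):
        parts.append(f"Q: {ex['user_query']}\nSQL: {ex['sql_template']}")
    parts += ["### Resolved tags", ", ".join(resolved_tags) or "(none)",
              "### Question", question]
    return "\n\n".join(parts)
```

Putting the question last keeps the instructions and examples in a stable prefix, which may help if the LLM provider caches prompts.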
3. Query logging
To seed few-shot examples from real usage, we created a logging table in SQL Server.
Location: sql/create_query_log.sql
Table: dbo.controls_query_log
| Column | Type | Description |
|---|---|---|
| id | INT (identity) | Primary key. |
| user_query | NVARCHAR(1000) | Natural-language description derived from form inputs. |
| sql_template | NVARCHAR(MAX) | Parameterized SQL with placeholders (:tag, :start, :end). |
| params | NVARCHAR(MAX) | JSON blob of actual parameter values. |
| source_form | NVARCHAR(100) | Originating form (e.g., opcsearch, trend). |
| user_session | NVARCHAR(100) | Optional session/user ID. |
| execution_ms | INT | Query runtime in milliseconds. |
| row_count | INT | Rows returned. |
| created_at | DATETIME2 | UTC timestamp (defaults to SYSUTCDATETIME()). |
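To make the column layout concrete, here is a minimal sketch of how one log record might be assembled before insertion. The helper name `build_log_row`, the tag name, and the example values are hypothetical. The SQL template is elided, and the qmark placeholders assume an ODBC-style driver rather than whatever includes/log_query.php actually uses.

```python
import json

# Columns the application supplies; `id` (identity) and `created_at`
# (defaults to SYSUTCDATETIME()) are left to SQL Server.
COLUMNS = ("user_query", "sql_template", "params", "source_form",
           "user_session", "execution_ms", "row_count")

# qmark-style placeholders, as used by ODBC-based drivers (an assumption).
INSERT_SQL = (
    f"INSERT INTO dbo.controls_query_log ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join('?' for _ in COLUMNS)})"
)


def build_log_row(user_query, sql_template, params, source_form,
                  execution_ms, row_count, user_session=None):
    """Return a tuple of values in COLUMNS order, with params as JSON text."""
    if len(user_query) > 1000:
        raise ValueError("user_query exceeds NVARCHAR(1000)")
    if len(source_form) > 100:
        raise ValueError("source_form exceeds NVARCHAR(100)")
    row = {
        "user_query": user_query,
        "sql_template": sql_template,
        "params": json.dumps(params, sort_keys=True),
        "source_form": source_form,
        "user_session": user_session,
        "execution_ms": int(execution_ms),
        "row_count": int(row_count),
    }
    return tuple(row[c] for c in COLUMNS)


# Illustrative values only.
example = build_log_row(
    user_query="Values for tag BOILER3_STEAM over one day",
    sql_template="SELECT ... /* uses :tag, :start, :end */",
    params={"tag": "BOILER3_STEAM",
            "start": "2025-12-15T00:00:00", "end": "2025-12-16T00:00:00"},
    source_form="opcsearch",
    execution_ms=42,
    row_count=1000,
)
```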
Helper view: dbo.vw_controls_query_templates
Aggregates distinct sql_template values with usage counts and a sample user_query for each, which is useful when exporting few-shot pairs.
Deduplication strategy
Form queries are repetitive (same SQL template, different parameters). When seeding the vector DB:
- Group by sql_template.
- Pick one representative user_query per template (or synthesize a canonical description).
- Embed the user_query; attach sql_template as metadata.
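The three steps above can be sketched as follows. The rule for picking a representative (most frequent phrasing, then shortest) is one possible choice rather than a decided policy, and `seeds_from_log` and the output shape are hypothetical. The actual embedding call depends on the vector DB and is left out.

```python
from collections import Counter, defaultdict


def seeds_from_log(log_rows):
    """Collapse logged rows into one seed record per distinct sql_template."""
    by_template = defaultdict(list)
    for row in log_rows:
        by_template[row["sql_template"]].append(row["user_query"])

    seeds = []
    for template, queries in by_template.items():
        counts = Counter(queries)
        # Representative: most frequent phrasing; ties go to the shorter,
        # then alphabetically first, so the result is deterministic.
        representative = min(counts, key=lambda q: (-counts[q], len(q), q))
        seeds.append({
            "text": representative,  # this is the string to embed
            "metadata": {"sql_template": template,
                         "usage_count": len(queries)},
        })
    # Most-used templates first, so they can be reviewed first.
    return sorted(seeds, key=lambda s: s["metadata"]["usage_count"],
                  reverse=True)
```

Keeping `usage_count` in the metadata makes it easy to drop rarely used templates later if the few-shot store grows noisy.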
4. Safety guardrails
| Guardrail | Implementation |
|---|---|
| Read-only connection | Agent connects with a login whose database user holds only the db_datareader role; no write permissions. |
| Row limit injection | Generated SQL wrapped with SELECT TOP 1000 ... or validated to include a cap. |
| Table/column allow-list | Agent context explicitly lists permitted objects; LLM instructed to reject queries outside scope. |
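| Syntax validation | Run SET PARSEONLY ON on the server before execution (checks syntax without compiling or executing). sqlparse can split and inspect statements client-side, but it is a non-validating parser and will not catch syntax errors on its own. |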
| Audit log | Every generated query stored in the query log (dbo.controls_query_log) with execution stats. |
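A minimal sketch of the row-limit and allow-list checks, using only regular expressions. It is deliberately conservative and not a real SQL parser: it may reject valid queries (for example, string literals containing a semicolon) and it does not handle comma-separated FROM lists, `TOP n PERCENT`, or CTEs. The function name `validate_and_cap` and the exact keyword list are assumptions. The read-only login remains the real backstop.

```python
import re

ALLOWED_TABLES = {"dbo.archive", "dbo.id_names"}
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|"
    r"EXEC|EXECUTE|GRANT|INTO)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\[\]\w.]+)", re.IGNORECASE)
TOP_CLAUSE = re.compile(
    r"^SELECT\s+(DISTINCT\s+)?TOP\s*\(?\s*(\d+)\s*\)?", re.IGNORECASE)
SELECT_HEAD = re.compile(r"^SELECT\s+(DISTINCT\s+)?", re.IGNORECASE)


def validate_and_cap(sql: str, cap: int = 1000) -> str:
    """Return the statement with TOP <= cap, or raise ValueError."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt or "--" in stmt or "/*" in stmt:
        raise ValueError("multiple statements or comments are not allowed")
    if not re.match(r"SELECT\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT statements are allowed")
    if FORBIDDEN.search(stmt):
        raise ValueError("statement contains a forbidden keyword")

    # Normalize [dbo].[archive] -> dbo.archive before comparing.
    tables = {t.replace("[", "").replace("]", "").lower()
              for t in TABLE_REF.findall(stmt)}
    if not tables or not tables <= ALLOWED_TABLES:
        raise ValueError(
            f"table not on allow-list: {sorted(tables - ALLOWED_TABLES)}")

    top = TOP_CLAUSE.match(stmt)
    if top:
        # Clamp an existing TOP n to the cap.
        distinct = "DISTINCT " if top.group(1) else ""
        limit = min(int(top.group(2)), cap)
        return f"SELECT {distinct}TOP {limit} " + stmt[top.end():].lstrip()
    # No TOP present: inject one right after SELECT [DISTINCT].
    head = SELECT_HEAD.match(stmt)
    distinct = "DISTINCT " if head.group(1) else ""
    return f"SELECT {distinct}TOP {cap} " + stmt[head.end():]
```

For example, `validate_and_cap("select top 5000 * from dbo.id_names")` clamps the limit to 1000, while a query against `sys.objects` raises `ValueError`.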
5. Next steps
Immediate (in progress)
- Create controls_query_log table on lasucaai database (192.168.0.16)
- Build logging helper (includes/log_query.php)
- Instrument opcsearch.php to log searches
- Deploy to production and gather data for a few days
Short-term (after data collection)
- Review vw_controls_query_templates: analyze distinct query patterns
- Clean up schema export (docs/historian-readme.md): remove noise, add sample rows
- Seed tag index: embed id_names into vector DB for fuzzy tag resolution
- Export few-shot pairs: write script to pull user_query ↔ sql_template pairs from log
- Instrument other search forms (trends, boiler averages) if needed
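To illustrate what fuzzy tag resolution is meant to achieve, the sketch below maps a loose phrase to candidate tag names. It uses `difflib` string similarity from the standard library as a stand-in for embedding search in the vector DB. The tag names are invented for the example and are not taken from id_names, and `resolve_tag` is a hypothetical helper.

```python
import difflib
import re

# Invented tag names, for illustration only.
TAG_NAMES = [
    "BOILER1_STEAM_FLOW",
    "BOILER3_STEAM_FLOW",
    "BOILER3_DRUM_LEVEL",
    "MILL2_SPEED",
]


def normalize(text: str) -> str:
    """Lowercase and turn punctuation/underscores into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve_tag(phrase: str, tag_names, n: int = 3, cutoff: float = 0.5):
    """Return up to n exact tag names that best match a loose phrase."""
    lookup = {normalize(name): name for name in tag_names}
    hits = difflib.get_close_matches(
        normalize(phrase), list(lookup), n=n, cutoff=cutoff)
    return [lookup[h] for h in hits]


candidates = resolve_tag("boiler 3 steam", TAG_NAMES)
print(candidates)
```

Returning several candidates instead of one leaves room for the chat UI to ask the operator to disambiguate when the top matches are close.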
Medium-term (build the agent)
- Build orchestrator endpoint (nl_query.php or Python sidecar)
- Integrate schema context + few-shot retrieval + tag index
- Add SQL validation step (syntax check before execution)
- Create chat UI — text input alongside existing forms
- Implement safety guardrails (read-only connection, row limits, allow-list)
Long-term (iterate & improve)
- Review agent failures and add corrective few-shot pairs
- Add result summarization (natural-language answers, not just tables)
- Consider chart generation from NL queries
- Expand to other databases if needed
6. File reference
| File | Purpose |
|---|---|
| sql/create_query_log.sql | Creates dbo.controls_query_log table and vw_controls_query_templates view (run on lasucaai DB). |
| includes/log_query.php | PHP helper to log searches; connects to 192.168.0.16/lasucaai. |
| docs/historian-readme.md | Existing historian schema documentation. |
| docs/nl2sql.md | This document. |
7. Database connections
| Purpose | Server | Database | User |
|---|---|---|---|
| Historian (read) | 192.168.0.13\SQLEXPRESS | history | opce |
| AI/Logging (write) | 192.168.0.16 | lasucaai | lasucaai |
Last updated: December 16, 2025