Files
controls-web/docs/nl2sql.md
2026-02-17 12:44:37 -06:00

7.9 KiB

NL2SQL — Natural Language Search for Historian Data

This document captures the ongoing work to add conversational, open-ended search capabilities to the LASUCA controls dashboard. The goal is to let operators ask questions in plain English (e.g., "Show me Boiler 3 steam for the last 24 hours") and receive historian data without needing to know the underlying SQL.


1. Background

Current state

Layer Description
Data store SQL Server historian on 192.168.0.13\SQLEXPRESS, database history. Core tables: dbo.archive (timestamped values), dbo.id_names (tag metadata).
Search UI PHP forms (opcsearch.php, trend pages) with pre-built queries driven by dropdowns (tag picker, date range).
RAG infrastructure Vector DB already in use for memory/context; SQL agents consume context files to help generate queries.

Objective

Replace or augment the rigid form-based search with a conversational interface that:

  1. Accepts natural-language questions.
  2. Translates them to valid SQL via an LLM agent.
  3. Executes the query safely (read-only, row-limited).
  4. Returns results in a friendly format (table, chart link, plain-English summary).

2. Architecture overview

┌──────────────┐      ┌───────────────────┐      ┌─────────────────┐
│  Chat UI     │─────▶│  nl_query.php     │─────▶│  SQL Agent      │
│  (browser)   │      │  (orchestrator)   │      │  (Python/LLM)   │
└──────────────┘      └───────────────────┘      └────────┬────────┘
                                                          │
       ┌──────────────────────────────────────────────────┘
       ▼
┌────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Schema context │   │ Few-shot store  │   │ Tag index       │
│ (schema.md)    │   │ (vector DB)     │   │ (id_names embed)│
└────────────────┘   └─────────────────┘   └─────────────────┘
       │                      │                     │
       └──────────────────────┴─────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │ SQL Server      │
                    │ (history DB)    │
                    └─────────────────┘

Key components

Component Purpose Status
Schema context Compact description of tables, columns, types, relationships, and sample rows. Gives the LLM structural knowledge. Export created; cleanup in progress.
Few-shot store Curated pairs of (user_query, sql_template) embedded in vector DB. Retriever surfaces the most relevant examples at query time. Seeding strategy defined; logging table created.
Tag index id_names rows embedded so fuzzy tag references resolve to exact names. Planned.
Query log Captures form-based searches to bootstrap few-shot examples and track usage patterns. Table schema created (dbo.controls_query_log).
Orchestrator endpoint PHP (or Python) service that accepts NL prompt, calls agent, validates SQL, executes, returns results. Not started.
Chat UI Lightweight front-end alongside existing forms. Not started.

3. Query logging

To seed few-shot examples from real usage, we created a logging table in SQL Server.

Location: sql/create_query_log.sql

Table: dbo.controls_query_log

Column Type Description
id INT (identity) Primary key.
user_query NVARCHAR(1000) Natural-language description derived from form inputs.
sql_template NVARCHAR(MAX) Parameterized SQL with placeholders (:tag, :start, :end).
params NVARCHAR(MAX) JSON blob of actual parameter values.
source_form NVARCHAR(100) Originating form (e.g., opcsearch, trend).
user_session NVARCHAR(100) Optional session/user ID.
execution_ms INT Query runtime in milliseconds.
row_count INT Rows returned.
created_at DATETIME2 UTC timestamp (defaults to SYSUTCDATETIME()).

Helper view: dbo.vw_controls_query_templates

Aggregates distinct sql_template values with usage counts and a sample user_query for each—useful when exporting few-shot pairs.

Deduplication strategy

Form queries are repetitive (same SQL template, different parameters). When seeding the vector DB:

  1. Group by sql_template.
  2. Pick one representative user_query per template (or synthesize a canonical description).
  3. Embed the user_query; attach sql_template as metadata.

4. Safety guardrails

Guardrail Implementation
Read-only connection Agent uses a login with db_datareader only; no write permissions.
Row limit injection Generated SQL wrapped with SELECT TOP 1000 ... or validated to include a cap.
Table/column allow-list Agent context explicitly lists permitted objects; LLM instructed to reject queries outside scope.
Syntax validation Run SET PARSEONLY ON or use sqlparse before execution.
Audit log Every generated query stored in query_log with execution stats.

5. Next steps

Immediate (in progress)

  • Create controls_query_log table on lasucaai database (192.168.0.16)
  • Build logging helper (includes/log_query.php)
  • Instrument opcsearch.php to log searches
  • Deploy to production and gather data for a few days

Short-term (after data collection)

  • Review vw_controls_query_templates — analyze distinct query patterns
  • Clean up schema export (docs/historian-readme.md) — remove noise, add sample rows
  • Seed tag index — embed id_names into vector DB for fuzzy tag resolution
  • Export few-shot pairs — write script to pull user_query ↔ sql_template pairs from log
  • Instrument other search forms (trends, boiler averages) if needed

Medium-term (build the agent)

  • Build orchestrator endpoint (nl_query.php or Python sidecar)
  • Integrate schema context + few-shot retrieval + tag index
  • Add SQL validation step (syntax check before execution)
  • Create chat UI — text input alongside existing forms
  • Implement safety guardrails (read-only connection, row limits, allow-list)

Long-term (iterate & improve)

  • Review agent failures and add corrective few-shot pairs
  • Add result summarization (natural-language answers, not just tables)
  • Consider chart generation from NL queries
  • Expand to other databases if needed

6. File reference

File Purpose
sql/create_query_log.sql Creates dbo.controls_query_log table and vw_controls_query_templates view (run on lasucaai DB).
includes/log_query.php PHP helper to log searches; connects to 192.168.0.16/lasucaai.
docs/historian-readme.md Existing historian schema documentation.
docs/nl2sql.md This document.

7. Database connections

Purpose Server Database User
Historian (read) 192.168.0.13\SQLEXPRESS history opce
AI/Logging (write) 192.168.0.16 lasucaai lasucaai

Last updated: December 16, 2025