# NL2SQL — Natural Language Search for Historian Data This document captures the ongoing work to add conversational, open-ended search capabilities to the LASUCA controls dashboard. The goal is to let operators ask questions in plain English (e.g., *"Show me Boiler 3 steam for the last 24 hours"*) and receive historian data without needing to know the underlying SQL. --- ## 1. Background ### Current state | Layer | Description | |-------|-------------| | **Data store** | SQL Server historian on `192.168.0.13\SQLEXPRESS`, database `history`. Core tables: `dbo.archive` (timestamped values), `dbo.id_names` (tag metadata). | | **Search UI** | PHP forms (`opcsearch.php`, trend pages) with pre-built queries driven by dropdowns (tag picker, date range). | | **RAG infrastructure** | Vector DB already in use for memory/context; SQL agents consume context files to help generate queries. | ### Objective Replace or augment the rigid form-based search with a conversational interface that: 1. Accepts natural-language questions. 2. Translates them to valid SQL via an LLM agent. 3. Executes the query safely (read-only, row-limited). 4. Returns results in a friendly format (table, chart link, plain-English summary). --- ## 2. Architecture overview ``` ┌──────────────┐ ┌───────────────────┐ ┌─────────────────┐ │ Chat UI │─────▶│ nl_query.php │─────▶│ SQL Agent │ │ (browser) │ │ (orchestrator) │ │ (Python/LLM) │ └──────────────┘ └───────────────────┘ └────────┬────────┘ │ ┌──────────────────────────────────────────────────┘ ▼ ┌────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Schema context │ │ Few-shot store │ │ Tag index │ │ (schema.md) │ │ (vector DB) │ │ (id_names embed)│ └────────────────┘ └─────────────────┘ └─────────────────┘ │ │ │ └──────────────────────┴─────────────────────┘ │ ▼ ┌─────────────────┐ │ SQL Server │ │ (history DB) │ └─────────────────┘ ``` ### Key components | Component | Purpose | Status | |-----------|---------|--------| | **Schema context** | Compact description of tables, columns, types, relationships, and sample rows. Gives the LLM structural knowledge. | Export created; cleanup in progress. | | **Few-shot store** | Curated pairs of `(user_query, sql_template)` embedded in vector DB. Retriever surfaces the most relevant examples at query time. | Seeding strategy defined; logging table created. | | **Tag index** | `id_names` rows embedded so fuzzy tag references resolve to exact names. | Planned. | | **Query log** | Captures form-based searches to bootstrap few-shot examples and track usage patterns. | Table schema created (`dbo.controls_query_log`). | | **Orchestrator endpoint** | PHP (or Python) service that accepts NL prompt, calls agent, validates SQL, executes, returns results. | Not started. | | **Chat UI** | Lightweight front-end alongside existing forms. | Not started. | --- ## 3. Query logging To seed few-shot examples from real usage, we created a logging table in SQL Server. **Location:** `sql/create_query_log.sql` ### Table: `dbo.controls_query_log` | Column | Type | Description | |--------|------|-------------| | `id` | INT (identity) | Primary key. | | `user_query` | NVARCHAR(1000) | Natural-language description derived from form inputs. | | `sql_template` | NVARCHAR(MAX) | Parameterized SQL with placeholders (`:tag`, `:start`, `:end`). | | `params` | NVARCHAR(MAX) | JSON blob of actual parameter values. | | `source_form` | NVARCHAR(100) | Originating form (e.g., `opcsearch`, `trend`). | | `user_session` | NVARCHAR(100) | Optional session/user ID. | | `execution_ms` | INT | Query runtime in milliseconds. | | `row_count` | INT | Rows returned. | | `created_at` | DATETIME2 | UTC timestamp (defaults to `SYSUTCDATETIME()`). | ### Helper view: `dbo.vw_controls_query_templates` Aggregates distinct `sql_template` values with usage counts and a sample `user_query` for each—useful when exporting few-shot pairs. ### Deduplication strategy Form queries are repetitive (same SQL template, different parameters). When seeding the vector DB: 1. Group by `sql_template`. 2. Pick one representative `user_query` per template (or synthesize a canonical description). 3. Embed the `user_query`; attach `sql_template` as metadata. --- ## 4. Safety guardrails | Guardrail | Implementation | |-----------|----------------| | **Read-only connection** | Agent uses a login with `db_datareader` only; no write permissions. | | **Row limit injection** | Generated SQL wrapped with `SELECT TOP 1000 ...` or validated to include a cap. | | **Table/column allow-list** | Agent context explicitly lists permitted objects; LLM instructed to reject queries outside scope. | | **Syntax validation** | Run `SET PARSEONLY ON` or use `sqlparse` before execution. | | **Audit log** | Every generated query stored in `query_log` with execution stats. | --- ## 5. Next steps ### Immediate (in progress) - [x] Create `controls_query_log` table on `lasucaai` database (192.168.0.16) - [x] Build logging helper (`includes/log_query.php`) - [x] Instrument `opcsearch.php` to log searches - [ ] Deploy to production and gather data for a few days ### Short-term (after data collection) - [ ] Review `vw_controls_query_templates` — analyze distinct query patterns - [ ] Clean up schema export (`docs/historian-readme.md`) — remove noise, add sample rows - [ ] Seed tag index — embed `id_names` into vector DB for fuzzy tag resolution - [ ] Export few-shot pairs — write script to pull user_query ↔ sql_template pairs from log - [ ] Instrument other search forms (trends, boiler averages) if needed ### Medium-term (build the agent) - [ ] Build orchestrator endpoint (`nl_query.php` or Python sidecar) - [ ] Integrate schema context + few-shot retrieval + tag index - [ ] Add SQL validation step (syntax check before execution) - [ ] Create chat UI — text input alongside existing forms - [ ] Implement safety guardrails (read-only connection, row limits, allow-list) ### Long-term (iterate & improve) - [ ] Review agent failures and add corrective few-shot pairs - [ ] Add result summarization (natural-language answers, not just tables) - [ ] Consider chart generation from NL queries - [ ] Expand to other databases if needed --- ## 6. File reference | File | Purpose | |------|---------| | `sql/create_query_log.sql` | Creates `dbo.controls_query_log` table and `vw_controls_query_templates` view (run on `lasucaai` DB). | | `includes/log_query.php` | PHP helper to log searches; connects to 192.168.0.16/lasucaai. | | `docs/historian-readme.md` | Existing historian schema documentation. | | `docs/nl2sql.md` | This document. | --- ## 7. Database connections | Purpose | Server | Database | User | |---------|--------|----------|------| | Historian (read) | 192.168.0.13\SQLEXPRESS | history | opce | | AI/Logging (write) | 192.168.0.16 | lasucaai | lasucaai | --- *Last updated: December 16, 2025*