158 lines
7.9 KiB
Markdown
158 lines
7.9 KiB
Markdown
# NL2SQL — Natural Language Search for Historian Data
|
|
|
|
This document captures the ongoing work to add conversational, open-ended search capabilities to the LASUCA controls dashboard. The goal is to let operators ask questions in plain English (e.g., *"Show me Boiler 3 steam for the last 24 hours"*) and receive historian data without needing to know the underlying SQL.
|
|
|
|
---
|
|
|
|
## 1. Background
|
|
|
|
### Current state
|
|
| Layer | Description |
|
|
|-------|-------------|
|
|
| **Data store** | SQL Server historian on `192.168.0.13\SQLEXPRESS`, database `history`. Core tables: `dbo.archive` (timestamped values), `dbo.id_names` (tag metadata). |
|
|
| **Search UI** | PHP forms (`opcsearch.php`, trend pages) with pre-built queries driven by dropdowns (tag picker, date range). |
|
|
| **RAG infrastructure** | Vector DB already in use for memory/context; SQL agents consume context files to help generate queries. |
|
|
|
|
### Objective
|
|
Replace or augment the rigid form-based search with a conversational interface that:
|
|
1. Accepts natural-language questions.
|
|
2. Translates them to valid SQL via an LLM agent.
|
|
3. Executes the query safely (read-only, row-limited).
|
|
4. Returns results in a friendly format (table, chart link, plain-English summary).
|
|
|
|
---
|
|
|
|
## 2. Architecture overview
|
|
|
|
```
|
|
┌──────────────┐ ┌───────────────────┐ ┌─────────────────┐
|
|
│ Chat UI │─────▶│ nl_query.php │─────▶│ SQL Agent │
|
|
│ (browser) │ │ (orchestrator) │ │ (Python/LLM) │
|
|
└──────────────┘ └───────────────────┘ └────────┬────────┘
|
|
│
|
|
┌──────────────────────────────────────────────────┘
|
|
▼
|
|
┌────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
|
│ Schema context │ │ Few-shot store │ │ Tag index │
|
|
│ (schema.md) │ │ (vector DB) │ │ (id_names embed)│
|
|
└────────────────┘ └─────────────────┘ └─────────────────┘
|
|
│ │ │
|
|
└──────────────────────┴─────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────┐
|
|
│ SQL Server │
|
|
│ (history DB) │
|
|
└─────────────────┘
|
|
```
|
|
|
|
### Key components
|
|
|
|
| Component | Purpose | Status |
|
|
|-----------|---------|--------|
|
|
| **Schema context** | Compact description of tables, columns, types, relationships, and sample rows. Gives the LLM structural knowledge. | Export created; cleanup in progress. |
|
|
| **Few-shot store** | Curated pairs of `(user_query, sql_template)` embedded in vector DB. Retriever surfaces the most relevant examples at query time. | Seeding strategy defined; logging table created. |
|
|
| **Tag index** | `id_names` rows embedded so fuzzy tag references resolve to exact names. | Planned. |
|
|
| **Query log** | Captures form-based searches to bootstrap few-shot examples and track usage patterns. | Table schema created (`dbo.controls_query_log`). |
|
|
| **Orchestrator endpoint** | PHP (or Python) service that accepts NL prompt, calls agent, validates SQL, executes, returns results. | Not started. |
|
|
| **Chat UI** | Lightweight front-end alongside existing forms. | Not started. |
|
|
|
|
---
|
|
|
|
## 3. Query logging
|
|
|
|
To seed few-shot examples from real usage, we created a logging table in SQL Server.
|
|
|
|
**Location:** `sql/create_query_log.sql`
|
|
|
|
### Table: `dbo.controls_query_log`
|
|
|
|
| Column | Type | Description |
|
|
|--------|------|-------------|
|
|
| `id` | INT (identity) | Primary key. |
|
|
| `user_query` | NVARCHAR(1000) | Natural-language description derived from form inputs. |
|
|
| `sql_template` | NVARCHAR(MAX) | Parameterized SQL with placeholders (`:tag`, `:start`, `:end`). |
|
|
| `params` | NVARCHAR(MAX) | JSON blob of actual parameter values. |
|
|
| `source_form` | NVARCHAR(100) | Originating form (e.g., `opcsearch`, `trend`). |
|
|
| `user_session` | NVARCHAR(100) | Optional session/user ID. |
|
|
| `execution_ms` | INT | Query runtime in milliseconds. |
|
|
| `row_count` | INT | Rows returned. |
|
|
| `created_at` | DATETIME2 | UTC timestamp (defaults to `SYSUTCDATETIME()`). |
|
|
|
|
### Helper view: `dbo.vw_controls_query_templates`
|
|
|
|
Aggregates distinct `sql_template` values with usage counts and a sample `user_query` for each—useful when exporting few-shot pairs.
|
|
|
|
### Deduplication strategy
|
|
|
|
Form queries are repetitive (same SQL template, different parameters). When seeding the vector DB:
|
|
1. Group by `sql_template`.
|
|
2. Pick one representative `user_query` per template (or synthesize a canonical description).
|
|
3. Embed the `user_query`; attach `sql_template` as metadata.
|
|
|
|
---
|
|
|
|
## 4. Safety guardrails
|
|
|
|
| Guardrail | Implementation |
|
|
|-----------|----------------|
|
|
| **Read-only connection** | Agent uses a login with `db_datareader` only; no write permissions. |
|
|
| **Row limit injection** | Generated SQL wrapped with `SELECT TOP 1000 ...` or validated to include a cap. |
|
|
| **Table/column allow-list** | Agent context explicitly lists permitted objects; LLM instructed to reject queries outside scope. |
|
|
| **Syntax validation** | Run `SET PARSEONLY ON` or use `sqlparse` before execution. |
|
|
| **Audit log** | Every generated query stored in `query_log` with execution stats. |
|
|
|
|
---
|
|
|
|
## 5. Next steps
|
|
|
|
### Immediate (in progress)
|
|
- [x] Create `controls_query_log` table on `lasucaai` database (192.168.0.16)
|
|
- [x] Build logging helper (`includes/log_query.php`)
|
|
- [x] Instrument `opcsearch.php` to log searches
|
|
- [ ] Deploy to production and gather data for a few days
|
|
|
|
### Short-term (after data collection)
|
|
- [ ] Review `vw_controls_query_templates` — analyze distinct query patterns
|
|
- [ ] Clean up schema export (`docs/historian-readme.md`) — remove noise, add sample rows
|
|
- [ ] Seed tag index — embed `id_names` into vector DB for fuzzy tag resolution
|
|
- [ ] Export few-shot pairs — write script to pull user_query ↔ sql_template pairs from log
|
|
- [ ] Instrument other search forms (trends, boiler averages) if needed
|
|
|
|
### Medium-term (build the agent)
|
|
- [ ] Build orchestrator endpoint (`nl_query.php` or Python sidecar)
|
|
- [ ] Integrate schema context + few-shot retrieval + tag index
|
|
- [ ] Add SQL validation step (syntax check before execution)
|
|
- [ ] Create chat UI — text input alongside existing forms
|
|
- [ ] Implement safety guardrails (read-only connection, row limits, allow-list)
|
|
|
|
### Long-term (iterate & improve)
|
|
- [ ] Review agent failures and add corrective few-shot pairs
|
|
- [ ] Add result summarization (natural-language answers, not just tables)
|
|
- [ ] Consider chart generation from NL queries
|
|
- [ ] Expand to other databases if needed
|
|
|
|
---
|
|
|
|
## 6. File reference
|
|
|
|
| File | Purpose |
|
|
|------|---------|
|
|
| `sql/create_query_log.sql` | Creates `dbo.controls_query_log` table and `vw_controls_query_templates` view (run on `lasucaai` DB). |
|
|
| `includes/log_query.php` | PHP helper to log searches; connects to 192.168.0.16/lasucaai. |
|
|
| `docs/historian-readme.md` | Existing historian schema documentation. |
|
|
| `docs/nl2sql.md` | This document. |
|
|
|
|
---
|
|
|
|
## 7. Database connections
|
|
|
|
| Purpose | Server | Database | User |
|
|
|---------|--------|----------|------|
|
|
| Historian (read) | 192.168.0.13\SQLEXPRESS | history | opce |
|
|
| AI/Logging (write) | 192.168.0.16 | lasucaai | lasucaai |
|
|
|
|
---
|
|
|
|
*Last updated: December 16, 2025*
|