add all files

This commit is contained in:
Rucus
2026-02-17 09:29:34 -06:00
parent b8c8d67c67
commit 782d203799
21925 changed files with 2433086 additions and 0 deletions

View File

@@ -0,0 +1,157 @@
# NL2SQL — Natural Language Search for Historian Data
This document captures the ongoing work to add conversational, open-ended search capabilities to the LASUCA controls dashboard. The goal is to let operators ask questions in plain English (e.g., *"Show me Boiler 3 steam for the last 24 hours"*) and receive historian data without needing to know the underlying SQL.
---
## 1. Background
### Current state
| Layer | Description |
|-------|-------------|
| **Data store** | SQL Server historian on `192.168.0.13\SQLEXPRESS`, database `history`. Core tables: `dbo.archive` (timestamped values), `dbo.id_names` (tag metadata). |
| **Search UI** | PHP forms (`opcsearch.php`, trend pages) with pre-built queries driven by dropdowns (tag picker, date range). |
| **RAG infrastructure** | Vector DB already in use for memory/context; SQL agents consume context files to help generate queries. |
### Objective
Replace or augment the rigid form-based search with a conversational interface that:
1. Accepts natural-language questions.
2. Translates them to valid SQL via an LLM agent.
3. Executes the query safely (read-only, row-limited).
4. Returns results in a friendly format (table, chart link, plain-English summary).
---
## 2. Architecture overview
```
┌──────────────┐ ┌───────────────────┐ ┌─────────────────┐
│ Chat UI │─────▶│ nl_query.php │─────▶│ SQL Agent │
│ (browser) │ │ (orchestrator) │ │ (Python/LLM) │
└──────────────┘ └───────────────────┘ └────────┬────────┘
┌──────────────────────────────────────────────────┘
┌────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Schema context │ │ Few-shot store │ │ Tag index │
│ (schema.md) │ │ (vector DB) │ │ (id_names embed)│
└────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└──────────────────────┴─────────────────────┘
┌─────────────────┐
│ SQL Server │
│ (history DB) │
└─────────────────┘
```
### Key components
| Component | Purpose | Status |
|-----------|---------|--------|
| **Schema context** | Compact description of tables, columns, types, relationships, and sample rows. Gives the LLM structural knowledge. | Export created; cleanup in progress. |
| **Few-shot store** | Curated pairs of `(user_query, sql_template)` embedded in vector DB. Retriever surfaces the most relevant examples at query time. | Seeding strategy defined; logging table created. |
| **Tag index** | `id_names` rows embedded so fuzzy tag references resolve to exact names. | Planned. |
| **Query log** | Captures form-based searches to bootstrap few-shot examples and track usage patterns. | Table schema created (`dbo.controls_query_log`). |
| **Orchestrator endpoint** | PHP (or Python) service that accepts NL prompt, calls agent, validates SQL, executes, returns results. | Not started. |
| **Chat UI** | Lightweight front-end alongside existing forms. | Not started. |
---
## 3. Query logging
To seed few-shot examples from real usage, we created a logging table in SQL Server.
**Location:** `sql/create_query_log.sql`
### Table: `dbo.controls_query_log`
| Column | Type | Description |
|--------|------|-------------|
| `id` | INT (identity) | Primary key. |
| `user_query` | NVARCHAR(1000) | Natural-language description derived from form inputs. |
| `sql_template` | NVARCHAR(MAX) | Parameterized SQL with placeholders (`:tag`, `:start`, `:end`). |
| `params` | NVARCHAR(MAX) | JSON blob of actual parameter values. |
| `source_form` | NVARCHAR(100) | Originating form (e.g., `opcsearch`, `trend`). |
| `user_session` | NVARCHAR(100) | Optional session/user ID. |
| `execution_ms` | INT | Query runtime in milliseconds. |
| `row_count` | INT | Rows returned. |
| `created_at` | DATETIME2 | UTC timestamp (defaults to `SYSUTCDATETIME()`). |
### Helper view: `dbo.vw_controls_query_templates`
Aggregates distinct `sql_template` values with usage counts and a sample `user_query` for each—useful when exporting few-shot pairs.
### Deduplication strategy
Form queries are repetitive (same SQL template, different parameters). When seeding the vector DB:
1. Group by `sql_template`.
2. Pick one representative `user_query` per template (or synthesize a canonical description).
3. Embed the `user_query`; attach `sql_template` as metadata.
---
## 4. Safety guardrails
| Guardrail | Implementation |
|-----------|----------------|
| **Read-only connection** | Agent uses a login with `db_datareader` only; no write permissions. |
| **Row limit injection** | Generated SQL wrapped with `SELECT TOP 1000 ...` or validated to include a cap. |
| **Table/column allow-list** | Agent context explicitly lists permitted objects; LLM instructed to reject queries outside scope. |
| **Syntax validation** | Run `SET PARSEONLY ON` or use `sqlparse` before execution. |
| **Audit log** | Every generated query stored in `query_log` with execution stats. |
---
## 5. Next steps
### Immediate (in progress)
- [x] Create `controls_query_log` table on `lasucaai` database (192.168.0.16)
- [x] Build logging helper (`includes/log_query.php`)
- [x] Instrument `opcsearch.php` to log searches
- [ ] Deploy to production and gather data for a few days
### Short-term (after data collection)
- [ ] Review `vw_controls_query_templates` — analyze distinct query patterns
- [ ] Clean up schema export (`docs/historian-readme.md`) — remove noise, add sample rows
- [ ] Seed tag index — embed `id_names` into vector DB for fuzzy tag resolution
- [ ] Export few-shot pairs — write script to pull user_query ↔ sql_template pairs from log
- [ ] Instrument other search forms (trends, boiler averages) if needed
### Medium-term (build the agent)
- [ ] Build orchestrator endpoint (`nl_query.php` or Python sidecar)
- [ ] Integrate schema context + few-shot retrieval + tag index
- [ ] Add SQL validation step (syntax check before execution)
- [ ] Create chat UI — text input alongside existing forms
- [ ] Implement safety guardrails (read-only connection, row limits, allow-list)
### Long-term (iterate & improve)
- [ ] Review agent failures and add corrective few-shot pairs
- [ ] Add result summarization (natural-language answers, not just tables)
- [ ] Consider chart generation from NL queries
- [ ] Expand to other databases if needed
---
## 6. File reference
| File | Purpose |
|------|---------|
| `sql/create_query_log.sql` | Creates `dbo.controls_query_log` table and `vw_controls_query_templates` view (run on `lasucaai` DB). |
| `includes/log_query.php` | PHP helper to log searches; connects to 192.168.0.16/lasucaai. |
| `docs/historian-readme.md` | Existing historian schema documentation. |
| `docs/nl2sql.md` | This document. |
---
## 7. Database connections
| Purpose | Server | Database | User |
|---------|--------|----------|------|
| Historian (read) | 192.168.0.13\SQLEXPRESS | history | opce |
| AI/Logging (write) | 192.168.0.16 | lasucaai | lasucaai |
---
*Last updated: December 16, 2025*