Files
controls-web/ai_agents/db_agent/knowledge_prep.md
2026-02-17 09:29:34 -06:00

68 lines
2.9 KiB
Markdown

# Database Agent Knowledge Prep
Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.
## 1. Schema Catalog
- Summarize each table: name, purpose, primary keys, row granularity.
- List columns with data types and short descriptions; call out nullable fields.
- Document relationships: foreign-key paths, cardinality, and join direction.
- Recommended format (YAML example):
```yaml
tables:
production_events:
description: "Aggregated production metrics per line and hour."
columns:
- name: event_id
type: INT IDENTITY PRIMARY KEY
description: "Unique row ID."
nullable: false
- name: line_id
type: INT
description: "Foreign key to lines.id."
nullable: false
relationships:
- target: lines.id
via: production_events.line_id
cardinality: many-to-one
```
## 2. Business Semantics
- Provide glossary entries translating operator language to schema terms (e.g., "downtime" = `status_code` IN ('STOP','IDLE')).
- Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
- Note standard filters such as default time windows or equipment subsets.
## 3. Constraints & Guardrails
- Specify the read-only SQL credential and allowed schemas.
- List restricted tables, PII fields, or aggregates-only policies.
- Define execution limits: use `TOP` N defaults, row caps, timeout expectations.
- Capture timezone rules (UTC vs. local), especially for reporting dates.
## 4. Column Value Hints
- Enumerate controlled vocabularies (status codes, shift codes, unit names).
- Record measurement units and typical ranges to guide threshold suggestions.
- Mention any sentinel values representing missing or error states.
## 5. Worked Examples
- Include natural-language question, approved SQL, and a quick rationale.
- Aim for 5-10 examples covering joins, filters, time windows, aggregations.
- Store as Markdown table or JSON array for programmatic retrieval:
```json
[
{
"question": "Show top 10 downtime events last week for line A.",
"sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
"notes": "Uses UTC timestamps, filters to line A, orders by duration."
}
]
```
## 6. Delivery Format
- Consolidate the above into a single Markdown or JSON document checked into `db_agent/context/`.
- Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
- Version changes alongside schema migrations so the agent stays accurate.
## 7. Maintenance Tips
- Regenerate the schema summary whenever the database structure changes (use `INFORMATION_SCHEMA`).
- Review logs of model-generated SQL to discover recurring gaps in the context.
- Expand the glossary with real user questions and clarifications over time.