68 lines
2.9 KiB
Markdown
68 lines
2.9 KiB
Markdown
# Database Agent Knowledge Prep
|
|
|
|
Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.
|
|
|
|
## 1. Schema Catalog
|
|
- Summarize each table: name, purpose, primary keys, row granularity.
|
|
- List columns with data types and short descriptions; call out nullable fields.
|
|
- Document relationships: foreign-key paths, cardinality, and join direction.
|
|
- Recommended format (YAML example):
|
|
```yaml
|
|
tables:
|
|
production_events:
|
|
description: "Aggregated production metrics per line and hour."
|
|
columns:
|
|
- name: event_id
|
|
type: INT IDENTITY PRIMARY KEY
|
|
description: "Unique row ID."
|
|
nullable: false
|
|
- name: line_id
|
|
type: INT
|
|
description: "Foreign key to lines.id."
|
|
nullable: false
|
|
relationships:
|
|
- target: lines.id
|
|
via: production_events.line_id
|
|
cardinality: many-to-one
|
|
```
|
|
|
|
## 2. Business Semantics
|
|
- Provide glossary entries translating operator language to schema terms (e.g., "downtime" = `status_code` IN ('STOP','IDLE')).
|
|
- Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
|
|
- Note standard filters such as default time windows or equipment subsets.
|
|
|
|
## 3. Constraints & Guardrails
|
|
- Specify the read-only SQL credential and allowed schemas.
|
|
- List restricted tables, PII fields, or aggregates-only policies.
|
|
- Define execution limits: use `TOP` N defaults, row caps, timeout expectations.
|
|
- Capture timezone rules (UTC vs. local), especially for reporting dates.
|
|
|
|
## 4. Column Value Hints
|
|
- Enumerate controlled vocabularies (status codes, shift codes, unit names).
|
|
- Record measurement units and typical ranges to guide threshold suggestions.
|
|
- Mention any sentinel values representing missing or error states.
|
|
|
|
## 5. Worked Examples
|
|
- Include natural-language question, approved SQL, and a quick rationale.
|
|
- Aim for 5-10 examples covering joins, filters, time windows, aggregations.
|
|
- Store as Markdown table or JSON array for programmatic retrieval:
|
|
```json
|
|
[
|
|
{
|
|
"question": "Show top 10 downtime events last week for line A.",
|
|
"sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
|
|
"notes": "Uses UTC timestamps, filters to line A, orders by duration."
|
|
}
|
|
]
|
|
```
|
|
|
|
## 6. Delivery Format
|
|
- Consolidate the above into a single Markdown or JSON document checked into `db_agent/context/`.
|
|
- Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
|
|
- Version changes alongside schema migrations so the agent stays accurate.
|
|
|
|
## 7. Maintenance Tips
|
|
- Regenerate the schema summary whenever the database structure changes (use `INFORMATION_SCHEMA`).
|
|
- Review logs of model-generated SQL to discover recurring gaps in the context.
|
|
- Expand the glossary with real user questions and clarifications over time.
|