2.9 KiB
2.9 KiB
Database Agent Knowledge Prep
Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.
1. Schema Catalog
- Summarize each table: name, purpose, primary keys, row granularity.
- List columns with data types and short descriptions; call out nullable fields.
- Document relationships: foreign-key paths, cardinality, and join direction.
- Recommended format (YAML example):
tables: production_events: description: "Aggregated production metrics per line and hour." columns: - name: event_id type: INT IDENTITY PRIMARY KEY description: "Unique row ID." nullable: false - name: line_id type: INT description: "Foreign key to lines.id." nullable: false relationships: - target: lines.id via: production_events.line_id cardinality: many-to-one
2. Business Semantics
- Provide glossary entries translating operator language to schema terms (e.g., "downtime" =
status_codeIN ('STOP','IDLE')). - Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
- Note standard filters such as default time windows or equipment subsets.
3. Constraints & Guardrails
- Specify the read-only SQL credential and allowed schemas.
- List restricted tables, PII fields, or aggregates-only policies.
- Define execution limits: use
TOPN defaults, row caps, timeout expectations. - Capture timezone rules (UTC vs. local), especially for reporting dates.
4. Column Value Hints
- Enumerate controlled vocabularies (status codes, shift codes, unit names).
- Record measurement units and typical ranges to guide threshold suggestions.
- Mention any sentinel values representing missing or error states.
5. Worked Examples
- Include natural-language question, approved SQL, and a quick rationale.
- Aim for 5-10 examples covering joins, filters, time windows, aggregations.
- Store as Markdown table or JSON array for programmatic retrieval:
[ { "question": "Show top 10 downtime events last week for line A.", "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;", "notes": "Uses UTC timestamps, filters to line A, orders by duration." } ]
6. Delivery Format
- Consolidate the above into a single Markdown or JSON document checked into
db_agent/context/. - Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
- Version changes alongside schema migrations so the agent stays accurate.
7. Maintenance Tips
- Regenerate the schema summary whenever the database structure changes (use
INFORMATION_SCHEMA). - Review logs of model-generated SQL to discover recurring gaps in the context.
- Expand the glossary with real user questions and clarifications over time.