Database Agent Knowledge Prep

Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.

1. Schema Catalog

Summarize each table: name, purpose, primary keys, row granularity.
List columns with data types and short descriptions; call out nullable fields.
Document relationships: foreign-key paths, cardinality, and join direction.

Recommended format (YAML example):

tables:
  production_events:
    description: "Aggregated production metrics per line and hour."
    columns:
      - name: event_id
        type: INT IDENTITY PRIMARY KEY
        description: "Unique row ID."
        nullable: false
      - name: line_id
        type: INT
        description: "Foreign key to lines.id."
        nullable: false
    relationships:
      - target: lines.id
        via: production_events.line_id
        cardinality: many-to-one
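
One way to turn a catalog like this into prompt text is sketched below. It is a minimal illustration, not the project's actual loader: the `CATALOG` dict is a hand-copied stand-in for the YAML above (a real pipeline would presumably parse the file), and `render_catalog` is a hypothetical helper name.

```python
# Hypothetical in-memory copy of the YAML example; a real pipeline would
# load the file with a YAML parser instead of hard-coding it.
CATALOG = {
    "tables": {
        "production_events": {
            "description": "Aggregated production metrics per line and hour.",
            "columns": [
                {"name": "event_id", "type": "INT IDENTITY PRIMARY KEY",
                 "description": "Unique row ID.", "nullable": False},
                {"name": "line_id", "type": "INT",
                 "description": "Foreign key to lines.id.", "nullable": False},
            ],
            "relationships": [
                {"target": "lines.id", "via": "production_events.line_id",
                 "cardinality": "many-to-one"},
            ],
        }
    }
}


def render_catalog(catalog: dict) -> str:
    """Flatten the catalog into compact, line-oriented text for a prompt."""
    lines = []
    for table, spec in catalog["tables"].items():
        lines.append(f"TABLE {table}: {spec['description']}")
        for col in spec.get("columns", []):
            null = "NULL" if col["nullable"] else "NOT NULL"
            lines.append(f"  {col['name']} {col['type']} {null} - {col['description']}")
        for rel in spec.get("relationships", []):
            lines.append(f"  JOIN {rel['via']} -> {rel['target']} ({rel['cardinality']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_catalog(CATALOG))
```

Keeping one line per column makes the rendered text easy to diff when the schema changes.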

2. Business Semantics

Provide glossary entries translating operator language to schema terms (e.g., "downtime" = status_code IN ('STOP','IDLE')).
Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
Note standard filters such as default time windows or equipment subsets.
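
A glossary can be kept as a simple mapping from operator phrases to SQL predicates. The sketch below assumes that shape; only the "downtime" entry comes from this checklist, and `glossary_hints` is a hypothetical helper that uses plain substring matching.

```python
# Hypothetical glossary structure: operator phrase -> SQL predicate.
# Only the "downtime" entry is taken from the checklist.
GLOSSARY = {
    "downtime": "status_code IN ('STOP','IDLE')",
}


def glossary_hints(question: str, glossary: dict) -> list:
    """Return 'term = predicate' hints for glossary terms found in the question."""
    lowered = question.lower()
    return [f'"{term}" = {pred}' for term, pred in glossary.items() if term in lowered]


if __name__ == "__main__":
    for hint in glossary_hints("How much downtime did line 3 have?", GLOSSARY):
        print(hint)
```

Substring matching is crude (it ignores synonyms and inflections), so treat it as a starting point for deciding which glossary entries to include in a prompt.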

3. Constraints & Guardrails

Specify the read-only SQL credential and allowed schemas.
List restricted tables, PII fields, or aggregates-only policies.
Define execution limits: a default TOP (N) clause, row caps, and timeout expectations.
Capture timezone rules (UTC vs. local), especially for reporting dates.
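
Some of these guardrails can also be checked in code before a generated query runs. The following is a rough, regex-based sketch rather than a real SQL parser; the restricted table names and the default cap are made-up placeholders, and the read-only credential remains the actual enforcement layer.

```python
import re

# Hypothetical policy values; substitute the real restricted tables and cap.
RESTRICTED_TABLES = {"employees", "payroll"}
DEFAULT_TOP = 100

# Keywords that should never appear in a read-only query. This is deliberately
# conservative: it also flags these words inside string literals.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|truncate|exec|execute|create|grant)\b",
    re.IGNORECASE,
)


def check_sql(sql: str) -> list:
    """Return a list of policy violations; an empty list means the query passes."""
    problems = []
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        problems.append("multiple statements")
    if not re.match(r"(?i)(select|with)\b", stripped):
        problems.append("not a SELECT statement")
    if FORBIDDEN.search(stripped):
        problems.append("contains a write or DDL keyword")
    if not re.search(r"(?i)\bselect\s+top\s*\(?\s*\d+", stripped):
        problems.append(f"missing TOP clause (default is TOP ({DEFAULT_TOP}))")
    for table in sorted(RESTRICTED_TABLES):
        if re.search(rf"(?i)\b{re.escape(table)}\b", stripped):
            problems.append(f"references restricted table {table}")
    return problems


if __name__ == "__main__":
    print(check_sql("SELECT name FROM payroll"))
```

A check like this is best used to reject or flag queries for review, not as a substitute for database permissions.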

4. Column Value Hints

Enumerate controlled vocabularies (status codes, shift codes, unit names).
Record measurement units and typical ranges to guide threshold suggestions.
Mention any sentinel values representing missing or error states.
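
Value hints can be recorded in a machine-readable form so that tooling around the agent can use them too. The sketch below assumes a per-column dictionary; the column name, unit, range, and sentinel value are illustrative placeholders, not values from the sandbox.

```python
# Hypothetical hints for one column; the unit, range, and sentinel are
# illustrative and should be replaced with documented values.
COLUMN_HINTS = {
    "duration_min": {"unit": "minutes", "typical_range": (0, 480), "sentinels": {-1}},
}


def clean_value(column, value, hints=COLUMN_HINTS):
    """Map sentinel values to None so they are not treated as measurements."""
    hint = hints.get(column)
    if hint and value in hint["sentinels"]:
        return None
    return value


def is_typical(column, value, hints=COLUMN_HINTS):
    """True when a non-sentinel value falls inside the documented typical range."""
    hint = hints.get(column)
    if hint is None or clean_value(column, value, hints) is None:
        return False
    low, high = hint["typical_range"]
    return low <= value <= high
```

Separating sentinel handling from range checks keeps a missing-data marker such as -1 from being reported as an out-of-range measurement.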

5. Worked Examples

For each example, include the natural-language question, the approved SQL, and a brief rationale.
Aim for 5-10 examples covering joins, filters, time windows, and aggregations.

Store the examples as a Markdown table or a JSON array for programmatic retrieval:

[
  {
    "question": "Show top 10 downtime events last week for line A.",
    "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
    "notes": "Interprets 'last week' as the trailing 7 days in UTC, filters to line A, and orders by duration, longest first."
  }
]
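
Programmatic retrieval over such an array can start out very simple. The sketch below loads the JSON and ranks examples by word overlap with the incoming question; `best_examples` and the overlap scoring are assumptions for illustration, and a production setup might use embeddings instead.

```python
import json

# The example array from this section, embedded as a string for the demo.
EXAMPLES_JSON = """
[
  {
    "question": "Show top 10 downtime events last week for line A.",
    "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
    "notes": "Uses UTC timestamps, filters to line A, orders by duration."
  }
]
"""


def best_examples(question, examples, k=3):
    """Rank stored examples by how many words they share with the question."""
    words = set(question.lower().split())
    scored = []
    for ex in examples:
        overlap = len(words & set(ex["question"].lower().split()))
        if overlap:
            scored.append((overlap, ex))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:k]]


if __name__ == "__main__":
    examples = json.loads(EXAMPLES_JSON)
    for ex in best_examples("downtime events for line B", examples):
        print(ex["sql"])
```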

6. Delivery Format

Consolidate the above into a single Markdown or JSON document checked into db_agent/context/.
Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
Version changes alongside schema migrations so the agent stays accurate.
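
The size rule lends itself to an automated check, for instance in CI. This is a minimal sketch that reads "20 KB" as 20 * 1024 bytes; `delivery_mode` is a hypothetical helper and the caller supplies the path to the context file.

```python
from pathlib import Path

# The checklist's 20 KB budget, interpreted here as 20 * 1024 bytes.
MAX_BYTES = 20 * 1024


def delivery_mode(path) -> str:
    """Return 'inline' if the context file is under the budget, else 'retrieval'."""
    size = Path(path).stat().st_size
    return "inline" if size < MAX_BYTES else "retrieval"
```

A result of "retrieval" signals that the document should be split and served through a retrieval step rather than injected whole.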

7. Maintenance Tips

Regenerate the schema summary whenever the database structure changes (for example, by querying the INFORMATION_SCHEMA views).
Review logs of model-generated SQL to discover recurring gaps in the context.
Expand the glossary with real user questions and clarifications over time.
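
Regenerating the schema summary could look roughly like the sketch below. The query uses the standard `INFORMATION_SCHEMA.COLUMNS` view; how it is executed is left open, and `summarize_columns` is a hypothetical helper that expects the result rows as plain tuples.

```python
from collections import defaultdict

# INFORMATION_SCHEMA.COLUMNS is a standard SQL Server view; run this query
# with whatever database driver the project already uses.
COLUMNS_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
"""


def summarize_columns(rows):
    """Group (schema, table, column, type, is_nullable) rows into a catalog dict."""
    tables = defaultdict(list)
    for schema, table, column, data_type, is_nullable in rows:
        tables[f"{schema}.{table}"].append({
            "name": column,
            "type": data_type,
            "nullable": is_nullable == "YES",  # the view reports 'YES' or 'NO'
        })
    return dict(tables)
```

Descriptions, purposes, and row granularity still have to be written by hand, since the view only exposes structural metadata.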
