Files
controls-web/ai_agents/db_agent/knowledge_prep.md
2026-02-17 09:29:34 -06:00

2.9 KiB

Database Agent Knowledge Prep

Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.

1. Schema Catalog

  • Summarize each table: name, purpose, primary keys, row granularity.
  • List columns with data types and short descriptions; call out nullable fields.
  • Document relationships: foreign-key paths, cardinality, and join direction.
  • Recommended format (YAML example):
    tables:
      production_events:
        description: "Aggregated production metrics per line and hour."
        columns:
          - name: event_id
            type: INT IDENTITY PRIMARY KEY
            description: "Unique row ID."
            nullable: false
          - name: line_id
            type: INT
            description: "Foreign key to lines.id."
            nullable: false
        relationships:
          - target: lines.id
            via: production_events.line_id
            cardinality: many-to-one
    

2. Business Semantics

  • Provide glossary entries translating operator language to schema terms (e.g., "downtime" = status_code IN ('STOP','IDLE')).
  • Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
  • Note standard filters such as default time windows or equipment subsets.

3. Constraints & Guardrails

  • Specify the read-only SQL credential and allowed schemas.
  • List restricted tables, PII fields, or aggregates-only policies.
  • Define execution limits: use TOP N defaults, row caps, timeout expectations.
  • Capture timezone rules (UTC vs. local), especially for reporting dates.

4. Column Value Hints

  • Enumerate controlled vocabularies (status codes, shift codes, unit names).
  • Record measurement units and typical ranges to guide threshold suggestions.
  • Mention any sentinel values representing missing or error states.

5. Worked Examples

  • Include natural-language question, approved SQL, and a quick rationale.
  • Aim for 5-10 examples covering joins, filters, time windows, aggregations.
  • Store as Markdown table or JSON array for programmatic retrieval:
    [
      {
        "question": "Show top 10 downtime events last week for line A.",
        "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
        "notes": "Uses UTC timestamps, filters to line A, orders by duration."
      }
    ]
    

6. Delivery Format

  • Consolidate the above into a single Markdown or JSON document checked into db_agent/context/.
  • Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
  • Version changes alongside schema migrations so the agent stays accurate.

7. Maintenance Tips

  • Regenerate the schema summary whenever the database structure changes (use INFORMATION_SCHEMA).
  • Review logs of model-generated SQL to discover recurring gaps in the context.
  • Expand the glossary with real user questions and clarifications over time.