# Database Agent Knowledge Prep Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox. ## 1. Schema Catalog - Summarize each table: name, purpose, primary keys, row granularity. - List columns with data types and short descriptions; call out nullable fields. - Document relationships: foreign-key paths, cardinality, and join direction. - Recommended format (YAML example): ```yaml tables: production_events: description: "Aggregated production metrics per line and hour." columns: - name: event_id type: INT IDENTITY PRIMARY KEY description: "Unique row ID." nullable: false - name: line_id type: INT description: "Foreign key to lines.id." nullable: false relationships: - target: lines.id via: production_events.line_id cardinality: many-to-one ``` ## 2. Business Semantics - Provide glossary entries translating operator language to schema terms (e.g., "downtime" = `status_code` IN ('STOP','IDLE')). - Explain derived metrics (OEE, throughput, scrap rate) and point to source columns. - Note standard filters such as default time windows or equipment subsets. ## 3. Constraints & Guardrails - Specify the read-only SQL credential and allowed schemas. - List restricted tables, PII fields, or aggregates-only policies. - Define execution limits: use `TOP` N defaults, row caps, timeout expectations. - Capture timezone rules (UTC vs. local), especially for reporting dates. ## 4. Column Value Hints - Enumerate controlled vocabularies (status codes, shift codes, unit names). - Record measurement units and typical ranges to guide threshold suggestions. - Mention any sentinel values representing missing or error states. ## 5. Worked Examples - Include natural-language question, approved SQL, and a quick rationale. - Aim for 5-10 examples covering joins, filters, time windows, aggregations. - Store as Markdown table or JSON array for programmatic retrieval: ```json [ { "question": "Show top 10 downtime events last week for line A.", "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;", "notes": "Uses UTC timestamps, filters to line A, orders by duration." } ] ``` ## 6. Delivery Format - Consolidate the above into a single Markdown or JSON document checked into `db_agent/context/`. - Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step. - Version changes alongside schema migrations so the agent stays accurate. ## 7. Maintenance Tips - Regenerate the schema summary whenever the database structure changes (use `INFORMATION_SCHEMA`). - Review logs of model-generated SQL to discover recurring gaps in the context. - Expand the glossary with real user questions and clarifications over time.