controls-web/ai_agents/cane_agent/knowledge_prep.md

# Database Agent Knowledge Prep

Use this checklist when preparing context for the natural-language SQL agent targeting our MSSQL sandbox.

## 1. Schema Catalog
- Summarize each table: name, purpose, primary keys, row granularity.
- List columns with data types and short descriptions; call out nullable fields.
- Document relationships: foreign-key paths, cardinality, and join direction.
- Recommended format (YAML example):
  ```yaml
  tables:
    production_events:
      description: "Aggregated production metrics per line and hour."
      columns:
        - name: event_id
          type: INT IDENTITY PRIMARY KEY
          description: "Unique row ID."
          nullable: false
        - name: line_id
          type: INT
          description: "Foreign key to lines.id."
          nullable: false
      relationships:
        - target: lines.id
          via: production_events.line_id
          cardinality: many-to-one
  ```

## 2. Business Semantics
- Provide glossary entries translating operator language to schema terms (e.g., "downtime" = `status_code` IN ('STOP','IDLE')).
- Explain derived metrics (OEE, throughput, scrap rate) and point to source columns.
- Note standard filters such as default time windows or equipment subsets.

## 3. Constraints & Guardrails
- Specify the read-only SQL credential and allowed schemas.
- List restricted tables, PII fields, or aggregates-only policies.
- Define execution limits: use `TOP` N defaults, row caps, timeout expectations.
- Capture timezone rules (UTC vs. local), especially for reporting dates.

## 4. Column Value Hints
- Enumerate controlled vocabularies (status codes, shift codes, unit names).
- Record measurement units and typical ranges to guide threshold suggestions.
- Mention any sentinel values representing missing or error states.

## 5. Worked Examples
- Include natural-language question, approved SQL, and a quick rationale.
- Aim for 5-10 examples covering joins, filters, time windows, aggregations.
- Store as Markdown table or JSON array for programmatic retrieval:
  ```json
  [
    {
      "question": "Show top 10 downtime events last week for line A.",
      "sql": "SELECT TOP (10) event_id, start_time, duration_min FROM downtime_events WHERE line_id = 'A' AND start_time >= DATEADD(day, -7, SYSUTCDATETIME()) ORDER BY duration_min DESC;",
      "notes": "Uses UTC timestamps, filters to line A, orders by duration."
    }
  ]
  ```

## 6. Delivery Format
- Consolidate the above into a single Markdown or JSON document checked into `db_agent/context/`.
- Keep the file under 20 KB so it can be injected directly into prompts; otherwise plan a retrieval step.
- Version changes alongside schema migrations so the agent stays accurate.

## 7. Maintenance Tips
- Regenerate the schema summary whenever the database structure changes (use `INFORMATION_SCHEMA`).
- Review logs of model-generated SQL to discover recurring gaps in the context.
- Expand the glossary with real user questions and clarifications over time.