# Copilot Instructions

## Repo Orientation
- Two main toolkits live here: SQL agents in `db_agent/` and `cane_agent/`, plus an anomaly detection pipeline in `anomaly_detection/`.
- `cane_agent/` mirrors `db_agent/`; when altering shared logic, update both (or refactor to a shared module) to keep behavior aligned.
- Context artifacts (`context/`), prompts (`prompting/`), CLIs, and UI layers sit beside each agent for easy packaging.

## SQL Agent (db_agent/, cane_agent/)
- `prompting/prompt_builder.py` loads `context/schema.json`, `glossary.md`, `value_hints.yaml`, and `examples.json` to build the single LLM prompt; pass `table_hints` to shrink the schema section.
- `client.SqlAgentClient` wraps the TGI endpoint (`LlmConfig.base_url`), retries once with a strict formatting reminder, strips unrequested timeframe filters, and normalizes syntax (e.g., removes `ILIKE`, `NULLS LAST`).
- `sql_executor.py` enforces read-only T-SQL: it rejects multi-statements, dangerous verbs, and non-SQL Server constructs, then injects `TOP (max_rows)` before execution.
- DB credentials come from env vars (`DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`) defined via `extractor/settings.DbSettings`; missing vars raise immediately.
- Logs append to `logs/query_log.jsonl` through `log_utils.log_interaction`, capturing sanitized SQL, warnings, and optional feedback tags.
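
The read-only guard described above can be sketched as follows. This is a minimal illustration, not the contents of `sql_executor.py`: the function name `guard_select`, the deny-list, and the regex patterns are all assumptions, and the check is purely textual, so it can produce false positives on keywords or semicolons inside string literals.

```python
import re

# Hypothetical deny-list; the real rules in sql_executor.py may differ.
FORBIDDEN_VERBS = ("INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "ALTER",
                   "CREATE", "TRUNCATE", "EXEC", "EXECUTE", "GRANT", "REVOKE")
# Hypothetical patterns for constructs SQL Server does not accept.
NON_TSQL_PATTERNS = (r"\bILIKE\b", r"\bNULLS\s+(FIRST|LAST)\b", r"\bLIMIT\s+\d+")


def guard_select(sql: str, max_rows: int) -> str:
    """Return a row-capped SELECT statement or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^select\b", statement):
        raise ValueError("only SELECT statements are allowed")
    upper = statement.upper()
    for verb in FORBIDDEN_VERBS:
        if re.search(rf"\b{verb}\b", upper):
            raise ValueError(f"forbidden keyword: {verb}")
    for pattern in NON_TSQL_PATTERNS:
        if re.search(pattern, upper):
            raise ValueError("non-SQL Server construct detected")
    # Leave an explicit TOP untouched; a stricter version would clamp it.
    if re.match(r"(?i)^select\s+(distinct\s+)?top\b", statement):
        return statement
    # Inject TOP right after SELECT (or SELECT DISTINCT).
    return re.sub(
        r"(?i)^select(\s+distinct)?\b",
        lambda m: f"{m.group(0)} TOP ({max_rows})",
        statement,
        count=1,
    )
```

Rejecting by raising keeps the caller's control flow simple: anything that returns is safe to hand to the database driver under the assumptions above.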

## SQL Agent Workflows
- Install deps in a venv: `python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install -r db_agent/requirements.txt` (repeat for `anomaly_detection` if needed).
- Dot-source `db_agent/scripts/Set-DbEnv.ps1` to export `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, and related flags in one step; the script prompts for the password if you omit `-Password`.
- Refresh schema context with `python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json`; adjust module name for `cane_agent`.
- Run ad hoc questions via `python -m db_agent.run_agent "How many loads closed yesterday?" --tables dbo.SugarLoadData --execute --max-rows 200`.
- Serve the UI with `uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000`; the backend lazily instantiates `default_client()` and reuses the same logging path.
- TGI must be up (see `deployment/docker-compose.yml`); model responses are expected at `<base_url>/generate` returning `generated_text` JSON.
- Set both `UI_BASIC_USER` and `UI_BASIC_PASSWORD` to require HTTP Basic auth on the UI and API; leave them unset for local testing. `/health` stays unauthenticated.
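
As a rough sketch of the TGI round trip and the retry-once behavior, the snippet below posts a prompt to `<base_url>/generate`, reads `generated_text`, and retries a single time with a formatting reminder if the reply is not the expected JSON. The helper names (`post_generate`, `parse_reply`, `ask`), the reminder wording, and the `max_new_tokens` value are assumptions for illustration; `SqlAgentClient` may structure this differently.

```python
import json
from typing import Callable
from urllib import request

# Hypothetical reminder text; the real client's wording may differ.
REMINDER = "\nRespond with a single JSON object with keys sql and summary only."


def post_generate(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """POST the prompt to the TGI /generate endpoint and return generated_text."""
    body = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": 512}}
    ).encode("utf-8")
    req = request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["generated_text"]


def parse_reply(text: str) -> dict:
    """Extract the outermost JSON object from the reply and require an 'sql' key."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in model reply")
    data = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(data, dict) or "sql" not in data:
        raise ValueError("reply is missing the 'sql' key")
    return data


def ask(prompt: str, send: Callable[[str], str]) -> dict:
    """Try once; on a malformed reply, retry once with the strict reminder."""
    try:
        return parse_reply(send(prompt))
    except ValueError:
        return parse_reply(send(prompt + REMINDER))
```

Passing `send` as a callable keeps the retry logic testable without a running TGI container; in practice it could be `lambda p: post_generate(base_url, p)`.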

## Anomaly Detection (anomaly_detection/)
- `data_loader.load_timeseries` standardizes CSV ingestion, sorts by timestamp, retains ID columns, and stashes inferred features in `df.attrs["feature_columns"]` for downstream reuse.
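- `feature_engineering.build_feature_matrix` is the single gateway for rolling statistics and rate-of-change features; supply `rolling_windows` in pandas offset notation (`5T`, `15T`, `60T`). pandas 2.2+ deprecates the `T` alias in favor of `min`, so expect a `FutureWarning` on newer versions.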
- `train.py` builds a `StandardScaler` + `IsolationForest` pipeline, persists artifacts under `ml/anomaly_detection/models/`, and writes training scores to `ml/anomaly_detection/outputs/`.
- `detect.py` reloads the artifact, rehydrates feature engineering flags, and enforces feature parity before scoring; `--keep-features`, `--alert-threshold`, and `--top-n` control outputs and diagnostics.
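
The overall shape of the pipeline (rolling features, then `StandardScaler` + `IsolationForest`, then a feature-parity check before scoring) can be sketched as below. This is an illustration under assumptions, not the code in `train.py` or `detect.py`: `add_rolling_features`, `check_parity`, the `flow` column, and the synthetic data are all hypothetical. The windows are written as `5min`/`15min`, the spelling of `5T`/`15T` that avoids the deprecation warning on recent pandas.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def add_rolling_features(df, columns, windows=("5min", "15min")):
    """Return a copy of df with a first difference and rolling mean/std per column."""
    out = df.copy()  # copy first so the caller's frame is not mutated
    for col in columns:
        out[f"{col}_diff"] = out[col].diff()
        for window in windows:
            rolled = out[col].rolling(window, min_periods=1)  # needs a DatetimeIndex
            out[f"{col}_mean_{window}"] = rolled.mean()
            out[f"{col}_std_{window}"] = rolled.std()
    return out.fillna(0.0)


def check_parity(frame, expected):
    """Fail fast if a training feature is missing; return columns in training order."""
    missing = [c for c in expected if c not in frame.columns]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return frame[expected]


# Synthetic one-minute series with an injected spike (illustrative data only).
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=200, freq="min")
raw = pd.DataFrame({"flow": rng.normal(10.0, 1.0, size=200)}, index=index)
raw.iloc[150, 0] = 40.0

features = add_rolling_features(raw, ["flow"])
expected = list(features.columns)

model = Pipeline([
    ("scale", StandardScaler()),
    ("forest", IsolationForest(random_state=0)),
])
model.fit(features)

# Lower decision_function values indicate more anomalous rows.
scores = model.decision_function(check_parity(features, expected))
```

Storing the training column list alongside the model is one simple way to enforce parity at detection time; the repo's artifact format may carry this information differently.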

## Conventions & Tips
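- Existing multi-line command examples use caret (`^`) continuations; mirror that style when documenting multi-line commands. Note that `^` is the cmd.exe continuation character, so a command pasted into PowerShell needs backtick continuations instead.
- Prefer pandas-native operations and avoid mutating inputs in place; helpers copy before augmentation.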
- When extending SQL safety rules or prompt shaping, add the change in both agent copies and keep JSON-only responses from the model (`{"sql": ..., "summary": ...}`).
- No automated tests ship with the repo; verify changes by running the CLIs/UI with representative CSVs and MSSQL connections, and check the JSONL logs for regressions.
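
Since verification relies on reading the JSONL logs, a small script like the following can help spot regressions after a change. It is a sketch only: the `warnings` field name is an assumption about the log record shape, and `summarize_log` is a hypothetical helper, not part of `log_utils`.

```python
import json
from pathlib import Path


def summarize_log(path):
    """Count log records and how many of them carry a non-empty 'warnings' field."""
    total = with_warnings = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        total += 1
        if record.get("warnings"):  # assumed field name
            with_warnings += 1
    return {"total": total, "with_warnings": with_warnings}
```

Comparing the counts from `summarize_log("logs/query_log.jsonl")` before and after a change gives a quick signal on whether a prompt or safety-rule edit introduced new warnings.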