Copilot Instructions

Repo Orientation

Two main toolkits live here: SQL agents in db_agent/ and cane_agent/, plus an anomaly detection pipeline in anomaly_detection/.
cane_agent/ mirrors db_agent/; when altering shared logic, update both (or refactor to a shared module) to keep behavior aligned.
Context artifacts (context/), prompts (prompting/), CLIs, and UI layers live inside each agent's directory (e.g., db_agent/context/, db_agent/ui/) for easy packaging.

prompting/prompt_builder.py loads context/schema.json, glossary.md, value_hints.yaml, and examples.json to build the single LLM prompt; pass table_hints to shrink the schema section.
client.SqlAgentClient wraps the Text Generation Inference (TGI) endpoint (LlmConfig.base_url), retries once with a strict formatting reminder, strips unrequested timeframe filters, and normalizes syntax (e.g., removes ILIKE, NULLS LAST).
sql_executor.py enforces read-only T-SQL: it rejects multi-statements, dangerous verbs, and non-SQL Server constructs, then injects TOP (max_rows) before execution.
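
The normalization and read-only checks described above can be pictured with a minimal sketch. It is not the code in client.py or sql_executor.py: the function names, the verb list, and the choice to rewrite ILIKE as LIKE are assumptions made for illustration, and the checks are deliberately conservative (a semicolon or blocked keyword inside a string literal is also rejected, as are CTEs).

```python
import re

# Hypothetical verb list; the real sql_executor.py may block a different set.
FORBIDDEN_VERBS = {
    "INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "ALTER",
    "TRUNCATE", "CREATE", "EXEC", "EXECUTE", "GRANT", "REVOKE",
}


def normalize_sql(sql: str) -> str:
    """Rewrite common PostgreSQL-isms into syntax SQL Server accepts."""
    # Assumption: ILIKE becomes LIKE (SQL Server collations are often case-insensitive).
    sql = re.sub(r"\bILIKE\b", "LIKE", sql, flags=re.IGNORECASE)
    # T-SQL has no NULLS FIRST/LAST clause, so drop it.
    sql = re.sub(r"\s+NULLS\s+(?:FIRST|LAST)\b", "", sql, flags=re.IGNORECASE)
    return sql.strip()


def guard_and_limit(sql: str, max_rows: int) -> str:
    """Accept a single SELECT statement and cap its row count with TOP."""
    body = sql.strip().rstrip(";").strip()
    if ";" in body:
        raise ValueError("multiple statements are not allowed")
    # Whole-word tokens only, so a column such as updated_at is not flagged.
    tokens = set(re.findall(r"[A-Za-z_]+", body.upper()))
    blocked = tokens & FORBIDDEN_VERBS
    if blocked:
        raise ValueError(f"forbidden keyword(s): {sorted(blocked)}")
    head_match = re.match(r"(?i)select(\s+distinct)?\b", body)
    if head_match is None:
        raise ValueError("only SELECT statements are allowed")
    if re.match(r"(?i)select\s+(?:distinct\s+)?top\b", body):
        return body  # already limited; leave the existing TOP untouched
    head = head_match.group(0)
    return f"{head} TOP ({int(max_rows)}){body[len(head):]}"
```

For example, guard_and_limit(normalize_sql(model_sql), 200) would return a statement that starts with SELECT TOP (200), or raise ValueError before anything reaches the database.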
DB credentials come from env vars (DB_SERVER, DB_DATABASE, DB_USERNAME, DB_PASSWORD) defined via extractor/settings.DbSettings; missing vars raise immediately.
Logs append to logs/query_log.jsonl through log_utils.log_interaction, capturing sanitized SQL, warnings, and optional feedback tags.
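
As a rough illustration of the fail-fast credential loading and the append-only JSONL log described above, the sketch below uses only the standard library. DbSettingsSketch, append_log_record, and the record field names are hypothetical stand-ins; the real DbSettings and log_utils.log_interaction may differ in fields and signatures.

```python
import json
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_VARS = ("DB_SERVER", "DB_DATABASE", "DB_USERNAME", "DB_PASSWORD")


@dataclass(frozen=True)
class DbSettingsSketch:
    """Hypothetical stand-in for DbSettings."""
    server: str
    database: str
    username: str
    password: str

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        # Fail immediately and name every missing variable at once.
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise RuntimeError("missing environment variables: " + ", ".join(missing))
        return cls(*(env[name] for name in REQUIRED_VARS))


def append_log_record(path, question, sql, warnings=(), feedback=None):
    """Append one interaction as a single JSON line (field names are assumed)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sql": sql,
        "warnings": list(warnings),
        "feedback": feedback,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps earlier lines intact; one JSON object per line.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Because each line is an independent JSON object, checking the log for regressions can be as simple as reading the file line by line and calling json.loads on each one.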

Install deps in a venv: python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install -r db_agent/requirements.txt (repeat for anomaly_detection if needed).
Dot-source db_agent/scripts/Set-DbEnv.ps1 to export DB_SERVER, DB_DATABASE, DB_USERNAME, DB_PASSWORD, and related flags in one step; the script prompts for the password if you omit -Password.
Refresh schema context with python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json; adjust module name for cane_agent.
Run ad hoc questions via python -m db_agent.run_agent "How many loads closed yesterday?" --tables dbo.SugarLoadData --execute --max-rows 200.
Serve the UI with uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000; the backend lazily instantiates default_client() and reuses the same logging path.
TGI must be up (see deployment/docker-compose.yml); the client posts to <base_url>/generate and expects a JSON response containing a generated_text field.
Set UI_BASIC_USER/UI_BASIC_PASSWORD to force HTTP Basic auth on the UI/API; leave unset for local testing. /health stays unauthenticated.
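
The /generate contract mentioned in this section can be exercised with a small standard-library sketch. The request body follows the usual TGI shape (inputs plus parameters); the function names and the max_new_tokens default are assumptions, and SqlAgentClient may build its requests differently.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=256):
    """Build a POST request for <base_url>/generate."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_generated_text(raw_body):
    """Pull generated_text out of the response body (str or bytes)."""
    data = json.loads(raw_body)
    if not isinstance(data, dict) or "generated_text" not in data:
        raise ValueError("response has no 'generated_text' field")
    return data["generated_text"]


def generate(base_url, prompt, timeout=30.0):
    """Send the prompt to a running TGI server and return its text."""
    request = build_generate_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_generated_text(response.read())
```

Calling generate(base_url, "ping") against the configured LlmConfig.base_url is a quick way to confirm the container is reachable before debugging the agent itself.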

data_loader.load_timeseries standardizes CSV ingestion, sorts by timestamp, retains ID columns, and stashes inferred features in df.attrs["feature_columns"] for downstream reuse.
feature_engineering.build_feature_matrix is the single gateway for rolling statistics and rate-of-change features; supply rolling_windows in pandas offset notation (5T, 15T, 60T; pandas 2.2+ deprecates the T alias in favor of min, e.g., 5min).
train.py builds a StandardScaler + IsolationForest pipeline, persists artifacts under ml/anomaly_detection/models/, and writes training scores to ml/anomaly_detection/outputs/.
detect.py reloads the artifact, rehydrates feature engineering flags, and enforces feature parity before scoring; --keep-features, --alert-threshold, and --top-n control outputs and diagnostics.
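
The feature-parity check that detect.py performs before scoring might look roughly like the sketch below. The function name is hypothetical, and how the training-time column list is stored inside the artifact is an assumption; the point is only that scoring should stop when the columns differ and should otherwise use the training-time column order.

```python
def check_feature_parity(expected, actual):
    """Return the training-time column order, or raise if the sets differ.

    expected: feature names saved with the trained artifact (assumed).
    actual:   feature names produced for the data being scored.
    """
    expected_set, actual_set = set(expected), set(actual)
    missing = [name for name in expected if name not in actual_set]
    unexpected = [name for name in actual if name not in expected_set]
    if missing or unexpected:
        raise ValueError(
            f"feature mismatch: missing={missing}, unexpected={unexpected}"
        )
    # Reordering to the training order matters because the scaler and the
    # isolation forest see features by position, not by name.
    return list(expected)
```

With a DataFrame of engineered features, the returned list can be used directly for selection, e.g. features[check_feature_parity(trained_columns, list(features.columns))], where trained_columns is whatever list the artifact carries.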

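PowerShell examples use caret continuations; mirror that style when documenting multi-line commands. Note that the caret (^) continues a line only in cmd.exe; inside a PowerShell session the continuation character is the backtick (`).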
Prefer pandas-native operations and avoid mutating inputs in place; the existing helpers copy their input before augmenting it.
When extending SQL safety rules or prompt shaping, add the change in both agent copies and keep JSON-only responses from the model ({"sql": ..., "summary": ...}).
No automated tests ship with the repo; verify changes by running the CLIs/UI with representative CSVs and MSSQL connections, and check the JSONL logs for regressions.
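
Since the conventions above require JSON-only model responses of the form {"sql": ..., "summary": ...}, a strict parser is one way to catch drift early when changing prompts. This is a sketch under the assumption that both fields are non-empty strings; parse_model_response is a hypothetical name, not a function from the repo.

```python
import json


def parse_model_response(text):
    """Validate a JSON-only model reply and return its sql and summary."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model response must be a JSON object")
    result = {}
    for key in ("sql", "summary"):
        value = data.get(key)
        # Assumption: both fields are required, non-empty strings.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"model response is missing a usable '{key}' field")
        result[key] = value.strip()
    return result
```

A reply wrapped in prose or a markdown fence fails at json.loads, which is the kind of case the client's single retry with a strict formatting reminder is meant to recover from.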