Copilot Instructions
Repo Orientation
- Two main toolkits live here: SQL agents in `db_agent/` and `cane_agent/`, plus an anomaly detection pipeline in `anomaly_detection/`. `cane_agent/` mirrors `db_agent/`; when altering shared logic, update both (or refactor to a shared module) to keep behavior aligned.
- Context artifacts (`context/`), prompts (`prompting/`), CLIs, and UI layers sit beside each agent for easy packaging.
SQL Agent (db_agent/, cane_agent/)
- `prompting/prompt_builder.py` loads `context/schema.json`, `glossary.md`, `value_hints.yaml`, and `examples.json` to build the single LLM prompt; pass `table_hints` to shrink the schema section.
- `client.SqlAgentClient` wraps the TGI endpoint (`LlmConfig.base_url`), retries once with a strict formatting reminder, strips unrequested timeframe filters, and normalizes syntax (e.g., removes `ILIKE`, `NULLS LAST`).
- `sql_executor.py` enforces read-only T-SQL: it rejects multi-statements, dangerous verbs, and non-SQL Server constructs, then injects `TOP (max_rows)` before execution.
- DB credentials come from env vars (`DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`) defined via `extractor/settings.DbSettings`; missing vars raise immediately.
- Logs append to `logs/query_log.jsonl` through `log_utils.log_interaction`, capturing sanitized SQL, warnings, and optional feedback tags.
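The read-only enforcement described above can be pictured with a small sketch. This is not the code in `sql_executor.py`; the function name `guard_and_limit`, the deny-list, and the regexes are hypothetical and only illustrate the three steps: reject multi-statements, reject write/DDL verbs, and inject `TOP (max_rows)`.

```python
import re

# Hypothetical deny-list; the real sql_executor.py may use different rules.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|EXEC|EXECUTE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)
# Matches the leading SELECT and an optional DISTINCT, so TOP lands after both.
SELECT_HEAD = re.compile(r"^\s*SELECT\s+(DISTINCT\s+)?", re.IGNORECASE)


def guard_and_limit(sql: str, max_rows: int = 200) -> str:
    """Return a row-limited, read-only version of `sql`, or raise ValueError."""
    # Drop one trailing semicolon; any remaining one means several statements.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a write or DDL verb")
    head = SELECT_HEAD.match(statement)
    if head is None:
        raise ValueError("only SELECT statements are allowed")
    rest = statement[head.end():]
    if re.match(r"TOP\b", rest, re.IGNORECASE):
        return statement  # the query already limits its result set
    return f"{statement[:head.end()]}TOP ({max_rows}) {rest}"
```

The check is deliberately conservative: a semicolon or a forbidden verb inside a string literal is rejected too, and CTEs that start with `WITH` do not pass. A real guard would need a tokenizer or parser to relax those cases safely.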
SQL Agent Workflows
- Install deps in a venv: `python -m venv .venv; .\.venv\Scripts\Activate.ps1; pip install -r db_agent/requirements.txt` (repeat for `anomaly_detection` if needed).
- Dot-source `db_agent/scripts/Set-DbEnv.ps1` to export `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, and related flags in one step; the script prompts for the password if you omit `-Password`.
- Refresh schema context with `python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json`; adjust module name for `cane_agent`.
- Run ad hoc questions via `python -m db_agent.run_agent "How many loads closed yesterday?" --tables dbo.SugarLoadData --execute --max-rows 200`.
- Serve the UI with `uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000`; the backend lazily instantiates `default_client()` and reuses the same logging path.
- TGI must be up (see `deployment/docker-compose.yml`); model responses are expected at `<base_url>/generate` returning `generated_text` JSON.
- Set `UI_BASIC_USER`/`UI_BASIC_PASSWORD` to force HTTP Basic auth on the UI/API; leave unset for local testing. `/health` stays unauthenticated.
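As a rough illustration of the TGI contract noted above, the sketch below builds a `/generate` request body and unwraps a response. The helper names and the `max_new_tokens` default are assumptions, not taken from `client.SqlAgentClient`; the only facts relied on are that TGI accepts `inputs` plus `parameters` and returns a `generated_text` field, and that the model is expected to answer with `sql` and `summary` keys.

```python
import json


def build_generate_payload(prompt: str, max_new_tokens: int = 256) -> dict:
    """Request body in the shape TGI's /generate route accepts."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def parse_generate_response(body: str) -> dict:
    """Extract the sql/summary pair from a /generate response body."""
    outer = json.loads(body)
    text = outer["generated_text"]
    try:
        inner = json.loads(text)  # the model is expected to emit JSON only
    except json.JSONDecodeError as exc:
        raise ValueError("model did not return JSON-only output") from exc
    if not isinstance(inner, dict) or "sql" not in inner or "summary" not in inner:
        raise ValueError("response JSON must contain 'sql' and 'summary'")
    return {"sql": inner["sql"], "summary": inner["summary"]}
```

A `ValueError` from `parse_generate_response` is the natural place to trigger the single retry with a stricter formatting reminder.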
Anomaly Detection (anomaly_detection/)
- `data_loader.load_timeseries` standardizes CSV ingestion, sorts by timestamp, retains ID columns, and stashes inferred features in `df.attrs["feature_columns"]` for downstream reuse.
- `feature_engineering.build_feature_matrix` is the single gateway for rolling statistics and rate-of-change features; supply `rolling_windows` in pandas offset notation (`5T`, `15T`, `60T`).
- `train.py` builds a `StandardScaler` + `IsolationForest` pipeline, persists artifacts under `ml/anomaly_detection/models/`, and writes training scores to `ml/anomaly_detection/outputs/`.
- `detect.py` reloads the artifact, rehydrates feature engineering flags, and enforces feature parity before scoring; `--keep-features`, `--alert-threshold`, and `--top-n` control outputs and diagnostics.
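To make the rolling-window notation concrete, here is a minimal sketch of time-based rolling statistics and a rate-of-change column. It is not `build_feature_matrix`; the function name, column naming scheme, and sample data are hypothetical. It uses the `min` spelling of the offset, which names the same minute frequency as `T`; pandas 2.2 and later deprecate the `T` alias.

```python
import pandas as pd


def add_rolling_features(df, value_columns, rolling_windows=("5min", "15min")):
    """Return a copy of `df` with rolling mean/std and first-difference columns.

    Assumes `df` has a sorted DatetimeIndex, which offset windows require.
    """
    out = df.copy()  # copy first so the caller's frame is never mutated
    for col in value_columns:
        for window in rolling_windows:
            rolled = df[col].rolling(window)
            out[f"{col}_mean_{window}"] = rolled.mean()
            out[f"{col}_std_{window}"] = rolled.std()
        # Simple rate-of-change proxy: difference from the previous sample.
        out[f"{col}_diff"] = df[col].diff()
    return out


# Hypothetical one-minute sensor readings.
idx = pd.date_range("2024-01-01", periods=6, freq="1min")
df = pd.DataFrame({"flow": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)
features = add_rolling_features(df, ["flow"], rolling_windows=("5min",))
```

Offset windows are measured in elapsed time rather than row count, so irregularly sampled data still gets a consistent look-back period.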
Conventions & Tips
- PowerShell examples use caret continuations; mirror that style when documenting multi-line commands.
- Prefer pandas-native operations and avoid mutating inputs in place—helpers copy before augmentation.
- When extending SQL safety rules or prompt shaping, add the change in both agent copies and keep JSON-only responses from the model (`{"sql": ..., "summary": ...}`).
- No automated tests ship with the repo; verify changes by running the CLIs/UI with representative CSVs and MSSQL connections, and check the JSONL logs for regressions.
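Checking the JSONL logs by hand gets tedious, so a small reader can help. The sketch below assumes each log line is one JSON object and that warnings live under a `warnings` key; that field name is a guess, so adjust it to whatever `log_utils.log_interaction` actually writes.

```python
import json
from pathlib import Path


def load_log_entries(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def entries_with_warnings(path):
    """Return entries whose (assumed) "warnings" field is non-empty."""
    return [entry for entry in load_log_entries(path) if entry.get("warnings")]
```

Running `entries_with_warnings("logs/query_log.jsonl")` before and after a change gives a quick diff of which queries started producing warnings.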