# Database Agent Toolkit

This folder hosts utilities and documentation for the natural-language SQL agent.

## Folders

- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation + execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces.
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.

## Python Setup

1. **Install dependencies** (create a venv if needed):

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r requirements.txt
   pip install -r db_agent/requirements.txt
   ```

   Ensure the Microsoft ODBC Driver for SQL Server (version 17 or 18) is installed on the machine.

2. **Configure environment variables** (never commit credentials):

   ```powershell
   . .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
   ```

   Dot-source the helper so the variables stay in your session. It prompts for the password (or pass `-Password (Read-Host -AsSecureString)`) and sets the `DB_*` values expected by `DbSettings`. Pass `-Driver`, `-Encrypt`, or `-TrustCertificate` to override the defaults.

3. **Run the schema extraction job**:

   ```powershell
   python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
   ```

   - `--schema`: restricts the crawl to a specific SQL schema (omit to capture all schemas).
   - `--output`: where to write the structured schema document consumed by the agent.

4. **Review the output** under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.

## Running the SQL Agent

1. Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
2. Execute a question against the model:

   ```powershell
   python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
   ```

   The script prints JSON with `sql` and `summary` fields.

3. Use `--execute` to run the validated SQL against MSSQL (enforced `TOP` limit via `--max-rows`, default `500`):

   ```powershell
   python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
   ```

   Preview rows (up to 5) and the sanitized SQL are echoed to stdout.

4. Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override the location with `--log-path` or disable logging via `--no-log`.
5. To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `/generate`); a minimal request sketch follows this list.
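For reference, TGI's `/generate` route takes a JSON payload with `inputs` and `parameters` and returns the completion in `generated_text`. The sketch below shows only that transport against the default endpoint; the exact parameters sent by `db_agent/client.py` may differ.

```python
"""Minimal sketch of a request to the TGI /generate route (transport only).

The URL is the default endpoint from deployment/docker-compose.yml; the payload
shape (inputs + parameters) is the standard TGI generate API. Parameter values
here are illustrative and may not match what db_agent/client.py sends.
"""
import requests

TGI_GENERATE_URL = "http://192.168.0.30:8080/generate"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    response = requests.post(
        TGI_GENERATE_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}},
        timeout=60,
    )
    response.raise_for_status()
    # Non-streaming /generate responses carry the completion in "generated_text".
    return response.json()["generated_text"]
```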
## Web UI

1. Install the web dependencies (already in `db_agent/requirements.txt`).
2. Start the FastAPI server:

   ```powershell
   uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
   ```

3. Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view results/preview tables.
4. Environment variables for MSSQL access must be set in the shell before launching the server.

## Next Steps

- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users; a sketch of such an assertion follows this list.
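As a starting point for the result-validation idea above, the sketch below checks that an executed result set exposes the columns a summary expects before the answer is surfaced. The function and column names are illustrative only and are not part of the existing `db_agent` modules.

```python
"""Illustrative sketch of a post-execution schema assertion (not existing code).

Verifies that rows returned for a generated query contain the columns the
answer summary will reference; the names below are hypothetical examples.
"""
from typing import Sequence


def assert_result_columns(rows: Sequence[dict], required: set) -> None:
    """Raise if a non-empty result set is missing any required column."""
    if not rows:
        return  # Empty results are allowed; only column shape is checked.
    missing = required - set(rows[0].keys())
    if missing:
        raise ValueError(f"Result is missing expected columns: {sorted(missing)}")


# Example: only surface a "loads completed yesterday" answer when the result
# actually contains the columns the summary text refers to.
assert_result_columns(
    [{"LoadCount": 42, "LoadDate": "2024-05-01"}],
    required={"LoadCount", "LoadDate"},
)
```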