# Database Agent Toolkit

This folder hosts utilities and documentation for the natural-language SQL agent.

## Folders

- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation and execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces.
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.

## Python Setup

1. **Install dependencies** (create a venv if needed):

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r requirements.txt
   pip install -r db_agent/requirements.txt
   ```

   Ensure the Microsoft ODBC driver (17 or 18) is installed on the machine.

2. **Configure environment variables** (never commit credentials):

   ```powershell
   # Dot-source so values remain in the current shell
   . .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
   ```

   The script prompts for the password (or accepts `-Password (Read-Host -AsSecureString)`) and exports `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, `DB_DRIVER` (default `ODBC Driver 17 for SQL Server`), `DB_ENCRYPT`, and `DB_TRUST_CERT`. Override any parameter via the corresponding flag.

3. **Run the schema extraction job**:

   ```powershell
   python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
   ```

   - `--schema`: restricts the crawl to a specific SQL schema (omit to capture all).
   - `--output`: where to write the structured schema document consumed by the agent.

4. **Review the output** under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.

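On the Python side, these variables end up in an ODBC connection string. The sketch below shows one way that assembly might look; the helper name and the `Encrypt`/`TrustServerCertificate` fallbacks are assumptions, while the `DB_*` variable names come from `Set-DbEnv.ps1`:

```python
import os


def build_connection_string(env=os.environ):
    """Assemble an ODBC connection string from the DB_* environment
    variables exported by Set-DbEnv.ps1. Illustrative sketch only;
    the agent's own helpers may build the string differently."""
    parts = {
        "DRIVER": "{%s}" % env.get("DB_DRIVER", "ODBC Driver 17 for SQL Server"),
        "SERVER": env["DB_SERVER"],
        "DATABASE": env["DB_DATABASE"],
        "UID": env["DB_USERNAME"],
        "PWD": env["DB_PASSWORD"],
        # Fallback values here are assumptions, not the script's documented defaults.
        "Encrypt": env.get("DB_ENCRYPT", "yes"),
        "TrustServerCertificate": env.get("DB_TRUST_CERT", "yes"),
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())


example = build_connection_string({
    "DB_SERVER": r"SUGARSCALE\SQLEXPRESS",
    "DB_DATABASE": "SugarScale_Lasuca",
    "DB_USERNAME": "SugarAI",
    "DB_PASSWORD": "secret",
})
```

Passing the resulting string to `pyodbc.connect(...)` opens the session.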
## Running the SQL Agent
|
|
1. Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
|
|
2. Execute a question against the model:
|
|
```powershell
|
|
python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
|
|
```
|
|
The script prints JSON with `sql` and `summary` fields.
|
|
3. Use `--execute` to run the validated SQL against MSSQL (enforced `TOP` limit via `--max-rows`, default `500`):
|
|
```powershell
|
|
python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
|
|
```
|
|
Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
|
|
4. Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override with `--log-path` or disable logging via `--no-log`.
|
|
5. To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `<base_url>/generate`).
|
|
|
|
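The guard rails mentioned above (SELECT-only execution and an enforced `TOP` cap) can be sketched roughly as follows; the function name is hypothetical, and `sql_executor.py` may implement the checks differently:

```python
import re


def enforce_select_with_top(sql: str, max_rows: int = 500) -> str:
    """Reject non-SELECT statements and inject a TOP clause when missing.
    A rough sketch of the kind of guard the executor provides; the real
    helpers may differ."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^select\b", stripped):
        raise ValueError("only SELECT statements are allowed")
    if re.search(r"(?is)^select\s+(top|distinct\s+top)\b", stripped):
        return stripped  # an explicit TOP is kept; real code might clamp it
    return re.sub(r"(?is)^select\b", f"SELECT TOP {max_rows}", stripped, count=1)


capped = enforce_select_with_top("SELECT * FROM dbo.SugarLoadData", max_rows=200)
# capped == "SELECT TOP 200 * FROM dbo.SugarLoadData"
```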
## Web UI
|
|
1. Install the web dependencies (already in `db_agent/requirements.txt`).
|
|
2. Start the FastAPI server:
|
|
```powershell
|
|
uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
|
|
```
|
|
3. Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view results/preview tables.
|
|
4. Environment variables for MSSQL access must be set in the shell before launching the server.
|
|
|
|
## Next Steps
|
|
- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
|
|
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
|
|
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
|
|
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
|
|
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users.
|
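For the logging bullet, the question/SQL JSONL traces can be produced with very little code. A minimal sketch, assuming hypothetical field names (`log_utils.py` defines the actual schema):

```python
import json
import time
from pathlib import Path


def append_trace(log_path, question, sql, executed=False):
    """Append one question/SQL trace as a single JSON line.
    Field names here are illustrative, not log_utils.py's schema."""
    record = {
        "ts": time.time(),
        "question": question,
        "sql": sql,
        "executed": executed,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ on first write
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one JSON object per line keeps the log greppable and lets analysis scripts stream it line by line instead of parsing one large document.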