# Database Agent Toolkit
This folder hosts utilities and documentation for the natural-language SQL agent.
## Folders
- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation + execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces (see the sketch after this list for the kind of record such a trace might hold).
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.
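The executor and logging helpers are internal to the toolkit, but as a rough illustration of the kind of JSONL question/SQL trace `log_utils.py` is meant to produce, here is a minimal sketch. The `append_trace` helper and the record fields are assumptions for illustration, not the toolkit's actual API:
```python
import json
from datetime import datetime, timezone
from pathlib import Path

def append_trace(log_path: Path, question: str, sql: str, row_count: int) -> None:
    """Append one question/SQL trace as a single JSON line (hypothetical record shape)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sql": sql,
        "row_count": row_count,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")

append_trace(
    Path("db_agent/logs/query_log.jsonl"),
    "How many loads were completed yesterday?",
    "SELECT COUNT(*) FROM dbo.SugarLoadData WHERE ...",  # illustrative SQL only
    1,
)
```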
## Python Setup
1. **Install dependencies** (create a venv if needed):
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -r db_agent/requirements.txt
```
Ensure the Microsoft ODBC Driver for SQL Server (version 17 or 18) is installed on the machine.
2. **Configure environment variables** (never commit credentials):
```powershell
# Dot-source so values remain in the current shell
. .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
```
The script prompts for the password (or pass `-Password (Read-Host -AsSecureString)`) and exports `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, `DB_DRIVER` (default `ODBC Driver 17 for SQL Server`), `DB_ENCRYPT`, and `DB_TRUST_CERT`. Override any parameter via the corresponding flag. A sketch of how Python code might consume these variables follows this list.
3. **Run the schema extraction job**:
```powershell
python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
```
- `--schema`: restricts the crawl to a specific SQL schema (omit to capture all).
- `--output`: where to write the structured schema document consumed by the agent.
4. **Review the output** under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.
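As an illustration only, a downstream script could consume the exported environment variables and the generated schema document roughly like this. This is a minimal sketch, assuming `pyodbc` is installed; the default values and connection options shown here are placeholders, and the structure of `schema.json` is toolkit-specific:
```python
import json
import os

import pyodbc

# Build a connection string from the variables exported by Set-DbEnv.ps1.
# The Encrypt/TrustServerCertificate fallbacks below are placeholders, not toolkit defaults.
conn_str = (
    f"DRIVER={{{os.environ.get('DB_DRIVER', 'ODBC Driver 17 for SQL Server')}}};"
    f"SERVER={os.environ['DB_SERVER']};"
    f"DATABASE={os.environ['DB_DATABASE']};"
    f"UID={os.environ['DB_USERNAME']};"
    f"PWD={os.environ['DB_PASSWORD']};"
    f"Encrypt={os.environ.get('DB_ENCRYPT', 'yes')};"
    f"TrustServerCertificate={os.environ.get('DB_TRUST_CERT', 'no')};"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP 1 name FROM sys.tables")  # quick connectivity check
print(cursor.fetchone())
conn.close()

# Load the schema document produced by the extractor.
with open("db_agent/context/schema.json", encoding="utf-8") as handle:
    schema = json.load(handle)
print(type(schema))
```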
## Running the SQL Agent
1. Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
2. Execute a question against the model:
```powershell
python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
```
The script prints JSON with `sql` and `summary` fields.
3. Use `--execute` to run the validated SQL against MSSQL (a `TOP` row limit is enforced via `--max-rows`, default `500`):
```powershell
python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
```
Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
4. Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override with `--log-path` or disable logging via `--no-log`.
5. To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `<base_url>/generate`; see the sketch after this list for the shape of a raw request).
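For reference, a raw request to a TGI `/generate` endpoint typically looks like the following. This is a minimal sketch using the default endpoint from `deployment/docker-compose.yml`; the prompt text and generation parameters are placeholders, not what `db_agent/client.py` actually sends:
```python
import requests

TGI_BASE_URL = "http://192.168.0.30:8080"  # default endpoint from docker-compose.yml

payload = {
    "inputs": "Write a T-SQL query that counts rows in dbo.SugarLoadData.",  # placeholder prompt
    "parameters": {"max_new_tokens": 256, "temperature": 0.1},               # placeholder settings
}

response = requests.post(f"{TGI_BASE_URL}/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```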
## Web UI
1. Install the web dependencies (already in `db_agent/requirements.txt`).
2. Start the FastAPI server:
```powershell
uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
```
3. Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view the results and preview tables. A hypothetical example of scripting against the backend follows this list.
4. Environment variables for MSSQL access must be set in the shell before launching the server.
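If you prefer to drive the backend from a script instead of the browser, something like the following could work. This is a hypothetical sketch only: the `/api/ask` route and its request/response fields are assumptions, so check the routes defined in `db_agent/ui/backend/main.py` for the real API:
```python
import requests

# Hypothetical endpoint and payload shape; verify against the FastAPI routes.
response = requests.post(
    "http://localhost:8000/api/ask",
    json={"question": "How many loads were completed yesterday?", "execute": False},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```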
## Next Steps
- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users; a minimal sketch follows below.
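As a starting point for that last item, a result-validation check could be as simple as asserting that the returned columns match what the answer depends on. This is a minimal sketch; the column names and record shape are illustrative, not taken from the toolkit:
```python
def validate_result_columns(rows: list[dict], expected_columns: set[str]) -> None:
    """Raise if the result set is missing any column the answer depends on."""
    if not rows:
        return  # an empty result set is a business question, not a schema error
    missing = expected_columns - set(rows[0].keys())
    if missing:
        raise ValueError(f"Result is missing expected columns: {sorted(missing)}")

# Illustrative usage with made-up column names.
validate_result_columns(
    [{"LoadDate": "2026-02-16", "LoadCount": 42}],
    {"LoadDate", "LoadCount"},
)
```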