# Database Agent Toolkit
This folder hosts utilities and documentation for the natural-language SQL agent.
## Folders
- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation + execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces (see the sketch after this list for the kind of record such a trace might hold).
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.
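The executor and logging helpers are internal to the toolkit, but as a rough illustration of the kind of JSONL question/SQL trace `log_utils.py` is meant to produce, here is a minimal sketch. The `append_trace` helper and the record fields are assumptions for illustration, not the toolkit's actual API:
```python
import json
from datetime import datetime, timezone
from pathlib import Path

def append_trace(log_path: Path, question: str, sql: str, row_count: int) -> None:
    """Append one question/SQL trace as a single JSON line (hypothetical record shape)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "sql": sql,
        "row_count": row_count,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")

append_trace(
    Path("db_agent/logs/query_log.jsonl"),
    "How many loads were completed yesterday?",
    "SELECT COUNT(*) FROM dbo.SugarLoadData WHERE ...",  # illustrative SQL only
    1,
)
```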
## Python Setup
1. **Install dependencies** (create a venv if needed):
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -r db_agent/requirements.txt
```
Ensure the Microsoft ODBC Driver for SQL Server (version 17 or 18) is installed on the machine.
2. **Configure environment variables** (never commit credentials):
```powershell
# Dot-source so values remain in the current shell
. .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
```
The script prompts for the password (or pass `-Password (Read-Host -AsSecureString)`) and exports `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, `DB_DRIVER` (default `ODBC Driver 17 for SQL Server`), `DB_ENCRYPT`, and `DB_TRUST_CERT`. Override any parameter via the corresponding flag. A sketch of how Python code might consume these variables follows this list.
3. **Run the schema extraction job**:
```powershell
python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
```
- `--schema`: restricts the crawl to a specific SQL schema (omit to capture all).
- `--output`: where to write the structured schema document consumed by the agent.
4. **Review the output** under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.
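As an illustration only, a downstream script could consume the exported environment variables and the generated schema document roughly like this. This is a minimal sketch, assuming `pyodbc` is installed; the default values and connection options shown here are placeholders, and the structure of `schema.json` is toolkit-specific:
```python
import json
import os

import pyodbc

# Build a connection string from the variables exported by Set-DbEnv.ps1.
# The Encrypt/TrustServerCertificate fallbacks below are placeholders, not toolkit defaults.
conn_str = (
    f"DRIVER={{{os.environ.get('DB_DRIVER', 'ODBC Driver 17 for SQL Server')}}};"
    f"SERVER={os.environ['DB_SERVER']};"
    f"DATABASE={os.environ['DB_DATABASE']};"
    f"UID={os.environ['DB_USERNAME']};"
    f"PWD={os.environ['DB_PASSWORD']};"
    f"Encrypt={os.environ.get('DB_ENCRYPT', 'yes')};"
    f"TrustServerCertificate={os.environ.get('DB_TRUST_CERT', 'no')};"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.execute("SELECT TOP 1 name FROM sys.tables")  # quick connectivity check
print(cursor.fetchone())
conn.close()

# Load the schema document produced by the extractor.
with open("db_agent/context/schema.json", encoding="utf-8") as handle:
    schema = json.load(handle)
print(type(schema))
```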
## Running the SQL Agent
1. Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
2. Execute a question against the model:
```powershell
python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
```
The script prints JSON with `sql` and `summary` fields.
3. Use `--execute` to run the validated SQL against MSSQL (a `TOP` row limit is enforced via `--max-rows`, default `500`):
```powershell
python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
```
Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
4. Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override with `--log-path` or disable logging via `--no-log`.
5. To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `<base_url>/generate`; see the sketch after this list for the shape of a raw request).
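For reference, a raw request to a TGI `/generate` endpoint typically looks like the following. This is a minimal sketch using the default endpoint from `deployment/docker-compose.yml`; the prompt text and generation parameters are placeholders, not what `db_agent/client.py` actually sends:
```python
import requests

TGI_BASE_URL = "http://192.168.0.30:8080"  # default endpoint from docker-compose.yml

payload = {
    "inputs": "Write a T-SQL query that counts rows in dbo.SugarLoadData.",  # placeholder prompt
    "parameters": {"max_new_tokens": 256, "temperature": 0.1},               # placeholder settings
}

response = requests.post(f"{TGI_BASE_URL}/generate", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```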
## Web UI
1. Install the web dependencies (already in `db_agent/requirements.txt`).
2. Start the FastAPI server:
```powershell
uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
```
3. Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view the results and preview tables. A hypothetical example of scripting against the backend follows this list.
4. Environment variables for MSSQL access must be set in the shell before launching the server.
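If you prefer to drive the backend from a script instead of the browser, something like the following could work. This is a hypothetical sketch only: the `/api/ask` route and its request/response fields are assumptions, so check the routes defined in `db_agent/ui/backend/main.py` for the real API:
```python
import requests

# Hypothetical endpoint and payload shape; verify against the FastAPI routes.
response = requests.post(
    "http://localhost:8000/api/ask",
    json={"question": "How many loads were completed yesterday?", "execute": False},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```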
## Next Steps
- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users; a minimal sketch follows below.
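As a starting point for that last item, a result-validation check could be as simple as asserting that the returned columns match what the answer depends on. This is a minimal sketch; the column names and record shape are illustrative, not taken from the toolkit:
```python
def validate_result_columns(rows: list[dict], expected_columns: set[str]) -> None:
    """Raise if the result set is missing any column the answer depends on."""
    if not rows:
        return  # an empty result set is a business question, not a schema error
    missing = expected_columns - set(rows[0].keys())
    if missing:
        raise ValueError(f"Result is missing expected columns: {sorted(missing)}")

# Illustrative usage with made-up column names.
validate_result_columns(
    [{"LoadDate": "2026-02-16", "LoadCount": 42}],
    {"LoadDate", "LoadCount"},
)
```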