# Database Agent Toolkit

This folder hosts utilities and documentation for the natural-language SQL agent.

## Folders

- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation and execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces.
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.

## Python Setup

1. **Install dependencies** (create a venv if needed):

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -r requirements.txt
   pip install -r db_agent/requirements.txt
   ```

   Ensure the Microsoft ODBC driver (17 or 18) is installed on the machine.

2. **Configure environment variables** (never commit credentials):

   ```powershell
   # Dot-source so values remain in the current shell
   . .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
   ```

   The script prompts for the password (or accepts `-Password (Read-Host -AsSecureString)`) and exports `DB_SERVER`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`, `DB_DRIVER` (default `ODBC Driver 17 for SQL Server`), `DB_ENCRYPT`, and `DB_TRUST_CERT`. Override any parameter via the corresponding flag.

3. **Run the schema extraction job**:

   ```powershell
   python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
   ```

   - `--schema`: restricts the crawl to a specific SQL schema (omit to capture all).
   - `--output`: where to write the structured schema document consumed by the agent.

4. **Review the output** under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.

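On the Python side, these variables end up in an ODBC connection string. The sketch below shows one way that assembly might look; the helper name and the `Encrypt`/`TrustServerCertificate` fallbacks are assumptions, while the `DB_*` variable names come from `Set-DbEnv.ps1`:

```python
import os


def build_connection_string(env=os.environ):
    """Assemble an ODBC connection string from the DB_* environment
    variables exported by Set-DbEnv.ps1. Illustrative sketch only;
    the agent's own helpers may build the string differently."""
    parts = {
        "DRIVER": "{%s}" % env.get("DB_DRIVER", "ODBC Driver 17 for SQL Server"),
        "SERVER": env["DB_SERVER"],
        "DATABASE": env["DB_DATABASE"],
        "UID": env["DB_USERNAME"],
        "PWD": env["DB_PASSWORD"],
        # Fallback values here are assumptions, not the script's documented defaults.
        "Encrypt": env.get("DB_ENCRYPT", "yes"),
        "TrustServerCertificate": env.get("DB_TRUST_CERT", "yes"),
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())


example = build_connection_string({
    "DB_SERVER": r"SUGARSCALE\SQLEXPRESS",
    "DB_DATABASE": "SugarScale_Lasuca",
    "DB_USERNAME": "SugarAI",
    "DB_PASSWORD": "secret",
})
```

Passing the resulting string to `pyodbc.connect(...)` opens the session.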
## Running the SQL Agent
|
|
1. Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
|
|
2. Execute a question against the model:
|
|
```powershell
|
|
python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
|
|
```
|
|
The script prints JSON with `sql` and `summary` fields.
|
|
3. Use `--execute` to run the validated SQL against MSSQL (enforced `TOP` limit via `--max-rows`, default `500`):
|
|
```powershell
|
|
python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
|
|
```
|
|
Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
|
|
4. Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override with `--log-path` or disable logging via `--no-log`.
|
|
5. To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `<base_url>/generate`).
|
|
|
|
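The guard rails mentioned above (SELECT-only execution and an enforced `TOP` cap) can be sketched roughly as follows; the function name is hypothetical, and `sql_executor.py` may implement the checks differently:

```python
import re


def enforce_select_with_top(sql: str, max_rows: int = 500) -> str:
    """Reject non-SELECT statements and inject a TOP clause when missing.
    A rough sketch of the kind of guard the executor provides; the real
    helpers may differ."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^select\b", stripped):
        raise ValueError("only SELECT statements are allowed")
    if re.search(r"(?is)^select\s+(top|distinct\s+top)\b", stripped):
        return stripped  # an explicit TOP is kept; real code might clamp it
    return re.sub(r"(?is)^select\b", f"SELECT TOP {max_rows}", stripped, count=1)


capped = enforce_select_with_top("SELECT * FROM dbo.SugarLoadData", max_rows=200)
# capped == "SELECT TOP 200 * FROM dbo.SugarLoadData"
```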
## Web UI
|
|
1. Install the web dependencies (already in `db_agent/requirements.txt`).
|
|
2. Start the FastAPI server:
|
|
```powershell
|
|
uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
|
|
```
|
|
3. Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view results/preview tables.
|
|
4. Environment variables for MSSQL access must be set in the shell before launching the server.
|
|
|
|
## Next Steps
|
|
- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
|
|
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
|
|
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
|
|
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
|
|
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users.
|
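For the logging bullet, the question/SQL JSONL traces can be produced with very little code. A minimal sketch, assuming hypothetical field names (`log_utils.py` defines the actual schema):

```python
import json
import time
from pathlib import Path


def append_trace(log_path, question, sql, executed=False):
    """Append one question/SQL trace as a single JSON line.
    Field names here are illustrative, not log_utils.py's schema."""
    record = {
        "ts": time.time(),
        "question": question,
        "sql": sql,
        "executed": executed,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ on first write
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one JSON object per line keeps the log greppable and lets analysis scripts stream it line by line instead of parsing one large document.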