# Database Agent Toolkit
This folder hosts utilities and documentation for the natural-language SQL agent.
## Folders
- `extractor/`: Python tooling to pull schema metadata from the MSSQL sandbox.
- `prompting/`: Prompt builders that merge curated context into LLM requests.
- `context/`: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
- `deployment/`: Docker Compose for hosting open-source SQL models via TGI.
- `sql_executor.py`: Validation + execution helpers for running generated SQL safely.
- `log_utils.py`: JSONL logging utilities for question/SQL traces.
- `ui/`: FastAPI backend and static frontend for a simple web interface.
- `knowledge_prep.md`: Checklist for curating high-quality agent context.
## Python Setup
- Install dependencies (create a venv if needed):

  ```powershell
  python -m venv .venv
  .\.venv\Scripts\Activate.ps1
  pip install -r requirements.txt
  pip install -r db_agent/requirements.txt
  ```

  Ensure the Microsoft ODBC driver (17 or 18) is installed on the machine.
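  To confirm the driver is visible from Python (assuming `pyodbc` is among the installed requirements), a quick check:

  ```python
  import pyodbc

  # Lists the ODBC drivers registered on this machine; expect an entry like
  # "ODBC Driver 17 for SQL Server" or "ODBC Driver 18 for SQL Server".
  print(pyodbc.drivers())
  ```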
- Configure environment variables (never commit credentials):

  ```powershell
  . .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
  ```

  Dot-source the helper so the variables stay in your session. It prompts for the password (or accepts `-Password (Read-Host -AsSecureString)`) and sets the `DB_*` values expected by `DbSettings`. Pass `-Driver`, `-Encrypt`, or `-TrustCertificate` to override defaults.
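  As a rough sketch of the consuming side — the actual `DbSettings` class may use different variable names, so treat the `DB_*` keys below as assumptions:

  ```python
  import os
  from dataclasses import dataclass

  @dataclass
  class DbSettings:
      # Illustrative only: these environment variable names are assumed,
      # not confirmed against db_agent's implementation.
      server: str = os.environ.get("DB_SERVER", "")
      database: str = os.environ.get("DB_DATABASE", "")
      username: str = os.environ.get("DB_USERNAME", "")
      password: str = os.environ.get("DB_PASSWORD", "")
      driver: str = os.environ.get("DB_DRIVER", "ODBC Driver 18 for SQL Server")

      def connection_string(self) -> str:
          # Standard ODBC connection-string shape for SQL Server.
          return (
              f"DRIVER={{{self.driver}}};SERVER={self.server};"
              f"DATABASE={self.database};UID={self.username};PWD={self.password}"
          )
  ```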
- Run the schema extraction job:

  ```powershell
  python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
  ```

  - `--schema`: restricts the crawl to a specific SQL schema (omit to capture all).
  - `--output`: where to write the structured schema document consumed by the agent.
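  A quick way to sanity-check the artifact (the exact JSON layout is db_agent's own; this only confirms the file parses and shows its top-level shape):

  ```python
  import json
  from pathlib import Path

  doc = json.loads(Path("db_agent/context/schema.json").read_text(encoding="utf-8"))
  # Print the top-level structure without assuming a particular schema layout.
  if isinstance(doc, dict):
      print(list(doc.keys()))
  else:
      print(type(doc).__name__, len(doc))
  ```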
- Review the output under `db_agent/context/` and commit sanitized artifacts alongside code when appropriate.
## Running the SQL Agent
- Ensure the TGI container in `deployment/docker-compose.yml` is running (default endpoint `http://192.168.0.30:8080`).
- Execute a question against the model:

  ```powershell
  python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
  ```

  The script prints JSON with `sql` and `summary` fields.
- Use `--execute` to run the validated SQL against MSSQL (a `TOP` limit is enforced via `--max-rows`, default 500):

  ```powershell
  python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
  ```

  Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
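  The row cap works along these lines (a minimal sketch of the idea behind `sql_executor.py`, not its actual implementation):

  ```python
  import re

  def enforce_top(sql: str, max_rows: int = 500) -> str:
      """Inject TOP <n> into a single SELECT statement if it lacks one."""
      statement = sql.strip().rstrip(";")
      if ";" in statement:
          raise ValueError("Multiple statements are not allowed")
      if not re.match(r"(?is)^select\b", statement):
          raise ValueError("Only SELECT statements are allowed")
      if re.match(r"(?is)^select\s+top\b", statement):
          return statement
      # NOTE: DISTINCT is not handled here; T-SQL expects SELECT DISTINCT TOP <n>.
      return re.sub(r"(?is)^select\s+", f"SELECT TOP {max_rows} ", statement, count=1)
  ```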
- Interactions are logged to `db_agent/logs/query_log.jsonl` by default; override with `--log-path` or disable logging via `--no-log`.
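  JSONL means one JSON object per line, so the trace is easy to post-process without assuming specific field names:

  ```python
  import json
  from pathlib import Path

  # Each non-empty line is an independent JSON record (question/SQL trace).
  records = [
      json.loads(line)
      for line in Path("db_agent/logs/query_log.jsonl").read_text(encoding="utf-8").splitlines()
      if line.strip()
  ]
  print(f"{len(records)} logged interactions")
  ```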
- To point at a different inference endpoint, edit `LlmConfig.base_url` in `db_agent/client.py` (the client posts to `<base_url>/generate`).
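For context, `<base_url>/generate` is the standard text-generation-inference route. A minimal standalone call looks like this (the payload fields follow TGI's public API; the prompt and parameter values are placeholders):

```python
import requests

TGI_BASE_URL = "http://192.168.0.30:8080"  # matches the default endpoint above

payload = {
    "inputs": "Generate a T-SQL query that counts rows in dbo.SugarLoadData.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}
resp = requests.post(f"{TGI_BASE_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
# TGI returns the completion under "generated_text".
print(resp.json()["generated_text"])
```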
## Web UI
- Install the web dependencies (already in `db_agent/requirements.txt`).
- Start the FastAPI server:

  ```powershell
  uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
  ```

- Open `http://localhost:8000/` to access the UI. Submit questions, optionally run SQL, and view results/preview tables. A quick way to list the backend routes is shown after this list.
- Environment variables for MSSQL access must be set in the shell before launching the server.
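Once the server is running, FastAPI serves an OpenAPI spec by default (unless disabled in `main.py`), which is handy for discovering the backend routes without reading the code:

```python
import requests

# Fetch the auto-generated OpenAPI spec and print each route with its methods.
spec = requests.get("http://localhost:8000/openapi.json", timeout=10).json()
for path, methods in spec["paths"].items():
    print(", ".join(m.upper() for m in methods), path)
```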
## Next Steps
- Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as `schema_snapshot.py`.
- Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
- Use `knowledge_prep.md` as a guide when enriching the context files with human-curated insights.
- Extend `db_agent/client.py` with SQL validation/execution layers and logging of user questions vs. generated queries.
- Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users; a minimal starting point is sketched below.
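One possible shape for such an assertion (a sketch only: it assumes results arrive through a DB-API cursor like `pyodbc`'s, and `assert_result_schema` is an illustrative name, not an existing helper):

```python
def assert_result_schema(cursor, expected_columns: list[str]) -> None:
    """Fail fast if the result set's columns don't match expectations."""
    # cursor.description yields (name, type_code, ...) tuples per column.
    actual = [col[0] for col in cursor.description]
    if actual != expected_columns:
        raise AssertionError(f"Expected columns {expected_columns}, got {actual}")
```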