Database Agent Toolkit

This folder hosts utilities and documentation for the natural-language SQL agent.

Contents

  • extractor/: Python tooling to pull schema metadata from the MSSQL sandbox.
  • prompting/: Prompt builders that merge curated context into LLM requests.
  • context/: Generated artifacts (schema JSON, glossary, examples) injected into LLM prompts.
  • deployment/: Docker Compose for hosting open-source SQL models via TGI (Hugging Face Text Generation Inference).
  • sql_executor.py: Validation + execution helpers for running generated SQL safely.
  • log_utils.py: JSONL logging utilities for question/SQL traces.
  • ui/: FastAPI backend and static frontend for a simple web interface.
  • knowledge_prep.md: Checklist for curating high-quality agent context.

Python Setup

  1. Install dependencies (create a venv if needed):

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install -r requirements.txt
    pip install -r db_agent/requirements.txt
    

    Ensure the Microsoft ODBC Driver for SQL Server (version 17 or 18) is installed on the machine.

  2. Configure environment variables (never commit credentials):

    # Dot-source so values remain in the current shell
    . .\db_agent\scripts\Set-DbEnv.ps1 -Server "SUGARSCALE\SQLEXPRESS" -Database "SugarScale_Lasuca" -Username "SugarAI"
    

    The script prompts for the password (or pass it via -Password (Read-Host -AsSecureString)) and exports DB_SERVER, DB_DATABASE, DB_USERNAME, DB_PASSWORD, DB_DRIVER (default: ODBC Driver 17 for SQL Server), DB_ENCRYPT, and DB_TRUST_CERT. Override any of these via the corresponding parameter. A connectivity check using the exported variables is sketched after this list.

  3. Run the schema extraction job:

    python -m db_agent.extractor --job schema --schema dbo --output db_agent/context/schema.json
    
    • --schema: restricts the crawl to a specific SQL schema (omit to capture all).
    • --output: where to write the structured schema document consumed by the agent.
  4. Review the output under db_agent/context/ and commit sanitized artifacts alongside the code when appropriate; a quick inspection sketch also follows this list.
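
The exported DB_* variables can be sanity-checked before running any jobs. The snippet below is a minimal connectivity sketch, assuming pyodbc is available as the ODBC binding (the toolkit's own database layer may use something else); it only mirrors the variables exported by Set-DbEnv.ps1.

    import os
    import pyodbc  # assumption: pyodbc is the ODBC binding in use

    # Build a connection string from the variables exported in step 2.
    conn_str = (
        f"DRIVER={{{os.environ.get('DB_DRIVER', 'ODBC Driver 17 for SQL Server')}}};"
        f"SERVER={os.environ['DB_SERVER']};"
        f"DATABASE={os.environ['DB_DATABASE']};"
        f"UID={os.environ['DB_USERNAME']};"
        f"PWD={os.environ['DB_PASSWORD']};"
    )
    # DB_ENCRYPT and DB_TRUST_CERT are also exported; append Encrypt= and
    # TrustServerCertificate= here if your server requires them.

    conn = pyodbc.connect(conn_str, timeout=5)
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    print("Connection OK:", cursor.fetchone()[0] == 1)
    conn.close()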

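After the extraction job completes, a quick way to sanity-check the artifact is to load it and look at its top-level structure; the exact document layout is defined by the extractor, so the sketch below makes no assumptions about field names.

    import json
    from pathlib import Path

    schema_path = Path("db_agent/context/schema.json")
    with schema_path.open(encoding="utf-8") as fh:
        schema_doc = json.load(fh)

    # The layout is whatever the extractor emits; just confirm the file is
    # well-formed and show its top-level shape.
    if isinstance(schema_doc, dict):
        print("Top-level keys:", sorted(schema_doc))
    else:
        print("Top-level entries:", len(schema_doc))
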
Running the SQL Agent

  1. Ensure the TGI container defined in deployment/docker-compose.yml is running (default endpoint: http://192.168.0.30:8080).
  2. Execute a question against the model:
    python -m db_agent.run_agent "How many loads were completed yesterday?" --tables dbo.SugarLoadData
    
    The script prints JSON with sql and summary fields.
  3. Use --execute to run the validated SQL against MSSQL (a TOP limit is enforced via --max-rows, default 500):
    python -m db_agent.run_agent "Top gross loads this week" --tables dbo.SugarLoadData --execute --max-rows 200
    
    Preview rows (up to 5) and the sanitized SQL are echoed to stdout.
  4. Interactions are logged to db_agent/logs/query_log.jsonl by default; override the destination with --log-path or disable logging with --no-log. A minimal log-reading sketch follows this list.
  5. To point at a different inference endpoint, edit LlmConfig.base_url in db_agent/client.py (the client posts to <base_url>/generate).
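
For reference, TGI's /generate route accepts a JSON body with inputs and parameters and returns a generated_text field. The sketch below is only a connectivity-level example against the default endpoint; the real prompt is assembled by the prompting helpers, so the literal input shown here is illustrative.

    import requests

    # Default endpoint from deployment/docker-compose.yml; keep this in sync
    # with LlmConfig.base_url in db_agent/client.py.
    base_url = "http://192.168.0.30:8080"

    payload = {
        "inputs": "-- illustrative prompt; the agent builds its own via prompting/",
        "parameters": {"max_new_tokens": 256},
    }
    response = requests.post(f"{base_url}/generate", json=payload, timeout=60)
    response.raise_for_status()
    print(response.json()["generated_text"])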

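Because the log is JSONL, each line is a standalone JSON object and the trace can be reviewed with a few lines of Python; the field names are whatever log_utils.py writes, so none are assumed here.

    import json
    from pathlib import Path

    log_path = Path("db_agent/logs/query_log.jsonl")
    with log_path.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]

    print(f"{len(entries)} logged interactions")
    # Show which fields the latest entry carries (defined by log_utils.py).
    if entries:
        print("Latest entry fields:", sorted(entries[-1]))
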
Web UI

  1. Install the web dependencies (already in db_agent/requirements.txt).
  2. Start the FastAPI server:
    uvicorn db_agent.ui.backend.main:app --reload --host 0.0.0.0 --port 8000
    
  3. Open http://localhost:8000/ to access the UI. Submit questions, optionally execute the generated SQL, and view result preview tables. A quick smoke test is sketched after this list.
  4. The MSSQL environment variables (see step 2 of Python Setup) must be set in the shell before launching the server.
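
Once the server is up, a minimal smoke test from Python confirms the frontend is being served; the backend's question-submission routes live in db_agent/ui/backend/main.py and are not documented here, so only the root page is checked.

    import requests

    # Root URL from step 3 above.
    response = requests.get("http://localhost:8000/", timeout=10)
    response.raise_for_status()
    print("UI reachable, status:", response.status_code)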

Next Steps

  • Add additional extractor jobs (lookup value snapshots, glossary builders) following the same pattern as schema_snapshot.py.
  • Integrate the extractor into CI/CD or a scheduled task to keep the agent context fresh.
  • Use knowledge_prep.md as a guide when enriching the context files with human-curated insights.
  • Integrate the SQL validation/execution helpers (sql_executor.py) and the question/SQL logging (log_utils.py) more tightly with db_agent/client.py.
  • Consider adding automated SQL result validation (e.g., schema assertions) before surfacing answers to end users.
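
As a starting point for the last item, a result-validation layer could assert basic expectations about a query result before it reaches the user. The sketch below is purely illustrative and is not part of the toolkit; the function name and signature are hypothetical.

    from typing import Any, Sequence

    def validate_result(columns: Sequence[str],
                        rows: Sequence[Sequence[Any]],
                        expected_columns: Sequence[str],
                        max_rows: int = 500) -> None:
        """Hypothetical schema assertions for a query result."""
        missing = [c for c in expected_columns if c not in columns]
        if missing:
            raise ValueError(f"Result is missing expected columns: {missing}")
        if len(rows) > max_rows:
            raise ValueError(f"Result exceeds the row cap ({len(rows)} > {max_rows})")
        # Guard against malformed rows whose width does not match the column list.
        for i, row in enumerate(rows):
            if len(row) != len(columns):
                raise ValueError(f"Row {i} has {len(row)} values for {len(columns)} columns")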