Local TGI Deployment for SQLCoder

This Docker Compose file runs Hugging Face Text Generation Inference (TGI) serving the defog/sqlcoder-7b-2 model.
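
For orientation, the service it defines is roughly equivalent to the docker run invocation sketched below. The image tag, internal port, and shm size are assumptions based on upstream TGI conventions; the flags and environment variables mirror the Notes section.

    # Illustrative equivalent of the compose service; the compose file in
    # this directory is authoritative. Image tag is an assumption.
    docker run --gpus all --shm-size 1g -p 8080:80 \
      -v model-cache:/data \
      -e HUGGING_FACE_HUB_TOKEN \
      -e CUDA_VISIBLE_DEVICES=2 \
      ghcr.io/huggingface/text-generation-inference:latest \
      --model-id defog/sqlcoder-7b-2 \
      --max-input-length 2048 \
      --max-total-tokens 3072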

Prerequisites

  • NVIDIA GPU with recent drivers and CUDA runtime that matches the nvidia-container-toolkit installation.
  • Docker and docker compose v2.
  • Hugging Face access token with model download permissions (HUGGING_FACE_HUB_TOKEN).
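
To verify that containers can reach the GPU before launching, the usual nvidia-container-toolkit smoke test is to run nvidia-smi from a plain base image (the toolkit injects the driver utilities into the container):

    docker run --rm --gpus all ubuntu nvidia-smi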

Usage

  1. Export your Hugging Face token in the shell where you run compose:
    export HUGGING_FACE_HUB_TOKEN="hf_..."         # bash/zsh
    $env:HUGGING_FACE_HUB_TOKEN = "hf_..."         # PowerShell
    
  2. Launch the stack:
    docker compose -f db_agent/deployment/docker-compose.yml up -d
    
  3. Check logs:
    docker compose -f db_agent/deployment/docker-compose.yml logs -f
    
  4. The TGI OpenAI-compatible endpoint will be available at http://localhost:8080/v1. Use it with OpenAI-compatible SDKs or direct HTTP calls, as shown below.
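
For example, a direct HTTP call with curl. The "model" field is a placeholder, since TGI serves whichever model it was launched with, and the prompt is purely illustrative; a real SQLCoder prompt would include your schema. Chat completions also require the model's tokenizer to ship a chat template; if it does not, fall back to TGI's native /generate endpoint.

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "tgi",
        "messages": [{"role": "user", "content": "Write a SQL query that counts the rows in the users table."}],
        "max_tokens": 200
      }'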

Notes

  • The compose file pins CUDA_VISIBLE_DEVICES=2 to target the 24 GB RTX 3090; update it if your GPU indices differ (see the index check after these notes).
  • Token limits are tightened (--max-total-tokens=3072, --max-input-length=2048) to stay within 16–24 GB cards.
  • Models are cached on the model-cache volume to avoid re-downloading.
  • To shut down:
    docker compose -f db_agent/deployment/docker-compose.yml down
    
  • For CPU-only testing, remove the deploy.resources block and expect very slow inference.
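
To confirm which index the 3090 actually has before editing CUDA_VISIBLE_DEVICES (per the first note above), list the host's GPUs:

    nvidia-smi -L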