# Local TGI Deployment for SQLCoder
This Docker Compose file runs Hugging Face Text Generation Inference (TGI) serving the `defog/sqlcoder-7b-2` model.
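For orientation, here is a minimal sketch of what such a compose file typically looks like. The service name, image tag, and exact layout are assumptions; the pinned file at `db_agent/deployment/docker-compose.yml` is authoritative.

```yaml
# Sketch only -- service name and image tag are illustrative.
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:latest  # pin a version in practice
    command: --model-id defog/sqlcoder-7b-2 --max-input-length 2048 --max-total-tokens 3072
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}  # passed in from the host shell
      - CUDA_VISIBLE_DEVICES=2                             # pin to the RTX 3090
    ports:
      - "8080:80"          # TGI listens on port 80 inside the container
    volumes:
      - model-cache:/data  # TGI's default Hugging Face cache directory
    shm_size: 1g           # shared memory, as recommended in the TGI docs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all             # expose all GPUs; CUDA_VISIBLE_DEVICES picks one
              capabilities: [gpu]

volumes:
  model-cache:
```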
## Prerequisites
- NVIDIA GPU with recent drivers and a CUDA runtime that matches the `nvidia-container-toolkit` installation (see the quick check after this list).
- Docker and `docker compose` v2.
- A Hugging Face access token with model download permissions (`HUGGING_FACE_HUB_TOKEN`).
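A common way to confirm that containers can reach the GPU before starting the stack (the CUDA image tag here is illustrative):

```sh
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```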
## Usage
- Export your Hugging Face token in the shell where you run Compose (PowerShell syntax shown): `$env:HUGGING_FACE_HUB_TOKEN = "hf_..."`
- Launch the stack: `docker compose -f db_agent/deployment/docker-compose.yml up -d`
- Check logs: `docker compose -f db_agent/deployment/docker-compose.yml logs -f`
- The TGI OpenAI-compatible endpoint is available at `http://localhost:8080/v1`; use it with OpenAI-compatible SDKs or direct HTTP calls (see the example after this list).
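A quick smoke test against the OpenAI-compatible chat endpoint with curl (the prompt is illustrative, and TGI generally ignores the `model` field):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tgi",
    "messages": [
      {"role": "user", "content": "Write a SQL query that counts rows in the users table."}
    ],
    "max_tokens": 256
  }'
```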
## Notes
- The compose file pins `CUDA_VISIBLE_DEVICES=2` to target the 24 GB RTX 3090; update this if your GPU indices differ.
- Token limits are tightened (`--max-total-tokens=3072`, `--max-input-length=2048`) to stay within the memory of 16–24 GB cards.
- Models are cached on the `model-cache` volume to avoid re-downloading.
- To shut down: `docker compose -f db_agent/deployment/docker-compose.yml down`
- For CPU-only testing, remove the `deploy.resources` block (shown below) and expect very slow inference.
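For reference, the GPU reservation block to delete looks roughly like this (exact contents depend on the pinned compose file):

```yaml
# Removing this block lets the service start without an NVIDIA runtime
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```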