# Local TGI Deployment for SQLCoder
This Docker Compose file runs Hugging Face Text Generation Inference (TGI) serving the `defog/sqlcoder-7b-2` model.
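For orientation, here is a minimal sketch of what such a compose file typically looks like. The service name, image tag, and exact layout are assumptions; the pinned file at `db_agent/deployment/docker-compose.yml` is authoritative.

```yaml
# Sketch only -- service name and image tag are illustrative.
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:latest  # pin a version in practice
    command: --model-id defog/sqlcoder-7b-2 --max-input-length 2048 --max-total-tokens 3072
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}  # passed in from the host shell
      - CUDA_VISIBLE_DEVICES=2                             # pin to the RTX 3090
    ports:
      - "8080:80"          # TGI listens on port 80 inside the container
    volumes:
      - model-cache:/data  # TGI's default Hugging Face cache directory
    shm_size: 1g           # shared memory, as recommended in the TGI docs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all             # expose all GPUs; CUDA_VISIBLE_DEVICES picks one
              capabilities: [gpu]

volumes:
  model-cache:
```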
## Prerequisites
- NVIDIA GPU with recent drivers and a CUDA runtime that matches the `nvidia-container-toolkit` installation (see the quick check after this list).
- Docker and `docker compose` v2.
- A Hugging Face access token with model download permissions (`HUGGING_FACE_HUB_TOKEN`).
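A common way to confirm that containers can reach the GPU before starting the stack (the CUDA image tag here is illustrative):

```sh
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```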
## Usage
- Export your Hugging Face token in the shell where you run Compose (PowerShell syntax shown): `$env:HUGGING_FACE_HUB_TOKEN = "hf_..."`
- Launch the stack: `docker compose -f db_agent/deployment/docker-compose.yml up -d`
- Check logs: `docker compose -f db_agent/deployment/docker-compose.yml logs -f`
- The TGI OpenAI-compatible endpoint is available at `http://localhost:8080/v1`; use it with OpenAI-compatible SDKs or direct HTTP calls (see the example after this list).
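A quick smoke test against the OpenAI-compatible chat endpoint with curl (the prompt is illustrative, and TGI generally ignores the `model` field):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tgi",
    "messages": [
      {"role": "user", "content": "Write a SQL query that counts rows in the users table."}
    ],
    "max_tokens": 256
  }'
```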
## Notes
- The compose file pins `CUDA_VISIBLE_DEVICES=2` to target the 24 GB RTX 3090; update this if your GPU indices differ.
- Token limits are tightened (`--max-total-tokens=3072`, `--max-input-length=2048`) to stay within the memory of 16–24 GB cards.
- Models are cached on the `model-cache` volume to avoid re-downloading.
- To shut down: `docker compose -f db_agent/deployment/docker-compose.yml down`
- For CPU-only testing, remove the `deploy.resources` block (shown below) and expect very slow inference.
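For reference, the GPU reservation block to delete looks roughly like this (exact contents depend on the pinned compose file):

```yaml
# Removing this block lets the service start without an NVIDIA runtime
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```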