# Local TGI Deployment for SQLCoder
This Docker Compose file runs Hugging Face Text Generation Inference (TGI), serving the `defog/sqlcoder-7b-2` model.
## Prerequisites
- NVIDIA GPU with recent drivers and CUDA runtime that matches the `nvidia-container-toolkit` installation.
- Docker and `docker compose` v2.
- Hugging Face access token with model download permissions (`HUGGING_FACE_HUB_TOKEN`).
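The compose file itself lives at `db_agent/deployment/docker-compose.yml` and is not reproduced here verbatim; the following is a rough sketch reconstructed from the settings described in the Notes section (image tag, container port, and volume mount point are assumptions):

```yaml
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:latest
    command: >
      --model-id defog/sqlcoder-7b-2
      --max-input-length 2048
      --max-total-tokens 3072
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
      - CUDA_VISIBLE_DEVICES=2
    ports:
      - "8080:80"        # TGI listens on port 80 inside the container
    volumes:
      - model-cache:/data  # TGI's default model cache directory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  model-cache:
```

Consult the actual file before relying on any of these values.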
## Usage
1. Export your Hugging Face token in the shell where you run compose:
```powershell
$env:HUGGING_FACE_HUB_TOKEN = "hf_..."
```
2. Launch the stack:
```powershell
docker compose -f db_agent/deployment/docker-compose.yml up -d
```
3. Check logs:
```powershell
docker compose -f db_agent/deployment/docker-compose.yml logs -f
```
4. The TGI OpenAI-compatible endpoint will be available at `http://localhost:8080/v1`. Use it with OpenAI-compatible SDKs or direct HTTP calls.
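As a minimal sketch of a direct HTTP call against that endpoint (the `ask` helper and its parameters are illustrative, not part of this repo; TGI's OpenAI-compatible route accepts `"tgi"` as the model name):

```python
import json
from urllib.request import Request, urlopen

# Endpoint exposed by the compose stack above.
ENDPOINT = "http://localhost:8080/v1/chat/completions"


def build_payload(question: str) -> dict:
    """Build a chat-completions request body for the local TGI server."""
    return {
        "model": "tgi",  # TGI serves a single model; "tgi" selects it
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 512,  # stay well under --max-total-tokens=3072
    }


def ask(question: str) -> str:
    """Send a question to the local SQLCoder endpoint and return the reply."""
    req = Request(
        ENDPOINT,
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any OpenAI-compatible SDK can be pointed at the same base URL (`http://localhost:8080/v1`) instead of hand-rolling requests.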
## Notes
- The compose file pins `CUDA_VISIBLE_DEVICES=2` to target the 24 GB RTX 3090; update if your GPU indices differ.
- Token limits are tightened (`--max-total-tokens=3072`, `--max-input-length=2048`) to stay within the memory budget of 16-24 GB cards.
- Models are cached on the `model-cache` volume to avoid re-downloading.
- To shut down:
```powershell
docker compose -f db_agent/deployment/docker-compose.yml down
```
- For CPU-only testing, remove the `deploy.resources` block and expect very slow inference.