llm_council/docs/GPU_VM_SETUP.md

# GPU VM Setup - Quick Reference

## Quick Setup Steps

### 1. On GPU VM: Configure Ollama to Accept Remote Connections

```bash
# Create systemd override
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"

# Keep Ollama storage off root disk (recommended) and ensure the service user
# has a writable HOME (runners/keys/cache).
Environment="OLLAMA_MODELS=/mnt/data/ollama"
Environment="HOME=/mnt/data/ollama/home"

# IMPORTANT (GPU): on this VM, `OLLAMA_LLM_LIBRARY=cuda` caused Ollama to SKIP CUDA.
# Use the libdir selector instead.
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"

# Ensure the dynamic linker can resolve Ollama's bundled CUDA + ggml libs.
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12"
EOF

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify
curl http://0.0.0.0:11434/api/tags
```

**Verify GPU is actually used (on GPU VM):**

```bash
ollama run qwen2:latest "Write 80 words about GPUs."
ollama ps
watch -n 0.2 nvidia-smi
```

### 2. On GPU VM: Configure Firewall

```bash
# Allow port 11434 (adjust IP/subnet as needed)
sudo ufw allow from YOUR_LOCAL_IP to any port 11434
# Or allow from entire subnet (less secure)
sudo ufw allow 11434/tcp
```

### 3. On GPU VM: Pull Required Models

```bash
ollama pull qwen2.5:7b
ollama pull llama3.1:8b
ollama pull qwen2.5:14b
ollama pull qwen2:latest
```

### 4. On Local Machine: Configure .env

```bash
USE_LOCAL_OLLAMA=false
OPENAI_COMPAT_BASE_URL=http://YOUR_GPU_VM_IP:11434
# Local (small) example:
# COUNCIL_MODELS=llama3.2:1b,qwen2.5:0.5b,gemma2:2b
# CHAIRMAN_MODEL=llama3.2:3b

# GPU (available models):
COUNCIL_MODELS=qwen2.5:7b,llama3.1:8b,qwen2:latest
CHAIRMAN_MODEL=qwen2.5:14b
```

### 5. Test Connection

```bash
# From local machine
curl http://YOUR_GPU_VM_IP:11434/api/tags
curl http://YOUR_GPU_VM_IP:11434/v1/models
```

## Troubleshooting

- **Connection timeout**: Check Ollama is listening on `0.0.0.0:11434` (not `127.0.0.1`)
- **Firewall blocking**: Check `sudo ufw status` and allow port 11434
- **CPU instead of GPU**: Run `ollama ps` and confirm it doesn't say `100% CPU`. If it does:
  - Ensure `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - Ensure `LD_LIBRARY_PATH` includes `/usr/local/lib/ollama` and `/usr/local/lib/ollama/cuda_v12`
  - Ensure `OLLAMA_MODELS` and `HOME` are on a disk with free space (root disk full can break runner/cache)

See [DEPLOYMENT.md](DEPLOYMENT.md) for detailed instructions.