atlas/home-voice-agent/llm-servers/4080/README.md

# 4080 LLM Server (Work Agent)

LLM server for work agent running on remote GPU VM.

## Server Information

- **Host**: 10.0.30.63
- **Port**: 11434
- **Endpoint**: http://10.0.30.63:11434
- **Service**: Ollama

## Available Models

The server has the following models available:
- `deepseek-r1:70b` - 70B model (currently configured)
- `deepseek-r1:671b` - 671B model
- `llama3.1:8b` - Llama 3.1 8B
- `qwen2.5:14b` - Qwen 2.5 14B
- And others (see `test_connection.py`)

## Configuration

Edit `config.py` to change the model:
```python
MODEL_NAME = "deepseek-r1:70b"  # or your preferred model
```

## Testing Connection

```bash
cd home-voice-agent/llm-servers/4080
python3 test_connection.py
```

This will:
1. Test server connectivity
2. List available models
3. Test chat endpoint with configured model

## API Usage

### List Models
```bash
curl http://10.0.30.63:11434/api/tags
```

### Chat Request
```bash
curl http://10.0.30.63:11434/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```

### With Function Calling
```bash
curl http://10.0.30.63:11434/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [
    {"role": "user", "content": "What is the weather in San Francisco?"}
  ],
  "tools": [...],
  "stream": false
}'
```

## Integration

The MCP adapter can connect to this server by setting:
```python
OLLAMA_BASE_URL = "http://10.0.30.63:11434"
```

## Notes

- The server is already running on the GPU VM
- No local installation needed - just configure the endpoint
- Model selection can be changed in `config.py`
- If you need `llama3.1:70b-q4_0`, pull it on the server:
  ```bash
  # On the GPU VM
  ollama pull llama3.1:70b-q4_0
  ```