# 1050 LLM Server (Family Agent)

LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on the GTX 1050.

## Setup

### Using Ollama (Recommended)

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull phi3:mini-q4_0

# Start server, listening on all interfaces (`ollama serve` takes no --host
# flag; the bind address is set via the OLLAMA_HOST environment variable)
OLLAMA_HOST=0.0.0.0 ollama serve
# Runs on http://<1050-ip>:11434
```
## Configuration

- **Model**: Phi-3 Mini 3.8B Q4
- **Context Window**: 8K tokens (practical limit)
- **VRAM Usage**: ~2.5GB
- **Concurrency**: 1-2 requests max

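The limits above can be enforced through Ollama's environment variables and a Modelfile. A sketch, assuming the values from the table (the `phi3-family` model name is a hypothetical label, not from this project):

```bash
# Cap concurrency so the 1050 is not oversubscribed (values mirror the table above)
export OLLAMA_NUM_PARALLEL=2        # at most 2 simultaneous requests
export OLLAMA_MAX_LOADED_MODELS=1   # keep only one model resident in VRAM

# Pin the 8K practical context limit per-model via a Modelfile
cat > Modelfile <<'EOF'
FROM phi3:mini-q4_0
PARAMETER num_ctx 8192
EOF
# ollama create phi3-family -f Modelfile   # run against a live server; name is illustrative
```

The environment variables must be set in whatever environment `ollama serve` starts from (e.g. the systemd unit), not in a client shell.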
## API

Ollama exposes its native chat endpoint at `/api/chat` (an OpenAI-compatible API is also served under `/v1/chat/completions`):

```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```
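With `"stream": false`, the reply arrives as a single JSON object and the assistant text lives under `message.content`. A sketch of extracting it (the response body here is a canned example of the shape, not live server output):

```bash
# Canned example of a non-streaming /api/chat response (shape only, not live output)
response='{"model":"phi3:mini-q4_0","message":{"role":"assistant","content":"Hello! How can I help?"},"done":true}'

# Pull out the assistant text; python3 does the JSON parsing in case jq is absent
reply=$(printf '%s' "$response" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["message"]["content"])')
echo "$reply"
```

In a real pipeline the `curl` call above would feed the parser directly instead of the canned string.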
## Systemd Service

See `ollama-1050.service` for systemd configuration.
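The unit file itself is not reproduced here; a minimal sketch of what `ollama-1050.service` might contain (binary path and service account are assumptions, not taken from the actual file):

```ini
# Illustrative sketch only -- the real ollama-1050.service may differ
[Unit]
Description=Ollama LLM server (family agent, Phi-3 Mini)
After=network-online.target
Wants=network-online.target

[Service]
# Bind to all interfaces so other machines on the LAN can reach the API
Environment=OLLAMA_HOST=0.0.0.0
# Assumed install path and service account
ExecStart=/usr/local/bin/ollama serve
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
```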