# Ticket: Stand Up 4080 LLM Service

## Ticket Information

- **ID**: TICKET-021
- **Title**: Stand Up 4080 LLM Service
- **Type**: Feature
- **Priority**: High
- **Status**: Backlog
- **Track**: LLM Infra
- **Milestone**: Milestone 2 - Voice Chat MVP
- **Created**: 2024-01-XX
## Description

Set up the LLM service on the 4080 machine:

- Use an Ollama-, vLLM-, or llama.cpp-based server
- Expose an HTTP/gRPC API
- Support function calling/tool use
- Load the selected work agent model
- Configure for optimal performance
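As a sketch of what the HTTP API might look like from the client side, the snippet below builds an OpenAI-compatible chat completion payload. The endpoint URL, port, and model name are placeholders for illustration only; the actual values depend on which server (Ollama, vLLM, llama.cpp) is chosen and how it is deployed on the 4080 box.

```python
import json

# Hypothetical endpoint; host/port depend on the chosen server and deployment.
LLM_ENDPOINT = "http://4080-host:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


# "work-agent-model" is a placeholder; the real name comes from TICKET-019.
payload = build_chat_request("work-agent-model", "Summarize today's tickets.")
body = json.dumps(payload)  # what would be POSTed to LLM_ENDPOINT
```

Because the payload follows the OpenAI wire format, the client code stays the same regardless of which backend ends up serving it.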
## Acceptance Criteria

- [ ] LLM server running on the 4080
- [ ] HTTP/gRPC endpoint exposed
- [ ] Work agent model loaded
- [ ] Function-calling support working
- [ ] Basic health check endpoint responding
- [ ] Performance acceptable for the Voice Chat MVP (interactive latency)
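To make the "basic health check endpoint" criterion concrete, here is a minimal stdlib-only WSGI sketch. The `/health` route and the `{"status": "ok"}` response shape are assumptions, not decisions made by this ticket; a real deployment would serve this from whichever framework the chosen LLM server provides and would likely also report model-load status.

```python
import json


def health_app(environ, start_response):
    """Minimal WSGI app answering GET /health with a JSON status."""
    if environ.get("PATH_INFO") == "/health":
        body = json.dumps({"status": "ok"}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Exercise the app in-process with a fake WSGI environ (no server needed).
captured = {}


def start_response(status, headers):
    captured["status"] = status


result = b"".join(health_app({"PATH_INFO": "/health"}, start_response))
```

Driving the app with a fake `environ` like this also gives a cheap smoke test for the acceptance criterion without standing up a real server.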
## Technical Details

Server options:

- Ollama: Easy setup, good tool support
- vLLM: High throughput, request batching
- llama.cpp: Lightweight, efficient

Requirements:

- HTTP API for simple requests
- gRPC for streaming (optional)
- Function-calling format (OpenAI-compatible)
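The "OpenAI-compatible function-calling format" requirement means the server should accept a `tools` array of JSON-schema function declarations alongside the messages. The sketch below shows that request shape; the tool name `get_ticket_status` and its parameters are invented examples, not part of any agreed tool design.

```python
import json

# Invented example tool; the real tool set is out of scope for this ticket.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_ticket_status",
            "description": "Look up the status of a ticket by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticket_id": {"type": "string"},
                },
                "required": ["ticket_id"],
            },
        },
    }
]

# OpenAI-compatible request carrying both the conversation and the tools.
request = {
    "model": "work-agent-model",  # placeholder pending TICKET-019
    "messages": [{"role": "user", "content": "What is TICKET-021's status?"}],
    "tools": tools,
}
encoded = json.dumps(request)
```

Since all three server options can speak (or be fronted by) this format, standardizing on it keeps the tool layer independent of the backend choice.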
## Dependencies

- TICKET-019 (work agent model selection)
- TICKET-004 (architecture)
## Related Files

- `home-voice-agent/llm-servers/4080/` (to be created)
## Notes

Independent of the MCP/tool design; the service only needs to expose a common API. Work can proceed once model selection (TICKET-019) is complete.