# Ticket: Stand Up 4080 LLM Service

## Ticket Information

- **ID**: TICKET-021
- **Title**: Stand Up 4080 LLM Service
- **Type**: Feature
- **Priority**: High
- **Status**: Backlog
- **Track**: LLM Infra
- **Milestone**: Milestone 2 - Voice Chat MVP
- **Created**: 2024-01-XX

## Description

Set up the LLM service on the 4080:

- Use an Ollama-, vLLM-, or llama.cpp-based server
- Expose an HTTP/gRPC API
- Support function calling / tool use
- Load the selected work agent model
- Configure for optimal performance

## Acceptance Criteria

- [ ] LLM server running on the 4080
- [ ] HTTP/gRPC endpoint exposed
- [ ] Work agent model loaded
- [ ] Function-calling support working
- [ ] Basic health check endpoint
- [ ] Performance acceptable

## Technical Details

Server options:

- **Ollama**: easy setup, good tool support
- **vLLM**: high throughput, batching
- **llama.cpp**: lightweight, efficient

Requirements:

- HTTP API for simple requests
- gRPC for streaming (optional)
- OpenAI-compatible function-calling format

## Dependencies

- TICKET-019 (work agent model selection)
- TICKET-004 (architecture)

## Related Files

- `home-voice-agent/llm-servers/4080/` (to be created)

## Notes

Independent of the MCP/tool design - only a common API is needed. Can proceed after model selection.
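To make the "OpenAI-compatible function-calling format" requirement concrete, here is a minimal sketch of the request body a client would send to whichever server is chosen (Ollama, vLLM, and llama.cpp's server each expose an OpenAI-compatible `/v1/chat/completions` route). The endpoint URL, model name, and the `get_calendar_events` tool are all hypothetical placeholders, not decisions made by this ticket:

```python
import json

# Hypothetical endpoint; the real host/port depends on which server
# ends up deployed on the 4080.
ENDPOINT = "http://localhost:11434/v1/chat/completions"

# OpenAI-compatible request body with one tool definition. The model
# name is a placeholder until TICKET-019 selects the work agent model,
# and the tool schema is purely illustrative.
payload = {
    "model": "work-agent",
    "messages": [
        {"role": "user", "content": "What's on my calendar today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_calendar_events",  # hypothetical tool
                "description": "List calendar events for a given date.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "date": {
                            "type": "string",
                            "description": "ISO 8601 date, e.g. 2024-01-15",
                        }
                    },
                    "required": ["date"],
                },
            },
        }
    ],
}

# Serialized body, ready to POST with Content-Type: application/json.
body = json.dumps(payload)
```

Keeping clients on this wire format means the server choice (and any later MCP/tool redesign) can change without touching callers, which is the "common API" point in the notes.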
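For the "basic health check endpoint" criterion, if the chosen server does not expose one natively, a tiny sidecar can serve it. Below is a stdlib-only sketch (hypothetical route `/health` and response shape, not part of any of the listed servers) that also exercises the endpoint in-process to show the expected behavior:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with a small JSON status document."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Suppress per-request logging to keep output clean.
        pass


# Port 0 lets the OS pick a free port; a real deployment would fix one.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    status = json.loads(resp.read())

server.shutdown()
print(status)  # -> {'status': 'ok'}
```

In practice the handler would also probe the LLM server (e.g. a trivial completion request) so "ok" reflects model readiness, not just process liveness.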