
Ticket: Stand Up 4080 LLM Service

Ticket Information

  • ID: TICKET-021
  • Title: Stand Up 4080 LLM Service
  • Type: Feature
  • Priority: High
  • Status: Backlog
  • Track: LLM Infra
  • Milestone: Milestone 2 - Voice Chat MVP
  • Created: 2024-01-XX

Description

Set up an LLM service on the 4080 machine:

  • Use an Ollama-, vLLM-, or llama.cpp-based server
  • Expose an HTTP/gRPC API
  • Support function calling / tool use
  • Load the selected work-agent model
  • Configure for good single-GPU latency and throughput
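Whichever server is chosen, clients would talk to it over HTTP. A minimal sketch of building a request in the common OpenAI-compatible shape, assuming a `/v1/chat/completions` route; the base URL and model name are placeholders, not decided by this ticket:

```python
import json

# Hypothetical base URL for the 4080 server; real host/port are TBD.
BASE_URL = "http://4080-host:8000"

def build_chat_request(model: str, user_text: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-compatible chat call."""
    url = f"{BASE_URL}/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }
    return url, json.dumps(body).encode("utf-8")

url, body = build_chat_request("work-agent", "ping")
print(url)
```

All three server options can expose this request shape (Ollama and vLLM ship OpenAI-compatible endpoints; llama.cpp's server does as well), which is what makes the ticket independent of the final choice.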

Acceptance Criteria

  • LLM server running on 4080
  • HTTP/gRPC endpoint exposed
  • Work agent model loaded
  • Function-calling support working
  • Basic health check endpoint
  • Latency and throughput acceptable for interactive use (Voice Chat MVP)
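The health-check criterion can be verified with a tiny probe. A sketch assuming a `/health` route; the actual path depends on the server chosen (Ollama, for instance, answers a plain GET on its root):

```python
import urllib.request
import urllib.error

def check_health(base_url: str, path: str = "/health", timeout: float = 2.0) -> bool:
    """Return True if the LLM server answers the health route with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout all count as unhealthy.
        return False

# Against a host with nothing listening this simply reports False.
print(check_health("http://127.0.0.1:9", timeout=0.5))
```

A probe like this can run from cron or the orchestrator to satisfy the "basic health check" criterion without depending on any model-specific behavior.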

Technical Details

Server options:

  • Ollama: Easy setup, good tool support
  • vLLM: High throughput, batching
  • llama.cpp: Lightweight, efficient

Requirements:

  • HTTP API for simple requests
  • gRPC for streaming (optional)
  • Function calling format (OpenAI-compatible)
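In the OpenAI-compatible format, function calling is driven by a `tools` array in the request body: each entry names a function and describes its arguments with JSON Schema. A sketch of one such definition; the tool name and fields are illustrative, not part of this ticket:

```python
# One tool in the OpenAI-compatible "tools" schema. The model may respond
# with a tool_call naming this function instead of plain text.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "work-agent",  # placeholder model name
    "messages": [{"role": "user", "content": "Weather in Boston?"}],
    "tools": [get_weather_tool],
}
```

Because the tool definitions live entirely in the request, the server needs no knowledge of the eventual MCP/tool design, only support for this wire format.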

Dependencies

  • TICKET-019 (work agent model selection)
  • TICKET-004 (architecture)
  • home-voice-agent/llm-servers/4080/ (to be created)

Notes

Independent of the MCP/tool design; the server only needs to expose a common API. Can proceed as soon as model selection (TICKET-019) is done.