# Ticket: Stand Up 4080 LLM Service

## Ticket Information

- **ID**: TICKET-021
- **Title**: Stand Up 4080 LLM Service
- **Type**: Feature
- **Priority**: High
- **Status**: Backlog
- **Track**: LLM Infra
- **Milestone**: Milestone 2 - Voice Chat MVP
- **Created**: 2024-01-XX
## Description

Set up the LLM service on the 4080 machine:

- Use an Ollama-, vLLM-, or llama.cpp-based server
- Expose an HTTP/gRPC API
- Support function calling/tool use
- Load the selected work agent model
- Configure for optimal performance
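As a sketch of what the HTTP API might look like from the client side, the snippet below builds an OpenAI-compatible chat completion payload. The endpoint URL, port, and model name are placeholders for illustration only; the actual values depend on which server (Ollama, vLLM, llama.cpp) is chosen and how it is deployed on the 4080 box.

```python
import json

# Hypothetical endpoint; host/port depend on the chosen server and deployment.
LLM_ENDPOINT = "http://4080-host:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


# "work-agent-model" is a placeholder; the real name comes from TICKET-019.
payload = build_chat_request("work-agent-model", "Summarize today's tickets.")
body = json.dumps(payload)  # what would be POSTed to LLM_ENDPOINT
```

Because the payload follows the OpenAI wire format, the client code stays the same regardless of which backend ends up serving it.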
## Acceptance Criteria

- [ ] LLM server running on the 4080
- [ ] HTTP/gRPC endpoint exposed
- [ ] Work agent model loaded
- [ ] Function-calling support working
- [ ] Basic health check endpoint responding
- [ ] Performance acceptable for the Voice Chat MVP (interactive latency)
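To make the "basic health check endpoint" criterion concrete, here is a minimal stdlib-only WSGI sketch. The `/health` route and the `{"status": "ok"}` response shape are assumptions, not decisions made by this ticket; a real deployment would serve this from whichever framework the chosen LLM server provides and would likely also report model-load status.

```python
import json


def health_app(environ, start_response):
    """Minimal WSGI app answering GET /health with a JSON status."""
    if environ.get("PATH_INFO") == "/health":
        body = json.dumps({"status": "ok"}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Exercise the app in-process with a fake WSGI environ (no server needed).
captured = {}


def start_response(status, headers):
    captured["status"] = status


result = b"".join(health_app({"PATH_INFO": "/health"}, start_response))
```

Driving the app with a fake `environ` like this also gives a cheap smoke test for the acceptance criterion without standing up a real server.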
## Technical Details

Server options:

- Ollama: Easy setup, good tool support
- vLLM: High throughput, request batching
- llama.cpp: Lightweight, efficient

Requirements:

- HTTP API for simple requests
- gRPC for streaming (optional)
- Function-calling format (OpenAI-compatible)
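The "OpenAI-compatible function-calling format" requirement means the server should accept a `tools` array of JSON-schema function declarations alongside the messages. The sketch below shows that request shape; the tool name `get_ticket_status` and its parameters are invented examples, not part of any agreed tool design.

```python
import json

# Invented example tool; the real tool set is out of scope for this ticket.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_ticket_status",
            "description": "Look up the status of a ticket by ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticket_id": {"type": "string"},
                },
                "required": ["ticket_id"],
            },
        },
    }
]

# OpenAI-compatible request carrying both the conversation and the tools.
request = {
    "model": "work-agent-model",  # placeholder pending TICKET-019
    "messages": [{"role": "user", "content": "What is TICKET-021's status?"}],
    "tools": tools,
}
encoded = json.dumps(request)
```

Since all three server options can speak (or be fronted by) this format, standardizing on it keeps the tool layer independent of the backend choice.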
## Dependencies

- TICKET-019 (work agent model selection)
- TICKET-004 (architecture)
## Related Files

- `home-voice-agent/llm-servers/4080/` (to be created)
## Notes

Independent of the MCP/tool design; the service only needs to expose a common API. Work can proceed once model selection (TICKET-019) is complete.