# Architecture Documentation

## Overview

This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project. Atlas is a local, privacy-focused voice agent system with separate work and family agents, running on dedicated hardware (an RTX 4080 for the work agent, a GTX 1050 for the family agent).

## System Architecture

### High-Level Design

The system consists of five parallel tracks:

1. **Voice I/O**: Wake-word detection, ASR (Automatic Speech Recognition), and TTS (Text-to-Speech)
2. **LLM Infrastructure**: Two separate LLM servers (4080 for work, 1050 for family)
3. **Tools/MCP**: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
4. **Clients/UI**: Phone PWA and web LAN dashboard
5. **Safety/Memory**: Long-term memory, conversation management, and safety controls

### Component Architecture

```
┌──────────────────────────────────────────────────┐
│                  Clients Layer                   │
│   ┌──────────────┐       ┌────────────────┐      │
│   │  Phone PWA   │       │ Web Dashboard  │      │
│   └──────────────┘       └────────────────┘      │
└──────────────────────────────────────────────────┘
                         │ WebSocket/HTTP
┌────────────────────────▼─────────────────────────┐
│                   Voice Stack                    │
│  ┌───────────┐    ┌───────────┐    ┌──────────┐  │
│  │ Wake-Word │───▶│    ASR    │    │   TTS    │  │
│  │   Node    │    │  Service  │    │ Service  │  │
│  └───────────┘    └───────────┘    └──────────┘  │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│                LLM Infrastructure                │
│   ┌──────────────┐       ┌────────────────┐      │
│   │ 4080 Server  │       │  1050 Server   │      │
│   │ (Work Agent) │       │ (Family Agent) │      │
│   └──────┬───────┘       └───────┬────────┘      │
│          └───────────┬───────────┘               │
│              ┌───────▼────────┐                  │
│              │ Routing Layer  │                  │
│              └───────┬────────┘                  │
└──────────────────────┼───────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────┐
│                 MCP Tools Layer                  │
│   ┌─────────────────────┐                        │
│   │      MCP Server     │                        │
│   │  ┌───────────────┐  │                        │
│   │  │ Weather       │  │                        │
│   │  │ Tasks         │  │                        │
│   │  │ Timers        │  │                        │
│   │  │ Notes         │  │                        │
│   │  └───────────────┘  │                        │
│   └─────────────────────┘                        │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│                 Safety & Memory                  │
│   ┌──────────────┐       ┌────────────────┐      │
│   │    Memory    │       │   Boundaries   │      │
│   │    Store     │       │  Enforcement   │      │
│   └──────────────┘       └────────────────┘      │
└──────────────────────────────────────────────────┘
```

### Technology Stack

- **Languages**: Python (backend services), TypeScript/JavaScript (clients)
- **LLM Servers**: Ollama, vLLM, or llama.cpp
- **ASR**: faster-whisper or whisper.cpp
- **TTS**: Piper, Mimic 3, or Coqui TTS
- **Wake-Word**: openWakeWord or Porcupine
- **Protocols**: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
- **Storage**: SQLite (memory, sessions), Markdown files (tasks, notes)
- **Infrastructure**: Docker, systemd, Linux

## Design Patterns

### Core Patterns

- **Microservices Architecture**: Separate services for wake-word, ASR, TTS, LLM servers, and MCP tools
- **Event-Driven**: Wake-word events trigger ASR capture; tool calls trigger actions
- **API Gateway Pattern**: A routing layer directs requests to the appropriate LLM server
- **Repository Pattern**: Separate config repo for the family agent (no work content)
- **Tool Pattern**: MCP tools as independent, composable services

### Architectural Patterns

- **Separation of Concerns**: Clear boundaries between work and family agents
- **Layered Architecture**: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
- **Service-Oriented**: Each component is an independent service with defined APIs
- **Privacy by Design**: Local processing, minimal external dependencies

## Project Structure

### Repository Structure

```
home-voice-agent/            # Main mono-repo
├── llm-servers/
│   ├── 4080/                # Work agent server
│   └── 1050/                # Family agent server
├── mcp-server/              # MCP tool server
│   └── tools/               # Individual tool implementations
├── wake-word/               # Wake-word detection node
├── asr/                     # ASR service
├── tts/                     # TTS service
├── clients/
│   ├── phone/               # Phone PWA
│   └── web-dashboard/       # Web dashboard
├── routing/                 # LLM routing layer
├── conversation/            # Conversation management
├── memory/                  # Long-term memory
├── safety/                  # Safety and boundary enforcement
└── admin/                   # Admin tools

family-agent-config/         # Separate config repo
├── prompts/                 # System prompts
├── tools/                   # Tool configurations
├── secrets/                 # Credentials (no work content)
└── tasks/                   # Home Kanban board
    └── home/                # Home tasks only
```

### Atlas Project (This Repo)

```
atlas/
├── tickets/                 # Kanban tickets
│   ├── backlog/             # Future work
│   ├── todo/                # Ready to work on
│   ├── in-progress/         # Active work
│   ├── review/              # Awaiting review
│   └── done/                # Completed
├── docs/                    # Documentation
├── ARCHITECTURE.md          # This file
└── README.md                # Project overview
```

## Data Models

### Memory Schema

Long-term memory stores personal facts, preferences, and routines:

```python
MemoryEntry:
- id: str
- category: str       # personal, family, preferences, routines
- content: str
- timestamp: datetime
- confidence: float
- source: str         # conversation, explicit, inferred
```

### Conversation Session

```python
Session:
- session_id: str
- agent_type: str     # "work" or "family"
- messages: List[Message]
- created_at: datetime
- last_activity: datetime
- summary: str        # after summarization
```

### Task Model (Markdown Kanban)

```yaml
---
id: TICKET-XXX
title: Task title
status: backlog|todo|in-progress|review|done
priority: high|medium|low
created: YYYY-MM-DD
updated: YYYY-MM-DD
assignee: name
tags: [tag1, tag2]
---

Task description...
```

### MCP Tool Definition

```json
{
  "name": "tool_name",
  "description": "Tool description",
  "inputSchema": {
    "type": "object",
    "properties": {...}
  }
}
```

## API Design

### LLM Server API

**Endpoint**: `POST /v1/chat/completions`

```json
{
  "model": "work-agent" | "family-agent",
  "messages": [...],
  "tools": [...],
  "temperature": 0.7
}
```

### ASR Service API

**Endpoint**: `WebSocket /asr/stream`

- Input: Audio stream (PCM, 16 kHz, mono)
- Output: Text segments with timestamps

```json
{
  "text": "transcribed text",
  "timestamp": 1234.56,
  "confidence": 0.95,
  "is_final": false
}
```

### TTS Service API

**Endpoint**: `POST /tts/synthesize`

```json
{
  "text": "Text to speak",
  "voice": "family-voice",
  "stream": true
}
```

Response: Audio stream (WAV or MP3)

### MCP Server API

**Protocol**: JSON-RPC 2.0

**Methods**:

- `tools/list`: List available tools
- `tools/call`: Execute a tool

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather",
    "arguments": {...}
  },
  "id": 1
}
```

## Security Considerations

### Privacy Policy

- **Core Principle**: No external APIs for ASR/LLM processing
- **Exception**: Weather API (documented exception)
- **Local Processing**: All voice and LLM processing runs locally
- **Data Retention**: Configurable retention policies for conversations

### Boundary Enforcement

- **Repository Separation**: Family agent config lives in a separate repo with no work content
- **Path Whitelists**: Tools can only access whitelisted directories
- **Network Isolation**: Containers/namespaces prevent cross-access
- **Firewall Rules**: Block the family agent from accessing work repo paths
- **Static Analysis**: CI/CD checks reject code that would grant cross-access

### Confirmation Flows

- **High-Risk Actions**: Email sends, calendar changes, and file edits outside safe areas
- **Confirmation Tokens**: Signed tokens required from the client, not just model intent
- **User Approval**: Explicit "Yes/No" confirmation for sensitive operations
- **Audit Logging**: All confirmations and high-risk actions are logged

### Authentication & Authorization

- **Token-Based**: Separate tokens for work and family agents
- **Revocation**: A mechanism to disable compromised tokens/devices
- **Admin Controls**: Kill switches for services, tools, or an entire agent

## Performance Considerations

### Latency Targets

- **Wake-Word Detection**: < 200 ms
- **ASR Processing**: < 2 s end-to-end (audio in → text out)
- **LLM Response**: < 3 s for the family agent (1050), < 5 s for the work agent (4080)
- **TTS Synthesis**: < 500 ms to first chunk, streaming thereafter
- **Tool Execution**: < 1 s for simple tools (weather, time)

### Resource Allocation

- **4080 (Work Agent)**:
  - Model: 8-14B or 30B quantized (Q4-Q6)
  - Context: 8K-16K tokens
  - Concurrency: 2-4 requests
- **1050 (Family Agent)**:
  - Model: 1B-3B quantized (Q4-Q5)
  - Context: 4K-8K tokens
  - Concurrency: 1-2 requests (always-on, low latency)

### Optimization Strategies

- **Model Quantization**: Q4-Q6 on the 4080, Q4-Q5 on the 1050
- **Context Management**: Summarization and pruning for long conversations
- **Caching**: Weather API responses and tool results
- **Streaming**: ASR and TTS stream for lower perceived latency
- **Batching**: Batch LLM requests where possible (work agent)

## Deployment

### Hardware Requirements

- **RTX 4080 Server**: Work agent LLM, ASR (optional)
- **GTX 1050 Server**: Family agent LLM (always-on)
- **Wake-Word Node**: Raspberry Pi 4+, NUC, or SFF PC
- **Microphones**: USB mics or an array mic for the living room/office
- **Storage**: SSD for logs, HDD for archives

### Service Deployment

- **LLM Servers**: systemd services or Docker containers
- **MCP Server**: systemd service with auto-restart
- **Voice Services**: ASR and TTS as systemd services
- **Wake-Word Node**: Standalone service on dedicated hardware
- **Clients**: PWA served via a web server; web dashboard on the LAN

### Configuration Management

- **Family Agent Config**: Separate `family-agent-config/` repo
- **Secrets**: Environment variables and separate `.env` files
- **Prompts**: Version-controlled in the config repo
- **Tool Configs**: YAML/JSON files in the config repo

### Monitoring & Logging

- **Structured Logging**: JSON logs for all services
- **Metrics**: GPU usage, latency, error rates
- **Admin Dashboard**: Web UI for logs, metrics, and controls
- **Alerting**: System notifications for errors or high resource usage

## Development Workflow

### Ticket-Based Development

1. **Select Ticket**: Choose from `tickets/backlog/` or `tickets/todo/`
2. **Check Dependencies**: Review ticket dependencies before starting
3. **Move to In-Progress**: Move the ticket to `tickets/in-progress/`
4. **Implement**: Follow architecture patterns and conventions
5. **Test**: Write and run tests
6. **Document**: Update relevant documentation
7. **Move to Review**: Move the ticket to `tickets/review/` when complete
8. **Move to Done**: Move to `tickets/done/` after approval

### Parallel Development

Many tickets can be worked on simultaneously:

- **Voice I/O**: Independent of LLM and MCP work
- **LLM Infrastructure**: Can proceed after model selection
- **MCP Tools**: Can start with a minimal server and add tools incrementally
- **Clients/UI**: Can mock APIs early and integrate later

### Milestone Progression

- **Milestone 1**: Foundation and surveys (TICKET-002 through TICKET-029)
- **Milestone 2**: MVP with voice chat, weather, and tasks (core functionality)
- **Milestone 3**: Memory, reminders, and safety features
- **Milestone 4**: Optional integrations (email, calendar, smart home)

## Future Considerations

### Planned Enhancements

- **Semantic Search**: Add embeddings for note search (beyond ripgrep)
- **Routine Learning**: Automatically learn and suggest routines from memory
- **Multi-Device**: Support multiple wake-word nodes and clients
- **Offline Mode**: Enhanced offline capabilities for clients
- **Voice Cloning**: Custom voice profiles for family members

### Technical Debt

- Start with basic implementations; optimize later
- Initial memory system: simple schema now, better retrieval later
- Tool permissions: start with whitelists, add more granular control later
- Logging: start with files, migrate to a time-series DB if needed

### Scalability

- The current design supports a single household
- Future: multi-user support with user-specific memory and preferences
- Future: distributed deployment across multiple nodes

## Related Documentation

- **Tickets**: See `tickets/TICKETS_SUMMARY.md` for all 46 tickets
- **Quick Start**: See `tickets/QUICK_START.md` for the recommended starting order
- **Ticket Template**: See `tickets/TICKET_TEMPLATE.md` for creating new tickets

---

**Note**: Update this document as the architecture evolves.
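
## Appendix: Memory Store Sketch

The `MemoryEntry` schema under Data Models maps naturally onto the project's SQLite storage choice. The sketch below is a minimal, hypothetical illustration of such a store; the table name `memory`, the helper functions, and the example values are assumptions for illustration, not part of the codebase.

```python
# Illustrative sketch only: table and helper names are hypothetical.
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    id: str
    category: str        # personal, family, preferences, routines
    content: str
    timestamp: datetime
    confidence: float
    source: str          # conversation, explicit, inferred

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the memory table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory (
               id TEXT PRIMARY KEY,
               category TEXT NOT NULL,
               content TEXT NOT NULL,
               timestamp TEXT NOT NULL,
               confidence REAL NOT NULL,
               source TEXT NOT NULL)"""
    )
    return conn

def save(conn: sqlite3.Connection, entry: MemoryEntry) -> None:
    """Insert or update a single memory entry."""
    conn.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
        (entry.id, entry.category, entry.content,
         entry.timestamp.isoformat(), entry.confidence, entry.source),
    )

def recall(conn: sqlite3.Connection, category: str) -> list[MemoryEntry]:
    """Fetch all entries in a category, oldest first."""
    rows = conn.execute(
        "SELECT * FROM memory WHERE category = ? ORDER BY timestamp",
        (category,),
    ).fetchall()
    return [MemoryEntry(r[0], r[1], r[2], datetime.fromisoformat(r[3]), r[4], r[5])
            for r in rows]
```

Storing timestamps as ISO-8601 text keeps the schema simple and sortable; a later migration could add an embeddings column for the planned semantic search.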
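
## Appendix: MCP Request Sketch

The MCP Server API above uses plain JSON-RPC 2.0, so a client only needs to build the `tools/call` envelope shown in that section. A minimal, hypothetical helper (the `weather` tool name and arguments are illustrative):

```python
# Illustrative sketch of building a JSON-RPC 2.0 tools/call request.
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids

def mcp_call(name: str, arguments: dict) -> str:
    """Serialize a tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": next(_ids),
    })
```

The resulting string can be sent over whatever transport the MCP server exposes; pairing responses back to requests by `id` is what makes the counter necessary.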