Architecture Documentation
Overview
This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project.
Atlas is a local, privacy-focused voice agent system with separate work and family agents, running on dedicated hardware (RTX 4080 for the work agent, GTX 1050 for the family agent).
System Architecture
High-Level Design
The system consists of 5 parallel tracks:
- Voice I/O: Wake-word detection, ASR (Automatic Speech Recognition), TTS (Text-to-Speech)
- LLM Infrastructure: Two separate LLM servers (4080 for work, 1050 for family)
- Tools/MCP: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
- Clients/UI: Phone PWA and web LAN dashboard
- Safety/Memory: Long-term memory, conversation management, safety controls
Component Architecture
```
┌──────────────────────────────────────────────┐
│                Clients Layer                 │
│   ┌────────────┐        ┌───────────────┐    │
│   │ Phone PWA  │        │ Web Dashboard │    │
│   └────────────┘        └───────────────┘    │
└──────────────────────┬───────────────────────┘
                       │ WebSocket/HTTP
┌──────────────────────┴───────────────────────┐
│                 Voice Stack                  │
│  ┌───────────┐   ┌─────────┐   ┌─────────┐   │
│  │ Wake-Word │──▶│   ASR   │   │   TTS   │   │
│  │   Node    │   │ Service │   │ Service │   │
│  └───────────┘   └─────────┘   └─────────┘   │
└──────────────────────┬───────────────────────┘
                       │
┌──────────────────────┴───────────────────────┐
│              LLM Infrastructure              │
│  ┌─────────────┐        ┌──────────────┐     │
│  │ 4080 Server │        │ 1050 Server  │     │
│  │ (Work Agent)│        │(Family Agent)│     │
│  └──────┬──────┘        └──────┬───────┘     │
│         └───────┬──────────────┘             │
│         ┌───────┴────────┐                   │
│         │ Routing Layer  │                   │
│         └───────┬────────┘                   │
└─────────────────┬────────────────────────────┘
                  │
┌─────────────────┴────────────────────────────┐
│               MCP Tools Layer                │
│  ┌────────────────────┐                      │
│  │     MCP Server     │                      │
│  │  ┌──────────────┐  │                      │
│  │  │ Weather      │  │                      │
│  │  │ Tasks        │  │                      │
│  │  │ Timers       │  │                      │
│  │  │ Notes        │  │                      │
│  │  └──────────────┘  │                      │
│  └────────────────────┘                      │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│               Safety & Memory                │
│  ┌────────────┐        ┌──────────────┐      │
│  │   Memory   │        │  Boundaries  │      │
│  │   Store    │        │ Enforcement  │      │
│  └────────────┘        └──────────────┘      │
└──────────────────────────────────────────────┘
```
Technology Stack
- Languages: Python (backend services), TypeScript/JavaScript (clients)
- LLM Servers: Ollama, vLLM, or llama.cpp
  - Work Agent (4080): Llama 3.1 70B Q4 (see `docs/LLM_MODEL_SURVEY.md`)
  - Family Agent (1050): Phi-3 Mini 3.8B Q4 (see `docs/LLM_MODEL_SURVEY.md`)
- ASR: faster-whisper (see `docs/ASR_EVALUATION.md` for details)
- TTS: Piper, Mimic 3, or Coqui TTS
- Wake-Word: openWakeWord (see `docs/WAKE_WORD_EVALUATION.md` for details)
- Protocols: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
  - MCP: JSON-RPC 2.0 protocol for tool integration (see `docs/MCP_ARCHITECTURE.md`)
- Storage: SQLite (memory, sessions), Markdown files (tasks, notes)
- Infrastructure: Docker, systemd, Linux
LLM Model Selection
Model selection has been completed based on hardware capacity and requirements:
- Work Agent (RTX 4080): Llama 3.1 70B Q4 - Best overall capabilities for coding and research
- Family Agent (GTX 1050): Phi-3 Mini 3.8B Q4 - Excellent instruction following, low latency
See docs/LLM_MODEL_SURVEY.md for detailed model comparison and docs/LLM_CAPACITY.md for VRAM and context window analysis.
TTS Selection
For initial development, Piper has been selected as the primary Text-to-Speech (TTS) engine. This decision is based on its high performance, low resource requirements, and permissive license, which are ideal for prototyping and early-stage implementation. Coqui TTS is identified as a potential future upgrade for a high-quality voice when more resources can be allocated.
For a detailed comparison of all evaluated options, see the TTS Evaluation document.
Design Patterns
Core Patterns
- Microservices Architecture: Separate services for wake-word, ASR, TTS, LLM servers, MCP tools
- Event-Driven: Wake-word events trigger ASR capture, tool calls trigger actions
- API Gateway Pattern: Routing layer directs requests to appropriate LLM server
- Repository Pattern: Separate config repo for family agent (no work content)
- Tool Pattern: MCP tools as independent, composable services
Architectural Patterns
- Separation of Concerns: Clear boundaries between work and family agents
- Layered Architecture: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
- Service-Oriented: Each component is an independent service with defined APIs
- Privacy by Design: Local processing, minimal external dependencies
Project Structure
Repository Structure
This project uses a mono-repo for the main application code and a separate repository for family-specific configurations, ensuring a clean separation of concerns.
home-voice-agent (Mono-repo)
This repository contains all the code for the voice agent, its services, and clients.
```
home-voice-agent/
├── llm-servers/          # LLM inference servers
│   ├── 4080/             # Work agent server (Llama 3.1 70B)
│   └── 1050/             # Family agent server (Phi-3 Mini)
├── mcp-server/           # MCP (Model Context Protocol) tool server
│   └── tools/            # Individual tool implementations (e.g., weather, time)
├── wake-word/            # Wake-word detection node
├── asr/                  # ASR (Automatic Speech Recognition) service
├── tts/                  # TTS (Text-to-Speech) service
├── clients/              # Front-end applications
│   ├── phone/            # Phone PWA (Progressive Web App)
│   └── web-dashboard/    # Web-based administration dashboard
├── routing/              # LLM routing layer to direct requests
├── conversation/         # Conversation management and history
├── memory/               # Long-term memory storage and retrieval
├── safety/               # Safety, boundary enforcement, and content filtering
├── admin/                # Administration and monitoring tools
└── infrastructure/       # Deployment scripts, Dockerfiles, and IaC
```
family-agent-config (Configuration Repo)
This repository stores all personal and family-related configurations. It is kept separate to maintain privacy and prevent work-related data from mixing with family data.
```
family-agent-config/
├── prompts/              # System prompts and character definitions
├── tools/                # Tool configurations and settings
├── secrets/              # Credentials and API keys (e.g., weather API)
└── tasks/                # Markdown-based Kanban board for home tasks
    └── home/             # Tasks for the home
```
Atlas Project (This Repo)
```
atlas/
├── tickets/              # Kanban tickets
│   ├── backlog/          # Future work
│   ├── todo/             # Ready to work on
│   ├── in-progress/      # Active work
│   ├── review/           # Awaiting review
│   └── done/             # Completed
├── docs/                 # Documentation
├── ARCHITECTURE.md       # This file
└── README.md             # Project overview
```
Data Models
Memory Schema
Long-term memory stores personal facts, preferences, and routines:
```
MemoryEntry:
  - id: str
  - category: str       # personal, family, preferences, routines
  - content: str
  - timestamp: datetime
  - confidence: float
  - source: str         # conversation, explicit, inferred
```
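The schema above maps naturally onto a small SQLite table. A minimal sketch, assuming a single `memory` table and a `save` helper (both illustrative, not the project's final storage code):

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    id: str
    category: str      # personal, family, preferences, routines
    content: str
    timestamp: datetime
    confidence: float
    source: str        # conversation, explicit, inferred

def save(conn: sqlite3.Connection, entry: MemoryEntry) -> None:
    # Create the table on first use, then upsert the entry.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory "
        "(id TEXT PRIMARY KEY, category TEXT, content TEXT, "
        "timestamp TEXT, confidence REAL, source TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
        (entry.id, entry.category, entry.content,
         entry.timestamp.isoformat(), entry.confidence, entry.source),
    )

conn = sqlite3.connect(":memory:")
save(conn, MemoryEntry("m1", "preferences", "Prefers metric units",
                       datetime.now(timezone.utc), 0.9, "conversation"))
row = conn.execute("SELECT category, content FROM memory").fetchone()
# row == ("preferences", "Prefers metric units")
```

Storing the timestamp as ISO 8601 text keeps the table portable; an index on `category` would speed up the retrieval queries once memory grows.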
Conversation Session
```
Session:
  - session_id: str
  - agent_type: str     # "work" or "family"
  - messages: List[Message]
  - created_at: datetime
  - last_activity: datetime
  - summary: str        # after summarization
```
Task Model (Markdown Kanban)
```markdown
---
id: TICKET-XXX
title: Task title
status: backlog|todo|in-progress|review|done
priority: high|medium|low
created: YYYY-MM-DD
updated: YYYY-MM-DD
assignee: name
tags: [tag1, tag2]
---
Task description...
```
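For illustration, this front matter can be parsed with nothing but the standard library. A sketch (the `parse_task` helper and the TICKET-042 sample are hypothetical; a real implementation would likely use a YAML parser):

```python
def parse_task(text: str) -> tuple[dict, str]:
    """Split a Kanban task file into (metadata, body)."""
    # The file starts with "---\n", so the first split element is empty.
    _, header, body = text.split("---\n", 2)
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        # Bracketed values like "[tag1, tag2]" become lists.
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",")]
        meta[key.strip()] = value
    return meta, body.strip()

meta, body = parse_task(
    "---\n"
    "id: TICKET-042\n"
    "title: Buy groceries\n"
    "status: todo\n"
    "tags: [home, errands]\n"
    "---\n"
    "Milk, eggs, bread.\n"
)
# meta["status"] == "todo", meta["tags"] == ["home", "errands"]
```

Moving a task between columns is then just moving the file between directories and rewriting the `status` and `updated` fields.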
MCP Tool Definition
```
{
  "name": "tool_name",
  "description": "Tool description",
  "inputSchema": {
    "type": "object",
    "properties": {...}
  }
}
```
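As a concrete illustration, a filled-in definition for a hypothetical `weather` tool might look like this; the property names and the tiny `validate_args` check are assumptions, not the project's final schema:

```python
# Illustrative tool definition following the MCP inputSchema shape above.
weather_tool = {
    "name": "weather",
    "description": "Get the current weather for a location",
    "inputSchema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["location"],
    },
}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Tiny structural check: report missing required properties."""
    return [k for k in schema.get("required", []) if k not in args]

missing = validate_args(weather_tool["inputSchema"], {"units": "metric"})
# missing == ["location"]
```

A production server would validate the full JSON Schema (types, enums), but even this minimal check lets the server reject malformed tool calls before execution.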
API Design
LLM Server API
Endpoint: POST /v1/chat/completions
```
{
  "model": "work-agent" | "family-agent",
  "messages": [...],
  "tools": [...],
  "temperature": 0.7
}
```
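A routing-layer sketch that builds this request and picks the right server by agent type. The LAN addresses are placeholders; only the payload shape follows the endpoint above:

```python
import json

# Hypothetical LAN addresses for the two OpenAI-compatible servers.
SERVERS = {
    "work-agent": "http://10.0.0.2:8000/v1/chat/completions",
    "family-agent": "http://10.0.0.3:8000/v1/chat/completions",
}

def build_request(agent: str, user_text: str, tools=None):
    """Return (url, json_body) for the given agent's LLM server."""
    if agent not in SERVERS:
        raise ValueError(f"unknown agent: {agent}")
    payload = {
        "model": agent,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.7,
    }
    if tools:
        payload["tools"] = tools
    return SERVERS[agent], json.dumps(payload)

url, body = build_request("family-agent", "What's the weather?")
# url routes to the 1050 server; body carries model="family-agent"
```

Keeping the routing decision in one small function makes it easy to later add per-agent defaults (context limits, temperature) without touching the clients.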
ASR Service API
Endpoint: WebSocket /asr/stream
- Input: Audio stream (PCM, 16kHz, mono)
- Output: Text segments with timestamps
```json
{
  "text": "transcribed text",
  "timestamp": 1234.56,
  "confidence": 0.95,
  "is_final": false
}
```
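A client consuming this stream would typically keep only finalized segments, since partial results are superseded by later ones. A minimal sketch:

```python
import json

def collect_transcript(messages: list[str]) -> str:
    """Join only finalized ASR segments; partials are superseded."""
    finals = []
    for raw in messages:
        seg = json.loads(raw)
        if seg["is_final"]:
            finals.append(seg["text"])
    return " ".join(finals)

# Simulated WebSocket messages: two partials, then the final segment.
stream = [
    '{"text": "turn on", "timestamp": 0.4, "confidence": 0.71, "is_final": false}',
    '{"text": "turn on the", "timestamp": 0.9, "confidence": 0.84, "is_final": false}',
    '{"text": "turn on the lights", "timestamp": 1.6, "confidence": 0.95, "is_final": true}',
]
transcript = collect_transcript(stream)
# transcript == "turn on the lights"
```

Partials are still useful for live captions in the UI; only the final segments should be handed to the LLM.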
TTS Service API
Endpoint: POST /tts/synthesize
```json
{
  "text": "Text to speak",
  "voice": "family-voice",
  "stream": true
}
```
Response: Audio stream (WAV or MP3)
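To hit the first-chunk latency target, the caller can feed the synthesizer sentence-sized chunks rather than the whole LLM response. A sketch; the sentence-splitting heuristic is an assumption, not the project's actual pipeline:

```python
import re

def sentence_chunks(text: str):
    """Split text into sentence-sized chunks so TTS can start speaking
    before the full LLM response is available (lower perceived latency)."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield sentence

chunks = list(sentence_chunks("Hello there. The weather is sunny! Anything else?"))
# chunks == ["Hello there.", "The weather is sunny!", "Anything else?"]
```

Each chunk would be POSTed to `/tts/synthesize` as it arrives from the LLM stream, so audio playback overlaps with generation.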
MCP Server API
Protocol: JSON-RPC 2.0
Methods:
- `tools/list`: List available tools
- `tools/call`: Execute a tool
```
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather",
    "arguments": {...}
  },
  "id": 1
}
```
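A minimal dispatcher for these two methods might look like this; the stubbed `weather` tool and the error handling are illustrative only:

```python
import json

# Tool registry: name -> callable taking the "arguments" dict.
TOOLS = {"weather": lambda args: f"Sunny in {args['location']}"}

def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the response JSON."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": sorted(TOOLS)}
    elif req["method"] == "tools/call":
        p = req["params"]
        result = {"content": TOOLS[p["name"]](p["arguments"])}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

reply = json.loads(handle(json.dumps({
    "jsonrpc": "2.0", "method": "tools/call",
    "params": {"name": "weather", "arguments": {"location": "Oslo"}},
    "id": 1,
})))
# reply["result"]["content"] == "Sunny in Oslo"
```

Because each tool is just a callable behind a name, new tools (tasks, timers, notes) can be registered without changing the dispatch logic.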
Security Considerations
Privacy Policy
- Core Principle: No external APIs for ASR/LLM processing
- Exception: Weather API (documented exception)
- Local Processing: All voice and LLM processing runs locally
- Data Retention: Configurable retention policies for conversations
Boundary Enforcement
- Repository Separation: Family agent config in separate repo, no work content
- Path Whitelists: Tools can only access whitelisted directories
- Network Isolation: Containers/namespaces prevent cross-access
- Firewall Rules: Block family agent from accessing work repo paths
- Static Analysis: CI/CD checks reject code that would grant cross-access
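The path-whitelist rule can be enforced with a resolve-then-check helper. A sketch with hypothetical whitelisted directories:

```python
from pathlib import Path

# Hypothetical whitelisted roots for the family agent's file tools.
WHITELIST = [Path("/srv/family-agent/notes"), Path("/srv/family-agent/tasks")]

def is_allowed(path: str) -> bool:
    """Resolve symlinks and '..' first, then require the path to sit
    inside a whitelisted directory."""
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(root) for root in WHITELIST)

ok = is_allowed("/srv/family-agent/notes/shopping.md")
blocked = is_allowed("/srv/family-agent/notes/../../work-repo/secrets.txt")
# ok is True, blocked is False
```

Resolving before checking is the important part: a naive prefix check would be fooled by `..` traversal or symlinks into work-repo paths.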
Confirmation Flows
- High-Risk Actions: Email send, calendar changes, file edits outside safe areas
- Confirmation Tokens: Signed tokens required from client, not just model intent
- User Approval: Explicit "Yes/No" confirmation for sensitive operations
- Audit Logging: All confirmations and high-risk actions logged
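The signed-token idea can be sketched with stdlib HMAC; the key handling and token format here are illustrative, not the project's actual design:

```python
import hashlib
import hmac
import time

SECRET = b"per-device-secret"  # would come from device provisioning, not source code

def mint_token(action: str, ttl: int = 60) -> str:
    """Client side: sign one specific action with a short expiry."""
    expires = str(int(time.time()) + ttl)
    sig = hmac.new(SECRET, f"{action}|{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{action}|{expires}|{sig}"

def verify_token(token: str, action: str) -> bool:
    """Executor side: check signature, action binding, and expiry."""
    got_action, expires, sig = token.split("|")
    expected = hmac.new(SECRET, f"{got_action}|{expires}".encode(),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(sig, expected)
            and got_action == action
            and int(expires) > time.time())

token = mint_token("email.send")
# verify_token(token, "email.send") is True
# verify_token(token, "calendar.delete") is False (token bound to one action)
```

Binding the token to a single action name is what makes it stronger than model intent alone: the LLM cannot reuse an "email.send" confirmation to authorize a different tool.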
Authentication & Authorization
- Token-Based: Separate tokens for work vs family agents
- Revocation: System to disable compromised tokens/devices
- Admin Controls: Kill switches for services, tools, or entire agent
Performance Considerations
Latency Targets
- Wake-Word Detection: < 200ms
- ASR Processing: < 2s end-to-end (audio in → text out)
- LLM Response: < 3s for family agent (1050), < 5s for work agent (4080)
- TTS Synthesis: < 500ms first chunk, streaming thereafter
- Tool Execution: < 1s for simple tools (weather, time)
Resource Allocation
- 4080 (Work Agent):
  - Model: 8-14B or 30B quantized (Q4-Q6)
  - Context: 8K-16K tokens
  - Concurrency: 2-4 requests
- 1050 (Family Agent):
  - Model: 1B-3B quantized (Q4-Q5)
  - Context: 4K-8K tokens
  - Concurrency: 1-2 requests (always-on, low latency)
Optimization Strategies
- Model Quantization: Q4-Q6 for 4080, Q4-Q5 for 1050
- Context Management: Summarization and pruning for long conversations
- Caching: Weather API responses, tool results
- Streaming: ASR and TTS use streaming for lower perceived latency
- Batching: LLM requests where possible (work agent)
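The weather-caching idea above can be as simple as a TTL cache keyed by tool name and arguments. A sketch; the 10-minute TTL is an assumed default, not a project requirement:

```python
import time

class TTLCache:
    """Tiny time-to-live cache for tool results such as weather lookups."""

    def __init__(self, ttl_seconds: float = 600):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        value, stored_at = hit
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

cache = TTLCache(ttl_seconds=600)
cache.put(("weather", "Oslo"), {"temp_c": 18})
hit = cache.get(("weather", "Oslo"))
# hit == {"temp_c": 18}
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct across system clock adjustments; it also keeps repeated weather queries from hitting the one permitted external API.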
Deployment
Hardware Requirements
- RTX 4080 Server: Work agent LLM, ASR (optional)
- GTX 1050 Server: Family agent LLM (always-on)
- Wake-Word Node: Raspberry Pi 4+, NUC, or SFF PC
- Microphones: USB mics or array mic for living room/office
- Storage: SSD for logs, HDD for archives
Service Deployment
- LLM Servers: Systemd services or Docker containers
- MCP Server: Systemd service with auto-restart
- Voice Services: ASR and TTS as systemd services
- Wake-Word Node: Standalone service on dedicated hardware
- Clients: PWA served via web server, web dashboard on LAN
Configuration Management
- Family Agent Config: Separate `family-agent-config/` repo
- Secrets: Environment variables, separate `.env` files
- Prompts: Version-controlled in config repo
- Tool Configs: YAML/JSON files in config repo
Monitoring & Logging
- Structured Logging: JSON logs for all services
- Metrics: GPU usage, latency, error rates
- Admin Dashboard: Web UI for logs, metrics, controls
- Alerting: System notifications for errors or high resource usage
Development Workflow
Ticket-Based Development
- Select Ticket: Choose from `tickets/backlog/` or `tickets/todo/`
- Check Dependencies: Review ticket dependencies before starting
- Move to In-Progress: Move ticket to `tickets/in-progress/`
- Implement: Follow architecture patterns and conventions
- Test: Write and run tests
- Document: Update relevant documentation
- Move to Review: Move ticket to `tickets/review/` when complete
- Move to Done: Move to `tickets/done/` after approval
Parallel Development
Many tickets can be worked on simultaneously:
- Voice I/O: Independent of LLM and MCP
- LLM Infrastructure: Can proceed after model selection
- MCP Tools: Can start with minimal server, add tools incrementally
- Clients/UI: Can mock APIs early, integrate later
Milestone Progression
- Milestone 1: Foundation and surveys (TICKET-002 through TICKET-029)
- Milestone 2: MVP with voice chat, weather, tasks (core functionality)
- Milestone 3: Memory, reminders, safety features
- Milestone 4: Optional integrations (email, calendar, smart home)
Future Considerations
Planned Enhancements
- Semantic Search: Add embeddings for note search (beyond ripgrep)
- Routine Learning: Automatically learn and suggest routines from memory
- Multi-Device: Support multiple wake-word nodes and clients
- Offline Mode: Enhanced offline capabilities for clients
- Voice Cloning: Custom voice profiles for family members
Technical Debt
- Start with basic implementations, optimize later
- Initial memory system: simple schema, enhance with better retrieval
- Tool permissions: Start with whitelists, add more granular control later
- Logging: Start with files, migrate to time-series DB if needed
Scalability
- Current design supports single household
- Future: Multi-user support with user-specific memory and preferences
- Future: Distributed deployment across multiple nodes
Related Documentation
Project Management
- Tickets: See `tickets/TICKETS_SUMMARY.md` for all 46 tickets
- Quick Start: See `tickets/QUICK_START.md` for recommended starting order
- Next Steps: See `tickets/NEXT_STEPS.md` for current recommendations
- Ticket Template: See `tickets/TICKET_TEMPLATE.md` for creating new tickets
Technology Evaluations
- LLM Model Survey: See `docs/LLM_MODEL_SURVEY.md` for model selection and comparison
- LLM Capacity: See `docs/LLM_CAPACITY.md` for VRAM and context window analysis
- LLM Usage & Costs: See `docs/LLM_USAGE_AND_COSTS.md` for operational cost analysis
- Model Selection: See `docs/MODEL_SELECTION.md` for final model choices
- ASR Evaluation: See `docs/ASR_EVALUATION.md` for ASR engine selection
- MCP Architecture: See `docs/MCP_ARCHITECTURE.md` for MCP protocol and integration
- Implementation Guide: See `docs/IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps
Planning & Requirements
- Hardware: See `docs/HARDWARE.md` for hardware requirements and purchase plan
- Privacy Policy: See `docs/PRIVACY_POLICY.md` for details on data handling
- Safety Constraints: See `docs/SAFETY_CONSTRAINTS.md` for details on security boundaries
Note: Update this document as the architecture evolves.