- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4). - Introduced new documents: - `ASR_EVALUATION.md` for ASR engine evaluation and selection. - `HARDWARE.md` outlining hardware requirements and purchase plans. - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps. - `LLM_CAPACITY.md` assessing VRAM and context window limits. - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models. - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs. - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture. - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status. These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
473 lines
19 KiB
Markdown
473 lines
19 KiB
Markdown
# Architecture Documentation
|
|
|
|
## Overview
|
|
|
|
This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project.
|
|
|
|
Atlas is a local, privacy-focused voice agent system with separate work and family agents, running on dedicated hardware (RTX 4080 for work agent, RTX 1050 for family agent).
|
|
|
|
## System Architecture
|
|
|
|
### High-Level Design
|
|
|
|
The system consists of 5 parallel tracks:
|
|
|
|
1. **Voice I/O**: Wake-word detection, ASR (Automatic Speech Recognition), TTS (Text-to-Speech)
|
|
2. **LLM Infrastructure**: Two separate LLM servers (4080 for work, 1050 for family)
|
|
3. **Tools/MCP**: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
|
|
4. **Clients/UI**: Phone PWA and web LAN dashboard
|
|
5. **Safety/Memory**: Long-term memory, conversation management, safety controls
|
|
|
|
### Component Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ Clients Layer │
|
|
│ ┌──────────────┐ ┌──────────────┐ │
|
|
│ │ Phone PWA │ │ Web Dashboard│ │
|
|
│ └──────┬───────┘ └──────┬───────┘ │
|
|
└─────────┼──────────────────────────────┼──────────────────┘
|
|
│ │
|
|
│ WebSocket/HTTP │
|
|
│ │
|
|
┌─────────┼──────────────────────────────┼──────────────────┐
|
|
│ │ Voice Stack │ │
|
|
│ ┌──────▼──────┐ ┌──────────┐ ┌─────▼──────┐ │
|
|
│ │ Wake-Word │ │ ASR │ │ TTS │ │
|
|
│ │ Node │─▶│ Service │ │ Service │ │
|
|
│ └─────────────┘ └────┬─────┘ └────────────┘ │
|
|
└────────────────────────┼────────────────────────────────┘
|
|
│
|
|
┌────────────────────────┼────────────────────────────────┐
|
|
│ │ LLM Infrastructure │
|
|
│ ┌──────▼──────┐ ┌──────────────┐ │
|
|
│ │ 4080 Server │ │ 1050 Server │ │
|
|
│ │ (Work Agent)│ │(Family Agent)│ │
|
|
│ └──────┬──────┘ └──────┬───────┘ │
|
|
│ │ │ │
|
|
│ └────────────┬───────────────┘ │
|
|
│ │ │
|
|
│ ┌───────▼────────┐ │
|
|
│ │ Routing Layer │ │
|
|
│ └───────┬────────┘ │
|
|
└──────────────────────┼─────────────────────────────────┘
|
|
│
|
|
┌──────────────────────┼─────────────────────────────────┐
|
|
│ │ MCP Tools Layer │
|
|
│ ┌──────▼──────────┐ │
|
|
│ │ MCP Server │ │
|
|
│ │ ┌───────────┐ │ │
|
|
│ │ │ Weather │ │ │
|
|
│ │ │ Tasks │ │ │
|
|
│ │ │ Timers │ │ │
|
|
│ │ │ Notes │ │ │
|
|
│ │ └───────────┘ │ │
|
|
│ └──────────────────┘ │
|
|
└────────────────────────────────────────────────────────┘
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ │ Safety & Memory │
|
|
│ ┌──────────────┐ ┌──────────────┐ │
|
|
│ │ Memory │ │ Boundaries │ │
|
|
│ │ Store │ │ Enforcement │ │
|
|
│ └──────────────┘ └──────────────┘ │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Technology Stack
|
|
|
|
- **Languages**: Python (backend services), TypeScript/JavaScript (clients)
|
|
- **LLM Servers**: Ollama, vLLM, or llama.cpp
|
|
- **Work Agent (4080)**: Llama 3.1 70B Q4 (see `docs/LLM_MODEL_SURVEY.md`)
|
|
- **Family Agent (1050)**: Phi-3 Mini 3.8B Q4 (see `docs/LLM_MODEL_SURVEY.md`)
|
|
- **ASR**: faster-whisper or Whisper.cpp
|
|
- **TTS**: Piper, Mimic 3, or Coqui TTS
|
|
- **Wake-Word**: openWakeWord (see `docs/WAKE_WORD_EVALUATION.md` for details)
|
|
- **Protocols**: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
|
|
- **MCP**: JSON-RPC 2.0 protocol for tool integration (see `docs/MCP_ARCHITECTURE.md`)
|
|
- **ASR**: faster-whisper (see `docs/ASR_EVALUATION.md` for details)
|
|
- **Storage**: SQLite (memory, sessions), Markdown files (tasks, notes)
|
|
- **Infrastructure**: Docker, systemd, Linux
|
|
|
|
### LLM Model Selection
|
|
|
|
Model selection has been completed based on hardware capacity and requirements:
|
|
|
|
- **Work Agent (RTX 4080)**: Llama 3.1 70B Q4 - Best overall capabilities for coding and research
|
|
- **Family Agent (RTX 1050)**: Phi-3 Mini 3.8B Q4 - Excellent instruction following, low latency
|
|
|
|
See `docs/LLM_MODEL_SURVEY.md` for detailed model comparison and `docs/LLM_CAPACITY.md` for VRAM and context window analysis.
|
|
|
|
### TTS Selection
|
|
|
|
For initial development, **Piper** has been selected as the primary Text-to-Speech (TTS) engine. This decision is based on its high performance, low resource requirements, and permissive license, which are ideal for prototyping and early-stage implementation. **Coqui TTS** is identified as a potential future upgrade for a high-quality voice when more resources can be allocated.
|
|
|
|
For a detailed comparison of all evaluated options, see the [TTS Evaluation document](docs/TTS_EVALUATION.md).
|
|
|
|
## Design Patterns
|
|
|
|
### Core Patterns
|
|
|
|
- **Microservices Architecture**: Separate services for wake-word, ASR, TTS, LLM servers, MCP tools
|
|
- **Event-Driven**: Wake-word events trigger ASR capture, tool calls trigger actions
|
|
- **API Gateway Pattern**: Routing layer directs requests to appropriate LLM server
|
|
- **Repository Pattern**: Separate config repo for family agent (no work content)
|
|
- **Tool Pattern**: MCP tools as independent, composable services
|
|
|
|
### Architectural Patterns
|
|
|
|
- **Separation of Concerns**: Clear boundaries between work and family agents
|
|
- **Layered Architecture**: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
|
|
- **Service-Oriented**: Each component is an independent service with defined APIs
|
|
- **Privacy by Design**: Local processing, minimal external dependencies
|
|
|
|
## Project Structure
|
|
|
|
### Repository Structure
|
|
|
|
This project uses a mono-repo for the main application code and a separate repository for family-specific configurations, ensuring a clean separation of concerns.
|
|
|
|
#### `home-voice-agent` (Mono-repo)
|
|
|
|
This repository contains all the code for the voice agent, its services, and clients.
|
|
|
|
```
|
|
home-voice-agent/
|
|
├── llm-servers/ # LLM inference servers
|
|
│ ├── 4080/ # Work agent server (e.g., Llama 70B)
|
|
│ └── 1050/ # Family agent server (e.g., Phi-2)
|
|
├── mcp-server/ # MCP (Model Context Protocol) tool server
|
|
│ └── tools/ # Individual tool implementations (e.g., weather, time)
|
|
├── wake-word/ # Wake-word detection node
|
|
├── asr/ # ASR (Automatic Speech Recognition) service
|
|
├── tts/ # TTS (Text-to-Speech) service
|
|
├── clients/ # Front-end applications
|
|
│ ├── phone/ # Phone PWA (Progressive Web App)
|
|
│ └── web-dashboard/ # Web-based administration dashboard
|
|
├── routing/ # LLM routing layer to direct requests
|
|
├── conversation/ # Conversation management and history
|
|
├── memory/ # Long-term memory storage and retrieval
|
|
├── safety/ # Safety, boundary enforcement, and content filtering
|
|
├── admin/ # Administration and monitoring tools
|
|
└── infrastructure/ # Deployment scripts, Dockerfiles, and IaC
|
|
```
|
|
|
|
#### `family-agent-config` (Configuration Repo)
|
|
|
|
This repository stores all personal and family-related configurations. It is kept separate to maintain privacy and prevent work-related data from mixing with family data.
|
|
|
|
```
|
|
family-agent-config/
|
|
├── prompts/ # System prompts and character definitions
|
|
├── tools/ # Tool configurations and settings
|
|
├── secrets/ # Credentials and API keys (e.g., weather API)
|
|
└── tasks/ # Markdown-based Kanban board for home tasks
|
|
└── home/ # Tasks for the home
|
|
```
|
|
|
|
### Atlas Project (This Repo)
|
|
|
|
```
|
|
atlas/
|
|
├── tickets/ # Kanban tickets
|
|
│ ├── backlog/ # Future work
|
|
│ ├── todo/ # Ready to work on
|
|
│ ├── in-progress/ # Active work
|
|
│ ├── review/ # Awaiting review
|
|
│ └── done/ # Completed
|
|
├── docs/ # Documentation
|
|
├── ARCHITECTURE.md # This file
|
|
└── README.md # Project overview
|
|
```
|
|
|
|
## Data Models
|
|
|
|
### Memory Schema
|
|
|
|
Long-term memory stores personal facts, preferences, and routines:
|
|
|
|
```python
|
|
MemoryEntry:
|
|
- id: str
|
|
- category: str # personal, family, preferences, routines
|
|
- content: str
|
|
- timestamp: datetime
|
|
- confidence: float
|
|
- source: str # conversation, explicit, inferred
|
|
```
|
|
|
|
### Conversation Session
|
|
|
|
```python
|
|
Session:
|
|
- session_id: str
|
|
- agent_type: str # "work" or "family"
|
|
- messages: List[Message]
|
|
- created_at: datetime
|
|
- last_activity: datetime
|
|
- summary: str # after summarization
|
|
```
|
|
|
|
### Task Model (Markdown Kanban)
|
|
|
|
```yaml
|
|
---
|
|
id: TICKET-XXX
|
|
title: Task title
|
|
status: backlog|todo|in-progress|review|done
|
|
priority: high|medium|low
|
|
created: YYYY-MM-DD
|
|
updated: YYYY-MM-DD
|
|
assignee: name
|
|
tags: [tag1, tag2]
|
|
---
|
|
|
|
Task description...
|
|
```
|
|
|
|
### MCP Tool Definition
|
|
|
|
```json
|
|
{
|
|
"name": "tool_name",
|
|
"description": "Tool description",
|
|
"inputSchema": {
|
|
"type": "object",
|
|
"properties": {...}
|
|
}
|
|
}
|
|
```
|
|
|
|
## API Design
|
|
|
|
### LLM Server API
|
|
|
|
**Endpoint**: `POST /v1/chat/completions`
|
|
|
|
```json
|
|
{
|
|
"model": "work-agent" | "family-agent",
|
|
"messages": [...],
|
|
"tools": [...],
|
|
"temperature": 0.7
|
|
}
|
|
```
|
|
|
|
### ASR Service API
|
|
|
|
**Endpoint**: `WebSocket /asr/stream`
|
|
|
|
- Input: Audio stream (PCM, 16kHz, mono)
|
|
- Output: Text segments with timestamps
|
|
|
|
```json
|
|
{
|
|
"text": "transcribed text",
|
|
"timestamp": 1234.56,
|
|
"confidence": 0.95,
|
|
"is_final": false
|
|
}
|
|
```
|
|
|
|
### TTS Service API
|
|
|
|
**Endpoint**: `POST /tts/synthesize`
|
|
|
|
```json
|
|
{
|
|
"text": "Text to speak",
|
|
"voice": "family-voice",
|
|
"stream": true
|
|
}
|
|
```
|
|
|
|
Response: Audio stream (WAV or MP3)
|
|
|
|
### MCP Server API
|
|
|
|
**Protocol**: JSON-RPC 2.0
|
|
|
|
**Methods**:
|
|
- `tools/list`: List available tools
|
|
- `tools/call`: Execute a tool
|
|
|
|
```json
|
|
{
|
|
"jsonrpc": "2.0",
|
|
"method": "tools/call",
|
|
"params": {
|
|
"name": "weather",
|
|
"arguments": {...}
|
|
},
|
|
"id": 1
|
|
}
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
### Privacy Policy
|
|
|
|
- **Core Principle**: No external APIs for ASR/LLM processing
|
|
- **Exception**: Weather API (documented exception)
|
|
- **Local Processing**: All voice and LLM processing runs locally
|
|
- **Data Retention**: Configurable retention policies for conversations
|
|
|
|
### Boundary Enforcement
|
|
|
|
- **Repository Separation**: Family agent config in separate repo, no work content
|
|
- **Path Whitelists**: Tools can only access whitelisted directories
|
|
- **Network Isolation**: Containers/namespaces prevent cross-access
|
|
- **Firewall Rules**: Block family agent from accessing work repo paths
|
|
- **Static Analysis**: CI/CD checks reject code that would grant cross-access
|
|
|
|
### Confirmation Flows
|
|
|
|
- **High-Risk Actions**: Email send, calendar changes, file edits outside safe areas
|
|
- **Confirmation Tokens**: Signed tokens required from client, not just model intent
|
|
- **User Approval**: Explicit "Yes/No" confirmation for sensitive operations
|
|
- **Audit Logging**: All confirmations and high-risk actions logged
|
|
|
|
### Authentication & Authorization
|
|
|
|
- **Token-Based**: Separate tokens for work vs family agents
|
|
- **Revocation**: System to disable compromised tokens/devices
|
|
- **Admin Controls**: Kill switches for services, tools, or entire agent
|
|
|
|
## Performance Considerations
|
|
|
|
### Latency Targets
|
|
|
|
- **Wake-Word Detection**: < 200ms
|
|
- **ASR Processing**: < 2s end-to-end (audio in → text out)
|
|
- **LLM Response**: < 3s for family agent (1050), < 5s for work agent (4080)
|
|
- **TTS Synthesis**: < 500ms first chunk, streaming thereafter
|
|
- **Tool Execution**: < 1s for simple tools (weather, time)
|
|
|
|
### Resource Allocation
|
|
|
|
- **4080 (Work Agent)**:
|
|
- Model: 8-14B or 30B quantized (Q4-Q6)
|
|
- Context: 8K-16K tokens
|
|
- Concurrency: 2-4 requests
|
|
|
|
- **1050 (Family Agent)**:
|
|
- Model: 1B-3B quantized (Q4-Q5)
|
|
- Context: 4K-8K tokens
|
|
- Concurrency: 1-2 requests (always-on, low latency)
|
|
|
|
### Optimization Strategies
|
|
|
|
- **Model Quantization**: Q4-Q6 for 4080, Q4-Q5 for 1050
|
|
- **Context Management**: Summarization and pruning for long conversations
|
|
- **Caching**: Weather API responses, tool results
|
|
- **Streaming**: ASR and TTS use streaming for lower perceived latency
|
|
- **Batching**: LLM requests where possible (work agent)
|
|
|
|
## Deployment
|
|
|
|
### Hardware Requirements
|
|
|
|
- **RTX 4080 Server**: Work agent LLM, ASR (optional)
|
|
- **RTX 1050 Server**: Family agent LLM (always-on)
|
|
- **Wake-Word Node**: Raspberry Pi 4+, NUC, or SFF PC
|
|
- **Microphones**: USB mics or array mic for living room/office
|
|
- **Storage**: SSD for logs, HDD for archives
|
|
|
|
### Service Deployment
|
|
|
|
- **LLM Servers**: Systemd services or Docker containers
|
|
- **MCP Server**: Systemd service with auto-restart
|
|
- **Voice Services**: ASR and TTS as systemd services
|
|
- **Wake-Word Node**: Standalone service on dedicated hardware
|
|
- **Clients**: PWA served via web server, web dashboard on LAN
|
|
|
|
### Configuration Management
|
|
|
|
- **Family Agent Config**: Separate `family-agent-config/` repo
|
|
- **Secrets**: Environment variables, separate .env files
|
|
- **Prompts**: Version-controlled in config repo
|
|
- **Tool Configs**: YAML/JSON files in config repo
|
|
|
|
### Monitoring & Logging
|
|
|
|
- **Structured Logging**: JSON logs for all services
|
|
- **Metrics**: GPU usage, latency, error rates
|
|
- **Admin Dashboard**: Web UI for logs, metrics, controls
|
|
- **Alerting**: System notifications for errors or high resource usage
|
|
|
|
## Development Workflow
|
|
|
|
### Ticket-Based Development
|
|
|
|
1. **Select Ticket**: Choose from `tickets/backlog/` or `tickets/todo/`
|
|
2. **Check Dependencies**: Review ticket dependencies before starting
|
|
3. **Move to In-Progress**: Move ticket to `tickets/in-progress/`
|
|
4. **Implement**: Follow architecture patterns and conventions
|
|
5. **Test**: Write and run tests
|
|
6. **Document**: Update relevant documentation
|
|
7. **Move to Review**: Move ticket to `tickets/review/` when complete
|
|
8. **Move to Done**: Move to `tickets/done/` after approval
|
|
|
|
### Parallel Development
|
|
|
|
Many tickets can be worked on simultaneously:
|
|
- **Voice I/O**: Independent of LLM and MCP
|
|
- **LLM Infrastructure**: Can proceed after model selection
|
|
- **MCP Tools**: Can start with minimal server, add tools incrementally
|
|
- **Clients/UI**: Can mock APIs early, integrate later
|
|
|
|
### Milestone Progression
|
|
|
|
- **Milestone 1**: Foundation and surveys (TICKET-002 through TICKET-029)
|
|
- **Milestone 2**: MVP with voice chat, weather, tasks (core functionality)
|
|
- **Milestone 3**: Memory, reminders, safety features
|
|
- **Milestone 4**: Optional integrations (email, calendar, smart home)
|
|
|
|
## Future Considerations
|
|
|
|
### Planned Enhancements
|
|
|
|
- **Semantic Search**: Add embeddings for note search (beyond ripgrep)
|
|
- **Routine Learning**: Automatically learn and suggest routines from memory
|
|
- **Multi-Device**: Support multiple wake-word nodes and clients
|
|
- **Offline Mode**: Enhanced offline capabilities for clients
|
|
- **Voice Cloning**: Custom voice profiles for family members
|
|
|
|
### Technical Debt
|
|
|
|
- Start with basic implementations, optimize later
|
|
- Initial memory system: simple schema, enhance with better retrieval
|
|
- Tool permissions: Start with whitelists, add more granular control later
|
|
- Logging: Start with files, migrate to time-series DB if needed
|
|
|
|
### Scalability
|
|
|
|
- Current design supports single household
|
|
- Future: Multi-user support with user-specific memory and preferences
|
|
- Future: Distributed deployment across multiple nodes
|
|
|
|
## Related Documentation
|
|
|
|
### Project Management
|
|
- **Tickets**: See `tickets/TICKETS_SUMMARY.md` for all 46 tickets
|
|
- **Quick Start**: See `tickets/QUICK_START.md` for recommended starting order
|
|
- **Next Steps**: See `tickets/NEXT_STEPS.md` for current recommendations
|
|
- **Ticket Template**: See `tickets/TICKET_TEMPLATE.md` for creating new tickets
|
|
|
|
### Technology Evaluations
|
|
- **LLM Model Survey**: See `docs/LLM_MODEL_SURVEY.md` for model selection and comparison
|
|
- **LLM Capacity**: See `docs/LLM_CAPACITY.md` for VRAM and context window analysis
|
|
- **LLM Usage & Costs**: See `docs/LLM_USAGE_AND_COSTS.md` for operational cost analysis
|
|
- **Model Selection**: See `docs/MODEL_SELECTION.md` for final model choices
|
|
- **ASR Evaluation**: See `docs/ASR_EVALUATION.md` for ASR engine selection
|
|
- **MCP Architecture**: See `docs/MCP_ARCHITECTURE.md` for MCP protocol and integration
|
|
- **Implementation Guide**: See `docs/IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps
|
|
|
|
### Planning & Requirements
|
|
- **Hardware**: See `docs/HARDWARE.md` for hardware requirements and purchase plan
|
|
- **Privacy Policy**: See `docs/PRIVACY_POLICY.md` for details on data handling
|
|
- **Safety Constraints**: See `docs/SAFETY_CONSTRAINTS.md` for details on security boundaries
|
|
|
|
---
|
|
|
|
**Note**: Update this document as the architecture evolves.
|