# Architecture Documentation

## Overview

This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project.

Atlas is a local, privacy-focused voice agent system with separate work and family agents, each running on dedicated hardware: an RTX 4080 for the work agent and a GTX 1050 for the family agent.

## System Architecture

### High-Level Design

The system consists of five parallel tracks:

1. **Voice I/O**: Wake-word detection, ASR (Automatic Speech Recognition), TTS (Text-to-Speech)
2. **LLM Infrastructure**: Two separate LLM servers (4080 for work, 1050 for family)
3. **Tools/MCP**: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
4. **Clients/UI**: Phone PWA and web LAN dashboard
5. **Safety/Memory**: Long-term memory, conversation management, safety controls

### Component Architecture

```
┌───────────────────────────────────────────────────────────┐
│                       Clients Layer                       │
│  ┌──────────────┐               ┌──────────────┐          │
│  │  Phone PWA   │               │ Web Dashboard│          │
│  └──────┬───────┘               └──────┬───────┘          │
└─────────┼──────────────────────────────┼──────────────────┘
          │                              │
          │        WebSocket/HTTP        │
          │                              │
┌─────────┼──────────────────────────────┼──────────────────┐
│         │         Voice Stack          │                  │
│  ┌──────▼──────┐  ┌──────────┐   ┌─────▼──────┐           │
│  │  Wake-Word  │  │   ASR    │   │    TTS     │           │
│  │    Node     │─▶│ Service  │   │  Service   │           │
│  └─────────────┘  └────┬─────┘   └────────────┘           │
└────────────────────────┼──────────────────────────────────┘
                         │
┌────────────────────────┼──────────────────────────────────┐
│                        │    LLM Infrastructure            │
│                 ┌──────▼──────┐    ┌──────────────┐       │
│                 │ 4080 Server │    │ 1050 Server  │       │
│                 │ (Work Agent)│    │(Family Agent)│       │
│                 └──────┬──────┘    └──────┬───────┘       │
│                        │                  │               │
│                        └─────────┬────────┘               │
│                                  │                        │
│                          ┌───────▼────────┐               │
│                          │ Routing Layer  │               │
│                          └───────┬────────┘               │
└──────────────────────────────────┼────────────────────────┘
                                   │
┌──────────────────────────────────┼────────────────────────┐
│                                  │  MCP Tools Layer       │
│                         ┌────────▼────────┐               │
│                         │   MCP Server    │               │
│                         │  ┌───────────┐  │               │
│                         │  │  Weather  │  │               │
│                         │  │  Tasks    │  │               │
│                         │  │  Timers   │  │               │
│                         │  │  Notes    │  │               │
│                         │  └───────────┘  │               │
│                         └─────────────────┘               │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│                      Safety & Memory                      │
│  ┌──────────────┐               ┌──────────────┐          │
│  │    Memory    │               │  Boundaries  │          │
│  │    Store     │               │ Enforcement  │          │
│  └──────────────┘               └──────────────┘          │
└───────────────────────────────────────────────────────────┘
```

### Technology Stack

- **Languages**: Python (backend services), TypeScript/JavaScript (clients)
- **LLM Servers**: Ollama, vLLM, or llama.cpp
- **ASR**: faster-whisper or Whisper.cpp
- **TTS**: Piper, Mimic 3, or Coqui TTS
- **Wake-Word**: openWakeWord or Porcupine
- **Protocols**: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
- **Storage**: SQLite (memory, sessions), Markdown files (tasks, notes)
- **Infrastructure**: Docker, systemd, Linux

## Design Patterns

### Core Patterns

- **Microservices Architecture**: Separate services for wake-word, ASR, TTS, LLM servers, MCP tools
- **Event-Driven**: Wake-word events trigger ASR capture, tool calls trigger actions
- **API Gateway Pattern**: Routing layer directs requests to appropriate LLM server
- **Repository Pattern**: Separate config repo for family agent (no work content)
- **Tool Pattern**: MCP tools as independent, composable services

### Architectural Patterns

- **Separation of Concerns**: Clear boundaries between work and family agents
- **Layered Architecture**: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
- **Service-Oriented**: Each component is an independent service with defined APIs
- **Privacy by Design**: Local processing, minimal external dependencies

## Project Structure

### Repository Structure

This project uses a mono-repo for the main application code and a separate repository for family-specific configurations, ensuring a clean separation of concerns.

#### `home-voice-agent` (Mono-repo)

This repository contains all the code for the voice agent, its services, and clients.

```
home-voice-agent/
├── llm-servers/          # LLM inference servers
│   ├── 4080/             # Work agent server (e.g., Llama 70B)
│   └── 1050/             # Family agent server (e.g., Phi-2)
├── mcp-server/           # MCP (Model Context Protocol) tool server
│   └── tools/            # Individual tool implementations (e.g., weather, time)
├── wake-word/            # Wake-word detection node
├── asr/                  # ASR (Automatic Speech Recognition) service
├── tts/                  # TTS (Text-to-Speech) service
├── clients/              # Front-end applications
│   ├── phone/            # Phone PWA (Progressive Web App)
│   └── web-dashboard/    # Web-based administration dashboard
├── routing/              # LLM routing layer to direct requests
├── conversation/         # Conversation management and history
├── memory/               # Long-term memory storage and retrieval
├── safety/               # Safety, boundary enforcement, and content filtering
├── admin/                # Administration and monitoring tools
└── infrastructure/       # Deployment scripts, Dockerfiles, and IaC
```

#### `family-agent-config` (Configuration Repo)

This repository stores all personal and family-related configurations. It is kept separate to maintain privacy and prevent work-related data from mixing with family data.

```
family-agent-config/
├── prompts/      # System prompts and character definitions
├── tools/        # Tool configurations and settings
├── secrets/      # Credentials and API keys (e.g., weather API)
└── tasks/        # Markdown-based Kanban board for home tasks
    └── home/     # Tasks for the home
```

### Atlas Project (This Repo)

```
atlas/
├── tickets/              # Kanban tickets
│   ├── backlog/          # Future work
│   ├── todo/             # Ready to work on
│   ├── in-progress/      # Active work
│   ├── review/           # Awaiting review
│   └── done/             # Completed
├── docs/                 # Documentation
├── ARCHITECTURE.md       # This file
└── README.md             # Project overview
```

## Data Models

### Memory Schema

Long-term memory stores personal facts, preferences, and routines:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryEntry:
    id: str
    category: str      # personal, family, preferences, routines
    content: str
    timestamp: datetime
    confidence: float
    source: str        # conversation, explicit, inferred
```

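Since the technology stack lists SQLite for memory storage, the schema above could be persisted along the following lines. This is a minimal sketch, not the project's actual store: the table name `memory`, the helper names `open_store`, `remember`, and `recall`, and the choice to keep timestamps as ISO 8601 strings are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout mirroring the MemoryEntry fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory (
    id         TEXT PRIMARY KEY,
    category   TEXT NOT NULL,
    content    TEXT NOT NULL,
    timestamp  TEXT NOT NULL,
    confidence REAL NOT NULL,
    source     TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memory database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def remember(conn, entry_id, category, content, confidence, source):
    """Insert a memory entry, or overwrite the entry with the same id."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
        (entry_id, category, content, now, confidence, source),
    )
    conn.commit()


def recall(conn, category, min_confidence=0.0):
    """Return contents in a category, highest confidence first."""
    rows = conn.execute(
        "SELECT content FROM memory "
        "WHERE category = ? AND confidence >= ? "
        "ORDER BY confidence DESC",
        (category, min_confidence),
    )
    return [row[0] for row in rows]
```

Filtering on `confidence` at read time is one way to keep low-confidence inferred facts out of prompts without deleting them.
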
### Conversation Session

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Session:
    session_id: str
    agent_type: str            # "work" or "family"
    messages: List["Message"]  # Message type is not defined in this document
    created_at: datetime
    last_activity: datetime
    summary: str               # after summarization
```

### Task Model (Markdown Kanban)

```yaml
---
id: TICKET-XXX
title: Task title
status: backlog|todo|in-progress|review|done
priority: high|medium|low
created: YYYY-MM-DD
updated: YYYY-MM-DD
assignee: name
tags: [tag1, tag2]
---

Task description...
```

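A tool that reads these task files needs to split the front matter from the description. The sketch below does this with the standard library only and handles just the flat `key: value` and `[a, b]` forms shown above. The function name `parse_ticket` is hypothetical, and a real implementation would more likely use a YAML parser.

```python
def parse_ticket(text: str) -> tuple[dict, str]:
    """Split a ticket file into (front-matter dict, description body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("ticket must start with a '---' front-matter line")
    try:
        # Index of the line that closes the front matter.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front matter is not closed with '---'") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Inline lists such as "tags: [tag1, tag2]" become Python lists.
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Because only the first colon on each line is treated as the separator, titles that contain a colon survive intact.
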
### MCP Tool Definition

```json
{
  "name": "tool_name",
  "description": "Tool description",
  "inputSchema": {
    "type": "object",
    "properties": {...}
  }
}
```

## API Design

### LLM Server API

**Endpoint**: `POST /v1/chat/completions`

The `model` field selects the agent and is either `"work-agent"` or `"family-agent"`.

```json
{
  "model": "work-agent",
  "messages": [...],
  "tools": [...],
  "temperature": 0.7
}
```

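The routing layer can use the `model` field of this request to choose between the two LLM servers. The following is a minimal sketch of that decision only. The upstream host names and ports, the `RoutedRequest` type, and `route_chat_request` are hypothetical, and the sketch does not perform the HTTP call itself.

```python
from dataclasses import dataclass

# Hypothetical upstream addresses; real hosts and ports are deployment-specific.
UPSTREAMS = {
    "work-agent": "http://llm-4080.local:8000",
    "family-agent": "http://llm-1050.local:8000",
}


@dataclass(frozen=True)
class RoutedRequest:
    url: str
    payload: dict


def route_chat_request(payload: dict) -> RoutedRequest:
    """Pick the upstream LLM server from the request's "model" field."""
    model = payload.get("model")
    if model not in UPSTREAMS:
        # Rejecting unknown models keeps a request from silently
        # landing on the wrong agent.
        raise ValueError(f"unknown model: {model!r}")
    url = f"{UPSTREAMS[model]}/v1/chat/completions"
    return RoutedRequest(url=url, payload=payload)
```

Failing closed on an unrecognized `model` fits the work/family separation goal better than falling back to a default server.
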
### ASR Service API

**Endpoint**: `WebSocket /asr/stream`

- Input: Audio stream (PCM, 16kHz, mono)
- Output: Text segments with timestamps

```json
{
  "text": "transcribed text",
  "timestamp": 1234.56,
  "confidence": 0.95,
  "is_final": false
}
```

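A consumer of this stream has to decide what to do with interim segments. One plausible reading of `is_final`, assumed here rather than taken from the source, is that interim hypotheses are replaced by a later final segment. Under that assumption only final segments contribute to the transcript. The function name `collect_transcript` is hypothetical, and the messages are passed in as already-received JSON strings so the sketch stays independent of any WebSocket library.

```python
import json
from typing import Iterable


def collect_transcript(messages: Iterable[str]) -> str:
    """Join the text of final segments, ignoring interim hypotheses."""
    finals = []
    for raw in messages:
        segment = json.loads(raw)
        if segment.get("is_final"):
            finals.append(segment["text"].strip())
    return " ".join(finals)
```
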
### TTS Service API

**Endpoint**: `POST /tts/synthesize`

```json
{
  "text": "Text to speak",
  "voice": "family-voice",
  "stream": true
}
```

Response: Audio stream (WAV or MP3)

### MCP Server API

**Protocol**: JSON-RPC 2.0

**Methods**:
- `tools/list`: List available tools
- `tools/call`: Execute a tool

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather",
    "arguments": {...}
  },
  "id": 1
}
```

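The two methods above can be illustrated with a small in-process dispatcher. This is a sketch of the JSON-RPC request/response handling only, without any transport. The registry `TOOLS`, the `tool` decorator, the placeholder `echo` tool, and the exact result shape are assumptions for illustration, not the project's MCP server.

```python
import json

# Hypothetical in-memory tool registry: name -> description and handler.
TOOLS = {}


def tool(name, description):
    """Decorator that registers a function as a callable tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return register


@tool("echo", "Return the given text unchanged (placeholder tool).")
def echo(text: str) -> str:
    return text


def _error(req_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Handle one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw)
    req_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        result = {"tools": [{"name": name, "description": entry["description"]}
                            for name, entry in TOOLS.items()]}
    elif method == "tools/call":
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return _error(req_id, -32602, "Invalid params: unknown tool")
        output = entry["handler"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": str(output)}]}
    else:
        return _error(req_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

The error codes `-32601` and `-32602` are the standard JSON-RPC 2.0 codes for an unknown method and invalid parameters.
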
## Security Considerations

### Privacy Policy

- **Core Principle**: No external APIs for ASR/LLM processing
- **Exception**: Weather API (documented exception)
- **Local Processing**: All voice and LLM processing runs locally
- **Data Retention**: Configurable retention policies for conversations

### Boundary Enforcement

- **Repository Separation**: Family agent config in separate repo, no work content
- **Path Whitelists**: Tools can only access whitelisted directories
- **Network Isolation**: Containers/namespaces prevent cross-access
- **Firewall Rules**: Block family agent from accessing work repo paths
- **Static Analysis**: CI/CD checks reject code that would grant cross-access

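The path whitelist check is easy to get subtly wrong, for example by comparing string prefixes or by not resolving `..` segments. A minimal sketch of a safer check follows. The function name `is_allowed` and the example directories in the usage note are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def is_allowed(path: str, allowed_roots: list[str]) -> bool:
    """Return True only if the resolved path lies inside an allowed root."""
    # resolve() normalizes ".." segments and follows symlinks, so a path
    # cannot escape its root by traversal.
    target = Path(path).resolve()
    for root in allowed_roots:
        # is_relative_to compares whole path components, so "/srv/family-x"
        # does not match the root "/srv/family".
        if target.is_relative_to(Path(root).resolve()):
            return True
    return False
```

For example, with `allowed_roots=["/srv/family"]`, the path `/srv/family/notes/a.md` is accepted while `/srv/family/../work/plan.md` is rejected.
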
### Confirmation Flows

- **High-Risk Actions**: Email send, calendar changes, file edits outside safe areas
- **Confirmation Tokens**: Signed tokens required from client, not just model intent
- **User Approval**: Explicit "Yes/No" confirmation for sensitive operations
- **Audit Logging**: All confirmations and high-risk actions logged

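One way to realize signed confirmation tokens is an HMAC over the action name and an expiry time, issued after the user approves and verified before the tool runs. The sketch below assumes a shared secret between the issuing and verifying services. The token layout `action|expiry|digest` and the function names are hypothetical, and a production design would likely also bind the token to a session and make it single-use.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_confirmation(secret: bytes, action: str, expires_at: int) -> str:
    """Create a token authorizing `action` until the Unix time `expires_at`."""
    message = f"{action}|{expires_at}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{action}|{expires_at}|{digest}"


def verify_confirmation(secret: bytes, token: str, action: str,
                        now: Optional[float] = None) -> bool:
    """Check signature, action match, and expiry of a confirmation token."""
    try:
        token_action, expires_raw, digest = token.rsplit("|", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return False  # malformed token
    message = f"{token_action}|{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    # compare_digest avoids leaking information through comparison timing.
    return (hmac.compare_digest(digest.encode(), expected.encode())
            and token_action == action
            and current < expires_at)
```

Because the token carries the action name, a confirmation granted for one high-risk action cannot be replayed to authorize a different one.
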
### Authentication & Authorization

- **Token-Based**: Separate tokens for work vs family agents
- **Revocation**: System to disable compromised tokens/devices
- **Admin Controls**: Kill switches for services, tools, or entire agent

## Performance Considerations

### Latency Targets

- **Wake-Word Detection**: < 200ms
- **ASR Processing**: < 2s end-to-end (audio in → text out)
- **LLM Response**: < 3s for family agent (1050), < 5s for work agent (4080)
- **TTS Synthesis**: < 500ms first chunk, streaming thereafter
- **Tool Execution**: < 1s for simple tools (weather, time)

### Resource Allocation

- **4080 (Work Agent)**:
  - Model: 8-14B or 30B quantized (Q4-Q6)
  - Context: 8K-16K tokens
  - Concurrency: 2-4 requests

- **1050 (Family Agent)**:
  - Model: 1B-3B quantized (Q4-Q5)
  - Context: 4K-8K tokens
  - Concurrency: 1-2 requests (always-on, low latency)

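The model sizes above can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by eight. The helper below is a rough estimate under that assumption only, and the name `weight_size_gib` is hypothetical.

```python
def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# 14B parameters at 4 bits per weight is roughly 6.5 GiB of weights.
work_estimate = weight_size_gib(14, 4)

# 3B parameters at 4 bits per weight is roughly 1.4 GiB of weights.
family_estimate = weight_size_gib(3, 4)
```

This counts weights only. The KV cache for the context window and runtime overhead come on top, and quantized formats usually store somewhat more than their nominal bits per weight, so the real footprint is higher than these figures.
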
### Optimization Strategies

- **Model Quantization**: Q4-Q6 for 4080, Q4-Q5 for 1050
- **Context Management**: Summarization and pruning for long conversations
- **Caching**: Weather API responses, tool results
- **Streaming**: ASR and TTS use streaming for lower perceived latency
- **Batching**: LLM requests where possible (work agent)

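Caching weather responses and tool results can be as simple as a time-to-live cache in front of the fetch function. The sketch below is one possible shape, with hypothetical names (`TTLCache`, `get_or_fetch`). The clock is injectable so that expiry can be tested without sleeping.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() if missing or stale."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A weather tool could then call `cache.get_or_fetch(location, lambda: query_weather(location))`, where `query_weather` stands in for whatever function performs the external request. This also reduces how often the one documented external API is contacted.
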
## Deployment

### Hardware Requirements

- **RTX 4080 Server**: Work agent LLM, ASR (optional)
- **GTX 1050 Server**: Family agent LLM (always-on)
- **Wake-Word Node**: Raspberry Pi 4+, NUC, or SFF PC
- **Microphones**: USB mics or array mic for living room/office
- **Storage**: SSD for logs, HDD for archives

### Service Deployment

- **LLM Servers**: Systemd services or Docker containers
- **MCP Server**: Systemd service with auto-restart
- **Voice Services**: ASR and TTS as systemd services
- **Wake-Word Node**: Standalone service on dedicated hardware
- **Clients**: PWA served via web server, web dashboard on LAN

### Configuration Management

- **Family Agent Config**: Separate `family-agent-config/` repo
- **Secrets**: Environment variables, separate .env files
- **Prompts**: Version-controlled in config repo
- **Tool Configs**: YAML/JSON files in config repo

### Monitoring & Logging

- **Structured Logging**: JSON logs for all services
- **Metrics**: GPU usage, latency, error rates
- **Admin Dashboard**: Web UI for logs, metrics, controls
- **Alerting**: System notifications for errors or high resource usage

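Structured JSON logging can be built on the standard `logging` module with a custom formatter. The sketch below shows one way to do it. The field names (`ts`, `level`, `service`, `message`) and the helper `get_logger` are assumptions, not a format the project has fixed.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service: str) -> logging.Logger:
    """Return a logger for a service that emits JSON lines to stderr."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Using the logger name as the `service` field lets one formatter serve every service, so the admin dashboard can filter logs by component.
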
## Development Workflow

### Ticket-Based Development

1. **Select Ticket**: Choose from `tickets/backlog/` or `tickets/todo/`
2. **Check Dependencies**: Review ticket dependencies before starting
3. **Move to In-Progress**: Move ticket to `tickets/in-progress/`
4. **Implement**: Follow architecture patterns and conventions
5. **Test**: Write and run tests
6. **Document**: Update relevant documentation
7. **Move to Review**: Move ticket to `tickets/review/` when complete
8. **Move to Done**: Move to `tickets/done/` after approval

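Because a ticket's stage is encoded by the directory it lives in, the "move" steps above amount to relocating a file. A small helper along these lines could script that. The name `move_ticket` is hypothetical, and the sketch moves the file only. It does not update the `status` field in the ticket's front matter, which would need a separate step.

```python
import shutil
from pathlib import Path

STAGES = ["backlog", "todo", "in-progress", "review", "done"]


def move_ticket(tickets_dir: str, filename: str, to_stage: str) -> Path:
    """Move a ticket file from its current stage directory to `to_stage`."""
    if to_stage not in STAGES:
        raise ValueError(f"unknown stage: {to_stage!r}")
    root = Path(tickets_dir)
    for stage in STAGES:
        source = root / stage / filename
        if source.exists():
            if stage == to_stage:
                return source  # already in the requested stage
            dest = root / to_stage / filename
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(dest))
            return dest
    raise FileNotFoundError(f"{filename} not found in any stage directory")
```
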
### Parallel Development

Many tickets can be worked on simultaneously:
- **Voice I/O**: Independent of LLM and MCP
- **LLM Infrastructure**: Can proceed after model selection
- **MCP Tools**: Can start with minimal server, add tools incrementally
- **Clients/UI**: Can mock APIs early, integrate later

### Milestone Progression

- **Milestone 1**: Foundation and surveys (TICKET-002 through TICKET-029)
- **Milestone 2**: MVP with voice chat, weather, tasks (core functionality)
- **Milestone 3**: Memory, reminders, safety features
- **Milestone 4**: Optional integrations (email, calendar, smart home)

## Future Considerations

### Planned Enhancements

- **Semantic Search**: Add embeddings for note search (beyond ripgrep)
- **Routine Learning**: Automatically learn and suggest routines from memory
- **Multi-Device**: Support multiple wake-word nodes and clients
- **Offline Mode**: Enhanced offline capabilities for clients
- **Voice Cloning**: Custom voice profiles for family members

### Technical Debt

- Start with basic implementations, optimize later
- Initial memory system: simple schema, enhance with better retrieval
- Tool permissions: Start with whitelists, add more granular control later
- Logging: Start with files, migrate to time-series DB if needed

### Scalability

- Current design supports single household
- Future: Multi-user support with user-specific memory and preferences
- Future: Distributed deployment across multiple nodes

## Related Documentation

- **Tickets**: See `tickets/TICKETS_SUMMARY.md` for all 46 tickets
- **Quick Start**: See `tickets/QUICK_START.md` for recommended starting order
- **Ticket Template**: See `tickets/TICKET_TEMPLATE.md` for creating new tickets
- **Privacy Policy**: See `docs/PRIVACY_POLICY.md` for details on data handling
- **Safety Constraints**: See `docs/SAFETY_CONSTRAINTS.md` for details on security boundaries

---

**Note**: Update this document as the architecture evolves.