
Architecture Documentation

Overview

This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project.

Atlas is a local, privacy-focused voice agent system with separate work and family agents, running on dedicated hardware (RTX 4080 for the work agent, GTX 1050 for the family agent).

System Architecture

High-Level Design

The system consists of 5 parallel tracks:

  1. Voice I/O: Wake-word detection, ASR (Automatic Speech Recognition), TTS (Text-to-Speech)
  2. LLM Infrastructure: Two separate LLM servers (4080 for work, 1050 for family)
  3. Tools/MCP: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
  4. Clients/UI: Phone PWA and web LAN dashboard
  5. Safety/Memory: Long-term memory, conversation management, safety controls

Component Architecture

┌────────────────────────────────────────────────────────┐
│                     Clients Layer                      │
│   ┌──────────────┐            ┌───────────────┐        │
│   │  Phone PWA   │            │ Web Dashboard │        │
│   └──────┬───────┘            └───────┬───────┘        │
└──────────┼────────────────────────────┼────────────────┘
           │ WebSocket/HTTP             │
┌──────────┼────────────────────────────┼────────────────┐
│          │        Voice Stack         │                │
│   ┌──────▼──────┐  ┌──────────┐ ┌─────▼──────┐         │
│   │  Wake-Word  │─▶│   ASR    │ │    TTS     │         │
│   │    Node     │  │ Service  │ │  Service   │         │
│   └─────────────┘  └────┬─────┘ └────────────┘         │
└─────────────────────────┼──────────────────────────────┘
                          │
┌─────────────────────────┼──────────────────────────────┐
│                         │   LLM Infrastructure         │
│                  ┌──────▼──────┐    ┌──────────────┐   │
│                  │ 4080 Server │    │ 1050 Server  │   │
│                  │ (Work Agent)│    │(Family Agent)│   │
│                  └──────┬──────┘    └──────┬───────┘   │
│                         └─────────┬────────┘           │
│                            ┌──────▼────────┐           │
│                            │ Routing Layer │           │
│                            └──────┬────────┘           │
└───────────────────────────────────┼────────────────────┘
                                    │
┌───────────────────────────────────┼────────────────────┐
│         MCP Tools Layer           │                    │
│                           ┌───────▼────────┐           │
│                           │   MCP Server   │           │
│                           │  ┌─────────┐   │           │
│                           │  │ Weather │   │           │
│                           │  │ Tasks   │   │           │
│                           │  │ Timers  │   │           │
│                           │  │ Notes   │   │           │
│                           │  └─────────┘   │           │
│                           └────────────────┘           │
└────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│                    Safety & Memory                     │
│   ┌──────────────┐      ┌──────────────┐               │
│   │   Memory     │      │  Boundaries  │               │
│   │   Store      │      │ Enforcement  │               │
│   └──────────────┘      └──────────────┘               │
└────────────────────────────────────────────────────────┘

Technology Stack

  • Languages: Python (backend services), TypeScript/JavaScript (clients)
  • LLM Servers: Ollama, vLLM, or llama.cpp
    • Work Agent (4080): Llama 3.1 70B Q4 (see docs/LLM_MODEL_SURVEY.md)
    • Family Agent (1050): Phi-3 Mini 3.8B Q4 (see docs/LLM_MODEL_SURVEY.md)
  • ASR: faster-whisper (see docs/ASR_EVALUATION.md for details)
  • TTS: Piper (selected for initial development; Coqui TTS is a possible future upgrade)
  • Wake-Word: openWakeWord (see docs/WAKE_WORD_EVALUATION.md for details)
  • Protocols: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
    • MCP: JSON-RPC 2.0 protocol for tool integration (see docs/MCP_ARCHITECTURE.md)
  • Storage: SQLite (memory, sessions), Markdown files (tasks, notes)
  • Infrastructure: Docker, systemd, Linux

LLM Model Selection

Model selection has been completed based on hardware capacity and requirements:

  • Work Agent (RTX 4080): Llama 3.1 70B Q4 - Best overall capabilities for coding and research
  • Family Agent (GTX 1050): Phi-3 Mini 3.8B Q4 - Excellent instruction following, low latency

See docs/LLM_MODEL_SURVEY.md for detailed model comparison and docs/LLM_CAPACITY.md for VRAM and context window analysis.

TTS Selection

For initial development, Piper has been selected as the primary Text-to-Speech (TTS) engine. This decision is based on its high performance, low resource requirements, and permissive license, which are ideal for prototyping and early-stage implementation. Coqui TTS is identified as a potential future upgrade for a high-quality voice when more resources can be allocated.

For a detailed comparison of all evaluated options, see the TTS Evaluation document.

Design Patterns

Core Patterns

  • Microservices Architecture: Separate services for wake-word, ASR, TTS, LLM servers, MCP tools
  • Event-Driven: Wake-word events trigger ASR capture, tool calls trigger actions
  • API Gateway Pattern: Routing layer directs requests to appropriate LLM server
  • Repository Pattern: Separate config repo for family agent (no work content)
  • Tool Pattern: MCP tools as independent, composable services

Architectural Patterns

  • Separation of Concerns: Clear boundaries between work and family agents
  • Layered Architecture: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
  • Service-Oriented: Each component is an independent service with defined APIs
  • Privacy by Design: Local processing, minimal external dependencies

Project Structure

Repository Structure

This project uses a mono-repo for the main application code and a separate repository for family-specific configurations, ensuring a clean separation of concerns.

home-voice-agent (Mono-repo)

This repository contains all the code for the voice agent, its services, and clients.

home-voice-agent/
├── llm-servers/          # LLM inference servers
│   ├── 4080/             # Work agent server (e.g., Llama 70B)
│   └── 1050/             # Family agent server (e.g., Phi-3 Mini)
├── mcp-server/           # MCP (Model Context Protocol) tool server
│   └── tools/            # Individual tool implementations (e.g., weather, time)
├── wake-word/            # Wake-word detection node
├── asr/                  # ASR (Automatic Speech Recognition) service
├── tts/                  # TTS (Text-to-Speech) service
├── clients/              # Front-end applications
│   ├── phone/            # Phone PWA (Progressive Web App)
│   └── web-dashboard/    # Web-based administration dashboard
├── routing/              # LLM routing layer to direct requests
├── conversation/         # Conversation management and history
├── memory/               # Long-term memory storage and retrieval
├── safety/               # Safety, boundary enforcement, and content filtering
├── admin/                # Administration and monitoring tools
└── infrastructure/       # Deployment scripts, Dockerfiles, and IaC

family-agent-config (Configuration Repo)

This repository stores all personal and family-related configurations. It is kept separate to maintain privacy and prevent work-related data from mixing with family data.

family-agent-config/
├── prompts/              # System prompts and character definitions
├── tools/                # Tool configurations and settings
├── secrets/              # Credentials and API keys (e.g., weather API)
└── tasks/                # Markdown-based Kanban board for home tasks
    └── home/             # Tasks for the home

Atlas Project (This Repo)

atlas/
├── tickets/              # Kanban tickets
│   ├── backlog/         # Future work
│   ├── todo/            # Ready to work on
│   ├── in-progress/     # Active work
│   ├── review/          # Awaiting review
│   └── done/            # Completed
├── docs/                 # Documentation
├── ARCHITECTURE.md       # This file
└── README.md             # Project overview

Data Models

Memory Schema

Long-term memory stores personal facts, preferences, and routines:

MemoryEntry:
  - id: str
  - category: str  # personal, family, preferences, routines
  - content: str
  - timestamp: datetime
  - confidence: float
  - source: str  # conversation, explicit, inferred
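As a sketch, this schema maps directly onto a Python dataclass persisted in the SQLite store mentioned above; the table name `memory` and the `save_entry` helper are illustrative, not part of the project code:

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    id: str
    category: str       # personal, family, preferences, routines
    content: str
    timestamp: datetime
    confidence: float
    source: str         # conversation, explicit, inferred

def save_entry(conn: sqlite3.Connection, entry: MemoryEntry) -> None:
    """Persist one memory entry (table name is illustrative)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory "
        "(id TEXT PRIMARY KEY, category TEXT, content TEXT, "
        " timestamp TEXT, confidence REAL, source TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
        (entry.id, entry.category, entry.content,
         entry.timestamp.isoformat(), entry.confidence, entry.source),
    )

conn = sqlite3.connect(":memory:")
save_entry(conn, MemoryEntry(
    id="m-001", category="preferences",
    content="Prefers metric units",
    timestamp=datetime.now(timezone.utc),
    confidence=0.9, source="conversation",
))
row = conn.execute("SELECT category, content FROM memory").fetchone()
```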

Conversation Session

Session:
  - session_id: str
  - agent_type: str  # "work" or "family"
  - messages: List[Message]
  - created_at: datetime
  - last_activity: datetime
  - summary: str  # after summarization
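A minimal sketch of this model with a summarization trigger; the 20-message threshold is an assumption for illustration, not a project decision:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Message:
    role: str       # "user" or "assistant"
    content: str

@dataclass
class Session:
    session_id: str
    agent_type: str                  # "work" or "family"
    messages: List[Message] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_activity: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    summary: str = ""                # filled in after summarization

    def add(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))
        self.last_activity = datetime.now(timezone.utc)

    def needs_summary(self, max_messages: int = 20) -> bool:
        # Summarize and prune once the transcript grows past the threshold.
        return len(self.messages) > max_messages

s = Session(session_id="s1", agent_type="family")
for i in range(21):
    s.add("user", f"message {i}")
```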

Task Model (Markdown Kanban)

---
id: TICKET-XXX
title: Task title
status: backlog|todo|in-progress|review|done
priority: high|medium|low
created: YYYY-MM-DD
updated: YYYY-MM-DD
assignee: name
tags: [tag1, tag2]
---

Task description...
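The front matter above can be split off with a few lines of code; this sketch handles only the flat `key: value` fields shown (a real implementation would use a YAML parser to also handle the `tags` list):

```python
from typing import Dict, Tuple

def parse_task(text: str) -> Tuple[Dict[str, str], str]:
    """Split a task file into its front-matter fields and body."""
    _, front_matter, body = text.split("---", 2)
    fields = {}
    for line in front_matter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()

doc = """---
id: TICKET-042
title: Wire up weather tool
status: todo
priority: high
---

Task description..."""

fields, body = parse_task(doc)
```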

MCP Tool Definition

{
  "name": "tool_name",
  "description": "Tool description",
  "inputSchema": {
    "type": "object",
    "properties": {...}
  }
}

API Design

LLM Server API

Endpoint: POST /v1/chat/completions

{
  "model": "work-agent" | "family-agent",
  "messages": [...],
  "tools": [...],
  "temperature": 0.7
}
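A client for this endpoint can be sketched with the standard library; the helper below only assembles the request body shown above, and the server address in the commented send is hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

def build_chat_request(agent: str,
                       messages: List[Dict[str, str]],
                       tools: Optional[List[Dict[str, Any]]] = None,
                       temperature: float = 0.7) -> Dict[str, Any]:
    """Assemble a /v1/chat/completions request body for either agent."""
    if agent not in ("work-agent", "family-agent"):
        raise ValueError(f"unknown agent: {agent}")
    body: Dict[str, Any] = {
        "model": agent,
        "messages": messages,
        "temperature": temperature,
    }
    if tools:
        body["tools"] = tools
    return body

req = build_chat_request("family-agent",
                         [{"role": "user", "content": "What's the weather?"}])

# To actually send it (server URL is an assumption):
# http_req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(req).encode(),
#     headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(http_req) as resp:
#     reply = json.load(resp)
```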

ASR Service API

Endpoint: WebSocket /asr/stream

  • Input: Audio stream (PCM, 16kHz, mono)
  • Output: Text segments with timestamps
{
  "text": "transcribed text",
  "timestamp": 1234.56,
  "confidence": 0.95,
  "is_final": false
}
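A consumer of this stream has to treat partial and final segments differently: partials overwrite the in-progress tail, finals are appended. A minimal accumulator, assuming the message shape above:

```python
from typing import Dict, List

class TranscriptBuilder:
    """Collects ASR messages: finals are appended, partials overwrite."""

    def __init__(self) -> None:
        self.final_segments: List[str] = []
        self.partial: str = ""

    def feed(self, msg: Dict) -> str:
        if msg["is_final"]:
            self.final_segments.append(msg["text"])
            self.partial = ""           # the final replaces the partial tail
        else:
            self.partial = msg["text"]
        return self.text

    @property
    def text(self) -> str:
        parts = self.final_segments + ([self.partial] if self.partial else [])
        return " ".join(parts)

tb = TranscriptBuilder()
tb.feed({"text": "turn on", "timestamp": 0.4, "confidence": 0.80, "is_final": False})
tb.feed({"text": "turn on the lights", "timestamp": 1.2, "confidence": 0.95, "is_final": True})
```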

TTS Service API

Endpoint: POST /tts/synthesize

{
  "text": "Text to speak",
  "voice": "family-voice",
  "stream": true
}

Response: Audio stream (WAV or MP3)

MCP Server API

Protocol: JSON-RPC 2.0

Methods:

  • tools/list: List available tools
  • tools/call: Execute a tool
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather",
    "arguments": {...}
  },
  "id": 1
}
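Server-side, these two methods reduce to a small dispatcher over a tool registry. The sketch below simplifies the MCP result shapes, and the `weather` handler is a stub:

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Dict[str, Any]] = {
    "weather": {
        "description": "Get current weather",
        "inputSchema": {"type": "object",
                        "properties": {"city": {"type": "string"}}},
    },
}

HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "weather": lambda args: {"city": args["city"], "temp_c": 21},  # stub
}

def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a JSON-RPC 2.0 request for tools/list or tools/call."""
    method, req_id = request["method"], request.get("id")
    if method == "tools/list":
        result: Any = [{"name": name, **meta} for name, meta in TOOLS.items()]
    elif method == "tools/call":
        params = request["params"]
        result = HANDLERS[params["name"]](params.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}

resp = handle({"jsonrpc": "2.0", "method": "tools/call",
               "params": {"name": "weather", "arguments": {"city": "Boston"}},
               "id": 1})
```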

Security Considerations

Privacy Policy

  • Core Principle: No external APIs for ASR/LLM processing
  • Exception: Weather API (documented exception)
  • Local Processing: All voice and LLM processing runs locally
  • Data Retention: Configurable retention policies for conversations

Boundary Enforcement

  • Repository Separation: Family agent config in separate repo, no work content
  • Path Whitelists: Tools can only access whitelisted directories
  • Network Isolation: Containers/namespaces prevent cross-access
  • Firewall Rules: Block family agent from accessing work repo paths
  • Static Analysis: CI/CD checks reject code that would grant cross-access

Confirmation Flows

  • High-Risk Actions: Email send, calendar changes, file edits outside safe areas
  • Confirmation Tokens: Signed tokens required from client, not just model intent
  • User Approval: Explicit "Yes/No" confirmation for sensitive operations
  • Audit Logging: All confirmations and high-risk actions logged
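The signed confirmation tokens can be sketched with stdlib HMAC: the server issues a token bound to the exact action being confirmed, and executes only if the client echoes a valid one. Key handling, expiry, and nonce tracking are simplified here:

```python
import hashlib
import hmac

SECRET = b"demo-key"  # in practice: a per-device key from the secrets store

def issue_token(action: str, nonce: str) -> str:
    """Sign the exact action the user is confirming."""
    msg = f"{action}:{nonce}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify_token(action: str, nonce: str, token: str) -> bool:
    expected = issue_token(action, nonce)
    return hmac.compare_digest(expected, token)  # constant-time compare

token = issue_token("email.send:boss@example.com", "nonce-1")
ok = verify_token("email.send:boss@example.com", "nonce-1", token)
tampered = verify_token("email.send:attacker@example.com", "nonce-1", token)
```

Binding the token to the action string means a model cannot reuse a confirmation granted for one operation to authorize a different one.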

Authentication & Authorization

  • Token-Based: Separate tokens for work vs family agents
  • Revocation: System to disable compromised tokens/devices
  • Admin Controls: Kill switches for services, tools, or entire agent

Performance Considerations

Latency Targets

  • Wake-Word Detection: < 200ms
  • ASR Processing: < 2s end-to-end (audio in → text out)
  • LLM Response: < 3s for family agent (1050), < 5s for work agent (4080)
  • TTS Synthesis: < 500ms first chunk, streaming thereafter
  • Tool Execution: < 1s for simple tools (weather, time)

Resource Allocation

  • 4080 (Work Agent):

    • Model: Llama 3.1 70B Q4 (selected; see docs/MODEL_SELECTION.md)
    • Context: 8K-16K tokens
    • Concurrency: 2-4 requests
  • 1050 (Family Agent):

    • Model: Phi-3 Mini 3.8B Q4 (selected; see docs/MODEL_SELECTION.md)
    • Context: 4K-8K tokens
    • Concurrency: 1-2 requests (always-on, low latency)

Optimization Strategies

  • Model Quantization: Q4-Q6 for 4080, Q4-Q5 for 1050
  • Context Management: Summarization and pruning for long conversations
  • Caching: Weather API responses, tool results
  • Streaming: ASR and TTS use streaming for lower perceived latency
  • Batching: LLM requests where possible (work agent)
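The caching strategy for weather lookups can be as small as a TTL dict; a sketch, where the 10-minute TTL and the fake API function are assumptions for illustration:

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Cache tool results for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]               # still fresh: skip the API call
        value = fetch()
        self._store[key] = (now, value)
        return value

calls = []
def fake_weather_api():
    calls.append(1)
    return {"temp_c": 21}

cache = TTLCache(ttl_seconds=600)       # assumed 10-minute TTL
a = cache.get_or_fetch("weather:boston", fake_weather_api)
b = cache.get_or_fetch("weather:boston", fake_weather_api)  # served from cache
```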

Deployment

Hardware Requirements

  • RTX 4080 Server: Work agent LLM, ASR (optional)
  • GTX 1050 Server: Family agent LLM (always-on)
  • Wake-Word Node: Raspberry Pi 4+, NUC, or SFF PC
  • Microphones: USB mics or array mic for living room/office
  • Storage: SSD for logs, HDD for archives

Service Deployment

  • LLM Servers: Systemd services or Docker containers
  • MCP Server: Systemd service with auto-restart
  • Voice Services: ASR and TTS as systemd services
  • Wake-Word Node: Standalone service on dedicated hardware
  • Clients: PWA served via web server, web dashboard on LAN

Configuration Management

  • Family Agent Config: Separate family-agent-config/ repo
  • Secrets: Environment variables, separate .env files
  • Prompts: Version-controlled in config repo
  • Tool Configs: YAML/JSON files in config repo

Monitoring & Logging

  • Structured Logging: JSON logs for all services
  • Metrics: GPU usage, latency, error rates
  • Admin Dashboard: Web UI for logs, metrics, controls
  • Alerting: System notifications for errors or high resource usage

Development Workflow

Ticket-Based Development

  1. Select Ticket: Choose from tickets/backlog/ or tickets/todo/
  2. Check Dependencies: Review ticket dependencies before starting
  3. Move to In-Progress: Move ticket to tickets/in-progress/
  4. Implement: Follow architecture patterns and conventions
  5. Test: Write and run tests
  6. Document: Update relevant documentation
  7. Move to Review: Move ticket to tickets/review/ when complete
  8. Move to Done: Move to tickets/done/ after approval

Parallel Development

Many tickets can be worked on simultaneously:

  • Voice I/O: Independent of LLM and MCP
  • LLM Infrastructure: Can proceed after model selection
  • MCP Tools: Can start with minimal server, add tools incrementally
  • Clients/UI: Can mock APIs early, integrate later

Milestone Progression

  • Milestone 1: Foundation and surveys (TICKET-002 through TICKET-029)
  • Milestone 2: MVP with voice chat, weather, tasks (core functionality)
  • Milestone 3: Memory, reminders, safety features
  • Milestone 4: Optional integrations (email, calendar, smart home)

Future Considerations

Planned Enhancements

  • Semantic Search: Add embeddings for note search (beyond ripgrep)
  • Routine Learning: Automatically learn and suggest routines from memory
  • Multi-Device: Support multiple wake-word nodes and clients
  • Offline Mode: Enhanced offline capabilities for clients
  • Voice Cloning: Custom voice profiles for family members

Technical Debt

  • Start with basic implementations, optimize later
  • Initial memory system: simple schema, enhance with better retrieval
  • Tool permissions: Start with whitelists, add more granular control later
  • Logging: Start with files, migrate to time-series DB if needed

Scalability

  • Current design supports single household
  • Future: Multi-user support with user-specific memory and preferences
  • Future: Distributed deployment across multiple nodes

Project Management

  • Tickets: See tickets/TICKETS_SUMMARY.md for all 46 tickets
  • Quick Start: See tickets/QUICK_START.md for recommended starting order
  • Next Steps: See tickets/NEXT_STEPS.md for current recommendations
  • Ticket Template: See tickets/TICKET_TEMPLATE.md for creating new tickets

Technology Evaluations

  • LLM Model Survey: See docs/LLM_MODEL_SURVEY.md for model selection and comparison
  • LLM Capacity: See docs/LLM_CAPACITY.md for VRAM and context window analysis
  • LLM Usage & Costs: See docs/LLM_USAGE_AND_COSTS.md for operational cost analysis
  • Model Selection: See docs/MODEL_SELECTION.md for final model choices
  • ASR Evaluation: See docs/ASR_EVALUATION.md for ASR engine selection
  • MCP Architecture: See docs/MCP_ARCHITECTURE.md for MCP protocol and integration
  • Implementation Guide: See docs/IMPLEMENTATION_GUIDE.md for Milestone 2 implementation steps

Planning & Requirements

  • Hardware: See docs/HARDWARE.md for hardware requirements and purchase plan
  • Privacy Policy: See docs/PRIVACY_POLICY.md for details on data handling
  • Safety Constraints: See docs/SAFETY_CONSTRAINTS.md for details on security boundaries

Note: Update this document as the architecture evolves.