# Architecture Documentation

## Overview

This document describes the architecture, design patterns, and technical decisions for the Atlas home voice agent project. Atlas is a local, privacy-focused voice agent system with separate work and family agents, running on dedicated hardware (an RTX 4080 for the work agent, a GTX 1050 for the family agent).

## System Architecture

### High-Level Design

The system consists of five parallel tracks:

1. **Voice I/O**: Wake-word detection, ASR (Automatic Speech Recognition), and TTS (Text-to-Speech)
2. **LLM Infrastructure**: Two separate LLM servers (4080 for work, 1050 for family)
3. **Tools/MCP**: Model Context Protocol (MCP) tool servers for weather, tasks, notes, etc.
4. **Clients/UI**: Phone PWA and web LAN dashboard
5. **Safety/Memory**: Long-term memory, conversation management, and safety controls

### Component Architecture

```
┌──────────────────────────────────────────────────┐
│                  Clients Layer                   │
│   ┌──────────────┐       ┌────────────────┐      │
│   │  Phone PWA   │       │ Web Dashboard  │      │
│   └──────────────┘       └────────────────┘      │
└──────────────────────────────────────────────────┘
                         │ WebSocket/HTTP
┌────────────────────────▼─────────────────────────┐
│                   Voice Stack                    │
│  ┌───────────┐    ┌───────────┐    ┌──────────┐  │
│  │ Wake-Word │───▶│    ASR    │    │   TTS    │  │
│  │   Node    │    │  Service  │    │ Service  │  │
│  └───────────┘    └───────────┘    └──────────┘  │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│                LLM Infrastructure                │
│   ┌──────────────┐       ┌────────────────┐      │
│   │ 4080 Server  │       │  1050 Server   │      │
│   │ (Work Agent) │       │ (Family Agent) │      │
│   └──────┬───────┘       └───────┬────────┘      │
│          └───────────┬───────────┘               │
│              ┌───────▼────────┐                  │
│              │ Routing Layer  │                  │
│              └───────┬────────┘                  │
└──────────────────────┼───────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────┐
│                 MCP Tools Layer                  │
│   ┌─────────────────────┐                        │
│   │      MCP Server     │                        │
│   │  ┌───────────────┐  │                        │
│   │  │ Weather       │  │                        │
│   │  │ Tasks         │  │                        │
│   │  │ Timers        │  │                        │
│   │  │ Notes         │  │                        │
│   │  └───────────────┘  │                        │
│   └─────────────────────┘                        │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│                 Safety & Memory                  │
│   ┌──────────────┐       ┌────────────────┐      │
│   │    Memory    │       │   Boundaries   │      │
│   │    Store     │       │  Enforcement   │      │
│   └──────────────┘       └────────────────┘      │
└──────────────────────────────────────────────────┘
```

### Technology Stack

- **Languages**: Python (backend services), TypeScript/JavaScript (clients)
- **LLM Servers**: Ollama, vLLM, or llama.cpp
- **ASR**: faster-whisper or whisper.cpp
- **TTS**: Piper, Mimic 3, or Coqui TTS
- **Wake-Word**: openWakeWord or Porcupine
- **Protocols**: MCP (Model Context Protocol), WebSocket, HTTP/gRPC
- **Storage**: SQLite (memory, sessions), Markdown files (tasks, notes)
- **Infrastructure**: Docker, systemd, Linux

## Design Patterns

### Core Patterns

- **Microservices Architecture**: Separate services for wake-word, ASR, TTS, LLM servers, and MCP tools
- **Event-Driven**: Wake-word events trigger ASR capture; tool calls trigger actions
- **API Gateway Pattern**: A routing layer directs requests to the appropriate LLM server
- **Repository Pattern**: Separate config repo for the family agent (no work content)
- **Tool Pattern**: MCP tools as independent, composable services

### Architectural Patterns

- **Separation of Concerns**: Clear boundaries between work and family agents
- **Layered Architecture**: Clients → Voice Stack → LLM Infrastructure → MCP Tools → Safety/Memory
- **Service-Oriented**: Each component is an independent service with defined APIs
- **Privacy by Design**: Local processing, minimal external dependencies

## Project Structure

### Repository Structure

```
home-voice-agent/            # Main mono-repo
├── llm-servers/
│   ├── 4080/                # Work agent server
│   └── 1050/                # Family agent server
├── mcp-server/              # MCP tool server
│   └── tools/               # Individual tool implementations
├── wake-word/               # Wake-word detection node
├── asr/                     # ASR service
├── tts/                     # TTS service
├── clients/
│   ├── phone/               # Phone PWA
│   └── web-dashboard/       # Web dashboard
├── routing/                 # LLM routing layer
├── conversation/            # Conversation management
├── memory/                  # Long-term memory
├── safety/                  # Safety and boundary enforcement
└── admin/                   # Admin tools

family-agent-config/         # Separate config repo
├── prompts/                 # System prompts
├── tools/                   # Tool configurations
├── secrets/                 # Credentials (no work content)
└── tasks/                   # Home Kanban board
    └── home/                # Home tasks only
```

### Atlas Project (This Repo)

```
atlas/
├── tickets/                 # Kanban tickets
│   ├── backlog/             # Future work
│   ├── todo/                # Ready to work on
│   ├── in-progress/         # Active work
│   ├── review/              # Awaiting review
│   └── done/                # Completed
├── docs/                    # Documentation
├── ARCHITECTURE.md          # This file
└── README.md                # Project overview
```

## Data Models

### Memory Schema

Long-term memory stores personal facts, preferences, and routines:

```python
MemoryEntry:
- id: str
- category: str       # personal, family, preferences, routines
- content: str
- timestamp: datetime
- confidence: float
- source: str         # conversation, explicit, inferred
```

### Conversation Session

```python
Session:
- session_id: str
- agent_type: str     # "work" or "family"
- messages: List[Message]
- created_at: datetime
- last_activity: datetime
- summary: str        # after summarization
```

### Task Model (Markdown Kanban)

```yaml
---
id: TICKET-XXX
title: Task title
status: backlog|todo|in-progress|review|done
priority: high|medium|low
created: YYYY-MM-DD
updated: YYYY-MM-DD
assignee: name
tags: [tag1, tag2]
---

Task description...
```

### MCP Tool Definition

```json
{
  "name": "tool_name",
  "description": "Tool description",
  "inputSchema": {
    "type": "object",
    "properties": {...}
  }
}
```

## API Design

### LLM Server API

**Endpoint**: `POST /v1/chat/completions`

```json
{
  "model": "work-agent" | "family-agent",
  "messages": [...],
  "tools": [...],
  "temperature": 0.7
}
```

### ASR Service API

**Endpoint**: `WebSocket /asr/stream`

- Input: Audio stream (PCM, 16 kHz, mono)
- Output: Text segments with timestamps

```json
{
  "text": "transcribed text",
  "timestamp": 1234.56,
  "confidence": 0.95,
  "is_final": false
}
```

### TTS Service API

**Endpoint**: `POST /tts/synthesize`

```json
{
  "text": "Text to speak",
  "voice": "family-voice",
  "stream": true
}
```

Response: Audio stream (WAV or MP3)

### MCP Server API

**Protocol**: JSON-RPC 2.0

**Methods**:

- `tools/list`: List available tools
- `tools/call`: Execute a tool

```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather",
    "arguments": {...}
  },
  "id": 1
}
```

## Security Considerations

### Privacy Policy

- **Core Principle**: No external APIs for ASR/LLM processing
- **Exception**: Weather API (documented exception)
- **Local Processing**: All voice and LLM processing runs locally
- **Data Retention**: Configurable retention policies for conversations

### Boundary Enforcement

- **Repository Separation**: Family agent config lives in a separate repo with no work content
- **Path Whitelists**: Tools can only access whitelisted directories
- **Network Isolation**: Containers/namespaces prevent cross-access
- **Firewall Rules**: Block the family agent from accessing work repo paths
- **Static Analysis**: CI/CD checks reject code that would grant cross-access

### Confirmation Flows

- **High-Risk Actions**: Email sends, calendar changes, and file edits outside safe areas
- **Confirmation Tokens**: Signed tokens required from the client, not just model intent
- **User Approval**: Explicit "Yes/No" confirmation for sensitive operations
- **Audit Logging**: All confirmations and high-risk actions are logged

### Authentication & Authorization

- **Token-Based**: Separate tokens for work and family agents
- **Revocation**: A mechanism to disable compromised tokens/devices
- **Admin Controls**: Kill switches for services, tools, or an entire agent

## Performance Considerations

### Latency Targets

- **Wake-Word Detection**: < 200 ms
- **ASR Processing**: < 2 s end-to-end (audio in → text out)
- **LLM Response**: < 3 s for the family agent (1050), < 5 s for the work agent (4080)
- **TTS Synthesis**: < 500 ms to first chunk, streaming thereafter
- **Tool Execution**: < 1 s for simple tools (weather, time)

### Resource Allocation

- **4080 (Work Agent)**:
  - Model: 8-14B or 30B quantized (Q4-Q6)
  - Context: 8K-16K tokens
  - Concurrency: 2-4 requests
- **1050 (Family Agent)**:
  - Model: 1B-3B quantized (Q4-Q5)
  - Context: 4K-8K tokens
  - Concurrency: 1-2 requests (always-on, low latency)

### Optimization Strategies

- **Model Quantization**: Q4-Q6 on the 4080, Q4-Q5 on the 1050
- **Context Management**: Summarization and pruning for long conversations
- **Caching**: Weather API responses and tool results
- **Streaming**: ASR and TTS stream for lower perceived latency
- **Batching**: Batch LLM requests where possible (work agent)

## Deployment

### Hardware Requirements

- **RTX 4080 Server**: Work agent LLM, ASR (optional)
- **GTX 1050 Server**: Family agent LLM (always-on)
- **Wake-Word Node**: Raspberry Pi 4+, NUC, or SFF PC
- **Microphones**: USB mics or an array mic for the living room/office
- **Storage**: SSD for logs, HDD for archives

### Service Deployment

- **LLM Servers**: systemd services or Docker containers
- **MCP Server**: systemd service with auto-restart
- **Voice Services**: ASR and TTS as systemd services
- **Wake-Word Node**: Standalone service on dedicated hardware
- **Clients**: PWA served via a web server; web dashboard on the LAN

### Configuration Management

- **Family Agent Config**: Separate `family-agent-config/` repo
- **Secrets**: Environment variables and separate `.env` files
- **Prompts**: Version-controlled in the config repo
- **Tool Configs**: YAML/JSON files in the config repo

### Monitoring & Logging

- **Structured Logging**: JSON logs for all services
- **Metrics**: GPU usage, latency, error rates
- **Admin Dashboard**: Web UI for logs, metrics, and controls
- **Alerting**: System notifications for errors or high resource usage

## Development Workflow

### Ticket-Based Development

1. **Select Ticket**: Choose from `tickets/backlog/` or `tickets/todo/`
2. **Check Dependencies**: Review ticket dependencies before starting
3. **Move to In-Progress**: Move the ticket to `tickets/in-progress/`
4. **Implement**: Follow architecture patterns and conventions
5. **Test**: Write and run tests
6. **Document**: Update relevant documentation
7. **Move to Review**: Move the ticket to `tickets/review/` when complete
8. **Move to Done**: Move to `tickets/done/` after approval

### Parallel Development

Many tickets can be worked on simultaneously:

- **Voice I/O**: Independent of LLM and MCP work
- **LLM Infrastructure**: Can proceed after model selection
- **MCP Tools**: Can start with a minimal server and add tools incrementally
- **Clients/UI**: Can mock APIs early and integrate later

### Milestone Progression

- **Milestone 1**: Foundation and surveys (TICKET-002 through TICKET-029)
- **Milestone 2**: MVP with voice chat, weather, and tasks (core functionality)
- **Milestone 3**: Memory, reminders, and safety features
- **Milestone 4**: Optional integrations (email, calendar, smart home)

## Future Considerations

### Planned Enhancements

- **Semantic Search**: Add embeddings for note search (beyond ripgrep)
- **Routine Learning**: Automatically learn and suggest routines from memory
- **Multi-Device**: Support multiple wake-word nodes and clients
- **Offline Mode**: Enhanced offline capabilities for clients
- **Voice Cloning**: Custom voice profiles for family members

### Technical Debt

- Start with basic implementations; optimize later
- Initial memory system: simple schema now, better retrieval later
- Tool permissions: start with whitelists, add more granular control later
- Logging: start with files, migrate to a time-series DB if needed

### Scalability

- The current design supports a single household
- Future: multi-user support with user-specific memory and preferences
- Future: distributed deployment across multiple nodes

## Related Documentation

- **Tickets**: See `tickets/TICKETS_SUMMARY.md` for all 46 tickets
- **Quick Start**: See `tickets/QUICK_START.md` for the recommended starting order
- **Ticket Template**: See `tickets/TICKET_TEMPLATE.md` for creating new tickets

---

**Note**: Update this document as the architecture evolves.
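
## Appendix: Memory Store Sketch

The `MemoryEntry` schema under Data Models maps naturally onto the project's SQLite storage choice. The sketch below is a minimal, hypothetical illustration of such a store; the table name `memory`, the helper functions, and the example values are assumptions for illustration, not part of the codebase.

```python
# Illustrative sketch only: table and helper names are hypothetical.
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    id: str
    category: str        # personal, family, preferences, routines
    content: str
    timestamp: datetime
    confidence: float
    source: str          # conversation, explicit, inferred

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the memory table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory (
               id TEXT PRIMARY KEY,
               category TEXT NOT NULL,
               content TEXT NOT NULL,
               timestamp TEXT NOT NULL,
               confidence REAL NOT NULL,
               source TEXT NOT NULL)"""
    )
    return conn

def save(conn: sqlite3.Connection, entry: MemoryEntry) -> None:
    """Insert or update a single memory entry."""
    conn.execute(
        "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?, ?, ?)",
        (entry.id, entry.category, entry.content,
         entry.timestamp.isoformat(), entry.confidence, entry.source),
    )

def recall(conn: sqlite3.Connection, category: str) -> list[MemoryEntry]:
    """Fetch all entries in a category, oldest first."""
    rows = conn.execute(
        "SELECT * FROM memory WHERE category = ? ORDER BY timestamp",
        (category,),
    ).fetchall()
    return [MemoryEntry(r[0], r[1], r[2], datetime.fromisoformat(r[3]), r[4], r[5])
            for r in rows]
```

Storing timestamps as ISO-8601 text keeps the schema simple and sortable; a later migration could add an embeddings column for the planned semantic search.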
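
## Appendix: MCP Request Sketch

The MCP Server API above uses plain JSON-RPC 2.0, so a client only needs to build the `tools/call` envelope shown in that section. A minimal, hypothetical helper (the `weather` tool name and arguments are illustrative):

```python
# Illustrative sketch of building a JSON-RPC 2.0 tools/call request.
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids

def mcp_call(name: str, arguments: dict) -> str:
    """Serialize a tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": next(_ids),
    })
```

The resulting string can be sent over whatever transport the MCP server exposes; pairing responses back to requests by `id` is what makes the counter necessary.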