# Implementation Guide - Milestone 2

## Overview

This guide provides step-by-step instructions for implementing Milestone 2 core infrastructure. All planning and evaluation work is complete - ready to build!
## Prerequisites

✅ Completed:

- Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
- ASR engine selected (faster-whisper)
- MCP architecture documented
- Hardware plan ready
## Implementation Order

### Phase 1: Core Infrastructure (Priority 1)

#### 1. LLM Servers (TICKET-021, TICKET-022)

**Why First:** Everything else depends on LLM infrastructure

##### TICKET-021: 4080 LLM Service

**Recommended Approach: Ollama**

1. Install Ollama

   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. Download Model

   ```bash
   ollama pull llama3.1:70b-q4_0  # Or use custom quantized model
   ```

3. Start Ollama Service

   ```bash
   ollama serve  # Runs on http://localhost:11434
   ```

4. Test Function Calling

   ```bash
   curl http://localhost:11434/api/chat -d '{
     "model": "llama3.1:70b-q4_0",
     "messages": [{"role": "user", "content": "Hello"}],
     "tools": [...]
   }'
   ```

5. Create Systemd Service (for auto-start)

   ```ini
   [Unit]
   Description=Ollama LLM Server (4080)
   After=network.target

   [Service]
   Type=simple
   User=atlas
   ExecStart=/usr/local/bin/ollama serve
   Restart=always

   [Install]
   WantedBy=multi-user.target
   ```
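The `"tools": [...]` placeholder in the function-calling test needs a concrete schema. A sketch of what a full payload might look like, built in Python so it can be validated before sending (the `get_weather` tool here is a hypothetical stub, not part of any shipped tool set):

```python
import json

# Hypothetical OpenAI-style tool schema, as accepted by Ollama's /api/chat
# "tools" parameter. Only the payload is constructed here; no request is sent.
payload = {
    "model": "llama3.1:70b-q4_0",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "stream": False,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialized form, usable as the curl -d body.
body = json.dumps(payload)
```

If the model supports function calling, the response should contain a `tool_calls` entry naming `get_weather` rather than a plain text answer.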
**Alternative: vLLM** (if you need batching/higher throughput)

- More complex setup
- Better for multiple concurrent requests
- See vLLM documentation
##### TICKET-022: 1050 LLM Service

**Recommended Approach: Ollama** (same as 4080)

1. Install Ollama (on the 1050 machine)

2. Download Model

   ```bash
   ollama pull phi3:mini-q4_0
   ```

3. Start Service

   ```bash
   # ollama serve has no --host flag; bind address is set via OLLAMA_HOST
   OLLAMA_HOST=0.0.0.0 ollama serve  # Runs on http://<1050-ip>:11434
   ```

4. Test

   ```bash
   curl http://<1050-ip>:11434/api/chat -d '{
     "model": "phi3:mini-q4_0",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
   ```
**Key Differences:**

- Different model (Phi-3 Mini vs Llama 3.1)
- Different port or IP binding
- Lower resource usage
#### 2. MCP Server (TICKET-029)

**Why Second:** Foundation for all tools

**Implementation Steps:**

1. Create Project Structure

   ```
   home-voice-agent/
   └── mcp-server/
       ├── __init__.py
       ├── server.py          # Main JSON-RPC server
       ├── tools/
       │   ├── __init__.py
       │   ├── weather.py
       │   └── echo.py
       └── requirements.txt
   ```

2. Install Dependencies

   ```bash
   pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
   ```

3. Implement JSON-RPC 2.0 Server

   - Use `jsonrpc-base` or implement manually
   - Handle `tools/list` and `tools/call` methods
   - Error handling with proper JSON-RPC error codes

4. Create Example Tools

   - **Echo Tool**: Simple echo for testing
   - **Weather Tool**: Stub implementation (real API later)

5. Test Server

   ```bash
   # Start server
   python mcp-server/server.py

   # Test tools/list
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'

   # Test tools/call
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "tools/call",
       "params": {"name": "echo", "arguments": {"text": "hello"}},
       "id": 2
     }'
   ```
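The `tools/list` and `tools/call` handling can be sketched as a minimal in-process dispatcher, independent of the transport (FastAPI, WebSocket, or stdin). Everything below is illustrative - the registry shape and function names are assumptions, not part of any MCP SDK:

```python
import json

# Minimal tool registry: name -> description + handler.
def echo(arguments):
    """Echo tool: returns the given text unchanged."""
    return arguments.get("text", "")

TOOLS = {"echo": {"description": "Echo back the given text", "handler": echo}}

def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request; returns the JSON-RPC response string."""
    req = json.loads(raw)
    rid = req.get("id")
    method = req.get("method")
    if method == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            # -32602: invalid params per the JSON-RPC 2.0 spec
            return json.dumps({"jsonrpc": "2.0", "id": rid,
                               "error": {"code": -32602, "message": "Unknown tool"}})
        result = {"content": tool["handler"](params.get("arguments", {}))}
    else:
        # -32601: method not found per the JSON-RPC 2.0 spec
        return json.dumps({"jsonrpc": "2.0", "id": rid,
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": rid, "result": result})
```

The curl tests above would exercise exactly these two methods once the dispatcher is mounted behind an HTTP endpoint.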
### Phase 2: Voice I/O Services (Priority 2)

#### 3. Wake-Word Node (TICKET-006)

**Prerequisites:** Hardware (microphone, always-on node)

**Implementation Steps:**

1. Install openWakeWord (or selected engine)

   ```bash
   pip install openwakeword
   ```

2. Create Wake-Word Service

   - Audio capture (PyAudio)
   - Wake-word detection loop
   - Event emission (WebSocket/MQTT/HTTP)

3. Test Detection

   - Train/configure "Hey Atlas" wake-word
   - Test false positive/negative rates
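The detection loop boils down to scoring audio frames and emitting an event when the score crosses a threshold, with a refractory period so one utterance doesn't trigger twice. A sketch with the scorer stubbed out (the threshold and frame counts are placeholder values to tune; a real node would feed per-frame scores from openWakeWord and emit over WebSocket/MQTT):

```python
def detect_wake_words(frame_scores, threshold=0.5, refractory_frames=20):
    """Return indices of frames that trigger a wake event.

    frame_scores: per-frame wake-word confidence scores (stand-in for the
    engine's output on a live PyAudio stream). After a trigger, suppress
    re-triggering for `refractory_frames` frames.
    """
    events = []
    cooldown = 0
    for i, score in enumerate(frame_scores):
        if cooldown > 0:
            cooldown -= 1
            continue
        if score >= threshold:
            events.append(i)  # a real node would emit the wake event here
            cooldown = refractory_frames
    return events
```

Measuring false positives/negatives then reduces to comparing these trigger indices against labeled audio.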
#### 4. ASR Service (TICKET-010)

**Prerequisites:** faster-whisper selected

**Implementation Steps:**

1. Install faster-whisper

   ```bash
   pip install faster-whisper
   ```

2. Download Model

   ```python
   from faster_whisper import WhisperModel

   model = WhisperModel("small", device="cuda", compute_type="float16")
   ```

3. Create WebSocket Service

   - Audio streaming endpoint
   - Real-time transcription
   - Text segment output

4. Integrate with Wake-Word

   - Start ASR on wake-word event
   - Stop on silence or user command
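"Stop on silence" usually means tracking frame energy and ending the utterance after N consecutive quiet frames. A minimal RMS-based endpointer sketch (the threshold and frame count are placeholder values; real deployments often use a proper VAD instead):

```python
import math

def rms(samples):
    """Root-mean-square energy of one frame of int16 PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def utterance_ended(frames, silence_threshold=500.0, silence_frames=3):
    """True once `silence_frames` consecutive frames fall below the threshold."""
    quiet = 0
    for frame in frames:
        if rms(frame) < silence_threshold:
            quiet += 1
            if quiet >= silence_frames:
                return True  # a real service would stop streaming to ASR here
        else:
            quiet = 0
    return False
```

The ASR service would run this check as frames arrive and close the stream (or finalize the transcript) when it returns true.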
#### 5. TTS Service (TICKET-014)

**Prerequisites:** TTS evaluation complete

**Implementation Steps:**

1. Install Piper (or selected TTS)

   ```bash
   wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
   tar -xzf piper_amd64.tar.gz
   ```

2. Download Voice Model

   ```bash
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
   ```

3. Create HTTP Service

   - Text input → audio output
   - Streaming support
   - Voice selection
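For streaming support, the service can split incoming text into sentence-sized chunks and synthesize each as it arrives instead of waiting for the full LLM reply. A small splitter sketch (the punctuation set is an assumption; adjust for abbreviations as needed):

```python
import re

def split_for_tts(text):
    """Split text into sentence-sized chunks for incremental synthesis.

    Splits on ., !, or ? followed by whitespace, keeping the punctuation
    with its chunk so prosody cues survive.
    """
    chunks = [c.strip() for c in re.split(r"(?<=[.!?])\s+", text.strip())]
    return [c for c in chunks if c]
```

Each chunk is then fed to the synthesizer in order, so the first audio plays while later sentences are still being generated.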
## Quick Start Checklist

### Week 1: Core Infrastructure

- [ ] Set up 4080 LLM server (TICKET-021)
- [ ] Set up 1050 LLM server (TICKET-022)
- [ ] Test both servers independently
- [ ] Implement minimal MCP server (TICKET-029)
- [ ] Test MCP server with echo tool

### Week 2: Voice Services

- [ ] Prototype wake-word node (TICKET-006) - if hardware ready
- [ ] Implement ASR service (TICKET-010)
- [ ] Implement TTS service (TICKET-014)
- [ ] Test voice pipeline end-to-end

### Week 3: Integration

- [ ] Implement MCP-LLM adapter (TICKET-030)
- [ ] Add core tools (weather, time, tasks)
- [ ] Create routing layer (TICKET-023)
- [ ] Test full system
## Common Issues & Solutions

### LLM Server Issues

**Problem:** Model doesn't fit in VRAM
- **Solution:** Use Q4 quantization, reduce context window

**Problem:** Slow inference
- **Solution:** Check GPU utilization, use GPU-accelerated inference

**Problem:** Function calling not working
- **Solution:** Verify the model supports function calling, check prompt format

### MCP Server Issues

**Problem:** JSON-RPC errors
- **Solution:** Validate request format, check error codes

**Problem:** Tools not discovered
- **Solution:** Verify tool registration, check the `tools/list` response

### Voice Services Issues

**Problem:** High latency
- **Solution:** Use GPU for ASR, optimize model size

**Problem:** Poor accuracy
- **Solution:** Use a larger model, improve audio quality
## Testing Strategy

### Unit Tests

- Test each service independently
- Mock dependencies where needed

### Integration Tests

- Test LLM → MCP → Tool flow
- Test Wake-word → ASR → LLM → TTS flow

### End-to-End Tests

- Full voice interaction
- Tool calling scenarios
- Error handling
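The integration flow can be exercised before any real service exists by wiring stub stages together and asserting on the result. A sketch with stand-in functions for each stage (all names and return values here are illustrative; each stub would become a network call to the corresponding service):

```python
# Stub stages standing in for the real services: ASR WebSocket,
# LLM HTTP API, and TTS HTTP API respectively.
def fake_asr(audio):
    return "what time is it"

def fake_llm(text):
    return "It is 3 PM." if "time" in text else "Sorry, I didn't catch that."

def fake_tts(text):
    return b"RIFF" + text.encode()  # pretend WAV bytes

def run_pipeline(audio):
    """Wake-word event assumed already fired; run ASR -> LLM -> TTS."""
    transcript = fake_asr(audio)
    reply = fake_llm(transcript)
    return fake_tts(reply)
```

Swapping one stub at a time for the real service gives an incremental path from unit tests to full end-to-end tests.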
## Next Steps After Milestone 2

Once core infrastructure is working:

- Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
- Implement phone client (TICKET-039)
- Add system prompts (TICKET-025)
- Implement conversation handling (TICKET-027)
## References

- Ollama Docs: https://ollama.com/docs
- vLLM Docs: https://docs.vllm.ai
- faster-whisper: https://github.com/guillaumekln/faster-whisper
- MCP Spec: https://modelcontextprotocol.io/specification
- Model Selection: `docs/MODEL_SELECTION.md`
- ASR Evaluation: `docs/ASR_EVALUATION.md`
- MCP Architecture: `docs/MCP_ARCHITECTURE.md`

---

**Last Updated:** 2024-01-XX
**Status:** Ready for Implementation