Implementation Guide - Milestone 2

Overview

This guide provides step-by-step instructions for implementing the Milestone 2 core infrastructure. All planning and evaluation work is complete; the system is ready to build.

Prerequisites

Completed:

  • Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
  • ASR engine selected (faster-whisper)
  • MCP architecture documented
  • Hardware plan ready

Implementation Order

Phase 1: Core Infrastructure (Priority 1)

1. LLM Servers (TICKET-021, TICKET-022)

Why First: Everything else depends on LLM infrastructure

TICKET-021: 4080 LLM Service

Recommended Approach: Ollama

  1. Install Ollama

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Download Model

    ollama pull llama3.1:70b-q4_0
    # Or use custom quantized model
    
  3. Start Ollama Service

    ollama serve
    # Runs on http://localhost:11434
    
  4. Test Function Calling

    curl http://localhost:11434/api/chat -d '{
      "model": "llama3.1:70b-q4_0",
      "messages": [{"role": "user", "content": "Hello"}],
      "tools": [...]
    }'
    
  5. Create Systemd Service (for auto-start)

    [Unit]
    Description=Ollama LLM Server (4080)
    After=network.target
    
    [Service]
    Type=simple
    User=atlas
    ExecStart=/usr/local/bin/ollama serve
    Restart=always
    
    [Install]
    WantedBy=multi-user.target
    

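The function-calling test in step 4 can also be scripted. The sketch below builds a request body for Ollama's `/api/chat` endpoint; the `tools` array follows Ollama's function-calling schema, and the `get_weather` tool is a hypothetical example (the guide's real tools come later, in the MCP server). The model tag matches the one pulled in step 2.

```python
import json

def build_chat_request(model: str, user_text: str) -> dict:
    """Build an Ollama /api/chat request body with one example tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Get current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string", "description": "City name"}
                        },
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_chat_request("llama3.1:70b-q4_0", "What's the weather in Boston?")
print(json.dumps(payload, indent=2))

# To send it against a running server:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
#   # Any tool call appears under reply["message"]["tool_calls"].
```

A model that decides to call the tool returns a `tool_calls` entry instead of plain text; the caller then executes the tool and feeds the result back as a `"role": "tool"` message.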
Alternative: vLLM (if you need batching/higher throughput)

  • More complex setup
  • Better for multiple concurrent requests
  • See vLLM documentation

TICKET-022: 1050 LLM Service

Recommended Approach: Ollama (same as 4080)

  1. Install Ollama (on 1050 machine)

  2. Download Model

    ollama pull phi3:mini-q4_0
    
  3. Start Service

    OLLAMA_HOST=0.0.0.0 ollama serve
    # Ollama binds via the OLLAMA_HOST env var (there is no --host flag)
    # Runs on http://<1050-ip>:11434
    
  4. Test

    curl http://<1050-ip>:11434/api/chat -d '{
      "model": "phi3:mini-q4_0",
      "messages": [{"role": "user", "content": "Hello"}]
    }'
    

Key Differences:

  • Different model (Phi-3 Mini vs Llama 3.1)
  • Different port or IP binding
  • Lower resource usage

2. MCP Server (TICKET-029)

Why Second: Foundation for all tools

Implementation Steps:

  1. Create Project Structure

    home-voice-agent/
    └── mcp-server/
        ├── __init__.py
        ├── server.py          # Main JSON-RPC server
        ├── tools/
        │   ├── __init__.py
        │   ├── weather.py
        │   └── echo.py
        └── requirements.txt
    
  2. Install Dependencies

    pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
    
  3. Implement JSON-RPC 2.0 Server

    • Use jsonrpc-base or implement manually
    • Handle tools/list and tools/call methods
    • Error handling with proper JSON-RPC error codes
  4. Create Example Tools

    • Echo Tool: Simple echo for testing
    • Weather Tool: Stub implementation (real API later)
  5. Test Server

    # Start server
    python mcp-server/server.py
    
    # Test tools/list
    curl -X POST http://localhost:8000/mcp \
      -H "Content-Type: application/json" \
      -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'
    
    # Test tools/call
    curl -X POST http://localhost:8000/mcp \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": "echo", "arguments": {"text": "hello"}},
        "id": 2
      }'
    
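The dispatch logic from steps 3-4 can be sketched with the standard library alone; this is a minimal, hand-rolled take on the "implement manually" option, not the `jsonrpc-base` API. The `echo` tool matches the guide's example; in the FastAPI setup from step 2 this `dispatch` function would be called from a `POST /mcp` route.

```python
import json

# Tool registry: name -> (description, callable taking an arguments dict)
TOOLS = {
    "echo": ("Echo the input text back", lambda args: {"text": args["text"]}),
}

def dispatch(request: dict) -> dict:
    """Handle a JSON-RPC 2.0 request for tools/list or tools/call."""
    req_id = request.get("id")
    method = request.get("method")
    try:
        if method == "tools/list":
            result = {
                "tools": [
                    {"name": name, "description": desc}
                    for name, (desc, _fn) in TOOLS.items()
                ]
            }
        elif method == "tools/call":
            params = request.get("params", {})
            name = params.get("name")
            if name not in TOOLS:
                return {"jsonrpc": "2.0", "id": req_id,
                        "error": {"code": -32602,
                                  "message": f"Unknown tool: {name}"}}
            _desc, fn = TOOLS[name]
            result = fn(params.get("arguments", {}))
        else:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32601,
                              "message": f"Method not found: {method}"}}
        return {"jsonrpc": "2.0", "id": req_id, "result": result}
    except Exception as exc:  # JSON-RPC internal error
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32603, "message": str(exc)}}

resp = dispatch({"jsonrpc": "2.0", "method": "tools/call",
                 "params": {"name": "echo", "arguments": {"text": "hello"}},
                 "id": 2})
print(json.dumps(resp))
```

The error codes (-32601 method not found, -32602 invalid params, -32603 internal error) are the standard JSON-RPC 2.0 codes, which keeps clients interoperable.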

Phase 2: Voice I/O Services (Priority 2)

3. Wake-Word Node (TICKET-006)

Prerequisites: Hardware (microphone, always-on node)

Implementation Steps:

  1. Install openWakeWord (or selected engine)

    pip install openwakeword
    
  2. Create Wake-Word Service

    • Audio capture (PyAudio)
    • Wake-word detection loop
    • Event emission (WebSocket/MQTT/HTTP)
  3. Test Detection

    • Train/configure "Hey Atlas" wake-word
    • Test false positive/negative rates
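One way to tune the false positive rate in step 3 is to require several consecutive high-scoring frames before firing. The gate below is generic; the per-frame scores would come from the detection engine (e.g. openWakeWord's per-model prediction scores — the exact API is an assumption to verify against its docs).

```python
class WakeWordGate:
    """Decide activation from a stream of per-frame detection scores.

    Requires `patience` consecutive frames at or above `threshold` before
    firing, which suppresses false positives from single noisy frames.
    """

    def __init__(self, threshold: float = 0.5, patience: int = 3):
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, score: float) -> bool:
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self._streak = 0  # re-arm after firing
            return True
        return False

gate = WakeWordGate(threshold=0.5, patience=3)
scores = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.1]  # one score per audio frame
fired = [gate.update(s) for s in scores]
print(fired)  # fires only on the third consecutive high frame
```

Raising `patience` lowers false positives at the cost of slightly later activation; both knobs should be set from the measured rates in step 3.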

4. ASR Service (TICKET-010)

Prerequisites: faster-whisper selected

Implementation Steps:

  1. Install faster-whisper

    pip install faster-whisper
    
  2. Download Model

    from faster_whisper import WhisperModel
    model = WhisperModel("small", device="cuda", compute_type="float16")
    
  3. Create WebSocket Service

    • Audio streaming endpoint
    • Real-time transcription
    • Text segment output
  4. Integrate with Wake-Word

    • Start ASR on wake-word event
    • Stop on silence or user command
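faster-whisper's `transcribe` yields timestamped segments, and the service in step 3 needs to assemble them into text. The helper below is generic over any objects with a `.text` attribute; `Seg` is a stand-in for offline demonstration, not a faster-whisper type.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Seg:
    """Stand-in for a transcription segment (start/end in seconds)."""
    start: float
    end: float
    text: str

def join_segments(segments: Iterable) -> str:
    """Concatenate transcription segments into one trimmed string."""
    return " ".join(s.text.strip() for s in segments).strip()

# With a real model (requires `pip install faster-whisper`; device="cuda"
# needs a GPU):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("small", device="cuda", compute_type="float16")
#   segments, info = model.transcribe("utterance.wav")
#   print(join_segments(segments))

# Offline demonstration with stand-in segments:
demo = [Seg(0.0, 1.2, " hey atlas"), Seg(1.2, 2.5, " what's the weather ")]
print(join_segments(demo))  # → hey atlas what's the weather
```

For the streaming WebSocket service, each segment can instead be pushed to the client as it is produced, with the joined string kept for the final transcript.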

5. TTS Service (TICKET-014)

Prerequisites: TTS evaluation complete

Implementation Steps:

  1. Install Piper (or selected TTS)

    # Install Piper
    wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
    tar -xzf piper_amd64.tar.gz
    
  2. Download Voice Model

    # Download voice model
    wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
    
  3. Create HTTP Service

    • Text input → audio output
    • Streaming support
    • Voice selection
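Piper reads text on stdin and writes a WAV via `--output_file`, so the HTTP service in step 3 mostly wraps a shell pipeline. The sketch below only builds that command string; the `./piper/piper` binary path assumes the tarball from step 1 was extracted in the working directory.

```python
import shlex

def piper_command(text: str, model_path: str, out_path: str) -> str:
    """Build the shell command that synthesizes `text` to a WAV with Piper.

    Piper reads text on stdin and writes audio to --output_file.
    """
    return (
        f"echo {shlex.quote(text)} | "
        f"./piper/piper --model {shlex.quote(model_path)} "
        f"--output_file {shlex.quote(out_path)}"
    )

cmd = piper_command("Hello from Atlas", "en_US-lessac-medium.onnx", "reply.wav")
print(cmd)
# The service would run this with subprocess.run(cmd, shell=True, check=True)
# and stream reply.wav back to the caller.
```

Quoting through `shlex.quote` matters here: synthesized text comes from the LLM, so it must never be interpolated into a shell command unescaped.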

Quick Start Checklist

Week 1: Core Infrastructure

  • Set up 4080 LLM server (TICKET-021)
  • Set up 1050 LLM server (TICKET-022)
  • Test both servers independently
  • Implement minimal MCP server (TICKET-029)
  • Test MCP server with echo tool

Week 2: Voice Services

  • Prototype wake-word node (TICKET-006), if hardware is ready
  • Implement ASR service (TICKET-010)
  • Implement TTS service (TICKET-014)
  • Test voice pipeline end-to-end

Week 3: Integration

  • Implement MCP-LLM adapter (TICKET-030)
  • Add core tools (weather, time, tasks)
  • Create routing layer (TICKET-023)
  • Test full system

Common Issues & Solutions

LLM Server Issues

Problem: Model doesn't fit in VRAM

  • Solution: Use Q4 quantization, reduce context window

Problem: Slow inference

  • Solution: Check GPU utilization, use GPU-accelerated inference

Problem: Function calling not working

  • Solution: Verify model supports function calling, check prompt format

MCP Server Issues

Problem: JSON-RPC errors

  • Solution: Validate request format, check error codes

Problem: Tools not discovered

  • Solution: Verify tool registration, check tools/list response

Voice Services Issues

Problem: High latency

  • Solution: Use GPU for ASR, optimize model size

Problem: Poor accuracy

  • Solution: Use larger model, improve audio quality

Testing Strategy

Unit Tests

  • Test each service independently
  • Mock dependencies where needed
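A unit test for the MCP echo tool might look like the sketch below. `echo_tool` here is a stand-in for whatever the real `mcp-server/tools/echo.py` exports (its name and signature are assumptions); the point is testing each tool's callable directly, without the JSON-RPC server running.

```python
import unittest

def echo_tool(arguments: dict) -> dict:
    """Stand-in for the MCP echo tool (the real one lives in
    mcp-server/tools/echo.py)."""
    return {"text": arguments["text"]}

class TestEchoTool(unittest.TestCase):
    def test_roundtrip(self):
        self.assertEqual(echo_tool({"text": "hello"}), {"text": "hello"})

    def test_missing_argument(self):
        with self.assertRaises(KeyError):
            echo_tool({})

# Run with: python -m unittest discover
```

The same pattern extends to the weather stub: assert on the stubbed payload now, and swap in a mocked HTTP client once the real API is wired up.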

Integration Tests

  • Test LLM → MCP → Tool flow
  • Test Wake-word → ASR → LLM → TTS flow

End-to-End Tests

  • Full voice interaction
  • Tool calling scenarios
  • Error handling

Next Steps After Milestone 2

Once core infrastructure is working:

  1. Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
  2. Implement phone client (TICKET-039)
  3. Add system prompts (TICKET-025)
  4. Implement conversation handling (TICKET-027)

Last Updated: 2024-01-XX
Status: Ready for Implementation