# Implementation Guide - Milestone 2

## Overview

This guide provides step-by-step instructions for implementing the Milestone 2 core infrastructure. All planning and evaluation work is complete - ready to build!

## Prerequisites

✅ **Completed:**

- Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
- ASR engine selected (faster-whisper)
- MCP architecture documented
- Hardware plan ready

## Implementation Order

### Phase 1: Core Infrastructure (Priority 1)

#### 1. LLM Servers (TICKET-021, TICKET-022)

**Why First:** Everything else depends on LLM infrastructure

**TICKET-021: 4080 LLM Service**

**Recommended Approach: Ollama**

1. **Install Ollama**

   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. **Download Model**

   ```bash
   ollama pull llama3.1:70b-q4_0
   # Or use custom quantized model
   ```

3. **Start Ollama Service**

   ```bash
   ollama serve
   # Runs on http://localhost:11434
   ```

4. **Test Function Calling**

   ```bash
   curl http://localhost:11434/api/chat -d '{
     "model": "llama3.1:70b-q4_0",
     "messages": [{"role": "user", "content": "Hello"}],
     "tools": [...]
   }'
   ```

5. **Create Systemd Service** (for auto-start)

   ```ini
   [Unit]
   Description=Ollama LLM Server (4080)
   After=network.target

   [Service]
   Type=simple
   User=atlas
   ExecStart=/usr/local/bin/ollama serve
   Restart=always

   [Install]
   WantedBy=multi-user.target
   ```

**Alternative: vLLM** (if you need batching/higher throughput)

- More complex setup
- Better for multiple concurrent requests
- See vLLM documentation

**TICKET-022: 1050 LLM Service**

**Recommended Approach: Ollama (same as 4080)**

1. **Install Ollama** (on the 1050 machine)

2. **Download Model**

   ```bash
   ollama pull phi3:mini-q4_0
   ```

3. **Start Service**

   ```bash
   # Ollama binds via the OLLAMA_HOST environment variable (no --host flag)
   OLLAMA_HOST=0.0.0.0 ollama serve
   # Runs on http://<1050-ip>:11434
   ```
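Before running the test step, it can help to wait until the Ollama API is actually accepting requests, since the service takes a moment to come up. A minimal readiness-poll sketch in Python; the URL is the default Ollama port used above, and `wait_for_ollama` is an illustrative helper, not part of Ollama:

```python
import time
import urllib.request

def wait_for_ollama(base_url: str, timeout: float = 60.0) -> bool:
    """Poll the Ollama root endpoint until it responds, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:  # root endpoint answers once serving
                    return True
        except OSError:
            time.sleep(1)  # server not up yet; retry
    return False

# Usage: wait_for_ollama("http://localhost:11434"), or
# wait_for_ollama("http://<1050-ip>:11434") for the 1050 machine
```

Note that even after the HTTP endpoint is up, the first chat request still pays model-load time, so an initial slow response is expected.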
4. **Test**

   ```bash
   curl http://<1050-ip>:11434/api/chat -d '{
     "model": "phi3:mini-q4_0",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
   ```

**Key Differences:**

- Different model (Phi-3 Mini vs Llama 3.1)
- Different port or IP binding
- Lower resource usage

#### 2. MCP Server (TICKET-029)

**Why Second:** Foundation for all tools

**Implementation Steps:**

1. **Create Project Structure**

   ```
   home-voice-agent/
   └── mcp-server/
       ├── __init__.py
       ├── server.py          # Main JSON-RPC server
       ├── tools/
       │   ├── __init__.py
       │   ├── weather.py
       │   └── echo.py
       └── requirements.txt
   ```

2. **Install Dependencies**

   ```bash
   pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
   ```

3. **Implement JSON-RPC 2.0 Server**

   - Use `jsonrpc-base` or implement manually
   - Handle `tools/list` and `tools/call` methods
   - Error handling with proper JSON-RPC error codes

4. **Create Example Tools**

   - **Echo Tool**: Simple echo for testing
   - **Weather Tool**: Stub implementation (real API later)

5. **Test Server**

   ```bash
   # Start server
   python mcp-server/server.py

   # Test tools/list
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'

   # Test tools/call
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "tools/call",
       "params": {"name": "echo", "arguments": {"text": "hello"}},
       "id": 2
     }'
   ```

### Phase 2: Voice I/O Services (Priority 2)

#### 3. Wake-Word Node (TICKET-006)

**Prerequisites:** Hardware (microphone, always-on node)

**Implementation Steps:**

1. **Install openWakeWord** (or selected engine)

   ```bash
   pip install openwakeword
   ```

2. **Create Wake-Word Service**

   - Audio capture (PyAudio)
   - Wake-word detection loop
   - Event emission (WebSocket/MQTT/HTTP)

3. **Test Detection**

   - Train/configure the "Hey Atlas" wake-word
   - Test false positive/negative rates

#### 4. ASR Service (TICKET-010)

**Prerequisites:** faster-whisper selected

**Implementation Steps:**
1. **Install faster-whisper**

   ```bash
   pip install faster-whisper
   ```

2. **Download Model**

   ```python
   from faster_whisper import WhisperModel

   model = WhisperModel("small", device="cuda", compute_type="float16")
   ```

3. **Create WebSocket Service**

   - Audio streaming endpoint
   - Real-time transcription
   - Text segment output

4. **Integrate with Wake-Word**

   - Start ASR on wake-word event
   - Stop on silence or user command

#### 5. TTS Service (TICKET-014)

**Prerequisites:** TTS evaluation complete

**Implementation Steps:**

1. **Install Piper** (or selected TTS)

   ```bash
   # Install Piper
   wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
   tar -xzf piper_amd64.tar.gz
   ```

2. **Download Voice Model**

   ```bash
   # Download voice model (Piper also needs the matching .onnx.json config)
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
   ```

3. **Create HTTP Service**

   - Text input → audio output
   - Streaming support
   - Voice selection

## Quick Start Checklist

### Week 1: Core Infrastructure

- [ ] Set up 4080 LLM server (TICKET-021)
- [ ] Set up 1050 LLM server (TICKET-022)
- [ ] Test both servers independently
- [ ] Implement minimal MCP server (TICKET-029)
- [ ] Test MCP server with echo tool

### Week 2: Voice Services

- [ ] Prototype wake-word node (TICKET-006) - if hardware ready
- [ ] Implement ASR service (TICKET-010)
- [ ] Implement TTS service (TICKET-014)
- [ ] Test voice pipeline end-to-end

### Week 3: Integration

- [ ] Implement MCP-LLM adapter (TICKET-030)
- [ ] Add core tools (weather, time, tasks)
- [ ] Create routing layer (TICKET-023)
- [ ] Test full system

## Common Issues & Solutions

### LLM Server Issues

**Problem:** Model doesn't fit in VRAM
- **Solution:** Use Q4 quantization, reduce the context window

**Problem:** Slow inference
- **Solution:** Check GPU utilization, use GPU-accelerated inference

**Problem:** Function calling not working
- **Solution:** Verify the model supports function calling, check the prompt format

### MCP Server Issues
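When debugging these problems, it helps to know what a well-formed JSON-RPC 2.0 error response looks like. A minimal sketch using the standard error codes from the spec; `make_error` is an illustrative helper, not part of the server described above:

```python
import json

# Standard JSON-RPC 2.0 error codes (defined by the spec)
PARSE_ERROR = -32700       # invalid JSON received
INVALID_REQUEST = -32600   # not a valid Request object
METHOD_NOT_FOUND = -32601  # e.g. unknown "tools/..." method
INVALID_PARAMS = -32602    # e.g. bad tool arguments
INTERNAL_ERROR = -32603    # server-side failure

def make_error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response for the given request id."""
    return json.dumps({
        "jsonrpc": "2.0",
        "error": {"code": code, "message": message},
        "id": req_id,
    })

print(make_error(1, METHOD_NOT_FOUND, "Method not found: tools/foo"))
```

Checking your server's replies against these constants makes the `curl` tests much easier to interpret: a malformed body should come back as -32700/-32600, an unknown method as -32601, and bad tool arguments as -32602.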
**Problem:** JSON-RPC errors
- **Solution:** Validate the request format, check error codes

**Problem:** Tools not discovered
- **Solution:** Verify tool registration, check the `tools/list` response

### Voice Services Issues

**Problem:** High latency
- **Solution:** Use GPU for ASR, optimize model size

**Problem:** Poor accuracy
- **Solution:** Use a larger model, improve audio quality

## Testing Strategy

### Unit Tests

- Test each service independently
- Mock dependencies where needed

### Integration Tests

- Test the LLM → MCP → Tool flow
- Test the Wake-word → ASR → LLM → TTS flow

### End-to-End Tests

- Full voice interaction
- Tool calling scenarios
- Error handling

## Next Steps After Milestone 2

Once core infrastructure is working:

1. Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
2. Implement phone client (TICKET-039)
3. Add system prompts (TICKET-025)
4. Implement conversation handling (TICKET-027)

## References

- **Ollama Docs**: https://ollama.com/docs
- **vLLM Docs**: https://docs.vllm.ai
- **faster-whisper**: https://github.com/guillaumekln/faster-whisper
- **MCP Spec**: https://modelcontextprotocol.io/specification
- **Model Selection**: `docs/MODEL_SELECTION.md`
- **ASR Evaluation**: `docs/ASR_EVALUATION.md`
- **MCP Architecture**: `docs/MCP_ARCHITECTURE.md`

---

**Last Updated**: 2024-01-XX
**Status**: Ready for Implementation