# Implementation Guide - Milestone 2

## Overview

This guide provides step-by-step instructions for implementing the Milestone 2 core infrastructure. All planning and evaluation work is complete - ready to build!

## Prerequisites

✅ **Completed:**

- Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
- ASR engine selected (faster-whisper)
- MCP architecture documented
- Hardware plan ready

## Implementation Order

### Phase 1: Core Infrastructure (Priority 1)

#### 1. LLM Servers (TICKET-021, TICKET-022)

**Why First:** Everything else depends on LLM infrastructure

**TICKET-021: 4080 LLM Service**

**Recommended Approach: Ollama**

1. **Install Ollama**

   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. **Download Model**

   ```bash
   ollama pull llama3.1:70b-q4_0
   # Or use custom quantized model
   ```

3. **Start Ollama Service**

   ```bash
   ollama serve
   # Runs on http://localhost:11434
   ```

4. **Test Function Calling**

   ```bash
   curl http://localhost:11434/api/chat -d '{
     "model": "llama3.1:70b-q4_0",
     "messages": [{"role": "user", "content": "Hello"}],
     "tools": [...]
   }'
   ```

5. **Create Systemd Service** (for auto-start)

   ```ini
   [Unit]
   Description=Ollama LLM Server (4080)
   After=network.target

   [Service]
   Type=simple
   User=atlas
   ExecStart=/usr/local/bin/ollama serve
   Restart=always

   [Install]
   WantedBy=multi-user.target
   ```

**Alternative: vLLM** (if you need batching/higher throughput)

- More complex setup
- Better for multiple concurrent requests
- See vLLM documentation

**TICKET-022: 1050 LLM Service**

**Recommended Approach: Ollama (same as 4080)**

1. **Install Ollama** (on the 1050 machine)

2. **Download Model**

   ```bash
   ollama pull phi3:mini-q4_0
   ```

3. **Start Service**

   ```bash
   # Ollama binds via the OLLAMA_HOST environment variable (no --host flag)
   OLLAMA_HOST=0.0.0.0 ollama serve
   # Runs on http://<1050-ip>:11434
   ```
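Before running the test step, it can help to wait until the Ollama API is actually accepting requests, since the service takes a moment to come up. A minimal readiness-poll sketch in Python; the URL is the default Ollama port used above, and `wait_for_ollama` is an illustrative helper, not part of Ollama:

```python
import time
import urllib.request

def wait_for_ollama(base_url: str, timeout: float = 60.0) -> bool:
    """Poll the Ollama root endpoint until it responds, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=2) as resp:
                if resp.status == 200:  # root endpoint answers once serving
                    return True
        except OSError:
            time.sleep(1)  # server not up yet; retry
    return False

# Usage: wait_for_ollama("http://localhost:11434"), or
# wait_for_ollama("http://<1050-ip>:11434") for the 1050 machine
```

Note that even after the HTTP endpoint is up, the first chat request still pays model-load time, so an initial slow response is expected.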
4. **Test**

   ```bash
   curl http://<1050-ip>:11434/api/chat -d '{
     "model": "phi3:mini-q4_0",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
   ```

**Key Differences:**

- Different model (Phi-3 Mini vs Llama 3.1)
- Different port or IP binding
- Lower resource usage

#### 2. MCP Server (TICKET-029)

**Why Second:** Foundation for all tools

**Implementation Steps:**

1. **Create Project Structure**

   ```
   home-voice-agent/
   └── mcp-server/
       ├── __init__.py
       ├── server.py          # Main JSON-RPC server
       ├── tools/
       │   ├── __init__.py
       │   ├── weather.py
       │   └── echo.py
       └── requirements.txt
   ```

2. **Install Dependencies**

   ```bash
   pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
   ```

3. **Implement JSON-RPC 2.0 Server**

   - Use `jsonrpc-base` or implement manually
   - Handle `tools/list` and `tools/call` methods
   - Error handling with proper JSON-RPC error codes

4. **Create Example Tools**

   - **Echo Tool**: Simple echo for testing
   - **Weather Tool**: Stub implementation (real API later)

5. **Test Server**

   ```bash
   # Start server
   python mcp-server/server.py

   # Test tools/list
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'

   # Test tools/call
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "tools/call",
       "params": {"name": "echo", "arguments": {"text": "hello"}},
       "id": 2
     }'
   ```

### Phase 2: Voice I/O Services (Priority 2)

#### 3. Wake-Word Node (TICKET-006)

**Prerequisites:** Hardware (microphone, always-on node)

**Implementation Steps:**

1. **Install openWakeWord** (or selected engine)

   ```bash
   pip install openwakeword
   ```

2. **Create Wake-Word Service**

   - Audio capture (PyAudio)
   - Wake-word detection loop
   - Event emission (WebSocket/MQTT/HTTP)

3. **Test Detection**

   - Train/configure the "Hey Atlas" wake-word
   - Test false positive/negative rates

#### 4. ASR Service (TICKET-010)

**Prerequisites:** faster-whisper selected

**Implementation Steps:**
1. **Install faster-whisper**

   ```bash
   pip install faster-whisper
   ```

2. **Download Model**

   ```python
   from faster_whisper import WhisperModel

   model = WhisperModel("small", device="cuda", compute_type="float16")
   ```

3. **Create WebSocket Service**

   - Audio streaming endpoint
   - Real-time transcription
   - Text segment output

4. **Integrate with Wake-Word**

   - Start ASR on wake-word event
   - Stop on silence or user command

#### 5. TTS Service (TICKET-014)

**Prerequisites:** TTS evaluation complete

**Implementation Steps:**

1. **Install Piper** (or selected TTS)

   ```bash
   # Install Piper
   wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
   tar -xzf piper_amd64.tar.gz
   ```

2. **Download Voice Model**

   ```bash
   # Download voice model (Piper also needs the matching .onnx.json config)
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
   ```

3. **Create HTTP Service**

   - Text input → audio output
   - Streaming support
   - Voice selection

## Quick Start Checklist

### Week 1: Core Infrastructure

- [ ] Set up 4080 LLM server (TICKET-021)
- [ ] Set up 1050 LLM server (TICKET-022)
- [ ] Test both servers independently
- [ ] Implement minimal MCP server (TICKET-029)
- [ ] Test MCP server with echo tool

### Week 2: Voice Services

- [ ] Prototype wake-word node (TICKET-006) - if hardware ready
- [ ] Implement ASR service (TICKET-010)
- [ ] Implement TTS service (TICKET-014)
- [ ] Test voice pipeline end-to-end

### Week 3: Integration

- [ ] Implement MCP-LLM adapter (TICKET-030)
- [ ] Add core tools (weather, time, tasks)
- [ ] Create routing layer (TICKET-023)
- [ ] Test full system

## Common Issues & Solutions

### LLM Server Issues

**Problem:** Model doesn't fit in VRAM
- **Solution:** Use Q4 quantization, reduce the context window

**Problem:** Slow inference
- **Solution:** Check GPU utilization, use GPU-accelerated inference

**Problem:** Function calling not working
- **Solution:** Verify the model supports function calling, check the prompt format

### MCP Server Issues
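When debugging these problems, it helps to know what a well-formed JSON-RPC 2.0 error response looks like. A minimal sketch using the standard error codes from the spec; `make_error` is an illustrative helper, not part of the server described above:

```python
import json

# Standard JSON-RPC 2.0 error codes (defined by the spec)
PARSE_ERROR = -32700       # invalid JSON received
INVALID_REQUEST = -32600   # not a valid Request object
METHOD_NOT_FOUND = -32601  # e.g. unknown "tools/..." method
INVALID_PARAMS = -32602    # e.g. bad tool arguments
INTERNAL_ERROR = -32603    # server-side failure

def make_error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response for the given request id."""
    return json.dumps({
        "jsonrpc": "2.0",
        "error": {"code": code, "message": message},
        "id": req_id,
    })

print(make_error(1, METHOD_NOT_FOUND, "Method not found: tools/foo"))
```

Checking your server's replies against these constants makes the `curl` tests much easier to interpret: a malformed body should come back as -32700/-32600, an unknown method as -32601, and bad tool arguments as -32602.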
**Problem:** JSON-RPC errors
- **Solution:** Validate the request format, check error codes

**Problem:** Tools not discovered
- **Solution:** Verify tool registration, check the `tools/list` response

### Voice Services Issues

**Problem:** High latency
- **Solution:** Use GPU for ASR, optimize model size

**Problem:** Poor accuracy
- **Solution:** Use a larger model, improve audio quality

## Testing Strategy

### Unit Tests

- Test each service independently
- Mock dependencies where needed

### Integration Tests

- Test the LLM → MCP → Tool flow
- Test the Wake-word → ASR → LLM → TTS flow

### End-to-End Tests

- Full voice interaction
- Tool calling scenarios
- Error handling

## Next Steps After Milestone 2

Once core infrastructure is working:

1. Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
2. Implement phone client (TICKET-039)
3. Add system prompts (TICKET-025)
4. Implement conversation handling (TICKET-027)

## References

- **Ollama Docs**: https://ollama.com/docs
- **vLLM Docs**: https://docs.vllm.ai
- **faster-whisper**: https://github.com/guillaumekln/faster-whisper
- **MCP Spec**: https://modelcontextprotocol.io/specification
- **Model Selection**: `docs/MODEL_SELECTION.md`
- **ASR Evaluation**: `docs/ASR_EVALUATION.md`
- **MCP Architecture**: `docs/MCP_ARCHITECTURE.md`

---

**Last Updated**: 2024-01-XX
**Status**: Ready for Implementation