atlas/docs/IMPLEMENTATION_GUIDE.md
ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00


# Implementation Guide - Milestone 2
## Overview
This guide provides step-by-step instructions for implementing the Milestone 2 core infrastructure. All planning and evaluation work is complete; the system is ready to build.
## Prerequisites
**Completed:**
- Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
- ASR engine selected (faster-whisper)
- MCP architecture documented
- Hardware plan ready
## Implementation Order
### Phase 1: Core Infrastructure (Priority 1)
#### 1. LLM Servers (TICKET-021, TICKET-022)
**Why First:** Everything else depends on LLM infrastructure
**TICKET-021: 4080 LLM Service**
**Recommended Approach: Ollama**
1. **Install Ollama**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
2. **Download Model**
```bash
ollama pull llama3.1:70b-q4_0
# If this tag is unavailable, `ollama pull llama3.1:70b` fetches the default
# Q4_0 quantization; a custom quantized model can be imported via a Modelfile
```
3. **Start Ollama Service**
```bash
ollama serve
# Runs on http://localhost:11434
```
4. **Test Function Calling**
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:70b-q4_0",
  "messages": [{"role": "user", "content": "Hello"}],
  "tools": [...]
}'
```
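The `tools` array above is elided; a concrete payload, using a hypothetical `get_weather` tool in the OpenAI-style function schema Ollama accepts, might look like:

```python
# Hypothetical tools payload for the function-calling test above; the
# get_weather tool is illustrative, not part of the project.
import json

payload = {
    "model": "llama3.1:70b-q4_0",
    "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
    "stream": False,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# json.dumps(payload) produces the body passed to `curl -d`
body = json.dumps(payload)
```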
5. **Create Systemd Service** (for auto-start)
```ini
[Unit]
Description=Ollama LLM Server (4080)
After=network.target
[Service]
Type=simple
User=atlas
ExecStart=/usr/local/bin/ollama serve
Restart=always
[Install]
WantedBy=multi-user.target
```
**Alternative: vLLM** (if you need batching/higher throughput)
- More complex setup
- Better for multiple concurrent requests
- See vLLM documentation
**TICKET-022: 1050 LLM Service**
**Recommended Approach: Ollama (same as 4080)**
1. **Install Ollama** (on 1050 machine)
2. **Download Model**
```bash
ollama pull phi3:mini-q4_0
# If this tag is unavailable, `ollama pull phi3:mini` fetches the default quantization
```
3. **Start Service**
```bash
OLLAMA_HOST=0.0.0.0 ollama serve
# Ollama binds via the OLLAMA_HOST env var (there is no --host flag);
# runs on http://<1050-ip>:11434
```
4. **Test**
```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [{"role": "user", "content": "Hello"}]
}'
```
**Key Differences:**
- Different model (Phi-3 Mini vs Llama 3.1)
- Different port or IP binding
- Lower resource usage
#### 2. MCP Server (TICKET-029)
**Why Second:** Foundation for all tools
**Implementation Steps:**
1. **Create Project Structure**
```
home-voice-agent/
└── mcp-server/
    ├── __init__.py
    ├── server.py          # Main JSON-RPC server
    ├── tools/
    │   ├── __init__.py
    │   ├── weather.py
    │   └── echo.py
    └── requirements.txt
2. **Install Dependencies**
```bash
pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
```
3. **Implement JSON-RPC 2.0 Server**
- Use `jsonrpc-base` or implement manually
- Handle `tools/list` and `tools/call` methods
- Error handling with proper JSON-RPC error codes
4. **Create Example Tools**
- **Echo Tool**: Simple echo for testing
- **Weather Tool**: Stub implementation (real API later)
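The dispatch logic of steps 3–4 can be sketched as a plain function; in the real service it would sit behind a FastAPI `POST /mcp` route from step 2. The echo tool's schema here is illustrative:

```python
# JSON-RPC 2.0 dispatch sketch for the MCP server; the echo tool schema
# is illustrative. In the real service this handler backs POST /mcp.
TOOLS = {
    "echo": {
        "description": "Echo the input text back",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

def handle(req: dict) -> dict:
    """Route a JSON-RPC 2.0 request to tools/list or tools/call."""
    req_id = req.get("id")
    method = req.get("method")
    params = req.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": n, **meta} for n, meta in TOOLS.items()]}
    elif method == "tools/call" and params.get("name") == "echo":
        text = params.get("arguments", {}).get("text", "")
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" error code
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}
```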
5. **Test Server**
```bash
# Start server
python mcp-server/server.py

# Test tools/list
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'

# Test tools/call
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "echo", "arguments": {"text": "hello"}},
    "id": 2
  }'
```
### Phase 2: Voice I/O Services (Priority 2)
#### 3. Wake-Word Node (TICKET-006)
**Prerequisites:** Hardware (microphone, always-on node)
**Implementation Steps:**
1. **Install openWakeWord** (or selected engine)
```bash
pip install openwakeword
```
2. **Create Wake-Word Service**
- Audio capture (PyAudio)
- Wake-word detection loop
- Event emission (WebSocket/MQTT/HTTP)
3. **Test Detection**
- Train/configure "Hey Atlas" wake-word
- Test false positive/negative rates
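The detection loop in step 2 reduces to thresholding the per-model scores that openWakeWord's `Model.predict` returns. A sketch of that decision step, with the audio/model wiring left as comments (the threshold and event emission are placeholders):

```python
# Threshold check for the wake-word loop; 0.5 is a placeholder cutoff
# to be tuned against the false positive/negative tests in step 3.
def should_trigger(scores: dict, threshold: float = 0.5) -> bool:
    """Return True when any wake-word model's score crosses the threshold."""
    return any(score >= threshold for score in scores.values())

# Real-service wiring (not run here):
# from openwakeword.model import Model
# model = Model()                        # loads the bundled pre-trained models
# while True:
#     frame = read_mic_frame()           # 16 kHz 16-bit PCM chunk via PyAudio
#     if should_trigger(model.predict(frame)):
#         emit_wake_event()              # WebSocket/MQTT/HTTP notification
```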
#### 4. ASR Service (TICKET-010)
**Prerequisites:** faster-whisper selected
**Implementation Steps:**
1. **Install faster-whisper**
```bash
pip install faster-whisper
```
2. **Download Model**
```python
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda", compute_type="float16")
```
3. **Create WebSocket Service**
- Audio streaming endpoint
- Real-time transcription
- Text segment output
4. **Integrate with Wake-Word**
- Start ASR on wake-word event
- Stop on silence or user command
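A sketch of the transcription step in 2–3: faster-whisper yields segments lazily, and the service concatenates their text as it streams. The stop-phrase rule below is a placeholder for the "stop on user command" behavior:

```python
# Collect streamed segment texts into one transcript, stopping early on a
# user stop command; the stop phrase is a placeholder policy.
def collect_transcript(segment_texts, stop_phrases=("stop listening",)):
    parts = []
    for text in segment_texts:
        parts.append(text.strip())
        if parts[-1].lower() in stop_phrases:
            break
    return " ".join(parts)

# Real-service wiring (not run here):
# from faster_whisper import WhisperModel
# model = WhisperModel("small", device="cuda", compute_type="float16")
# segments, info = model.transcribe(audio_path, vad_filter=True)
# transcript = collect_transcript(s.text for s in segments)
```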
#### 5. TTS Service (TICKET-014)
**Prerequisites:** TTS evaluation complete
**Implementation Steps:**
1. **Install Piper** (or selected TTS)
```bash
# Install Piper
wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
tar -xzf piper_amd64.tar.gz
```
2. **Download Voice Model**
```bash
# Download voice model
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
```
3. **Create HTTP Service**
- Text input → audio output
- Streaming support
- Voice selection
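The HTTP service in step 3 can shell out to the piper binary. A sketch of the command construction, where the `./piper` location and voice filename follow the downloads above but are assumptions about the final layout:

```python
# Build the piper invocation for one synthesis request; the binary and
# voice-model paths are assumptions based on the downloads above.
def build_piper_cmd(voice="en_US-lessac-medium.onnx", out_path="out.wav",
                    piper_bin="./piper"):
    """Piper reads text on stdin and writes a WAV to --output_file."""
    return [piper_bin, "--model", voice, "--output_file", out_path]

# Real-service wiring (not run here):
# import subprocess
# subprocess.run(build_piper_cmd(), input=text.encode(), check=True)
# ...then stream the WAV back in the HTTP response
```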
## Quick Start Checklist
### Week 1: Core Infrastructure
- [ ] Set up 4080 LLM server (TICKET-021)
- [ ] Set up 1050 LLM server (TICKET-022)
- [ ] Test both servers independently
- [ ] Implement minimal MCP server (TICKET-029)
- [ ] Test MCP server with echo tool
### Week 2: Voice Services
- [ ] Prototype wake-word node (TICKET-006) - if hardware ready
- [ ] Implement ASR service (TICKET-010)
- [ ] Implement TTS service (TICKET-014)
- [ ] Test voice pipeline end-to-end
### Week 3: Integration
- [ ] Implement MCP-LLM adapter (TICKET-030)
- [ ] Add core tools (weather, time, tasks)
- [ ] Create routing layer (TICKET-023)
- [ ] Test full system
## Common Issues & Solutions
### LLM Server Issues
**Problem:** Model doesn't fit in VRAM
- **Solution:** Use Q4 quantization, reduce context window
**Problem:** Slow inference
- **Solution:** Check GPU utilization, use GPU-accelerated inference
**Problem:** Function calling not working
- **Solution:** Verify model supports function calling, check prompt format
### MCP Server Issues
**Problem:** JSON-RPC errors
- **Solution:** Validate request format, check error codes
**Problem:** Tools not discovered
- **Solution:** Verify tool registration, check `tools/list` response
### Voice Services Issues
**Problem:** High latency
- **Solution:** Use GPU for ASR, optimize model size
**Problem:** Poor accuracy
- **Solution:** Use larger model, improve audio quality
## Testing Strategy
### Unit Tests
- Test each service independently
- Mock dependencies where needed
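As one way to mock a dependency, the stub weather tool can take its fetch function as a parameter so a test injects a fake; the names here are illustrative, not the project's actual API:

```python
# Illustrative unit-test pattern: inject the external fetch so the test
# never touches a real weather API. Names are assumptions.
def get_weather(city, fetch):
    """Format a reply from whatever the injected fetch returns."""
    data = fetch(city)
    return f"{data['temp_c']}°C in {city}"

def test_get_weather_with_fake_fetch():
    fake_fetch = lambda city: {"temp_c": 21}
    assert get_weather("Boston", fake_fetch) == "21°C in Boston"
```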
### Integration Tests
- Test LLM → MCP → Tool flow
- Test Wake-word → ASR → LLM → TTS flow
### End-to-End Tests
- Full voice interaction
- Tool calling scenarios
- Error handling
## Next Steps After Milestone 2
Once core infrastructure is working:
1. Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
2. Implement phone client (TICKET-039)
3. Add system prompts (TICKET-025)
4. Implement conversation handling (TICKET-027)
## References
- **Ollama Docs**: https://ollama.com/docs
- **vLLM Docs**: https://docs.vllm.ai
- **faster-whisper**: https://github.com/guillaumekln/faster-whisper
- **MCP Spec**: https://modelcontextprotocol.io/specification
- **Model Selection**: `docs/MODEL_SELECTION.md`
- **ASR Evaluation**: `docs/ASR_EVALUATION.md`
- **MCP Architecture**: `docs/MCP_ARCHITECTURE.md`
---
**Last Updated**: 2024-01-XX
**Status**: Ready for Implementation