# Implementation Guide - Milestone 2

## Overview

This guide provides step-by-step instructions for implementing the Milestone 2 core infrastructure. All planning and evaluation work is complete, so the system is ready to build.

## Prerequisites

✅ **Completed:**

- Model selections finalized (Llama 3.1 70B Q4, Phi-3 Mini 3.8B Q4)
- ASR engine selected (faster-whisper)
- MCP architecture documented
- Hardware plan ready
## Implementation Order

### Phase 1: Core Infrastructure (Priority 1)

#### 1. LLM Servers (TICKET-021, TICKET-022)

**Why First:** Everything else depends on LLM infrastructure.

**TICKET-021: 4080 LLM Service**

**Recommended Approach: Ollama**

1. **Install Ollama**
   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. **Download Model**
   ```bash
   ollama pull llama3.1:70b-q4_0
   # Or use a custom quantized model
   ```

3. **Start Ollama Service**
   ```bash
   ollama serve
   # Runs on http://localhost:11434
   ```

4. **Test Function Calling**
   ```bash
   curl http://localhost:11434/api/chat -d '{
     "model": "llama3.1:70b-q4_0",
     "messages": [{"role": "user", "content": "Hello"}],
     "tools": [...]
   }'
   ```

5. **Create Systemd Service** (for auto-start)
   ```ini
   [Unit]
   Description=Ollama LLM Server (4080)
   After=network.target

   [Service]
   Type=simple
   User=atlas
   ExecStart=/usr/local/bin/ollama serve
   Restart=always

   [Install]
   WantedBy=multi-user.target
   ```

**Alternative: vLLM** (if you need batching/higher throughput)

- More complex setup
- Better for multiple concurrent requests
- See the vLLM documentation
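The function-calling test above can also be scripted. The sketch below builds an `/api/chat` request body and pulls tool calls out of the response; the `get_weather` tool schema is purely illustrative (real tool definitions will come from the MCP server), and the network call is kept behind a `__main__` guard so the helpers can be reused on their own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint

# Hypothetical tool schema for illustration only; the project's real tools
# are served by the MCP server.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_chat_request(model: str, user_text: str, tools: list) -> dict:
    """Assemble an Ollama /api/chat request body (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "stream": False,
    }

def extract_tool_calls(response: dict) -> list:
    """Return the tool calls from a /api/chat response, or [] if none."""
    return response.get("message", {}).get("tool_calls", [])

if __name__ == "__main__":
    body = build_chat_request("llama3.1:70b-q4_0", "Weather in Oslo?", [WEATHER_TOOL])
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(extract_tool_calls(json.load(resp)))
```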
**TICKET-022: 1050 LLM Service**

**Recommended Approach: Ollama (same as 4080)**

1. **Install Ollama** (on the 1050 machine)

2. **Download Model**
   ```bash
   ollama pull phi3:mini-q4_0
   ```

3. **Start Service**
   ```bash
   # ollama serve has no --host flag; bind via the OLLAMA_HOST env var
   OLLAMA_HOST=0.0.0.0 ollama serve
   # Runs on http://<1050-ip>:11434
   ```

4. **Test**
   ```bash
   curl http://<1050-ip>:11434/api/chat -d '{
     "model": "phi3:mini-q4_0",
     "messages": [{"role": "user", "content": "Hello"}]
   }'
   ```

**Key Differences:**

- Different model (Phi-3 Mini vs Llama 3.1)
- Different port or IP binding
- Lower resource usage
#### 2. MCP Server (TICKET-029)

**Why Second:** Foundation for all tools.

**Implementation Steps:**

1. **Create Project Structure**
   ```
   home-voice-agent/
   └── mcp-server/
       ├── __init__.py
       ├── server.py          # Main JSON-RPC server
       ├── tools/
       │   ├── __init__.py
       │   ├── weather.py
       │   └── echo.py
       └── requirements.txt
   ```

2. **Install Dependencies**
   ```bash
   pip install jsonrpc-base jsonrpc-websocket fastapi uvicorn
   ```

3. **Implement JSON-RPC 2.0 Server**
   - Use `jsonrpc-base` or implement manually
   - Handle `tools/list` and `tools/call` methods
   - Return proper JSON-RPC error codes on failure

4. **Create Example Tools**
   - **Echo Tool**: Simple echo for testing
   - **Weather Tool**: Stub implementation (real API later)

5. **Test Server**
   ```bash
   # Start server
   python mcp-server/server.py

   # Test tools/list
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'

   # Test tools/call
   curl -X POST http://localhost:8000/mcp \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "tools/call",
       "params": {"name": "echo", "arguments": {"text": "hello"}},
       "id": 2
     }'
   ```
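Step 3 can be sketched as a pure dispatch function, independent of the transport (FastAPI, WebSocket, stdin). This is a sketch, not the project's actual server: the echo tool and the registry shape are assumptions, but the error codes (`-32601` method not found, `-32602` invalid params, `-32603` internal error) follow the JSON-RPC 2.0 specification.

```python
# Minimal JSON-RPC 2.0 dispatcher for tools/list and tools/call.
# Transport-agnostic: wire handle() to FastAPI, a WebSocket, or stdin.

TOOLS = {
    "echo": {
        "description": "Echo the input text back",
        "handler": lambda args: {"text": args["text"]},
    },
}

def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request and return the response object."""
    rid = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        tools = [{"name": n, "description": t["description"]}
                 for n, t in TOOLS.items()]
        return {"jsonrpc": "2.0", "result": {"tools": tools}, "id": rid}
    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            # Unknown tool name reported as invalid params (a design choice)
            return {"jsonrpc": "2.0",
                    "error": {"code": -32602, "message": "Unknown tool"},
                    "id": rid}
        try:
            result = tool["handler"](params.get("arguments", {}))
            return {"jsonrpc": "2.0", "result": result, "id": rid}
        except Exception as exc:  # tool raised: report as internal error
            return {"jsonrpc": "2.0",
                    "error": {"code": -32603, "message": str(exc)},
                    "id": rid}
    return {"jsonrpc": "2.0",
            "error": {"code": -32601, "message": "Method not found"},
            "id": rid}
```

Keeping the dispatcher pure makes it trivial to unit-test before any HTTP framework is attached.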
### Phase 2: Voice I/O Services (Priority 2)

#### 3. Wake-Word Node (TICKET-006)

**Prerequisites:** Hardware (microphone, always-on node)

**Implementation Steps:**

1. **Install openWakeWord** (or selected engine)
   ```bash
   pip install openwakeword
   ```

2. **Create Wake-Word Service**
   - Audio capture (PyAudio)
   - Wake-word detection loop
   - Event emission (WebSocket/MQTT/HTTP)

3. **Test Detection**
   - Train/configure "Hey Atlas" wake-word
   - Test false positive/negative rates
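The detection loop in step 2 can be sketched with a pluggable scorer, so the same skeleton works for openWakeWord or any other engine. The threshold, cooldown, and `predict` callable here are assumptions for illustration; in the real service, `frames` would come from PyAudio and `predict` would call the wake-word model.

```python
def run_wake_loop(frames, predict, on_wake, threshold=0.5, cooldown_frames=20):
    """Scan audio frames and fire on_wake when the score crosses threshold.

    `frames` is any iterable of audio chunks (e.g. from PyAudio);
    `predict(frame) -> float` is a pluggable scorer (e.g. an openWakeWord model).
    A cooldown suppresses repeated triggers on the same utterance.
    Returns the number of wake events fired.
    """
    cooldown = 0
    events = 0
    for frame in frames:
        if cooldown > 0:
            cooldown -= 1      # still inside the post-trigger dead zone
            continue
        if predict(frame) >= threshold:
            on_wake()          # e.g. publish a WebSocket/MQTT event
            events += 1
            cooldown = cooldown_frames
    return events
```

The debounce matters in practice: without it, a single "Hey Atlas" spanning several frames would emit several wake events.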
#### 4. ASR Service (TICKET-010)

**Prerequisites:** faster-whisper selected

**Implementation Steps:**

1. **Install faster-whisper**
   ```bash
   pip install faster-whisper
   ```

2. **Download Model**
   ```python
   from faster_whisper import WhisperModel

   # First use downloads the model; later loads hit the local cache
   model = WhisperModel("small", device="cuda", compute_type="float16")
   segments, info = model.transcribe("sample.wav")
   print([segment.text for segment in segments])
   ```

3. **Create WebSocket Service**
   - Audio streaming endpoint
   - Real-time transcription
   - Text segment output

4. **Integrate with Wake-Word**
   - Start ASR on wake-word event
   - Stop on silence or user command
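The "stop on silence" behavior in step 4 can be sketched as a small endpointing helper over transcribed segments. The 1.5 s gap threshold and the tuple format are assumptions; faster-whisper's `Segment` objects expose `start`, `end`, and `text` fields that map directly onto it.

```python
def collect_until_silence(segments, max_gap=1.5):
    """Concatenate transcript segments until a silence gap ends the utterance.

    `segments` is an iterable of (start_sec, end_sec, text) tuples, e.g. adapted
    from faster-whisper Segment objects; `max_gap` is the silence threshold in
    seconds. Returns the accumulated utterance text.
    """
    texts = []
    last_end = None
    for start, end, text in segments:
        if last_end is not None and start - last_end > max_gap:
            break  # long pause: treat the utterance as finished
        texts.append(text.strip())
        last_end = end
    return " ".join(texts)
```

A production service would also honor an explicit stop command, but the gap check above is the core of the silence cutoff.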
#### 5. TTS Service (TICKET-014)

**Prerequisites:** TTS evaluation complete

**Implementation Steps:**

1. **Install Piper** (or selected TTS)
   ```bash
   wget https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz
   tar -xzf piper_amd64.tar.gz
   ```

2. **Download Voice Model**
   ```bash
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
   ```

3. **Create HTTP Service**
   - Text input → audio output
   - Streaming support
   - Voice selection
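The HTTP service in step 3 ultimately just wraps a synthesis call. A minimal sketch of shelling out to the Piper binary, assuming the flag names `--model` and `--output_file` and the binary path from the tarball above (verify both against your Piper version):

```python
import subprocess

def build_piper_cmd(model_path: str, out_path: str,
                    piper_bin: str = "./piper/piper") -> list:
    """Argv for one Piper synthesis call; text is piped in on stdin."""
    return [piper_bin, "--model", model_path, "--output_file", out_path]

def synthesize(text: str, model_path: str, out_path: str) -> None:
    """Run Piper, writing a WAV file for the given text."""
    subprocess.run(build_piper_cmd(model_path, out_path),
                   input=text.encode(), check=True)
```

Keeping command construction separate from execution makes the wrapper easy to test without the binary installed.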
## Quick Start Checklist

### Week 1: Core Infrastructure

- [ ] Set up 4080 LLM server (TICKET-021)
- [ ] Set up 1050 LLM server (TICKET-022)
- [ ] Test both servers independently
- [ ] Implement minimal MCP server (TICKET-029)
- [ ] Test MCP server with echo tool

### Week 2: Voice Services

- [ ] Prototype wake-word node (TICKET-006), if hardware is ready
- [ ] Implement ASR service (TICKET-010)
- [ ] Implement TTS service (TICKET-014)
- [ ] Test voice pipeline end-to-end

### Week 3: Integration

- [ ] Implement MCP-LLM adapter (TICKET-030)
- [ ] Add core tools (weather, time, tasks)
- [ ] Create routing layer (TICKET-023)
- [ ] Test full system
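At its core, the Week 3 routing layer (TICKET-023) picks a backend per request. A minimal sketch, assuming two agent classes ("work" on the 4080, "family" on the 1050); the hostnames are placeholders, not real addresses:

```python
# Placeholder endpoints; substitute the real LAN addresses.
BACKENDS = {
    "work":   {"url": "http://localhost:11434", "model": "llama3.1:70b-q4_0"},
    "family": {"url": "http://1050-host:11434", "model": "phi3:mini-q4_0"},
}

def route(agent: str) -> dict:
    """Map an agent class to its LLM backend; default to the small model."""
    return BACKENDS.get(agent, BACKENDS["family"])
```

Defaulting unknown agents to the smaller model is a design choice: it keeps a misrouted request from tying up the 4080.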
## Common Issues & Solutions

### LLM Server Issues

**Problem:** Model doesn't fit in VRAM
- **Solution:** Use Q4 quantization, reduce context window

**Problem:** Slow inference
- **Solution:** Check GPU utilization, use GPU-accelerated inference

**Problem:** Function calling not working
- **Solution:** Verify model supports function calling, check prompt format
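For the VRAM-fit problem, a back-of-envelope check helps before downloading anything: weight memory is roughly parameter count × bytes per weight, plus overhead for the KV cache and activations. A sketch (the 20% overhead factor is a coarse assumption, not a measured figure):

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weights at the given quantization,
    scaled by an assumed overhead factor for KV cache and activations."""
    weight_gb = params_b * bits_per_weight / 8  # billions of params * bytes each
    return round(weight_gb * overhead, 1)

# e.g. Llama 3.1 70B at 4 bits: 70 * 4 / 8 = 35 GB of weights before overhead;
# Phi-3 Mini 3.8B at 4 bits: under 2 GB of weights.
```

See `LLM_CAPACITY.md` for the project's actual VRAM and context-window analysis.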

### MCP Server Issues

**Problem:** JSON-RPC errors
- **Solution:** Validate request format, check error codes

**Problem:** Tools not discovered
- **Solution:** Verify tool registration, check `tools/list` response

### Voice Services Issues

**Problem:** High latency
- **Solution:** Use GPU for ASR, optimize model size

**Problem:** Poor accuracy
- **Solution:** Use a larger model, improve audio quality
## Testing Strategy

### Unit Tests

- Test each service independently
- Mock dependencies where needed

### Integration Tests

- Test the LLM → MCP → Tool flow
- Test the Wake-word → ASR → LLM → TTS flow

### End-to-End Tests

- Full voice interaction
- Tool calling scenarios
- Error handling
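The unit-test layer can start with the echo tool, since it has no dependencies to mock. A sketch using the standard library's `unittest`; `echo_tool` here stands in for whatever handler the MCP server actually registers:

```python
import unittest

def echo_tool(arguments: dict) -> dict:
    """Stand-in for the MCP echo tool: returns its input text unchanged."""
    return {"text": arguments["text"]}

class TestEchoTool(unittest.TestCase):
    def test_roundtrip(self):
        self.assertEqual(echo_tool({"text": "hello"}), {"text": "hello"})

    def test_missing_text_raises(self):
        with self.assertRaises(KeyError):
            echo_tool({})
```

Run with `python -m unittest` from the project root.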
## Next Steps After Milestone 2

Once core infrastructure is working:

1. Add more MCP tools (TICKET-031, TICKET-032, TICKET-033, TICKET-034)
2. Implement phone client (TICKET-039)
3. Add system prompts (TICKET-025)
4. Implement conversation handling (TICKET-027)
## References

- **Ollama Docs**: https://ollama.com/docs
- **vLLM Docs**: https://docs.vllm.ai
- **faster-whisper**: https://github.com/guillaumekln/faster-whisper
- **MCP Spec**: https://modelcontextprotocol.io/specification
- **Model Selection**: `docs/MODEL_SELECTION.md`
- **ASR Evaluation**: `docs/ASR_EVALUATION.md`
- **MCP Architecture**: `docs/MCP_ARCHITECTURE.md`

---
**Last Updated**: 2024-01-XX  
**Status**: Ready for Implementation