# Voice I/O Services - Implementation Complete

All three voice I/O services have been implemented and are ready for testing on Pi5.

## ✅ Services Implemented

### 1. Wake-Word Detection (TICKET-006) ✅

- **Location**: `wake-word/`
- **Engine**: openWakeWord
- **Port**: 8002
- **Features**:
  - Real-time wake-word detection ("Hey Atlas")
  - WebSocket events
  - HTTP API for control
  - Low-latency processing

### 2. ASR Service (TICKET-010) ✅

- **Location**: `asr/`
- **Engine**: faster-whisper
- **Port**: 8001
- **Features**:
  - HTTP endpoint for file transcription
  - WebSocket streaming transcription
  - Multiple audio formats
  - Automatic language detection
  - GPU acceleration support

### 3. TTS Service (TICKET-014) ✅

- **Location**: `tts/`
- **Engine**: Piper
- **Port**: 8003
- **Features**:
  - HTTP endpoint for synthesis
  - Low latency (< 500 ms)
  - Multiple voice support
  - WAV audio output

## 🚀 Quick Start

### 1. Install Dependencies

```bash
# Wake-word service
cd wake-word
pip install -r requirements.txt
sudo apt-get install portaudio19-dev python3-pyaudio  # System deps

# ASR service
cd ../asr
pip install -r requirements.txt

# TTS service
cd ../tts
pip install -r requirements.txt
# Note: Requires the Piper binary and voice files (see tts/README.md)
```

### 2. Start Services

```bash
# Terminal 1: Wake-word service
cd wake-word
python3 -m wake-word.server

# Terminal 2: ASR service
cd asr
python3 -m asr.server

# Terminal 3: TTS service
cd tts
python3 -m tts.server
```

### 3. Test Services

```bash
# Test wake-word health
curl http://localhost:8002/health

# Test ASR health
curl http://localhost:8001/health

# Test TTS health
curl http://localhost:8003/health

# Test TTS synthesis
curl "http://localhost:8003/synthesize?text=Hello%20world" --output test.wav
```

## 📋 Service Ports

| Service    | Port | Endpoint              |
|------------|------|-----------------------|
| Wake-Word  | 8002 | http://localhost:8002 |
| ASR        | 8001 | http://localhost:8001 |
| TTS        | 8003 | http://localhost:8003 |
| MCP Server | 8000 | http://localhost:8000 |

## 🔗 Integration Flow

```
1. Wake-word detects "Hey Atlas"
   ↓
2. Wake-word service emits event
   ↓
3. ASR service starts capturing audio
   ↓
4. ASR transcribes speech to text
   ↓
5. Text sent to LLM (via MCP server)
   ↓
6. LLM generates response
   ↓
7. TTS synthesizes response to speech
   ↓
8. Audio played through speakers
```

## 🧪 Testing Checklist

### Wake-Word Service

- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Can start/stop detection via API
- [ ] WebSocket events received on detection
- [ ] Microphone input working

### ASR Service

- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Model loads successfully
- [ ] File transcription works
- [ ] WebSocket streaming works (if implemented)

### TTS Service

- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Piper binary found
- [ ] Voice files available
- [ ] Text synthesis works
- [ ] Audio output plays correctly

## 📝 Notes

### Wake-Word

- Requires microphone access
- Uses openWakeWord (Apache 2.0 license)
- May need fine-tuning for the "Hey Atlas" phrase
- The default model may fall back to "Hey Jarvis"

### ASR

- First run downloads the model (~500 MB for `small`)
- GPU acceleration requires CUDA (if available)
- CPU mode works but is slower
- Supports many languages

### TTS

- Requires the Piper binary and voice files
- Download from: https://github.com/rhasspy/piper
- Voices from: https://huggingface.co/rhasspy/piper-voices
- Default voice: `en_US-lessac-medium`

## 🔧 Configuration

### Environment Variables

Create a `.env` file in `home-voice-agent/`:

```bash
# Voice Services
WAKE_WORD_PORT=8002
ASR_PORT=8001
TTS_PORT=8003

# ASR Configuration
ASR_MODEL_SIZE=small
ASR_DEVICE=cpu  # or "cuda" if a GPU is available
ASR_LANGUAGE=en

# TTS Configuration
TTS_VOICE=en_US-lessac-medium
TTS_SAMPLE_RATE=22050
```

## 🐛 Troubleshooting

### Wake-Word

- **No microphone found**: Check the USB connection; install PortAudio
- **No detection**: Lower the detection threshold; check the microphone volume
- **False positives**: Increase the detection threshold

### ASR

- **Model download fails**: Check internet connectivity and disk space
- **Slow transcription**: Use a smaller model or enable GPU acceleration
- **Import errors**: Install faster-whisper: `pip install faster-whisper`

### TTS

- **Piper not found**: Download the binary and place it in `tts/piper/`
- **Voice not found**: Download voices to `tts/piper/voices/`
- **No audio output**: Check the speakers and audio system

## 📚 Documentation

- Wake-word: `wake-word/README.md`
- ASR: `asr/README.md`
- TTS: `tts/README.md`
- API contracts: `docs/ASR_API_CONTRACT.md`

## ✅ Status

All three services are **implemented and ready for testing** on Pi5!

Next steps:

1. Deploy to Pi5
2. Install dependencies
3. Test each service individually
4. Test the end-to-end voice flow
5. Integrate with the MCP server
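The health checks from the testing checklist can be scripted rather than run one `curl` at a time. The following is a minimal sketch using only the Python standard library, assuming nothing beyond the ports in the table above and a plain `GET /health` that returns HTTP 200 when a service is up (the exact response body is service-specific):

```python
import urllib.error
import urllib.request

# Ports from the "Service Ports" table above.
SERVICES = {"wake-word": 8002, "asr": 8001, "tts": 8003, "mcp": 8000}


def health_url(name: str) -> str:
    """Build the /health URL for a named service."""
    return f"http://localhost:{SERVICES[name]}/health"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name in SERVICES:
        status = "up" if is_healthy(health_url(name)) else "DOWN"
        print(f"{name}: {status}")
```

Running this on the Pi5 after starting the services gives a quick pass/fail overview before moving on to end-to-end testing.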
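The eight-step integration flow can also be sketched as one orchestration function, which keeps steps 3-8 testable with stubs before the real clients exist. This is a structural sketch only: `transcribe`, `ask_llm`, `synthesize`, and `play` are hypothetical callables standing in for the ASR, MCP/LLM, and TTS service clients, not APIs these services actually expose.

```python
from typing import Callable


def handle_wake_event(
    transcribe: Callable[[], str],       # steps 3-4: capture audio, ASR to text
    ask_llm: Callable[[str], str],       # steps 5-6: LLM via MCP server
    synthesize: Callable[[str], bytes],  # step 7: TTS to WAV bytes
    play: Callable[[bytes], None],       # step 8: audio output
) -> str:
    """Run steps 3-8 of the flow once a wake-word event (steps 1-2) arrives."""
    text = transcribe()
    reply = ask_llm(text)
    play(synthesize(reply))
    return reply


# Dry run with stand-in callables:
reply = handle_wake_event(
    transcribe=lambda: "what time is it",
    ask_llm=lambda text: f"You asked: {text}",
    synthesize=lambda text: text.encode(),  # a real client would return WAV bytes
    play=lambda audio: None,
)
print(reply)  # -> You asked: what time is it
```

In a real deployment each callable would wrap an HTTP or WebSocket client for the corresponding service; injecting them as parameters is what lets the flow be exercised without microphone or speakers.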