atlas/home-voice-agent/VOICE_SERVICES_README.md

# Voice I/O Services - Implementation Complete

All three voice I/O services have been implemented and are ready for testing on Pi5.

## ✅ Services Implemented

### 1. Wake-Word Detection (TICKET-006) ✅
- **Location**: `wake-word/`
- **Engine**: openWakeWord
- **Port**: 8002
- **Features**:
  - Real-time wake-word detection ("Hey Atlas")
  - WebSocket events
  - HTTP API for control
  - Low-latency processing

### 2. ASR Service (TICKET-010) ✅
- **Location**: `asr/`
- **Engine**: faster-whisper
- **Port**: 8001
- **Features**:
  - HTTP endpoint for file transcription
  - WebSocket streaming transcription
  - Multiple audio formats
  - Auto language detection
  - GPU acceleration support

### 3. TTS Service (TICKET-014) ✅
- **Location**: `tts/`
- **Engine**: Piper
- **Port**: 8003
- **Features**:
  - HTTP endpoint for synthesis
  - Low-latency synthesis (< 500 ms)
  - Multiple voice support
  - WAV audio output

## 🚀 Quick Start

### 1. Install Dependencies

```bash
# Wake-word service (install the system audio libraries before the Python packages)
sudo apt-get install portaudio19-dev python3-pyaudio
cd wake-word
pip install -r requirements.txt

# ASR service
cd ../asr
pip install -r requirements.txt

# TTS service
cd ../tts
pip install -r requirements.txt
# Note: Requires Piper binary and voice files (see tts/README.md)
```

### 2. Start Services

```bash
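# Open each terminal in home-voice-agent/ so the relative paths resolve.
# If `-m` cannot find the module, run the same command from home-voice-agent/ without the `cd`.
# Terminal 1: Wake-word service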
cd wake-word
python3 -m wake-word.server

# Terminal 2: ASR service
cd asr
python3 -m asr.server

# Terminal 3: TTS service
cd tts
python3 -m tts.server
```

### 3. Test Services

```bash
# Test wake-word health
curl http://localhost:8002/health

# Test ASR health
curl http://localhost:8001/health

# Test TTS health
curl http://localhost:8003/health

# Test TTS synthesis
curl "http://localhost:8003/synthesize?text=Hello%20world" --output test.wav
```
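The synthesis request above can also be issued from Python. The following is a minimal sketch, assuming the `/synthesize` endpoint accepts a GET request with a `text` query parameter (as the `curl` call suggests) and returns a complete WAV file. The helper names `synthesize_url`, `synthesize`, and `wav_duration_seconds` are illustrative and not part of the service.

```python
import io
import urllib.parse
import urllib.request
import wave

TTS_BASE_URL = "http://localhost:8003"  # TTS port listed in this README


def synthesize_url(text: str, base_url: str = TTS_BASE_URL) -> str:
    """Build the /synthesize URL with the text percent-encoded."""
    query = urllib.parse.urlencode({"text": text}, quote_via=urllib.parse.quote)
    return f"{base_url}/synthesize?{query}"


def synthesize(text: str, timeout: float = 10.0) -> bytes:
    """Fetch audio bytes from the TTS service (assumes a GET endpoint)."""
    with urllib.request.urlopen(synthesize_url(text), timeout=timeout) as resp:
        return resp.read()


def wav_duration_seconds(data: bytes) -> float:
    """Parse WAV bytes and return the clip length; raises wave.Error if invalid."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

With the TTS service running, `wav_duration_seconds(synthesize("Hello world"))` gives a quick check that the response is playable audio rather than an error page.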

## 📋 Service Ports

| Service | Port | Endpoint |
|---------|------|----------|
| Wake-Word | 8002 | http://localhost:8002 |
| ASR | 8001 | http://localhost:8001 |
| TTS | 8003 | http://localhost:8003 |
| MCP Server | 8000 | http://localhost:8000 |
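To check all three voice services at once, the ports in the table can be wrapped in a small script. This is a sketch only: it assumes each voice service answers `GET /health` with HTTP 200 when healthy, as the Quick Start `curl` calls imply. The MCP server is left out because this README does not document a health endpoint for it. The names `SERVICE_PORTS`, `health_url`, `http_ok`, and `check_all` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Ports taken from the Service Ports table.
SERVICE_PORTS: Dict[str, int] = {"wake-word": 8002, "asr": 8001, "tts": 8003}


def health_url(service: str, host: str = "localhost") -> str:
    """Return the assumed health-check URL for a named service."""
    return f"http://{host}:{SERVICE_PORTS[service]}/health"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def check_all(probe: Callable[[str], bool] = http_ok) -> Dict[str, bool]:
    """Probe every service and map its name to a healthy/unhealthy flag."""
    return {name: probe(health_url(name)) for name in SERVICE_PORTS}
```

Calling `check_all()` returns a dictionary such as `{"wake-word": True, "asr": False, "tts": True}`, which makes it easy to see which service failed to start. The `probe` parameter lets the logic run in tests without live services.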

## 🔗 Integration Flow

```
1. Wake-word detects "Hey Atlas"
   ↓
2. Wake-word service emits event
   ↓
3. ASR service starts capturing audio
   ↓
4. ASR transcribes speech to text
   ↓
5. Text sent to LLM (via MCP server)
   ↓
6. LLM generates response
   ↓
7. TTS synthesizes response to speech
   ↓
8. Audio played through speakers
```
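The flow above can be sketched as a small orchestrator. This is an illustration under stated assumptions, not the project's implementation: each stage is an injected callable, so the real HTTP or WebSocket clients for the ASR, MCP, and TTS services can be swapped in later. All names here (`VoicePipeline`, `on_wake_word`, and the stage fields) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One wake-to-speech turn; every stage is a hypothetical injected callable."""

    capture_audio: Callable[[], bytes]   # step 3: record the utterance
    transcribe: Callable[[bytes], str]   # step 4: ASR service
    ask_llm: Callable[[str], str]        # steps 5-6: LLM via the MCP server
    synthesize: Callable[[str], bytes]   # step 7: TTS service
    play: Callable[[bytes], None]        # step 8: speaker output

    def on_wake_word(self) -> str:
        """Run steps 3-8 after a wake-word event and return the reply text."""
        audio = self.capture_audio()
        text = self.transcribe(audio).strip()
        if not text:
            return ""  # nothing intelligible was said, so skip the LLM and TTS
        reply = self.ask_llm(text)
        self.play(self.synthesize(reply))
        return reply
```

Keeping the stages as plain callables means the end-to-end flow can be tested with stubs before the services are wired together on the Pi5.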

## 🧪 Testing Checklist

### Wake-Word Service
- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Can start/stop detection via API
- [ ] WebSocket events received on detection
- [ ] Microphone input working

### ASR Service
- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Model loads successfully
- [ ] File transcription works
- [ ] WebSocket streaming works (if implemented)

### TTS Service
- [ ] Service starts without errors
- [ ] Health endpoint responds
- [ ] Piper binary found
- [ ] Voice files available
- [ ] Text synthesis works
- [ ] Audio output plays correctly

## 📝 Notes

### Wake-Word
- Requires microphone access
- Uses openWakeWord (Apache 2.0 license)
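- Detecting "Hey Atlas" may require training or fine-tuning a custom openWakeWord model for that phrase
- Until a custom model is available, the service may fall back to the pretrained "Hey Jarvis" model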

### ASR
- First run downloads the model (~500 MB for `small`)
- GPU acceleration requires a CUDA-capable GPU; the Pi 5 has none, so use CPU mode there
- CPU mode works but is slower
- Supports many languages

### TTS
- Requires Piper binary and voice files
- Download from: https://github.com/rhasspy/piper
- Voices from: https://huggingface.co/rhasspy/piper-voices
- Default voice: `en_US-lessac-medium`

## 🔧 Configuration

### Environment Variables
Create a `.env` file in `home-voice-agent/`:
```bash
# Voice Services
WAKE_WORD_PORT=8002
ASR_PORT=8001
TTS_PORT=8003

# ASR Configuration
ASR_MODEL_SIZE=small
ASR_DEVICE=cpu  # or "cuda" if GPU available
ASR_LANGUAGE=en

# TTS Configuration
TTS_VOICE=en_US-lessac-medium
TTS_SAMPLE_RATE=22050
```
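How each service reads these variables is not documented here. As a hedged sketch, the settings could be parsed into a typed object like the one below, with the defaults mirroring the values shown above. It assumes the variables are already present in the process environment (for example exported by the shell or loaded by a dotenv-style tool). The `VoiceConfig` class and `load_config` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceConfig:
    """Typed view of the .env settings; defaults mirror this README."""

    wake_word_port: int = 8002
    asr_port: int = 8001
    tts_port: int = 8003
    asr_model_size: str = "small"
    asr_device: str = "cpu"
    asr_language: str = "en"
    tts_voice: str = "en_US-lessac-medium"
    tts_sample_rate: int = 22050


def load_config(env: Mapping[str, str] = os.environ) -> VoiceConfig:
    """Build a VoiceConfig from environment variables, falling back to defaults."""
    d = VoiceConfig()
    device = env.get("ASR_DEVICE", d.asr_device).strip().lower()
    if device not in ("cpu", "cuda"):
        raise ValueError(f"ASR_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return VoiceConfig(
        wake_word_port=int(env.get("WAKE_WORD_PORT", d.wake_word_port)),
        asr_port=int(env.get("ASR_PORT", d.asr_port)),
        tts_port=int(env.get("TTS_PORT", d.tts_port)),
        asr_model_size=env.get("ASR_MODEL_SIZE", d.asr_model_size),
        asr_device=device,
        asr_language=env.get("ASR_LANGUAGE", d.asr_language),
        tts_voice=env.get("TTS_VOICE", d.tts_voice),
        tts_sample_rate=int(env.get("TTS_SAMPLE_RATE", d.tts_sample_rate)),
    )
```

Validating `ASR_DEVICE` at startup surfaces a typo immediately instead of at the first transcription request.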

## 🐛 Troubleshooting

### Wake-Word
- **No microphone found**: Check USB connection, install portaudio
- **No detection**: Lower threshold, check microphone volume
- **False positives**: Increase threshold
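The threshold trade-off can be illustrated with a short sketch. It assumes the detector produces one confidence score per audio frame in the range 0 to 1 and fires when a score reaches the threshold. The cooldown is an added illustrative detail that keeps one utterance from triggering several events. The function name and default values are hypothetical and may not match the service's actual parameters.

```python
from typing import Iterable, List


def detections(
    scores: Iterable[float],
    threshold: float = 0.5,
    cooldown_frames: int = 3,
) -> List[int]:
    """Return frame indices that trigger a detection.

    A frame triggers when its score is at or above the threshold and it does
    not fall inside the cooldown window that follows the previous trigger.
    """
    hits: List[int] = []
    next_allowed = 0
    for i, score in enumerate(scores):
        if i >= next_allowed and score >= threshold:
            hits.append(i)
            next_allowed = i + 1 + cooldown_frames
    return hits
```

Running the same score sequence through a lower threshold yields more triggers (fewer misses, more false positives), while a higher threshold yields fewer.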

### ASR
- **Model download fails**: Check internet, disk space
- **Slow transcription**: Use smaller model, enable GPU
- **Import errors**: Install faster-whisper: `pip install faster-whisper`

### TTS
- **Piper not found**: Download and place in `tts/piper/`
- **Voice not found**: Download voices to `tts/piper/voices/`
- **No audio output**: Check speakers, audio system

## 📚 Documentation

- Wake-word: `wake-word/README.md`
- ASR: `asr/README.md`
- TTS: `tts/README.md`
- API Contracts: `docs/ASR_API_CONTRACT.md`

## ✅ Status

All three services are **implemented and ready for testing** on Pi5!

Next steps:
1. Deploy to Pi5
2. Install dependencies
3. Test each service individually
4. Test end-to-end voice flow
5. Integrate with MCP server