
# TTS (Text-to-Speech) Service

Text-to-speech service using Piper for low-latency speech synthesis.

## Features

- HTTP endpoint for text-to-speech synthesis
- Low-latency processing (< 500 ms for short text)
- Multiple voice support
- WAV audio output
- Streaming support for long text (planned)

## Installation

### Install Piper

```bash
# Download Piper binary
# See: https://github.com/rhasspy/piper

# Download voices
# See: https://huggingface.co/rhasspy/piper-voices

# Place piper binary in tts/piper/
# Place voices in tts/piper/voices/
```
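To confirm the voices are in place, a small check like the following can help. It is a minimal sketch that relies on Piper voices being distributed as an `.onnx` model with a matching `.onnx.json` config file. The helper name `find_voices` is hypothetical and not part of this service, and whether the service itself scans subdirectories is an assumption.

```python
from pathlib import Path


def find_voices(voices_dir: str) -> list[str]:
    """Return names of voices that have both a model and a config file."""
    root = Path(voices_dir)
    names = []
    # Search recursively in case voices are kept in per-language subfolders.
    for model in sorted(root.rglob("*.onnx")):
        config = model.with_name(model.name + ".json")
        if config.is_file():
            names.append(model.stem)
    return names


if __name__ == "__main__":
    # Path taken from the installation notes above.
    print(find_voices("tts/piper/voices"))
```

A model without its config file is skipped, which makes an incomplete download easy to spot.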

### Install Python Dependencies

```bash
pip install -r requirements.txt
```

## Usage

### Standalone Service

```bash
# Run as HTTP server
python3 -m tts.server

# Or use uvicorn directly
uvicorn tts.server:app --host 0.0.0.0 --port 8003
```

### Python API

```python
from tts.service import TTSService

service = TTSService(
    voice="en_US-lessac-medium",
    sample_rate=22050
)

# Synthesize text
audio_data = service.synthesize("Hello, this is a test.")
with open("output.wav", "wb") as f:
    f.write(audio_data)
```
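To sanity-check the result, the returned bytes can be inspected with the standard-library `wave` module. This sketch assumes `synthesize` returns a complete WAV file as `bytes` (as the example above suggests by writing it straight to `output.wav`). `describe_wav` is a hypothetical helper, not part of `tts.service`.

```python
import io
import wave


def describe_wav(audio_data: bytes) -> dict:
    """Read the WAV header and report basic audio properties."""
    with wave.open(io.BytesIO(audio_data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_wav(audio_data)` on the output above should report the sample rate passed to `TTSService` if the service honors that setting.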

## API Endpoints

### HTTP

- `GET /health` - Health check
- `POST /synthesize` - Synthesize speech from text
  - Body: `{"text": "Hello", "voice": "en_US-lessac-medium", "format": "wav"}`
- `GET /synthesize?text=Hello&voice=en_US-lessac-medium&format=wav` - Synthesize speech using query parameters
- `GET /voices` - List available voices
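The `POST /synthesize` endpoint can also be called from Python using only the standard library. This is a rough sketch: it assumes the service listens on `localhost:8003` and that the response body is the audio itself. The function names are illustrative and not part of this package.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8003"  # assumed host and port


def build_synthesize_request(text, voice="en_US-lessac-medium", fmt="wav",
                             base_url=BASE_URL):
    """Build the POST request with the JSON body documented above."""
    payload = json.dumps({"text": text, "voice": voice, "format": fmt})
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, timeout=10.0, **kwargs) -> bytes:
    """Send the request and return the response body (assumed to be audio)."""
    request = build_synthesize_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.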

## Configuration

- **Voice**: `en_US-lessac-medium` (default)
- **Sample Rate**: 22050 Hz
- **Format**: WAV (default) or RAW
- **Latency target**: < 500 ms for short text
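When requesting RAW output, the audio has no header, so the caller needs to know the sample format to play or save it. The sketch below wraps raw samples in a WAV container. It assumes RAW means 16-bit signed little-endian mono PCM at the configured sample rate, which this README does not confirm. `raw_to_wav` is a hypothetical helper.

```python
import io
import wave


def raw_to_wav(pcm: bytes, sample_rate: int = 22050, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap headerless PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```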

## Integration

The TTS service is called by:
1. LLM response handler
2. Conversation manager
3. Direct HTTP requests

The synthesized audio can be:
1. Played through speakers
2. Streamed to clients
3. Saved to file (optional)

## Testing

```bash
# Test health
curl http://localhost:8003/health

# Test synthesis
curl -X POST http://localhost:8003/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "format": "wav"}' \
  --output output.wav

# Test GET endpoint
curl "http://localhost:8003/synthesize?text=Hello" --output output.wav
```
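To check the < 500 ms latency figure, repeated calls can be timed from Python. This is a minimal sketch: `measure_latency_ms` is a hypothetical helper that accepts any callable taking the text, for example `TTSService.synthesize` or an HTTP client function. It makes one untimed warm-up call first, since the first run may include model loading.

```python
import statistics
import time


def measure_latency_ms(synthesize_fn, text, runs=5):
    """Time repeated synthesis calls and summarize them in milliseconds."""
    synthesize_fn(text)  # warm-up call, excluded from the timings
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize_fn(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}
```

For example, `measure_latency_ms(service.synthesize, "Hello, this is a test.")` would report the median and worst-case time over five runs.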

## Notes

- Requires the Piper binary and voice files (see Installation)
- The first run may be slower while the voice model loads
- Supports multiple languages, given the appropriate voices
- Low resource usage: runs on CPU, no GPU required

## Voice Selection

For the "family agent" persona:
- **Recommended**: `en_US-lessac-medium` (warm, friendly, clear)
- **Alternative**: Other English voices from the Piper voice collection

## Future Enhancements

- Streaming synthesis for long text
- Voice cloning
- Emotion/prosody control
- Broader multi-language support (beyond selecting a language-specific voice)