# TTS (Text-to-Speech) Service

Text-to-speech service using Piper for low-latency speech synthesis.

## Features

- HTTP endpoint for text-to-speech synthesis
- Low-latency processing (< 500ms)
- Multiple voice support
- WAV audio output
- Streaming support (for long text)

## Installation

### Install Piper

```bash
# Download Piper binary
# See: https://github.com/rhasspy/piper

# Download voices
# See: https://huggingface.co/rhasspy/piper-voices

# Place piper binary in tts/piper/
# Place voices in tts/piper/voices/
```

### Install Python Dependencies

```bash
pip install -r requirements.txt
```

## Usage

### Standalone Service

```bash
# Run as HTTP server
python3 -m tts.server

# Or use uvicorn directly
uvicorn tts.server:app --host 0.0.0.0 --port 8003
```

### Python API

```python
from tts.service import TTSService

service = TTSService(
    voice="en_US-lessac-medium",
    sample_rate=22050
)

# Synthesize text
audio_data = service.synthesize("Hello, this is a test.")
with open("output.wav", "wb") as f:
    f.write(audio_data)
```

## API Endpoints

### HTTP

- `GET /health` - Health check
- `POST /synthesize` - Synthesize speech from text
  - Body: `{"text": "Hello", "voice": "en_US-lessac-medium", "format": "wav"}`
- `GET /synthesize?text=Hello&voice=en_US-lessac-medium&format=wav` - Synthesize (GET)
- `GET /voices` - Get available voices

## Configuration

- **Voice**: en_US-lessac-medium (default)
- **Sample Rate**: 22050 Hz
- **Format**: WAV (default), RAW
- **Latency**: < 500ms for short text

## Integration

The TTS service is called by:

1. LLM response handler
2. Conversation manager
3. Direct HTTP requests

Output is:

1. Played through speakers
2. Streamed to clients
3. Saved to file (optional)

## Testing

```bash
# Test health
curl http://localhost:8003/health

# Test synthesis
curl -X POST http://localhost:8003/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test.", "format": "wav"}' \
  --output output.wav

# Test GET endpoint
curl "http://localhost:8003/synthesize?text=Hello" --output output.wav
```

## Notes

- Requires Piper binary and voice files
- First run may be slower (model loading)
- Supports multiple languages (with appropriate voices)
- Low resource usage (CPU-only, no GPU required)

## Voice Selection

For the "family agent" persona:

- **Recommended**: `en_US-lessac-medium` (warm, friendly, clear)
- **Alternative**: Other English voices from Piper voice collection

## Future Enhancements

- Streaming synthesis for long text
- Voice cloning
- Emotion/prosody control
- Multiple language support
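
## Example: Python HTTP Client

The curl commands in the Testing section can also be issued from Python. The following is a minimal sketch using only the standard library, not part of the service itself. The helper names (`build_synthesize_url`, `fetch_speech`, `wav_info`) are hypothetical. The sketch assumes the server listens on port 8003 as in the uvicorn example, accepts the `text`, `voice`, and `format` query parameters listed under API Endpoints, and returns a complete WAV file as the response body.

```python
import io
import urllib.parse
import urllib.request
import wave

# Assumption: port taken from the uvicorn example in this README.
BASE_URL = "http://localhost:8003"


def build_synthesize_url(text, voice="en_US-lessac-medium", fmt="wav", base_url=BASE_URL):
    """Build a GET /synthesize URL with properly escaped query parameters."""
    query = urllib.parse.urlencode({"text": text, "voice": voice, "format": fmt})
    return f"{base_url}/synthesize?{query}"


def fetch_speech(text, timeout=10.0, **kwargs):
    """Request synthesized audio and return the raw response body.

    Assumption: the body is a complete WAV file when format is "wav".
    """
    url = build_synthesize_url(text, **kwargs)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def wav_info(data):
    """Return (sample_rate, channels, duration_seconds) for in-memory WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


if __name__ == "__main__":
    # Requires a running TTS server; see "Standalone Service".
    audio = fetch_speech("Hello, this is a test.")
    rate, channels, seconds = wav_info(audio)
    print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
    with open("output.wav", "wb") as f:
        f.write(audio)
```

Checking the sample rate with `wav_info` is a quick way to confirm the response matches the configured 22050 Hz. Building the URL with `urlencode` avoids broken requests when the text contains spaces or punctuation.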