This commit completes the evaluation of Text-to-Speech (TTS) options as described in TICKET-013.
- Creates a detailed document comparing Piper, Mimic 3, and Coqui TTS.
- Recommends Piper for initial development due to its performance and low resource usage.
- Updates to reflect the decision and points to the new evaluation document.
- Moves TICKET-013 to the 'done' column.
Ticket: Evaluate TTS Options
Ticket Information
- ID: TICKET-013
- Title: Evaluate TTS Options
- Type: Research
- Priority: High
- Status: Backlog
- Track: Voice I/O
- Milestone: Milestone 1 - Survey & Architecture
- Created: 2024-01-XX
Description
Evaluate text-to-speech options:
- Compare open source options (Piper, Mimic 3, etc.)
- Evaluate local neural TTS solutions
- Select 1-2 voices for family agent
- Consider latency, quality, and resource usage
Acceptance Criteria
- TTS options compared
- Selected TTS engine documented
- Voice samples selected
- Resource requirements documented
- Decision recorded in architecture docs
Technical Details
Options to evaluate:
- Piper (lightweight, fast)
- Mimic 3 (high quality)
- Coqui TTS (neural, customizable)
- Other open-source solutions
Considerations:
- Latency for interactive use
- Voice quality and naturalness
- Resource usage
- Customization options
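One way to make the latency and resource considerations comparable across engines is a small timing harness. The sketch below is illustrative only: `benchmark`, `LatencyReport`, and `fake_synthesize` are hypothetical names, and it assumes each engine can be wrapped in a callable that takes text and returns raw 16-bit mono PCM bytes. It does not call any real Piper, Mimic 3, or Coqui TTS API; the wrapper for each engine would have to be written separately.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LatencyReport:
    engine: str
    median_s: float          # median wall-clock synthesis time
    worst_s: float           # slowest run
    real_time_factor: float  # synthesis time / audio duration; < 1.0 is faster than real time


def benchmark(
    engine: str,
    synthesize: Callable[[str], bytes],  # hypothetical wrapper around an engine
    text: str,
    sample_rate: int = 22050,            # assumed output rate, adjust per voice
    bytes_per_sample: int = 2,           # assumed 16-bit mono PCM, no header
    runs: int = 5,
) -> LatencyReport:
    """Time `synthesize` over several runs and derive a real-time factor."""
    timings: List[float] = []
    audio = b""
    for _ in range(runs):
        start = time.perf_counter()
        audio = synthesize(text)
        timings.append(time.perf_counter() - start)

    audio_seconds = len(audio) / (sample_rate * bytes_per_sample)
    median = statistics.median(timings)
    rtf = median / audio_seconds if audio_seconds > 0 else float("inf")
    return LatencyReport(engine, median, max(timings), rtf)


def fake_synthesize(text: str) -> bytes:
    # Stand-in engine for demonstration: 2205 silent samples (0.1 s) per character.
    return b"\x00\x00" * (2205 * len(text))


if __name__ == "__main__":
    print(benchmark("fake", fake_synthesize, "Dinner is ready."))
```

A real-time factor well below 1.0 on the target hardware is a reasonable first filter for interactive use. Memory and CPU usage would need to be measured separately, since this sketch only captures wall-clock time.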
Dependencies
- TICKET-004 (architecture) - helpful context
Related Files
- docs/TTS_EVALUATION.md (to be created)
- ARCHITECTURE.md
Notes
Independent of LLM logic. Can be developed in parallel with other voice work.