This commit completes the evaluation of Text-to-Speech (TTS) options as described in TICKET-013.

- Creates a detailed document comparing Piper, Mimic 3, and Coqui TTS.
- Recommends Piper for initial development due to its performance and low resource usage.
- Updates to reflect the decision and points to the new evaluation document.
- Moves TICKET-013 to the 'done' column.
# Ticket: Evaluate TTS Options

## Ticket Information

- **ID**: TICKET-013
- **Title**: Evaluate TTS Options
- **Type**: Research
- **Priority**: High
- **Status**: Backlog
- **Track**: Voice I/O
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Evaluate text-to-speech options:

- Compare open-source options (Piper, Mimic 3, etc.)
- Evaluate local neural TTS solutions
- Select one or two voices for the family agent
- Consider latency, quality, and resource usage
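
The description lists the selection criteria but does not say how to weigh them against each other. A weighted scoring matrix is one common way to record that trade-off. The Python sketch below makes several assumptions: the criterion weights, the 1-to-5 scale, and the candidate names (`engine_a`, `engine_b`) are hypothetical placeholders. None of the scores are measurements of any real engine.

```python
from dataclasses import dataclass

# Hypothetical criterion weights (summing to 1.0); adjust to the team's priorities.
WEIGHTS = {"latency": 0.4, "quality": 0.4, "resources": 0.2}


@dataclass
class Candidate:
    name: str
    # Scores on a hypothetical 1-5 scale where 5 is best for every criterion,
    # so lower latency and lower resource usage earn higher scores.
    scores: dict[str, int]


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Return the weighted sum of a candidate's per-criterion scores."""
    missing = weights.keys() - candidate.scores.keys()
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidates from best to worst by weighted score."""
    scored = [(c.name, weighted_score(c, weights)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for illustration only; replace with evaluation results.
    candidates = [
        Candidate("engine_a", {"latency": 5, "quality": 3, "resources": 5}),
        Candidate("engine_b", {"latency": 3, "quality": 4, "resources": 3}),
    ]
    for name, score in rank(candidates, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Running it prints each candidate with its weighted score, best first. The weights are a judgment call, so they belong in the recorded decision alongside the scores.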
## Acceptance Criteria

- [ ] TTS options compared
- [ ] Selected TTS engine documented
- [ ] Voice samples selected
- [ ] Resource requirements documented
- [ ] Decision recorded in architecture docs

## Technical Details

Options to evaluate:

- Piper (lightweight, fast)
- Mimic 3 (high quality)
- Coqui TTS (neural, customizable)
- Other open-source solutions

Considerations:

- Latency for interactive use
- Voice quality and naturalness
- Resource usage
- Customization options
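
Latency for interactive use can be quantified with two numbers. The first is wall-clock synthesis time. The second is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The Python sketch below assumes a hypothetical adapter, `synthesize(text) -> (samples, sample_rate)`, wrapped around each engine. This is not the API of Piper, Mimic 3, or Coqui TTS. `measure` and `fake_engine` are illustrative names. The stand-in engine only sleeps and returns silence, so its numbers say nothing about any real engine.

```python
import statistics
import time
from typing import Callable, Sequence

# Hypothetical adapter signature: each engine under test is wrapped so that it
# takes text and returns (audio_samples, sample_rate_hz).
Synthesizer = Callable[[str], tuple[Sequence[float], int]]


def measure(synthesize: Synthesizer, text: str, runs: int = 5) -> dict[str, float]:
    """Time repeated synthesis of `text` and report latency and real-time factor.

    RTF = synthesis time / duration of the produced audio.
    RTF below 1.0 means audio is produced faster than it plays back.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    synthesize(text)  # warm-up run so one-time initialization is not counted
    timings = []
    audio_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        samples, sample_rate = synthesize(text)
        timings.append(time.perf_counter() - start)
        audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("engine produced no audio; RTF is undefined")
    median_s = statistics.median(timings)
    return {
        "median_latency_s": median_s,
        "audio_s": audio_seconds,
        "rtf": median_s / audio_seconds,
    }


def fake_engine(text: str) -> tuple[list[float], int]:
    """Stand-in engine: sleeps briefly and returns 0.1 s of silence per word."""
    sample_rate = 22_050
    time.sleep(0.01)
    n_samples = int(0.1 * sample_rate) * len(text.split())
    return [0.0] * n_samples, sample_rate


if __name__ == "__main__":
    print(measure(fake_engine, "Dinner is ready in five minutes."))
```

Reporting the median rather than the mean keeps a single slow run from skewing the result. Some engines stream audio in chunks. For those, the time until the first chunk arrives matters more for interactive use than total synthesis time, and measuring it would require a streaming variant of the adapter.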
## Dependencies

- TICKET-004 (architecture): helpful context

## Related Files

- `docs/TTS_EVALUATION.md` (to be created)
- `ARCHITECTURE.md`

## Notes

This work is independent of the LLM logic and can proceed in parallel with other voice work.