- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
# Ticket: Select ASR Engine and Target Hardware

## Ticket Information

- **ID**: TICKET-009
- **Title**: Select ASR Engine and Target Hardware
- **Type**: Research
- **Priority**: High
- **Status**: Done
- **Track**: Voice I/O
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Decide on the ASR (Automatic Speech Recognition) engine and where to deploy it:

- Evaluate engine options: faster-whisper, Whisper.cpp, etc.
- Decide deployment: faster-whisper on the RTX 4080, CPU-only on a small box, or a shared deployment
- Consider model size vs. latency trade-offs
- Document hardware requirements

## Acceptance Criteria

- [x] ASR engine selected: **faster-whisper** (primary)
- [x] Target hardware decided: **RTX 4080 (primary)** or **CPU always-on node (alternative)**
- [x] Model size selected: **small** (or medium if GPU headroom available)
- [x] Latency requirements documented (< 2s target)
- [x] Decision recorded in architecture docs

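The decisions above can be sketched as a small configuration helper. This is a minimal illustration, not the project's implementation: `AsrConfig`, `choose_asr_config`, and `load_model` are hypothetical names, and the `float16` and `int8` compute types are assumed defaults that the ticket does not specify. Only `WhisperModel` and its `device` and `compute_type` arguments come from the faster-whisper library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrConfig:
    model_size: str    # Whisper model size name, e.g. "small"
    device: str        # "cuda" or "cpu"
    compute_type: str  # precision/quantization passed to the engine


def choose_asr_config(gpu_available: bool, gpu_headroom: bool = False) -> AsrConfig:
    """Map the ticket's decisions onto a config (hypothetical helper)."""
    if gpu_available:
        # Primary target: RTX 4080. Use "medium" only if GPU headroom is available.
        size = "medium" if gpu_headroom else "small"
        return AsrConfig(size, "cuda", "float16")  # float16 is an assumption
    # Alternative target: CPU always-on node. int8 is an assumption.
    return AsrConfig("small", "cpu", "int8")


def load_model(cfg: AsrConfig):
    """Load faster-whisper lazily so the selection logic runs without it installed."""
    from faster_whisper import WhisperModel  # requires the faster-whisper package

    return WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)
```

With the library installed, `load_model(choose_asr_config(gpu_available=True))` would return a model whose `transcribe` method accepts an audio file path.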
## Technical Details

Considerations:

- faster-whisper on the RTX 4080: lower latency, higher quality
- CPU-only on a small box: lower cost, higher latency
- Shared deployment: resource contention must be considered
- Model sizes: tiny/base/small/medium, trading latency against quality

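To compare model sizes and deployment targets against the < 2s latency target, a timing harness along the following lines may help. It is a sketch under assumptions: `timed_transcribe` and `meets_target` are hypothetical helpers, and the ticket does not define where the latency interval starts and ends, so this version measures only the wall-clock time of one transcription call.

```python
import time
from typing import Callable, Tuple

LATENCY_TARGET_S = 2.0  # from the ticket: "< 2s target"


def timed_transcribe(transcribe: Callable[[bytes], str], audio: bytes) -> Tuple[str, float]:
    """Run one transcription call and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed = time.perf_counter() - start
    return text, elapsed


def meets_target(elapsed_s: float, target_s: float = LATENCY_TARGET_S) -> bool:
    """True when the measured latency is strictly below the target."""
    return elapsed_s < target_s
```

When wrapping faster-whisper, the `transcribe` callable should consume the returned segments (for example by joining their text) before returning, because the library yields segments lazily and the actual decoding happens during iteration.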
## Dependencies

- TICKET-004 (architecture): helpful context

## Related Files

- `docs/ASR_EVALUATION.md` (created, see Progress Log)
- `ARCHITECTURE.md`

## Notes

This ticket can run in parallel with the TTS and LLM work. The wake-word event flow still needs to be defined so that it is clear when audio capture starts and stops.

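Since the wake-word event flow is not yet defined, the following is only one possible shape for it, written as a two-state machine. Every name here (`CaptureState`, the event strings, `next_state`) is hypothetical, and the choice of end-of-speech and timeout as the stop events is an assumption rather than a decision recorded in this ticket.

```python
from enum import Enum


class CaptureState(Enum):
    IDLE = "idle"
    CAPTURING = "capturing"


# Hypothetical event names; the real flow is still to be defined.
WAKE_WORD = "wake_word_detected"
END_OF_SPEECH = "end_of_speech"
TIMEOUT = "timeout"


def next_state(state: CaptureState, event: str) -> CaptureState:
    """Return the capture state after handling one event."""
    if state is CaptureState.IDLE and event == WAKE_WORD:
        return CaptureState.CAPTURING  # start audio capture
    if state is CaptureState.CAPTURING and event in (END_OF_SPEECH, TIMEOUT):
        return CaptureState.IDLE  # stop capture and hand audio to ASR
    return state  # ignore events that do not apply in the current state
```

Keeping the transition logic in a pure function makes it easy to test independently of the audio and wake-word components.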
## Progress Log

- 2024-01-XX - ASR evaluation document created (`docs/ASR_EVALUATION.md`)
- 2024-01-XX - Selected: faster-whisper with small model
- 2024-01-XX - Deployment: RTX 4080 (primary) or CPU always-on node (alternative)
- 2024-01-XX - Ready for implementation (TICKET-010)