atlas/tickets/done/TICKET-009_asr-engine-selection.md

Ticket: Select ASR Engine and Target Hardware

Ticket Information

  • ID: TICKET-009
  • Title: Select ASR Engine and Target Hardware
  • Type: Research
  • Priority: High
  • Status: Done
  • Track: Voice I/O
  • Milestone: Milestone 1 - Survey & Architecture
  • Created: 2024-01-XX

Description

Decide on the ASR (automatic speech recognition) engine and where to deploy it:

  • Evaluate options: faster-whisper, whisper.cpp, etc.
  • Decide deployment: faster-whisper on the RTX 4080, CPU-only on a small box, or a shared deployment
  • Consider model size vs latency trade-offs
  • Document hardware requirements

Acceptance Criteria

  • ASR engine selected: faster-whisper (primary)
  • Target hardware decided: RTX 4080 (primary) or CPU always-on node (alternative)
  • Model size selected: small (or medium if GPU headroom available)
  • Latency requirements documented (< 2s target)
  • Decision recorded in architecture docs
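The < 2s target above could be checked with a small timing harness. The sketch below is illustrative only: `measure_latencies`, `meets_target`, and `LATENCY_TARGET_S` are hypothetical names, and treating the target as a bound on the worst observed per-clip latency is an assumption, since the ticket does not say how latency is measured.

```python
import time
from typing import Callable, Iterable, List

# Assumption: the "< 2s" target applies to each clip's end-to-end transcription time.
LATENCY_TARGET_S = 2.0


def measure_latencies(transcribe: Callable[[str], str], clips: Iterable[str]) -> List[float]:
    """Time one transcription call per clip and return the wall-clock durations."""
    latencies = []
    for clip in clips:
        start = time.perf_counter()
        transcribe(clip)
        latencies.append(time.perf_counter() - start)
    return latencies


def meets_target(latencies: List[float], target_s: float = LATENCY_TARGET_S) -> bool:
    """True only if at least one clip was measured and the slowest is under the target."""
    return bool(latencies) and max(latencies) < target_s
```

Any callable that takes an audio path and returns text can be passed as `transcribe`, so the same harness would work for the GPU and the CPU deployment.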

Technical Details

Considerations:

  • faster-whisper on the RTX 4080: lower latency, higher quality
  • CPU-only on a small box: lower cost, higher latency
  • Shared deployment: resource contention needs to be considered
  • Model sizes: tiny/base/small/medium for the latency/quality trade-off
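The considerations above can be sketched as a small configuration helper. This is a minimal sketch, not the project's actual code: `AsrConfig`, `choose_asr_config`, `load_model`, and `transcribe` are hypothetical names, and the `float16` (GPU) and `int8` (CPU) compute types are assumptions rather than decisions recorded in this ticket. `WhisperModel` and its `transcribe` method come from the faster-whisper package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrConfig:
    model_size: str    # a faster-whisper size name, e.g. "small" or "medium"
    device: str        # "cuda" or "cpu"
    compute_type: str  # CTranslate2 precision/quantization setting


def choose_asr_config(gpu_available: bool, gpu_headroom: bool = False) -> AsrConfig:
    """Hypothetical mapping of the ticket's decision onto model load settings."""
    if gpu_available:
        # Primary path: small model on the GPU, medium only if there is headroom.
        size = "medium" if gpu_headroom else "small"
        return AsrConfig(size, "cuda", "float16")  # float16 is an assumption
    # Alternative path: CPU always-on node; int8 is an assumption to limit latency.
    return AsrConfig("small", "cpu", "int8")


def load_model(cfg: AsrConfig):
    # Imported lazily so the selection logic runs without faster-whisper installed.
    from faster_whisper import WhisperModel
    return WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)


def transcribe(model, wav_path: str) -> str:
    segments, _info = model.transcribe(wav_path, beam_size=5)
    # `segments` is a generator; the transcription runs as it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

Keeping the choice in one function means a shared deployment could later add its own branch without touching the loading or transcription code.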

Dependencies

  • TICKET-004 (architecture) - helpful context
  • docs/ASR_EVALUATION.md (created; see Progress Log)
  • ARCHITECTURE.md

Notes

Can run in parallel with the TTS and LLM work. The wake-word event flow still needs to be defined so that it is clear when audio capture starts and stops.
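One possible shape for that start/stop flow is a two-state machine. Everything in this sketch is hypothetical: the state names and the `wake_word_detected`, `end_of_speech`, and `timeout` events are placeholders for illustration, since the actual flow has not been defined in this ticket.

```python
from enum import Enum


class CaptureState(Enum):
    IDLE = "idle"            # waiting for the wake word; no audio goes to ASR
    CAPTURING = "capturing"  # audio is being buffered for transcription


# Hypothetical event names; the real wake-word flow is still to be defined.
TRANSITIONS = {
    (CaptureState.IDLE, "wake_word_detected"): CaptureState.CAPTURING,
    (CaptureState.CAPTURING, "end_of_speech"): CaptureState.IDLE,
    (CaptureState.CAPTURING, "timeout"): CaptureState.IDLE,
}


def next_state(state: CaptureState, event: str) -> CaptureState:
    """Return the new state; events with no defined transition leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

A table of transitions keeps the flow easy to review and extend once the real events are agreed on.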

Progress Log

  • 2024-01-XX - ASR evaluation document created (docs/ASR_EVALUATION.md)
  • 2024-01-XX - Selected: faster-whisper with small model
  • 2024-01-XX - Deployment: RTX 4080 (primary) or CPU always-on node (alternative)
  • 2024-01-XX - Ready for implementation (TICKET-010)