ilia bdbf09a9ac feat: Implement voice I/O services (TICKET-006, TICKET-010, TICKET-014)
 TICKET-006: Wake-word Detection Service
- Implemented wake-word detection using openWakeWord
- HTTP/WebSocket server on port 8002
- Real-time detection with configurable threshold
- Event emission for ASR integration
- Location: home-voice-agent/wake-word/

 TICKET-010: ASR Service
- Implemented ASR using faster-whisper
- HTTP endpoint for file transcription
- WebSocket endpoint for streaming transcription
- Support for multiple audio formats
- Auto language detection
- GPU acceleration support
- Location: home-voice-agent/asr/

 TICKET-014: TTS Service
- Implemented TTS using Piper
- HTTP endpoint for text-to-speech synthesis
- Low-latency processing (< 500ms)
- Multiple voice support
- WAV audio output
- Location: home-voice-agent/tts/

 TICKET-047: Updated Hardware Purchases
- Marked Pi5 kit, SSD, microphone, and speakers as purchased
- Updated progress log with purchase status

📚 Documentation:
- Added VOICE_SERVICES_README.md with complete testing guide
- Each service includes README.md with usage instructions
- All services ready for Pi5 deployment

🧪 Testing:
- Created test files for each service
- All imports validated
- FastAPI apps created successfully
- Code passes syntax validation

🚀 Ready for:
- Pi5 deployment
- End-to-end voice flow testing
- Integration with MCP server

Files Added:
- wake-word/detector.py
- wake-word/server.py
- wake-word/requirements.txt
- wake-word/README.md
- wake-word/test_detector.py
- asr/service.py
- asr/server.py
- asr/requirements.txt
- asr/README.md
- asr/test_service.py
- tts/service.py
- tts/server.py
- tts/requirements.txt
- tts/README.md
- tts/test_service.py
- VOICE_SERVICES_README.md

Files Modified:
- tickets/done/TICKET-047_hardware-purchases.md

Files Moved:
- tickets/backlog/TICKET-006_prototype-wake-word-node.md → tickets/done/
- tickets/backlog/TICKET-010_streaming-asr-service.md → tickets/done/
- tickets/backlog/TICKET-014_tts-service.md → tickets/done/
2026-01-12 22:22:38 -05:00


Phone PWA Client

Progressive Web App (PWA) for mobile voice interaction with Atlas.

Status

Planning phase: design and architecture are ready for implementation.

Design Decisions

PWA vs Native

Decision: PWA (Progressive Web App)

Rationale:

  • Cross-platform (iOS, Android, desktop)
  • No app store approval needed
  • Easier updates and deployment
  • Web APIs sufficient for core features:
    • getUserMedia for microphone access
    • WebSocket for real-time communication
    • Service Worker for offline support
    • Push API for notifications
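The capture path built on these APIs can be sketched as follows. This is a minimal illustration, not the implementation: the WebSocket URL, the 250 ms chunk interval, and the binary-chunk protocol are all assumptions.

```javascript
// Sketch: capture microphone audio with getUserMedia/MediaRecorder and
// stream chunks over a WebSocket. Endpoint URL, chunk interval, and the
// "send raw binary chunks" protocol are assumptions, not the real API.
async function streamMicToServer(wsUrl) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ws = new WebSocket(wsUrl);
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });

  recorder.ondataavailable = (event) => {
    if (ws.readyState === WebSocket.OPEN && event.data.size > 0) {
      ws.send(event.data); // one compressed audio chunk
    }
  };

  ws.onopen = () => recorder.start(250); // emit a chunk every 250 ms
  return { recorder, ws };               // caller stops via recorder.stop()
}
```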

Core Features

  1. Voice Capture

    • Tap-to-talk button
    • Optional wake-word (if the browser supports it)
    • Audio streaming to ASR endpoint
    • Visual feedback during recording
  2. Conversation View

    • Message history
    • Agent responses (text + audio)
    • Tool call indicators
    • Timestamps
  3. Audio Playback

    • TTS audio playback
    • Play/pause controls
    • Progress indicator
    • Barge-in support (stop on new input)
  4. Task Management

    • View created tasks
    • Task status updates
    • Quick actions
  5. Notifications

    • Timer/reminder alerts
    • Push notifications (when supported)
    • In-app notifications
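Barge-in (feature 3) amounts to stopping any in-flight playback whenever a new utterance starts. A sketch of that logic, with the player injected so it is not tied to the browser's Audio element:

```javascript
// Sketch: barge-in support. Starting new playback cuts off whatever is
// currently playing. `createPlayer` is injected so the same logic works
// with a real <audio> element or a test stub; the class name is ours.
class BargeInPlayer {
  constructor(createPlayer) {
    this.createPlayer = createPlayer;
    this.current = null;
  }

  play(audioUrl) {
    this.stop(); // barge-in: silence the previous response first
    this.current = this.createPlayer(audioUrl);
    this.current.play();
  }

  stop() {
    if (this.current) {
      this.current.pause();
      this.current = null;
    }
  }
}
```

In the browser, `createPlayer` would simply be `(url) => new Audio(url)`, and `stop()` would also be called when new microphone input begins.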

Technical Stack

  • Framework: Vanilla JavaScript or a lightweight framework (Vue or React)
  • Audio: Web Audio API, MediaRecorder API
  • Communication: WebSocket for real-time, HTTP for REST
  • Storage: IndexedDB for offline messages
  • Service Worker: For offline support and caching

Architecture

Phone PWA
├── index.html          # Main app shell
├── manifest.json       # PWA manifest
├── service-worker.js   # Service worker
├── js/
│   ├── app.js         # Main application
│   ├── audio.js       # Audio capture/playback
│   ├── websocket.js   # WebSocket client
│   ├── ui.js          # UI components
│   └── storage.js     # IndexedDB storage
└── css/
    └── styles.css     # Mobile-first styles
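A minimal `manifest.json` for the shell above might look like the following; the app name, colors, and icon paths are placeholders, not decided values.

```json
{
  "name": "Atlas Voice",
  "short_name": "Atlas",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#111111",
  "theme_color": "#111111",
  "icons": [
    { "src": "icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
```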

API Integration

Endpoints

  • WebSocket: ws://localhost:8000/ws (to be implemented)
  • REST API: http://localhost:8000/api/dashboard/
  • MCP: http://localhost:8000/mcp

Flow

  1. User taps "Talk" button
  2. Capture audio via getUserMedia
  3. Stream to ASR endpoint (WebSocket or HTTP)
  4. Receive transcription
  5. Send to LLM via MCP adapter
  6. Receive response + tool calls
  7. Execute tools if needed
  8. Get TTS audio
  9. Play audio to user
  10. Update conversation view
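Steps 4–10 above can be sketched as a message router on the client. The message `type` values here ('transcription', 'response', 'tool_call', 'tts_audio') are assumptions about a server protocol that is not yet defined:

```javascript
// Sketch: route incoming server messages from the voice pipeline to UI
// handlers. The message types are assumptions -- the real protocol will
// be defined when the WebSocket endpoint is added to the MCP server.
function routeServerMessage(msg, handlers) {
  switch (msg.type) {
    case 'transcription': handlers.onTranscription(msg.text); break;
    case 'response':      handlers.onResponse(msg.text); break;
    case 'tool_call':     handlers.onToolCall(msg.name, msg.args); break;
    case 'tts_audio':     handlers.onAudio(msg.url); break;
    default:              handlers.onError?.(`unknown message type: ${msg.type}`);
  }
}
```

Wired up in the browser, this would sit behind the socket: `ws.onmessage = (e) => routeServerMessage(JSON.parse(e.data), handlers)`.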

Implementation Phases

Phase 1: Basic UI (Can Start Now)

  • HTML structure
  • CSS styling (mobile-first)
  • Basic JavaScript framework
  • Mock conversation view

Phase 2: Audio Capture

  • Microphone access
  • Audio recording
  • Visual feedback
  • Audio format conversion

Phase 3: Communication

  • WebSocket client
  • ASR integration
  • LLM request/response
  • Error handling

Phase 4: Audio Playback

  • TTS audio playback
  • Playback controls
  • Barge-in support

Phase 5: Advanced Features

  • Service worker
  • Offline support
  • Push notifications
  • Task management UI
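The Phase 5 service worker can start as a simple cache-first handler for the app shell. A sketch, where the cache name and pre-cached file list are assumptions (the guard lets the file also load outside a worker context):

```javascript
// Sketch: cache-first service worker for the app shell (Phase 5).
// Cache name and pre-cached files are assumptions based on the
// architecture tree above.
const SHELL_CACHE = 'atlas-shell-v1';
const SHELL_FILES = ['/', '/index.html', '/css/styles.css', '/js/app.js'];

function shellFetchHandler(event) {
  // Serve from cache when possible, fall back to the network.
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
}

// Only register listeners when actually running as a service worker.
if (typeof self !== 'undefined' && 'caches' in globalThis) {
  self.addEventListener('install', (event) => {
    event.waitUntil(caches.open(SHELL_CACHE).then((c) => c.addAll(SHELL_FILES)));
  });
  self.addEventListener('fetch', shellFetchHandler);
}
```

This supports the incremental rollout noted below: the worker can ship with shell caching only, then grow offline message sync and push handling later.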

Dependencies

  • TICKET-010: ASR Service (for audio → text)
  • TICKET-014: TTS Service (for text → audio)
  • Can start with mocks for UI development

Notes

  • Can begin UI development immediately with mocked endpoints
  • WebSocket endpoint needs to be added to the MCP server
  • Service worker can be added incrementally
  • Push notifications require HTTPS (use local cert for testing)