# Phone PWA Client

Progressive Web App (PWA) for mobile voice interaction with Atlas.
## Status

Planning phase: design and architecture are ready for implementation.
## Design Decisions

### PWA vs Native

**Decision:** PWA (Progressive Web App)

**Rationale:**

- Cross-platform (iOS, Android, desktop)
- No app store approval needed
- Easier updates and deployment
- Web APIs are sufficient for the core features:
  - `getUserMedia` for microphone access
  - WebSocket for real-time communication
  - Service Worker for offline support
  - Push API for notifications
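A quick feature-detection sketch for the APIs listed above. The helper name and shape are illustrative, not part of the plan; the global object is passed in so the check can run outside a browser:

```javascript
// Return the names of required Web APIs that are missing from the
// given global object (e.g. `window` in the browser).
function missingFeatures(globals) {
  const required = {
    getUserMedia: (g) => !!(g.navigator && g.navigator.mediaDevices &&
                            g.navigator.mediaDevices.getUserMedia),
    WebSocket: (g) => typeof g.WebSocket === "function",
    serviceWorker: (g) => !!(g.navigator && g.navigator.serviceWorker),
    pushAPI: (g) => typeof g.PushManager === "function",
  };
  return Object.keys(required).filter((name) => !required[name](globals));
}

// Example with a fake global that lacks only the Push API:
const fakeWindow = {
  navigator: { mediaDevices: { getUserMedia: () => {} }, serviceWorker: {} },
  WebSocket: function () {},
};
console.log(missingFeatures(fakeWindow)); // → [ 'pushAPI' ]
```

In the real app the result would drive graceful degradation, e.g. hiding the push-notification toggle when `pushAPI` is missing.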
## Core Features

1. **Voice Capture**
   - Tap-to-talk button
   - Optional wake-word detection (if the browser supports it)
   - Audio streaming to the ASR endpoint
   - Visual feedback during recording

2. **Conversation View**
   - Message history
   - Agent responses (text + audio)
   - Tool call indicators
   - Timestamps

3. **Audio Playback**
   - TTS audio playback
   - Play/pause controls
   - Progress indicator
   - Barge-in support (stop playback on new input)

4. **Task Management**
   - View created tasks
   - Task status updates
   - Quick actions

5. **Notifications**
   - Timer/reminder alerts
   - Push notifications (where supported)
   - In-app notifications
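The tap-to-talk and barge-in behaviour can be sketched as a small controller. All names here are hypothetical, and the recorder and player are injected (a `MediaRecorder` wrapper and an `<audio>` wrapper in the browser) so the logic stays testable without one:

```javascript
// Minimal push-to-talk controller: starting a new capture stops any
// TTS playback that is still running (barge-in).
class TalkController {
  constructor(recorder, player) {
    this.recorder = recorder; // e.g. a MediaRecorder wrapper
    this.player = player;     // e.g. an HTMLAudioElement wrapper
    this.recording = false;
  }
  tap() {
    if (this.recording) {
      this.recorder.stop();
      this.recording = false;
    } else {
      if (this.player.playing) this.player.stop(); // barge-in
      this.recorder.start();
      this.recording = true;
    }
  }
}

// Usage with in-memory fakes:
const log = [];
const ctl = new TalkController(
  { start: () => log.push("rec-start"), stop: () => log.push("rec-stop") },
  { playing: true, stop() { this.playing = false; log.push("tts-stop"); } },
);
ctl.tap(); // stops TTS, starts recording
ctl.tap(); // stops recording
console.log(log); // → [ 'tts-stop', 'rec-start', 'rec-stop' ]
```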
## Technical Stack

- **Framework:** vanilla JavaScript or a lightweight framework (Vue/React)
- **Audio:** Web Audio API, MediaRecorder API
- **Communication:** WebSocket for real-time streaming, HTTP for REST calls
- **Storage:** IndexedDB for offline messages
- **Service Worker:** offline support and caching
## Architecture

```
Phone PWA
├── index.html          # Main app shell
├── manifest.json       # PWA manifest
├── service-worker.js   # Service worker
├── js/
│   ├── app.js          # Main application
│   ├── audio.js        # Audio capture/playback
│   ├── websocket.js    # WebSocket client
│   ├── ui.js           # UI components
│   └── storage.js      # IndexedDB storage
└── css/
    └── styles.css      # Mobile-first styles
```
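A minimal sketch of the `manifest.json` from the tree above; the app name, colours, and icon paths are placeholders:

```json
{
  "name": "Atlas Voice",
  "short_name": "Atlas",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#111111",
  "theme_color": "#111111",
  "icons": [
    { "src": "icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
```

`display: "standalone"` is what makes the installed app open without browser chrome, which matters for a voice-first UI.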
## API Integration

### Endpoints

- WebSocket: `ws://localhost:8000/ws` (to be implemented)
- REST API: `http://localhost:8000/api/dashboard/`
- MCP: `http://localhost:8000/mcp`
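A sketch of the client side of the (to-be-implemented) WebSocket endpoint. The message shape is an assumption, and the WebSocket constructor is injected (`window.WebSocket` in the browser, a fake in tests):

```javascript
// Thin client wrapper around the planned ws://localhost:8000/ws endpoint.
// WS is the WebSocket constructor to use.
function connect(WS, url, onMessage) {
  const ws = new WS(url);
  ws.onmessage = (ev) => onMessage(JSON.parse(ev.data));
  return {
    sendAudioChunk: (chunk) => ws.send(chunk), // binary frame
    sendText: (text) => ws.send(JSON.stringify({ type: "text", text })),
    close: () => ws.close(),
  };
}

// Usage with a fake WebSocket:
const sent = [];
class FakeWS {
  constructor(url) { this.url = url; }
  send(data) { sent.push(data); }
  close() {}
}
const client = connect(FakeWS, "ws://localhost:8000/ws", () => {});
client.sendText("hello");
console.log(sent[0]); // → {"type":"text","text":"hello"}
```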
### Flow

1. User taps the "Talk" button
2. Capture audio via `getUserMedia`
3. Stream audio to the ASR endpoint (WebSocket or HTTP)
4. Receive the transcription
5. Send it to the LLM via the MCP adapter
6. Receive the response plus any tool calls
7. Execute tools if needed
8. Get TTS audio
9. Play the audio to the user
10. Update the conversation view
## Implementation Phases

### Phase 1: Basic UI (Can Start Now)

- HTML structure
- CSS styling (mobile-first)
- Basic JavaScript framework
- Mock conversation view

### Phase 2: Audio Capture

- Microphone access
- Audio recording
- Visual feedback
- Audio format conversion
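On the format-conversion item: ASR services typically expect 16-bit PCM, while the Web Audio API yields `Float32Array` samples in the range −1.0..1.0. A common conversion, as a sketch (sample rate and channel negotiation are assumed to happen elsewhere):

```javascript
// Convert Web Audio float samples (-1.0..1.0) to 16-bit signed PCM,
// clamping out-of-range values.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

console.log(Array.from(floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5]))));
// → [ 0, 32767, -32768, 16383 ]
```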
### Phase 3: Communication

- WebSocket client
- ASR integration
- LLM request/response
- Error handling

### Phase 4: Audio Playback

- TTS audio playback
- Playback controls
- Barge-in support

### Phase 5: Advanced Features

- Service worker
- Offline support
- Push notifications
- Task management UI
## Dependencies

- TICKET-010: ASR Service (audio → text)
- TICKET-014: TTS Service (text → audio)
- UI development can start against mocks
## Notes

- UI development can begin immediately with mocked endpoints
- A WebSocket endpoint still needs to be added to the MCP server
- The service worker can be added incrementally
- Push notifications require HTTPS (use a local certificate for testing)