# Phone PWA Client

Progressive Web App (PWA) for mobile voice interaction with Atlas.

## Status

**Planning Phase** - Design and architecture ready for implementation.

## Design Decisions

### PWA vs Native

**Decision: PWA (Progressive Web App)**

**Rationale:**
- Cross-platform (iOS, Android, desktop)
- No app store approval needed
- Easier updates and deployment
- Web APIs sufficient for core features:
  - `getUserMedia` for microphone access
  - WebSocket for real-time communication
  - Service Worker for offline support
  - Push API for notifications

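The rationale above depends on four Web APIs being present on the device. A minimal sketch of a startup check follows; the function names, the returned field names, and the split into "core" versus optional features are assumptions for illustration, not part of the design. The `env` parameter stands in for the browser global so the check can also be exercised outside a browser.

```javascript
// Report which of the Web APIs named in the rationale are available.
// `env` is the browser global (window or self); pass a stub object in tests.
function detectCapabilities(env) {
  const nav = env.navigator || {};
  return {
    microphone: Boolean(
      nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function"
    ),
    webSocket: typeof env.WebSocket === "function",
    serviceWorker: "serviceWorker" in nav,
    push: "PushManager" in env,
  };
}

// Assumption: voice capture and real-time transport are mandatory,
// while offline support and push can degrade gracefully.
function missingCoreFeatures(caps) {
  return ["microphone", "webSocket"].filter((name) => !caps[name]);
}
```

In the app shell this could be called as `missingCoreFeatures(detectCapabilities(window))` to decide whether to show an "unsupported browser" message.
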
### Core Features

1. **Voice Capture**
   - Tap-to-talk button
   - Optional wake-word (if browser supports)
   - Audio streaming to ASR endpoint
   - Visual feedback during recording

2. **Conversation View**
   - Message history
   - Agent responses (text + audio)
   - Tool call indicators
   - Timestamps

3. **Audio Playback**
   - TTS audio playback
   - Play/pause controls
   - Progress indicator
   - Barge-in support (stop on new input)

4. **Task Management**
   - View created tasks
   - Task status updates
   - Quick actions

5. **Notifications**
   - Timer/reminder alerts
   - Push notifications (when supported)
   - In-app notifications

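Tap-to-talk capture and barge-in interact: a new tap during playback has to stop the audio and start recording. One way to keep that logic in one place is a small state table, sketched below. The state names, event names, and function names are all hypothetical; the design does not prescribe them.

```javascript
// Hypothetical UI states and the events that move between them.
const transitions = {
  idle: { TALK_PRESSED: "recording" },
  recording: { TALK_RELEASED: "processing", CANCEL: "idle" },
  processing: { RESPONSE_READY: "playing", ERROR: "idle" },
  // Pressing "Talk" while audio is playing is the barge-in case.
  playing: { PLAYBACK_ENDED: "idle", TALK_PRESSED: "recording" },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Return the next state, or the current one if the event does not apply.
function nextState(state, event) {
  if (!has(transitions, state) || !has(transitions[state], event)) {
    return state;
  }
  return transitions[state][event];
}

// True when the caller should stop TTS playback before recording starts.
function isBargeIn(state, event) {
  return state === "playing" && event === "TALK_PRESSED";
}
```

The UI layer would check `isBargeIn` first, stop the audio element if it returns true, and then apply `nextState`.
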
## Technical Stack

- **Framework**: Vanilla JavaScript or lightweight framework (Vue/React)
- **Audio**: Web Audio API, MediaRecorder API
- **Communication**: WebSocket for real-time, HTTP for REST
- **Storage**: IndexedDB for offline messages
- **Service Worker**: For offline support and caching

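Browsers differ in which container and codec MediaRecorder can produce, so the recording format usually has to be negotiated at runtime. The sketch below picks the first supported type from a preference list. The list itself and the function name are assumptions; `MediaRecorder.isTypeSupported` is the standard static method and is passed in as a callback so the function has no direct browser dependency.

```javascript
// Assumed preference order; adjust to whatever the ASR service handles best.
const PREFERRED_TYPES = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Return the first supported MIME type, or "" to let the browser choose.
function pickRecordingType(isTypeSupported, candidates = PREFERRED_TYPES) {
  return candidates.find((type) => isTypeSupported(type)) || "";
}

// Browser usage (not executed here):
//   const mimeType = pickRecordingType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```
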
## Architecture

```
Phone PWA
├── index.html          # Main app shell
├── manifest.json       # PWA manifest
├── service-worker.js   # Service worker
├── js/
│   ├── app.js          # Main application
│   ├── audio.js        # Audio capture/playback
│   ├── websocket.js    # WebSocket client
│   ├── ui.js           # UI components
│   └── storage.js      # IndexedDB storage
└── css/
    └── styles.css      # Mobile-first styles
```

## API Integration

### Endpoints

- **WebSocket**: `ws://localhost:8000/ws` (to be implemented)
- **REST API**: `http://localhost:8000/api/dashboard/`
- **MCP**: `http://localhost:8000/mcp`

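The endpoints above are hard-coded to `localhost`, which will not match a phone talking to a server elsewhere on the network. A minimal sketch of deriving all three URLs from one base address follows. The function name and the idea of a single configurable base URL are assumptions; the paths are the ones listed above. It also switches to `wss:` when the base is HTTPS, since browsers generally block plain `ws:` connections from secure pages.

```javascript
// Build the three endpoint URLs from a single server base URL.
function buildEndpoints(baseUrl) {
  const base = new URL(baseUrl);
  const ws = new URL("/ws", base);
  // Match the WebSocket scheme to the page's security level.
  ws.protocol = base.protocol === "https:" ? "wss:" : "ws:";
  return {
    webSocket: ws.href,
    rest: new URL("/api/dashboard/", base).href,
    mcp: new URL("/mcp", base).href,
  };
}
```

In the browser, passing `window.location.origin` would work when the PWA is served by the same host as the API, which is itself an assumption about deployment.
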
### Flow

1. User taps "Talk" button
2. Capture audio via `getUserMedia`
3. Stream to ASR endpoint (WebSocket or HTTP)
4. Receive transcription
5. Send to LLM via MCP adapter
6. Receive response + tool calls
7. Execute tools if needed
8. Get TTS audio
9. Play audio to user
10. Update conversation view

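Steps 3 to 10 of the flow can be sketched as one async function that receives its collaborators as arguments. Every name here (`runTurn`, `transcribe`, `askAgent`, `runTool`, `synthesize`, `play`, `render`, and the shape of the reply object) is hypothetical. How tool results are used afterwards is not specified by the flow, so this sketch only collects them.

```javascript
// One conversational turn, starting from already-captured audio.
// `services` supplies the real or mocked implementations of each step.
async function runTurn(audio, services) {
  const { transcribe, askAgent, runTool, synthesize, play, render } = services;

  const text = await transcribe(audio); // steps 3-4: ASR
  const reply = await askAgent(text); // steps 5-6: LLM via MCP adapter

  // Step 7: run any tool calls the agent asked for, in order.
  const toolResults = [];
  for (const call of reply.toolCalls || []) {
    toolResults.push(await runTool(call));
  }

  const speech = await synthesize(reply.text); // step 8: TTS
  await play(speech); // step 9: playback
  render({ user: text, agent: reply.text, toolResults }); // step 10: UI

  return { text: reply.text, toolResults };
}
```

Because the steps are injected, the same function can run against mocks during UI development and against the ASR and TTS services later.
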
## Implementation Phases

### Phase 1: Basic UI (Can Start Now)
- [ ] HTML structure
- [ ] CSS styling (mobile-first)
- [ ] Basic JavaScript framework
- [ ] Mock conversation view

### Phase 2: Audio Capture
- [ ] Microphone access
- [ ] Audio recording
- [ ] Visual feedback
- [ ] Audio format conversion

### Phase 3: Communication
- [ ] WebSocket client
- [ ] ASR integration
- [ ] LLM request/response
- [ ] Error handling

### Phase 4: Audio Playback
- [ ] TTS audio playback
- [ ] Playback controls
- [ ] Barge-in support

### Phase 5: Advanced Features
- [ ] Service worker
- [ ] Offline support
- [ ] Push notifications
- [ ] Task management UI

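For the "Audio format conversion" item in Phase 2, one likely case is turning Web Audio samples into 16-bit PCM. The Web Audio API exposes samples as 32-bit floats in the range -1 to 1. Whether the ASR service actually needs 16-bit PCM is an assumption; it may accept the recorder's compressed output directly. The function name is hypothetical.

```javascript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves have different magnitudes in int16.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

The resulting `Int16Array` can be sent over a WebSocket through its `buffer` property.
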
## Dependencies

- TICKET-010: ASR Service (for audio → text)
- TICKET-014: TTS Service (for text → audio)
- Can start with mocks for UI development

## Notes

- Can begin UI development immediately with mocked endpoints
- WebSocket endpoint needs to be added to MCP server
- Service worker can be added incrementally
- Push notifications require HTTPS (use local cert for testing)