# Phone PWA Client

Progressive Web App (PWA) for mobile voice interaction with Atlas.

## Status

**Planning Phase** - Design and architecture ready for implementation.

## Design Decisions

### PWA vs Native

**Decision: PWA (Progressive Web App)**

**Rationale:**

- Cross-platform (iOS, Android, desktop)
- No app store approval needed
- Easier updates and deployment
- Web APIs are sufficient for the core features:
  - `getUserMedia` for microphone access
  - WebSocket for real-time communication
  - Service Worker for offline support
  - Push API for notifications

### Core Features

1. **Voice Capture**
   - Tap-to-talk button
   - Optional wake word (if the browser supports it)
   - Audio streaming to the ASR endpoint
   - Visual feedback during recording
2. **Conversation View**
   - Message history
   - Agent responses (text + audio)
   - Tool call indicators
   - Timestamps
3. **Audio Playback**
   - TTS audio playback
   - Play/pause controls
   - Progress indicator
   - Barge-in support (stop on new input)
4. **Task Management**
   - View created tasks
   - Task status updates
   - Quick actions
5. **Notifications**
   - Timer/reminder alerts
   - Push notifications (when supported)
   - In-app notifications

## Technical Stack

- **Framework**: Vanilla JavaScript or a lightweight framework (Vue/React)
- **Audio**: Web Audio API, MediaRecorder API
- **Communication**: WebSocket for real-time, HTTP for REST
- **Storage**: IndexedDB for offline messages
- **Service Worker**: For offline support and caching

## Architecture

```
Phone PWA
├── index.html          # Main app shell
├── manifest.json       # PWA manifest
├── service-worker.js   # Service worker
├── js/
│   ├── app.js          # Main application
│   ├── audio.js        # Audio capture/playback
│   ├── websocket.js    # WebSocket client
│   ├── ui.js           # UI components
│   └── storage.js      # IndexedDB storage
└── css/
    └── styles.css      # Mobile-first styles
```

## API Integration

### Endpoints

- **WebSocket**: `ws://localhost:8000/ws` (to be implemented)
- **REST API**: `http://localhost:8000/api/dashboard/`
- **MCP**: `http://localhost:8000/mcp`

### Flow

1. User taps the "Talk" button
2. Capture audio via `getUserMedia`
3. Stream to the ASR endpoint (WebSocket or HTTP)
4. Receive transcription
5. Send to the LLM via the MCP adapter
6. Receive response + tool calls
7. Execute tools if needed
8. Get TTS audio
9. Play audio to the user
10. Update conversation view

## Implementation Phases

### Phase 1: Basic UI (Can Start Now)

- [ ] HTML structure
- [ ] CSS styling (mobile-first)
- [ ] Basic JavaScript framework
- [ ] Mock conversation view

### Phase 2: Audio Capture

- [ ] Microphone access
- [ ] Audio recording
- [ ] Visual feedback
- [ ] Audio format conversion

### Phase 3: Communication

- [ ] WebSocket client
- [ ] ASR integration
- [ ] LLM request/response
- [ ] Error handling

### Phase 4: Audio Playback

- [ ] TTS audio playback
- [ ] Playback controls
- [ ] Barge-in support

### Phase 5: Advanced Features

- [ ] Service worker
- [ ] Offline support
- [ ] Push notifications
- [ ] Task management UI

## Dependencies

- TICKET-010: ASR Service (for audio → text)
- TICKET-014: TTS Service (for text → audio)
- Can start with mocks for UI development

## Notes

- Can begin UI development immediately with mocked endpoints
- WebSocket endpoint needs to be added to the MCP server
- Service worker can be added incrementally
- Push notifications require HTTPS (use a local certificate for testing)
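Since `manifest.json` is part of the planned app shell, a minimal manifest sketch is shown below. The app name, colors, and icon paths are placeholders, not decided values; `display: "standalone"` is what gives the installed PWA its app-like chrome on phones.

```json
{
  "name": "Atlas",
  "short_name": "Atlas",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#111111",
  "theme_color": "#111111",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
```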
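The Phase 2 capture work (microphone access, recording, tap-to-talk) could take roughly the following shape in `js/audio.js`. This is a sketch, not a final API: `pickMimeType` and `startTapToTalk` are hypothetical names, and the browser-only calls (`navigator.mediaDevices`, `MediaRecorder`) live inside the functions so the selection logic stays testable on its own.

```javascript
// Sketch for js/audio.js (Phase 2) -- function names are placeholders.

// Pick the first recording MIME type the browser supports. `isSupported`
// is injectable (in the browser, pass MediaRecorder.isTypeSupported) so
// the selection logic can run anywhere.
function pickMimeType(
  isSupported,
  candidates = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"]
) {
  return candidates.find((t) => isSupported(t)) ?? "";
}

// Tap-to-talk: start capturing on tap, return a stop() function that
// resolves to the recorded Blob (ready to send to the ASR endpoint).
async function startTapToTalk() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.start(250); // small timeslice so recording UI feedback can tick

  return () =>
    new Promise((resolve) => {
      recorder.onstop = () => {
        stream.getTracks().forEach((t) => t.stop()); // release the mic
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
}
```

Returning a `stop()` closure keeps the tap-to-talk button handler simple: call `startTapToTalk()` on press, await the returned function on release.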
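The flow above implies a small client-side message protocol for the (not yet implemented) WebSocket endpoint. As a sketch for `js/websocket.js`, assuming a JSON envelope with a `type` field — the envelope, the type names, and both function names are assumptions, to be revised once the endpoint lands on the MCP server:

```javascript
// Sketch for js/websocket.js -- the {type, ...} envelope is an assumed
// schema, not a defined protocol.

// Build the outgoing frame for a finished audio capture (flow steps 2-3).
function makeAudioMessage(base64Audio, mimeType) {
  return JSON.stringify({ type: "audio", mimeType, data: base64Audio });
}

// Route one incoming frame to the matching UI handler. `handlers` is an
// object like { transcription, response, tts_audio, error }; any handler
// may be omitted.
function routeMessage(rawFrame, handlers) {
  let msg;
  try {
    msg = JSON.parse(rawFrame);
  } catch {
    return handlers.error?.(new Error("malformed frame"));
  }
  switch (msg.type) {
    case "transcription": // flow step 4: show what ASR heard
      return handlers.transcription?.(msg.text);
    case "response": // flow step 6: agent text + tool call indicators
      return handlers.response?.(msg.text, msg.toolCalls ?? []);
    case "tts_audio": // flow step 8: audio payload for playback
      return handlers.tts_audio?.(msg.data);
    default:
      return handlers.error?.(new Error(`unknown type: ${msg.type}`));
  }
}
```

Keeping routing as a pure function of the frame and a handler table means the mock endpoints planned for Phase 1 can drive the real conversation view unchanged.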