atlas/home-voice-agent/clients/phone/README.md

# Phone PWA Client

Progressive Web App (PWA) for mobile voice interaction with Atlas.

## Status

**Planning Phase** - Design and architecture ready for implementation.

## Design Decisions

### PWA vs Native

**Decision: PWA (Progressive Web App)**

**Rationale:**
- Cross-platform (iOS, Android, desktop)
- No app store approval needed
- Easier updates and deployment
- Web APIs sufficient for core features:
  - `getUserMedia` for microphone access
  - WebSocket for real-time communication
  - Service Worker for offline support
  - Push API for notifications

### Core Features

1. **Voice Capture**
   - Tap-to-talk button
   - Optional wake-word (if browser supports)
   - Audio streaming to ASR endpoint
   - Visual feedback during recording

2. **Conversation View**
   - Message history
   - Agent responses (text + audio)
   - Tool call indicators
   - Timestamps

3. **Audio Playback**
   - TTS audio playback
   - Play/pause controls
   - Progress indicator
   - Barge-in support (stop on new input)

4. **Task Management**
   - View created tasks
   - Task status updates
   - Quick actions

5. **Notifications**
   - Timer/reminder alerts
   - Push notifications (when supported)
   - In-app notifications

## Technical Stack

- **Framework**: Vanilla JavaScript or lightweight framework (Vue/React)
- **Audio**: Web Audio API, MediaRecorder API
- **Communication**: WebSocket for real-time, HTTP for REST
- **Storage**: IndexedDB for offline messages
- **Service Worker**: For offline support and caching

## Architecture

```
Phone PWA
├── index.html          # Main app shell
├── manifest.json       # PWA manifest
├── service-worker.js   # Service worker
├── js/
│   ├── app.js         # Main application
│   ├── audio.js       # Audio capture/playback
│   ├── websocket.js   # WebSocket client
│   ├── ui.js          # UI components
│   └── storage.js     # IndexedDB storage
└── css/
    └── styles.css     # Mobile-first styles
```

## API Integration

### Endpoints

- **WebSocket**: `ws://localhost:8000/ws` (to be implemented)
- **REST API**: `http://localhost:8000/api/dashboard/`
- **MCP**: `http://localhost:8000/mcp`

### Flow

1. User taps "Talk" button
2. Capture audio via `getUserMedia`
3. Stream to ASR endpoint (WebSocket or HTTP)
4. Receive transcription
5. Send to LLM via MCP adapter
6. Receive response + tool calls
7. Execute tools if needed
8. Get TTS audio
9. Play audio to user
10. Update conversation view

## Implementation Phases

### Phase 1: Basic UI (Can Start Now)
- [ ] HTML structure
- [ ] CSS styling (mobile-first)
- [ ] Basic JavaScript framework
- [ ] Mock conversation view

### Phase 2: Audio Capture
- [ ] Microphone access
- [ ] Audio recording
- [ ] Visual feedback
- [ ] Audio format conversion

### Phase 3: Communication
- [ ] WebSocket client
- [ ] ASR integration
- [ ] LLM request/response
- [ ] Error handling

### Phase 4: Audio Playback
- [ ] TTS audio playback
- [ ] Playback controls
- [ ] Barge-in support

### Phase 5: Advanced Features
- [ ] Service worker
- [ ] Offline support
- [ ] Push notifications
- [ ] Task management UI

## Dependencies

- TICKET-010: ASR Service (for audio → text)
- TICKET-014: TTS Service (for text → audio)
- Can start with mocks for UI development

## Notes

- Can begin UI development immediately with mocked endpoints
- WebSocket endpoint needs to be added to MCP server
- Service worker can be added incrementally
- Push notifications require HTTPS (use local cert for testing)