# Project: On-Device Multimodal AI Assistant - Context for Cursor

## Project Summary

An Android app that lets users select any on-screen element via a circle/touch gesture, then routes the captured content (text, image, audio, file) to local AI agents for transcription, summarization, or reply. The system runs fully on-device, with a persistent dialogue agent and a privacy-first, no-cloud architecture.

## CORE TECHNOLOGY/STACK

- Android (minSdk 27+, targetSdk 34), Jetpack Compose for overlay UI
- AccessibilityService overlay (`TYPE_ACCESSIBILITY_OVERLAY`)
- MediaProjection API for screenshots (if needed for region capture)
- Gesture tracking: custom View/Canvas for circle recognition
- Speech-to-text: Vosk, DeepSpeech, or PocketSphinx (offline STT)
- LLM/reasoning: MLC Chat, SmolChat, or Edge Gallery (on-device Llama 3, Phi-3, Gemma, or Qwen in GGUF format)
- Vision/image: ML Kit, TFLite, or ONNX lightweight models for image/attachment detection and classification
- Voice commands: Porcupine, Vosk, or PocketSphinx for wake-word and command recognition
- Data privacy: no internet/network calls; all app permissions and data are local-only

## AGENT DIALOGUE FLOW

1. User draws/circles a region on screen using the overlay.
2. The selected region is classified (text, image, audio, etc.).
3. The agent analyzes the content and responds with a summary, transcription, or explanation.
4. The dialogue persists: the agent maintains context and keeps responding until "execute" or the user closes it.
5. Supports both gesture and on-device voice commands.
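Step 1 of the flow above (turning a raw touch path into a capture region) can be sketched in pure Kotlin. This is a minimal sketch with illustrative names — `PointF`, `Region`, and `closedGestureRegion` are not existing project code, and the real overlay view would feed in `MotionEvent` coordinates:

```kotlin
// Sketch: decide whether a touch path forms a closed "circle" gesture
// and, if so, return the screen region it encloses.
data class PointF(val x: Float, val y: Float)
data class Region(val left: Float, val top: Float, val right: Float, val bottom: Float)

/** Returns the bounding region if the path closes on itself, else null. */
fun closedGestureRegion(
    path: List<PointF>,
    closeThresholdPx: Float = 48f, // how near start/end must be to count as "closed"
    minPoints: Int = 8,            // reject accidental taps and tiny flicks
): Region? {
    if (path.size < minPoints) return null
    val gap = kotlin.math.hypot(
        path.last().x - path.first().x,
        path.last().y - path.first().y,
    )
    if (gap > closeThresholdPx) return null // path never closed: not a circle
    return Region(
        left = path.minOf { it.x },
        top = path.minOf { it.y },
        right = path.maxOf { it.x },
        bottom = path.maxOf { it.y },
    )
}
```

Rejecting open paths up front is one cheap way to address the "false positive handling" bottleneck noted below; thresholds would need tuning per device density.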
## DIRECTORY STRUCTURE

- /src/
  - /accessibility/ (service, overlay manager)
  - /gesture/ (gesture view, region processor)
  - /model/ (STT and LLM wrappers, model runners)
  - /vision/ (content detector)
  - /agent/ (state manager, dialogue machine)
  - /ui/ (Compose overlays/components)
  - /privacy/ (cache controls, settings)
- /tests/ (unit and integration tests)
- /.notes/ (project overview, task list, directory structure, meeting notes)

## KNOWN BOTTLENECKS

- Model size/footprint for LLMs and STT; ensure device compatibility
- Reliable overlay and region capture across device OEMs (esp. MIUI/Samsung)
- Gesture recognition and false-positive handling

## CODING GUIDELINES FOR CURSOR

- Use Kotlin for all Android code.
- Break features into composable modules; favor composition over inheritance.
- Annotate service, overlay, and ML inference points for easy lookup.
- Prioritize async, non-blocking calls for ML inference.
- Ensure all data is handled in the local app context, with zero network calls.

## CONTRIBUTOR GUIDELINES

- Document module purpose in file headers.
- Reference related tickets in code comments.
- Use provided ticket IDs for TODOs.

## SESSION INITIALIZATION

When starting a new Cursor session:

1. **Read context files first:**
   - `.notes/project_overview.md` - High-level project goals and modules
   - `.notes/directory_structure.md` - File organization and relationships
   - `.notes/task_list.md` - Current tasks and priorities
   - `.notes/meeting_notes.md` - Recent decisions and discussions
2. **Review architectural constraints:**
   - `ARCHITECTURE.md` - System design and technical details
   - `.cursorrules` - Development rules and guidelines
3. **Before making changes:**
   - Check module purpose and boundaries
   - Verify no network/cloud dependencies are introduced
   - Ensure async patterns for ML operations
   - Follow Kotlin and Compose best practices
   - Document architectural changes

## MILESTONE OVERVIEW

### Milestone 1: Proof of Concept (POC)

**Goal:** Isolated modules working for gesture selection, content classification, and local AI responses

**Tickets:**

1. Implement Accessibility Service overlay (touch/circle gesture support)
2. Region processor: extract the area from the gesture, screenshot/capture it
3. Connect region processor output to the content-type detector (audio/text/image)
4. Integrate Vosk/DeepSpeech and run STT on the selected region (if audio)
5. Integrate MLC Chat or SmolChat for LLM reasoning (if text)
6. Prototype a UI overlay with agent feedback/suggestions (Jetpack Compose)
7. Write basic tests and documentation for modules

**Deliverable:** One-click deployable demo APK

### Milestone 2: Minimum Viable Product (MVP)

**Goal:** Installable app showing the main workflow with a persistent dialogue agent

**Tickets:**

8. Build the dialogue agent state machine; maintain context until execution
9. Integrate a voice interface for hands-free command routing
10. Add privacy controls and local data cache management
11. Polish the Compose UI with context display and action triggers
12. Documentation pass: usage guide, contributor guidelines
13. Test wider device compatibility; optimize model footprint

**Deliverable:** Fully functional app with gesture/voice selection, local AI inference, and agent interaction

### Milestone 3: Full Feature Release

**Goal:** Production-ready application with a plugin system

**Focus Areas:**

- Advanced content classification
- Plugin/module system for extensions
- Export/import of settings and session logs
- Issue triage and contributor onboarding
- Performance optimization
- Comprehensive documentation

## COMMON DEVELOPMENT TASKS

### Adding a New Module

1. Create a directory under `/src/` with a descriptive name
2. Add a `.gitkeep` with the module description
3. Update `.notes/directory_structure.md`
4. Document the module purpose in file headers
5. Add corresponding tests in `/tests/`

### Integrating ML Models

1. Ensure the model is local/offline only
2. Wrap model loading in an async operation
3. Add error handling for model failures
4. Document model requirements (size, device compatibility)
5. Add model configuration to privacy settings

### Working with the Accessibility Service

1. Review Android accessibility best practices
2. Test overlay behavior on multiple OEMs
3. Handle permission requests gracefully
4. Document accessibility features

### UI Development with Compose

1. Use Jetpack Compose for all UI components
2. Follow Material Design guidelines
3. Ensure the overlay UI is non-intrusive
4. Test UI responsiveness and performance

## TESTING STRATEGY

- Unit tests for business logic
- Integration tests for module interactions
- UI tests for Compose components
- Device compatibility testing (multiple OEMs)
- Performance and battery-impact testing
- Accessibility feature testing

## PRIVACY CHECKLIST

Before submitting any code:

- [ ] No network calls added
- [ ] No cloud service dependencies
- [ ] All data stays on device
- [ ] Uses local-only ML models
- [ ] Respects user privacy settings
- [ ] Data can be cleared by the user

## PROJECT STATUS

**Current Phase:** Initial Setup
**Next Steps:** Begin Milestone 1 POC implementation
**Priority:** Accessibility Service overlay and gesture recognition

## USEFUL REFERENCES

- [Android Accessibility Service](https://developer.android.com/reference/android/accessibilityservice/AccessibilityService)
- [Jetpack Compose](https://developer.android.com/jetpack/compose)
- [Vosk STT](https://alphacephei.com/vosk/)
- [MLC Chat](https://mlc.ai/mlc-llm/)
- [Kotlin Coroutines](https://kotlinlang.org/docs/coroutines-overview.html)

---

**Remember:** This project is privacy-first.
Every feature must work entirely on-device with no external network dependencies for core AI functionality.
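The "Integrating ML Models" steps above can be sketched as a thin loader that validates a local file before handing it to a runtime. This is a hedged sketch: `LlmHandle`, `loadLocalModel`, and the size budget are hypothetical stand-ins for whichever runtime (MLC Chat, SmolChat, Vosk) gets wired in, and in the app this would be called from a coroutine on `Dispatchers.Default` so the UI thread never blocks, per the async guideline:

```kotlin
import java.io.File

// Hypothetical stand-in for a real runtime handle (MLC Chat, SmolChat, Vosk).
class LlmHandle(val modelPath: String)

/**
 * Loads a model strictly from local storage. Failures come back as a
 * Result value rather than a crash, and there is deliberately no
 * network fallback (privacy checklist: "No network calls added").
 */
fun loadLocalModel(path: String, maxBytes: Long = 4L shl 30): Result<LlmHandle> =
    runCatching {
        val f = File(path)
        require(f.isFile) { "Model file missing: $path" }
        require(f.length() <= maxBytes) { "Model exceeds on-device size budget" }
        LlmHandle(f.absolutePath) // real code would invoke the runtime's loader here
    }
```

Callers can branch on `isSuccess` and surface `exceptionOrNull()?.message` in the overlay UI, so a missing or oversized model degrades gracefully instead of crashing the AccessibilityService.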