# Project: On-Device Multimodal AI Assistant - Context for Cursor

## Project Summary

An Android app that lets users select any on-screen element via a circle/touch gesture, then routes the captured content (text, image, audio, or file) to local AI agents for transcription, summarization, or reply. The system is fully on-device, with a persistent dialogue agent and a privacy-first, no-cloud architecture.

## CORE TECHNOLOGY/STACK

- Android (minSdk 27+, targetSdk 34), Jetpack Compose for overlay UI
- AccessibilityService overlay (`TYPE_ACCESSIBILITY_OVERLAY`)
- MediaProjection API for screenshots (if needed for region capture)
- Gesture tracking: custom View/Canvas for circle recognition
- Speech-to-text: Vosk, DeepSpeech, or PocketSphinx (offline STT)
- LLM/reasoning: MLC Chat, SmolChat, or Edge Gallery (on-device Llama 3, Phi-3, Gemma, or Qwen in GGUF or similar quantized formats)
- Vision/image: ML Kit, TFLite, or ONNX lightweight models for image/attachment detection and classification
- Voice commands: Porcupine, Vosk, or PocketSphinx for wake-word and command recognition
- Data privacy: no internet/network calls; all app permissions and data are local-only

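The circle-recognition step can start as a simple geometric heuristic before any ML is involved. A minimal pure-Kotlin sketch (the `Point` class, `isRoughlyCircular` name, and threshold values are illustrative assumptions, not existing project code): a stroke counts as a circle only if it closes on itself and spans a minimum bounding box, which filters out taps and straight swipes.

```kotlin
import kotlin.math.hypot

data class Point(val x: Float, val y: Float)

/**
 * Heuristic circle check for a raw touch stroke: the path must end near
 * where it started and span a minimum bounding box. This rejects taps and
 * straight swipes, two common false-positive sources.
 */
fun isRoughlyCircular(
    stroke: List<Point>,
    closeTolerance: Float = 80f,   // max gap between start and end, in px
    minSpan: Float = 100f          // min width/height of the bounding box, in px
): Boolean {
    if (stroke.size < 8) return false
    val start = stroke.first()
    val end = stroke.last()
    val closed = hypot(end.x - start.x, end.y - start.y) <= closeTolerance
    val width = stroke.maxOf { it.x } - stroke.minOf { it.x }
    val height = stroke.maxOf { it.y } - stroke.minOf { it.y }
    return closed && width >= minSpan && height >= minSpan
}
```

The thresholds would need per-density tuning on real devices; the point is only that a cheap geometric pre-filter can run on every stroke before any heavier classifier is invoked.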
## AGENT DIALOGUE FLOW

1. User draws/selects/circles a region on screen using the overlay.
2. The selected region is classified (text, image, audio, etc.).
3. The agent analyzes it and responds with a summary, transcription, or explanation.
4. The dialogue persists: the agent maintains context and keeps responding until the user says "execute" or closes the session.
5. Both gesture and on-device voice commands are supported.

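The steps above map naturally onto a small state machine. A pure-Kotlin sketch of the core transitions (the state and event names are illustrative assumptions; the real machine in `/agent/` would carry richer context such as the captured region and dialogue history):

```kotlin
// Sketch of the persistent dialogue agent's core state transitions.
sealed class AgentState {
    object Idle : AgentState()
    object AwaitingSelection : AgentState()
    data class InDialogue(val turns: Int) : AgentState()
    object Executing : AgentState()
}

sealed class AgentEvent {
    object OverlayOpened : AgentEvent()
    object RegionSelected : AgentEvent()
    object UserMessage : AgentEvent()
    object ExecuteCommand : AgentEvent()
    object Closed : AgentEvent()
}

fun transition(state: AgentState, event: AgentEvent): AgentState = when {
    // Closing the session always returns to Idle, from any state.
    event is AgentEvent.Closed -> AgentState.Idle
    state is AgentState.Idle && event is AgentEvent.OverlayOpened -> AgentState.AwaitingSelection
    state is AgentState.AwaitingSelection && event is AgentEvent.RegionSelected -> AgentState.InDialogue(0)
    state is AgentState.InDialogue && event is AgentEvent.UserMessage -> AgentState.InDialogue(state.turns + 1)
    state is AgentState.InDialogue && event is AgentEvent.ExecuteCommand -> AgentState.Executing
    else -> state // ignore events that don't apply in the current state
}
```

Keeping transitions in one pure function makes the "persists until execute or close" behavior trivially unit-testable without any Android dependencies.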
## DIRECTORY STRUCTURE

- /src/
  - /accessibility/ (service, overlay manager)
  - /gesture/ (gesture view, region processor)
  - /model/ (STT and LLM wrappers, model runners)
  - /vision/ (content detector)
  - /agent/ (state manager, dialogue machine)
  - /ui/ (Compose overlays/components)
  - /privacy/ (cache controls, settings)
- /tests/ (unit and integration tests)
- /.notes/ (project overview, task list, directory structure, meeting notes)

## KNOWN BOTTLENECKS

- Model size/footprint for LLMs and STT; ensure device compatibility
- Reliable overlay and region capture across device OEMs (especially MIUI/Samsung)
- Gesture recognition and false-positive handling

## CODING GUIDELINES FOR CURSOR

- Use Kotlin for all Android code.
- Break features into composable modules; favor composition over inheritance.
- Annotate service, overlay, and ML inference points for easy lookup.
- Prioritize async, non-blocking calls for ML inference.
- Handle all data in the local app context; zero network calls.

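To illustrate the async-inference guideline: in the app this would typically be a `suspend` function using `withContext(Dispatchers.Default)` from kotlinx-coroutines, but a dependency-free sketch with a plain `ExecutorService` shows the same contract. `runModel` is a stand-in for a real TFLite/ONNX/LLM call, and the names here are illustrative assumptions:

```kotlin
import java.util.concurrent.Executors

// Sketch: keep ML inference off the caller's thread and never let a
// model failure propagate as an uncaught exception into the UI path.
object InferenceRunner {
    private val executor = Executors.newSingleThreadExecutor()

    fun infer(
        input: String,
        runModel: (String) -> String,          // stand-in for the real model call
        onResult: (Result<String>) -> Unit     // delivered from the worker thread
    ) {
        executor.submit {
            val result = runCatching { runModel(input) }
            onResult(result)
        }
    }

    fun shutdown() = executor.shutdown()
}
```

The `Result`-based callback forces every call site to decide what a model failure looks like in the UI, rather than crashing the overlay.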
## CONTRIBUTOR GUIDELINES

- Document module purpose in file headers.
- Reference related tickets in code comments.
- Use provided ticket IDs for TODOs.

## SESSION INITIALIZATION

When starting a new Cursor session:

1. **Read context files first:**
   - `.notes/project_overview.md` - High-level project goals and modules
   - `.notes/directory_structure.md` - File organization and relationships
   - `.notes/task_list.md` - Current tasks and priorities
   - `.notes/meeting_notes.md` - Recent decisions and discussions

2. **Review architectural constraints:**
   - `ARCHITECTURE.md` - System design and technical details
   - `.cursorrules` - Development rules and guidelines

3. **Before making changes:**
   - Check module purpose and boundaries
   - Verify no network/cloud dependencies are introduced
   - Ensure async patterns for ML operations
   - Follow Kotlin and Compose best practices
   - Document architectural changes

## MILESTONE OVERVIEW

### Milestone 1: Proof of Concept (POC)

**Goal:** Isolated modules working for gesture selection, content classification, and local AI responses

**Tickets:**

1. Implement the AccessibilityService overlay (touch/circle gesture support)
2. Region processor: extract the area from the gesture; screenshot/capture it
3. Connect the region processor output to the content-type detector (audio/text/image)
4. Integrate Vosk/DeepSpeech and run STT on the selected region (if audio)
5. Integrate MLC Chat or SmolChat for LLM reasoning (if text)
6. Prototype the UI overlay with agent feedback/suggestions (Jetpack Compose)
7. Write basic tests and documentation for the modules

**Deliverable:** One-click deployable demo APK

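Ticket 3's content-type detector can begin as a lightweight heuristic before any ML classifier is wired in. A pure-Kotlin sketch (the `ContentType` enum, function name, and extension table are illustrative assumptions):

```kotlin
enum class ContentType { TEXT, IMAGE, AUDIO, UNKNOWN }

// First-pass content-type detection by MIME type, falling back to the file
// extension. A vision model (ML Kit / TFLite) would later refine ambiguous
// cases such as screenshots containing mostly text.
fun detectContentType(mimeType: String?, fileName: String? = null): ContentType {
    mimeType?.let {
        when {
            it.startsWith("text/") -> return ContentType.TEXT
            it.startsWith("image/") -> return ContentType.IMAGE
            it.startsWith("audio/") -> return ContentType.AUDIO
        }
    }
    return when (fileName?.substringAfterLast('.', "")?.lowercase()) {
        "txt", "md", "json" -> ContentType.TEXT
        "png", "jpg", "jpeg", "webp" -> ContentType.IMAGE
        "wav", "mp3", "ogg", "m4a" -> ContentType.AUDIO
        else -> ContentType.UNKNOWN
    }
}
```

Returning `UNKNOWN` explicitly (rather than guessing) gives the agent a clean hook to ask the user what the selected region contains.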
### Milestone 2: Minimum Viable Product (MVP)

**Goal:** Installable app showing the main workflow with a persistent dialogue agent

**Tickets:**

8. Build the dialogue agent state machine; maintain context until execution
9. Integrate the voice interface for hands-free command routing
10. Add privacy controls and local data cache management
11. Polish the Compose UI with context display and action triggers
12. Documentation pass: usage guide, contributor guidelines
13. Test wider device compatibility; optimize model footprint

**Deliverable:** Fully functional app with gesture/voice selection, local AI inference, and agent interaction

### Milestone 3: Full Feature Release

**Goal:** Production-ready application with a plugin system

**Focus Areas:**

- Advanced content classification
- Plugin/module system for extensions
- Export/import of settings and session logs
- Issue triage and contributor onboarding
- Performance optimization
- Comprehensive documentation

## COMMON DEVELOPMENT TASKS

### Adding a New Module

1. Create a directory under `/src/` with a descriptive name
2. Add a `.gitkeep` with the module description
3. Update `.notes/directory_structure.md`
4. Document the module purpose in file headers
5. Add corresponding tests in `/tests/`

### Integrating ML Models

1. Ensure the model is local/offline only
2. Wrap model loading in an async operation
3. Add error handling for model failures
4. Document model requirements (size, device compatibility)
5. Add model configuration to privacy settings

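Steps 2 and 3 above can be sketched as a small loader that caches the loaded model and surfaces failures as values instead of exceptions. The `ModelProvider` name and the loader lambda are illustrative assumptions; in the app the `loader` would be the real Vosk/TFLite/MLC loading call, invoked from a coroutine so the UI thread never blocks on large model files:

```kotlin
// Sketch: lazy, failure-tolerant model loading with a one-shot cache.
class ModelProvider<T : Any>(private val loader: () -> T) {
    @Volatile private var cached: T? = null

    /** Loads on first use; returns the cached instance afterwards. */
    fun get(): Result<T> {
        cached?.let { return Result.success(it) }
        return runCatching(loader).onSuccess { cached = it }
    }
}
```

Because `get()` returns a `Result`, a missing or corrupt model file becomes an ordinary value the agent can report to the user, rather than a crash during inference.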
### Working with Accessibility Service

1. Review Android accessibility best practices
2. Test overlay behavior on multiple OEMs
3. Handle permission requests gracefully
4. Document accessibility features

### UI Development with Compose

1. Use Jetpack Compose for all UI components
2. Follow Material Design guidelines
3. Ensure the overlay UI is non-intrusive
4. Test UI responsiveness and performance

## TESTING STRATEGY

- Unit tests for business logic
- Integration tests for module interactions
- UI tests for Compose components
- Device compatibility testing (multiple OEMs)
- Performance and battery impact testing
- Accessibility feature testing

## PRIVACY CHECKLIST

Before submitting any code:

- [ ] No network calls added
- [ ] No cloud service dependencies
- [ ] All data stays on device
- [ ] Uses local-only ML models
- [ ] Respects user privacy settings
- [ ] Data can be cleared by user

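The last checklist item implies a cache layer the user can wipe on demand. A minimal in-memory sketch of that contract (the class name is illustrative; the real `/privacy/` module would also delete on-disk artifacts such as captured regions and transcripts):

```kotlin
// Sketch of a user-clearable local cache backing the privacy controls.
class LocalCache {
    private val store = mutableMapOf<String, ByteArray>()

    fun put(key: String, value: ByteArray) { store[key] = value }
    fun get(key: String): ByteArray? = store[key]
    fun size(): Int = store.size

    /** User-facing "clear my data" action: removes every cached entry. */
    fun clearAll() = store.clear()
}
```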
## PROJECT STATUS

**Current Phase:** Initial Setup

**Next Steps:** Begin Milestone 1 POC implementation

**Priority:** Accessibility Service overlay and gesture recognition

## USEFUL REFERENCES

- [Android Accessibility Service](https://developer.android.com/reference/android/accessibilityservice/AccessibilityService)
- [Jetpack Compose](https://developer.android.com/jetpack/compose)
- [Vosk STT](https://alphacephei.com/vosk/)
- [MLC Chat](https://mlc.ai/mlc-llm/)
- [Kotlin Coroutines](https://kotlinlang.org/docs/coroutines-overview.html)

---

**Remember:** This project is privacy-first. Every feature must work entirely on-device with no external network dependencies for core AI functionality.