Project: On-Device Multimodal AI Assistant - Context for Cursor
Project Summary
An Android app that lets users select any on-screen element via circle/touch gesture, then routes the captured content (text, image, audio, file) to local AI agents for transcription, summarization, or reply. The system is fully on-device, with a persistent dialogue agent and privacy-first, no-cloud architecture.
CORE TECHNOLOGY/STACK
- Android (minSdk 27+, targetSdk 34), Jetpack Compose for overlay UI
- AccessibilityService overlay (TYPE_ACCESSIBILITY_OVERLAY)
- MediaProjection API for screenshots (if needed for region capture)
- Gesture tracking: custom View/Canvas for circle recognition
- Speech-to-Text: Vosk, DeepSpeech, or PocketSphinx (offline STT)
- LLM/Reasoning: MLC Chat, SmolChat, Edge Gallery (on-device Llama 3, Phi-3, Gemma, or Qwen in GGUF format)
- Vision/image: MLKit, TFLite, or ONNX lightweight models for image/attachment detection/classification
- Voice command: Porcupine, Vosk, or PocketSphinx for wake word and command recognition
- Data privacy: No internet/network calls; all app permissions and data are local-only
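The circle-gesture recognition mentioned above can be sketched as a simple geometric heuristic. This is a minimal illustration only; the function name, point type, and thresholds are assumptions, not existing project code.

```kotlin
import kotlin.math.hypot

data class Pt(val x: Float, val y: Float)

// Heuristic: a "circle" gesture ends near where it started and encloses
// a non-trivial area (guards against taps and back-and-forth scribbles).
fun isClosedCircle(path: List<Pt>, closeTolerance: Float = 80f, minPoints: Int = 12): Boolean {
    if (path.size < minPoints) return false
    val start = path.first()
    val end = path.last()
    val closed = hypot(end.x - start.x, end.y - start.y) <= closeTolerance
    val cx = path.map { it.x }.average()
    val cy = path.map { it.y }.average()
    val meanRadius = path.map { hypot(it.x - cx, it.y - cy) }.average()
    return closed && meanRadius > closeTolerance
}
```

Tolerances would need tuning per screen density; see KNOWN BOTTLENECKS on false positives.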
AGENT DIALOGUE FLOW
- User draws/selects/circles a region on screen using overlay.
- The selected region is classified (text, image, audio, etc.).
- Agent analyzes and responds with summary/transcription/explanation.
- Dialogue persists: the agent maintains context and keeps responding until the user says "execute" or closes the session.
- Supports both gesture and on-device voice commands.
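The flow above can be modeled as a sealed state hierarchy, which keeps transitions explicit and testable. All names here are illustrative, not existing project code.

```kotlin
// Sketch of the dialogue flow as states; the agent stays in InDialogue
// until the user says "execute" or closes the overlay.
sealed interface AgentState {
    object Idle : AgentState                                   // overlay waiting for a gesture
    data class RegionSelected(val bounds: android.graphics.Rect) : AgentState
    data class Classified(val type: String) : AgentState       // "text", "image", "audio", ...
    data class InDialogue(val history: List<String>) : AgentState
    object Done : AgentState                                   // user executed or closed
}
```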
DIRECTORY STRUCTURE
- /src/
- /accessibility/ (service, overlay manager)
- /gesture/ (gesture view, region processor)
- /model/ (STT and LLM wrappers, model runners)
- /vision/ (content detector)
- /agent/ (state manager, dialogue machine)
- /ui/ (Compose overlays/components)
- /privacy/ (cache controls, settings)
- /tests/ (unit and integration tests)
- /.notes/ (project overview, task list, directory structure, meeting notes)
KNOWN BOTTLENECKS
- Model size/footprint for LLMs and STT; device compatibility must be verified
- Reliable overlay and region capture across device OEMs (esp. MIUI/Samsung)
- Gesture recognition accuracy and false-positive handling
CODING GUIDELINES FOR CURSOR
- Use Kotlin for all Android code.
- Break features into composable modules, favor composition over inheritance.
- Annotate service, overlay, and ML inference points for easy lookup.
- Prioritize async, non-blocking calls for ML inference.
- Ensure all data is handled in local app context, zero network calls.
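The async guideline can be sketched with coroutines as follows. `LlmRunner` and its `generate` callback are assumed names for illustration, not a real API.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Keep CPU-bound ML inference off the main thread.
class LlmRunner(private val generate: (String) -> String) {
    suspend fun infer(prompt: String): String =
        withContext(Dispatchers.Default) {  // never block the UI thread with inference
            generate(prompt)
        }
}
```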
CONTRIBUTOR GUIDELINES
- Document module purpose in file headers.
- Reference related tickets in code comments.
- Use provided ticket IDs for TODOs.
SESSION INITIALIZATION
When starting a new Cursor session:
- Read context files first:
  - .notes/project_overview.md - High-level project goals and modules
  - .notes/directory_structure.md - File organization and relationships
  - .notes/task_list.md - Current tasks and priorities
  - .notes/meeting_notes.md - Recent decisions and discussions
- Review architectural constraints:
  - ARCHITECTURE.md - System design and technical details
  - .cursorrules - Development rules and guidelines
- Before making changes:
- Check module purpose and boundaries
- Verify no network/cloud dependencies introduced
- Ensure async patterns for ML operations
- Follow Kotlin and Compose best practices
- Document architectural changes
MILESTONE OVERVIEW
Milestone 1: Proof of Concept (POC)
Goal: Isolated modules working for gesture selection, content classification, local AI responses
Tickets:
- Implement Accessibility Service overlay (touch/circle gesture support)
- Region processor: extract area from gesture, screenshot/capture
- Connect region processor output to content-type detector (audio/text/image)
- Integrate Vosk/DeepSpeech and run STT on selected region (if audio)
- Integrate MLC Chat or SmolChat for LLM reasoning (if text)
- Prototype UI overlay with agent feedback/suggestions (Jetpack Compose)
- Write basic tests and documentation for modules
Deliverable: One-click deployable demo APK
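The POC wiring of these tickets can be sketched end to end. Every function here is an assumed placeholder standing in for the modules named in the tickets.

```kotlin
import android.graphics.Bitmap

// Captured region -> content-type detector -> local handler, all on-device.
suspend fun handleRegion(capture: Bitmap): String =
    when (classifyContent(capture)) {       // MLKit/TFLite/ONNX content detector
        "audio" -> transcribe(capture)      // offline STT (Vosk/DeepSpeech)
        "text"  -> summarize(ocr(capture))  // local LLM (GGUF via MLC Chat/SmolChat)
        "image" -> describe(capture)        // lightweight vision model
        else    -> "Unsupported content type"
    }
```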
Milestone 2: Minimum Viable Product (MVP)
Goal: Installable app showing main workflow with persistent dialogue agent
Tickets:
8. Build dialogue agent state machine, maintain context until execution
9. Integrate voice interface for hands-free command routing
10. Add privacy controls, local data cache management
11. Polish Compose UI with context display and action triggers
12. Documentation pass, usage guide, contributor guidelines
13. Test wider device compatibility, optimize model footprint
Deliverable: Fully functional app with gesture/voice selection, local AI inference, agent interaction
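Ticket 8's persistent dialogue agent could look roughly like this. The class shape, the `respond` callback, and the "execute" handling are sketches under assumed names.

```kotlin
// Minimal dialogue agent: accumulates context until the user says "execute".
class DialogueAgent(private val respond: (List<String>) -> String) {
    private val context = mutableListOf<String>()
    var active = true
        private set

    fun onUserMessage(message: String): String {
        if (message.trim().equals("execute", ignoreCase = true)) {
            active = false
            return "Executing with ${context.size} context turns."
        }
        context += message
        return respond(context)  // placeholder for the on-device LLM call
    }
}
```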
Milestone 3: Full Feature Release
Goal: Production-ready application with plugin system
Focus Areas:
- Advanced content classification
- Plugin/module system for extensions
- Export/import settings and session logs
- Issue triage and contributor onboarding
- Performance optimization
- Comprehensive documentation
COMMON DEVELOPMENT TASKS
Adding a New Module
- Create directory under /src/ with a descriptive name
- Add .gitkeep with module description
- Update .notes/directory_structure.md
- Document module purpose in file headers
- Add corresponding tests in /tests/
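A module file header following the contributor guidelines might look like this; the ticket IDs shown are illustrative, not the project's actual IDs.

```kotlin
/**
 * Module: gesture
 * Purpose: Tracks free-form touch paths and recognizes circle selections.
 * Related tickets: POC-01 (Accessibility overlay), POC-02 (Region processor)
 */
```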
Integrating ML Models
- Ensure model is local/offline only
- Wrap model loading in async operation
- Add error handling for model failures
- Document model requirements (size, device compatibility)
- Add model configuration to privacy settings
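The loading and error-handling steps can be sketched together. `SttEngine` and the `load` callback are assumed wrapper names, not a real Vosk/DeepSpeech API.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

interface SttEngine

// Async model load with explicit failure handling instead of crashing.
class ModelLoader(private val load: () -> SttEngine) {
    suspend fun loadModel(): Result<SttEngine> =
        withContext(Dispatchers.IO) {   // model files are read from local storage only
            runCatching { load() }      // surfaces size/compatibility failures to the caller
        }
}
```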
Working with Accessibility Service
- Review Android accessibility best practices
- Test overlay behavior on multiple OEMs
- Handle permission requests gracefully
- Document accessibility features
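Attaching the overlay from the service can be sketched as below. The service must still be declared in the manifest and enabled by the user; the helper name is illustrative.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.graphics.PixelFormat
import android.view.View
import android.view.WindowManager

fun AccessibilityService.attachOverlay(view: View) {
    val wm = getSystemService(WINDOW_SERVICE) as WindowManager
    val params = WindowManager.LayoutParams(
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.TYPE_ACCESSIBILITY_OVERLAY, // no SYSTEM_ALERT_WINDOW permission needed
        WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
        PixelFormat.TRANSLUCENT
    )
    wm.addView(view, params)
}
```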
UI Development with Compose
- Use Jetpack Compose for all UI components
- Follow Material Design guidelines
- Ensure overlay UI is non-intrusive
- Test UI responsiveness and performance
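A non-intrusive agent feedback surface in Compose might look like this sketch; the composable name and layout are assumptions.

```kotlin
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.Surface
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

// Small elevated card for agent suggestions, kept compact so it does not
// obscure the content the user circled.
@Composable
fun AgentSuggestionCard(suggestion: String, modifier: Modifier = Modifier) {
    Surface(modifier = modifier, tonalElevation = 4.dp) {
        Text(text = suggestion, modifier = Modifier.padding(12.dp))
    }
}
```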
TESTING STRATEGY
- Unit tests for business logic
- Integration tests for module interactions
- UI tests for Compose components
- Device compatibility testing (multiple OEMs)
- Performance and battery impact testing
- Accessibility feature testing
PRIVACY CHECKLIST
Before submitting any code:
- No network calls added
- No cloud service dependencies
- All data stays on device
- Uses local-only ML models
- Respects user privacy settings
- Data can be cleared by user
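The "data can be cleared by user" item could be backed by a wipe of the app's own sandbox. The `sessions` subfolder name is an assumption for illustration.

```kotlin
import android.content.Context

// User-triggered wipe of all locally cached data; only app-private storage is touched.
fun clearLocalData(context: Context) {
    context.cacheDir.deleteRecursively()                      // transient captures/inference cache
    context.filesDir.resolve("sessions").deleteRecursively()  // assumed session log folder
}
```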
PROJECT STATUS
Current Phase: Initial Setup
Next Steps: Begin Milestone 1 POC implementation
Priority: Accessibility Service overlay and gesture recognition
Remember: This project is privacy-first. Every feature must work entirely on-device with no external network dependencies for core AI functionality.