
Project: On-Device Multimodal AI Assistant - Context for Cursor

Project Summary

An Android app that lets users select any on-screen element via circle/touch gesture, then routes the captured content (text, image, audio, file) to local AI agents for transcription, summarization, or reply. The system is fully on-device, with a persistent dialogue agent and privacy-first, no-cloud architecture.

CORE TECHNOLOGY/STACK

  • Android (minSdk 27+, targetSdk 34), Jetpack Compose for overlay UI
  • AccessibilityService overlay (TYPE_ACCESSIBILITY_OVERLAY)
  • MediaProjection API for screenshots (if needed for region capture)
  • Gesture tracking: custom View/Canvas for circle recognition
  • Speech-to-Text: Vosk, DeepSpeech, or PocketSphinx (offline STT)
  • LLM/Reasoning: MLC Chat, SmolChat, Edge Gallery (on-device Llama 3, Phi-3, Gemma, or Qwen in GGUF format)
  • Vision/image: MLKit, TFLite, or ONNX lightweight models for image/attachment detection/classification
  • Voice command: Porcupine, Vosk, or PocketSphinx for wake word and command recognition
  • Data privacy: No internet/network calls; all app permissions and data are local-only

AGENT DIALOGUE FLOW

  1. User draws/selects/circles a region on screen using overlay.
  2. The selected region is classified (text, image, audio, etc.).
  3. Agent analyzes and responds with summary/transcription/explanation.
  4. Dialogue persists: the agent maintains context and keeps responding until the user says "execute" or closes the session.
  5. Supports both gesture and on-device voice commands.
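The dialogue loop above can be sketched as a small state machine. This is a pure-Kotlin illustration only; the state, event, and field names are placeholders, not the project's actual API.

```kotlin
// Illustrative state machine for the agent dialogue flow (steps 1-4).
sealed class AgentState {
    object Idle : AgentState()
    data class Captured(val region: String) : AgentState()
    data class Classified(val region: String, val kind: String) : AgentState()
    data class InDialogue(val history: List<String>) : AgentState()
    object Closed : AgentState()
}

sealed class AgentEvent {
    data class RegionSelected(val region: String) : AgentEvent()
    data class ContentClassified(val kind: String) : AgentEvent()
    data class UserMessage(val text: String) : AgentEvent()
    object Execute : AgentEvent()
}

fun transition(state: AgentState, event: AgentEvent): AgentState = when {
    // A new selection restarts the flow from any state.
    event is AgentEvent.RegionSelected -> AgentState.Captured(event.region)
    state is AgentState.Captured && event is AgentEvent.ContentClassified ->
        AgentState.Classified(state.region, event.kind)
    state is AgentState.Classified && event is AgentEvent.UserMessage ->
        AgentState.InDialogue(listOf(event.text))
    // Context persists: each message is appended, never discarded.
    state is AgentState.InDialogue && event is AgentEvent.UserMessage ->
        AgentState.InDialogue(state.history + event.text)
    event is AgentEvent.Execute -> AgentState.Closed
    else -> state
}
```

Keeping transitions in one pure function makes the persistence requirement (step 4) directly unit-testable without any Android dependencies.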

DIRECTORY STRUCTURE

  • /src/
    • /accessibility/ (service, overlay manager)
    • /gesture/ (gesture view, region processor)
    • /model/ (STT and LLM wrappers, model runners)
    • /vision/ (content detector)
    • /agent/ (state manager, dialogue machine)
    • /ui/ (Compose overlays/components)
    • /privacy/ (cache controls, settings)
  • /tests/ (unit and integration tests)
  • /.notes/ (project overview, task list, directory structure, meeting notes)

KNOWN BOTTLENECKS

  • Model size/footprint for on-device LLMs and STT; device compatibility must be verified per model
  • Reliable overlay and region capture across device OEMs (esp. MIUI/Samsung skins)
  • Gesture recognition accuracy and false-positive handling
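One way to reduce gesture false positives is a cheap geometric gate before any capture fires. The heuristic and thresholds below are illustrative assumptions, not the project's tuned values: a stroke counts as a circle only if it is long enough and ends near where it started.

```kotlin
import kotlin.math.hypot

// Minimal point type for a recorded touch path (illustrative).
data class Pt(val x: Float, val y: Float)

// Heuristic gate: reject short strokes and open (non-closing) strokes.
// Thresholds are placeholder values in pixels; real code would tune per density.
fun looksLikeCircle(
    path: List<Pt>,
    closeTolerance: Float = 60f,
    minLength: Float = 300f
): Boolean {
    if (path.size < 8) return false
    val length = path.zipWithNext { a, b -> hypot(b.x - a.x, b.y - a.y) }.sum()
    val gap = hypot(
        path.last().x - path.first().x,
        path.last().y - path.first().y
    )
    return length >= minLength && gap <= closeTolerance
}
```

A straight swipe fails the closure check even when it is long, which filters out the most common accidental gesture.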

CODING GUIDELINES FOR CURSOR

  • Use Kotlin for all Android code.
  • Break features into composable modules, favor composition over inheritance.
  • Annotate service, overlay, and ML inference points for easy lookup.
  • Prioritize async, non-blocking calls for ML inference.
  • Ensure all data is handled in local app context, zero network calls.
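For the async-inference guideline, the app itself would likely use coroutines (`withContext(Dispatchers.Default)`); the stdlib-only sketch below shows the same pattern with a dedicated executor, using a hypothetical `LlmRunner` as a stand-in for the real model wrapper.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors

// Hypothetical local model runner; real code would wrap the MLC/llama bindings.
class LlmRunner {
    fun infer(prompt: String): String = "summary of: $prompt" // placeholder
}

// Single dedicated daemon thread so inference never blocks the UI thread
// and never keeps the process alive on its own.
val inferenceExecutor = Executors.newSingleThreadExecutor { r ->
    Thread(r).apply { isDaemon = true }
}

// Non-blocking call site: the caller gets a future and stays responsive.
fun inferAsync(runner: LlmRunner, prompt: String): CompletableFuture<String> =
    CompletableFuture.supplyAsync({ runner.infer(prompt) }, inferenceExecutor)
```

Serializing inference on one thread also avoids loading the model into memory twice when two requests race.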

CONTRIBUTOR GUIDELINES

  • Document module purpose in file headers.
  • Reference related tickets in code comments.
  • Use provided ticket IDs for TODOs.

SESSION INITIALIZATION

When starting a new Cursor session:

  1. Read context files first:

    • .notes/project_overview.md - High-level project goals and modules
    • .notes/directory_structure.md - File organization and relationships
    • .notes/task_list.md - Current tasks and priorities
    • .notes/meeting_notes.md - Recent decisions and discussions
  2. Review architectural constraints:

    • ARCHITECTURE.md - System design and technical details
    • .cursorrules - Development rules and guidelines
  3. Before making changes:

    • Check module purpose and boundaries
    • Verify no network/cloud dependencies introduced
    • Ensure async patterns for ML operations
    • Follow Kotlin and Compose best practices
    • Document architectural changes

MILESTONE OVERVIEW

Milestone 1: Proof of Concept (POC)

Goal: Isolated modules working for gesture selection, content classification, local AI responses

Tickets:

  1. Implement Accessibility Service overlay (touch/circle gesture support)
  2. Region processor: extract area from gesture, screenshot/capture
  3. Connect region processor output to content-type detector (audio/text/image)
  4. Integrate Vosk/DeepSpeech and run STT on selected region (if audio)
  5. Integrate MLC Chat or SmolChat for LLM reasoning (if text)
  6. Prototype UI overlay with agent feedback/suggestions (Jetpack Compose)
  7. Write basic tests and documentation for modules
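The routing step in ticket 3 can be as simple as a MIME-prefix dispatch. The names below are placeholders for illustration, not the project's actual detector API.

```kotlin
// Illustrative content-type routing for a captured region (ticket 3).
enum class ContentKind { TEXT, IMAGE, AUDIO, FILE }

fun classifyMime(mime: String): ContentKind = when {
    mime.startsWith("text/") -> ContentKind.TEXT   // -> LLM reasoning
    mime.startsWith("image/") -> ContentKind.IMAGE // -> vision model
    mime.startsWith("audio/") -> ContentKind.AUDIO // -> offline STT
    else -> ContentKind.FILE                       // -> generic attachment path
}
```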

Deliverable: One-click deployable demo APK

Milestone 2: Minimum Viable Product (MVP)

Goal: Installable app showing main workflow with persistent dialogue agent

Tickets:

  8. Build dialogue agent state machine; maintain context until execution
  9. Integrate voice interface for hands-free command routing
  10. Add privacy controls and local data cache management
  11. Polish Compose UI with context display and action triggers
  12. Documentation pass: usage guide and contributor guidelines
  13. Test wider device compatibility; optimize model footprint
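For the local cache management ticket, a user-triggered wipe can be a small, testable function. This is a sketch under assumptions: in the app, `cacheDir` would be `Context.cacheDir`; here it is a plain parameter so the logic stays Android-free.

```kotlin
import java.io.File

// Delete everything under the app-local cache directory (ticket for privacy
// controls). Returns the number of entries removed so the UI can report it.
fun clearLocalCache(cacheDir: File): Int {
    var removed = 0
    // Bottom-up walk so files are deleted before their parent directories.
    cacheDir.walkBottomUp().forEach { f ->
        if (f != cacheDir && f.delete()) removed++
    }
    return removed
}
```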

Deliverable: Fully functional app with gesture/voice selection, local AI inference, agent interaction

Milestone 3: Full Feature Release

Goal: Production-ready application with plugin system

Focus Areas:

  • Advanced content classification
  • Plugin/module system for extensions
  • Export/import settings and session logs
  • Issue triage and contributor onboarding
  • Performance optimization
  • Comprehensive documentation

COMMON DEVELOPMENT TASKS

Adding a New Module

  1. Create directory under /src/ with descriptive name
  2. Add .gitkeep with module description
  3. Update .notes/directory_structure.md
  4. Document module purpose in file headers
  5. Add corresponding tests in /tests/

Integrating ML Models

  1. Ensure model is local/offline only
  2. Wrap model loading in async operation
  3. Add error handling for model failures
  4. Document model requirements (size, device compatibility)
  5. Add model configuration to privacy settings
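Steps 2-4 above can be combined in a small result type so every call site must handle failure. The wrapper below is an illustrative sketch: actual loading would hand the file to the TFLite/MLC runtime, and the 4 GiB budget is a placeholder.

```kotlin
import java.io.File

// Illustrative outcome type for local model loading (steps 2-4).
sealed class ModelLoadResult {
    data class Ready(val modelFile: File) : ModelLoadResult()
    data class Failed(val reason: String) : ModelLoadResult()
}

// Validate the on-device model file before handing it to the runtime.
// maxBytes is a placeholder size budget for low-RAM device compatibility.
fun loadLocalModel(path: String, maxBytes: Long = 4L shl 30): ModelLoadResult {
    val f = File(path)
    return when {
        !f.exists() -> ModelLoadResult.Failed("model file missing: $path")
        f.length() > maxBytes -> ModelLoadResult.Failed("model exceeds size budget")
        else -> ModelLoadResult.Ready(f)
    }
}
```

Returning `Failed` instead of throwing keeps model errors on the normal control path, which makes the async wrappers around inference simpler.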

Working with Accessibility Service

  1. Review Android accessibility best practices
  2. Test overlay behavior on multiple OEMs
  3. Handle permission requests gracefully
  4. Document accessibility features
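For reference, attaching the overlay from inside the service typically looks like the sketch below. This is Android-framework code and not runnable outside a device; `overlayView` is a placeholder for the gesture view.

```kotlin
// Inside the AccessibilityService, once the user has enabled the service.
// TYPE_ACCESSIBILITY_OVERLAY needs no separate "draw over apps" permission.
val params = WindowManager.LayoutParams(
    WindowManager.LayoutParams.MATCH_PARENT,
    WindowManager.LayoutParams.MATCH_PARENT,
    WindowManager.LayoutParams.TYPE_ACCESSIBILITY_OVERLAY,
    WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE, // still receives touches
    PixelFormat.TRANSLUCENT
)
getSystemService(WindowManager::class.java).addView(overlayView, params)
```

OEM skins differ in how aggressively they kill accessibility services, so the overlay should be rebuilt in `onServiceConnected` rather than cached across restarts.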

UI Development with Compose

  1. Use Jetpack Compose for all UI components
  2. Follow Material Design guidelines
  3. Ensure overlay UI is non-intrusive
  4. Test UI responsiveness and performance

TESTING STRATEGY

  • Unit tests for business logic
  • Integration tests for module interactions
  • UI tests for Compose components
  • Device compatibility testing (multiple OEMs)
  • Performance and battery impact testing
  • Accessibility feature testing

PRIVACY CHECKLIST

Before submitting any code:

  • No network calls added
  • No cloud service dependencies
  • All data stays on device
  • Uses local-only ML models
  • Respects user privacy settings
  • Data can be cleared by user

PROJECT STATUS

  • Current Phase: Initial Setup
  • Next Steps: Begin Milestone 1 POC implementation
  • Priority: Accessibility Service overlay and gesture recognition

Remember: This project is privacy-first. Every feature must work entirely on-device with no external network dependencies for core AI functionality.