Project: On-Device Multimodal AI Assistant - Context for Cursor
Project Summary
An Android app that lets users select any on-screen element via circle/touch gesture, then routes the captured content (text, image, audio, file) to local AI agents for transcription, summarization, or reply. The system is fully on-device, with a persistent dialogue agent and privacy-first, no-cloud architecture.
CORE TECHNOLOGY/STACK
- Android (minSdk 27+, targetSdk 34), Jetpack Compose for overlay UI
- AccessibilityService overlay (TYPE_ACCESSIBILITY_OVERLAY)
- MediaProjection API for screenshots (if needed for region capture)
- Gesture tracking: custom View/Canvas for circle recognition
- Speech-to-Text: Vosk, DeepSpeech, or PocketSphinx (offline STT)
- LLM/Reasoning: MLC Chat, SmolChat, Edge Gallery (on-device Llama 3, Phi-3, Gemma, or Qwen in GGUF format)
- Vision/image: MLKit, TFLite, or ONNX lightweight models for image/attachment detection/classification
- Voice command: Porcupine, Vosk, or PocketSphinx for wake word and command recognition
- Data privacy: No internet/network calls; all app permissions and data are local-only
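The circle-gesture recognition mentioned above can be sketched as a simple geometric heuristic. This is a minimal illustration only; the function name, point type, and thresholds are assumptions, not existing project code.

```kotlin
import kotlin.math.hypot

data class Pt(val x: Float, val y: Float)

// Heuristic: a "circle" gesture ends near where it started and encloses
// a non-trivial area (guards against taps and back-and-forth scribbles).
fun isClosedCircle(path: List<Pt>, closeTolerance: Float = 80f, minPoints: Int = 12): Boolean {
    if (path.size < minPoints) return false
    val start = path.first()
    val end = path.last()
    val closed = hypot(end.x - start.x, end.y - start.y) <= closeTolerance
    val cx = path.map { it.x }.average()
    val cy = path.map { it.y }.average()
    val meanRadius = path.map { hypot(it.x - cx, it.y - cy) }.average()
    return closed && meanRadius > closeTolerance
}
```

Tolerances would need tuning per screen density; see KNOWN BOTTLENECKS on false positives.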
AGENT DIALOGUE FLOW
- User draws/selects/circles a region on screen using overlay.
- The selected region is classified (text, image, audio, etc.).
- Agent analyzes and responds with summary/transcription/explanation.
- Dialogue persists: the agent maintains context and keeps responding until the user says "execute" or closes the session.
- Supports both gesture and on-device voice commands.
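The flow above can be modeled as a sealed state hierarchy, which keeps transitions explicit and testable. All names here are illustrative, not existing project code.

```kotlin
// Sketch of the dialogue flow as states; the agent stays in InDialogue
// until the user says "execute" or closes the overlay.
sealed interface AgentState {
    object Idle : AgentState                                   // overlay waiting for a gesture
    data class RegionSelected(val bounds: android.graphics.Rect) : AgentState
    data class Classified(val type: String) : AgentState       // "text", "image", "audio", ...
    data class InDialogue(val history: List<String>) : AgentState
    object Done : AgentState                                   // user executed or closed
}
```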
DIRECTORY STRUCTURE
- /src/
- /accessibility/ (service, overlay manager)
- /gesture/ (gesture view, region processor)
- /model/ (STT and LLM wrappers, model runners)
- /vision/ (content detector)
- /agent/ (state manager, dialogue machine)
- /ui/ (Compose overlays/components)
- /privacy/ (cache controls, settings)
- /tests/ (unit and integration tests)
- /.notes/ (project overview, task list, directory structure, meeting notes)
KNOWN BOTTLENECKS
- Model size/footprint for LLMs and STT; device compatibility must be verified
- Reliable overlay and region capture across device OEMs (esp. MIUI/Samsung)
- Gesture recognition accuracy and false-positive handling
CODING GUIDELINES FOR CURSOR
- Use Kotlin for all Android code.
- Break features into composable modules, favor composition over inheritance.
- Annotate service, overlay, and ML inference points for easy lookup.
- Prioritize async, non-blocking calls for ML inference.
- Ensure all data is handled in local app context, zero network calls.
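The async guideline can be sketched with coroutines as follows. `LlmRunner` and its `generate` callback are assumed names for illustration, not a real API.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Keep CPU-bound ML inference off the main thread.
class LlmRunner(private val generate: (String) -> String) {
    suspend fun infer(prompt: String): String =
        withContext(Dispatchers.Default) {  // never block the UI thread with inference
            generate(prompt)
        }
}
```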
CONTRIBUTOR GUIDELINES
- Document module purpose in file headers.
- Reference related tickets in code comments.
- Use provided ticket IDs for TODOs.
SESSION INITIALIZATION
When starting a new Cursor session:
- Read context files first:
  - .notes/project_overview.md - High-level project goals and modules
  - .notes/directory_structure.md - File organization and relationships
  - .notes/task_list.md - Current tasks and priorities
  - .notes/meeting_notes.md - Recent decisions and discussions
- Review architectural constraints:
  - ARCHITECTURE.md - System design and technical details
  - .cursorrules - Development rules and guidelines
- Before making changes:
- Check module purpose and boundaries
- Verify no network/cloud dependencies introduced
- Ensure async patterns for ML operations
- Follow Kotlin and Compose best practices
- Document architectural changes
MILESTONE OVERVIEW
Milestone 1: Proof of Concept (POC)
Goal: Isolated modules working for gesture selection, content classification, local AI responses
Tickets:
- Implement Accessibility Service overlay (touch/circle gesture support)
- Region processor: extract area from gesture, screenshot/capture
- Connect region processor output to content-type detector (audio/text/image)
- Integrate Vosk/DeepSpeech and run STT on selected region (if audio)
- Integrate MLC Chat or SmolChat for LLM reasoning (if text)
- Prototype UI overlay with agent feedback/suggestions (Jetpack Compose)
- Write basic tests and documentation for modules
Deliverable: One-click deployable demo APK
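The POC wiring of these tickets can be sketched end to end. Every function here is an assumed placeholder standing in for the modules named in the tickets.

```kotlin
import android.graphics.Bitmap

// Captured region -> content-type detector -> local handler, all on-device.
suspend fun handleRegion(capture: Bitmap): String =
    when (classifyContent(capture)) {       // MLKit/TFLite/ONNX content detector
        "audio" -> transcribe(capture)      // offline STT (Vosk/DeepSpeech)
        "text"  -> summarize(ocr(capture))  // local LLM (GGUF via MLC Chat/SmolChat)
        "image" -> describe(capture)        // lightweight vision model
        else    -> "Unsupported content type"
    }
```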
Milestone 2: Minimum Viable Product (MVP)
Goal: Installable app showing main workflow with persistent dialogue agent
Tickets:
8. Build dialogue agent state machine, maintain context until execution
9. Integrate voice interface for hands-free command routing
10. Add privacy controls, local data cache management
11. Polish Compose UI with context display and action triggers
12. Documentation pass, usage guide, contributor guidelines
13. Test wider device compatibility, optimize model footprint
Deliverable: Fully functional app with gesture/voice selection, local AI inference, agent interaction
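Ticket 8's persistent dialogue agent could look roughly like this. The class shape, the `respond` callback, and the "execute" handling are sketches under assumed names.

```kotlin
// Minimal dialogue agent: accumulates context until the user says "execute".
class DialogueAgent(private val respond: (List<String>) -> String) {
    private val context = mutableListOf<String>()
    var active = true
        private set

    fun onUserMessage(message: String): String {
        if (message.trim().equals("execute", ignoreCase = true)) {
            active = false
            return "Executing with ${context.size} context turns."
        }
        context += message
        return respond(context)  // placeholder for the on-device LLM call
    }
}
```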
Milestone 3: Full Feature Release
Goal: Production-ready application with plugin system
Focus Areas:
- Advanced content classification
- Plugin/module system for extensions
- Export/import settings and session logs
- Issue triage and contributor onboarding
- Performance optimization
- Comprehensive documentation
COMMON DEVELOPMENT TASKS
Adding a New Module
- Create directory under /src/ with a descriptive name
- Add .gitkeep with module description
- Update .notes/directory_structure.md
- Document module purpose in file headers
- Add corresponding tests in /tests/
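A module file header following the contributor guidelines might look like this; the ticket IDs shown are illustrative, not the project's actual IDs.

```kotlin
/**
 * Module: gesture
 * Purpose: Tracks free-form touch paths and recognizes circle selections.
 * Related tickets: POC-01 (Accessibility overlay), POC-02 (Region processor)
 */
```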
Integrating ML Models
- Ensure model is local/offline only
- Wrap model loading in async operation
- Add error handling for model failures
- Document model requirements (size, device compatibility)
- Add model configuration to privacy settings
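The loading and error-handling steps can be sketched together. `SttEngine` and the `load` callback are assumed wrapper names, not a real Vosk/DeepSpeech API.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

interface SttEngine

// Async model load with explicit failure handling instead of crashing.
class ModelLoader(private val load: () -> SttEngine) {
    suspend fun loadModel(): Result<SttEngine> =
        withContext(Dispatchers.IO) {   // model files are read from local storage only
            runCatching { load() }      // surfaces size/compatibility failures to the caller
        }
}
```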
Working with Accessibility Service
- Review Android accessibility best practices
- Test overlay behavior on multiple OEMs
- Handle permission requests gracefully
- Document accessibility features
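Attaching the overlay from the service can be sketched as below. The service must still be declared in the manifest and enabled by the user; the helper name is illustrative.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.graphics.PixelFormat
import android.view.View
import android.view.WindowManager

fun AccessibilityService.attachOverlay(view: View) {
    val wm = getSystemService(WINDOW_SERVICE) as WindowManager
    val params = WindowManager.LayoutParams(
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.TYPE_ACCESSIBILITY_OVERLAY, // no SYSTEM_ALERT_WINDOW permission needed
        WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
        PixelFormat.TRANSLUCENT
    )
    wm.addView(view, params)
}
```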
UI Development with Compose
- Use Jetpack Compose for all UI components
- Follow Material Design guidelines
- Ensure overlay UI is non-intrusive
- Test UI responsiveness and performance
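A non-intrusive agent feedback surface in Compose might look like this sketch; the composable name and layout are assumptions.

```kotlin
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.Surface
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

// Small elevated card for agent suggestions, kept compact so it does not
// obscure the content the user circled.
@Composable
fun AgentSuggestionCard(suggestion: String, modifier: Modifier = Modifier) {
    Surface(modifier = modifier, tonalElevation = 4.dp) {
        Text(text = suggestion, modifier = Modifier.padding(12.dp))
    }
}
```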
TESTING STRATEGY
- Unit tests for business logic
- Integration tests for module interactions
- UI tests for Compose components
- Device compatibility testing (multiple OEMs)
- Performance and battery impact testing
- Accessibility feature testing
PRIVACY CHECKLIST
Before submitting any code:
- No network calls added
- No cloud service dependencies
- All data stays on device
- Uses local-only ML models
- Respects user privacy settings
- Data can be cleared by user
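The "data can be cleared by user" item could be backed by a wipe of the app's own sandbox. The `sessions` subfolder name is an assumption for illustration.

```kotlin
import android.content.Context

// User-triggered wipe of all locally cached data; only app-private storage is touched.
fun clearLocalData(context: Context) {
    context.cacheDir.deleteRecursively()                      // transient captures/inference cache
    context.filesDir.resolve("sessions").deleteRecursively()  // assumed session log folder
}
```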
PROJECT STATUS
Current Phase: Initial Setup
Next Steps: Begin Milestone 1 POC implementation
Priority: Accessibility Service overlay and gesture recognition
Remember: This project is privacy-first. Every feature must work entirely on-device with no external network dependencies for core AI functionality.