Technical Architecture
Layered System Modules
1. Accessibility Service Layer
Detects gestures, draws overlays. Uses Android's AccessibilityService to create system-wide overlays that capture user input without interfering with underlying apps.
Key Components:
- AccessibilityService implementation
- TYPE_ACCESSIBILITY_OVERLAY window management
- Gesture capture and event routing
2. Region Processor
Converts user input area into actionable bounds. Translates circle/touch gestures into screen coordinates and extracts the selected region.
Key Components:
- Gesture recognition (circle, tap, drag)
- Coordinate transformation
- Region boundary calculation
- Screenshot/content capture via MediaProjection API
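The boundary calculation above can be sketched as a pure function: reduce the touch points of a circle or drag gesture to an axis-aligned bounding box, clamped to the screen. A minimal sketch — the `TouchPoint` and `Region` names are illustrative, not the project's actual types:

```kotlin
// Illustrative types; the real Region Processor may use Android's PointF/Rect.
data class TouchPoint(val x: Float, val y: Float)

data class Region(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val width get() = right - left
    val height get() = bottom - top
}

fun boundsOf(points: List<TouchPoint>, screenWidth: Int, screenHeight: Int): Region {
    require(points.isNotEmpty()) { "gesture produced no touch points" }
    // Axis-aligned bounding box of the gesture path, clamped to the screen.
    val left = points.minOf { it.x }.toInt().coerceIn(0, screenWidth)
    val top = points.minOf { it.y }.toInt().coerceIn(0, screenHeight)
    val right = points.maxOf { it.x }.toInt().coerceIn(0, screenWidth)
    val bottom = points.maxOf { it.y }.toInt().coerceIn(0, screenHeight)
    return Region(left, top, right, bottom)
}
```

The resulting `Region` is what gets handed to MediaProjection-based capture and, downstream, to the Content-Type Detector.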
3. Content-Type Detector
Classifies input for routing (audio/image/text). Analyzes the captured region to determine what type of content it contains and how to process it.
Key Components:
- View hierarchy analysis
- OCR text detection
- Audio source identification
- Image/media classification
- File type detection
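The routing decision itself is a simple total mapping from detected content type to processing module. A hedged sketch, assuming hypothetical `Content` and `Route` types (not the app's real API):

```kotlin
// Each captured region is classified into one of these content kinds.
sealed class Content {
    data class Text(val value: String) : Content()
    data class Image(val bytes: ByteArray) : Content()
    data class Audio(val samples: ShortArray) : Content()
}

enum class Route { SPEECH, LLM, VISION }

// Exhaustive `when` guarantees every content kind has a destination module.
fun routeFor(content: Content): Route = when (content) {
    is Content.Audio -> Route.SPEECH  // transcribe first, then reason over the text
    is Content.Text  -> Route.LLM     // summarize/explain directly
    is Content.Image -> Route.VISION  // classify; OCR output may re-enter via LLM
}
```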
4. Local AI Engine
Speech Module
- Options: Vosk, DeepSpeech, PocketSphinx
- Purpose: Offline speech-to-text transcription
- Input: Audio streams, recorded audio
- Output: Transcribed text
LLM Module
- Options: MLC Chat, SmolChat, Edge Gallery (Llama 3, Phi-3, Gemma, Qwen in GGUF format)
- Purpose: Local reasoning, summarization, explanation
- Input: Text content, context
- Output: AI-generated responses
Vision Module
- Options: MLKit, TFLite, ONNX lightweight models
- Purpose: Image analysis, object detection, content classification
- Input: Images, screenshots
- Output: Classifications, descriptions, detected objects
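Since all three modules follow the same input → inference → output shape, a common abstraction keeps the backends (Vosk, MLC, TFLite, and so on) swappable. A minimal sketch under that assumption — the interface and stub names are illustrative:

```kotlin
// Common async contract for the three engine modules; real code might use
// coroutines instead of callbacks.
interface InferenceModule<I, O> {
    fun infer(input: I, onResult: (O) -> Unit)
}

// A stub standing in for an offline STT backend such as Vosk.
class StubSpeechModule : InferenceModule<ShortArray, String> {
    override fun infer(input: ShortArray, onResult: (String) -> Unit) {
        onResult(if (input.isEmpty()) "" else "transcribed ${input.size} samples")
    }
}
```

The Dialogue Agent can then hold `InferenceModule` references without knowing which concrete engine is installed, which is what makes each backend independently replaceable.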
5. Dialogue Agent
Maintains session state, keeps the dialogue open, and executes actions on command. Manages conversation flow and context persistence.
Key Components:
- State machine for dialogue flow
- Context/memory management
- Command parsing and routing
- Execution trigger handling
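The dialogue flow above can be sketched as a small state machine: session states plus the events that move between them. The specific states and events here are assumptions for illustration:

```kotlin
enum class DialogueState { IDLE, AWAITING_INPUT, PROCESSING, SHOWING_RESPONSE }

sealed class DialogueEvent {
    object RegionSelected : DialogueEvent()
    object UserPrompt : DialogueEvent()
    object ResponseReady : DialogueEvent()
    object Execute : DialogueEvent()
    object Dismiss : DialogueEvent()
}

// Pure transition function: easy to unit-test, no Android dependencies.
fun transition(state: DialogueState, event: DialogueEvent): DialogueState =
    when (state to event) {
        DialogueState.IDLE to DialogueEvent.RegionSelected -> DialogueState.AWAITING_INPUT
        DialogueState.AWAITING_INPUT to DialogueEvent.UserPrompt -> DialogueState.PROCESSING
        DialogueState.PROCESSING to DialogueEvent.ResponseReady -> DialogueState.SHOWING_RESPONSE
        DialogueState.SHOWING_RESPONSE to DialogueEvent.UserPrompt -> DialogueState.PROCESSING // continue dialogue
        DialogueState.SHOWING_RESPONSE to DialogueEvent.Execute -> DialogueState.IDLE          // execute, end session
        else -> if (event == DialogueEvent.Dismiss) DialogueState.IDLE else state              // ignore invalid events
    }
```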
6. UI/Feedback Layer
Interactive Jetpack Compose overlays for user input and feedback display.
Key Components:
- Jetpack Compose UI components
- Overlay windows
- Feedback animations
- Status indicators
- Action buttons
7. Privacy/Data Management
Local-only data handling with cache and session controls.
Key Components:
- Local storage management
- Session cache controls
- Privacy settings
- Data retention policies
- No network call enforcement
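Because data never leaves the device, retention is the only copy control needed: expire cached session files after a cutoff. A minimal sketch, assuming a flat cache directory and an illustrative `maxAgeMs` policy knob:

```kotlin
import java.io.File

// Delete cached session files older than maxAgeMs. Directory layout and the
// policy parameter are assumptions for illustration.
fun pruneSessionCache(cacheDir: File, maxAgeMs: Long, now: Long = System.currentTimeMillis()) {
    cacheDir.listFiles()?.forEach { f ->
        if (f.isFile && now - f.lastModified() > maxAgeMs) {
            f.delete() // local-only: deleting the file removes the only copy
        }
    }
}
```

A real implementation would pair this with the user-facing "clear cache" control so manual clearing and automatic expiry share one code path.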
Dataflow Diagram
```
User Gesture → Accessibility Service → Region Processor
                        ↓
              Content-Type Detector
                        ↓
     ┌──────────────────┼──────────────────┐
     ↓                  ↓                  ↓
Speech Module       LLM Module       Vision Module
     ↓                  ↓                  ↓
     └──────────────────┼──────────────────┘
                        ↓
                 Dialogue Agent
                        ↓
                  UI/Feedback
                        ↓
              User sees response
                        ↓
        Continue dialogue or Execute
```
Technology Stack
- Platform: Android (minSdk 27+, targetSdk 34)
- UI Framework: Jetpack Compose
- Overlay System: AccessibilityService (TYPE_ACCESSIBILITY_OVERLAY)
- Screen Capture: MediaProjection API
- Speech-to-Text: Vosk / DeepSpeech / PocketSphinx (offline)
- LLM: MLC Chat / SmolChat / Edge Gallery (on-device Llama 3, Phi-3, Gemma, Qwen)
- Vision: MLKit / TFLite / ONNX
- Voice Commands: Porcupine / Vosk / PocketSphinx (wake word detection)
- Language: Kotlin
- Build System: Gradle
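The stack above might map onto a module-level `build.gradle.kts` roughly as follows. This is a sketch only: the Maven coordinates are examples of the listed stacks, and names and versions should be verified before use.

```kotlin
dependencies {
    implementation("androidx.compose.ui:ui")                    // Jetpack Compose UI
    implementation("com.alphacephei:vosk-android:0.3.47")       // offline speech-to-text
    implementation("org.tensorflow:tensorflow-lite:2.14.0")     // on-device vision models
    implementation("com.google.mlkit:text-recognition:16.0.0")  // OCR for the content detector
    implementation("ai.picovoice:porcupine-android:3.0.1")      // wake-word detection
}
```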
Design Principles
- Privacy First: All processing happens on-device. Zero network calls for AI inference.
- Modular Architecture: Each component is independently testable and replaceable.
- Async Operations: All ML inference is non-blocking and asynchronous.
- Dependency Injection: Components are loosely coupled via DI.
- Composition Over Inheritance: Favor composable functions and interfaces.
- Local Data Only: All app data stays within the local app context.
Known Constraints and Considerations
- Model Size: LLMs and STT models can be large; ensure device compatibility
- Device Fragmentation: Overlay behavior varies across OEMs (MIUI, Samsung, etc.)
- Gesture Recognition: Need robust handling to minimize false positives
- Performance: On-device inference requires optimization for lower-end devices
- Battery Impact: Continuous overlay and ML inference need power optimization
Security and Privacy
- No Network Permissions: App does not request internet access for AI features
- Local Storage: All data stored in app-private directories
- No Cloud Services: Zero dependency on external APIs or cloud infrastructure
- User Control: Complete user control over data retention and cache clearing
- Transparency: Open-source codebase for full auditability