# Technical Architecture

## Layered System Modules

### 1. Accessibility Service Layer

Detects gestures, draws overlays. Uses Android's `AccessibilityService` to create system-wide overlays that capture user input without interfering with underlying apps.

**Key Components:**

- AccessibilityService implementation
- TYPE_ACCESSIBILITY_OVERLAY window management
- Gesture capture and event routing

### 2. Region Processor

Converts the user's input area into actionable bounds. Translates circle/touch gestures into screen coordinates and extracts the selected region.

**Key Components:**

- Gesture recognition (circle, tap, drag)
- Coordinate transformation
- Region boundary calculation
- Screenshot/content capture via MediaProjection API
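
The boundary calculation can be sketched in plain Kotlin. This is a hypothetical illustration, not the project's implementation: `gestureBounds`, the `paddingPx` margin, and the simplified `Rect` type are all assumptions; the real processor would work from `MotionEvent` coordinates and request capture via MediaProjection.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Minimal stand-in for android.graphics.Rect, for illustration only.
data class Rect(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val width get() = right - left
    val height get() = bottom - top
}

// Derive a bounding box from the points of a circle/drag gesture,
// padded slightly and clamped to the screen so the capture stays valid.
fun gestureBounds(
    points: List<Pair<Float, Float>>, // captured gesture coordinates (x, y)
    screenWidth: Int,
    screenHeight: Int,
    paddingPx: Int = 16               // illustrative margin around the stroke
): Rect {
    require(points.isNotEmpty()) { "gesture produced no points" }
    val left = points.minOf { it.first } - paddingPx
    val top = points.minOf { it.second } - paddingPx
    val right = points.maxOf { it.first } + paddingPx
    val bottom = points.maxOf { it.second } + paddingPx
    return Rect(
        left = max(0, left.toInt()),
        top = max(0, top.toInt()),
        right = min(screenWidth, right.toInt()),
        bottom = min(screenHeight, bottom.toInt())
    )
}
```

The padding compensates for imprecise finger strokes; clamping keeps the region inside the visible display before any capture request is made.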
### 3. Content-Type Detector

Classifies input for routing (audio/image/text). Analyzes the captured region to determine what type of content it contains and how to process it.

**Key Components:**

- View hierarchy analysis
- OCR text detection
- Audio source identification
- Image/media classification
- File type detection
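
The routing decision can be sketched as a simple classifier. This is a hedged, hypothetical sketch: the types and heuristics (`hasPlayingAudio`, `ocrText`) are illustrative stand-ins for the real view-hierarchy analysis, OCR, and audio-source checks listed above.

```kotlin
// The three routes corresponding to the downstream AI modules.
enum class ContentType { AUDIO, IMAGE, TEXT }

// Illustrative summary of what the detector learned about a region.
data class CapturedRegion(
    val hasPlayingAudio: Boolean, // e.g. an active media session under the region
    val ocrText: String?          // text recovered by OCR, if any
)

// Pick the module that should process the region.
fun detectContentType(region: CapturedRegion): ContentType = when {
    region.hasPlayingAudio -> ContentType.AUDIO          // route to Speech Module
    !region.ocrText.isNullOrBlank() -> ContentType.TEXT  // route to LLM Module
    else -> ContentType.IMAGE                            // fall back to Vision Module
}
```

Ordering matters: audio takes precedence because a playing media session is unambiguous, while the image route is the safe fallback when neither audio nor text is found.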
### 4. Local AI Engine
#### Speech Module

- **Options:** Vosk, DeepSpeech, PocketSphinx
- **Purpose:** Offline speech-to-text transcription
- **Input:** Audio streams, recorded audio
- **Output:** Transcribed text

#### LLM Module

- **Options:** MLC Chat, SmolChat, Edge Gallery (Llama 3, Phi-3, Gemma, Qwen in GGUF format)
- **Purpose:** Local reasoning, summarization, explanation
- **Input:** Text content, context
- **Output:** AI-generated responses

#### Vision Module

- **Options:** MLKit, TFLite, ONNX lightweight models
- **Purpose:** Image analysis, object detection, content classification
- **Input:** Images, screenshots
- **Output:** Classifications, descriptions, detected objects
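
Because each module has interchangeable backends (Vosk vs. DeepSpeech, MLC Chat vs. SmolChat, MLKit vs. TFLite), the engine benefits from a thin interface seam. The sketch below is hypothetical: the interface names, method signatures, and `LocalAiEngine` wiring are assumptions illustrating the modular, DI-friendly design, not the actual backend APIs.

```kotlin
// One small interface per module; real backends would wrap Vosk, MLC, MLKit, etc.
interface SpeechToText { fun transcribe(audio: ByteArray): String }
interface LocalLlm { fun complete(prompt: String): String }
interface VisionModel { fun classify(imageBytes: ByteArray): List<String> }

// The engine composes the three modules; dependency injection supplies
// whichever backend the device can run. Swapping a backend never touches callers.
class LocalAiEngine(
    private val stt: SpeechToText,
    private val llm: LocalLlm,
    private val vision: VisionModel
) {
    // Example cross-module flow: transcribe audio, then summarize locally.
    fun summarizeAudio(audio: ByteArray): String =
        llm.complete("Summarize: " + stt.transcribe(audio))

    fun describeImage(imageBytes: ByteArray): List<String> =
        vision.classify(imageBytes)
}
```

This keeps each component independently testable (stub implementations in unit tests) and replaceable, matching the design principles below.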
### 5. Dialogue Agent

Maintains session state, keeps the dialogue open, and executes on command. Manages conversation flow and context persistence.

**Key Components:**

- State machine for dialogue flow
- Context/memory management
- Command parsing and routing
- Execution trigger handling
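
The dialogue state machine can be sketched with sealed types. This is an illustrative sketch under assumed names (`DialogueState`, `DialogueEvent`, `transition`): the agent stays in an open session accumulating context, and moves to execution only on an explicit command.

```kotlin
// States the agent can be in.
sealed class DialogueState {
    object Idle : DialogueState()
    data class InSession(val context: List<String>) : DialogueState()
    data class Executing(val command: String) : DialogueState()
}

// Events that drive transitions.
sealed class DialogueEvent {
    data class UserUtterance(val text: String) : DialogueEvent()
    data class Command(val name: String) : DialogueEvent()
    object SessionEnded : DialogueEvent()
}

// Pure transition function: easy to unit-test and reason about.
fun transition(state: DialogueState, event: DialogueEvent): DialogueState = when (event) {
    is DialogueEvent.UserUtterance -> {
        // Keep the dialogue open and remember the turn as context.
        val prior = (state as? DialogueState.InSession)?.context ?: emptyList()
        DialogueState.InSession(prior + event.text)
    }
    is DialogueEvent.Command -> DialogueState.Executing(event.name)
    DialogueEvent.SessionEnded -> DialogueState.Idle
}
```

Keeping the transition function pure separates dialogue logic from UI and inference, so context/memory handling can be tested without any Android dependency.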
### 6. UI/Feedback Layer

Interactive Compose overlays for user interaction and feedback display.

**Key Components:**

- Jetpack Compose UI components
- Overlay windows
- Feedback animations
- Status indicators
- Action buttons

### 7. Privacy/Data Management

Local-only data handling, cache and session controls.

**Key Components:**

- Local storage management
- Session cache controls
- Privacy settings
- Data retention policies
- No network call enforcement
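
A retention policy can be expressed as a small pure function. This sketch is hypothetical: the names (`RetentionPolicy`, `CachedItem`, `expiredItems`) and the 24-hour default are illustrative assumptions; the real implementation would scan app-private directories and honor the user's settings.

```kotlin
// How long session artifacts may live; 24h default is illustrative only.
data class RetentionPolicy(val maxAgeMillis: Long = 24 * 60 * 60 * 1000L)

// A cached artifact in app-private storage.
data class CachedItem(val path: String, val createdAtMillis: Long)

// Select items older than the policy window, i.e. eligible for deletion.
fun expiredItems(
    items: List<CachedItem>,
    policy: RetentionPolicy,
    nowMillis: Long
): List<CachedItem> =
    items.filter { nowMillis - it.createdAtMillis > policy.maxAgeMillis }
```

Taking `nowMillis` as a parameter (rather than reading the clock inside) keeps the policy deterministic and trivially testable.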
## Dataflow Diagram

```
User Gesture → Accessibility Service → Region Processor
                                              ↓
                                    Content-Type Detector
                                              ↓
                          ┌───────────────────┼───────────────────┐
                          ↓                   ↓                   ↓
                    Speech Module        LLM Module        Vision Module
                          ↓                   ↓                   ↓
                          └───────────────────┼───────────────────┘
                                              ↓
                                       Dialogue Agent
                                              ↓
                                         UI/Feedback
                                              ↓
                                      User sees response
                                              ↓
                                 Continue dialogue or Execute
```

## Technology Stack

- **Platform:** Android (minSdk 27+, targetSdk 34)
- **UI Framework:** Jetpack Compose
- **Overlay System:** AccessibilityService (TYPE_ACCESSIBILITY_OVERLAY)
- **Screen Capture:** MediaProjection API
- **Speech-to-Text:** Vosk / DeepSpeech / PocketSphinx (offline)
- **LLM:** MLC Chat / SmolChat / Edge Gallery (on-device Llama 3, Phi-3, Gemma, Qwen)
- **Vision:** MLKit / TFLite / ONNX
- **Voice Commands:** Porcupine / Vosk / PocketSphinx (wake word detection)
- **Language:** Kotlin
- **Build System:** Gradle

## Design Principles

1. **Privacy First:** All processing happens on-device. Zero network calls for AI inference.
2. **Modular Architecture:** Each component is independently testable and replaceable.
3. **Async Operations:** All ML inference is non-blocking and asynchronous.
4. **Dependency Injection:** Components are loosely coupled via DI.
5. **Composition Over Inheritance:** Favor composable functions and interfaces.
6. **Local Data Only:** All app data stays within the local app context.

## Known Constraints and Considerations

- **Model Size:** LLMs and STT models can be large; ensure device compatibility
- **Device Fragmentation:** Overlay behavior varies across OEMs (MIUI, Samsung, etc.)
- **Gesture Recognition:** Need robust handling to minimize false positives
- **Performance:** On-device inference requires optimization for lower-end devices
- **Battery Impact:** Continuous overlay and ML inference need power optimization

## Security and Privacy

- **No Network Permissions:** App does not request internet access for AI features
- **Local Storage:** All data stored in app-private directories
- **No Cloud Services:** Zero dependency on external APIs or cloud infrastructure
- **User Control:** Complete user control over data retention and cache clearing
- **Transparency:** Open-source codebase for full auditability