# Technical Architecture

## Layered System Modules

### 1. Accessibility Service Layer

Detects gestures and draws overlays. Uses Android's `AccessibilityService` to create system-wide overlays that capture user input without interfering with the underlying apps.

**Key Components:**
- `AccessibilityService` implementation
- `TYPE_ACCESSIBILITY_OVERLAY` window management
- Gesture capture and event routing

### 2. Region Processor

Converts the user's selected area into actionable bounds. Translates circle and touch gestures into screen coordinates and extracts the selected region.

**Key Components:**
- Gesture recognition (circle, tap, drag)
- Coordinate transformation
- Region boundary calculation
- Screenshot/content capture via the MediaProjection API

### 3. Content-Type Detector

Classifies input for routing (audio, image, or text). Analyzes the captured region to determine what type of content it contains and how to process it.

**Key Components:**
- View hierarchy analysis
- OCR text detection
- Audio source identification
- Image/media classification
- File type detection

### 4. Local AI Engine

#### Speech Module
- **Options:** Vosk, DeepSpeech, PocketSphinx
- **Purpose:** Offline speech-to-text transcription
- **Input:** Audio streams, recorded audio
- **Output:** Transcribed text

#### LLM Module
- **Options:** MLC Chat, SmolChat, Edge Gallery (Llama 3, Phi-3, Gemma, Qwen; the model format depends on the runtime, e.g. GGUF for llama.cpp-based SmolChat)
- **Purpose:** Local reasoning, summarization, explanation
- **Input:** Text content, context
- **Output:** AI-generated responses

#### Vision Module
- **Options:** ML Kit, TFLite, lightweight ONNX models
- **Purpose:** Image analysis, object detection, content classification
- **Input:** Images, screenshots
- **Output:** Classifications, descriptions, detected objects

### 5. Dialogue Agent

Maintains session state, keeps the dialogue open, and executes actions on command. Manages conversation flow and context persistence.
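
The session lifecycle described above (open a dialogue, keep it open, execute only on command) can be sketched as a pure state machine. This is a minimal sketch under assumptions: the states (`Idle`, `Conversing`, `Executing`), the events, and the `transition` function are hypothetical names, and conversation context is kept as a plain list of strings rather than a real memory store.

```kotlin
// Hypothetical states of a dialogue session (names are illustrative).
sealed interface DialogueState {
    object Idle : DialogueState
    data class Conversing(val history: List<String>) : DialogueState
    data class Executing(val command: String, val history: List<String>) : DialogueState
}

// Hypothetical inputs the Dialogue Agent reacts to.
sealed interface DialogueEvent {
    data class RegionCaptured(val summary: String) : DialogueEvent
    data class UserMessage(val text: String) : DialogueEvent
    data class ExecuteCommand(val command: String) : DialogueEvent
    object ExecutionFinished : DialogueEvent
    object EndSession : DialogueEvent
}

// Pure transition function: no Android dependencies, so it runs in plain JVM tests.
fun transition(state: DialogueState, event: DialogueEvent): DialogueState = when (state) {
    is DialogueState.Idle -> when (event) {
        // A captured region opens a new session; other events are ignored.
        is DialogueEvent.RegionCaptured -> DialogueState.Conversing(listOf(event.summary))
        else -> state
    }
    is DialogueState.Conversing -> when (event) {
        // Keep the dialogue open and accumulate context.
        is DialogueEvent.RegionCaptured -> DialogueState.Conversing(state.history + event.summary)
        is DialogueEvent.UserMessage -> DialogueState.Conversing(state.history + event.text)
        // Execute only on an explicit command.
        is DialogueEvent.ExecuteCommand -> DialogueState.Executing(event.command, state.history)
        is DialogueEvent.ExecutionFinished -> state
        is DialogueEvent.EndSession -> DialogueState.Idle
    }
    is DialogueState.Executing -> when (event) {
        // Return to the open dialogue, recording what was executed.
        is DialogueEvent.ExecutionFinished ->
            DialogueState.Conversing(state.history + "executed: ${state.command}")
        is DialogueEvent.EndSession -> DialogueState.Idle
        else -> state // input is ignored while a command runs
    }
}
```

Keeping transitions in a pure function means the dialogue flow can be tested without an emulator; side effects (running the command, updating the overlay) would live outside `transition`.
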
**Key Components:**
- State machine for dialogue flow
- Context/memory management
- Command parsing and routing
- Execution trigger handling

### 6. UI/Feedback Layer

Interactive Compose overlays for user interaction and feedback display.

**Key Components:**
- Jetpack Compose UI components
- Overlay windows
- Feedback animations
- Status indicators
- Action buttons

### 7. Privacy/Data Management

Local-only data handling, with cache and session controls.

**Key Components:**
- Local storage management
- Session cache controls
- Privacy settings
- Data retention policies
- Enforcement of no network calls

## Dataflow Diagram

```
User Gesture → Accessibility Service → Region Processor
                                              ↓
                                    Content-Type Detector
                                              ↓
                    ┌─────────────────────────┼─────────────────────────┐
                    ↓                         ↓                         ↓
              Speech Module              LLM Module               Vision Module
                    ↓                         ↓                         ↓
                    └─────────────────────────┼─────────────────────────┘
                                              ↓
                                       Dialogue Agent
                                              ↓
                                         UI/Feedback
                                              ↓
                                     User sees response
                                              ↓
                                Continue dialogue or Execute
```

## Technology Stack

- **Platform:** Android (minSdk 27, targetSdk 34)
- **UI Framework:** Jetpack Compose
- **Overlay System:** AccessibilityService (`TYPE_ACCESSIBILITY_OVERLAY`)
- **Screen Capture:** MediaProjection API
- **Speech-to-Text:** Vosk / DeepSpeech / PocketSphinx (offline)
- **LLM:** MLC Chat / SmolChat / Edge Gallery (on-device Llama 3, Phi-3, Gemma, Qwen)
- **Vision:** ML Kit / TFLite / ONNX
- **Voice Commands:** Porcupine / Vosk / PocketSphinx (wake word detection)
- **Language:** Kotlin
- **Build System:** Gradle

## Design Principles

1. **Privacy First:** All processing happens on-device, with zero network calls for AI inference.
2. **Modular Architecture:** Each component is independently testable and replaceable.
3. **Async Operations:** All ML inference runs asynchronously and never blocks the UI thread.
4. **Dependency Injection:** Components are loosely coupled via DI.
5. **Composition Over Inheritance:** Favor composable functions and interfaces.
6. **Local Data Only:** All app data stays within the local app context.
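
Principles 2 through 5 can be illustrated together by sketching how the Content-Type Detector's output might be routed to the AI modules. This is a hedged sketch, not the project's code: `ContentType`, `CapturedRegion`, `AiModule`, and `ModuleRouter` are hypothetical names; dependencies are passed through the constructor (manual DI) in place of whichever DI framework the project adopts; and inference is dispatched with `java.util.concurrent.CompletableFuture` (available on Android from API 24, below the stated minSdk 27) instead of coroutines so the example needs only the standard library.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executor

// Hypothetical content types produced by the Content-Type Detector.
enum class ContentType { AUDIO, IMAGE, TEXT }

// Hypothetical captured region: the detected type plus raw payload bytes.
class CapturedRegion(val type: ContentType, val payload: ByteArray)

// Each AI module sits behind one small interface, so a backend
// (e.g. Vosk vs. PocketSphinx) can be swapped without touching callers.
fun interface AiModule {
    fun process(region: CapturedRegion): String
}

// Modules and the executor are injected through the constructor.
class ModuleRouter(
    private val modules: Map<ContentType, AiModule>,
    private val executor: Executor,
) {
    // Runs inference on the injected executor and returns immediately with a future.
    fun route(region: CapturedRegion): CompletableFuture<String> {
        val module = modules[region.type]
            ?: return CompletableFuture<String>().apply {
                completeExceptionally(IllegalArgumentException("No module registered for ${region.type}"))
            }
        return CompletableFuture.supplyAsync({ module.process(region) }, executor)
    }
}
```

Because the router depends only on the `AiModule` interface, a test can inject a lambda such as `AiModule { r -> "llm:" + r.payload.decodeToString() }` instead of loading a real model.
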
## Known Constraints and Considerations

- **Model Size:** LLM and STT models can be large; verify that target devices have enough storage and RAM for the chosen models.
- **Device Fragmentation:** Overlay behavior varies across OEM skins (e.g., Xiaomi MIUI, Samsung One UI).
- **Gesture Recognition:** Requires robust handling to minimize false positives.
- **Performance:** On-device inference requires optimization for lower-end devices.
- **Battery Impact:** Continuous overlays and ML inference require power optimization.

## Security and Privacy

- **No Network Permissions:** The app does not request the `INTERNET` permission. Android grants network access per app rather than per feature, so AI features cannot reach the network.
- **Local Storage:** All data is stored in app-private directories.
- **No Cloud Services:** Zero dependency on external APIs or cloud infrastructure.
- **User Control:** Users have complete control over data retention and cache clearing.
- **Transparency:** The codebase is open source for full auditability.
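
The retention and cache-clearing controls above might be backed by something like the following minimal sketch. Assumptions: `SessionCache`, its age-based retention rule, and the injected `clock` are hypothetical; the cache holds data in memory only, whereas a real implementation would also delete any files it wrote to app-private storage.

```kotlin
// Hypothetical in-memory session cache with an age-based retention policy.
class SessionCache(
    private val maxAgeMillis: Long,
    // Injected clock so retention can be tested without real waiting.
    private val clock: () -> Long = System::currentTimeMillis,
) {
    private data class CachedItem(val value: String, val storedAt: Long)

    // Insertion-ordered session data (e.g. transcripts, summaries).
    private val store = LinkedHashMap<String, CachedItem>()

    val size: Int get() = store.size

    fun put(key: String, value: String) {
        store[key] = CachedItem(value, clock())
    }

    // Returns null for missing or expired entries; expired entries are evicted on read.
    fun get(key: String): String? {
        val item = store[key] ?: return null
        if (isExpired(item, clock())) {
            store.remove(key)
            return null
        }
        return item.value
    }

    // Applies the retention policy to every entry, e.g. on app start or from a periodic job.
    fun purgeExpired() {
        val now = clock()
        store.entries.removeAll { isExpired(it.value, now) }
    }

    // Backs a user-facing "clear cache" control.
    fun clearAll() {
        store.clear()
    }

    private fun isExpired(item: CachedItem, now: Long): Boolean = now - item.storedAt > maxAgeMillis
}
```

Injecting the clock lets the retention policy be unit-tested deterministically, for example with `SessionCache(maxAgeMillis = 1_000, clock = { fakeNow })`.
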