# Auto-Match Load Performance Analysis

## Summary

The Auto-Match page loads significantly slower than the Identify page because it lacks the performance optimizations that Identify uses. Auto-Match always fetches all data up front with no caching, while Identify uses sessionStorage caching and lazy loading.

## Identify Page Optimizations (Current)

### 1. **SessionStorage Caching**

- **State Caching**: Caches faces, current index, similar faces, and form data in sessionStorage
- **Settings Caching**: Caches filter settings (pageSize, minQuality, sortBy, etc.)
- **Restoration**: On mount, restores cached state instead of making API calls
- **Implementation**:
  - `STATE_KEY = 'identify_state'` - stores faces, currentIdx, similar, faceFormData, selectedSimilar
  - `SETTINGS_KEY = 'identify_settings'` - stores filter settings
  - Only loads fresh data if no cached state exists

### 2. **Lazy Loading**

- **Similar Faces**: Only loads similar faces when:
  - `compareEnabled` is true
  - Current face changes
  - Not loaded during initial page load
- **Images**: Uses lazy loading for similar face images (`loading="lazy"`)

### 3. **Image Preloading**

- Preloads next/previous face images in the background
- Uses `new Image()` to preload without blocking the UI
- Delayed by 100ms to avoid blocking the current image load

### 4. **Batch Operations**

- Uses the `batchSimilarity` endpoint for unique faces filtering
- Single API call instead of multiple individual calls

### 5. **Progressive State Management**

- Uses refs to track restoration state
- Prevents unnecessary reloads during state restoration
- Only triggers API calls when actually needed

## Auto-Match Page (Current - No Optimizations)

### 1. **No Caching**

- **No sessionStorage**: Always makes fresh API calls on mount
- **No state restoration**: Always starts from scratch
- **No settings persistence**: Tolerance and other settings reset on page reload

### 2. **Eager Loading**

- **All Data Upfront**: Loads ALL people and ALL matches in a single API call
- **No Lazy Loading**: All match data is loaded even if the user never views it
- **No Progressive Loading**: Everything must be loaded before the UI is usable

### 3. **No Image Preloading**

- Images load on demand as the user navigates
- No preloading of next/previous person images

### 4. **Large API Response**

- Backend returns the complete dataset:
  - All identified people
  - All matches for each person
  - All face metadata (photo info, locations, quality scores, etc.)
- Response size can be very large (hundreds of KB to MB) depending on:
  - Number of identified people
  - Number of matches per person
  - Amount of metadata per match

### 5. **Backend Processing**

The `find_auto_match_matches` function:

- Queries all identified faces (one per person, quality >= 0.3)
- For EACH person, calls `find_similar_faces` to find matches
- This means N similarity queries on top of the initial query (N+1 in total, where N = number of people)
- All processing happens synchronously before the response is sent

## Performance Comparison

### Identify Page Load Flow

```
1. Check sessionStorage for cached state
2. If cached: Restore state (instant, no API call)
3. If not cached: Load faces (paginated, ~50 faces)
4. Load similar faces only when face changes (lazy)
5. Preload next/previous images (background)
```

### Auto-Match Page Load Flow

```
1. Always call API (no cache check)
2. Backend processes ALL people:
   - Query all identified faces
   - For each person: query similar faces
   - Build complete response with all matches
3. Wait for complete response (can be large)
4. Render all data at once
```

## Key Differences

| Feature | Identify | Auto-Match |
|---------|----------|------------|
| **Caching** | ✅ sessionStorage | ❌ None |
| **State Restoration** | ✅ Yes | ❌ No |
| **Lazy Loading** | ✅ Similar faces only | ❌ All data upfront |
| **Image Preloading** | ✅ Next/prev faces | ❌ None |
| **Pagination** | ✅ Yes (page_size) | ❌ No (all at once) |
| **Progressive Loading** | ✅ Yes | ❌ No |
| **API Call Size** | Small (paginated) | Large (all data) |
| **Backend Queries** | 1-2 queries | N+1 queries (N = people) |

## Why Auto-Match is Slower

1. **No Caching**: Every page load requires a full API call
2. **Large Response**: All people + all matches in a single response
3. **N+1 Query Problem**: Backend makes one query per person to find matches
4. **Synchronous Processing**: All processing happens before the response is sent
5. **No Lazy Loading**: All match data is loaded even if never viewed

## Potential Optimizations for Auto-Match

### 1. **Add SessionStorage Caching** (High Impact)

- Cache people list and matches in sessionStorage
- Restore on mount instead of calling the API
- Similar to the Identify page approach

### 2. **Lazy Load Matches** (High Impact)

- Load people list first
- Load matches for the current person only
- Load matches for the next person in the background
- Similar to how Identify loads similar faces

### 3. **Pagination** (Medium Impact)

- Paginate people list (e.g., 20 people per page)
- Load matches only for visible people
- Reduces initial response size

### 4. **Backend Optimization** (High Impact)

- Batch similarity queries instead of the N+1 pattern
- Use `calculate_batch_similarities` for all people at once
- Cache results if tolerance hasn't changed

### 5. **Image Preloading** (Low Impact)

- Preload reference face images for next/previous people
- Preload match images for the current person

### 6. **Progressive Rendering** (Medium Impact)

- Show people list immediately
- Load matches progressively as the user navigates
- Show loading indicators for matches

## Code Locations

### Identify Page

- **Frontend**: `frontend/src/pages/Identify.tsx`
  - Lines 42-45: SessionStorage keys
  - Lines 272-347: State restoration logic
  - Lines 349-399: State saving logic
  - Lines 496-527: Image preloading
  - Lines 258-270: Lazy loading of similar faces

### Auto-Match Page

- **Frontend**: `frontend/src/pages/AutoMatch.tsx`
  - Lines 35-71: `loadAutoMatch` function (always calls API)
  - Lines 74-77: Auto-load on mount (no cache check)

### Backend

- **API Endpoint**: `src/web/api/faces.py` (lines 539-702)
- **Service Function**: `src/web/services/face_service.py` (lines 1736-1846)
  - `find_auto_match_matches`: Processes all people synchronously

## Recommendations

1. **Immediate**: Add sessionStorage caching (similar to Identify)
2. **High Priority**: Implement lazy loading of matches
3. **Medium Priority**: Optimize backend to use batch queries
4. **Low Priority**: Add image preloading

The biggest win would be adding sessionStorage caching, which would make subsequent page loads instant (like Identify).
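
As a rough illustration of the caching recommendation, the sketch below shows one way Auto-Match could check a cache before calling the API. It is not taken from `AutoMatch.tsx`: the key name `automatch_state`, the `AutoMatchState` shape, and the helper names are all assumptions made for the example. The store is passed in as a parameter so the logic can be exercised outside a browser; in the page itself, `window.sessionStorage` would satisfy the same interface.

```typescript
// Minimal subset of the Web Storage API used by the helpers below.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical shape of the cached state; the real Auto-Match fields may differ.
interface AutoMatchState {
  tolerance: number;
  people: { id: number; name: string }[];
  currentIdx: number;
}

// Hypothetical key name, mirroring the 'identify_state' convention.
const AUTOMATCH_STATE_KEY = 'automatch_state';

function saveState(store: KeyValueStore, state: AutoMatchState): void {
  try {
    store.setItem(AUTOMATCH_STATE_KEY, JSON.stringify(state));
  } catch {
    // Quota exceeded or storage unavailable: skip caching, keep the page working.
  }
}

function restoreState(store: KeyValueStore, tolerance: number): AutoMatchState | null {
  const raw = store.getItem(AUTOMATCH_STATE_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as AutoMatchState;
    // Matches depend on tolerance, so a cache built with another value is stale.
    if (parsed.tolerance !== tolerance) {
      store.removeItem(AUTOMATCH_STATE_KEY);
      return null;
    }
    return parsed;
  } catch {
    // Corrupt JSON: drop the entry and fall back to a fresh load.
    store.removeItem(AUTOMATCH_STATE_KEY);
    return null;
  }
}

// Cache-first load: only calls fetchFresh when no usable cached state exists.
async function loadAutoMatchCached(
  store: KeyValueStore,
  tolerance: number,
  fetchFresh: (tolerance: number) => Promise<AutoMatchState>,
): Promise<AutoMatchState> {
  const cached = restoreState(store, tolerance);
  if (cached !== null) return cached;
  const fresh = await fetchFresh(tolerance);
  saveState(store, fresh);
  return fresh;
}

// In-memory stand-in for sessionStorage, for demos and tests only.
class MemoryStore implements KeyValueStore {
  private readonly items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
  removeItem(key: string): void {
    this.items.delete(key);
  }
}
```

Keying the cache on tolerance is a design choice for this sketch: it avoids showing matches computed under a different threshold. Note that sessionStorage has a limited quota, so a very large Auto-Match response may not fit, which is why `saveState` tolerates write failures.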
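
The lazy-loading recommendation (load matches for the current person, then warm up the next person in the background) could look roughly like the following. This is a sketch under assumptions: the `Match` shape, the `LazyMatchLoader` class, and the per-person `fetchMatches` callback are hypothetical, and a per-person matches endpoint does not exist in the current backend as described above.

```typescript
// Hypothetical match record; the real response carries more metadata.
interface Match {
  faceId: number;
  similarity: number;
}

// Hypothetical per-person fetcher, e.g. a wrapper around a new API endpoint.
type FetchMatches = (personId: number) => Promise<Match[]>;

class LazyMatchLoader {
  // One promise per person, so repeated or concurrent requests share a single call.
  private readonly requests = new Map<number, Promise<Match[]>>();
  private readonly fetchMatches: FetchMatches;

  constructor(fetchMatches: FetchMatches) {
    this.fetchMatches = fetchMatches;
  }

  // Returns matches for one person, reusing an earlier or in-flight request.
  load(personId: number): Promise<Match[]> {
    let pending = this.requests.get(personId);
    if (pending === undefined) {
      pending = this.fetchMatches(personId).catch((err: unknown) => {
        // Forget failed requests so a later call can retry.
        this.requests.delete(personId);
        throw err;
      });
      this.requests.set(personId, pending);
    }
    return pending;
  }

  // Loads the current person's matches, then prefetches the next person's.
  async show(personIds: number[], idx: number): Promise<Match[]> {
    const current = personIds[idx];
    if (current === undefined) {
      throw new RangeError(`no person at index ${idx}`);
    }
    const matches = await this.load(current);
    const next = personIds[idx + 1];
    if (next !== undefined) {
      // Best-effort prefetch: errors are swallowed here and retried on demand.
      void this.load(next).catch(() => undefined);
    }
    return matches;
  }
}
```

With this shape, navigating to the next person would usually resolve from the already-started request, and the initial page load only pays for one person's matches rather than all of them.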