# Auto-Match Load Performance Analysis

## Summary
Auto-Match page loads significantly slower than Identify page because it lacks the performance optimizations that Identify uses. Auto-Match always fetches all data upfront with no caching, while Identify uses sessionStorage caching and lazy loading.
## Identify Page Optimizations (Current)

### 1. SessionStorage Caching
- State Caching: Caches faces, current index, similar faces, and form data in sessionStorage
- Settings Caching: Caches filter settings (pageSize, minQuality, sortBy, etc.)
- Restoration: On mount, restores cached state instead of making API calls
- Implementation:
  - `STATE_KEY = 'identify_state'` stores faces, currentIdx, similar, faceFormData, selectedSimilar
  - `SETTINGS_KEY = 'identify_settings'` stores filter settings
  - Only loads fresh data if no cached state exists
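The save/restore flow above can be sketched as follows. This is a minimal sketch, assuming a simplified state shape; the `StorageLike` parameter is an illustrative stand-in so the snippet runs outside a browser — in the real page it would be `window.sessionStorage`:

```typescript
// Minimal Storage-like interface so the sketch is not browser-bound;
// window.sessionStorage satisfies it.
type StorageLike = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

const STATE_KEY = "identify_state";

// Simplified assumption about the cached state shape.
interface IdentifyState {
  faces: unknown[];
  currentIdx: number;
  similar: unknown[];
  faceFormData: Record<string, unknown>;
}

// Save the working state after every meaningful change.
function saveState(storage: StorageLike, state: IdentifyState): void {
  storage.setItem(STATE_KEY, JSON.stringify(state));
}

// On mount: restore cached state if present; a null return signals the
// caller to fall back to a fresh API load.
function restoreState(storage: StorageLike): IdentifyState | null {
  const raw = storage.getItem(STATE_KEY);
  if (raw === null) return null; // cache miss -> fetch from API
  try {
    return JSON.parse(raw) as IdentifyState;
  } catch {
    return null; // corrupted cache -> fall back to a fresh load
  }
}
```

The cache-miss path is the only one that touches the network, which is what makes repeat visits effectively instant.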
### 2. Lazy Loading

- Similar Faces: Only loads similar faces when:
  - `compareEnabled` is true
  - Current face changes
- Not loaded during initial page load
- Images: Uses lazy loading for similar face images (`loading="lazy"`)
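The gating rule above can be sketched as a pure decision function plus a small stateful wrapper. This is an illustrative sketch (numeric face ids are an assumption; the fetch itself is omitted — only the decision of *when* to call the API is shown):

```typescript
// Fetch similar faces only when compare mode is on AND the face changed.
function shouldFetchSimilar(
  faceId: number,
  lastLoadedFaceId: number | null,
  compareEnabled: boolean,
): boolean {
  return compareEnabled && faceId !== lastLoadedFaceId;
}

// Stateful wrapper: remembers the last face it actually loaded for, so
// re-renders with the same face never trigger another API call.
function makeSimilarGate() {
  let last: number | null = null;
  return (faceId: number, compareEnabled: boolean): boolean => {
    if (!shouldFetchSimilar(faceId, last, compareEnabled)) return false;
    last = faceId;
    return true;
  };
}
```

Because `last` is only updated on a real fetch, toggling compare mode off and back on for the same face still loads it exactly once.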
### 3. Image Preloading

- Preloads next/previous face images in background
- Uses `new Image()` to preload without blocking UI
- Delayed by 100ms to avoid blocking current image load
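A sketch of that preload step, with the image constructor injected so the snippet runs outside a browser; in the page it would be `() => new Image()`, wrapped in the ~100 ms `setTimeout` described above:

```typescript
// Minimal shape of the object we need from the Image constructor.
type ImageLike = { src: string };

// Warm adjacent face images by creating image objects and setting their
// src, which triggers the browser download without touching the DOM.
function preloadImages(urls: string[], createImage: () => ImageLike): ImageLike[] {
  return urls.map((url) => {
    const img = createImage();
    img.src = url; // in a real browser, this assignment starts the request
    return img;
  });
}

// In the page this runs slightly later so it never competes with the
// currently displayed image:
//   setTimeout(() => preloadImages(adjacentUrls, () => new Image()), 100);
```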
### 4. Batch Operations

- Uses `batchSimilarity` endpoint for unique faces filtering
- Single API call instead of multiple individual calls
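The difference can be sketched as request shapes. The paths and payload below are illustrative assumptions, not the app's actual endpoints:

```typescript
interface Request {
  path: string;
  body?: unknown;
}

// Naive pattern: one request per face (N calls, N round-trips).
function perFaceRequests(faceIds: number[]): Request[] {
  return faceIds.map((id) => ({ path: `/api/faces/${id}/similar` }));
}

// Batched pattern: a single request carrying every face id (1 call).
function batchRequest(faceIds: number[]): { path: string; body: { face_ids: number[] } } {
  return { path: "/api/faces/batch-similarity", body: { face_ids: faceIds } };
}
```

Collapsing N round-trips into one is usually the dominant saving here, independent of backend query time.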
### 5. Progressive State Management
- Uses refs to track restoration state
- Prevents unnecessary reloads during state restoration
- Only triggers API calls when actually needed
## Auto-Match Page (Current - No Optimizations)

### 1. No Caching
- No sessionStorage: Always makes fresh API calls on mount
- No state restoration: Always starts from scratch
- No settings persistence: Tolerance and other settings reset on page reload
### 2. Eager Loading
- All Data Upfront: Loads ALL people and ALL matches in single API call
- No Lazy Loading: All match data loaded even if user never views it
- No Progressive Loading: Everything must be loaded before UI is usable
### 3. No Image Preloading
- Images load on-demand as user navigates
- No preloading of next/previous person images
### 4. Large API Response

- Backend returns complete dataset:
  - All identified people
  - All matches for each person
  - All face metadata (photo info, locations, quality scores, etc.)
- Response size can be very large (hundreds of KB to MB) depending on:
  - Number of identified people
  - Number of matches per person
  - Amount of metadata per match
### 5. Backend Processing

The `find_auto_match_matches` function:
- Queries all identified faces (one per person, quality >= 0.3)
- For EACH person, calls `find_similar_faces` to find matches
- This means N database queries (where N = number of people)
- All processing happens synchronously before response is sent
## Performance Comparison

### Identify Page Load Flow
1. Check sessionStorage for cached state
2. If cached: Restore state (instant, no API call)
3. If not cached: Load faces (paginated, ~50 faces)
4. Load similar faces only when face changes (lazy)
5. Preload next/previous images (background)
### Auto-Match Page Load Flow
1. Always call API (no cache check)
2. Backend processes ALL people:
- Query all identified faces
- For each person: query similar faces
- Build complete response with all matches
3. Wait for complete response (can be large)
4. Render all data at once
### Key Differences
| Feature | Identify | Auto-Match |
|---|---|---|
| Caching | ✅ sessionStorage | ❌ None |
| State Restoration | ✅ Yes | ❌ No |
| Lazy Loading | ✅ Similar faces only | ❌ All data upfront |
| Image Preloading | ✅ Next/prev faces | ❌ None |
| Pagination | ✅ Yes (page_size) | ❌ No (all at once) |
| Progressive Loading | ✅ Yes | ❌ No |
| API Call Size | Small (paginated) | Large (all data) |
| Backend Queries | 1-2 queries | N+1 queries (N = people) |
## Why Auto-Match is Slower
- No Caching: Every page load requires full API call
- Large Response: All people + all matches in single response
- N+1 Query Problem: Backend makes one query per person to find matches
- Synchronous Processing: All processing happens before response
- No Lazy Loading: All match data loaded even if never viewed
## Potential Optimizations for Auto-Match

### 1. Add SessionStorage Caching (High Impact)
- Cache people list and matches in sessionStorage
- Restore on mount instead of API call
- Similar to Identify page approach
### 2. Lazy Load Matches (High Impact)
- Load people list first
- Load matches for current person only
- Load matches for next person in background
- Similar to how Identify loads similar faces
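The strategy above can be sketched as a per-person match cache with background prefetch. This is a sketch under stated assumptions: `getMatches` is a synchronous stand-in for the real API call, and person ids are numeric:

```typescript
// Per-person match cache: matches are fetched on demand, cached, and the
// next person's matches can be prefetched into the same cache.
function makeMatchCache(getMatches: (personId: number) => string[]) {
  const cache = new Map<number, string[]>();

  function load(personId: number): string[] {
    let matches = cache.get(personId);
    if (matches === undefined) {
      matches = getMatches(personId); // only fetched the first time
      cache.set(personId, matches);
    }
    return matches;
  }

  return {
    // Matches for the person currently on screen.
    current: (personId: number) => load(personId),
    // Called in the background after the current person renders.
    prefetchNext: (nextPersonId: number) => { load(nextPersonId); },
  };
}
```

With this shape, navigating forward hits the cache that the previous render already warmed, mirroring how Identify preloads adjacent faces.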
### 3. Pagination (Medium Impact)
- Paginate people list (e.g., 20 people per page)
- Load matches only for visible people
- Reduces initial response size
### 4. Backend Optimization (High Impact)

- Batch similarity queries instead of N+1 pattern
- Use `calculate_batch_similarities` for all people at once
- Cache results if tolerance hasn't changed
### 5. Image Preloading (Low Impact)
- Preload reference face images for next/previous people
- Preload match images for current person
### 6. Progressive Rendering (Medium Impact)
- Show people list immediately
- Load matches progressively as user navigates
- Show loading indicators for matches
## Code Locations

### Identify Page

- Frontend: `frontend/src/pages/Identify.tsx`
  - Lines 42-45: SessionStorage keys
  - Lines 272-347: State restoration logic
  - Lines 349-399: State saving logic
  - Lines 496-527: Image preloading
  - Lines 258-270: Lazy loading of similar faces
### Auto-Match Page

- Frontend: `frontend/src/pages/AutoMatch.tsx`
  - Lines 35-71: `loadAutoMatch` function (always calls API)
  - Lines 74-77: Auto-load on mount (no cache check)
### Backend

- API Endpoint: `src/web/api/faces.py` (lines 539-702)
- Service Function: `src/web/services/face_service.py` (lines 1736-1846)
  - `find_auto_match_matches`: Processes all people synchronously
## Recommendations
- Immediate: Add sessionStorage caching (similar to Identify)
- High Priority: Implement lazy loading of matches
- Medium Priority: Optimize backend to use batch queries
- Low Priority: Add image preloading
The biggest win would be adding sessionStorage caching, which would make subsequent page loads instant (like Identify).