# Auto-Match Load Performance Analysis

## Summary
Auto-Match page loads significantly slower than Identify page because it lacks the performance optimizations that Identify uses. Auto-Match always fetches all data upfront with no caching, while Identify uses sessionStorage caching and lazy loading.
## Identify Page Optimizations (Current)

### 1. SessionStorage Caching
- State Caching: Caches faces, current index, similar faces, and form data in sessionStorage
- Settings Caching: Caches filter settings (pageSize, minQuality, sortBy, etc.)
- Restoration: On mount, restores cached state instead of making API calls
- Implementation:
  - `STATE_KEY = 'identify_state'` stores faces, currentIdx, similar, faceFormData, selectedSimilar
  - `SETTINGS_KEY = 'identify_settings'` stores filter settings
  - Only loads fresh data if no cached state exists
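The save/restore flow above can be sketched as follows. This is a minimal sketch, assuming a simplified state shape; the `StorageLike` parameter is an illustrative stand-in so the snippet runs outside a browser — in the real page it would be `window.sessionStorage`:

```typescript
// Minimal Storage-like interface so the sketch is not browser-bound;
// window.sessionStorage satisfies it.
type StorageLike = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

const STATE_KEY = "identify_state";

// Simplified assumption about the cached state shape.
interface IdentifyState {
  faces: unknown[];
  currentIdx: number;
  similar: unknown[];
  faceFormData: Record<string, unknown>;
}

// Save the working state after every meaningful change.
function saveState(storage: StorageLike, state: IdentifyState): void {
  storage.setItem(STATE_KEY, JSON.stringify(state));
}

// On mount: restore cached state if present; a null return signals the
// caller to fall back to a fresh API load.
function restoreState(storage: StorageLike): IdentifyState | null {
  const raw = storage.getItem(STATE_KEY);
  if (raw === null) return null; // cache miss -> fetch from API
  try {
    return JSON.parse(raw) as IdentifyState;
  } catch {
    return null; // corrupted cache -> fall back to a fresh load
  }
}
```

The cache-miss path is the only one that touches the network, which is what makes repeat visits effectively instant.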
### 2. Lazy Loading

- Similar Faces: Only loads similar faces when:
  - `compareEnabled` is true
  - Current face changes
- Not loaded during initial page load
- Images: Uses lazy loading for similar face images (`loading="lazy"`)
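The gating rule above can be sketched as a pure decision function plus a small stateful wrapper. This is an illustrative sketch (numeric face ids are an assumption; the fetch itself is omitted — only the decision of *when* to call the API is shown):

```typescript
// Fetch similar faces only when compare mode is on AND the face changed.
function shouldFetchSimilar(
  faceId: number,
  lastLoadedFaceId: number | null,
  compareEnabled: boolean,
): boolean {
  return compareEnabled && faceId !== lastLoadedFaceId;
}

// Stateful wrapper: remembers the last face it actually loaded for, so
// re-renders with the same face never trigger another API call.
function makeSimilarGate() {
  let last: number | null = null;
  return (faceId: number, compareEnabled: boolean): boolean => {
    if (!shouldFetchSimilar(faceId, last, compareEnabled)) return false;
    last = faceId;
    return true;
  };
}
```

Because `last` is only updated on a real fetch, toggling compare mode off and back on for the same face still loads it exactly once.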
### 3. Image Preloading

- Preloads next/previous face images in background
- Uses `new Image()` to preload without blocking UI
- Delayed by 100ms to avoid blocking current image load
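A sketch of that preload step, with the image constructor injected so the snippet runs outside a browser; in the page it would be `() => new Image()`, wrapped in the ~100 ms `setTimeout` described above:

```typescript
// Minimal shape of the object we need from the Image constructor.
type ImageLike = { src: string };

// Warm adjacent face images by creating image objects and setting their
// src, which triggers the browser download without touching the DOM.
function preloadImages(urls: string[], createImage: () => ImageLike): ImageLike[] {
  return urls.map((url) => {
    const img = createImage();
    img.src = url; // in a real browser, this assignment starts the request
    return img;
  });
}

// In the page this runs slightly later so it never competes with the
// currently displayed image:
//   setTimeout(() => preloadImages(adjacentUrls, () => new Image()), 100);
```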
### 4. Batch Operations

- Uses `batchSimilarity` endpoint for unique faces filtering
- Single API call instead of multiple individual calls
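The difference can be sketched as request shapes. The paths and payload below are illustrative assumptions, not the app's actual endpoints:

```typescript
interface Request {
  path: string;
  body?: unknown;
}

// Naive pattern: one request per face (N calls, N round-trips).
function perFaceRequests(faceIds: number[]): Request[] {
  return faceIds.map((id) => ({ path: `/api/faces/${id}/similar` }));
}

// Batched pattern: a single request carrying every face id (1 call).
function batchRequest(faceIds: number[]): { path: string; body: { face_ids: number[] } } {
  return { path: "/api/faces/batch-similarity", body: { face_ids: faceIds } };
}
```

Collapsing N round-trips into one is usually the dominant saving here, independent of backend query time.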
### 5. Progressive State Management
- Uses refs to track restoration state
- Prevents unnecessary reloads during state restoration
- Only triggers API calls when actually needed
## Auto-Match Page (Current - No Optimizations)

### 1. No Caching
- No sessionStorage: Always makes fresh API calls on mount
- No state restoration: Always starts from scratch
- No settings persistence: Tolerance and other settings reset on page reload
### 2. Eager Loading
- All Data Upfront: Loads ALL people and ALL matches in single API call
- No Lazy Loading: All match data loaded even if user never views it
- No Progressive Loading: Everything must be loaded before UI is usable
### 3. No Image Preloading
- Images load on-demand as user navigates
- No preloading of next/previous person images
### 4. Large API Response

- Backend returns complete dataset:
  - All identified people
  - All matches for each person
  - All face metadata (photo info, locations, quality scores, etc.)
- Response size can be very large (hundreds of KB to MB) depending on:
  - Number of identified people
  - Number of matches per person
  - Amount of metadata per match
### 5. Backend Processing

The `find_auto_match_matches` function:
- Queries all identified faces (one per person, quality >= 0.3)
- For EACH person, calls `find_similar_faces` to find matches
- This means N database queries (where N = number of people)
- All processing happens synchronously before response is sent
## Performance Comparison

### Identify Page Load Flow
1. Check sessionStorage for cached state
2. If cached: Restore state (instant, no API call)
3. If not cached: Load faces (paginated, ~50 faces)
4. Load similar faces only when face changes (lazy)
5. Preload next/previous images (background)
### Auto-Match Page Load Flow
1. Always call API (no cache check)
2. Backend processes ALL people:
- Query all identified faces
- For each person: query similar faces
- Build complete response with all matches
3. Wait for complete response (can be large)
4. Render all data at once
### Key Differences
| Feature | Identify | Auto-Match |
|---|---|---|
| Caching | ✅ sessionStorage | ❌ None |
| State Restoration | ✅ Yes | ❌ No |
| Lazy Loading | ✅ Similar faces only | ❌ All data upfront |
| Image Preloading | ✅ Next/prev faces | ❌ None |
| Pagination | ✅ Yes (page_size) | ❌ No (all at once) |
| Progressive Loading | ✅ Yes | ❌ No |
| API Call Size | Small (paginated) | Large (all data) |
| Backend Queries | 1-2 queries | N+1 queries (N = people) |
## Why Auto-Match is Slower
- No Caching: Every page load requires full API call
- Large Response: All people + all matches in single response
- N+1 Query Problem: Backend makes one query per person to find matches
- Synchronous Processing: All processing happens before response
- No Lazy Loading: All match data loaded even if never viewed
## Potential Optimizations for Auto-Match

### 1. Add SessionStorage Caching (High Impact)
- Cache people list and matches in sessionStorage
- Restore on mount instead of API call
- Similar to Identify page approach
### 2. Lazy Load Matches (High Impact)
- Load people list first
- Load matches for current person only
- Load matches for next person in background
- Similar to how Identify loads similar faces
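The strategy above can be sketched as a per-person match cache with background prefetch. This is a sketch under stated assumptions: `getMatches` is a synchronous stand-in for the real API call, and person ids are numeric:

```typescript
// Per-person match cache: matches are fetched on demand, cached, and the
// next person's matches can be prefetched into the same cache.
function makeMatchCache(getMatches: (personId: number) => string[]) {
  const cache = new Map<number, string[]>();

  function load(personId: number): string[] {
    let matches = cache.get(personId);
    if (matches === undefined) {
      matches = getMatches(personId); // only fetched the first time
      cache.set(personId, matches);
    }
    return matches;
  }

  return {
    // Matches for the person currently on screen.
    current: (personId: number) => load(personId),
    // Called in the background after the current person renders.
    prefetchNext: (nextPersonId: number) => { load(nextPersonId); },
  };
}
```

With this shape, navigating forward hits the cache that the previous render already warmed, mirroring how Identify preloads adjacent faces.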
### 3. Pagination (Medium Impact)
- Paginate people list (e.g., 20 people per page)
- Load matches only for visible people
- Reduces initial response size
### 4. Backend Optimization (High Impact)

- Batch similarity queries instead of N+1 pattern
- Use `calculate_batch_similarities` for all people at once
- Cache results if tolerance hasn't changed
### 5. Image Preloading (Low Impact)
- Preload reference face images for next/previous people
- Preload match images for current person
### 6. Progressive Rendering (Medium Impact)
- Show people list immediately
- Load matches progressively as user navigates
- Show loading indicators for matches
## Code Locations

### Identify Page

- Frontend: `frontend/src/pages/Identify.tsx`
  - Lines 42-45: SessionStorage keys
  - Lines 272-347: State restoration logic
  - Lines 349-399: State saving logic
  - Lines 496-527: Image preloading
  - Lines 258-270: Lazy loading of similar faces
### Auto-Match Page

- Frontend: `frontend/src/pages/AutoMatch.tsx`
  - Lines 35-71: `loadAutoMatch` function (always calls API)
  - Lines 74-77: Auto-load on mount (no cache check)
### Backend

- API Endpoint: `src/web/api/faces.py` (lines 539-702)
- Service Function: `src/web/services/face_service.py` (lines 1736-1846)
  - `find_auto_match_matches`: Processes all people synchronously
## Recommendations
- Immediate: Add sessionStorage caching (similar to Identify)
- High Priority: Implement lazy loading of matches
- Medium Priority: Optimize backend to use batch queries
- Low Priority: Add image preloading
The biggest win would be adding sessionStorage caching, which would make subsequent page loads instant (like Identify).