6DRepNet Integration Analysis
Date: 2025-01-XX
Status: Analysis Only (No Code Changes)
Purpose: Evaluate feasibility of integrating 6DRepNet for direct yaw/pitch/roll estimation
Executive Summary
Integrating 6DRepNet, either as an alternative or an enhancement to the current RetinaFace-based landmark pose estimation, is technically feasible. It would provide more accurate, direct pose estimation, but requires a PyTorch dependency and architectural adjustments.
Key Findings:
- ✅ Technically Feasible: 6DRepNet is available as a PyPI package (sixdrepnet)
- ⚠️ Dependency Conflict: Requires PyTorch (currently using TensorFlow via DeepFace)
- ✅ Interface Compatible: Can work with existing OpenCV/CV2 image processing
- 📊 Accuracy Improvement: Direct estimation vs. geometric calculation from landmarks
- 🔄 Architectural Impact: Requires abstraction layer to support both methods
Current Implementation Analysis
Current Pose Detection Architecture
Location: src/utils/pose_detection.py
Current Method:
- Uses RetinaFace to detect faces and extract facial landmarks
- Calculates yaw, pitch, roll geometrically from landmark positions:
- Yaw: Calculated from nose position relative to eye midpoint
- Pitch: Calculated from nose position relative to expected vertical position
- Roll: Calculated from eye line angle
- Uses face width (eye distance) as additional indicator for profile detection
- Classifies pose mode from angles using thresholds
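The geometric approach above can be sketched as follows. This is illustrative only: the landmark layout (left eye, right eye, nose tip), the 0.6 nose-position ratio, and the classification thresholds are assumptions, not the exact formulas in src/utils/pose_detection.py.

```python
import math

def estimate_pose_from_landmarks(left_eye, right_eye, nose):
    """Rough yaw/pitch/roll estimate from three 2D landmarks (x, y).

    Illustrative sketch: the real PoseDetector works from RetinaFace's
    full landmark set with its own tuned constants.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_mid_y = (left_eye[1] + right_eye[1]) / 2.0
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])

    # Yaw: horizontal nose offset from the eye midpoint, scaled by eye distance
    yaw = math.degrees(math.atan2(nose[0] - eye_mid_x, eye_dist))

    # Pitch: vertical nose offset from its expected position below the eyes
    expected_nose_y = eye_mid_y + 0.6 * eye_dist  # 0.6 is an assumed ratio
    pitch = math.degrees(math.atan2(expected_nose_y - nose[1], eye_dist))

    # Roll: angle of the line connecting the two eyes
    roll = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))
    return yaw, pitch, roll

def classify_pose_mode(yaw, profile_threshold=45.0, semi_threshold=20.0):
    """Map a yaw angle to a coarse pose mode (threshold values are assumed)."""
    if abs(yaw) >= profile_threshold:
        return 'profile'
    if abs(yaw) >= semi_threshold:
        return 'semi-profile'
    return 'frontal'
```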
Key Characteristics:
- ✅ No additional ML model dependencies (uses RetinaFace landmarks)
- ✅ Lightweight (geometric calculations only)
- ⚠️ Accuracy depends on landmark quality and geometric assumptions
- ⚠️ May have limitations with extreme poses or low-quality images
Integration Points:
- FaceProcessor.__init__(): Initializes PoseDetector with graceful fallback
- process_faces(): Calls pose_detector.detect_pose_faces(img_path)
- face_service.py: Uses shared PoseDetector instance for batch processing
- Returns: {'yaw_angle', 'pitch_angle', 'roll_angle', 'pose_mode', ...}
6DRepNet Overview
What is 6DRepNet?
6DRepNet is a PyTorch-based deep learning model designed for direct head pose estimation using a continuous 6D rotation matrix representation. It addresses ambiguities in rotation labels and enables robust full-range head pose predictions.
Key Features:
- Direct estimation of yaw, pitch, roll angles
- Full 360° range support
- Competitive accuracy (MAE ~2.66° on BIWI dataset)
- Available as easy-to-use Python package
Technical Specifications
Package: sixdrepnet (PyPI)
Framework: PyTorch
Input: Image (OpenCV format, numpy array, or PIL Image)
Output: (pitch, yaw, roll) angles in degrees
Model Size: ~50-100MB (weights downloaded automatically)
Dependencies:
- PyTorch (CPU or CUDA)
- OpenCV (already in requirements)
- NumPy (already in requirements)
Usage Example
from sixdrepnet import SixDRepNet
import cv2
# Initialize (weights downloaded automatically)
model = SixDRepNet()
# Load image
img = cv2.imread('/path/to/image.jpg')
# Predict pose (returns pitch, yaw, roll)
pitch, yaw, roll = model.predict(img)
# Optional: visualize results
model.draw_axis(img, yaw, pitch, roll)
Integration Feasibility Analysis
✅ Advantages
- Higher Accuracy
  - Direct ML-based estimation vs. geometric calculations
  - Trained on diverse datasets, better generalization
  - Handles extreme poses better than geometric methods
- Full Range Support
  - Supports full 360° rotation (current method may struggle with extreme angles)
  - Better profile detection accuracy
- Simpler Integration
  - Single method call: model.predict(img) returns angles directly
  - No need to match landmarks to faces or calculate from geometry
  - Can work with face crops directly (no need for full landmarks)
- Consistent Interface
  - Returns same format: (pitch, yaw, roll) in degrees
  - Can drop-in replace current PoseDetector class methods
⚠️ Challenges
- Dependency Conflict
  - Current Stack: TensorFlow (via DeepFace)
  - 6DRepNet Requires: PyTorch
  - Impact: Both frameworks can coexist but increase the memory footprint
- Face Detection Dependency
  - 6DRepNet requires face crops as input (not full images)
  - Current flow: RetinaFace → landmarks → geometric calculation
  - New flow: RetinaFace → face crop → 6DRepNet → angles
  - Still need RetinaFace for face detection/bounding boxes
- Initialization Overhead
  - Model loading time on first use (~1-2 seconds)
  - Model weights download (~50-100MB) on first initialization
  - GPU memory usage if CUDA available (optional but faster)
- Processing Speed
  - Current: geometric calculations (very fast, <1ms per face)
  - 6DRepNet: neural network inference (~10-50ms per face on CPU, ~5-10ms on GPU)
  - Impact on batch processing: roughly 10-50x slower per face
- Memory Footprint
  - PyTorch + model weights: ~200-500MB additional memory
  - Model kept in memory for batch processing (good for performance)
Architecture Compatibility
Current Architecture
┌─────────────────────────────────────────┐
│ FaceProcessor │
│ ┌───────────────────────────────────┐ │
│ │ PoseDetector (RetinaFace) │ │
│ │ - detect_pose_faces(img_path) │ │
│ │ - Returns: yaw, pitch, roll │ │
│ └───────────────────────────────────┘ │
│ │
│ DeepFace (TensorFlow) │
│ - Face detection + encoding │
└─────────────────────────────────────────┘
Proposed Architecture (6DRepNet)
┌─────────────────────────────────────────┐
│ FaceProcessor │
│ ┌───────────────────────────────────┐ │
│ │ PoseDetector (6DRepNet) │ │
│ │ - Requires: face crop (from │ │
│ │ RetinaFace/DeepFace) │ │
│ │ - model.predict(face_crop) │ │
│ │ - Returns: yaw, pitch, roll │ │
│ └───────────────────────────────────┘ │
│ │
│ DeepFace (TensorFlow) │
│ - Face detection + encoding │
│ │
│ RetinaFace (still needed) │
│ - Face detection + bounding boxes │
└─────────────────────────────────────────┘
Integration Strategy Options
Option 1: Replace Current Method
- Remove geometric calculations
- Use 6DRepNet exclusively
- Pros: Simpler, one method only
- Cons: Loses lightweight fallback option
Option 2: Hybrid Approach (Recommended)
- Support both methods via configuration
- Use 6DRepNet when available, fallback to geometric
- Pros: Backward compatible, graceful degradation
- Cons: More complex code
Option 3: Parallel Execution
- Run both methods and compare/validate
- Pros: Best of both worlds, validation
- Cons: 2x processing time
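Under Option 2, both back-ends would sit behind a common interface. A minimal sketch, assuming hypothetical class and method names (the lazy import keeps sixdrepnet an optional dependency):

```python
from typing import Protocol, Tuple, runtime_checkable

@runtime_checkable
class PoseEstimator(Protocol):
    """Shared interface both back-ends would satisfy (names are hypothetical)."""
    def predict_pose(self, face_img) -> Tuple[float, float, float]:
        """Return (yaw, pitch, roll) in degrees."""
        ...

class GeometricPoseEstimator:
    """Stand-in for the existing landmark-based calculation."""
    def predict_pose(self, face_img):
        # ...delegate to the current geometric PoseDetector logic...
        return 0.0, 0.0, 0.0  # placeholder values

class SixDRepNetPoseEstimator:
    """Wraps sixdrepnet; imported lazily so the dependency stays optional."""
    def __init__(self):
        from sixdrepnet import SixDRepNet  # heavyweight, loaded on demand
        self._model = SixDRepNet()

    def predict_pose(self, face_img):
        pitch, yaw, roll = self._model.predict(face_img)
        return yaw, pitch, roll  # reorder to match the existing interface
```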
Implementation Requirements
1. Dependencies
Add to requirements.txt:
# 6DRepNet for direct pose estimation
sixdrepnet>=1.0.0
torch>=2.0.0 # PyTorch (CPU version)
# OR
# torch>=2.0.0+cu118 # PyTorch with CUDA support (if GPU available)
Note: PyTorch installation depends on the system:
- CPU-only: pip install torch (smaller, ~150MB)
- CUDA-enabled: pip install torch --index-url https://download.pytorch.org/whl/cu118 (larger, ~1GB)
2. Code Changes Required
File: src/utils/pose_detection.py
New Class: SixDRepNetPoseDetector
from typing import Tuple

class SixDRepNetPoseDetector:
    """Pose detector using 6DRepNet for direct angle estimation"""

    def __init__(self):
        from sixdrepnet import SixDRepNet
        self.model = SixDRepNet()

    def predict_pose(self, face_crop_img) -> Tuple[float, float, float]:
        """Predict yaw, pitch, roll from face crop"""
        pitch, yaw, roll = self.model.predict(face_crop_img)
        return yaw, pitch, roll  # match current interface (yaw, pitch, roll)
Integration Points:
- Modify PoseDetector.detect_pose_faces() to optionally use 6DRepNet
- Extract face crops from RetinaFace bounding boxes
- Pass crops to 6DRepNet for prediction
- Return same format as current method
Key Challenge: Need face crops, not just landmarks
- Current: Uses landmarks from RetinaFace
- 6DRepNet: Needs image crops (can extract from same RetinaFace detection)
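Extracting a crop from a RetinaFace-style bounding box could look like the sketch below. The [x1, y1, x2, y2] box format and the 20% margin are assumptions; confirm against the actual detection output before wiring this in.

```python
import numpy as np

def extract_face_crop(img: np.ndarray, box, margin: float = 0.2) -> np.ndarray:
    """Crop a face region with a safety margin, clamped to image bounds.

    `box` is assumed to be [x1, y1, x2, y2] pixel coordinates; the margin
    adds head context around the tight face box for the pose model.
    """
    h, w = img.shape[:2]
    x1, y1, x2, y2 = box
    mx = int((x2 - x1) * margin)
    my = int((y2 - y1) * margin)
    x1 = max(0, x1 - mx)
    y1 = max(0, y1 - my)
    x2 = min(w, x2 + mx)
    y2 = min(h, y2 + my)
    return img[y1:y2, x1:x2]
```

The resulting crop would then be passed to model.predict() in place of the full image.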
3. Configuration Changes
File: src/core/config.py
Add configuration option:
# Pose detection method: 'geometric' (current) or '6drepnet' (ML-based)
POSE_DETECTION_METHOD = 'geometric' # or '6drepnet'
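A selection helper honoring this flag, with the graceful fallback described above, might look like this (the helper and the stand-in class are hypothetical; only the sixdrepnet import is real):

```python
import logging

class GeometricDetector:
    """Stand-in for the existing landmark-based PoseDetector."""
    name = 'geometric'

def make_pose_detector(method: str = 'geometric'):
    """Build the configured pose back-end.

    Falls back to the geometric method when the optional sixdrepnet
    dependency is not installed, mirroring the existing graceful-fallback
    pattern in FaceProcessor.__init__().
    """
    if method == '6drepnet':
        try:
            from sixdrepnet import SixDRepNet  # optional dependency
            return SixDRepNet()
        except ImportError:
            logging.warning("sixdrepnet unavailable, using geometric method")
    return GeometricDetector()
```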
Performance Comparison
Current Method (Geometric)
Speed:
- ~0.1-1ms per face (geometric calculations only)
- No model loading overhead
Accuracy:
- Good for frontal and moderate poses
- May struggle with extreme angles or profile views
- Depends on landmark quality
Memory:
- Minimal (~10-50MB for RetinaFace only)
6DRepNet Method
Speed:
- CPU: ~10-50ms per face (neural network inference)
- GPU: ~5-10ms per face (with CUDA)
- Initial model load: ~1-2 seconds (one-time)
Accuracy:
- Higher accuracy across all pose ranges
- Better generalization from training data
- More robust to image quality variations
Memory:
- Model weights: ~50-100MB
- PyTorch runtime: ~200-500MB
- Total: ~250-600MB additional
Batch Processing Impact
Example: Processing 1000 photos with 3 faces each = 3000 faces
Current Method:
- Time: ~300-3000ms (0.3-3 seconds)
- Very fast, minimal impact
6DRepNet (CPU):
- Time: ~30-150 seconds (0.5-2.5 minutes)
- Significant slowdown but acceptable for batch jobs
6DRepNet (GPU):
- Time: ~15-30 seconds
- Much faster with GPU acceleration
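These figures are just the per-face latencies multiplied out; a quick sanity check:

```python
def batch_seconds(n_photos: int, faces_per_photo: int, ms_per_face: float) -> float:
    """Back-of-the-envelope batch time in seconds for a per-face latency."""
    return n_photos * faces_per_photo * ms_per_face / 1000.0

# 1000 photos x 3 faces each, at the latency ranges quoted above
geometric = (batch_seconds(1000, 3, 0.1), batch_seconds(1000, 3, 1.0))   # ~(0.3, 3.0) s
cpu_ml    = (batch_seconds(1000, 3, 10.0), batch_seconds(1000, 3, 50.0)) # (30.0, 150.0) s
gpu_ml    = (batch_seconds(1000, 3, 5.0), batch_seconds(1000, 3, 10.0))  # (15.0, 30.0) s
```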
Recommendations
✅ Recommended Approach: Hybrid Implementation
Phase 1: Add 6DRepNet as Optional Enhancement
- Keep current geometric method as default
- Add 6DRepNet as optional alternative
- Use configuration flag to enable: POSE_DETECTION_METHOD = '6drepnet'
- Graceful fallback if 6DRepNet unavailable
Phase 2: Performance Tuning
- Implement GPU acceleration if available
- Batch processing optimizations
- Cache model instance across batch operations
Phase 3: Evaluation
- Compare accuracy on real dataset
- Measure performance impact
- Decide on default method based on results
⚠️ Considerations
- Dependency Management:
  - PyTorch + TensorFlow coexistence is possible but increases requirements
  - Consider making 6DRepNet optional (extra dependency group)
- Face Crop Extraction:
  - Need to extract face crops from images
  - Can use RetinaFace bounding boxes (already available)
  - Or use DeepFace detection results
- Backward Compatibility:
  - Keep current method available
  - Database schema unchanged (same fields: yaw_angle, pitch_angle, roll_angle)
  - API interface unchanged
- GPU Support:
  - Optional but recommended for performance
  - Can detect CUDA availability automatically
  - Falls back to CPU if GPU unavailable
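The automatic CUDA check can be as simple as the sketch below. torch.cuda.is_available() is standard PyTorch; whether SixDRepNet's constructor accepts a device argument should be confirmed against the sixdrepnet docs before relying on it.

```python
def pick_gpu_id() -> int:
    """Return CUDA device 0 if PyTorch sees a GPU, else -1 for CPU."""
    try:
        import torch  # optional dependency in this design
        if torch.cuda.is_available():
            return 0
    except ImportError:
        pass
    return -1

# Hypothetical wiring -- verify the constructor signature first:
# model = SixDRepNet(gpu_id=pick_gpu_id())
```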
Implementation Complexity Assessment
Complexity: Medium
Factors:
- ✅ Interface is compatible (same output format)
- ✅ Existing architecture supports abstraction
- ⚠️ Requires face crop extraction (not just landmarks)
- ⚠️ PyTorch dependency adds complexity
- ⚠️ Performance considerations for batch processing
Estimated Effort:
- Initial Implementation: 2-4 hours
- Testing & Validation: 2-3 hours
- Documentation: 1 hour
- Total: ~5-8 hours
Conclusion
6DRepNet is technically feasible and recommended for integration as an optional enhancement to the current geometric pose estimation method. The hybrid approach provides:
- Backward Compatibility: Current method remains default
- Improved Accuracy: Better pose estimation, especially for extreme angles
- Flexibility: Users can choose method based on accuracy vs. speed tradeoff
- Future-Proof: ML-based approach can be improved with model updates
Next Steps (if proceeding):
- Add sixdrepnet and torch to requirements (optional dependency group)
- Implement SixDRepNetPoseDetector class
- Modify PoseDetector to support both methods
- Add configuration option
- Test on sample dataset
- Measure performance impact
- Update documentation
References
- 6DRepNet Paper: 6D Rotation Representation For Unconstrained Head Pose Estimation
- PyPI Package: sixdrepnet
- PyTorch Installation: https://pytorch.org/get-started/locally/
- Current Implementation: src/utils/pose_detection.py