tanyar09 e74ade9278 feat: Add pose mode analysis and face width detection for improved profile classification
This commit introduces a comprehensive analysis of pose modes and face width detection to enhance profile classification accuracy. New scripts have been added to analyze pose data in the database, check identified faces for pose information, and validate yaw angles. The PoseDetector class has been updated to calculate face width from landmarks, which serves as an additional indicator for profile detection. The frontend and API have been modified to include pose mode in responses, ensuring better integration with existing functionalities. Documentation has been updated to reflect these changes, improving user experience and accuracy in face processing.
2025-11-06 13:26:25 -05:00


6DRepNet Integration Analysis

Date: 2025-01-XX
Status: Analysis Only (No Code Changes)
Purpose: Evaluate feasibility of integrating 6DRepNet for direct yaw/pitch/roll estimation


Executive Summary

Integrating 6DRepNet as an alternative or enhancement to the current RetinaFace-based landmark pose estimation is technically feasible. It would provide more accurate, direct pose estimation, but introduces a PyTorch dependency and requires architectural adjustments.

Key Findings:

  • ✅ Technically Feasible: 6DRepNet is available as a PyPI package (sixdrepnet)
  • ⚠️ Dependency Conflict: Requires PyTorch (currently using TensorFlow via DeepFace)
  • ✅ Interface Compatible: Can work with existing OpenCV/CV2 image processing
  • 📊 Accuracy Improvement: Direct estimation vs. geometric calculation from landmarks
  • 🔄 Architectural Impact: Requires abstraction layer to support both methods

Current Implementation Analysis

Current Pose Detection Architecture

Location: src/utils/pose_detection.py

Current Method:

  1. Uses RetinaFace to detect faces and extract facial landmarks
  2. Calculates yaw, pitch, roll geometrically from landmark positions:
    • Yaw: Calculated from nose position relative to eye midpoint
    • Pitch: Calculated from nose position relative to expected vertical position
    • Roll: Calculated from eye line angle
  3. Uses face width (eye distance) as additional indicator for profile detection
  4. Classifies pose mode from angles using thresholds
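
The classification in step 4 can be sketched as a simple threshold check. The threshold values and mode names below are illustrative assumptions, not the project's actual configuration:

```python
def classify_pose_mode(yaw: float,
                       frontal_threshold: float = 15.0,
                       profile_threshold: float = 45.0) -> str:
    """Map an estimated yaw angle (degrees) to a coarse pose mode.

    Thresholds are illustrative assumptions; the real implementation in
    src/utils/pose_detection.py may also weigh pitch, roll, and face width.
    """
    abs_yaw = abs(yaw)
    if abs_yaw >= profile_threshold:
        return "profile"
    if abs_yaw >= frontal_threshold:
        return "semi-profile"
    return "frontal"
```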

Key Characteristics:

  • ✅ No additional ML model dependencies (uses RetinaFace landmarks)
  • ✅ Lightweight (geometric calculations only)
  • ⚠️ Accuracy depends on landmark quality and geometric assumptions
  • ⚠️ May have limitations with extreme poses or low-quality images

Integration Points:

  • FaceProcessor.__init__(): Initializes PoseDetector with graceful fallback
  • process_faces(): Calls pose_detector.detect_pose_faces(img_path)
  • face_service.py: Uses shared PoseDetector instance for batch processing
  • Returns: {'yaw_angle', 'pitch_angle', 'roll_angle', 'pose_mode', ...}

6DRepNet Overview

What is 6DRepNet?

6DRepNet is a PyTorch-based deep learning model designed for direct head pose estimation using a continuous 6D rotation matrix representation. It addresses ambiguities in rotation labels and enables robust full-range head pose predictions.

Key Features:

  • Direct estimation of yaw, pitch, roll angles
  • Full 360° range support
  • Competitive accuracy (MAE ~2.66° on BIWI dataset)
  • Available as easy-to-use Python package

Technical Specifications

Package: sixdrepnet (PyPI)
Framework: PyTorch
Input: Image (OpenCV format, numpy array, or PIL Image)
Output: (pitch, yaw, roll) angles in degrees
Model Size: ~50-100MB (weights downloaded automatically)
Dependencies:

  • PyTorch (CPU or CUDA)
  • OpenCV (already in requirements)
  • NumPy (already in requirements)

Usage Example

from sixdrepnet import SixDRepNet
import cv2

# Initialize (weights downloaded automatically)
model = SixDRepNet()

# Load image
img = cv2.imread('/path/to/image.jpg')

# Predict pose (returns pitch, yaw, roll)
pitch, yaw, roll = model.predict(img)

# Optional: visualize results
model.draw_axis(img, yaw, pitch, roll)

Integration Feasibility Analysis

✅ Advantages

  1. Higher Accuracy

    • Direct ML-based estimation vs. geometric calculations
    • Trained on diverse datasets, better generalization
    • Handles extreme poses better than geometric methods
  2. Full Range Support

    • Supports full 360° rotation (current method may struggle with extreme angles)
    • Better profile detection accuracy
  3. Simpler Integration

    • Single method call: model.predict(img) returns angles directly
    • No need to match landmarks to faces or calculate from geometry
    • Can work with face crops directly (no need for full landmarks)
  4. Consistent Interface

    • Returns same format: (pitch, yaw, roll) in degrees
    • Can drop-in replace current PoseDetector class methods

⚠️ Challenges

  1. Dependency Conflict

    • Current Stack: TensorFlow (via DeepFace)
    • 6DRepNet Requires: PyTorch
    • Impact: Both frameworks can coexist, but doing so increases the memory footprint
  2. Face Detection Dependency

    • 6DRepNet requires face crops as input (not full images)
    • Current flow: RetinaFace → landmarks → geometric calculation
    • New flow: RetinaFace → face crop → 6DRepNet → angles
    • Still need RetinaFace for face detection/bounding boxes
  3. Initialization Overhead

    • Model loading time on first use (~1-2 seconds)
    • Model weights download (~50-100MB) on first initialization
    • GPU memory usage if CUDA available (optional but faster)
  4. Processing Speed

    • Current: Geometric calculations (very fast, <1ms per face)
    • 6DRepNet: Neural network inference (~10-50ms per face on CPU, ~5-10ms on GPU)
    • Impact on batch processing: ~10-50x slower per face
  5. Memory Footprint

    • PyTorch + model weights: ~200-500MB additional memory
    • Model kept in memory for batch processing (good for performance)

Architecture Compatibility

Current Architecture

┌─────────────────────────────────────────┐
│  FaceProcessor                          │
│  ┌───────────────────────────────────┐  │
│  │  PoseDetector (RetinaFace)        │  │
│  │  - detect_pose_faces(img_path)    │  │
│  │  - Returns: yaw, pitch, roll      │  │
│  └───────────────────────────────────┘  │
│                                         │
│  DeepFace (TensorFlow)                  │
│  - Face detection + encoding            │
└─────────────────────────────────────────┘

Proposed Architecture (6DRepNet)

┌─────────────────────────────────────────┐
│  FaceProcessor                          │
│  ┌───────────────────────────────────┐  │
│  │  PoseDetector (6DRepNet)          │  │
│  │  - Requires: face crop (from      │  │
│  │    RetinaFace/DeepFace)           │  │
│  │  - model.predict(face_crop)       │  │
│  │  - Returns: yaw, pitch, roll      │  │
│  └───────────────────────────────────┘  │
│                                         │
│  DeepFace (TensorFlow)                  │
│  - Face detection + encoding            │
│                                         │
│  RetinaFace (still needed)              │
│  - Face detection + bounding boxes      │
└─────────────────────────────────────────┘

Integration Strategy Options

Option 1: Replace Current Method

  • Remove geometric calculations
  • Use 6DRepNet exclusively
  • Pros: Simpler, one method only
  • Cons: Loses lightweight fallback option

Option 2: Hybrid Approach (Recommended)

  • Support both methods via configuration
  • Use 6DRepNet when available, fallback to geometric
  • Pros: Backward compatible, graceful degradation
  • Cons: More complex code

Option 3: Parallel Execution

  • Run both methods and compare/validate
  • Pros: Best of both worlds, validation
  • Cons: 2x processing time
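
A minimal sketch of Option 2's selection logic, assuming a hypothetical GeometricPoseDetector that wraps the current landmark-based method (both class names here are illustrative stand-ins, not the project's actual classes):

```python
class GeometricPoseDetector:
    """Stand-in for the existing landmark-based detector."""
    method = "geometric"

class SixDRepNetPoseDetector:
    """Stand-in for a 6DRepNet-backed detector (needs the optional deps)."""
    method = "6drepnet"

    def __init__(self):
        from sixdrepnet import SixDRepNet  # optional dependency
        self.model = SixDRepNet()

def create_pose_detector(method: str = "geometric"):
    """Pick the detector from config, falling back to the geometric
    method when the optional sixdrepnet/torch packages are missing."""
    if method == "6drepnet":
        try:
            return SixDRepNetPoseDetector()
        except ImportError:
            pass  # graceful degradation: keep working without PyTorch
    return GeometricPoseDetector()
```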

Implementation Requirements

1. Dependencies

Add to requirements.txt:

# 6DRepNet for direct pose estimation
sixdrepnet>=1.0.0
torch>=2.0.0  # PyTorch (CPU version)
# OR
# torch>=2.0.0+cu118  # PyTorch with CUDA support (if GPU available)

Note: PyTorch installation depends on system:

  • CPU-only: pip install torch (smaller, ~150MB)
  • CUDA-enabled: pip install torch --index-url https://download.pytorch.org/whl/cu118 (larger, ~1GB)

2. Code Changes Required

File: src/utils/pose_detection.py

New Class: SixDRepNetPoseDetector

from typing import Tuple


class SixDRepNetPoseDetector:
    """Pose detector using 6DRepNet for direct angle estimation"""

    def __init__(self):
        # Imported lazily so the dependency stays optional
        from sixdrepnet import SixDRepNet
        self.model = SixDRepNet()

    def predict_pose(self, face_crop_img) -> Tuple[float, float, float]:
        """Predict yaw, pitch, roll from a face crop"""
        # sixdrepnet returns (pitch, yaw, roll); reorder to match the
        # current interface, which returns (yaw, pitch, roll)
        pitch, yaw, roll = self.model.predict(face_crop_img)
        return yaw, pitch, roll

Integration Points:

  1. Modify PoseDetector.detect_pose_faces() to optionally use 6DRepNet
  2. Extract face crops from RetinaFace bounding boxes
  3. Pass crops to 6DRepNet for prediction
  4. Return same format as current method

Key Challenge: Need face crops, not just landmarks

  • Current: Uses landmarks from RetinaFace
  • 6DRepNet: Needs image crops (can extract from same RetinaFace detection)
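
The crop-extraction step could look like the sketch below, assuming bounding boxes arrive as (x1, y1, x2, y2) pixel coordinates; the margin value is a guess to tune empirically, since pose models are often trained on crops somewhat looser than the tight face box:

```python
import numpy as np

def extract_face_crop(img: np.ndarray, bbox, margin: float = 0.2) -> np.ndarray:
    """Crop a face region from an image given an (x1, y1, x2, y2) box,
    expanded by a relative margin and clipped to the image bounds."""
    h, w = img.shape[:2]
    x1, y1, x2, y2 = bbox
    mx = int((x2 - x1) * margin)
    my = int((y2 - y1) * margin)
    x1 = max(0, int(x1) - mx)
    y1 = max(0, int(y1) - my)
    x2 = min(w, int(x2) + mx)
    y2 = min(h, int(y2) + my)
    return img[y1:y2, x1:x2]
```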

3. Configuration Changes

File: src/core/config.py

Add configuration option:

# Pose detection method: 'geometric' (current) or '6drepnet' (ML-based)
POSE_DETECTION_METHOD = 'geometric'  # or '6drepnet'

Performance Comparison

Current Method (Geometric)

Speed:

  • ~0.1-1ms per face (geometric calculations only)
  • No model loading overhead

Accuracy:

  • Good for frontal and moderate poses
  • May struggle with extreme angles or profile views
  • Depends on landmark quality

Memory:

  • Minimal (~10-50MB for RetinaFace only)

6DRepNet Method

Speed:

  • CPU: ~10-50ms per face (neural network inference)
  • GPU: ~5-10ms per face (with CUDA)
  • Initial model load: ~1-2 seconds (one-time)

Accuracy:

  • Higher accuracy across all pose ranges
  • Better generalization from training data
  • More robust to image quality variations

Memory:

  • Model weights: ~50-100MB
  • PyTorch runtime: ~200-500MB
  • Total: ~250-600MB additional

Batch Processing Impact

Example: Processing 1000 photos with 3 faces each = 3000 faces

Current Method:

  • Time: ~300-3000ms (0.3-3 seconds)
  • Very fast, minimal impact

6DRepNet (CPU):

  • Time: ~30-150 seconds (0.5-2.5 minutes)
  • Significant slowdown but acceptable for batch jobs

6DRepNet (GPU):

  • Time: ~15-30 seconds
  • Much faster with GPU acceleration
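
The batch figures above follow from simple arithmetic (faces x per-face latency); a quick sanity check:

```python
def estimate_batch_seconds(num_photos: int, faces_per_photo: int,
                           ms_per_face: float) -> float:
    """Rough batch-processing time in seconds for a given per-face latency."""
    return num_photos * faces_per_photo * ms_per_face / 1000.0

# 1000 photos x 3 faces at 10-50 ms/face (6DRepNet on CPU)
low = estimate_batch_seconds(1000, 3, 10)   # 30.0 seconds
high = estimate_batch_seconds(1000, 3, 50)  # 150.0 seconds
```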

Recommendations

Phase 1: Add 6DRepNet as Optional Enhancement

  1. Keep current geometric method as default
  2. Add 6DRepNet as optional alternative
  3. Use configuration flag to enable: POSE_DETECTION_METHOD = '6drepnet'
  4. Graceful fallback if 6DRepNet unavailable

Phase 2: Performance Tuning

  1. Implement GPU acceleration if available
  2. Batch processing optimizations
  3. Cache model instance across batch operations

Phase 3: Evaluation

  1. Compare accuracy on real dataset
  2. Measure performance impact
  3. Decide on default method based on results

⚠️ Considerations

  1. Dependency Management:

    • PyTorch + TensorFlow coexistence is possible but increases requirements
    • Consider making 6DRepNet optional (extra dependency group)
  2. Face Crop Extraction:

    • Need to extract face crops from images
    • Can use RetinaFace bounding boxes (already available)
    • Or use DeepFace detection results
  3. Backward Compatibility:

    • Keep current method available
    • Database schema unchanged (same fields: yaw_angle, pitch_angle, roll_angle)
    • API interface unchanged
  4. GPU Support:

    • Optional but recommended for performance
    • Can detect CUDA availability automatically
    • Falls back to CPU if GPU unavailable
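
Automatic device detection could be as simple as the helper below (the function name is illustrative); it also treats a missing torch install as CPU-only:

```python
def select_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, else 'cpu'.
    If torch is not installed at all, fall back to 'cpu'."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```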

Implementation Complexity Assessment

Complexity: Medium

Factors:

  • ✅ Interface is compatible (same output format)
  • ✅ Existing architecture supports abstraction
  • ⚠️ Requires face crop extraction (not just landmarks)
  • ⚠️ PyTorch dependency adds complexity
  • ⚠️ Performance considerations for batch processing

Estimated Effort:

  • Initial Implementation: 2-4 hours
  • Testing & Validation: 2-3 hours
  • Documentation: 1 hour
  • Total: ~5-8 hours

Conclusion

6DRepNet is technically feasible and recommended for integration as an optional enhancement to the current geometric pose estimation method. The hybrid approach provides:

  1. Backward Compatibility: Current method remains default
  2. Improved Accuracy: Better pose estimation, especially for extreme angles
  3. Flexibility: Users can choose method based on accuracy vs. speed tradeoff
  4. Future-Proof: ML-based approach can be improved with model updates

Next Steps (if proceeding):

  1. Add sixdrepnet and torch to requirements (optional dependency group)
  2. Implement SixDRepNetPoseDetector class
  3. Modify PoseDetector to support both methods
  4. Add configuration option
  5. Test on sample dataset
  6. Measure performance impact
  7. Update documentation
