6DRepNet Integration Analysis
Date: 2025-01-XX
Status: Analysis Only (No Code Changes)
Purpose: Evaluate feasibility of integrating 6DRepNet for direct yaw/pitch/roll estimation
Executive Summary
Integrating 6DRepNet, either as an alternative or an enhancement to the current RetinaFace-based landmark pose estimation, is technically feasible. It would provide more accurate, direct pose estimation, but requires a PyTorch dependency and architectural adjustments.
Key Findings:
- ✅ Technically Feasible: 6DRepNet is available as a PyPI package (sixdrepnet)
- ⚠️ Dependency Conflict: Requires PyTorch (currently using TensorFlow via DeepFace)
- ✅ Interface Compatible: Can work with existing OpenCV/CV2 image processing
- 📊 Accuracy Improvement: Direct estimation vs. geometric calculation from landmarks
- 🔄 Architectural Impact: Requires abstraction layer to support both methods
Current Implementation Analysis
Current Pose Detection Architecture
Location: src/utils/pose_detection.py
Current Method:
- Uses RetinaFace to detect faces and extract facial landmarks
- Calculates yaw, pitch, roll geometrically from landmark positions:
- Yaw: Calculated from nose position relative to eye midpoint
- Pitch: Calculated from nose position relative to expected vertical position
- Roll: Calculated from eye line angle
- Uses face width (eye distance) as additional indicator for profile detection
- Classifies pose mode from angles using thresholds
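The geometric approach above can be sketched as follows. This is illustrative only: the landmark layout (left eye, right eye, nose tip), the 0.6 nose-position ratio, and the classification thresholds are assumptions, not the exact formulas in src/utils/pose_detection.py.

```python
import math

def estimate_pose_from_landmarks(left_eye, right_eye, nose):
    """Rough yaw/pitch/roll estimate from three 2D landmarks (x, y).

    Illustrative sketch: the real PoseDetector works from RetinaFace's
    full landmark set with its own tuned constants.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_mid_y = (left_eye[1] + right_eye[1]) / 2.0
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])

    # Yaw: horizontal nose offset from the eye midpoint, scaled by eye distance
    yaw = math.degrees(math.atan2(nose[0] - eye_mid_x, eye_dist))

    # Pitch: vertical nose offset from its expected position below the eyes
    expected_nose_y = eye_mid_y + 0.6 * eye_dist  # 0.6 is an assumed ratio
    pitch = math.degrees(math.atan2(expected_nose_y - nose[1], eye_dist))

    # Roll: angle of the line connecting the two eyes
    roll = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                   right_eye[0] - left_eye[0]))
    return yaw, pitch, roll

def classify_pose_mode(yaw, profile_threshold=45.0, semi_threshold=20.0):
    """Map a yaw angle to a coarse pose mode (threshold values are assumed)."""
    if abs(yaw) >= profile_threshold:
        return 'profile'
    if abs(yaw) >= semi_threshold:
        return 'semi-profile'
    return 'frontal'
```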
Key Characteristics:
- ✅ No additional ML model dependencies (uses RetinaFace landmarks)
- ✅ Lightweight (geometric calculations only)
- ⚠️ Accuracy depends on landmark quality and geometric assumptions
- ⚠️ May have limitations with extreme poses or low-quality images
Integration Points:
- FaceProcessor.__init__(): Initializes PoseDetector with graceful fallback
- process_faces(): Calls pose_detector.detect_pose_faces(img_path)
- face_service.py: Uses shared PoseDetector instance for batch processing
- Returns: {'yaw_angle', 'pitch_angle', 'roll_angle', 'pose_mode', ...}
6DRepNet Overview
What is 6DRepNet?
6DRepNet is a PyTorch-based deep learning model designed for direct head pose estimation using a continuous 6D rotation matrix representation. It addresses ambiguities in rotation labels and enables robust full-range head pose predictions.
Key Features:
- Direct estimation of yaw, pitch, roll angles
- Full 360° range support
- Competitive accuracy (MAE ~2.66° on BIWI dataset)
- Available as easy-to-use Python package
Technical Specifications
Package: sixdrepnet (PyPI)
Framework: PyTorch
Input: Image (OpenCV format, numpy array, or PIL Image)
Output: (pitch, yaw, roll) angles in degrees
Model Size: ~50-100MB (weights downloaded automatically)
Dependencies:
- PyTorch (CPU or CUDA)
- OpenCV (already in requirements)
- NumPy (already in requirements)
Usage Example
from sixdrepnet import SixDRepNet
import cv2
# Initialize (weights downloaded automatically)
model = SixDRepNet()
# Load image
img = cv2.imread('/path/to/image.jpg')
# Predict pose (returns pitch, yaw, roll)
pitch, yaw, roll = model.predict(img)
# Optional: visualize results
model.draw_axis(img, yaw, pitch, roll)
Integration Feasibility Analysis
✅ Advantages
- Higher Accuracy
  - Direct ML-based estimation vs. geometric calculations
  - Trained on diverse datasets, better generalization
  - Handles extreme poses better than geometric methods
- Full Range Support
  - Supports full 360° rotation (current method may struggle with extreme angles)
  - Better profile detection accuracy
- Simpler Integration
  - Single method call: model.predict(img) returns angles directly
  - No need to match landmarks to faces or calculate from geometry
  - Can work with face crops directly (no need for full landmarks)
- Consistent Interface
  - Returns same format: (pitch, yaw, roll) in degrees
  - Can drop-in replace current PoseDetector class methods
⚠️ Challenges
- Dependency Conflict
  - Current Stack: TensorFlow (via DeepFace)
  - 6DRepNet Requires: PyTorch
  - Impact: Both frameworks can coexist but increase the memory footprint
- Face Detection Dependency
  - 6DRepNet requires face crops as input (not full images)
  - Current flow: RetinaFace → landmarks → geometric calculation
  - New flow: RetinaFace → face crop → 6DRepNet → angles
  - Still need RetinaFace for face detection/bounding boxes
- Initialization Overhead
  - Model loading time on first use (~1-2 seconds)
  - Model weights download (~50-100MB) on first initialization
  - GPU memory usage if CUDA available (optional but faster)
- Processing Speed
  - Current: geometric calculations (very fast, <1ms per face)
  - 6DRepNet: neural network inference (~10-50ms per face on CPU, ~5-10ms on GPU)
  - Impact on batch processing: roughly 10-50x slower per face
- Memory Footprint
  - PyTorch + model weights: ~200-500MB additional memory
  - Model kept in memory for batch processing (good for performance)
Architecture Compatibility
Current Architecture
┌─────────────────────────────────────────┐
│ FaceProcessor │
│ ┌───────────────────────────────────┐ │
│ │ PoseDetector (RetinaFace) │ │
│ │ - detect_pose_faces(img_path) │ │
│ │ - Returns: yaw, pitch, roll │ │
│ └───────────────────────────────────┘ │
│ │
│ DeepFace (TensorFlow) │
│ - Face detection + encoding │
└─────────────────────────────────────────┘
Proposed Architecture (6DRepNet)
┌─────────────────────────────────────────┐
│ FaceProcessor │
│ ┌───────────────────────────────────┐ │
│ │ PoseDetector (6DRepNet) │ │
│ │ - Requires: face crop (from │ │
│ │ RetinaFace/DeepFace) │ │
│ │ - model.predict(face_crop) │ │
│ │ - Returns: yaw, pitch, roll │ │
│ └───────────────────────────────────┘ │
│ │
│ DeepFace (TensorFlow) │
│ - Face detection + encoding │
│ │
│ RetinaFace (still needed) │
│ - Face detection + bounding boxes │
└─────────────────────────────────────────┘
Integration Strategy Options
Option 1: Replace Current Method
- Remove geometric calculations
- Use 6DRepNet exclusively
- Pros: Simpler, one method only
- Cons: Loses lightweight fallback option
Option 2: Hybrid Approach (Recommended)
- Support both methods via configuration
- Use 6DRepNet when available, fallback to geometric
- Pros: Backward compatible, graceful degradation
- Cons: More complex code
Option 3: Parallel Execution
- Run both methods and compare/validate
- Pros: Best of both worlds, validation
- Cons: 2x processing time
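Under Option 2, both back-ends would sit behind a common interface. A minimal sketch, assuming hypothetical class and method names (the lazy import keeps sixdrepnet an optional dependency):

```python
from typing import Protocol, Tuple, runtime_checkable

@runtime_checkable
class PoseEstimator(Protocol):
    """Shared interface both back-ends would satisfy (names are hypothetical)."""
    def predict_pose(self, face_img) -> Tuple[float, float, float]:
        """Return (yaw, pitch, roll) in degrees."""
        ...

class GeometricPoseEstimator:
    """Stand-in for the existing landmark-based calculation."""
    def predict_pose(self, face_img):
        # ...delegate to the current geometric PoseDetector logic...
        return 0.0, 0.0, 0.0  # placeholder values

class SixDRepNetPoseEstimator:
    """Wraps sixdrepnet; imported lazily so the dependency stays optional."""
    def __init__(self):
        from sixdrepnet import SixDRepNet  # heavyweight, loaded on demand
        self._model = SixDRepNet()

    def predict_pose(self, face_img):
        pitch, yaw, roll = self._model.predict(face_img)
        return yaw, pitch, roll  # reorder to match the existing interface
```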
Implementation Requirements
1. Dependencies
Add to requirements.txt:
# 6DRepNet for direct pose estimation
sixdrepnet>=1.0.0
torch>=2.0.0 # PyTorch (CPU version)
# OR
# torch>=2.0.0+cu118 # PyTorch with CUDA support (if GPU available)
Note: PyTorch installation depends on the system:
- CPU-only: pip install torch (smaller, ~150MB)
- CUDA-enabled: pip install torch --index-url https://download.pytorch.org/whl/cu118 (larger, ~1GB)
2. Code Changes Required
File: src/utils/pose_detection.py
New Class: SixDRepNetPoseDetector
from typing import Tuple

class SixDRepNetPoseDetector:
    """Pose detector using 6DRepNet for direct angle estimation"""

    def __init__(self):
        from sixdrepnet import SixDRepNet
        self.model = SixDRepNet()

    def predict_pose(self, face_crop_img) -> Tuple[float, float, float]:
        """Predict yaw, pitch, roll from face crop"""
        pitch, yaw, roll = self.model.predict(face_crop_img)
        return yaw, pitch, roll  # match current interface (yaw, pitch, roll)
Integration Points:
- Modify PoseDetector.detect_pose_faces() to optionally use 6DRepNet
- Extract face crops from RetinaFace bounding boxes
- Pass crops to 6DRepNet for prediction
- Return same format as current method
Key Challenge: Need face crops, not just landmarks
- Current: Uses landmarks from RetinaFace
- 6DRepNet: Needs image crops (can extract from same RetinaFace detection)
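Extracting a crop from a RetinaFace-style bounding box could look like the sketch below. The [x1, y1, x2, y2] box format and the 20% margin are assumptions; confirm against the actual detection output before wiring this in.

```python
import numpy as np

def extract_face_crop(img: np.ndarray, box, margin: float = 0.2) -> np.ndarray:
    """Crop a face region with a safety margin, clamped to image bounds.

    `box` is assumed to be [x1, y1, x2, y2] pixel coordinates; the margin
    adds head context around the tight face box for the pose model.
    """
    h, w = img.shape[:2]
    x1, y1, x2, y2 = box
    mx = int((x2 - x1) * margin)
    my = int((y2 - y1) * margin)
    x1 = max(0, x1 - mx)
    y1 = max(0, y1 - my)
    x2 = min(w, x2 + mx)
    y2 = min(h, y2 + my)
    return img[y1:y2, x1:x2]
```

The resulting crop would then be passed to model.predict() in place of the full image.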
3. Configuration Changes
File: src/core/config.py
Add configuration option:
# Pose detection method: 'geometric' (current) or '6drepnet' (ML-based)
POSE_DETECTION_METHOD = 'geometric' # or '6drepnet'
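A selection helper honoring this flag, with the graceful fallback described above, might look like this (the helper and the stand-in class are hypothetical; only the sixdrepnet import is real):

```python
import logging

class GeometricDetector:
    """Stand-in for the existing landmark-based PoseDetector."""
    name = 'geometric'

def make_pose_detector(method: str = 'geometric'):
    """Build the configured pose back-end.

    Falls back to the geometric method when the optional sixdrepnet
    dependency is not installed, mirroring the existing graceful-fallback
    pattern in FaceProcessor.__init__().
    """
    if method == '6drepnet':
        try:
            from sixdrepnet import SixDRepNet  # optional dependency
            return SixDRepNet()
        except ImportError:
            logging.warning("sixdrepnet unavailable, using geometric method")
    return GeometricDetector()
```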
Performance Comparison
Current Method (Geometric)
Speed:
- ~0.1-1ms per face (geometric calculations only)
- No model loading overhead
Accuracy:
- Good for frontal and moderate poses
- May struggle with extreme angles or profile views
- Depends on landmark quality
Memory:
- Minimal (~10-50MB for RetinaFace only)
6DRepNet Method
Speed:
- CPU: ~10-50ms per face (neural network inference)
- GPU: ~5-10ms per face (with CUDA)
- Initial model load: ~1-2 seconds (one-time)
Accuracy:
- Higher accuracy across all pose ranges
- Better generalization from training data
- More robust to image quality variations
Memory:
- Model weights: ~50-100MB
- PyTorch runtime: ~200-500MB
- Total: ~250-600MB additional
Batch Processing Impact
Example: Processing 1000 photos with 3 faces each = 3000 faces
Current Method:
- Time: ~300-3000ms (0.3-3 seconds)
- Very fast, minimal impact
6DRepNet (CPU):
- Time: ~30-150 seconds (0.5-2.5 minutes)
- Significant slowdown but acceptable for batch jobs
6DRepNet (GPU):
- Time: ~15-30 seconds
- Much faster with GPU acceleration
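These figures are just the per-face latencies multiplied out; a quick sanity check:

```python
def batch_seconds(n_photos: int, faces_per_photo: int, ms_per_face: float) -> float:
    """Back-of-the-envelope batch time in seconds for a per-face latency."""
    return n_photos * faces_per_photo * ms_per_face / 1000.0

# 1000 photos x 3 faces each, at the latency ranges quoted above
geometric = (batch_seconds(1000, 3, 0.1), batch_seconds(1000, 3, 1.0))   # ~(0.3, 3.0) s
cpu_ml    = (batch_seconds(1000, 3, 10.0), batch_seconds(1000, 3, 50.0)) # (30.0, 150.0) s
gpu_ml    = (batch_seconds(1000, 3, 5.0), batch_seconds(1000, 3, 10.0))  # (15.0, 30.0) s
```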
Recommendations
✅ Recommended Approach: Hybrid Implementation
Phase 1: Add 6DRepNet as Optional Enhancement
- Keep current geometric method as default
- Add 6DRepNet as optional alternative
- Use configuration flag to enable: POSE_DETECTION_METHOD = '6drepnet'
- Graceful fallback if 6DRepNet unavailable
Phase 2: Performance Tuning
- Implement GPU acceleration if available
- Batch processing optimizations
- Cache model instance across batch operations
Phase 3: Evaluation
- Compare accuracy on real dataset
- Measure performance impact
- Decide on default method based on results
⚠️ Considerations
- Dependency Management:
  - PyTorch + TensorFlow coexistence is possible but increases requirements
  - Consider making 6DRepNet optional (extra dependency group)
- Face Crop Extraction:
  - Need to extract face crops from images
  - Can use RetinaFace bounding boxes (already available)
  - Or use DeepFace detection results
- Backward Compatibility:
  - Keep current method available
  - Database schema unchanged (same fields: yaw_angle, pitch_angle, roll_angle)
  - API interface unchanged
- GPU Support:
  - Optional but recommended for performance
  - Can detect CUDA availability automatically
  - Falls back to CPU if GPU unavailable
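The automatic CUDA check can be as simple as the sketch below. torch.cuda.is_available() is standard PyTorch; whether SixDRepNet's constructor accepts a device argument should be confirmed against the sixdrepnet docs before relying on it.

```python
def pick_gpu_id() -> int:
    """Return CUDA device 0 if PyTorch sees a GPU, else -1 for CPU."""
    try:
        import torch  # optional dependency in this design
        if torch.cuda.is_available():
            return 0
    except ImportError:
        pass
    return -1

# Hypothetical wiring -- verify the constructor signature first:
# model = SixDRepNet(gpu_id=pick_gpu_id())
```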
Implementation Complexity Assessment
Complexity: Medium
Factors:
- ✅ Interface is compatible (same output format)
- ✅ Existing architecture supports abstraction
- ⚠️ Requires face crop extraction (not just landmarks)
- ⚠️ PyTorch dependency adds complexity
- ⚠️ Performance considerations for batch processing
Estimated Effort:
- Initial Implementation: 2-4 hours
- Testing & Validation: 2-3 hours
- Documentation: 1 hour
- Total: ~5-8 hours
Conclusion
6DRepNet is technically feasible and recommended for integration as an optional enhancement to the current geometric pose estimation method. The hybrid approach provides:
- Backward Compatibility: Current method remains default
- Improved Accuracy: Better pose estimation, especially for extreme angles
- Flexibility: Users can choose method based on accuracy vs. speed tradeoff
- Future-Proof: ML-based approach can be improved with model updates
Next Steps (if proceeding):
- Add sixdrepnet and torch to requirements (optional dependency group)
- Implement SixDRepNetPoseDetector class
- Modify PoseDetector to support both methods
- Add configuration option
- Test on sample dataset
- Measure performance impact
- Update documentation
References
- 6DRepNet Paper: 6D Rotation Representation For Unconstrained Head Pose Estimation
- PyPI Package: sixdrepnet
- PyTorch Installation: https://pytorch.org/get-started/locally/
- Current Implementation: src/utils/pose_detection.py