# PunimTag

A minimal face tagging proof-of-concept that automatically groups similar faces in your photo collection using face recognition and clustering.

## What it does

PunimTag scans a folder of photos, detects all faces, and automatically groups similar faces together. It:

1. **Walks through your photos folder** - Processes all `.jpg` and `.png` files
2. **Detects faces** - Finds all faces in each image using dlib's face detection
3. **Creates face encodings** - Generates 128-dimensional face embeddings for each detected face
4. **Clusters similar faces** - Uses HDBSCAN clustering to group similar faces together
5. **Stores results in SQLite** - Saves everything to a `faces.db` database for easy querying

## Prerequisites

- Python 3.8+
- CMake (required for dlib installation)
- A `photos/` folder with your images

## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd PunimTag
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install CMake if not already installed:

   ```bash
   # Ubuntu/Debian
   sudo apt-get install cmake

   # macOS
   brew install cmake

   # Windows
   # Download from https://cmake.org/download/
   ```

4. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. Place your photos in the `photos/` folder (subdirectories are supported)
2. Run the script:

   ```bash
   python punimtag.py
   ```

3. The script will process all images and create a `faces.db` SQLite database

## Database Schema

The script creates three tables:

### `images` table

- `id`: Primary key
- `path`: File path to the image

### `faces` table

- `id`: Primary key
- `image_id`: Foreign key to images table
- `location`: Face bounding box coordinates as string
- `encoding`: 128-dimensional face encoding (stored as BLOB)
- `cluster_id`: Foreign key to clusters table (NULL for unclustered faces)

### `clusters` table

- `id`: Primary key
- `label`: Cluster label (e.g., "Cluster 0", "Cluster 1")

## Querying the Database

You can explore the results using any SQLite client:

```bash
sqlite3 faces.db
```

Example queries:

```sql
-- Count faces per image
SELECT i.path, COUNT(f.id) as face_count
FROM images i
LEFT JOIN faces f ON i.id = f.image_id
GROUP BY i.path;

-- Find all images containing faces from a specific cluster
SELECT DISTINCT i.path
FROM images i
JOIN faces f ON i.id = f.image_id
WHERE f.cluster_id = 1;

-- Count faces per cluster
SELECT c.label, COUNT(f.id) as face_count
FROM clusters c
JOIN faces f ON c.id = f.cluster_id
GROUP BY c.id;
```

## How It Works

1. **Face Detection**: Uses HOG-based face detection from dlib to find face locations
2. **Face Encoding**: Generates a 128-dimensional vector for each face using a pre-trained neural network
3. **Clustering**: HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) groups similar face encodings together
   - Faces with similar encodings are grouped into the same cluster
   - Faces that don't match any cluster well are marked as noise (cluster_id = NULL)

## Limitations

- This is a proof-of-concept with minimal error handling
- Face detection may miss faces in poor lighting or at extreme angles
- Clustering quality depends on having multiple photos of the same person
- No GUI - results must be queried from the database

## Next Steps

This minimal implementation can be extended with:

- A web interface for viewing clustered faces
- Better error handling and logging
- Support for more image formats
- Face recognition (matching against known individuals)
- Incremental processing of new photos
- Export functionality for organized photo albums

## License

[Your chosen license]
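
## Example: Querying from Python

The queries under "Querying the Database" can also be run from Python with the standard `sqlite3` module. The sketch below is illustrative only: it builds a small in-memory database that mirrors the tables and columns listed under "Database Schema", so it runs without `faces.db`. The column types, the sample rows, the helper names (`build_demo_db`, `faces_per_cluster`, `images_in_cluster`, `unclustered_count`), and the BLOB layout (128 raw float64 values) are assumptions made for the example, not details taken from `punimtag.py`.

```python
import sqlite3
from array import array

# Assumed DDL: the README names the tables and columns, but the column
# types and constraints below are guesses, not copied from punimtag.py.
SCHEMA = """
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE clusters (
    id INTEGER PRIMARY KEY,
    label TEXT
);
CREATE TABLE faces (
    id INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES images(id),
    location TEXT,
    encoding BLOB,
    cluster_id INTEGER REFERENCES clusters(id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO images (id, path) VALUES (?, ?)",
        [(1, "photos/a.jpg"), (2, "photos/b.jpg")],
    )
    conn.executemany(
        "INSERT INTO clusters (id, label) VALUES (?, ?)",
        [(1, "Cluster 0"), (2, "Cluster 1")],
    )
    # Hypothetical encoding format: 128 float64 values as raw bytes.
    blob = array("d", [0.0] * 128).tobytes()
    conn.executemany(
        "INSERT INTO faces (image_id, location, encoding, cluster_id) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "(10, 60, 60, 10)", blob, 1),
            (1, "(10, 160, 60, 110)", blob, 2),
            (2, "(20, 70, 70, 20)", blob, 1),
            (2, "(20, 170, 70, 120)", blob, None),  # noise: no cluster
        ],
    )
    conn.commit()
    return conn


def faces_per_cluster(conn):
    """Return (label, face_count) pairs, ordered by cluster id."""
    return conn.execute(
        "SELECT c.label, COUNT(f.id) FROM clusters c "
        "JOIN faces f ON c.id = f.cluster_id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()


def images_in_cluster(conn, cluster_id):
    """Return the distinct image paths that contain a face from the cluster."""
    rows = conn.execute(
        "SELECT DISTINCT i.path FROM images i "
        "JOIN faces f ON i.id = f.image_id "
        "WHERE f.cluster_id = ? ORDER BY i.path",
        (cluster_id,),
    ).fetchall()
    return [row[0] for row in rows]


def unclustered_count(conn):
    """Count noise faces; NULL must be tested with IS NULL, not '='."""
    return conn.execute(
        "SELECT COUNT(*) FROM faces WHERE cluster_id IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    demo = build_demo_db()
    print(faces_per_cluster(demo))
    print(images_in_cluster(demo, 1))
    print(unclustered_count(demo))
```

To try the same helpers against real results, replace `build_demo_db()` with `sqlite3.connect("faces.db")`. Note that in SQL a comparison such as `cluster_id = NULL` never matches any row, so unclustered faces have to be selected with `cluster_id IS NULL`.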