PunimTag
A minimal face tagging proof-of-concept that automatically groups similar faces in your photo collection using face recognition and clustering.
What it does
PunimTag scans a folder of photos, detects all faces, and automatically groups similar faces together. It:
- Walks through your photos folder: processes all .jpg and .png files
- Detects faces: finds all faces in each image using dlib's face detection
- Creates face encodings: generates 128-dimensional face embeddings for each detected face
- Clusters similar faces: uses HDBSCAN clustering to group similar faces together
- Stores results in SQLite: saves everything to a faces.db database for easy querying
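The first step, walking the photos folder, could look roughly like the sketch below. The function name `find_images` and the case-insensitive extension match are assumptions made for illustration; the actual script may collect files differently.

```python
from pathlib import Path

# Extensions the README says are processed. Matching case-insensitively
# is an assumption about the script's behaviour.
IMAGE_SUFFIXES = {".jpg", ".png"}


def find_images(root):
    """Return sorted paths of .jpg/.png files under root, including subfolders."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# List every candidate image under photos/ (prints nothing if the folder is empty).
for path in find_images("photos"):
    print(path)
```

Using `rglob` keeps subdirectories in scope, which matches the note under Usage that subdirectories are supported.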
Prerequisites
- Python 3.8+
- CMake (required for dlib installation)
- A photos/ folder with your images
Installation
- Clone this repository:
git clone <repository-url>
cd PunimTag
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install CMake if not already installed:
# Ubuntu/Debian
sudo apt-get install cmake
# macOS
brew install cmake
# Windows
# Download from https://cmake.org/download/
- Install Python dependencies:
pip install -r requirements.txt
Usage
- Place your photos in the photos/ folder (subdirectories are supported)
- Run the script:
python punimtag.py
- The script will process all images and create a faces.db SQLite database
Database Schema
The script creates three tables:
images table
- id: Primary key
- path: File path to the image
faces table
- id: Primary key
- image_id: Foreign key to images table
- location: Face bounding box coordinates, stored as a string
- encoding: 128-dimensional face encoding (stored as BLOB)
- cluster_id: Foreign key to clusters table (NULL for unclustered faces)
clusters table
- id: Primary key
- label: Cluster label (e.g., "Cluster 0", "Cluster 1")
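For orientation, the three tables could be created along the lines of the sketch below. The DDL is reconstructed from the column descriptions above and is an assumption: the column types, the NOT NULL constraints, and the helper name `create_schema` are illustrative and may not match what punimtag.py actually runs.

```python
import sqlite3

# Hypothetical DDL inferred from the column descriptions in this README;
# the real script's types and constraints may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS clusters (
    id    INTEGER PRIMARY KEY,
    label TEXT
);
CREATE TABLE IF NOT EXISTS faces (
    id         INTEGER PRIMARY KEY,
    image_id   INTEGER NOT NULL REFERENCES images(id),
    location   TEXT,
    encoding   BLOB,
    cluster_id INTEGER REFERENCES clusters(id)  -- NULL for unclustered faces
);
"""


def create_schema(conn):
    """Create the three tables if they do not exist yet."""
    conn.executescript(SCHEMA)


# An in-memory database keeps the sketch from touching a real faces.db.
conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```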
Querying the Database
You can explore the results using any SQLite client:
sqlite3 faces.db
Example queries:
-- Count faces per image
SELECT i.path, COUNT(f.id) as face_count
FROM images i
LEFT JOIN faces f ON i.id = f.image_id
GROUP BY i.path;
-- Find all images containing faces from a specific cluster
SELECT DISTINCT i.path
FROM images i
JOIN faces f ON i.id = f.image_id
WHERE f.cluster_id = 1;
-- Count faces per cluster
SELECT c.label, COUNT(f.id) as face_count
FROM clusters c
JOIN faces f ON c.id = f.cluster_id
GROUP BY c.id;
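The same queries can be run from Python with the standard sqlite3 module. The sketch below wraps the faces-per-cluster query in a hypothetical helper, `faces_per_cluster`, and adds an ORDER BY that is not part of the query above. The sample rows are invented so that the snippet runs on its own; to query real results, connect to faces.db instead of the in-memory database.

```python
import sqlite3


def faces_per_cluster(conn):
    """Return (label, face_count) rows, largest cluster first."""
    return conn.execute(
        """
        SELECT c.label, COUNT(f.id) AS face_count
        FROM clusters c
        JOIN faces f ON c.id = f.cluster_id
        GROUP BY c.id
        ORDER BY face_count DESC, c.id
        """
    ).fetchall()


# Throwaway in-memory database with invented rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clusters (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE faces (id INTEGER PRIMARY KEY, image_id INTEGER, cluster_id INTEGER);
INSERT INTO clusters VALUES (1, 'Cluster 0'), (2, 'Cluster 1');
INSERT INTO faces (image_id, cluster_id) VALUES (1, 1), (2, 1), (2, 2), (3, NULL);
""")

rows = faces_per_cluster(conn)
for label, count in rows:
    print(label, count)
```

The face with a NULL cluster_id is dropped by the inner join, so noise faces do not appear in the counts.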
How It Works
- Face Detection: Uses HOG-based face detection from dlib to find face locations
- Face Encoding: Generates a 128-dimensional vector for each face using a pre-trained neural network
- Clustering: HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) groups similar face encodings together
- Faces with similar encodings are grouped into the same cluster
- Faces that don't match any cluster well are marked as noise (cluster_id = NULL)
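HDBSCAN implementations label noise points with -1, so the last step amounts to turning those labels into cluster rows and NULL-able cluster_id values. The sketch below shows one way to do that mapping in plain Python. The function name `assign_cluster_ids` and the choice to start database ids at 1 are assumptions, not necessarily what the script does.

```python
def assign_cluster_ids(labels):
    """Map HDBSCAN-style labels to (cluster_ids, cluster_rows).

    labels: one integer per face; -1 means noise.
    cluster_ids: database id per face, or None for noise (stored as NULL).
    cluster_rows: (id, label) tuples destined for the clusters table.
    """
    # Database ids start at 1 here; that offset is an assumption.
    real_labels = sorted(set(labels) - {-1})
    db_id = {lab: i + 1 for i, lab in enumerate(real_labels)}
    cluster_ids = [db_id.get(lab) for lab in labels]
    cluster_rows = [(i, f"Cluster {lab}") for lab, i in db_id.items()]
    return cluster_ids, cluster_rows


# Four faces: two in cluster 0, one in cluster 1, one noise point.
ids, rows = assign_cluster_ids([0, 1, -1, 0])
print(ids)
print(rows)
```

Passing None for a face's cluster_id through sqlite3 stores NULL, which is how unclustered faces are represented in the faces table.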
Limitations
- This is a proof-of-concept with minimal error handling
- Face detection may miss faces in poor lighting or at extreme angles
- Clustering quality depends on having multiple photos of the same person
- No GUI - results must be queried from the database
Next Steps
This minimal implementation can be extended with:
- A web interface for viewing clustered faces
- Better error handling and logging
- Support for more image formats
- Face recognition (matching against known individuals)
- Incremental processing of new photos
- Export functionality for organized photo albums
License
[Your chosen license]