punimtag/docs/RESOURCE_REQUIREMENTS.md
Tanya 68d280e8f5 feat: Add new analysis documents and update installation scripts for backend integration
2025-12-30 15:04:32 -05:00

PunimTag Resource Requirements

How the Software Works & What Resources You Need

This document explains how PunimTag works and what infrastructure resources you need to provision for Development, QA, and Production environments.


How the Software Works

System Architecture

PunimTag consists of 8 main components that work together:

┌─────────────────────────────────────────────────────────────┐
│                    USER'S BROWSER                            │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Frontend Website (React)                            │   │
│  │  - User interface for viewing/searching photos       │   │
│  │  - User authentication                               │   │
│  │  - Photo uploads, face identification, tagging      │   │
│  └──────────────┬──────────────────────────────────────┘   │
└──────────────────┼──────────────────────────────────────────┘
                   │ HTTP/HTTPS
                   ▼
┌─────────────────────────────────────────────────────────────┐
│              APPLICATION SERVER                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Backend API (FastAPI - Python)                      │   │
│  │  - Handles all requests from frontend                │   │
│  │  - Manages photos, faces, people, tags               │   │
│  │  - Processes authentication                           │   │
│  │  - Port: 8000                                        │   │
│  └──────────────┬──────────────────────────────────────┘   │
│                 │                                           │
│  ┌──────────────▼──────────────────────────────────────┐   │
│  │  Background Worker (Python)                          │   │
│  │  - Processes photos in background                    │   │
│  │  - Detects faces using AI (DeepFace)                │   │
│  │  - Generates face encodings                          │   │
│  │  - CPU-intensive work                                │   │
│  └──────────────┬──────────────────────────────────────┘   │
└──────────────────┼──────────────────────────────────────────┘
                   │
        ┌──────────┼──────────┬──────────┬──────────┐
        │          │          │          │          │
        ▼          ▼          ▼          ▼          ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│  Main    │ │  Auth    │ │  Redis   │ │  Photo   │ │  Web     │
│ Database │ │ Database │ │  Queue   │ │ Storage  │ │ Server   │
│(Postgres)│ │(Postgres)│ │          │ │  (Disk)  │ │ (Nginx)  │
│          │ │          │ │          │ │          │ │          │
│ Photos   │ │ Frontend │ │ Job      │ │ Uploaded │ │ Serves   │
│ Faces    │ │ Users    │ │ Queue    │ │ Photos   │ │ Frontend │
│ People   │ │ Accounts │ │          │ │          │ │ Files    │
│ Tags     │ │          │ │          │ │          │ │          │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘

Component Responsibilities

  1. Frontend Website (React)

    • What it does: User interface that runs in browsers
    • Users interact with: View photos, search, identify faces, upload photos, tag photos
    • Technology: React + TypeScript, built with Vite
    • Served by: Web server (Nginx/Apache) in production, or dev server for testing
  2. Backend API (FastAPI)

    • What it does: Handles all business logic, processes requests from frontend
    • Responsibilities:
      • Authentication and authorization
      • Photo management (upload, scan, delete)
      • Face detection and processing (queues jobs)
      • Person identification and management
      • Tag management
      • Search functionality
    • Technology: Python 3.12+, FastAPI framework
    • Port: 8000 (configurable)
  3. Background Worker

    • What it does: Processes heavy tasks in the background
    • Responsibilities:
      • Face detection in photos (uses AI models)
      • Generating face encodings (512-dimensional vectors)
      • Processing photos uploaded by users
    • Technology: Python 3.12+, uses DeepFace AI library
    • Runs: As a separate process, connects to Redis for job queue
  4. Main Database (PostgreSQL)

    • What it stores:
      • Photos metadata (paths, dates, file info)
      • Detected faces (locations, encodings, quality scores)
      • People records (names, dates of birth)
      • Tags and photo-tag relationships
    • Technology: PostgreSQL 12+
    • Size: Grows with number of photos and faces
  5. Auth Database (PostgreSQL)

    • What it stores:
      • Frontend website user accounts
      • User-uploaded photos (pending approval)
      • User face identifications (pending approval)
      • User tag suggestions (pending approval)
      • Reported photos
    • Technology: PostgreSQL 12+
    • Size: Small, grows with number of users
  6. Redis

    • What it does: Job queue for background processing
    • Stores: Background job status and progress
    • Technology: Redis 5.0+
    • Size: Very small (just job metadata)
  7. Photo Storage (File System)

    • What it stores: Actual photo and video files
    • Location: Directory on server or network storage
    • Size: Depends on number and size of photos/videos
  8. Web Server (Nginx/Apache)

    • What it does: Serves the frontend website files in production
    • Technology: Nginx or Apache
    • Ports: 80 (HTTP), 443 (HTTPS)
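
The hand-off between the Backend API and the Background Worker can be illustrated with a minimal sketch. It uses Python's standard-library `queue.Queue` as an in-process stand-in for Redis so that it runs without any services. The names `enqueue_photo`, `detect_faces`, `worker`, and `run_demo` are hypothetical and are not taken from PunimTag's code.

```python
import queue
import threading

# In-process stand-in for the Redis job queue. Assumption: the real system
# talks to Redis; queue.Queue only mimics the producer/consumer flow.
jobs = queue.Queue()
results = {}


def enqueue_photo(path):
    """API side: record the job as queued and return immediately."""
    results[path] = "queued"
    jobs.put({"path": path})


def detect_faces(path):
    """Hypothetical placeholder for the CPU-heavy face detection step."""
    return 0


def worker():
    """Worker side: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        faces = detect_faces(job["path"])
        results[job["path"]] = f"done ({faces} faces)"


def run_demo(paths):
    """Start one worker thread, enqueue every path, and wait for completion."""
    thread = threading.Thread(target=worker)
    thread.start()
    for path in paths:
        enqueue_photo(path)
    jobs.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return dict(results)
```

The point of the pattern is that the API call returns as soon as the job is queued, while the expensive work happens in a separate process (here, a thread) that can be scaled independently.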

Resource Requirements by Environment

Development Environment

Purpose: For development, testing new features, debugging

Components Needed:

  • 1 Application Server (can run all services)
  • 1 Database Server (can host both databases)
  • 1 Redis instance (can be on application server)

Server Specifications:

  • CPU: 2-4 cores (sufficient for development)
  • RAM: 4-8 GB (enough for development workloads)
  • Storage:
    • OS + Software: 20 GB
    • Database: 10-50 GB (depends on test data)
    • Photo storage: 50-200 GB (test photos)
    • Total: 80-270 GB
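
The total above is the sum of the low ends and the sum of the high ends of each storage line. A small sketch of that arithmetic, where `total_range` is a hypothetical helper name:

```python
def total_range(*parts):
    """Add (low, high) ranges in GB and return the combined (low, high)."""
    low = sum(part[0] for part in parts)
    high = sum(part[1] for part in parts)
    return low, high


# OS + software, database, photo storage (all in GB, from the list above)
dev_total = total_range((20, 20), (10, 50), (50, 200))
print(dev_total)  # (80, 270)
```

The same helper can be reused for the other environments by swapping in their ranges.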

Network:

  • Port 8000 (Backend API)
  • Port 3000 (Frontend dev server) OR port 80/443 (if using web server)
  • Internal network access sufficient

Software:

  • Python 3.12+
  • Node.js 18+ (for building frontend)
  • PostgreSQL 12+
  • Redis 5.0+
  • Git

Notes:

  • Can run everything on a single server
  • Lower performance requirements
  • May use SQLite for databases (not recommended but possible)
  • Development mode frontend (Vite dev server) is fine

QA Environment

Purpose: For quality assurance testing, user acceptance testing, staging

Components Needed:

  • 1 Application Server
  • 1 Database Server (recommended separate, but can be same as app server)
  • 1 Redis instance (can be on application server)

Server Specifications:

  • CPU: 4-8 cores (should handle moderate load)
  • RAM: 8-16 GB (needs to handle concurrent users and processing)
  • Storage:
    • OS + Software: 20 GB
    • Database: 50-200 GB (larger test dataset)
    • Photo storage: 200 GB - 1 TB (realistic photo collection)
    • Total: 270 GB - 1.2 TB

Network:

  • Port 8000 (Backend API)
  • Port 80/443 (Frontend via web server - production-like)
  • Should be accessible to QA team (internal network or VPN)

Software:

  • Python 3.12+
  • Node.js 18+ (for building frontend)
  • PostgreSQL 12+
  • Redis 5.0+
  • Nginx or Apache (for serving frontend)
  • Git

Notes:

  • Should mirror production setup as closely as possible
  • Use production build of frontend (not dev server)
  • Should have realistic data volumes for testing
  • Performance should be similar to production

Production Environment

Purpose: Live system for end users

Components Needed:

  • 1-2 Application Servers (can scale horizontally)
  • 1 Database Server (dedicated, high performance)
  • 1 Redis Server (can be on database server or separate)
  • 1 Web Server (for serving frontend, can be on app server or separate)
  • Optional: Load balancer (if multiple app servers)

Server Specifications:

Application Server(s):

  • CPU: 8-16 cores (more cores = faster face processing)
  • RAM: 16-32 GB (handles concurrent users and background processing)
  • Storage:
    • OS + Software: 50 GB
    • Application code: 5 GB
    • Logs: 10-50 GB (with rotation)
    • Total: 65-105 GB per server

Database Server:

  • CPU: 8-16 cores (database queries and indexing)
  • RAM: 16-64 GB (depends on database size, more RAM = better performance)
  • Storage:
    • OS + Software: 50 GB
    • Database data: 100 GB - 5 TB+ (depends on photo collection size)
    • Database backups: 100 GB - 5 TB+ (same as data)
    • Total: 250 GB - 10 TB+
  • Storage Type: SSD recommended for better performance
  • Backup: Automated daily backups recommended

Photo Storage:

  • Storage: 500 GB - 50 TB+ (depends on photo collection)
  • Location: Can be on database server, separate storage server, or network storage (NAS)
  • Type: Network storage (NAS) recommended for large collections

Redis Server:

  • CPU: 2-4 cores (lightweight)
  • RAM: 2-4 GB (very small data)
  • Storage: 10 GB (minimal)
  • Can run on database server or application server

Web Server (if separate):

  • CPU: 2-4 cores
  • RAM: 2-4 GB
  • Storage: 10 GB
  • Can run on application server

Network:

  • Port 8000 (Backend API) - internal or load balancer
  • Port 80/443 (Frontend via web server) - public access
  • Port 5432 (Database) - internal only
  • Port 6379 (Redis) - internal only
  • HTTPS/SSL certificate required
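
One way to confirm that firewall rules match this list is a simple TCP reachability check run from different machines. The sketch below uses only the standard library. The `PORTS` mapping and `is_port_open` are illustrative names, and any host names you pass in are your own.

```python
import socket

# Ports named in the list above. Which of them should be reachable from a
# given machine depends on your own network layout.
PORTS = {
    "backend_api": 8000,
    "http": 80,
    "https": 443,
    "postgres": 5432,
    "redis": 6379,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from outside the internal network, a check such as `is_port_open(db_host, PORTS["postgres"])` would be expected to return `False` if the database port is correctly restricted.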

Software:

  • Python 3.12+
  • Node.js 18+ (for building frontend)
  • PostgreSQL 12+ (latest stable recommended)
  • Redis 5.0+
  • Nginx or Apache (for serving frontend)
  • SSL certificate (for HTTPS)
  • Monitoring tools (optional but recommended)

High Availability (Optional):

  • Multiple application servers behind load balancer
  • Database replication (primary-replica)
  • Redis replication
  • Automated backups
  • Monitoring and alerting

Resource Sizing Guidelines

Database Size Estimation

Main Database:

  • Per photo: ~1-5 KB (metadata only, photos stored on disk)
  • Per face: ~2-10 KB (encoding + metadata)
  • Per person: ~1 KB
  • Per tag: ~1 KB

Example:

  • 10,000 photos with 50,000 faces = ~110-550 MB (10-50 MB of photo metadata plus 100-500 MB of face data)
  • 1,000 people = ~1 MB
  • 100 tags = ~100 KB
  • Total: ~110-550 MB for 10K photos
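
The per-record figures can be turned into a rough estimator. This is a sketch that applies the stated ranges directly and uses 1 MB = 1,000 KB for simplicity; it ignores indexes and other PostgreSQL overhead. The function name `main_db_size_mb` is hypothetical. The float32 assumption for face encodings is ours: at 4 bytes per value, a 512-dimensional vector is about 2 KB, in line with the lower per-face bound.

```python
# Assumption: encodings stored as float32 (4 bytes per dimension).
ENCODING_BYTES_FLOAT32 = 512 * 4  # 2048 bytes, roughly 2 KB


def main_db_size_mb(photos, faces, people=0, tags=0):
    """Return a (low, high) size estimate in MB from the per-record ranges."""
    low_kb = photos * 1 + faces * 2 + people * 1 + tags * 1
    high_kb = photos * 5 + faces * 10 + people * 1 + tags * 1
    return low_kb / 1000, high_kb / 1000


print(main_db_size_mb(10_000, 50_000, people=1_000, tags=100))
# (111.1, 551.1)
```

Applied to the example collection, the per-record ranges give roughly 111-551 MB, with face data dominating the total.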

Auth Database:

  • Per user: ~1 KB
  • Per pending photo: ~1 KB
  • Total: Usually < 100 MB

Photo Storage Size Estimation

  • Average photo: 2-5 MB
  • Average video: 50-200 MB
  • 10,000 photos = 20-50 GB
  • 100,000 photos = 200-500 GB
  • 1,000,000 photos = 2-5 TB
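
The same averages can be applied to a mixed collection of photos and videos. A minimal sketch, using 1 GB = 1,000 MB and a hypothetical function name `photo_storage_gb`:

```python
def photo_storage_gb(photos, videos=0):
    """Return a (low, high) storage estimate in GB from the average file sizes."""
    low_mb = photos * 2 + videos * 50
    high_mb = photos * 5 + videos * 200
    return low_mb / 1000, high_mb / 1000


print(photo_storage_gb(10_000))       # (20.0, 50.0)
print(photo_storage_gb(100_000))      # (200.0, 500.0)
print(photo_storage_gb(10_000, 500))  # (45.0, 150.0)
```

Actual usage depends heavily on camera resolution and video length, so treat the result as a planning range rather than a measurement.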

CPU Requirements

Face Processing:

  • 1 photo with faces: 2-10 seconds (depends on number of faces)
  • CPU-intensive: More cores = faster processing
  • Background worker uses CPU heavily during processing
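
For capacity planning, the per-photo time translates into a rough estimate of how long an initial import takes. The sketch below assumes processing scales linearly with the number of parallel workers, which is an idealization and not a measured property of PunimTag. The function name `face_processing_hours` is hypothetical.

```python
def face_processing_hours(photos, workers=1):
    """Return (low, high) hours to process a batch at 2-10 seconds per photo.

    Assumes perfect linear scaling across workers (an idealization).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    low = photos * 2 / workers / 3600
    high = photos * 10 / workers / 3600
    return round(low, 1), round(high, 1)


print(face_processing_hours(10_000))             # (5.6, 27.8)
print(face_processing_hours(10_000, workers=4))  # (1.4, 6.9)
```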

API Server:

  • Light CPU usage for most requests
  • Higher CPU during photo uploads and searches

RAM Requirements

Application Server:

  • Base: 2-4 GB (OS + Python + FastAPI)
  • Per concurrent user: ~50-100 MB
  • Background worker: 2-4 GB (AI models loaded in memory)
  • Recommended: 16-32 GB for production
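
The figures above combine additively. A short sketch of that calculation, using 1 GB = 1,000 MB and a hypothetical function name `app_server_ram_gb`:

```python
def app_server_ram_gb(concurrent_users):
    """Return (low, high) RAM in GB: base + per-user + background worker."""
    low = 2 + concurrent_users * 50 / 1000 + 2
    high = 4 + concurrent_users * 100 / 1000 + 4
    return low, high


print(app_server_ram_gb(50))   # (6.5, 13.0)
print(app_server_ram_gb(200))  # (14.0, 28.0)
```

At 200 concurrent users the estimate lands at 14-28 GB, which fits within the 16-32 GB production recommendation.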

Database Server:

  • Base: 2-4 GB (OS + PostgreSQL)
  • Database cache: 25-50% of database size (PostgreSQL uses RAM for caching)
  • Recommended: 16-64 GB depending on database size

Deployment Scenarios

Scenario 1: Small Deployment (Single Server)

For: Small organizations, < 10,000 photos, < 50 users

  • 1 Server: Application + Database + Redis + Web Server
  • CPU: 8 cores
  • RAM: 16 GB
  • Storage: 500 GB - 1 TB
  • Cost: Low

Scenario 2: Medium Deployment (Separate Database)

For: Medium organizations, 10,000-100,000 photos, 50-200 users

  • 1 Application Server: App + Worker + Web Server
  • 1 Database Server: Both databases
  • CPU: 8-16 cores each
  • RAM: 16-32 GB each
  • Storage: 1-5 TB (database), 1-5 TB (photos)
  • Cost: Medium

Scenario 3: Large Deployment (Fully Separated)

For: Large organizations, 100,000+ photos, 200+ users

  • 2+ Application Servers: Behind load balancer
  • 1 Database Server: Main database (high performance)
  • 1 Database Server: Auth database (can be smaller)
  • 1 Redis Server: Job queue
  • 1 Web Server: Frontend (or CDN)
  • Storage Server/NAS: Photo storage
  • CPU: 16+ cores per server
  • RAM: 32-64 GB per server
  • Storage: 5-50+ TB
  • Cost: High
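
The thresholds in the three scenarios can be expressed as a small lookup. Two choices here are assumptions rather than something the scenarios state: the larger tier is chosen when either photos or users crosses a threshold, and the boundary values (100,000 photos, 200 users) are assigned to the larger tier. The function name `pick_scenario` is hypothetical.

```python
def pick_scenario(photos, users):
    """Map a collection size and user count to a deployment scenario."""
    if photos >= 100_000 or users >= 200:
        return "large"
    if photos >= 10_000 or users >= 50:
        return "medium"
    return "small"


print(pick_scenario(5_000, 20))    # small
print(pick_scenario(50_000, 100))  # medium
print(pick_scenario(500_000, 20))  # large
```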

Summary Checklist

Development Environment

  • 1 server (4-8 GB RAM, 2-4 cores, 80-270 GB storage)
  • PostgreSQL (both databases can be on same server)
  • Redis
  • Python 3.12+, Node.js 18+

QA Environment

  • 1-2 servers (8-16 GB RAM, 4-8 cores, 270 GB - 1.2 TB storage)
  • PostgreSQL (separate server recommended)
  • Redis
  • Web server (Nginx/Apache)
  • Python 3.12+, Node.js 18+

Production Environment

  • 1-2 application servers (16-32 GB RAM, 8-16 cores, 65-105 GB storage each)
  • 1 database server (16-64 GB RAM, 8-16 cores, 250 GB - 10 TB+ storage)
  • Redis server (2-4 GB RAM, can be on database server)
  • Web server (2-4 GB RAM, can be on app server)
  • Photo storage (500 GB - 50 TB, can be on database server or separate)
  • SSL certificate
  • Backup solution
  • Monitoring (optional but recommended)

Questions? Contact us to discuss your specific requirements and we can help you size the infrastructure appropriately.