diff --git a/docs/DATABASE_ARCHITECTURE_REVIEW.md b/docs/DATABASE_ARCHITECTURE_REVIEW.md new file mode 100644 index 0000000..b494837 --- /dev/null +++ b/docs/DATABASE_ARCHITECTURE_REVIEW.md @@ -0,0 +1,465 @@ +# Database Architecture Review + +**Created:** 2024 +**Purpose:** Comprehensive review of main database and auth database architecture, configuration, and production deployment options. + +--- + +## Table of Contents + +1. [Main Database (punimtag)](#main-database-punimtag) +2. [Auth Database (punimtag_auth)](#auth-database-punimtag_auth) +3. [SQLite vs PostgreSQL Configuration](#sqlite-vs-postgresql-configuration) +4. [Production Deployment Architecture](#production-deployment-architecture) +5. [Key Differences Summary](#key-differences-summary) +6. [Connection Management](#connection-management) +7. [Decision Points](#decision-points) + +--- + +## Main Database (punimtag) + +### Configuration +- **Environment Variable:** `DATABASE_URL` +- **Default:** SQLite (`sqlite:///data/punimtag.db`) +- **Production:** PostgreSQL (`postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag`) +- **Connection Pool (PostgreSQL):** pool_size=10, max_overflow=20, pool_recycle=3600 + +### Purpose +Stores all photo metadata, face detections, people records, tags, and backend user accounts. + +### Schema Overview +The main database contains the core application data with the following tables: + +#### Core Tables + +1. **`photos`** - Photo metadata + - `id`, `path` (unique), `filename`, `date_added`, `date_taken`, `processed`, `file_hash`, `media_type` + - Relationships: faces, photo_tags, favorites, video_people + +2. **`people`** - Person records + - `id`, `first_name`, `last_name`, `middle_name`, `maiden_name`, `date_of_birth`, `created_date` + - Unique constraint on (first_name, last_name, middle_name, maiden_name, date_of_birth) + - Relationships: faces, person_encodings, video_photos + +3. 
**`faces`** - Face detection data + - `id`, `photo_id` (FK), `person_id` (FK), `encoding` (BLOB), `location`, `confidence`, `quality_score` + - Additional fields: `detector_backend`, `model_name`, `face_confidence`, `exif_orientation`, `pose_mode` + - Angles: `yaw_angle`, `pitch_angle`, `roll_angle` + - `landmarks` (JSON string), `identified_by_user_id` (FK), `excluded` (boolean) + - Relationships: photo, person, person_encodings + +4. **`person_encodings`** - Person face encodings for matching + - `id`, `person_id` (FK), `face_id` (FK), `encoding` (BLOB), `quality_score` + - `detector_backend`, `model_name`, `created_date` + - Relationships: person, face + +5. **`tags`** - Tag definitions + - `id`, `tag_name` (unique), `created_date` + - Relationships: photo_tags + +6. **`phototaglinkage`** - Photo-tag relationships + - `linkage_id` (PK), `photo_id` (FK), `tag_id` (FK), `linkage_type` (0=single, 1=bulk), `created_date` + - Unique constraint on (photo_id, tag_id) + - Relationships: photo, tag + +7. **`users`** (Main Database) - Backend user accounts + - `id`, `username` (unique), `password_hash`, `email` (unique), `full_name` + - `is_active`, `is_admin`, `role`, `password_change_required` + - `created_date`, `last_login` + - Relationships: faces (identified_by_user_id), photo_person_linkage + +8. **`photo_favorites`** - User-specific favorites + - `id`, `username`, `photo_id` (FK), `created_date` + - Unique constraint on (username, photo_id) + - Relationships: photo + +9. **`photo_person_linkage`** - Direct photo-person associations (for videos) + - `id`, `photo_id` (FK), `person_id` (FK), `identified_by_user_id` (FK), `created_date` + - Unique constraint on (photo_id, person_id) + - Relationships: photo, person, user + +10. 
**`role_permissions`** - Role-to-feature permission matrix + - `id`, `role`, `feature_key`, `allowed` + - Unique constraint on (role, feature_key) + +--- + +## Auth Database (punimtag_auth) + +### Configuration +- **Environment Variable:** `DATABASE_URL_AUTH` +- **Database Type:** PostgreSQL (always required) +- **Connection String:** `postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag_auth` +- **Connection Pool:** pool_size=10, max_overflow=20, pool_recycle=3600 + +### Purpose +Separate database for frontend website users and pending operations, so the frontend website can treat the main database as read-only (the main database itself remains read-write, as noted under Key Differences Summary). + +### Schema Overview + +#### Tables + +1. **`users`** (Auth Database) - Frontend website user accounts + - `id`, `email` (unique), `name`, `password_hash` + - `is_admin`, `has_write_access`, `email_verified` + - `email_confirmation_token`, `email_confirmation_token_expiry` + - `password_reset_token`, `password_reset_token_expiry` + - `is_active`, `created_at`, `updated_at` + - Relationships: pending_identifications, pending_photos, inappropriate_photo_reports, pending_linkages, photo_favorites + +2. **`pending_identifications`** - Face identifications pending approval + - `id`, `face_id` (references main DB, no FK), `user_id` (FK), `first_name`, `last_name` + - `middle_name`, `maiden_name`, `date_of_birth`, `status` (pending/approved/rejected) + - `created_at`, `updated_at` + - Relationships: user + - **Note:** `face_id` references faces in the main database and is validated in application code (no cross-database FK) + +3. **`pending_photos`** - Photos pending moderation + - `id`, `user_id` (FK), `filename`, `original_filename`, `file_path`, `file_size`, `mime_type` + - `status` (pending/approved/rejected), `submitted_at`, `reviewed_at`, `reviewed_by`, `rejection_reason` + - Relationships: user + +4. 
**`inappropriate_photo_reports`** - Reported photos for review + - `id`, `photo_id` (references main DB, no FK), `user_id` (FK), `status` (pending/reviewed/dismissed) + - `reported_at`, `reviewed_at`, `reviewed_by`, `review_notes`, `report_comment` + - Unique constraint on (photo_id, user_id) - prevents duplicate reports + - Relationships: user + - **Note:** `photo_id` references photos in main database, validated in application code + +5. **`pending_linkages`** - Tag submissions pending approval + - `id`, `photo_id` (references main DB), `tag_id`, `tag_name`, `user_id` (FK) + - `status` (pending/approved/rejected), `notes`, `created_at`, `updated_at` + - Relationships: user + +6. **`photo_favorites`** (Auth Database) - User favorites + - `id`, `photo_id` (references main DB, no FK), `user_id` (FK), `favorited_at` + - Unique constraint on (photo_id, user_id) + - Relationships: user + - **Note:** `photo_id` references photos in main database, validated in application code + +--- + +## SQLite vs PostgreSQL Configuration + +### SQLite (Development Default) + +**Connection String:** +``` +sqlite:///data/punimtag.db +``` + +**What it means:** +- `sqlite:///` - SQLite protocol +- `data/punimtag.db` - Relative file path (single file) + +**Characteristics:** +- ✅ File-based database (single `.db` file) +- ✅ No separate server process needed +- ✅ Zero configuration +- ✅ Perfect for development and small deployments +- ❌ Limited concurrency (file-level locking) +- ❌ No network access (local only) +- ❌ Lower performance under heavy load + +**When to use:** +- Local development +- Testing +- Small deployments +- Single-user scenarios + +### PostgreSQL (Production) + +**Connection String:** +``` +postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag +``` + +**What it means:** +- `postgresql+psycopg2://` - PostgreSQL with psycopg2 driver +- `punimtag` - Username +- `punimtag_password` - Password +- `localhost:5432` - Host and port +- `punimtag` - Database name 
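The connection-string breakdown above can be reproduced programmatically. The following is a minimal sketch using only Python's standard library; `describe_db_url` is a hypothetical helper written for illustration and is not part of the application. It assumes credentials contain no characters that need percent-encoding. In application code, SQLAlchemy's own `make_url` helper would likely be the better choice.

```python
from urllib.parse import urlsplit


def describe_db_url(url):
    """Split a SQLAlchemy-style database URL into its named parts.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # The scheme has the form "dialect+driver"; the driver part is optional.
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # Drop the leading "/" that separates the authority from the database name.
        "database": parts.path[1:],
    }


# The two connection strings used in this document.
pg = describe_db_url(
    "postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag"
)
lite = describe_db_url("sqlite:///data/punimtag.db")

for name, info in (("PostgreSQL", pg), ("SQLite", lite)):
    print(name, info)
```

For the SQLite URL the host, port, and credentials come back as `None`, and the database field holds the relative file path, which matches the "single file, no server" description given earlier.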
+ +**Characteristics:** +- ✅ Server-based database +- ❌ Requires a running PostgreSQL server +- ✅ Excellent concurrency (row-level locking) +- ✅ Network access +- ✅ High performance under load +- ✅ ACID compliance +- ✅ Advanced features (full-text search, JSON, etc.) +- ✅ Best for multi-user production + +**When to use:** +- Production environments +- Multi-user access +- High concurrency requirements +- Large datasets +- Network access needed + +### Key Differences + +| Feature | SQLite | PostgreSQL | +|---------|--------|------------| +| **Setup** | No setup needed | Requires server installation | +| **File vs Server** | Single file | Server process | +| **Concurrency** | Limited (file locking) | Excellent (row-level locking) | +| **Network Access** | No | Yes | +| **Performance** | Good for small/medium | Excellent for large scale | +| **Configuration** | Zero config | Requires setup | +| **Best For** | Development, small apps | Production, multi-user | + +### Migration Path + +You can switch from SQLite to PostgreSQL by: +1. Installing PostgreSQL +2. Creating the database +3. Updating `DATABASE_URL` in `.env` +4. Running migration scripts (if needed) + +The application code remains the same; SQLAlchemy handles the differences between database types. + +--- + +## Production Deployment Architecture + +### Question: Will the main and auth databases sit on the same server in production? + +**Answer: Yes, they can be on the same server. 
There are multiple deployment options:** + +### Option 1: Same PostgreSQL Server, Different Databases (Most Common) ⭐ + +Both databases run on the same PostgreSQL instance as separate databases: + +``` +PostgreSQL Server (localhost:5432) +├── Database: punimtag (main database) +└── Database: punimtag_auth (auth database) +``` + +**Connection strings:** +```bash +# Main database +DATABASE_URL=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag + +# Auth database (same server, different database) +DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth +``` + +**Advantages:** +- ✅ Simple setup and management +- ✅ Shared resources (memory, connections) +- ✅ Single backup target +- ✅ Lower overhead + +**When to use:** +- Standard production deployments +- Small to medium scale +- Single-server setups + +**Setup:** +```bash +# Create both databases on same PostgreSQL instance +sudo -u postgres psql -c "CREATE DATABASE punimtag OWNER punimtag;" +sudo -u postgres psql -c "CREATE DATABASE punimtag_auth OWNER punimtag;" +``` + +### Option 2: Different Servers (For Separation/High Availability) + +Each database runs on its own PostgreSQL server: + +``` +Server 1: PostgreSQL (db-server-1:5432) +└── Database: punimtag + +Server 2: PostgreSQL (db-server-2:5432) +└── Database: punimtag_auth +``` + +**Connection strings:** +```bash +# Main database +DATABASE_URL=postgresql+psycopg2://punimtag:password@db-server-1:5432/punimtag + +# Auth database (different server) +DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@db-server-2:5432/punimtag_auth +``` + +**Advantages:** +- ✅ Complete isolation and security +- ✅ Independent scaling +- ✅ Better fault tolerance +- ✅ Compliance/regulatory separation + +**When to use:** +- High availability requirements +- Security/compliance needs +- Large scale deployments +- Multi-region setups + +### Option 3: Same Server, Different PostgreSQL Instances (Rare) + +Both instances on the same machine: 
+ +``` +Server +├── PostgreSQL Instance 1 (port 5432) +│ └── Database: punimtag +└── PostgreSQL Instance 2 (port 5433) + └── Database: punimtag_auth +``` + +**When to use:** +- Special requirements +- Resource isolation needs +- Testing scenarios + +### Deployment Options Summary + +| Setup | Same Server? | Same PostgreSQL Instance? | Use Case | +|-------|--------------|---------------------------|----------| +| **Option 1** | ✅ Yes | ✅ Yes (different databases) | **Most common** - Standard production | +| **Option 2** | ❌ No | ❌ No | High availability, security separation | +| **Option 3** | ✅ Yes | ❌ No (different ports) | Rare - special requirements | + +**Recommendation:** Start with **Option 1** (same server, same PostgreSQL instance, different databases). It's simpler, easier to manage, and sufficient for most deployments. You can move to separate servers later if needed. + +The application supports all three options - just configure via `DATABASE_URL` and `DATABASE_URL_AUTH`. + +--- + +## Key Differences Summary + +| Feature | Main DB | Auth DB | +|---------|---------|---------| +| **Database Type** | SQLite (dev) / PostgreSQL (prod) | PostgreSQL (always) | +| **Purpose** | Photo metadata, faces, people, tags | User accounts, pending operations | +| **Read/Write** | Read-write | Read-write | +| **Foreign Keys** | Full FK support | Cross-database references validated in code | +| **User Table** | Backend users (username, email, role) | Frontend users (email, name, password_hash) | +| **Initialization** | Auto-created on startup | Requires manual setup or frontend script | +| **Required in Production** | Yes | Yes (for frontend website) | + +--- + +## Connection Management + +The application automatically detects which database type to use based on the scheme of `DATABASE_URL`: + +### Main Database Connection + +```python +import os + +def get_database_url() -> str: + """Fetch database URL from environment or defaults.""" + db_url = os.getenv("DATABASE_URL") + if db_url: + return db_url + # Default to SQLite for development + return "sqlite:///data/punimtag.db" +```
+ +**Connection behavior:** +- **SQLite:** Uses `check_same_thread=False` for multi-threading +- **PostgreSQL:** Uses connection pooling (pool_size=10, max_overflow=20) + +### Auth Database Connection + +```python +import os + +def get_auth_database_url() -> str: + """Fetch auth database URL from environment.""" + db_url = os.getenv("DATABASE_URL_AUTH") + if not db_url: + raise ValueError("DATABASE_URL_AUTH environment variable not set") + return db_url +``` + +**Connection behavior:** +- **PostgreSQL only:** Uses connection pooling (pool_size=10, max_overflow=20) +- **Required:** Must be configured for frontend website functionality + +### Connection Pool Settings + +Both databases use connection pooling for PostgreSQL: +- `pool_size=10` - Base connection pool size +- `max_overflow=20` - Maximum overflow connections +- `pool_recycle=3600` - Recycle connections after 1 hour +- `pool_pre_ping=True` - Verify connections before use + +--- + +## Decision Points + +### For Development + +1. **Main Database:** + - ✅ Use SQLite (`sqlite:///data/punimtag.db`) + - Simple, no setup required + - Fast for development + +2. **Auth Database:** + - ✅ Use PostgreSQL (`postgresql+psycopg2://...@localhost:5432/punimtag_auth`) + - Required for frontend website features + - Can be on the same PostgreSQL instance as production + +### For Production + +1. **Database Type:** + - ✅ Main DB: PostgreSQL (switch from SQLite) + - ✅ Auth DB: PostgreSQL (always required) + +2. **Deployment Architecture:** + - **Recommended:** Option 1 (same server, same PostgreSQL instance, different databases) + - **Alternative:** Option 2 (different servers) if you need high availability or security separation + +3. 
**Configuration:** + ```bash + # Main database + DATABASE_URL=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag + + # Auth database + DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth + ``` + +4. **Backup Strategy:** + - Both databases should be backed up + - If on the same server, they can be backed up together + - If on different servers, back them up separately + +5. **Security Considerations:** + - Use strong passwords + - Limit database user permissions + - Use SSL/TLS for remote connections + - Consider separate users for read-only access + +--- + +## Next Steps + +1. **Review this document** and decide on a deployment architecture +2. **Choose database configuration:** + - Development: SQLite for main DB, PostgreSQL for auth DB + - Production: PostgreSQL for both +3. **Decide on deployment:** + - Same server, same PostgreSQL instance (recommended) + - Or separate servers if needed +4. **Set up databases** according to the chosen architecture +5. **Configure `.env` file** with appropriate connection strings +6. **Test connections** and verify both databases work correctly + +--- + +## References + +- Main database models: `backend/db/models.py` +- Database session management: `backend/db/session.py` +- Auth database schema: `viewer-frontend/prisma/schema-auth.prisma` +- Main database schema: `viewer-frontend/prisma/schema.prisma` +- Deployment questions: `docs/CLIENT_DEPLOYMENT_QUESTIONS.md` +- Architecture documentation: `docs/ARCHITECTURE.md` + +--- + +**Note:** This document should be reviewed and updated as deployment decisions are made and the architecture evolves. +
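As a companion to steps 5 and 6 of Next Steps, the configuration rules stated in this document (SQLite default for the main database, PostgreSQL always required for the auth database) could be checked before any connection is attempted. This is a minimal sketch, not the application's code: `check_database_config` is a hypothetical name, and the check that the two URLs differ is an assumption drawn from the document's description of two separate databases.

```python
import os
from urllib.parse import urlsplit

# Default stated in this document for the main database.
DEFAULT_MAIN_URL = "sqlite:///data/punimtag.db"


def check_database_config(env=None):
    """Return a list of problems found; an empty list means the config looks sane.

    Hypothetical pre-flight check; it inspects URLs only and opens no connections.
    """
    env = os.environ if env is None else env
    problems = []

    main_url = env.get("DATABASE_URL") or DEFAULT_MAIN_URL
    auth_url = env.get("DATABASE_URL_AUTH")

    # The dialect is the part of the scheme before any "+driver" suffix.
    main_dialect = urlsplit(main_url).scheme.partition("+")[0]
    if main_dialect not in ("sqlite", "postgresql"):
        problems.append(f"DATABASE_URL uses unexpected dialect {main_dialect!r}")

    if not auth_url:
        problems.append("DATABASE_URL_AUTH is not set")
    else:
        auth_dialect = urlsplit(auth_url).scheme.partition("+")[0]
        if auth_dialect != "postgresql":
            problems.append("DATABASE_URL_AUTH must point at PostgreSQL")
        elif auth_url == main_url:
            # Assumption: main and auth data are meant to live in separate databases.
            problems.append("DATABASE_URL and DATABASE_URL_AUTH are identical")
    return problems


if __name__ == "__main__":
    for problem in check_database_config():
        print("config problem:", problem)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests. Actual connectivity still has to be verified against running databases.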