feat: Add comprehensive database architecture review document

This commit introduces a new document, `DATABASE_ARCHITECTURE_REVIEW.md`, providing a detailed overview of the main and auth database architectures, configurations, and production deployment options. It includes sections on database schemas, connection management, and deployment strategies, enhancing documentation for better understanding and future reference.
Tanya 2026-01-05 15:22:09 -05:00
parent fe01ff51b8
commit b104dcba71


@ -0,0 +1,465 @@
# Database Architecture Review
**Created:** 2024
**Purpose:** Comprehensive review of main database and auth database architecture, configuration, and production deployment options.
---
## Table of Contents
1. [Main Database (punimtag)](#main-database-punimtag)
2. [Auth Database (punimtag_auth)](#auth-database-punimtag_auth)
3. [SQLite vs PostgreSQL Configuration](#sqlite-vs-postgresql-configuration)
4. [Production Deployment Architecture](#production-deployment-architecture)
5. [Key Differences Summary](#key-differences-summary)
6. [Connection Management](#connection-management)
7. [Decision Points](#decision-points)
---
## Main Database (punimtag)
### Configuration
- **Environment Variable:** `DATABASE_URL`
- **Default:** SQLite (`sqlite:///data/punimtag.db`)
- **Production:** PostgreSQL (`postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag`)
- **Connection Pool (PostgreSQL):** pool_size=10, max_overflow=20, pool_recycle=3600
### Purpose
Stores all photo metadata, face detections, people records, tags, and backend user accounts.
### Schema Overview
The main database contains the core application data with the following tables:
#### Core Tables
1. **`photos`** - Photo metadata
- `id`, `path` (unique), `filename`, `date_added`, `date_taken`, `processed`, `file_hash`, `media_type`
- Relationships: faces, photo_tags, favorites, video_people
2. **`people`** - Person records
- `id`, `first_name`, `last_name`, `middle_name`, `maiden_name`, `date_of_birth`, `created_date`
- Unique constraint on (first_name, last_name, middle_name, maiden_name, date_of_birth)
- Relationships: faces, person_encodings, video_photos
3. **`faces`** - Face detection data
- `id`, `photo_id` (FK), `person_id` (FK), `encoding` (BLOB), `location`, `confidence`, `quality_score`
- Additional fields: `detector_backend`, `model_name`, `face_confidence`, `exif_orientation`, `pose_mode`
- Angles: `yaw_angle`, `pitch_angle`, `roll_angle`
- `landmarks` (JSON string), `identified_by_user_id` (FK), `excluded` (boolean)
- Relationships: photo, person, person_encodings
4. **`person_encodings`** - Person face encodings for matching
- `id`, `person_id` (FK), `face_id` (FK), `encoding` (BLOB), `quality_score`
- `detector_backend`, `model_name`, `created_date`
- Relationships: person, face
5. **`tags`** - Tag definitions
- `id`, `tag_name` (unique), `created_date`
- Relationships: photo_tags
6. **`phototaglinkage`** - Photo-tag relationships
- `linkage_id` (PK), `photo_id` (FK), `tag_id` (FK), `linkage_type` (0=single, 1=bulk), `created_date`
- Unique constraint on (photo_id, tag_id)
- Relationships: photo, tag
7. **`users`** (Main Database) - Backend user accounts
- `id`, `username` (unique), `password_hash`, `email` (unique), `full_name`
- `is_active`, `is_admin`, `role`, `password_change_required`
- `created_date`, `last_login`
- Relationships: faces (identified_by_user_id), photo_person_linkage
8. **`photo_favorites`** - User-specific favorites
- `id`, `username`, `photo_id` (FK), `created_date`
- Unique constraint on (username, photo_id)
- Relationships: photo
9. **`photo_person_linkage`** - Direct photo-person associations (for videos)
- `id`, `photo_id` (FK), `person_id` (FK), `identified_by_user_id` (FK), `created_date`
- Unique constraint on (photo_id, person_id)
- Relationships: photo, person, user
10. **`role_permissions`** - Role-to-feature permission matrix
- `id`, `role`, `feature_key`, `allowed`
- Unique constraint on (role, feature_key)
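
The unique constraints listed above are what stop duplicate rows such as the same tag being linked to the same photo twice. The following is a minimal sketch of that behavior for `phototaglinkage`, using the standard-library `sqlite3` module. The column types and the `tag_photo` helper are assumptions for illustration and are not taken from `backend/db/models.py`.

```python
import sqlite3

# Illustrative DDL only: column types are assumptions, not copied from the models.
SCHEMA = """
CREATE TABLE photos (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE);
CREATE TABLE phototaglinkage (
    linkage_id   INTEGER PRIMARY KEY,
    photo_id     INTEGER NOT NULL REFERENCES photos(id),
    tag_id       INTEGER NOT NULL REFERENCES tags(id),
    linkage_type INTEGER NOT NULL DEFAULT 0,  -- 0=single, 1=bulk
    UNIQUE (photo_id, tag_id)
);
"""


def tag_photo(conn, photo_id, tag_id):
    """Link a tag to a photo. Return False when the insert violates a
    constraint, e.g. a duplicate (photo_id, tag_id) pair."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO phototaglinkage (photo_id, tag_id) VALUES (?, ?)",
                (photo_id, tag_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO photos (id, path) VALUES (1, '/photos/a.jpg')")
conn.execute("INSERT INTO tags (id, tag_name) VALUES (1, 'family')")
conn.commit()
```

Calling `tag_photo(conn, 1, 1)` twice inserts one row: the second call is rejected by the `UNIQUE (photo_id, tag_id)` constraint and returns `False`.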
---
## Auth Database (punimtag_auth)
### Configuration
- **Environment Variable:** `DATABASE_URL_AUTH`
- **Database Type:** PostgreSQL (always required)
- **Connection String:** `postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag_auth`
- **Connection Pool:** pool_size=10, max_overflow=20, pool_recycle=3600
### Purpose
Separate database for frontend website users and pending operations, so the frontend website can treat the main database as read-only.
### Schema Overview
#### Tables
1. **`users`** (Auth Database) - Frontend website user accounts
- `id`, `email` (unique), `name`, `password_hash`
- `is_admin`, `has_write_access`, `email_verified`
- `email_confirmation_token`, `email_confirmation_token_expiry`
- `password_reset_token`, `password_reset_token_expiry`
- `is_active`, `created_at`, `updated_at`
- Relationships: pending_identifications, pending_photos, inappropriate_photo_reports, pending_linkages, photo_favorites
2. **`pending_identifications`** - Face identifications pending approval
- `id`, `face_id` (references main DB, no FK), `user_id` (FK), `first_name`, `last_name`
- `middle_name`, `maiden_name`, `date_of_birth`, `status` (pending/approved/rejected)
- `created_at`, `updated_at`
- Relationships: user
- **Note:** `face_id` references faces in main database, validated in application code (no cross-database FK)
3. **`pending_photos`** - Photos pending moderation
- `id`, `user_id` (FK), `filename`, `original_filename`, `file_path`, `file_size`, `mime_type`
- `status` (pending/approved/rejected), `submitted_at`, `reviewed_at`, `reviewed_by`, `rejection_reason`
- Relationships: user
4. **`inappropriate_photo_reports`** - Reported photos for review
- `id`, `photo_id` (references main DB, no FK), `user_id` (FK), `status` (pending/reviewed/dismissed)
- `reported_at`, `reviewed_at`, `reviewed_by`, `review_notes`, `report_comment`
- Unique constraint on (photo_id, user_id) - prevents duplicate reports
- Relationships: user
- **Note:** `photo_id` references photos in main database, validated in application code
5. **`pending_linkages`** - Tag submissions pending approval
- `id`, `photo_id` (references main DB), `tag_id`, `tag_name`, `user_id` (FK)
- `status` (pending/approved/rejected), `notes`, `created_at`, `updated_at`
- Relationships: user
6. **`photo_favorites`** (Auth Database) - User favorites
- `id`, `photo_id` (references main DB, no FK), `user_id` (FK), `favorited_at`
- Unique constraint on (photo_id, user_id)
- Relationships: user
- **Note:** `photo_id` references photos in main database, validated in application code
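
Because `face_id` and `photo_id` point into a different database, no foreign key can enforce them, and the check has to be an explicit query. The sketch below shows one way such application-level validation could look. The `submit_identification` helper is hypothetical, and two in-memory SQLite databases stand in for `punimtag` and `punimtag_auth` (the real auth database is PostgreSQL).

```python
import sqlite3


def submit_identification(main_conn, auth_conn, face_id, user_id, first_name, last_name):
    """Store a pending identification after checking face_id against the main DB."""
    # No cross-database foreign key exists, so the check is an explicit query.
    face = main_conn.execute("SELECT 1 FROM faces WHERE id = ?", (face_id,)).fetchone()
    if face is None:
        raise ValueError(f"face_id {face_id} not found in main database")
    with auth_conn:  # commits on success, rolls back on error
        cursor = auth_conn.execute(
            "INSERT INTO pending_identifications"
            " (face_id, user_id, first_name, last_name, status)"
            " VALUES (?, ?, ?, ?, 'pending')",
            (face_id, user_id, first_name, last_name),
        )
    return cursor.lastrowid


# Stand-ins for the two databases; table definitions are trimmed to the
# columns this sketch needs.
main_conn = sqlite3.connect(":memory:")
main_conn.execute("CREATE TABLE faces (id INTEGER PRIMARY KEY, photo_id INTEGER)")
main_conn.execute("INSERT INTO faces (id, photo_id) VALUES (7, 1)")

auth_conn = sqlite3.connect(":memory:")
auth_conn.execute(
    "CREATE TABLE pending_identifications ("
    " id INTEGER PRIMARY KEY, face_id INTEGER NOT NULL, user_id INTEGER NOT NULL,"
    " first_name TEXT, last_name TEXT, status TEXT NOT NULL)"
)
```

A check like this is not atomic across the two databases: a face could be deleted from the main database after the lookup, so code that later approves a pending row should be prepared to find the reference missing.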
---
## SQLite vs PostgreSQL Configuration
### SQLite (Development Default)
**Connection String:**
```
sqlite:///data/punimtag.db
```
**What it means:**
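- `sqlite:///` - SQLAlchemy URL scheme for SQLite; three slashes mean the path that follows is relative (an absolute path needs four: `sqlite:////abs/path.db`)
- `data/punimtag.db` - File path relative to the working directory (single file)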
**Characteristics:**
- ✅ File-based database (single `.db` file)
- ✅ No separate server process needed
- ✅ Zero configuration
- ✅ Perfect for development and small deployments
- ❌ Limited concurrency (file-level locking)
- ❌ No network access (local only)
- ❌ Lower performance under heavy load
**When to use:**
- Local development
- Testing
- Small deployments
- Single-user scenarios
### PostgreSQL (Production)
**Connection String:**
```
postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag
```
**What it means:**
- `postgresql+psycopg2://` - PostgreSQL with psycopg2 driver
- `punimtag` - Username
- `punimtag_password` - Password
- `localhost:5432` - Host and port
- `punimtag` - Database name
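
The breakdown above can be reproduced with the standard library, which is handy for logging a connection target without printing the password. This is a small sketch. The `describe_database_url` helper is hypothetical and not part of the application.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style database URL into its named parts."""
    parts = urlsplit(url)
    # The scheme is "dialect+driver"; the driver part is optional.
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "password_set": parts.password is not None,  # never echo the password itself
        "host": parts.hostname,
        "port": parts.port,
        # Drop the single slash that separates the host from the database name.
        "database": parts.path[1:] or None,
    }


info = describe_database_url(
    "postgresql+psycopg2://punimtag:punimtag_password@localhost:5432/punimtag"
)
```

In application code that already depends on SQLAlchemy, `sqlalchemy.engine.make_url` performs the same parsing and is the more robust choice.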
**Characteristics:**
- ✅ Server-based database
- ❌ Requires a running PostgreSQL server (installation and administration overhead)
- ✅ Excellent concurrency (row-level locking)
- ✅ Network access
- ✅ High performance under load
- ✅ ACID compliance
- ✅ Advanced features (full-text search, JSON, etc.)
- ✅ Best for multi-user production
**When to use:**
- Production environments
- Multi-user access
- High concurrency requirements
- Large datasets
- Network access needed
### Key Differences
| Feature | SQLite | PostgreSQL |
|---------|--------|------------|
| **Setup** | No setup needed | Requires server installation |
| **File vs Server** | Single file | Server process |
| **Concurrency** | Limited (file locking) | Excellent (row-level locking) |
| **Network Access** | No | Yes |
| **Performance** | Good for small/medium | Excellent for large scale |
| **Configuration** | Zero config | Requires setup |
| **Best For** | Development, small apps | Production, multi-user |
### Migration Path
You can switch from SQLite to PostgreSQL by:
1. Installing PostgreSQL
2. Creating the database
3. Updating `DATABASE_URL` in `.env`
4. Running migration scripts (if needed)
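The application code remains the same; SQLAlchemy handles the differences between database types. Note that changing `DATABASE_URL` does not copy existing rows from the SQLite file; data that must be kept has to be transferred separately.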
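
A minimal sketch of a sanity check for such a switch: record per-table row counts from the SQLite file so they can be compared with the PostgreSQL database afterwards. The `table_row_counts` helper is hypothetical, and the in-memory database below only stands in for `data/punimtag.db`.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master"
            " WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            " ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as parameters, so quote them explicitly.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Stand-in for sqlite3.connect("data/punimtag.db")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_name TEXT)")
conn.executemany("INSERT INTO photos (path) VALUES (?)", [("a.jpg",), ("b.jpg",)])
conn.execute("INSERT INTO tags (tag_name) VALUES ('family')")
counts = table_row_counts(conn)
```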
---
## Production Deployment Architecture
### Question: Will main and auth databases sit on the same server in production?
**Answer: Yes, they can be on the same server. There are multiple deployment options:**
### Option 1: Same PostgreSQL Server, Different Databases (Most Common) ⭐
Both databases run on the same PostgreSQL instance as separate databases:
```
PostgreSQL Server (localhost:5432)
├── Database: punimtag (main database)
└── Database: punimtag_auth (auth database)
```
**Connection strings:**
```bash
# Main database
DATABASE_URL=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag
# Auth database (same server, different database)
DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth
```
**Advantages:**
- ✅ Simple setup and management
- ✅ Shared resources (memory, connections)
- ✅ Single backup target
- ✅ Lower overhead
**When to use:**
- Standard production deployments
- Small to medium scale
- Single-server setups
**Setup:**
```bash
# Create both databases on same PostgreSQL instance
sudo -u postgres psql -c "CREATE DATABASE punimtag OWNER punimtag;"
sudo -u postgres psql -c "CREATE DATABASE punimtag_auth OWNER punimtag;"
```
### Option 2: Different Servers (For Separation/High Availability)
Each database runs on its own PostgreSQL server:
```
Server 1: PostgreSQL (db-server-1:5432)
└── Database: punimtag
Server 2: PostgreSQL (db-server-2:5432)
└── Database: punimtag_auth
```
**Connection strings:**
```bash
# Main database
DATABASE_URL=postgresql+psycopg2://punimtag:password@db-server-1:5432/punimtag
# Auth database (different server)
DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@db-server-2:5432/punimtag_auth
```
**Advantages:**
- ✅ Complete isolation and security
- ✅ Independent scaling
- ✅ Better fault tolerance
- ✅ Compliance/regulatory separation
**When to use:**
- High availability requirements
- Security/compliance needs
- Large scale deployments
- Multi-region setups
### Option 3: Same Server, Different PostgreSQL Instances (Rare)
Both instances on the same machine:
```
Server
├── PostgreSQL Instance 1 (port 5432)
│   └── Database: punimtag
└── PostgreSQL Instance 2 (port 5433)
    └── Database: punimtag_auth
```
**When to use:**
- Special requirements
- Resource isolation needs
- Testing scenarios
### Deployment Options Summary
| Setup | Same Server? | Same PostgreSQL Instance? | Use Case |
|-------|--------------|---------------------------|----------|
| **Option 1** | ✅ Yes | ✅ Yes (different databases) | **Most common** - Standard production |
| **Option 2** | ❌ No | ❌ No | High availability, security separation |
| **Option 3** | ✅ Yes | ❌ No (different ports) | Rare - special requirements |
**Recommendation:** Start with **Option 1** (same server, same PostgreSQL instance, different databases). It's simpler, easier to manage, and sufficient for most deployments. You can move to separate servers later if needed.
The application supports all three options - just configure via `DATABASE_URL` and `DATABASE_URL_AUTH`.
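
Since the option in use is fully determined by the two connection strings, it can be derived from them. The sketch below does that with a hypothetical `deployment_option` helper. It assumes both URLs are PostgreSQL URLs and compares host names as text, so `localhost` and `127.0.0.1` would count as different servers.

```python
from urllib.parse import urlsplit

DEFAULT_POSTGRES_PORT = 5432


def deployment_option(main_url: str, auth_url: str) -> str:
    """Classify a DATABASE_URL / DATABASE_URL_AUTH pair into Option 1, 2 or 3."""
    main, auth = urlsplit(main_url), urlsplit(auth_url)
    if main.hostname != auth.hostname:
        return "Option 2: different servers"
    main_port = main.port or DEFAULT_POSTGRES_PORT
    auth_port = auth.port or DEFAULT_POSTGRES_PORT
    if main_port != auth_port:
        return "Option 3: same server, different PostgreSQL instances"
    return "Option 1: same PostgreSQL instance, different databases"


option = deployment_option(
    "postgresql+psycopg2://punimtag:password@localhost:5432/punimtag",
    "postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth",
)
```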
---
## Key Differences Summary
| Feature | Main DB | Auth DB |
|---------|---------|---------|
| **Database Type** | SQLite (dev) / PostgreSQL (prod) | PostgreSQL (always) |
| **Purpose** | Photo metadata, faces, people, tags | User accounts, pending operations |
| **Read/Write** | Read-write | Read-write |
| **Foreign Keys** | Full FK support | Cross-database references validated in code |
| **User Table** | Backend users (username, email, role) | Frontend users (email, name, password_hash) |
| **Initialization** | Auto-created on startup | Requires manual setup or frontend script |
| **Required in Production** | Yes | Yes (for frontend website) |
---
## Connection Management
The application selects the database type from the URL scheme in `DATABASE_URL` (`sqlite://` or `postgresql+psycopg2://`):
### Main Database Connection
```python
import os


def get_database_url() -> str:
    """Fetch database URL from environment or defaults."""
    db_url = os.getenv("DATABASE_URL")
    if db_url:
        return db_url
    # Default to SQLite for development
    return "sqlite:///data/punimtag.db"
```
**Connection behavior:**
- **SQLite:** Uses `check_same_thread=False` for multi-threading
- **PostgreSQL:** Uses connection pooling (pool_size=10, max_overflow=20)
### Auth Database Connection
```python
import os


def get_auth_database_url() -> str:
    """Fetch auth database URL from environment."""
    db_url = os.getenv("DATABASE_URL_AUTH")
    if not db_url:
        # No default here: the auth database always requires PostgreSQL
        raise ValueError("DATABASE_URL_AUTH environment variable not set")
    return db_url
```
**Connection behavior:**
- **PostgreSQL only:** Uses connection pooling (pool_size=10, max_overflow=20)
- **Required:** Must be configured for frontend website functionality
### Connection Pool Settings
Both databases use connection pooling for PostgreSQL:
- `pool_size=10` - Base connection pool size
- `max_overflow=20` - Extra connections allowed beyond `pool_size` under load (up to 30 connections per engine in total)
- `pool_recycle=3600` - Recycle connections after 1 hour
- `pool_pre_ping=True` - Verify connections before use
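
The settings above map directly onto keyword arguments of SQLAlchemy's `create_engine()`. The following is a sketch of how the per-dialect choice could be expressed. The `engine_kwargs` helper is hypothetical and may not match what `backend/db/session.py` actually does. It returns a plain dictionary, so the block itself runs without SQLAlchemy installed.

```python
def engine_kwargs(database_url: str) -> dict:
    """Return keyword arguments for sqlalchemy.create_engine() based on the URL."""
    if database_url.startswith("sqlite"):
        # SQLite connections are bound to their creating thread by default;
        # this relaxes the check so a connection can be used across threads.
        return {"connect_args": {"check_same_thread": False}}
    return {
        "pool_size": 10,        # connections kept open in the pool
        "max_overflow": 20,     # extra connections allowed under load
        "pool_recycle": 3600,   # replace connections older than one hour
        "pool_pre_ping": True,  # test each connection before handing it out
    }


sqlite_kwargs = engine_kwargs("sqlite:///data/punimtag.db")
postgres_kwargs = engine_kwargs(
    "postgresql+psycopg2://punimtag:password@localhost:5432/punimtag"
)
# With SQLAlchemy installed, the engine would be created like this:
#   from sqlalchemy import create_engine
#   engine = create_engine(url, **engine_kwargs(url))
```

With these values each PostgreSQL engine can open up to 30 connections, so a process holding both the main and auth engines can use up to 60. That is worth comparing with the server's `max_connections` setting (100 by default in PostgreSQL) when several application processes share one instance.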
---
## Decision Points
### For Development
1. **Main Database:**
- ✅ Use SQLite (`sqlite:///data/punimtag.db`)
- Simple, no setup required
- Fast for development
2. **Auth Database:**
- ✅ Use PostgreSQL (`postgresql+psycopg2://...@localhost:5432/punimtag_auth`)
- Required for frontend website features
- Can run on a local PostgreSQL instance configured the same way as production
### For Production
1. **Database Type:**
- ✅ Main DB: PostgreSQL (switch from SQLite)
- ✅ Auth DB: PostgreSQL (always required)
2. **Deployment Architecture:**
- **Recommended:** Option 1 (same server, same PostgreSQL instance, different databases)
- **Alternative:** Option 2 (different servers) if you need high availability or security separation
3. **Configuration:**
```bash
# Main database
DATABASE_URL=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag
# Auth database
DATABASE_URL_AUTH=postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth
```
4. **Backup Strategy:**
- Both databases should be backed up
- If they are on the same server, they can be backed up together
- If they are on different servers, back them up separately
5. **Security Considerations:**
- Use strong passwords
- Limit database user permissions
- Use SSL/TLS for remote connections
- Consider separate users for read-only access
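
For the backup strategy in item 4, the same connection strings can drive `pg_dump`. The sketch below only builds the command line and does not run it. The `pg_dump_command` helper and the output file name are hypothetical. It deliberately drops the password from the URL, on the assumption that `pg_dump` will read it from `PGPASSWORD` or `~/.pgpass` instead of the process list.

```python
from urllib.parse import urlsplit, urlunsplit


def pg_dump_command(sqlalchemy_url: str, output_path: str) -> list:
    """Build a pg_dump argument list from a SQLAlchemy PostgreSQL URL."""
    parts = urlsplit(sqlalchemy_url)
    # pg_dump expects a plain libpq URL, without the "+psycopg2" driver suffix.
    if parts.scheme.split("+", 1)[0] != "postgresql":
        raise ValueError("pg_dump only applies to PostgreSQL URLs")
    netloc = parts.hostname or "localhost"
    if parts.port:
        netloc += f":{parts.port}"
    if parts.username:
        netloc = f"{parts.username}@{netloc}"  # password intentionally omitted
    libpq_url = urlunsplit(("postgresql", netloc, parts.path, "", ""))
    return [
        "pg_dump",
        f"--dbname={libpq_url}",
        "--format=custom",  # compressed archive, restorable with pg_restore
        f"--file={output_path}",
    ]


command = pg_dump_command(
    "postgresql+psycopg2://punimtag:password@localhost:5432/punimtag_auth",
    "punimtag_auth.dump",
)
```

Running the function once per URL covers both databases whether they share a server or not; the resulting list can be passed to `subprocess.run(command, check=True)`.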
---
## Next Steps
1. **Review this document** and decide on deployment architecture
2. **Choose database configuration:**
- Development: SQLite for main DB, PostgreSQL for auth DB
- Production: PostgreSQL for both
3. **Decide on deployment:**
- Same server, same PostgreSQL instance (recommended)
- Or separate servers if needed
4. **Set up databases** according to chosen architecture
5. **Configure `.env` file** with appropriate connection strings
6. **Test connections** and verify both databases work correctly
---
## References
- Main database models: `backend/db/models.py`
- Database session management: `backend/db/session.py`
- Auth database schema: `viewer-frontend/prisma/schema-auth.prisma`
- Main database schema: `viewer-frontend/prisma/schema.prisma`
- Deployment questions: `docs/CLIENT_DEPLOYMENT_QUESTIONS.md`
- Architecture documentation: `docs/ARCHITECTURE.md`
---
**Note:** This document should be reviewed and updated as deployment decisions are made and architecture evolves.