# Ticket: LLM Capacity Assessment

## Ticket Information

- **ID**: TICKET-018
- **Title**: LLM Capacity Assessment
- **Type**: Research
- **Priority**: High
- **Status**: In Progress
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Determine the maximum context and parameter size:

- Assess 16GB VRAM capacity (13B-24B is comfortable with quantization)
- Determine the maximum context window for the 4080
- Assess 1050 capacity (smaller models, limited context)
- Document memory requirements

## Acceptance Criteria

- [x] VRAM capacity documented for the 4080
- [x] VRAM capacity documented for the 1050
- [x] Maximum context window determined
- [x] Model size limits documented
- [x] Memory requirements recorded in the architecture docs

## Technical Details

The assessment should cover:

- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models only
- Context window: 4K, 8K, 16K, and 32K options
- Batch size and concurrency limits

## Dependencies

- TICKET-017 (model survey)

## Related Files

- `docs/LLM_CAPACITY.md` (to be created)
- `ARCHITECTURE.md`

## Notes

Critical for model selection; should be done early.

## Progress Log

- 2024-01-XX - Capacity assessment document created
- 2024-01-XX - VRAM limits determined:
  - 4080: 24B Q4 fits comfortably (~14GB), max 8K context
  - 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
- 2024-01-XX - Concurrency limits documented (2 concurrent requests for the 4080, 1-2 for the 1050)
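The VRAM limits above can be sanity-checked with a back-of-the-envelope estimator. This is a minimal sketch, not the assessment methodology from the ticket: it assumes the footprint is dominated by quantized weights (`bits_per_weight / 8` bytes per parameter), plus a per-token KV cache (`2 * n_layers * n_kv_heads * head_dim * bytes_per_kv`), plus a flat runtime overhead. The example architecture parameters (48 layers, 8 KV heads, head dim 128) are hypothetical, chosen only to illustrate a ~24B-class model.

```python
def estimate_vram_gb(n_params_b: float, bits_per_weight: float,
                     context_len: int, n_layers: int, n_kv_heads: int,
                     head_dim: int, bytes_per_kv: int = 2,
                     overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate in GB for quantized LLM inference.

    n_params_b      -- parameter count in billions
    bits_per_weight -- effective bits per weight after quantization
    bytes_per_kv    -- bytes per KV-cache element (2 for fp16)
    overhead_gb     -- flat allowance for activations and runtime buffers
    """
    weights_gb = n_params_b * bits_per_weight / 8  # 1e9 params and 1e9 bytes/GB cancel
    kv_gb = (2 * n_layers * n_kv_heads * head_dim
             * bytes_per_kv * context_len) / 1e9
    return weights_gb + kv_gb + overhead_gb


# Hypothetical 24B model at Q4 with an 8K context:
# 12.0 GB weights + ~1.6 GB KV cache + 1.0 GB overhead ≈ 14.6 GB,
# consistent with "~14GB, fits in 16GB" from the progress log.
print(round(estimate_vram_gb(24, 4, 8192, 48, 8, 128), 1))
```

Useful mainly for ruling models out quickly: anything whose weight term alone exceeds the card's VRAM (e.g. 70B at Q4 ≈ 35 GB on a 16GB 4080) cannot fit regardless of context length.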