atlas/tickets/done/TICKET-018_llm-capacity-assessment.md

# Ticket: LLM Capacity Assessment

## Ticket Information

- **ID**: TICKET-018
- **Title**: LLM Capacity Assessment
- **Type**: Research
- **Priority**: High
- **Status**: Done
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Determine the maximum context window and model parameter size each GPU can support:
- Assess the 4080's 16GB VRAM capacity (13B-24B models fit comfortably with quantization)
- Determine the maximum context window for the 4080
- Assess the 1050's capacity (smaller models, limited context)
- Document memory requirements

## Acceptance Criteria

- [x] VRAM capacity documented for 4080
- [x] VRAM capacity documented for 1050
- [x] Max context window determined
- [x] Model size limits documented
- [x] Memory requirements in architecture docs

## Technical Details

Assessment should cover:
- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models
- Context window: 4K, 8K, 16K, 32K options
- Batch size and concurrency limits
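
The memory arithmetic behind these items can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: the function names, the 4.5 bits-per-weight figure for Q4, the FP16 KV cache, the 1 GiB runtime overhead, and the 13B-class architecture (40 layers, 40 KV heads, head dimension 128) are all assumptions chosen for illustration. Real usage depends on the runtime and the actual model.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, n_sequences: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes (FP16 cache by default)."""
    # Factor of 2: each layer stores one key and one value vector per token.
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * n_sequences * bytes_per_elem)


def estimate_total_gib(n_params: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       context_len: int, n_sequences: int = 1,
                       overhead_gib: float = 1.0) -> float:
    """Weights + KV cache + a flat overhead allowance (assumed), in GiB."""
    total = (weight_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim,
                              context_len, n_sequences))
    return total / GIB + overhead_gib


# Hypothetical 13B-class model at ~4.5 bits per weight (assumed Q4 average).
for ctx in (4096, 8192, 16384, 32768):
    total = estimate_total_gib(13e9, 4.5, 40, 40, 128, ctx)
    print(f"{ctx:>6} tokens: ~{total:.1f} GiB, fits in 16 GiB: {total <= 16}")
```

Models that use grouped-query attention have fewer KV heads than this example, which shrinks the cache term considerably, so the same sketch should be rerun with each candidate model's real layer and head counts.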

## Dependencies

- TICKET-017 (model survey)

## Related Files

- `docs/LLM_CAPACITY.md`
- `ARCHITECTURE.md`

## Notes

Critical for model selection. Should be done early.

## Progress Log

- 2024-01-XX - Capacity assessment document created
- 2024-01-XX - VRAM limits determined:
  - 4080: 24B Q4 fits comfortably (~14GB), max 8K context
  - 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
- 2024-01-XX - Concurrency limits documented (2 requests for 4080, 1-2 for 1050)
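
A concurrency limit of this kind can be reasoned about as the VRAM left after loading the weights, divided by the KV cache each request needs. The sketch below is a simplified illustration only: `max_concurrent_requests` is a hypothetical helper, the 1 GiB overhead default is an assumption, and the example inputs are made-up round numbers rather than the measured figures behind the limits above.

```python
def max_concurrent_requests(vram_gib: float, weights_gib: float,
                            kv_per_request_gib: float,
                            overhead_gib: float = 1.0) -> int:
    """Rough upper bound on simultaneous requests that fit in VRAM."""
    if kv_per_request_gib <= 0:
        raise ValueError("kv_per_request_gib must be positive")
    # VRAM remaining after the weights and a flat runtime allowance (assumed).
    free_gib = vram_gib - weights_gib - overhead_gib
    if free_gib <= 0:
        return 0
    return int(free_gib // kv_per_request_gib)


# Illustrative inputs only: 16 GiB card, 10 GiB of weights, 2 GiB of KV cache per request.
print(max_concurrent_requests(16, 10, 2))
```

Because the bound scales inversely with per-request cache size, halving the context window roughly doubles the number of requests that fit.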