# Ticket: LLM Capacity Assessment

## Ticket Information

- **ID**: TICKET-018
- **Title**: LLM Capacity Assessment
- **Type**: Research
- **Priority**: High
- **Status**: Backlog
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Determine the maximum context window and model parameter count that each GPU can support:

- Assess the 4080's 16GB VRAM capacity (working assumption: 13B-24B models fit comfortably with quantization)
- Determine the max context window for the 4080
- Assess the 1050's capacity (smaller models, limited context)
- Document memory requirements

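As a starting point for the assessment, weight memory can be roughly estimated as parameter count times bits per weight. The sketch below is illustrative only: the function names, the 4.5 and 5.5 effective bits per weight used for Q4/Q5-style formats, and the 2 GiB headroom reserved for KV cache and runtime overhead are all assumptions, not measured values, and "16GB" and "4GB" are treated as GiB.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float, headroom_gib: float = 2.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM.

    headroom_gib is a placeholder for KV cache, activations, and runtime
    overhead; the real figure has to be measured.
    """
    return weight_memory_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


# Assumed effective bits per weight for Q4/Q5-style quantization.
ASSUMED_BITS = {"Q4": 4.5, "Q5": 5.5}
# VRAM figures from this ticket, treated as GiB.
VRAM_GIB = {"4080": 16, "1050": 4}

if __name__ == "__main__":
    for gpu, vram in VRAM_GIB.items():
        for size in (3, 7, 13, 24):
            for name, bits in ASSUMED_BITS.items():
                gib = weight_memory_gib(size, bits)
                verdict = "fits" if fits_in_vram(size, bits, vram) else "too large"
                print(f"{gpu}: {size}B @ {name} ~ {gib:.1f} GiB weights -> {verdict}")
```

Under these assumptions a 24B model fits on the 4080 at Q4 but not at Q5, which suggests the upper end of the 13B-24B range needs to be verified against real measurements.
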
## Acceptance Criteria

- [ ] VRAM capacity documented for 4080
- [ ] VRAM capacity documented for 1050
- [ ] Max context window determined
- [ ] Model size limits documented
- [ ] Memory requirements in architecture docs

## Technical Details

Assessment should cover:

- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models
- Context window: 4K, 8K, 16K, 32K options
- Batch size and concurrency limits

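Context window and batch size mainly affect memory through the KV cache, which grows linearly with both. The sketch below uses the standard estimate (keys and values, per layer, per KV head, per token) to compare the 4K-32K options. The model shape (40 layers, 40 KV heads, head dimension 128), the 16-bit cache, and the function names are assumptions chosen for illustration; the models surveyed in TICKET-017 may differ, especially if they use grouped-query attention.

```python
from typing import Optional, Sequence


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, batch_size: int = 1,
                 bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for a full context window."""
    # Factor of 2 covers both keys and values.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_len * batch_size * bytes_per_elem)
    return total_bytes / 2**30


def max_context_that_fits(budget_gib: float,
                          options: Sequence[int] = (4096, 8192, 16384, 32768),
                          **shape: int) -> Optional[int]:
    """Largest context option whose KV cache fits the budget, or None."""
    fitting = [c for c in options
               if kv_cache_gib(context_len=c, **shape) <= budget_gib]
    return max(fitting) if fitting else None


# Hypothetical 13B-class model shape, used only for illustration.
SHAPE = dict(n_layers=40, n_kv_heads=40, head_dim=128)

if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        for batch in (1, 2):
            gib = kv_cache_gib(context_len=ctx, batch_size=batch, **SHAPE)
            print(f"context {ctx:>5}, batch {batch}: ~{gib:.2f} GiB KV cache")
```

The budget passed to `max_context_that_fits` would be whatever VRAM remains after weights and runtime overhead, so the result depends directly on the model size and quantization chosen.
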
## Dependencies

- TICKET-017 (model survey)

## Related Files

- `docs/LLM_CAPACITY.md` (to be created)
- `ARCHITECTURE.md`

## Notes

This assessment is critical for model selection and should be done early.