- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
# Ticket: LLM Capacity Assessment
## Ticket Information

- **ID**: TICKET-018
- **Title**: LLM Capacity Assessment
- **Type**: Research
- **Priority**: High
- **Status**: In Progress
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX
## Description

Determine maximum context and parameter size:
- Assess 16GB VRAM capacity (13B-24B comfortable with quantization)
- Determine max context window for 4080
- Assess 1050 capacity (smaller models, limited context)
- Document memory requirements
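
As a rough aid for the memory-requirements item above, the sketch below estimates how much VRAM quantized weights occupy. It is a back-of-the-envelope calculation, not the method used in `docs/LLM_CAPACITY.md`. The function names, the 4.5 bits per weight figure (4-bit formats typically store per-block scales alongside the weights), and the 2 GiB reserve for KV cache and runtime overhead are all assumptions made for illustration.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory taken by quantized weights, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billions: float, bits_per_weight: float,
         vram_gib: float, reserve_gib: float = 2.0) -> bool:
    """True if weights plus an assumed reserve fit in the given VRAM.

    reserve_gib is a placeholder for KV cache, activations and runtime
    overhead; it is an illustrative assumption, not a measured value.
    """
    return weight_memory_gib(params_billions, bits_per_weight) + reserve_gib <= vram_gib


if __name__ == "__main__":
    # Model sizes taken from the ticket text; 4.5 bits/weight is assumed.
    for size in (3.8, 13, 24):
        gib = weight_memory_gib(size, 4.5)
        print(f"{size}B at ~4.5 bits/weight: about {gib:.1f} GiB of weights")
```

Under these assumptions, the estimate only covers weights; the context window adds KV-cache memory on top, which is why the reserve is kept as an explicit parameter.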
## Acceptance Criteria

- [x] VRAM capacity documented for 4080
- [x] VRAM capacity documented for 1050
- [x] Max context window determined
- [x] Model size limits documented
- [x] Memory requirements in architecture docs
## Technical Details

Assessment should cover:
- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models
- Context window: 4K, 8K, 16K, 32K options
- Batch size and concurrency limits
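
To compare the context-window options listed above, the following sketch estimates KV-cache size per request using the standard formula: keys and values are stored for every layer, KV head, and token. The model shape in `HYPOTHETICAL_MODEL` (40 layers, 8 KV heads, head dimension 128) and the 2-byte (FP16) cache entries are assumptions chosen for illustration; they do not describe any specific model from the survey.

```python
def kv_cache_gib(context_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size for one sequence, in GiB."""
    # The leading 2 accounts for one key tensor and one value tensor per layer.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 2**30


# Hypothetical model shape, used only to make the numbers concrete.
HYPOTHETICAL_MODEL = {"n_layers": 40, "n_kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        size = kv_cache_gib(ctx, **HYPOTHETICAL_MODEL)
        print(f"{ctx:>6} tokens: {size:.2f} GiB of KV cache")
```

The cache grows linearly with context length, so each doubling of the window doubles this cost per concurrent request.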
## Dependencies

- TICKET-017 (model survey)
## Related Files

- `docs/LLM_CAPACITY.md` (to be created)
- `ARCHITECTURE.md`
## Notes

Critical for model selection. Should be done early.
## Progress Log

- 2024-01-XX - Capacity assessment document created
- 2024-01-XX - VRAM limits determined:
  - 4080: 70B Q4 fits comfortably (~14GB), max 8K context
  - 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
- 2024-01-XX - Concurrency limits documented (2 requests for 4080, 1-2 for 1050)
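
One way a concurrency limit can be derived from a VRAM budget is sketched below: subtract the weights and a fixed overhead, then divide what remains by the KV cache needed per request. This is an illustrative calculation, not necessarily how the limits in the log were obtained. The function name, the 1 GiB default overhead, and the example figures are assumptions.

```python
import math


def max_concurrent_requests(vram_gib: float, weights_gib: float,
                            kv_per_request_gib: float,
                            overhead_gib: float = 1.0) -> int:
    """Number of full-context requests that fit in the remaining VRAM."""
    if kv_per_request_gib <= 0:
        raise ValueError("kv_per_request_gib must be positive")
    free_gib = vram_gib - weights_gib - overhead_gib
    if free_gib <= 0:
        return 0  # weights and overhead already exhaust the budget
    return math.floor(free_gib / kv_per_request_gib)


if __name__ == "__main__":
    # Illustrative inputs only: 16 GiB card, 12.5 GiB of weights,
    # 1.25 GiB of KV cache per request.
    print(max_concurrent_requests(16, 12.5, 1.25))
```

Shrinking the context window lowers the per-request KV cost, so the same card can trade context length for concurrency.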