- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
Ticket: LLM Capacity Assessment
Ticket Information
- ID: TICKET-018
- Title: LLM Capacity Assessment
- Type: Research
- Priority: High
- Status: In Progress
- Track: LLM Infra
- Milestone: Milestone 1 - Survey & Architecture
- Created: 2024-01-XX
Description
Determine maximum context and parameter size:
- Assess 16GB VRAM capacity (13B-24B comfortable with quantization)
- Determine max context window for 4080
- Assess 1050 capacity (smaller models, limited context)
- Document memory requirements
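As a starting point for the memory requirements above, the size of the weights alone can be approximated as parameter count times bits per weight, divided by 8. The sketch below is a back-of-the-envelope estimate, not a measurement: the function name is hypothetical, it uses decimal GB, and it ignores the KV cache, activations, and runtime overhead. Real quantization formats also tend to use slightly more than their nominal bit width per weight, so actual files are usually somewhat larger.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration: weights only, no KV cache,
    activations, or framework overhead.
    """
    # (params_billions * 1e9 weights) * (bits / 8 bytes per weight) / 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Model sizes taken from the ranges discussed in this ticket.
    for label, params in [("3.8B", 3.8), ("13B", 13), ("24B", 24), ("70B", 70)]:
        print(f"{label} at 4 bits: ~{weight_memory_gb(params, 4):.1f} GB of weights")
```

By this estimate a 24B model at 4 bits needs about 12 GB for weights, which is in line with the 13B-24B range for 16GB, while a 70B model at 4 bits needs about 35 GB for weights alone.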
Acceptance Criteria
- VRAM capacity documented for 4080
- VRAM capacity documented for 1050
- Max context window determined
- Model size limits documented
- Memory requirements in architecture docs
Technical Details
Assessment should cover:
- 4080: 16GB VRAM, Q4/Q5 quantization
- 1050: 4GB VRAM, very small models
- Context window: 4K, 8K, 16K, 32K options
- Batch size and concurrency limits
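To compare the context window options listed above, it can help to estimate how much VRAM the KV cache takes at each length. The sketch below uses the standard per-token cost of a transformer KV cache (one key and one value vector per layer per KV head). The model shape in `EXAMPLE_SHAPE` is a hypothetical placeholder, and a 16-bit cache is assumed; the real values should come from the chosen model's configuration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit cache (an assumption, not a given).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Hypothetical model shape, for illustration only.
EXAMPLE_SHAPE = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        gib = kv_cache_bytes(**EXAMPLE_SHAPE, n_tokens=ctx) / 2**30
        print(f"{ctx:>6} tokens: {gib:.2f} GiB of KV cache")
```

The cache grows linearly with context length, so doubling the window doubles this cost for every concurrent request.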
Dependencies
- TICKET-017 (model survey)
Related Files
- docs/LLM_CAPACITY.md (to be created)
- ARCHITECTURE.md
Notes
Critical for model selection. Should be done early.
Progress Log
- 2024-01-XX - Capacity assessment document created
- 2024-01-XX - VRAM limits determined:
  - 4080: 70B Q4 fits comfortably (~14GB), max 8K context
  - 1050: 3.8B Q4 fits comfortably (~2.5GB), max 8K context
- 2024-01-XX - Concurrency limits documented (2 requests for 4080, 1-2 for 1050)
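
One way to sanity-check concurrency limits like these is to divide the VRAM left after loading the weights by the per-request KV cache size. The sketch below is a simplified model under stated assumptions: the function name and the fixed `overhead_gb` reserve are hypothetical, every request is assumed to use its full context allocation, and the numbers in the usage example are placeholders rather than measurements from this assessment.

```python
import math


def max_concurrent_requests(vram_gb: float, weights_gb: float,
                            kv_per_request_gb: float,
                            overhead_gb: float = 0.5) -> int:
    """Number of requests whose KV caches fit in the VRAM left after weights.

    overhead_gb is an assumed fixed reserve for activations and the runtime.
    """
    if kv_per_request_gb <= 0:
        raise ValueError("kv_per_request_gb must be positive")
    free_gb = vram_gb - weights_gb - overhead_gb
    if free_gb <= 0:
        return 0  # the weights alone already fill the card
    return math.floor(free_gb / kv_per_request_gb)


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    print(max_concurrent_requests(vram_gb=16, weights_gb=14, kv_per_request_gb=0.75))
    print(max_concurrent_requests(vram_gb=4, weights_gb=2.5, kv_per_request_gb=1.0))
```

Measured limits can differ from this estimate, since inference servers vary in how they allocate and share cache memory.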