- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.

# Ticket: Select Work Agent Model (4080)

## Ticket Information

- **ID**: TICKET-019
- **Title**: Select Work Agent Model for 4080
- **Type**: Research
- **Priority**: High
- **Status**: Done
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Select the LLM for the work agent on the 4080:

- Coding/research-optimized model
- Not used by the family agent
- Suitable for 16GB VRAM with quantization
- Good function-calling support

## Acceptance Criteria

- [x] Work agent model selected: **Llama 3.1 70B Q4**
- [x] Quantization level chosen: **Q4 (4-bit)**
- [x] Rationale documented (see `docs/MODEL_SELECTION.md`)
- [x] Model file location specified
- [x] Performance characteristics documented

## Technical Details

Selection criteria:

- Coding capabilities (CodeLlama, DeepSeek Coder, etc.)
- Research/analysis capabilities
- Function calling support
- Context window size
- Quantization: Q4-Q6 for 16GB VRAM

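The VRAM criterion above can be sanity-checked with a weights-only estimate: parameter count times bits per weight, divided by 8. The sketch below is illustrative and not part of the ticket's documented method. The function names and the 2 GB headroom for KV cache and runtime buffers are assumptions, and real Q4 formats typically store slightly more than 4 bits per weight because of per-block scales.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    params_billions * 1e9 weights * bits_per_weight / 8 bytes, divided by 1e9.
    The 1e9 factors cancel, so the result is simply params * bits / 8.
    """
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Return True if weights plus an assumed headroom fit in the given VRAM.

    headroom_gb is a hypothetical allowance for KV cache and runtime buffers;
    the real figure depends on context length and the inference runtime.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Candidate sizes taken from the models named in this repository's docs.
    candidates = [("70B at 4-bit", 70, 4), ("3.8B at 4-bit", 3.8, 4)]
    for name, params, bits in candidates:
        size = weight_memory_gb(params, bits)
        verdict = "fits" if fits_in_vram(params, bits, vram_gb=16) else "exceeds"
        print(f"{name}: ~{size:.1f} GB weights, {verdict} 16 GB VRAM")
```

By this estimate, the weights of a 70B model at 4 bits come to roughly 35 GB, so running one on a 16 GB card would presumably depend on offloading part of the model to system RAM. The ticket does not state this, so treat it as an assumption and see `docs/MODEL_SELECTION.md` for the recorded rationale.
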
## Dependencies

- TICKET-017 (model survey)
- TICKET-018 (capacity assessment)

## Related Files

- `docs/MODEL_SELECTION.md`

## Notes

Separate from family agent model. Can be selected independently.

## Progress Log

- 2024-01-XX - Model selected: Llama 3.1 70B Q4
- 2024-01-XX - Rationale documented in `docs/MODEL_SELECTION.md`
- 2024-01-XX - Based on TICKET-017 (survey) and TICKET-018 (capacity assessment)