- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
57 lines
1.6 KiB
Markdown
# Ticket: Survey Candidate Open-Weight Models

## Ticket Information

- **ID**: TICKET-017
- **Title**: Survey Candidate Open-Weight Models
- **Type**: Research
- **Priority**: High
- **Status**: In Progress
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Survey and evaluate open-weight LLMs:
- 8-14B and 30B quantized options for RTX 4080 (Q4-Q6 variants)
- Small models for RTX 1050 (family agent)
- Evaluate coding/research capabilities for work agent
- Evaluate instruction-following for family agent

## Acceptance Criteria

- [x] Model comparison matrix created
- [x] 4080 model candidates identified (70B quantized, 33B alternatives)
- [x] 1050 model candidates identified (3.8B, 1.5B, 1.1B options)
- [x] Evaluation criteria documented
- [x] Recommendations documented

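As an illustration of what a model comparison matrix could look like in code, here is a minimal sketch that renders candidate rows as a Markdown table. The `Candidate` fields and the `to_markdown` helper are hypothetical names chosen for this example, not taken from the survey document, and the function-calling column is left as "?" because the ticket does not state per-model results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    params_b: float                          # parameter count in billions
    quant: str                               # quantization level, e.g. "Q4"
    gpu: str                                 # target card as named in the ticket
    function_calling: Optional[bool] = None  # None = not yet evaluated


def to_markdown(rows: list[Candidate]) -> str:
    """Render candidates as a Markdown table, largest model first."""
    lines = [
        "| Model | Params (B) | Quant | GPU | Function calling |",
        "|---|---|---|---|---|",
    ]
    marks = {True: "yes", False: "no", None: "?"}
    for c in sorted(rows, key=lambda c: c.params_b, reverse=True):
        lines.append(
            f"| {c.name} | {c.params_b:g} | {c.quant} | {c.gpu} "
            f"| {marks[c.function_calling]} |"
        )
    return "\n".join(lines)


# Candidates named in this ticket; evaluation results are intentionally omitted.
CANDIDATES = [
    Candidate("Llama 3.1 70B", 70, "Q4", "4080"),
    Candidate("DeepSeek Coder 33B", 33, "Q4", "4080"),
    Candidate("Phi-3 Mini", 3.8, "Q4", "1050"),
    Candidate("Qwen2.5 1.5B", 1.5, "Q4", "1050"),
]

if __name__ == "__main__":
    print(to_markdown(CANDIDATES))
```

Keeping the matrix as data makes it easy to add columns later (for example, evaluation scores) without hand-editing the table.
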
## Technical Details

Models to evaluate:
- 4080: Llama 3 8B/70B, Mistral 7B, Qwen, etc.
- 1050: TinyLlama, Phi-2, smaller quantized models
- Quantization: Q4, Q5, Q6, Q8
- Function calling support required

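To make the quantization levels listed above concrete, the following is a rough sketch of how weight memory scales with parameter count and bit width. It assumes nominal bit widths (Q4 = 4 bits per weight, and so on) and ignores the KV cache, activations, and per-block quantization metadata, so real model files will be somewhat larger. The names `NOMINAL_BITS`, `weight_gib`, and `fits` are hypothetical, and the VRAM budget is a parameter you supply for your own card.

```python
# Nominal bits per weight for each quantization level.
# Assumption: real quantized formats store extra per-block scale data,
# so actual sizes are larger than this estimate.
NOMINAL_BITS = {"Q4": 4, "Q5": 5, "Q6": 6, "Q8": 8}


def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate weight-only memory in GiB at a given quantization."""
    total_bytes = params_billion * 1e9 * NOMINAL_BITS[quant] / 8
    return total_bytes / 2**30


def fits(params_billion: float, quant: str, vram_gib: float) -> bool:
    """True if the weights alone fit in the VRAM budget (no KV cache headroom)."""
    return weight_gib(params_billion, quant) <= vram_gib


if __name__ == "__main__":
    for label, size in [("3.8B", 3.8), ("8B", 8.0), ("33B", 33.0), ("70B", 70.0)]:
        row = ", ".join(f"{q}: {weight_gib(size, q):.1f} GiB" for q in NOMINAL_BITS)
        print(f"{label:>5} -> {row}")
```

These are weights-only lower bounds; comparing them against a card's VRAM, plus headroom for the context window, is the kind of check `LLM_CAPACITY.md` is meant to cover.
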
## Dependencies

- TICKET-004 (architecture) - helpful context

## Related Files

- `docs/LLM_MODEL_SURVEY.md` (to be created)
- `ARCHITECTURE.md`

## Notes

Can start in parallel with the wake-word and client work. Depends on the high-level architecture doc.

## Progress Log

- 2024-01-XX - Survey document created with comprehensive model analysis
- 2024-01-XX - Recommendations finalized:
  - Work Agent (4080): Llama 3.1 70B Q4 (primary), DeepSeek Coder 33B Q4 (alternative)
  - Family Agent (1050): Phi-3 Mini 3.8B Q4 (primary), Qwen2.5 1.5B Q4 (alternative)

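The primary/alternative recommendations above could be expressed as a small lookup with fallback. This is only a sketch: the `RECOMMENDED` table and the `pick_model` helper are hypothetical names, and the model strings are labels copied from this ticket rather than identifiers from any particular runtime.

```python
# Ordered preference per agent: primary first, then the alternative.
# The strings are labels from this ticket, not runtime model identifiers.
RECOMMENDED = {
    "work": ["Llama 3.1 70B Q4", "DeepSeek Coder 33B Q4"],
    "family": ["Phi-3 Mini 3.8B Q4", "Qwen2.5 1.5B Q4"],
}


def pick_model(agent: str, available: set[str]) -> str:
    """Return the first recommended model for `agent` that is available."""
    for model in RECOMMENDED[agent]:
        if model in available:
            return model
    raise LookupError(f"no recommended model available for agent {agent!r}")


if __name__ == "__main__":
    installed = {"DeepSeek Coder 33B Q4", "Phi-3 Mini 3.8B Q4"}
    print(pick_model("work", installed))    # falls back to the alternative
    print(pick_model("family", installed))  # primary is available
```

Keeping the preference order in one table means a changed recommendation only needs a one-line edit.
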