- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.

# Ticket: Select Family Agent Model (1050)

## Ticket Information

- **ID**: TICKET-020
- **Title**: Select Family Agent Model for 1050
- **Type**: Research
- **Priority**: High
- **Status**: Done
- **Track**: LLM Infra
- **Milestone**: Milestone 1 - Survey & Architecture
- **Created**: 2024-01-XX

## Description

Select the LLM for the family agent on the 1050. The model should be:

- Small and instruction-tuned
- Latency-optimized for 24/7 operation
- Suitable for 4GB VRAM
- Good at instruction-following

## Acceptance Criteria

- [x] Family agent model selected: **Phi-3 Mini 3.8B Q4**
- [x] Quantization level chosen: **Q4 (4-bit)**
- [x] Rationale documented (see `docs/MODEL_SELECTION.md`)
- [x] Model file location specified
- [x] Latency characteristics documented

## Technical Details

Selection criteria:

- Small model size (1B-3B parameters)
- Instruction-tuned
- Low latency (< 1s response time)
- Function calling support
- Quantization: Q4 or Q5 for 4GB VRAM
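
The VRAM criterion above can be sanity-checked with simple arithmetic. The following is a minimal sketch, not the project's actual capacity assessment: the function names, the treatment of 4GB as 4 GiB, and the 1 GiB allowance for KV cache and runtime buffers are all assumptions made for illustration. Real Q4 and Q5 formats typically store slightly more than 4 or 5 bits per weight because of per-block scales, so these figures are lower bounds.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float = 4.0,      # assumed: "4GB VRAM" treated as 4 GiB
    overhead_gib: float = 1.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Return True if weights plus the assumed overhead fit in VRAM."""
    weights = estimate_weight_gib(params_billion, bits_per_weight)
    return weights + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Phi-3 Mini 3.8B at nominal 4-bit and 5-bit quantization.
    for bits in (4.0, 5.0):
        size = estimate_weight_gib(3.8, bits)
        print(f"3.8B @ {bits:.0f}-bit: {size:.2f} GiB, fits: {fits_in_vram(3.8, bits)}")
```

Under these assumptions a 3.8B-parameter model needs about 1.77 GiB for weights at a nominal 4 bits and about 2.21 GiB at 5 bits, which leaves headroom on a 4GB card. The actual margin depends on context length and the inference runtime, so measured usage should take precedence over this estimate.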

## Dependencies

- TICKET-017 (model survey)
- TICKET-018 (capacity assessment)

## Related Files

- `docs/MODEL_SELECTION.md`

## Notes

Optimized for always-on, low-latency family interactions. Separate from work agent.

## Progress Log

- 2024-01-XX - Model selected: Phi-3 Mini 3.8B Q4
- 2024-01-XX - Rationale documented in `docs/MODEL_SELECTION.md`
- 2024-01-XX - Based on TICKET-017 (survey) and TICKET-018 (capacity assessment)