- Enhanced `ARCHITECTURE.md` with details on the LLMs for the work agent (Llama 3.1 70B Q4) and the family agent (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLMs.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide guidance for the next development phases and keep the project documentation clear.
Ticket: Select Family Agent Model (1050)
Ticket Information
- ID: TICKET-020
- Title: Select Family Agent Model for 1050
- Type: Research
- Priority: High
- Status: Done
- Track: LLM Infra
- Milestone: Milestone 1 - Survey & Architecture
- Created: 2024-01-XX
Description
Select the LLM for the family agent on the 1050:
- Small, instruction-tuned model
- Latency-optimized for 24/7 operation
- Suitable for 4GB VRAM
- Good instruction-following
Acceptance Criteria
- Family agent model selected: Phi-3 Mini 3.8B Q4
- Quantization level chosen: Q4 (4-bit)
- Rationale documented (see docs/MODEL_SELECTION.md)
- Model file location specified
- Latency characteristics documented
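One way to produce the latency characteristics is to time a batch of representative family-agent prompts and report percentiles against the ticket's < 1s response-time target. This is a minimal sketch, not the project's benchmark: it assumes a hypothetical `generate(prompt) -> str` callable wrapping whichever runtime serves Phi-3 Mini, and it times the full call end to end (with a streaming setup you may prefer to time the first token instead). Percentiles use the nearest-rank method.

```python
import math
import time
from typing import Callable, Sequence


def nearest_rank(sorted_samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    k = max(1, math.ceil(pct / 100 * len(sorted_samples)))
    return sorted_samples[k - 1]


def measure_latency(generate: Callable[[str], str], prompts: Sequence[str],
                    budget_s: float = 1.0, warmup: int = 1) -> dict:
    """Time generate() once per prompt and compare the p95 latency to budget_s."""
    if not prompts:
        raise ValueError("need at least one prompt")
    # Warm-up calls are not timed, so model loading and first-call caching are excluded.
    for prompt in prompts[:warmup]:
        generate(prompt)
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()
    p95 = nearest_rank(samples, 95)
    return {
        "n": len(samples),
        "p50_s": nearest_rank(samples, 50),
        "p95_s": p95,
        "max_s": samples[-1],
        "within_budget": p95 < budget_s,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real model call; replace with the actual runtime client.
    def fake_generate(prompt: str) -> str:
        time.sleep(0.01)
        return "ok"

    print(measure_latency(fake_generate, ["What's for dinner?", "Set a timer", "Tell a joke"]))
```

Reporting p95 alongside p50 matters for an always-on agent: occasional slow replies are what users notice, and a median alone hides them.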
Technical Details
Selection criteria:
- Small model size (1B-3B parameters)
- Instruction-tuned
- Low latency (< 1s response time)
- Function calling support
- Quantization: Q4 or Q5 for 4GB VRAM
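A rough VRAM budget illustrates the Q4 versus Q5 trade-off on a 4GB card. This is an estimate under stated assumptions, not a measurement, and none of the figures below come from this ticket: it assumes llama.cpp-style GGUF block quantization (Q4_0 at 4.5 bits per weight and Q5_0 at 5.5 bits per weight, including per-block scales), an fp16 KV cache, and a flat 0.5 GiB allowance for runtime buffers. The Phi-3 Mini shape (32 layers, 32 KV heads, head dimension 96) is taken from its published configuration and should be checked against the model's config.json.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Architecture numbers needed for a memory estimate (assumed; verify per model)."""
    params: float     # total parameter count
    n_layers: int
    n_kv_heads: int
    head_dim: int


# Assumed Phi-3 Mini shape: 3.8B params, 32 layers, 32 KV heads, head dim 3072 / 32 = 96.
PHI3_MINI = ModelShape(params=3.8e9, n_layers=32, n_kv_heads=32, head_dim=96)


def weight_bytes(shape: ModelShape, bits_per_weight: float) -> float:
    """Quantized weight size; bits_per_weight includes per-block scale overhead."""
    return shape.params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, context_tokens: int, bytes_per_elem: int = 2) -> int:
    """One K and one V vector per layer, per KV head, per token (fp16 by default)."""
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * context_tokens * bytes_per_elem


def vram_estimate_gib(shape: ModelShape, bits_per_weight: float, context_tokens: int,
                      overhead_gib: float = 0.5) -> float:
    """Weights + KV cache + a flat allowance for runtime/compute buffers, in GiB."""
    total = (weight_bytes(shape, bits_per_weight)
             + kv_cache_bytes(shape, context_tokens)
             + overhead_gib * GIB)
    return total / GIB


if __name__ == "__main__":
    VRAM_GIB = 4.0  # the ticket's 4GB VRAM, treated as 4 GiB
    for label, bpw in (("Q4_0", 4.5), ("Q5_0", 5.5)):
        for ctx in (2048, 4096):
            est = vram_estimate_gib(PHI3_MINI, bpw, ctx)
            verdict = "fits" if est <= VRAM_GIB else "too big"
            print(f"{label} @ {ctx} tokens: {est:.2f} GiB ({verdict})")
```

Under these assumptions, Q4 fits at a 2048-token context with headroom (about 3.2 GiB) and only just fits at 4096 tokens (about 4.0 GiB), while Q5 fits at 2048 tokens but exceeds 4 GiB at 4096. On this card the context length matters about as much as the quantization level.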
Dependencies
- TICKET-017 (model survey)
- TICKET-018 (capacity assessment)
Related Files
- docs/MODEL_SELECTION.md
Notes
Optimized for always-on, low-latency family interactions. Kept separate from the work agent.
Progress Log
- 2024-01-XX - Model selected: Phi-3 Mini 3.8B Q4
- 2024-01-XX - Rationale documented in docs/MODEL_SELECTION.md
- 2024-01-XX - Based on TICKET-017 (survey) and TICKET-018 (capacity assessment)