atlas/tickets/done/TICKET-020_select-family-agent-model.md
ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
2026-01-05 23:44:16 -05:00


Ticket: Select Family Agent Model (1050)

Ticket Information

  • ID: TICKET-020
  • Title: Select Family Agent Model for 1050
  • Type: Research
  • Priority: High
  • Status: Done
  • Track: LLM Infra
  • Milestone: Milestone 1 - Survey & Architecture
  • Created: 2024-01-XX

Description

Select the LLM for the family agent running on the 1050. The model should be:

  • Small, instruction-tuned model
  • Latency-optimized for 24/7 operation
  • Suitable for 4GB VRAM
  • Good instruction-following

Acceptance Criteria

  • Family agent model selected: Phi-3 Mini 3.8B Q4
  • Quantization level chosen: Q4 (4-bit)
  • Rationale documented (see docs/MODEL_SELECTION.md)
  • Model file location specified
  • Latency characteristics documented
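
One way the latency characteristics could be gathered is sketched below. This is a minimal illustration, not the procedure used for this ticket: `measure_latency`, `meets_budget`, and `stub_generate` are hypothetical names, and the stub stands in for whatever inference call the family agent actually makes. The 1-second budget mirrors the "< 1s response time" criterion in this ticket.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(generate: Callable[[str], str], prompts: List[str]) -> Dict[str, float]:
    """Time one call per prompt and summarize wall-clock latency in seconds."""
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }


def meets_budget(stats: Dict[str, float], budget_s: float = 1.0) -> bool:
    """Check the 95th-percentile latency against the response-time budget."""
    return stats["p95_s"] < budget_s


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in for the real model call; sleeps briefly instead."""
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    stats = measure_latency(stub_generate, ["turn on the lights", "what's for dinner?"])
    print(stats, "within budget:", meets_budget(stats))
```

Replacing `stub_generate` with the real inference call and using a representative set of family-agent prompts would give figures worth recording alongside the rationale.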

Technical Details

Selection criteria:

  • Small model size (roughly 1B-4B parameters; the selected Phi-3 Mini has 3.8B)
  • Instruction-tuned
  • Low latency (< 1s response time)
  • Function calling support
  • Quantization: Q4 or Q5 for 4GB VRAM
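
The quantization criterion can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not a measurement: it treats Q4 and Q5 as exactly 4 and 5 bits per weight (real quantized files usually carry some extra bits for scales), reads "4GB" as 4 GiB, and uses a hypothetical 1 GiB allowance for KV cache, activations, and runtime overhead. The function names are illustrative.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gib: float, overhead_gib: float) -> bool:
    """True if weights plus an assumed fixed overhead fit in the VRAM budget."""
    return weight_footprint_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib


PHI3_MINI_PARAMS = 3.8e9      # parameter count from the model name in this ticket
VRAM_GIB = 4.0                # "4GB VRAM", read here as 4 GiB (assumption)
ASSUMED_OVERHEAD_GIB = 1.0    # hypothetical allowance for KV cache and runtime

if __name__ == "__main__":
    for label, bits in (("Q4", 4.0), ("Q5", 5.0)):
        size = weight_footprint_gib(PHI3_MINI_PARAMS, bits)
        ok = fits_in_vram(PHI3_MINI_PARAMS, bits, VRAM_GIB, ASSUMED_OVERHEAD_GIB)
        print(f"{label}: ~{size:.2f} GiB of weights, fits with overhead: {ok}")
```

Under these assumptions the weights take well under half of the card's memory at Q4, leaving headroom for context; the actual overhead depends on the runtime and context length and should be taken from the capacity assessment in TICKET-018.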

Dependencies

  • TICKET-017 (model survey)
  • TICKET-018 (capacity assessment)
  • docs/MODEL_SELECTION.md (selection rationale)

Notes

The model is optimized for always-on, low-latency family interactions and is kept separate from the work agent.

Progress Log

  • 2024-01-XX - Model selected: Phi-3 Mini 3.8B Q4
  • 2024-01-XX - Rationale documented in docs/MODEL_SELECTION.md
  • 2024-01-XX - Based on TICKET-017 (survey) and TICKET-018 (capacity assessment)