ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00


Ticket: Survey Candidate Open-Weight Models

Ticket Information

  • ID: TICKET-017
  • Title: Survey Candidate Open-Weight Models
  • Type: Research
  • Priority: High
  • Status: In Progress
  • Track: LLM Infra
  • Milestone: Milestone 1 - Survey & Architecture
  • Created: 2024-01-XX

Description

Survey and evaluate open-weight LLMs:

  • 8-14B and 30B quantized options for RTX 4080 (Q4-Q6 variants)
  • Small models for the GTX 1050 (family agent)
  • Evaluate coding and research capabilities for the work agent
  • Evaluate instruction-following for the family agent
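
Whether a candidate size and quantization level fits on a card can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count × bits per weight ÷ 8. The sketch below is an illustrative assumption, not a measured result. It uses nominal bit widths (real quantized file formats also store scale data, so actual files are somewhat larger), and `overhead_gb` is a hypothetical fixed allowance for KV cache and runtime buffers that in practice grows with context length. The only hardware figure it hard-codes is the RTX 4080's 16 GB of VRAM; the 1050's budget depends on the variant and is left as a parameter.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes (1e9 bytes)."""
    # params * bits / 8 = bytes; params are in billions, so the result is already in GB.
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rough check: do the weights plus a fixed overhead allowance fit in VRAM?

    overhead_gb is a hypothetical allowance, not a measured value.
    """
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    RTX_4080_VRAM_GB = 16  # the RTX 4080 ships with 16 GB
    for params in (8, 14, 33, 70):
        for bits in (4, 5, 6, 8):
            size = weight_gb(params, bits)
            ok = fits_in_vram(params, bits, RTX_4080_VRAM_GB)
            print(f"{params:>3}B @ {bits}-bit: ~{size:5.1f} GB weights, fits fully: {ok}")
```

By this estimate, 4-bit weights alone for 33B (~16.5 GB) and 70B (~35 GB) models exceed 16 GB, so those candidates would presumably depend on partial offload to system RAM rather than running entirely on the GPU.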

Acceptance Criteria

  • Model comparison matrix created
  • 4080 model candidates identified (70B quantized, 33B alternatives)
  • 1050 model candidates identified (3.8B, 1.5B, 1.1B options)
  • Evaluation criteria documented
  • Recommendations documented
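
The comparison matrix and evaluation criteria could be kept as a simple weighted score per model. In the sketch below the criterion names echo the description (coding/research, instruction-following, function calling, VRAM fit), but the weights, the 0-5 scale, and the placeholder names `model-a` and `model-b` are hypothetical; no real evaluation results are implied.

```python
from dataclasses import dataclass, field

# Hypothetical criteria and weights; weights sum to 1.0 so totals stay on the 0-5 scale.
WEIGHTS: dict[str, float] = {
    "coding_research": 0.35,
    "instruction_following": 0.25,
    "function_calling": 0.25,
    "vram_fit": 0.15,
}


@dataclass
class Candidate:
    name: str
    scores: dict[str, float] = field(default_factory=dict)  # criterion -> score on 0..5

    def weighted_total(self, weights: dict[str, float] = WEIGHTS) -> float:
        """Weighted sum of scores; refuses to score a candidate with missing criteria."""
        missing = set(weights) - set(self.scores)
        if missing:
            raise ValueError(f"{self.name} is missing scores for {sorted(missing)}")
        return sum(w * self.scores[c] for c, w in weights.items())


def rank(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Return (name, weighted total) pairs, best first."""
    totals = [(c.name, round(c.weighted_total(), 2)) for c in candidates]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder names and scores for illustration only; these are not survey results.
    matrix = [
        Candidate("model-a", {"coding_research": 4, "instruction_following": 4,
                              "function_calling": 4, "vram_fit": 4}),
        Candidate("model-b", {"coding_research": 5, "instruction_following": 3,
                              "function_calling": 3, "vram_fit": 3}),
    ]
    for name, total in rank(matrix):
        print(f"{name}: {total}")
```

Raising on missing scores keeps a half-filled row from silently ranking lower than a fully evaluated one.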

Technical Details

Models to evaluate:

  • 4080: Llama 3 8B/70B, Mistral 7B, Qwen, etc.
  • 1050: TinyLlama, Phi-2, smaller quantized models
  • Quantization: Q4, Q5, Q6, Q8
  • Function calling support required
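
The function-calling requirement can be turned into a pass/fail check per model. A minimal sketch, assuming the model is prompted to reply with a bare JSON object of the form {"name": ..., "arguments": {...}}; that wire format, the `get_weather` and `set_timer` tools, and the `check_tool_call` helper are all hypothetical, and real runtimes each define their own tool-call format.

```python
import json
from typing import Any

# Hypothetical tool schema: tool name -> required argument names and their JSON types.
TOOLS: dict[str, dict[str, type]] = {
    "get_weather": {"city": str},
    "set_timer": {"minutes": int},
}


def check_tool_call(raw_output: str) -> tuple[bool, str]:
    """Validate a model's raw reply as a call to one of TOOLS.

    Returns (passed, reason) so failures can be tallied per model.
    """
    try:
        call: Any = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "top-level value is not an object"
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return False, f"unknown tool: {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"
    for arg, expected_type in TOOLS[name].items():
        if arg not in args:
            return False, f"missing argument: {arg}"
        # Exact type match, so JSON true/false is not accepted where an int is expected.
        if type(args[arg]) is not expected_type:
            return False, f"argument {arg} should be {expected_type.__name__}"
    return True, "ok"
```

Running each candidate through a fixed prompt set and counting passes would give one number per model for the function-calling column of the comparison matrix.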

Dependencies

  • TICKET-004 (architecture) - helpful context
  • docs/LLM_MODEL_SURVEY.md (to be created)
  • ARCHITECTURE.md

Notes

Can start in parallel with the wake-word and client work. Depends on the high-level architecture doc.

Progress Log

  • 2024-01-XX - Survey document created with comprehensive model analysis
  • 2024-01-XX - Recommendations finalized:
    • Work Agent (4080): Llama 3.1 70B Q4 (primary), DeepSeek Coder 33B Q4 (alternative)
    • Family Agent (1050): Phi-3 Mini 3.8B Q4 (primary), Qwen2.5 1.5B Q4 (alternative)
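
The finalized recommendations amount to a primary/alternative pair per agent. One way to encode them is a small lookup with fallback; this is a sketch only, and the `select_model` helper and the `available` set (whatever the runtime reports as loadable) are hypothetical.

```python
# Recommendations from the progress log, as (primary, alternative) per agent.
RECOMMENDED_MODELS: dict[str, tuple[str, str]] = {
    "work": ("Llama 3.1 70B Q4", "DeepSeek Coder 33B Q4"),  # RTX 4080
    "family": ("Phi-3 Mini 3.8B Q4", "Qwen2.5 1.5B Q4"),   # 1050
}


def select_model(agent: str, available: set[str]) -> str:
    """Pick the primary model for an agent, falling back to the alternative.

    `available` is a hypothetical set of model names the runtime can load.
    """
    if agent not in RECOMMENDED_MODELS:
        raise KeyError(f"unknown agent: {agent!r}")
    primary, alternative = RECOMMENDED_MODELS[agent]
    for model in (primary, alternative):
        if model in available:
            return model
    raise LookupError(f"neither {primary!r} nor {alternative!r} is available")
```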