atlas/docs/LLM_QUICK_REFERENCE.md
ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00

2.2 KiB

LLM Quick Reference Guide

Model Recommendations

Work Agent (RTX 4080, 16GB VRAM)

Recommended: Llama 3.1 70B Q4 or DeepSeek Coder 33B Q4

  • Why: Best coding/research capabilities, fits in 16GB
  • Context: 8K-16K tokens
  • Cost: $0.018-0.03/hour ($1.08-1.80/month if 2hrs/day)

Family Agent (RTX 1050, 4GB VRAM)

Recommended: Phi-3 Mini 3.8B Q4 or TinyLlama 1.1B Q4

  • Why: Fast, efficient, good instruction-following
  • Context: 4K-8K tokens
  • Cost: $0.006-0.01/hour ($1.44-2.40/month always-on)

Task → Model Mapping

Task Use This Model Why
Daily conversations Family Agent (1050) Fast, cheap, sufficient
Coding help Work Agent (4080) Needs capability
Research/analysis Work Agent (4080) Needs reasoning
Task management Family Agent (1050) Simple, fast
Weather queries Family Agent (1050) Simple tool calls
Summarization Family Agent (1050) Cheaper, sufficient
Complex summaries Work Agent (4080) Better quality if needed
Memory queries Family Agent (1050) Mostly embeddings

Cost Per Ticket (Monthly)

Setup Tickets (One-time)

  • TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
  • TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing

Usage Tickets (Per Ticket)

  • TICKET-025 (System Prompts): $0 (config only)
  • TICKET-027 (Conversations): $0 (uses existing servers)
  • TICKET-030 (MCP Integration): $0 (adapter code)
  • TICKET-043 (Summarization): ~$0.003-0.012/month
  • TICKET-042 (Memory): ~$0.01/month

Total: ~$2.53-4.22/month for entire system

Key Decisions

  1. Use local models - 30-100x cheaper than cloud APIs
  2. Q4 quantization - Best balance of quality/speed/cost
  3. Family Agent always-on - Low power, efficient
  4. Work Agent on-demand - Only run when needed
  5. Use Family Agent for summaries - Saves money

Cost Comparison

Option Monthly Cost Privacy
Local (Recommended) ~$2.50-4.20 Full
OpenAI GPT-4 ~$120-240 Cloud
Anthropic Claude ~$69-135 Cloud

Local is 30-100x cheaper!