ilia/atlas

ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP

- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.

2026-01-05 23:44:16 -05:00

2.2 KiB

LLM Quick Reference Guide

Model Recommendations

Work Agent (RTX 4080, 16GB VRAM)

Recommended: Llama 3.1 70B Q4 or DeepSeek Coder 33B Q4

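Why: Best coding/research capabilities of the options considered; note that at 4 bits per weight the weights alone are roughly 35GB (70B) and 16.5GB (33B), more than 16GB, so some layers must be offloaded to system RAM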
Context: 8K-16K tokens
Cost: ~$0.018-0.03/hour (~$1.08-1.80/month at 2 hrs/day)
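Each monthly figure is the hourly rate multiplied by hours of use per day and days per month. The sketch below shows that arithmetic and assumes a 30-day month. `monthly_cost` is an illustrative helper, not project code.

```python
# Sketch: turn an hourly running cost into a monthly estimate.
# Assumption: a 30-day month; the rates below are the Work Agent figures quoted above.

def monthly_cost(hourly_usd: float, hours_per_day: float, days: int = 30) -> float:
    """Estimated monthly cost in USD for a given daily usage pattern."""
    return hourly_usd * hours_per_day * days


# Work Agent, run on demand for about 2 hours/day
low = monthly_cost(0.018, 2)
high = monthly_cost(0.03, 2)
print(f"Work Agent: ${low:.2f}-{high:.2f}/month")  # Work Agent: $1.08-1.80/month
```

For an always-on service, pass `hours_per_day=24`.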

Family Agent (GTX 1050, 4GB VRAM)

Recommended: Phi-3 Mini 3.8B Q4 or TinyLlama 1.1B Q4

Why: Fast, efficient, good instruction-following
Context: 4K-8K tokens
Cost: ~$0.006-0.01/hour (~$1.44-2.40/month always-on)
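You can estimate whether a quantized model fits on a given card from its parameter count. At 4 bits per parameter, the weights take about params × 4 / 8 bytes. This sketch makes two simplifying assumptions. First, it uses exactly 4 bits per weight, although real Q4 formats such as llama.cpp's Q4_K_M store somewhat more. Second, it ignores the KV cache, which grows with the context window. `weight_gb` and `MODELS` are illustrative names.

```python
# Rough lower bound on the VRAM needed for quantized weights only.
# Assumptions: exactly 4 bits per parameter; KV cache and runtime overhead are ignored.

def weight_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_param / 8


MODELS = {
    # name: (parameters in billions, VRAM of the target GPU in GB)
    "Llama 3.1 70B": (70, 16),
    "DeepSeek Coder 33B": (33, 16),
    "Phi-3 Mini 3.8B": (3.8, 4),
    "TinyLlama 1.1B": (1.1, 4),
}

for name, (params, vram) in MODELS.items():
    size = weight_gb(params)
    verdict = "weights fit" if size < vram else "weights exceed VRAM"
    print(f"{name}: ~{size:.1f} GB of weights vs {vram} GB -> {verdict}")
```

By this estimate, both Family Agent models fit comfortably in 4GB. Both Work Agent candidates need more than 16GB for their weights alone.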

Task → Model Mapping

Task	Use This Model	Why
Daily conversations	Family Agent (1050)	Fast, cheap, sufficient
Coding help	Work Agent (4080)	Needs capability
Research/analysis	Work Agent (4080)	Needs reasoning
Task management	Family Agent (1050)	Simple, fast
Weather queries	Family Agent (1050)	Simple tool calls
Summarization	Family Agent (1050)	Cheaper, sufficient
Complex summaries	Work Agent (4080)	Better quality if needed
Memory queries	Family Agent (1050)	Mostly embeddings
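The mapping above amounts to a lookup table. The sketch below shows that routing. The `route` function and the category keys are illustrative assumptions, not the project's code. So is the choice to fall back to the always-on Family Agent for unlisted tasks.

```python
# Sketch: route a task category to an agent, mirroring the task table.
# Keys and agent labels are hypothetical; unknown tasks default to "family" (an assumption).

TASK_ROUTES = {
    "daily conversations": "family",  # 1050: fast, cheap, sufficient
    "coding help": "work",            # 4080: needs capability
    "research/analysis": "work",      # 4080: needs reasoning
    "task management": "family",      # simple, fast
    "weather queries": "family",      # simple tool calls
    "summarization": "family",        # cheaper, sufficient
    "complex summaries": "work",      # better quality if needed
    "memory queries": "family",       # mostly embeddings
}


def route(task: str) -> str:
    """Return "work" or "family" for a task category (case-insensitive)."""
    return TASK_ROUTES.get(task.strip().lower(), "family")


print(route("Coding help"))      # work
print(route("Weather queries"))  # family
```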

Cost Per Ticket (Monthly)

Setup Tickets (One-time)

TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing

Usage Tickets (Per Ticket)

TICKET-025 (System Prompts): $0 (config only)
TICKET-027 (Conversations): $0 (uses existing servers)
TICKET-030 (MCP Integration): $0 (adapter code)
TICKET-043 (Summarization): ~$0.003-0.012/month
TICKET-042 (Memory): ~$0.01/month

Total: ~$2.53-4.22/month for entire system
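The total is the sum of the low ends and the sum of the high ends of the per-ticket monthly ranges. The $0 tickets and the one-time $0 setup add nothing. The sketch below reproduces that sum. `MONTHLY_RANGES` is an illustrative name.

```python
# Sketch: sum the per-ticket monthly ranges listed above (USD/month).
# Tickets costing $0 are omitted because they do not change the total.

MONTHLY_RANGES = {
    "TICKET-021 (4080 Server)": (1.08, 1.80),
    "TICKET-022 (1050 Server)": (1.44, 2.40),
    "TICKET-043 (Summarization)": (0.003, 0.012),
    "TICKET-042 (Memory)": (0.01, 0.01),
}

low = sum(r[0] for r in MONTHLY_RANGES.values())
high = sum(r[1] for r in MONTHLY_RANGES.values())
print(f"Total: ~${low:.2f}-{high:.2f}/month")  # Total: ~$2.53-4.22/month
```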

Key Decisions

Use local models - roughly 16-96x cheaper than the cloud APIs in the cost comparison below
Q4 quantization - Best balance of quality/speed/cost
Family Agent always-on - Low power, efficient
Work Agent on-demand - Only run when needed
Use Family Agent for summaries - Saves money

Cost Comparison

Option	Monthly Cost	Privacy
Local (Recommended)	~$2.50-4.20	✅ Full
OpenAI GPT-4	~$120-240	❌ Cloud
Anthropic Claude	~$69-135	❌ Cloud

Local is roughly 16-96x cheaper: about 29-96x vs GPT-4 and 16-54x vs Claude.
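The savings factor can be bounded from the comparison table's ranges. The cheapest cloud month divided by the priciest local month gives the low end. The priciest cloud month divided by the cheapest local month gives the high end. The sketch below assumes the table's rounded local range of $2.50-4.20. `savings_ratio` is an illustrative helper.

```python
# Sketch: bound how many times cheaper one monthly cost range is than another.
# Inputs are (low, high) USD/month tuples taken from the comparison table.

def savings_ratio(local: tuple[float, float], cloud: tuple[float, float]) -> tuple[float, float]:
    local_lo, local_hi = local
    cloud_lo, cloud_hi = cloud
    # Worst case for local: cheapest cloud vs priciest local; best case: the reverse.
    return cloud_lo / local_hi, cloud_hi / local_lo


LOCAL = (2.50, 4.20)
CLOUD = {"OpenAI GPT-4": (120.0, 240.0), "Anthropic Claude": (69.0, 135.0)}

for name, cost in CLOUD.items():
    lo, hi = savings_ratio(LOCAL, cost)
    print(f"{name}: local is ~{lo:.0f}-{hi:.0f}x cheaper")
```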
