atlas/docs/LLM_QUICK_REFERENCE.md
ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
2026-01-05 23:44:16 -05:00


# LLM Quick Reference Guide
## Model Recommendations
### Work Agent (RTX 4080, 16GB VRAM)
**Recommended**: **Llama 3.1 70B Q4** or **DeepSeek Coder 33B Q4**
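- **Why**: Best coding/research capabilities. Note that Q4 weights alone are on the order of 40 GB for a 70B model and 19 GB for a 33B model, so neither fits entirely in 16 GB of VRAM; plan for partial CPU offload or a lower-bit quantization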
- **Context**: 8K-16K tokens
- **Cost**: ~$0.018-0.03/hour (~$1.08-1.80/month if 2hrs/day)
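
A quick way to sanity-check whether a quantized model fits a given card is to estimate the weight size from the parameter count. The sketch below is only a rough estimate: it assumes about 4.5 effective bits per weight for Q4 (actual formats vary) and a hypothetical 1.5 GB allowance for KV cache and runtime overhead. The function names are illustrative and do not come from the project.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4-style quantization.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = 4.5, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return quantized_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # (model name, parameters in billions, VRAM of the target card in GB)
    candidates = [
        ("Llama 3.1 70B Q4", 70.0, 16.0),
        ("DeepSeek Coder 33B Q4", 33.0, 16.0),
        ("Phi-3 Mini 3.8B Q4", 3.8, 4.0),
        ("TinyLlama 1.1B Q4", 1.1, 4.0),
    ]
    for name, params, vram in candidates:
        size = quantized_weight_gb(params)
        print(f"{name}: ~{size:.1f} GB weights, fits in {vram:.0f} GB: "
              f"{fits_in_vram(params, vram)}")
```

Under these assumptions, only the two small models fit fully on their target card; the larger ones would need offloading.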
### Family Agent (GTX 1050, 4GB VRAM)
**Recommended**: **Phi-3 Mini 3.8B Q4** or **TinyLlama 1.1B Q4**
- **Why**: Fast, efficient, good instruction-following
- **Context**: 4K-8K tokens
- **Cost**: ~$0.006-0.01/hour (~$1.44-2.40/month at about 8 hrs/day of active use; a full 24/7 duty cycle at the same hourly rate would be ~$4.32-7.20/month)
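
The monthly figures in this section are the hourly rate multiplied by hours per day and days per month. The sketch below reproduces that arithmetic, assuming a 30-day month; `monthly_cost` is an illustrative helper, not part of the project.

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float,
                 days_per_month: int = 30) -> float:
    """Electricity cost per month in dollars for a given duty cycle."""
    return rate_per_hour * hours_per_day * days_per_month


if __name__ == "__main__":
    # Work Agent: $0.018-0.03/hour at 2 hours/day
    print(f"Work Agent:   ${monthly_cost(0.018, 2):.2f} - ${monthly_cost(0.03, 2):.2f}")
    # Family Agent: $0.006-0.01/hour; the quoted monthly range matches 8 hours/day
    print(f"Family (8h):  ${monthly_cost(0.006, 8):.2f} - ${monthly_cost(0.01, 8):.2f}")
    # The same hourly rate at a true 24/7 duty cycle
    print(f"Family (24h): ${monthly_cost(0.006, 24):.2f} - ${monthly_cost(0.01, 24):.2f}")
```

Changing `hours_per_day` shows how sensitive the Family Agent estimate is to the assumed duty cycle.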
## Task → Model Mapping
| Task | Use This Model | Why |
|------|----------------|-----|
| Daily conversations | Family Agent (1050) | Fast, cheap, sufficient |
| Coding help | Work Agent (4080) | Needs capability |
| Research/analysis | Work Agent (4080) | Needs reasoning |
| Task management | Family Agent (1050) | Simple, fast |
| Weather queries | Family Agent (1050) | Simple tool calls |
| Summarization | Family Agent (1050) | Cheaper, sufficient |
| Complex summaries | Work Agent (4080) | Better quality if needed |
| Memory queries | Family Agent (1050) | Mostly embeddings |
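
The mapping above could be expressed as a simple lookup table in a dispatcher. This is a minimal sketch, not the project's routing code: the task keys, the `Agent` enum, and the choice to fall back to the Family Agent for unknown tasks are all assumptions made for illustration.

```python
from enum import Enum


class Agent(Enum):
    """Hypothetical identifiers for the two model servers."""
    FAMILY = "family-agent-1050"
    WORK = "work-agent-4080"


# Task keys are illustrative names for the rows of the table.
TASK_ROUTES = {
    "daily_conversation": Agent.FAMILY,
    "coding_help": Agent.WORK,
    "research_analysis": Agent.WORK,
    "task_management": Agent.FAMILY,
    "weather_query": Agent.FAMILY,
    "summarization": Agent.FAMILY,
    "complex_summary": Agent.WORK,
    "memory_query": Agent.FAMILY,
}


def route(task: str, default: Agent = Agent.FAMILY) -> Agent:
    """Return the agent for a task; unknown tasks go to the assumed default."""
    return TASK_ROUTES.get(task, default)


if __name__ == "__main__":
    for task in ("coding_help", "weather_query", "unlisted_task"):
        print(f"{task} -> {route(task).value}")
```

Defaulting to the cheaper agent keeps unknown requests off the on-demand Work Agent, but a stricter design could raise an error instead.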
## Cost Per Ticket (Monthly)
### Setup Tickets (One-time)
- TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
- TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing
### Usage Tickets (Per Ticket)
- TICKET-025 (System Prompts): $0 (config only)
- TICKET-027 (Conversations): $0 (uses existing servers)
- TICKET-030 (MCP Integration): $0 (adapter code)
- TICKET-043 (Summarization): ~$0.003-0.012/month
- TICKET-042 (Memory): ~$0.01/month
### **Total: ~$2.53-4.22/month** for entire system
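
The total is the sum of the per-ticket ranges listed above. The sketch below reproduces it, with the line items copied from this section and a zero-width range used for fixed costs; the data structure is illustrative only.

```python
# (low, high) monthly cost in dollars per ticket, taken from the lists above.
MONTHLY_COSTS = {
    "TICKET-021": (1.08, 1.80),    # 4080 server
    "TICKET-022": (1.44, 2.40),    # 1050 server
    "TICKET-025": (0.0, 0.0),      # system prompts, config only
    "TICKET-027": (0.0, 0.0),      # conversations, uses existing servers
    "TICKET-030": (0.0, 0.0),      # MCP integration, adapter code
    "TICKET-043": (0.003, 0.012),  # summarization
    "TICKET-042": (0.01, 0.01),    # memory
}


def total_range(costs: dict) -> tuple:
    """Sum the low and high ends of every ticket's monthly cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(MONTHLY_COSTS)
    print(f"Total: ${low:.2f} - ${high:.2f} per month")
```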
## Key Decisions
1. **Use local models** - roughly 16-96x cheaper than cloud APIs, based on the cost comparison below
2. **Q4 quantization** - Best balance of quality/speed/cost
3. **Family Agent always-on** - Low power, efficient
4. **Work Agent on-demand** - Only run when needed
5. **Use Family Agent for summaries** - Saves money
## Cost Comparison
| Option | Monthly Cost | Privacy |
|--------|-------------|---------|
| **Local (Recommended)** | **~$2.50-4.20** | ✅ Full |
| OpenAI GPT-4 | ~$120-240 | ❌ Cloud |
| Anthropic Claude | ~$69-135 | ❌ Cloud |

**By these estimates, local is roughly 16-96x cheaper** (about 29-96x versus GPT-4 and about 16-54x versus Claude).
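
The multiplier can be derived directly from the ranges in the table. The sketch below takes the most conservative ratio (cheapest cloud estimate over most expensive local estimate) and the most optimistic one (the reverse); `savings_ratio` is a hypothetical helper and the figures are the table's own estimates.

```python
def savings_ratio(cloud: tuple, local: tuple) -> tuple:
    """Return (conservative, optimistic) cost ratios of cloud to local.

    Each argument is a (low, high) monthly cost range in dollars.
    """
    conservative = cloud[0] / local[1]
    optimistic = cloud[1] / local[0]
    return conservative, optimistic


if __name__ == "__main__":
    local = (2.50, 4.20)
    providers = {
        "OpenAI GPT-4": (120.0, 240.0),
        "Anthropic Claude": (69.0, 135.0),
    }
    for name, cloud in providers.items():
        lo, hi = savings_ratio(cloud, local)
        print(f"{name}: {lo:.1f}x to {hi:.1f}x more expensive than local")
```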