atlas/docs/LLM_QUICK_REFERENCE.md

# LLM Quick Reference Guide

## Model Recommendations

### Work Agent (RTX 4080, 16GB VRAM)
**Recommended**: **Llama 3.1 70B Q4** or **DeepSeek Coder 33B Q4**
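- **Why**: Best coding/research capabilities. Note that Q4 weights alone are roughly 40 GB (70B) and 19-20 GB (33B), so neither fits entirely in 16GB VRAM; both need partial CPU offload, which reduces speed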
- **Context**: 8K-16K tokens
- **Cost**: ~$0.018-0.03/hour (~$1.08-1.80/month if 2hrs/day)

### Family Agent (GTX 1050, 4GB VRAM)
**Recommended**: **Phi-3 Mini 3.8B Q4** or **TinyLlama 1.1B Q4**
- **Why**: Fast, efficient, good instruction-following
- **Context**: 4K-8K tokens
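- **Cost**: ~$0.006-0.01/hour (~$1.44-2.40/month always-on; this equals about 8 hrs/day at the hourly rate, while 24 hrs/day at that rate would be ~$4.32-7.20/month)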
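
The monthly figures in both agent sections come from hourly rate × hours per day × days per month. A minimal sketch of that arithmetic, assuming a 30-day month (`monthly_cost` and `DAYS_PER_MONTH` are illustrative names, not part of the project):

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def monthly_cost(rate_per_hour: float, hours_per_day: float) -> float:
    """Monthly electricity cost in dollars for an hourly rate and daily runtime."""
    return rate_per_hour * hours_per_day * DAYS_PER_MONTH


# Work Agent: $0.018-0.03/hour, 2 hrs/day
print(f"Work:   ${monthly_cost(0.018, 2):.2f}-{monthly_cost(0.03, 2):.2f}/month")

# Family Agent: $0.006-0.01/hour, shown for 8 and 24 hrs/day
for hours in (8, 24):
    low, high = monthly_cost(0.006, hours), monthly_cost(0.01, hours)
    print(f"Family: ${low:.2f}-{high:.2f}/month at {hours} hrs/day")
```

Changing the `hours_per_day` argument is the quickest way to see how sensitive the monthly estimate is to actual usage.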

## Task → Model Mapping

| Task | Use This Model | Why |
|------|----------------|-----|
| Daily conversations | Family Agent (1050) | Fast, cheap, sufficient |
| Coding help | Work Agent (4080) | Needs capability |
| Research/analysis | Work Agent (4080) | Needs reasoning |
| Task management | Family Agent (1050) | Simple, fast |
| Weather queries | Family Agent (1050) | Simple tool calls |
| Summarization | Family Agent (1050) | Cheaper, sufficient |
| Complex summaries | Work Agent (4080) | Better quality if needed |
| Memory queries | Family Agent (1050) | Mostly embeddings |
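
The mapping above could be encoded as a simple lookup table. The sketch below is one possible shape, not the project's actual router: the task keys, the `Agent` enum, and `pick_agent` are hypothetical names, and the fallback to the Family Agent for unknown tasks is an assumption.

```python
from enum import Enum


class Agent(Enum):
    FAMILY = "Family Agent (1050)"
    WORK = "Work Agent (4080)"


# One entry per row of the Task -> Model table (task keys are hypothetical)
TASK_TO_AGENT = {
    "daily_conversation": Agent.FAMILY,
    "coding_help": Agent.WORK,
    "research_analysis": Agent.WORK,
    "task_management": Agent.FAMILY,
    "weather_query": Agent.FAMILY,
    "summarization": Agent.FAMILY,
    "complex_summary": Agent.WORK,
    "memory_query": Agent.FAMILY,
}


def pick_agent(task: str) -> Agent:
    """Return the agent for a task; unknown tasks fall back to the cheaper agent."""
    # Assumption: defaulting to the Family Agent is not stated in the table.
    return TASK_TO_AGENT.get(task, Agent.FAMILY)


print(pick_agent("coding_help").value)
print(pick_agent("weather_query").value)
```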

## Cost Per Ticket (Monthly)

### Setup Tickets (One-time)
- TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
- TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing

### Usage Tickets (Per Ticket)
- TICKET-025 (System Prompts): $0 (config only)
- TICKET-027 (Conversations): $0 (uses existing servers)
- TICKET-030 (MCP Integration): $0 (adapter code)
- TICKET-043 (Summarization): ~$0.003-0.012/month
- TICKET-042 (Memory): ~$0.01/month

### **Total: ~$2.53-4.22/month** for the entire system
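
The total is the sum of the low and high ends of each ticket's monthly cost. A short sketch that reproduces it from the figures listed above (`TICKET_COSTS` and `total_range` are illustrative names):

```python
# (low, high) monthly cost in dollars, copied from the ticket lists above
TICKET_COSTS = {
    "TICKET-021": (1.08, 1.80),
    "TICKET-022": (1.44, 2.40),
    "TICKET-025": (0.0, 0.0),
    "TICKET-027": (0.0, 0.0),
    "TICKET-030": (0.0, 0.0),
    "TICKET-043": (0.003, 0.012),
    "TICKET-042": (0.01, 0.01),
}


def total_range(costs: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(TICKET_COSTS)
print(f"Total: ~${low:.2f}-{high:.2f}/month")  # Total: ~$2.53-4.22/month
```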

## Key Decisions

1. **Use local models** - roughly 16-96x cheaper than cloud APIs, per the Cost Comparison table
2. **Q4 quantization** - Best balance of quality/speed/cost
3. **Family Agent always-on** - Low power, efficient
4. **Work Agent on-demand** - Only run when needed
5. **Use Family Agent for summaries** - Saves money
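
To sanity-check a Q4 model against a GPU before downloading it, a rough weights-only estimate is parameters × bits per weight / 8. The sketch below assumes about 4.5 effective bits per weight for Q4 (an approximation; the exact figure depends on the quantization variant) and ignores the KV cache and runtime overhead, so real usage is higher. `weight_size_gb` and `fits_in_vram` are illustrative names.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized weights in GB (weights only)."""
    # Assumption: ~4.5 effective bits per weight for Q4 variants.
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, vram_gb: float) -> bool:
    """True if the weights alone fit; KV cache and overhead are not counted."""
    return weight_size_gb(params_billions) <= vram_gb


# Model sizes (billions of parameters) paired with the GPU they are listed for
CANDIDATES = [
    ("Llama 3.1 70B", 70, 16),
    ("DeepSeek Coder 33B", 33, 16),
    ("Phi-3 Mini 3.8B", 3.8, 4),
    ("TinyLlama 1.1B", 1.1, 4),
]

for name, params, vram in CANDIDATES:
    size = weight_size_gb(params)
    verdict = "fits" if fits_in_vram(params, vram) else "needs CPU offload"
    print(f"{name}: ~{size:.1f} GB weights vs {vram} GB VRAM -> {verdict}")
```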

## Cost Comparison

| Option | Monthly Cost | Privacy |
|--------|-------------|---------|
| **Local (Recommended)** | **~$2.50-4.20** | ✅ Full |
| OpenAI GPT-4 | ~$120-240 | ❌ Cloud |
| Anthropic Claude | ~$69-135 | ❌ Cloud |

**Local is roughly 16-96x cheaper** (lowest ratio: $69 / $4.20; highest: $240 / $2.50).
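
The multiplier can be checked directly against the table by dividing each cloud range by the local range. A minimal sketch using the table's figures (`ratio_range` is an illustrative helper):

```python
LOCAL = (2.50, 4.20)  # monthly cost range in dollars, from the table
CLOUD = {
    "OpenAI GPT-4": (120.0, 240.0),
    "Anthropic Claude": (69.0, 135.0),
}


def ratio_range(cloud: tuple[float, float], local: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest cost ratio between a cloud option and local."""
    smallest = cloud[0] / local[1]  # cheapest cloud vs most expensive local
    largest = cloud[1] / local[0]   # priciest cloud vs cheapest local
    return smallest, largest


for name, cost in CLOUD.items():
    lo, hi = ratio_range(cost, LOCAL)
    print(f"{name}: {lo:.1f}x to {hi:.1f}x the local cost")
```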