# LLM Quick Reference Guide ## Model Recommendations ### Work Agent (RTX 4080, 16GB VRAM) **Recommended**: **Llama 3.1 70B Q4** or **DeepSeek Coder 33B Q4** - **Why**: Best coding/research capabilities, fits in 16GB - **Context**: 8K-16K tokens - **Cost**: ~$0.018-0.03/hour (~$1.08-1.80/month if 2hrs/day) ### Family Agent (RTX 1050, 4GB VRAM) **Recommended**: **Phi-3 Mini 3.8B Q4** or **TinyLlama 1.1B Q4** - **Why**: Fast, efficient, good instruction-following - **Context**: 4K-8K tokens - **Cost**: ~$0.006-0.01/hour (~$1.44-2.40/month always-on) ## Task → Model Mapping | Task | Use This Model | Why | |------|----------------|-----| | Daily conversations | Family Agent (1050) | Fast, cheap, sufficient | | Coding help | Work Agent (4080) | Needs capability | | Research/analysis | Work Agent (4080) | Needs reasoning | | Task management | Family Agent (1050) | Simple, fast | | Weather queries | Family Agent (1050) | Simple tool calls | | Summarization | Family Agent (1050) | Cheaper, sufficient | | Complex summaries | Work Agent (4080) | Better quality if needed | | Memory queries | Family Agent (1050) | Mostly embeddings | ## Cost Per Ticket (Monthly) ### Setup Tickets (One-time) - TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing - TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing ### Usage Tickets (Per Ticket) - TICKET-025 (System Prompts): $0 (config only) - TICKET-027 (Conversations): $0 (uses existing servers) - TICKET-030 (MCP Integration): $0 (adapter code) - TICKET-043 (Summarization): ~$0.003-0.012/month - TICKET-042 (Memory): ~$0.01/month ### **Total: ~$2.53-4.22/month** for entire system ## Key Decisions 1. **Use local models** - 30-100x cheaper than cloud APIs 2. **Q4 quantization** - Best balance of quality/speed/cost 3. **Family Agent always-on** - Low power, efficient 4. **Work Agent on-demand** - Only run when needed 5. **Use Family Agent for summaries** - Saves money ## Cost Comparison | Option | Monthly Cost | Privacy | |--------|-------------|---------| | **Local (Recommended)** | **~$2.50-4.20** | ✅ Full | | OpenAI GPT-4 | ~$120-240 | ❌ Cloud | | Anthropic Claude | ~$69-135 | ❌ Cloud | **Local is 30-100x cheaper!**