- Enhanced `ARCHITECTURE.md` with details on the LLM models for the work agent (Llama 3.1 70B Q4) and the family agent (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
LLM Quick Reference Guide
Model Recommendations
Work Agent (RTX 4080, 16GB VRAM)
Recommended: Llama 3.1 70B Q4 or DeepSeek Coder 33B Q4
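- Why: Best coding/research capabilities; note that Q4 weights alone are roughly 40 GB (70B) and 19-20 GB (33B), so neither model fits entirely in 16 GB without offloading some layers to system RAM
- Context: 8K-16K tokens
- Cost: $0.018-0.03/hour ($1.08-1.80/month at 2 hrs/day)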
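A quick way to sanity-check a "fits in VRAM" claim is to multiply the parameter count by the bits stored per weight. The sketch below assumes about 4.5 bits per weight as a rough average for common Q4 formats and counts weights only; the KV cache, activations, and runtime overhead need additional headroom. The function names are illustrative.

```python
# Rough VRAM check for quantized model weights (weights only).
# Assumption: ~4.5 bits per weight approximates common Q4 formats;
# KV cache, activations, and runtime overhead are NOT included.

Q4_BITS_PER_WEIGHT = 4.5  # assumed average, not an exact format spec


def weight_size_gb(params_billion: float, bits_per_weight: float = Q4_BITS_PER_WEIGHT) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def weights_fit(params_billion: float, vram_gb: float) -> bool:
    """True if the weights alone fit in VRAM; real usage needs extra headroom."""
    return weight_size_gb(params_billion) < vram_gb


if __name__ == "__main__":
    candidates = [
        ("Llama 3.1 70B Q4", 70.0, 16),
        ("DeepSeek Coder 33B Q4", 33.0, 16),
        ("Phi-3 Mini 3.8B Q4", 3.8, 4),
        ("TinyLlama 1.1B Q4", 1.1, 4),
    ]
    for name, params, vram in candidates:
        size = weight_size_gb(params)
        verdict = "fits" if weights_fit(params, vram) else "exceeds VRAM"
        print(f"{name}: ~{size:.1f} GB of weights vs {vram} GB VRAM -> {verdict}")
```

By this estimate neither Work Agent candidate fits entirely in 16 GB, while both Family Agent candidates fit in 4 GB with some room left for the KV cache.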
Family Agent (GTX 1050, 4GB VRAM)
Recommended: Phi-3 Mini 3.8B Q4 or TinyLlama 1.1B Q4
- Why: Fast, efficient, good instruction-following
- Context: 4K-8K tokens
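- Cost: $0.006-0.01/hour ($1.44-2.40/month; at this hourly rate that corresponds to about 8 hrs/day of active load, while the same rate around the clock would be ~$4.32-7.20/month)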
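The monthly figures follow from hourly rate × hours per day × days. This sketch assumes a 30-day month and reuses the hourly rates quoted for each agent; `monthly_cost` is an illustrative name.

```python
# Monthly cost from an hourly rate and daily usage hours.
# Assumption: a 30-day month; rates are the per-hour figures quoted above.

DAYS_PER_MONTH = 30


def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int = DAYS_PER_MONTH) -> float:
    """Cost per month = rate ($/hour) x hours per day x days."""
    return rate_per_hour * hours_per_day * days


if __name__ == "__main__":
    # Work Agent: $0.018-0.03/hour at 2 hrs/day
    print([round(monthly_cost(r, 2), 2) for r in (0.018, 0.03)])
    # Family Agent: ~8 hrs/day of load reproduces the quoted $1.44-2.40
    print([round(monthly_cost(r, 8), 2) for r in (0.006, 0.01)])
    # Family Agent: the same hourly rate applied 24/7
    print([round(monthly_cost(r, 24), 2) for r in (0.006, 0.01)])
```

Treating an always-on agent as 24 hours of full load overstates its cost; the quoted range is consistent with roughly 8 loaded hours per day.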
Task → Model Mapping
| Task | Use This Model | Why |
|---|---|---|
| Daily conversations | Family Agent (1050) | Fast, cheap, sufficient |
| Coding help | Work Agent (4080) | Needs capability |
| Research/analysis | Work Agent (4080) | Needs reasoning |
| Task management | Family Agent (1050) | Simple, fast |
| Weather queries | Family Agent (1050) | Simple tool calls |
| Summarization | Family Agent (1050) | Cheaper, sufficient |
| Complex summaries | Work Agent (4080) | Better quality if needed |
| Memory queries | Family Agent (1050) | Mostly embeddings |
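The mapping above can be expressed as a simple lookup table in whatever component dispatches requests. This is a hypothetical sketch: the task keys, the `Agent` enum, and `route_task` are illustrative names, and defaulting unknown tasks to the cheaper Family Agent is an assumption, not something the table specifies.

```python
# Hypothetical task router mirroring the Task -> Model Mapping table.
# Task keys and agent labels are illustrative, not an existing API.
from enum import Enum


class Agent(Enum):
    FAMILY = "family"  # 1050 server: small model, always-on
    WORK = "work"      # 4080 server: larger model, on-demand


TASK_ROUTES = {
    "daily_conversation": Agent.FAMILY,
    "coding_help": Agent.WORK,
    "research_analysis": Agent.WORK,
    "task_management": Agent.FAMILY,
    "weather_query": Agent.FAMILY,
    "summarization": Agent.FAMILY,
    "complex_summary": Agent.WORK,
    "memory_query": Agent.FAMILY,
}


def route_task(task: str) -> Agent:
    """Return the agent for a task; unknown tasks fall back to the Family Agent (assumed default)."""
    return TASK_ROUTES.get(task, Agent.FAMILY)


if __name__ == "__main__":
    for task in ("coding_help", "summarization", "complex_summary", "unknown_task"):
        print(task, "->", route_task(task).value)
```

Keeping the routes in one table makes it easy to move a task, such as complex summaries, between agents without touching the dispatch logic.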
Cost Per Ticket (Monthly)
Setup Tickets (One-time)
- TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
- TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing
Usage Tickets (Per Ticket)
- TICKET-025 (System Prompts): $0 (config only)
- TICKET-027 (Conversations): $0 (uses existing servers)
- TICKET-030 (MCP Integration): $0 (adapter code)
- TICKET-043 (Summarization): ~$0.003-0.012/month
- TICKET-042 (Memory): ~$0.01/month
Total: ~$2.53-4.22/month for the entire system
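The total is the sum of the low ends and the sum of the high ends of the ongoing per-ticket costs; $0 tickets drop out. A minimal sketch using the figures listed above (the dictionary layout and `total_range` are illustrative):

```python
# Sum the ongoing monthly cost ranges (low, high) in USD per ticket.
# $0 tickets are omitted; figures are taken from the ticket list above.

MONTHLY_COSTS = {
    "TICKET-021 (4080 Server)": (1.08, 1.80),
    "TICKET-022 (1050 Server)": (1.44, 2.40),
    "TICKET-043 (Summarization)": (0.003, 0.012),
    "TICKET-042 (Memory)": (0.01, 0.01),
}


def total_range(costs: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Add the low ends and the high ends separately, rounded to cents."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 2), round(high, 2)


if __name__ == "__main__":
    print(total_range(MONTHLY_COSTS))
```

The two server tickets account for nearly all of the total; summarization and memory add about a cent or two per month.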
Key Decisions
- Use local models - roughly 16-96x cheaper than the cloud APIs compared below
- Q4 quantization - Best balance of quality/speed/cost
- Family Agent always-on - Low power, efficient
- Work Agent on-demand - Only run when needed
- Use Family Agent for summaries - Saves money
Cost Comparison
| Option | Monthly Cost | Privacy |
|---|---|---|
| Local (Recommended) | ~$2.50-4.20 | ✅ Full |
| OpenAI GPT-4 | ~$120-240 | ❌ Cloud |
| Anthropic Claude | ~$69-135 | ❌ Cloud |
Local is roughly 29-96x cheaper than GPT-4 and 16-54x cheaper than Claude at these estimates.
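The savings ratio is the cloud monthly cost divided by the local monthly cost. Because both are ranges, the conservative bound pairs the cheapest cloud estimate with the most expensive local estimate, and the optimistic bound pairs the reverse. A sketch using the table's figures (`savings_ratio` is an illustrative name):

```python
# Savings ratio ranges: cloud monthly cost divided by local monthly cost.
# Figures are the estimates from the Cost Comparison table above.

LOCAL = (2.50, 4.20)
CLOUD = {
    "OpenAI GPT-4": (120, 240),
    "Anthropic Claude": (69, 135),
}


def savings_ratio(cloud: tuple[float, float], local: tuple[float, float]) -> tuple[float, float]:
    """Conservative: cheapest cloud vs priciest local; optimistic: the reverse."""
    conservative = cloud[0] / local[1]
    optimistic = cloud[1] / local[0]
    return conservative, optimistic


if __name__ == "__main__":
    for name, cost in CLOUD.items():
        lo, hi = savings_ratio(cost, LOCAL)
        print(f"{name}: local is ~{lo:.0f}x to {hi:.0f}x cheaper")
```

Comparing range endpoints this way keeps the headline multiplier from overstating the savings against the cheaper provider.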