- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.

# LLM Quick Reference Guide

## Model Recommendations

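### Work Agent (RTX 4080, 16GB VRAM)

**Recommended**: **Llama 3.1 70B Q4** or **DeepSeek Coder 33B Q4**

- **Why**: Best coding/research capabilities. Note that Q4 weights alone are roughly 40GB for a 70B model and roughly 19GB for a 33B model, so neither fits entirely in 16GB of VRAM; both need partial CPU offload (`LLM_CAPACITY.md` covers VRAM limits)
- **Context**: 8K-16K tokens
- **Cost**: ~$0.018-0.03/hour (~$1.08-1.80/month at 2 hours/day)
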
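As a rough sanity check on what fits in a given amount of VRAM, the weight footprint of a quantized model can be estimated from its parameter count and bits per weight. This is a minimal sketch, not a measurement: the function name `estimate_weight_gb` is hypothetical, the 4.5 bits-per-weight default is an assumed approximation for Q4 formats (4-bit values plus per-block scales), and KV cache and runtime overhead are ignored.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB (10^9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4 quantization;
    real files vary by quantization variant.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Parameter counts (in billions) for the models named in this guide.
MODELS = {
    "Llama 3.1 70B": 70.0,
    "DeepSeek Coder 33B": 33.0,
    "Phi-3 Mini 3.8B": 3.8,
    "TinyLlama 1.1B": 1.1,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: ~{estimate_weight_gb(params):.1f} GB of weights at Q4")
```

Comparing each estimate against the card's VRAM, minus a few GB of headroom for context, gives a quick first pass before any real benchmarking.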
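### Family Agent (GTX 1050, 4GB VRAM)

**Recommended**: **Phi-3 Mini 3.8B Q4** or **TinyLlama 1.1B Q4**

- **Why**: Fast, efficient, good instruction-following
- **Context**: 4K-8K tokens
- **Cost**: ~$0.006-0.01/hour (~$1.44-2.40/month always-on). The monthly figure matches about 8 hours/day at the hourly rate; 24 hours/day at the full hourly rate would be ~$4.32-7.20/month, so the estimate presumably assumes the card idles most of the time
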
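The monthly figures follow from the hourly rate multiplied by hours of use. The sketch below shows that arithmetic; the helper name `monthly_cost` and the 30-day month are assumptions made for illustration.

```python
def monthly_cost(hourly_usd: float, hours_per_day: float, days: int = 30) -> float:
    """Monthly cost in USD for a given hourly rate and daily usage."""
    return hourly_usd * hours_per_day * days


if __name__ == "__main__":
    # Work Agent: on-demand, 2 hours/day at $0.018-0.03/hour.
    print(f"Work Agent: ${monthly_cost(0.018, 2):.2f}-{monthly_cost(0.03, 2):.2f}/month")

    # Family Agent at $0.006-0.01/hour for a few usage levels,
    # to show how sensitive the monthly figure is to hours/day.
    for hours in (8, 24):
        low, high = monthly_cost(0.006, hours), monthly_cost(0.01, hours)
        print(f"Family Agent at {hours} h/day: ${low:.2f}-{high:.2f}/month")
```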
## Task → Model Mapping

| Task | Use This Model | Why |
|------|----------------|-----|
| Daily conversations | Family Agent (1050) | Fast, cheap, sufficient |
| Coding help | Work Agent (4080) | Needs capability |
| Research/analysis | Work Agent (4080) | Needs reasoning |
| Task management | Family Agent (1050) | Simple, fast |
| Weather queries | Family Agent (1050) | Simple tool calls |
| Summarization | Family Agent (1050) | Cheaper, sufficient |
| Complex summaries | Work Agent (4080) | Better quality if needed |
| Memory queries | Family Agent (1050) | Mostly embeddings |

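The mapping above can be expressed as a simple lookup. This is only an illustrative sketch: the `Agent` enum, the task keys, and the `route` function are hypothetical names, and falling back to the Family Agent for unknown tasks is an assumption, not something this guide specifies.

```python
from enum import Enum


class Agent(Enum):
    FAMILY = "Family Agent (1050)"
    WORK = "Work Agent (4080)"


# Task keys are hypothetical identifiers for the rows of the table.
TASK_ROUTES = {
    "daily_conversation": Agent.FAMILY,
    "coding_help": Agent.WORK,
    "research_analysis": Agent.WORK,
    "task_management": Agent.FAMILY,
    "weather_query": Agent.FAMILY,
    "summarization": Agent.FAMILY,
    "complex_summary": Agent.WORK,
    "memory_query": Agent.FAMILY,
}


def route(task: str) -> Agent:
    """Return the agent for a task; unknown tasks go to the cheaper
    Family Agent (an assumed default)."""
    return TASK_ROUTES.get(task, Agent.FAMILY)


if __name__ == "__main__":
    for task in ("coding_help", "weather_query", "something_else"):
        print(f"{task} -> {route(task).value}")
```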
## Cost Per Ticket (Monthly)

### Setup Tickets (One-time)

- TICKET-021 (4080 Server): $0 setup, ~$1.08-1.80/month ongoing
- TICKET-022 (1050 Server): $0 setup, ~$1.44-2.40/month ongoing

### Usage Tickets (Per Ticket)

- TICKET-025 (System Prompts): $0 (config only)
- TICKET-027 (Conversations): $0 (uses existing servers)
- TICKET-030 (MCP Integration): $0 (adapter code)
- TICKET-043 (Summarization): ~$0.003-0.012/month
- TICKET-042 (Memory): ~$0.01/month

### **Total: ~$2.53-4.22/month** for entire system

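The total is the sum of the per-ticket monthly ranges listed above. A short sketch of that arithmetic, with the dictionary name `MONTHLY_COSTS` chosen for illustration and the $0 tickets omitted since they do not affect the sum:

```python
# (low, high) monthly cost in USD per ticket, taken from the lists above.
MONTHLY_COSTS = {
    "TICKET-021 (4080 Server)": (1.08, 1.80),
    "TICKET-022 (1050 Server)": (1.44, 2.40),
    "TICKET-043 (Summarization)": (0.003, 0.012),
    "TICKET-042 (Memory)": (0.01, 0.01),
}

total_low = sum(low for low, _ in MONTHLY_COSTS.values())
total_high = sum(high for _, high in MONTHLY_COSTS.values())

if __name__ == "__main__":
    print(f"Total: ~${total_low:.2f}-{total_high:.2f}/month")
```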
## Key Decisions

1. **Use local models** - Roughly 16-96x cheaper than cloud APIs at the estimates in the Cost Comparison table
2. **Q4 quantization** - Best balance of quality/speed/cost
3. **Family Agent always-on** - Low power, efficient
4. **Work Agent on-demand** - Only run when needed
5. **Use Family Agent for summaries** - Saves money

## Cost Comparison

| Option | Monthly Cost | Privacy |
|--------|-------------|---------|
| **Local (Recommended)** | **~$2.50-4.20** | ✅ Full |
| OpenAI GPT-4 | ~$120-240 | ❌ Cloud |
| Anthropic Claude | ~$69-135 | ❌ Cloud |

**At these estimates, local is roughly 29-96x cheaper than GPT-4 and 16-54x cheaper than Claude.**
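The size of the saving can be checked directly from the table's ranges. In this sketch the function name `savings_range` is hypothetical; it pairs the lowest cloud estimate with the highest local estimate for the conservative bound, and the reverse for the optimistic bound.

```python
# (low, high) monthly cost estimates in USD, taken from the table above.
LOCAL = (2.50, 4.20)
CLOUD = {
    "OpenAI GPT-4": (120.0, 240.0),
    "Anthropic Claude": (69.0, 135.0),
}


def savings_range(cloud: tuple[float, float], local: tuple[float, float]) -> tuple[float, float]:
    """Return (conservative, optimistic) cost ratios of cloud to local."""
    cloud_low, cloud_high = cloud
    local_low, local_high = local
    return cloud_low / local_high, cloud_high / local_low


if __name__ == "__main__":
    for name, costs in CLOUD.items():
        worst, best = savings_range(costs, LOCAL)
        print(f"{name}: local is {worst:.0f}-{best:.0f}x cheaper")
```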