- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context-window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
# 1050 LLM Server (Family Agent)

LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on the GTX 1050.
## Setup

### Using Ollama (Recommended)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download the model (the phi3:mini tag ships as a 4-bit quantization)
ollama pull phi3:mini

# Start the server, bound to all interfaces so other LAN hosts can reach it.
# `ollama serve` has no --host flag; the bind address comes from OLLAMA_HOST.
OLLAMA_HOST=0.0.0.0 ollama serve

# Runs on http://<1050-ip>:11434
```
## Configuration
- Model: Phi-3 Mini 3.8B Q4
- Context Window: 8K tokens (practical limit)
- VRAM Usage: ~2.5GB
- Concurrency: 1-2 requests max
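Note that the 8K practical limit above is not enforced automatically: Ollama typically defaults to a smaller context window unless `num_ctx` is set per request. A minimal sketch of building (and sanity-checking) a request payload that pins the context window, assuming the `phi3:mini` tag:

```shell
# Build a chat request that pins the context window to 8K tokens.
cat > /tmp/chat-request.json <<'EOF'
{
  "model": "phi3:mini",
  "messages": [{"role": "user", "content": "Hello"}],
  "options": {"num_ctx": 8192},
  "stream": false
}
EOF

# Validate the JSON locally before sending it to the server.
python3 -m json.tool /tmp/chat-request.json > /dev/null && echo "payload OK"

# Then send it:
#   curl http://<1050-ip>:11434/api/chat -d @/tmp/chat-request.json
```

Pinning `num_ctx` per request (rather than baking it into a Modelfile) keeps the default model untouched for other callers.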
## API

The example below uses Ollama's native chat endpoint (`/api/chat`); Ollama also exposes an OpenAI-compatible API at `/v1/chat/completions` for clients that expect it:
```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```
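With `"stream": false` the server returns a single JSON object whose assistant text sits under `message.content`. A small sketch of extracting it from an abbreviated sample response (real responses also carry timing and token-count fields), using `python3` instead of `jq` for portability:

```shell
# Abbreviated sample of a non-streaming /api/chat response body.
response='{"model":"phi3:mini","message":{"role":"assistant","content":"Hi there!"},"done":true}'

# Pull out just the assistant text.
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])'
# prints: Hi there!
```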
## Systemd Service

See `ollama-1050.service` for the systemd unit configuration.
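For reference, a hypothetical sketch of what such a unit might contain; the binary path, service user, and the `OLLAMA_HOST` binding are assumptions to adapt to the actual host, not the contents of the real file:

```ini
# Hypothetical sketch of ollama-1050.service
[Unit]
Description=Ollama LLM server (family agent, GTX 1050)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Environment=OLLAMA_HOST=0.0.0.0
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Setting `OLLAMA_HOST` via `Environment=` mirrors the manual setup step above, so the service listens on all interfaces when started by systemd.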