# 1050 LLM Server (Family Agent)
LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on a GTX 1050.
## Setup

### Using Ollama (Recommended)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull phi3:mini-q4_0

# Start server, bound to all interfaces. Note: ollama serve has no --host
# flag; the bind address is set via the OLLAMA_HOST environment variable.
OLLAMA_HOST=0.0.0.0 ollama serve

# Runs on http://<1050-ip>:11434
```
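Once the server is up, you can verify it is reachable and that the model downloaded correctly. Ollama's `/api/tags` endpoint lists locally available models, and `/api/generate` gives a quick one-shot smoke test:

```bash
# List models available on the server; phi3:mini-q4_0 should appear
curl http://<1050-ip>:11434/api/tags

# One-shot generation test against the native API
curl http://<1050-ip>:11434/api/generate -d '{
  "model": "phi3:mini-q4_0",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```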
## Configuration
- Model: Phi-3 Mini 3.8B Q4
- Context Window: 8K tokens (practical limit)
- VRAM Usage: ~2.5GB
- Concurrency: 1-2 requests max (see the sketch below for enforcing these limits)
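The context and concurrency limits above are not enforced by default. A minimal sketch of pinning them down, assuming a recent Ollama build: the `OLLAMA_NUM_PARALLEL` environment variable caps concurrent requests server-side, and the request-level `num_ctx` option caps the context window:

```bash
# Cap parallel requests server-side (read by ollama serve at startup)
OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=2 ollama serve

# Cap the context window per request via the "options" field
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [{"role": "user", "content": "Hello"}],
  "options": {"num_ctx": 8192},
  "stream": false
}'
```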
## API
Ollama serves its native chat API on `/api/chat` (an OpenAI-compatible endpoint is also available at `/v1/chat/completions`):
```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```
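For clients that already speak the OpenAI API, the same model can be reached through Ollama's OpenAI-compatible endpoint. A minimal sketch (no API key is required by Ollama itself, though OpenAI client libraries usually insist on a placeholder):

```bash
curl http://<1050-ip>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:mini-q4_0",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```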
## Systemd Service
See `ollama-1050.service` for the systemd configuration.
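The unit file itself is not reproduced here; as a rough sketch only, a unit for this setup would typically look like the following. The paths and the `ollama` user are assumptions based on the defaults created by the install script, not the contents of the actual `ollama-1050.service`:

```ini
[Unit]
Description=Ollama LLM server for family agent (Phi-3 Mini on GTX 1050)
After=network-online.target

[Service]
# Default install location and service user created by the install script
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
# Bind to all interfaces and cap concurrency (see Configuration above)
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_NUM_PARALLEL=2"

[Install]
WantedBy=multi-user.target
```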