ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00

1050 LLM Server (Family Agent)

LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on a GTX 1050.

Setup

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull phi3:mini-q4_0

# Start server
# ollama serve has no --host flag; the bind address is set via OLLAMA_HOST
OLLAMA_HOST=0.0.0.0 ollama serve
# Runs on http://<1050-ip>:11434
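Once the server is running, you can confirm it is reachable and that the pull succeeded by querying Ollama's /api/tags endpoint, which lists installed models. A minimal Python sketch (the localhost default is an assumption; substitute the 1050's address):

```python
import json
import urllib.request

def model_names(tags_body: str) -> list[str]:
    """Extract installed model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_body).get("models", [])]

def list_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return model_names(resp.read().decode())
```

If phi3:mini-q4_0 appears in the returned list, the download step above completed.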

Configuration

  • Model: Phi-3 Mini 3.8B Q4
  • Context Window: 8K tokens (practical limit)
  • VRAM Usage: ~2.5GB
  • Concurrency: 1-2 requests max
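These limits can also be enforced per request: Ollama's chat API accepts an options object, where num_ctx caps the context window. A sketch of building such a payload (the 8192 default mirrors the practical limit listed above):

```python
def build_chat_request(prompt: str,
                       model: str = "phi3:mini-q4_0",
                       num_ctx: int = 8192) -> dict:
    """Build an Ollama /api/chat payload with an explicit context cap."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # num_ctx keeps the KV cache within the ~2.5GB VRAM budget
        "options": {"num_ctx": num_ctx},
    }
```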

API

The example below uses Ollama's native chat endpoint; Ollama also exposes an OpenAI-compatible API under /v1/:

curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
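The same call can be made from Python with only the standard library. A sketch (the localhost default is an assumption; substitute the 1050's address):

```python
import json
import urllib.request

def extract_reply(response_body: str) -> str:
    """Pull the assistant's text out of a non-streaming /api/chat response."""
    return json.loads(response_body)["message"]["content"]

def chat(prompt: str,
         host: str = "http://localhost:11434",
         model: str = "phi3:mini-q4_0") -> str:
    """Send a single-turn, non-streaming chat request to the Ollama server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    # Generation on a small GPU can be slow, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(resp.read().decode())
```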

Systemd Service

See ollama-1050.service for systemd configuration.
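The referenced unit file is not reproduced here; a typical unit for this setup might look like the sketch below (paths, user, and description are assumptions; the Environment line exposes the server on all interfaces, matching the setup above):

```ini
# Sketch of a possible ollama-1050.service; paths and user are assumptions.
[Unit]
Description=Ollama LLM server for the family agent (Phi-3 Mini on the 1050)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Environment=OLLAMA_HOST=0.0.0.0
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
```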