ilia 4b9ffb5ddf docs: Update architecture and add new documentation for LLM and MCP
- Enhanced `ARCHITECTURE.md` with details on LLM models for work (Llama 3.1 70B Q4) and family agents (Phi-3 Mini 3.8B Q4).
- Introduced new documents:
  - `ASR_EVALUATION.md` for ASR engine evaluation and selection.
  - `HARDWARE.md` outlining hardware requirements and purchase plans.
  - `IMPLEMENTATION_GUIDE.md` for Milestone 2 implementation steps.
  - `LLM_CAPACITY.md` assessing VRAM and context window limits.
  - `LLM_MODEL_SURVEY.md` surveying open-weight LLM models.
  - `LLM_USAGE_AND_COSTS.md` detailing LLM usage and operational costs.
  - `MCP_ARCHITECTURE.md` describing the Model Context Protocol architecture.
  - `MCP_IMPLEMENTATION_SUMMARY.md` summarizing MCP implementation status.

These updates provide comprehensive guidance for the next phases of development and ensure clarity in project documentation.
2026-01-05 23:44:16 -05:00


# 1050 LLM Server (Family Agent)
LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on the GTX 1050.
## Setup
### Using Ollama (Recommended)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Download model
ollama pull phi3:mini-q4_0
# Start the server bound to all interfaces (the bind address is set via
# the OLLAMA_HOST environment variable; ollama serve has no --host flag)
OLLAMA_HOST=0.0.0.0 ollama serve
# Runs on http://<1050-ip>:11434
```
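Once the server is running, a quick health check can confirm it is reachable (a sketch, assuming the default port 11434 and checking from the host itself):

```shell
# /api/tags lists installed models; any successful reply means Ollama is up.
if curl -fsS --max-time 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
  status="up"
else
  status="down"
fi
echo "ollama: $status"
```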
## Configuration
- **Model**: Phi-3 Mini 3.8B Q4
- **Context Window**: 8K tokens (practical limit)
- **VRAM Usage**: ~2.5GB
- **Concurrency**: 1-2 requests max
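The 8K practical limit can be enforced per request through Ollama's `options.num_ctx` field. A sketch of the payload (the model tag matches the one pulled above; the `curl` target is the server address from the setup section):

```shell
# Build the /api/chat payload with an explicit 8K context cap.
# Send it with: curl http://<1050-ip>:11434/api/chat -d "$payload"
payload='{
  "model": "phi3:mini-q4_0",
  "messages": [{"role": "user", "content": "Hello"}],
  "options": {"num_ctx": 8192},
  "stream": false
}'
# Sanity-check the JSON locally before sending
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload: valid JSON"
```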
## API
Ollama serves its native chat endpoint at `/api/chat` (an OpenAI-compatible API is also available under `/v1`):
```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```
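With `"stream": false`, the generated text arrives in the reply's `message.content` field. A sketch of extracting it (the response below is illustrative, not captured from a live server):

```shell
# Pull the assistant's text out of a non-streaming /api/chat reply.
response='{"model":"phi3:mini-q4_0","message":{"role":"assistant","content":"Hi there!"},"done":true}'
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])'
# → Hi there!
```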
## Systemd Service
See `ollama-1050.service` for systemd configuration.
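The actual unit file is not reproduced here; as a minimal sketch of what it might contain (binary path and unit names are assumptions), a unit that binds Ollama to all interfaces could look like:

```ini
[Unit]
Description=Ollama LLM server (family agent)
After=network-online.target

[Service]
# Bind address is set via environment, matching the setup section above
Environment=OLLAMA_HOST=0.0.0.0
ExecStart=/usr/local/bin/ollama serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```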