# 1050 LLM Server (Family Agent)

LLM server for the family agent, running Phi-3 Mini 3.8B Q4 on the GTX 1050.

## Setup

### Using Ollama (Recommended)

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download model
ollama pull phi3:mini-q4_0

# Start server, listening on all interfaces (`ollama serve` takes no --host
# flag; the bind address is set via the OLLAMA_HOST environment variable)
OLLAMA_HOST=0.0.0.0 ollama serve
# Runs on http://<1050-ip>:11434
```
## Configuration

- **Model**: Phi-3 Mini 3.8B Q4
- **Context Window**: 8K tokens (practical limit)
- **VRAM Usage**: ~2.5GB
- **Concurrency**: 1-2 requests max

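The limits above can be enforced through Ollama's environment variables and a Modelfile. A sketch, assuming the values from the table (the `phi3-family` model name is a hypothetical label, not from this project):

```bash
# Cap concurrency so the 1050 is not oversubscribed (values mirror the table above)
export OLLAMA_NUM_PARALLEL=2        # at most 2 simultaneous requests
export OLLAMA_MAX_LOADED_MODELS=1   # keep only one model resident in VRAM

# Pin the 8K practical context limit per-model via a Modelfile
cat > Modelfile <<'EOF'
FROM phi3:mini-q4_0
PARAMETER num_ctx 8192
EOF
# ollama create phi3-family -f Modelfile   # run against a live server; name is illustrative
```

The environment variables must be set in whatever environment `ollama serve` starts from (e.g. the systemd unit), not in a client shell.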
## API

Ollama exposes its native chat endpoint at `/api/chat` (an OpenAI-compatible API is also served under `/v1/chat/completions`):

```bash
curl http://<1050-ip>:11434/api/chat -d '{
  "model": "phi3:mini-q4_0",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```
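With `"stream": false`, the reply arrives as a single JSON object and the assistant text lives under `message.content`. A sketch of extracting it (the response body here is a canned example of the shape, not live server output):

```bash
# Canned example of a non-streaming /api/chat response (shape only, not live output)
response='{"model":"phi3:mini-q4_0","message":{"role":"assistant","content":"Hello! How can I help?"},"done":true}'

# Pull out the assistant text; python3 does the JSON parsing in case jq is absent
reply=$(printf '%s' "$response" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["message"]["content"])')
echo "$reply"
```

In a real pipeline the `curl` call above would feed the parser directly instead of the canned string.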
## Systemd Service

See `ollama-1050.service` for systemd configuration.
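The unit file itself is not reproduced here; a minimal sketch of what `ollama-1050.service` might contain (binary path and service account are assumptions, not taken from the actual file):

```ini
# Illustrative sketch only -- the real ollama-1050.service may differ
[Unit]
Description=Ollama LLM server (family agent, Phi-3 Mini)
After=network-online.target
Wants=network-online.target

[Service]
# Bind to all interfaces so other machines on the LAN can reach the API
Environment=OLLAMA_HOST=0.0.0.0
# Assumed install path and service account
ExecStart=/usr/local/bin/ollama serve
User=ollama
Restart=on-failure

[Install]
WantedBy=multi-user.target
```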