# 4080 LLM Server (Work Agent)

LLM server for the work agent running on a remote GPU VM.

## Server Information

- **Host**: 10.0.30.63
- **Port**: 11434
- **Endpoint**: http://10.0.30.63:11434
- **Service**: Ollama

## Available Models

The server has the following models available:

- `deepseek-r1:70b` - 70B model (currently configured)
- `deepseek-r1:671b` - 671B model
- `llama3.1:8b` - Llama 3.1 8B
- `qwen2.5:14b` - Qwen 2.5 14B
- And others (see `test_connection.py`)

## Configuration

Edit `config.py` to change the model:

```python
MODEL_NAME = "deepseek-r1:70b"  # or your preferred model
```

## Testing Connection

```bash
cd home-voice-agent/llm-servers/4080
python3 test_connection.py
```

This will:

1. Test server connectivity
2. List available models
3. Test the chat endpoint with the configured model

## API Usage

### List Models

```bash
curl http://10.0.30.63:11434/api/tags
```

### Chat Request

```bash
curl http://10.0.30.63:11434/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "stream": false
}'
```

### With Function Calling

```bash
curl http://10.0.30.63:11434/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [
    {"role": "user", "content": "What is the weather in San Francisco?"}
  ],
  "tools": [...],
  "stream": false
}'
```

## Integration

The MCP adapter can connect to this server by setting:

```python
OLLAMA_BASE_URL = "http://10.0.30.63:11434"
```

## Notes

- The server is already running on the GPU VM
- No local installation is needed - just configure the endpoint
- Model selection can be changed in `config.py`
- If you need `llama3.1:70b-q4_0`, pull it on the server:

```bash
# On the GPU VM
ollama pull llama3.1:70b-q4_0
```
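For scripting against the server, the `/api/tags` listing shown under API Usage can also be queried from Python with only the standard library. A minimal sketch — the helper names here are illustrative and are not part of `test_connection.py`:

```python
import json
import urllib.request


def parse_model_names(tags_response):
    """Extract model names from the JSON body returned by /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_models(base_url="http://10.0.30.63:11434"):
    """Query /api/tags on the server and return the advertised model names."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))


# Example (requires network access to the GPU VM):
# print(list_models())
```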
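The curl chat request above can likewise be issued from Python. A minimal non-streaming sketch using only the standard library (`build_chat_payload` and `chat` are illustrative names, not part of this repo):

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://10.0.30.63:11434"  # endpoint from this README


def build_chat_payload(model, messages, stream=False):
    """Build the JSON body expected by Ollama's /api/chat endpoint."""
    return {"model": model, "messages": messages, "stream": stream}


def chat(model, messages):
    """POST a non-streaming chat request and return the assistant's reply text."""
    payload = build_chat_payload(model, messages)
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Non-streaming responses carry the reply under message.content
    return body["message"]["content"]


# Example (requires network access to the server):
# reply = chat("deepseek-r1:70b", [{"role": "user", "content": "Hello"}])
```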
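The `tools` array is elided in the function-calling example above. Ollama's `/api/chat` accepts OpenAI-style function schemas for tools; a hedged sketch of one tool definition, where `get_weather` and its parameters are made-up examples rather than anything defined in this repo:

```python
# One entry for the "tools" array of a /api/chat request.
# The get_weather tool and its schema are illustrative assumptions.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. San Francisco",
                },
            },
            "required": ["city"],
        },
    },
}

# The full request body would then include: "tools": [weather_tool]
```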