llm_council/docs/DEPLOYMENT.md
Irina Levit 3546c04348
feat: Major UI/UX improvements and production readiness
## Features Added

### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents
- Autocomplete dropdown appears when typing @ with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
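
The resolution order above (numeric refs > filename refs > all documents) can be sketched in a few lines; the function name and document shape here are hypothetical illustrations, not the actual `docs_context.py` API:

```python
import re

def resolve_refs(message, documents):
    """Resolve @N and @filename references against an ordered document list.

    documents: list of filenames, where @1 refers to documents[0], etc.
    Returns the referenced filenames; falls back to all documents when the
    message contains no references.
    """
    # Numeric references take priority: @1, @2, ...
    numeric = [int(n) for n in re.findall(r"@(\d+)\b", message)]
    if numeric:
        return [documents[n - 1] for n in numeric if 0 < n <= len(documents)]
    # Then fuzzy filename references: case-insensitive substring match
    names = re.findall(r"@([\w.-]+)", message)
    if names:
        return [d for d in documents if any(n.lower() in d.lower() for n in names)]
    # No references: include every document
    return documents
```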

### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with actual filenames

### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode (better visibility)
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges

### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool)
- Added .env.example with all configuration options
- Updated timeout documentation

### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports TEST_MESSAGE and TEST_DOCS env vars
- Improved Makefile with dev and test-setup targets

### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory

### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification)
- Documented required env for GPU offload on this VM:
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`

## Technical Changes

### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: PATCH /api/conversations/{id}/title, GET /api/conversations/search
- Improved report generation with filename substitution

### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design

## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py

## Breaking Changes
None - all changes are backward compatible

## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
2025-12-28 18:15:02 -05:00


# Deployment Guide

## Overview

LLM Council can be deployed in several configurations depending on your needs:

- **Local Development**: Everything runs on your local machine
- **Hybrid**: Frontend/Backend local, LLM server on remote GPU VM
- **Full Remote**: Everything on a server/VM
- **Production**: Professional deployment with proper infrastructure

## Architecture Options

### Option 1: Hybrid (Local App + Remote GPU Ollama)

**Setup:**

- Frontend + Backend: run on your local machine
- LLM Server (Ollama): runs on a remote GPU VM

**Pros:**

- Easy development and debugging
- GPU resources available remotely
- No need to deploy frontend/backend code
- Fast iteration

**Cons:**

- Requires network connectivity to the GPU VM
- Added latency on LLM requests

**Configuration:**

```bash
# .env on local machine
USE_LOCAL_OLLAMA=false
OPENAI_COMPAT_BASE_URL=http://your-gpu-vm-ip:11434
```

### Option 2: Full Remote Deployment

**Setup:**

- Everything runs on the GPU VM or a dedicated server

**Pros:**

- Centralized deployment
- Accessible from multiple machines
- Better for team use

**Cons:**

- More complex setup
- Requires proper security configuration
- Slower development iteration

### Option 3: Production Deployment (Professional)

**Recommended Stack:**

- Frontend: serve the static build via nginx/CDN
- Backend: run via systemd/gunicorn/uvicorn behind a reverse proxy
- LLM Server: separate service on the GPU VM
- Security: TLS/HTTPS, authentication, rate limiting
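
The reverse-proxy piece of this stack can be sketched as a minimal nginx site config. The hostname, certificate paths, backend port, and build directory below are illustrative assumptions, not values shipped by this project:

```nginx
server {
    listen 443 ssl;
    server_name council.example.com;                  # hypothetical hostname

    ssl_certificate     /etc/ssl/certs/council.pem;   # assumed cert paths
    ssl_certificate_key /etc/ssl/private/council.key;

    # Static frontend build (assumed output directory)
    root /var/www/llm-council/dist;

    # Proxy API calls to the backend (uvicorn assumed on port 8000)
    location /api/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_read_timeout 600s;   # council rounds can be slow
    }

    location / {
        try_files $uri /index.html;
    }
}
```

The long `proxy_read_timeout` mirrors the 600-second LLM timeouts configured later in this guide.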

## GPU VM Setup

### Prerequisites

1. GPU VM with:
   - NVIDIA GPU with CUDA support
   - Sufficient VRAM for your models
   - Network access from your local machine
2. Ollama installed on the GPU VM:

   ```bash
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

### Step 1: Configure Ollama to Accept Remote Connections

On the GPU VM:

```bash
# Option A: Environment variable (temporary)
export OLLAMA_HOST=0.0.0.0:11434

# Option B: Systemd service (persistent - recommended)
sudo systemctl edit ollama
```

Add to the override file:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
```

Then restart:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

### Step 2: Configure Firewall

On the GPU VM:

```bash
# Allow port 11434 from your local machine only
sudo ufw allow from YOUR_LOCAL_IP to any port 11434
# Or allow from anywhere (less secure)
sudo ufw allow 11434/tcp
```

### Step 3: Pull Required Models

On the GPU VM:

```bash
ollama pull qwen2.5:7b
ollama pull llama3.1:8b
ollama pull qwen2.5:14b
ollama pull qwen2:latest
```

### Step 3.5: Ensure Ollama Uses the GPU and Stores Data on /mnt/data

If your VM has a small root disk, keep Ollama's model storage and HOME off the root filesystem; a full root disk is a common cause of hard-to-diagnose failures. Also note that on this setup, `OLLAMA_LLM_LIBRARY=cuda` caused Ollama to skip its CUDA libraries; use `cuda_v12` instead.

On the GPU VM:

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_MODELS=/mnt/data/ollama"
Environment="HOME=/mnt/data/ollama/home"
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12"
EOF

sudo systemctl daemon-reload
sudo systemctl restart ollama
```

Verify GPU offload (on the GPU VM):

```bash
ollama run qwen2:latest "Write 80 words about GPUs."
ollama ps
# "ollama ps" should report the model loaded on GPU, not 100% CPU
```

### Step 4: Verify Remote Access

From the local machine:

```bash
curl http://YOUR_GPU_VM_IP:11434/api/tags
# Should return the list of available models
curl http://YOUR_GPU_VM_IP:11434/v1/models
# Should return the same models via the OpenAI-compatible API
```

### Step 5: Configure LLM Council

In `.env` on the local machine:

```bash
USE_LOCAL_OLLAMA=false
OPENAI_COMPAT_BASE_URL=http://YOUR_GPU_VM_IP:11434

# Local (small) example:
# COUNCIL_MODELS=llama3.2:1b,qwen2.5:0.5b,gemma2:2b
# CHAIRMAN_MODEL=llama3.2:3b

# GPU (available models):
COUNCIL_MODELS=qwen2.5:7b,llama3.1:8b,qwen2:latest
CHAIRMAN_MODEL=qwen2.5:14b
```

## Security Considerations

### For Development/Internal Use

1. **Network Security:**
   - Use a VPN or private network
   - Restrict the firewall to specific IPs
   - Consider an SSH tunnel for extra security
2. **Ollama Security:**
   - Ollama has no built-in authentication
   - Only expose it on trusted networks
   - Consider a reverse proxy with auth (nginx + basic auth)
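
One cheap way to avoid exposing Ollama at all is the SSH tunnel mentioned above; the username and host below are placeholders:

```bash
# Forward local port 11434 to the GPU VM's Ollama;
# port 11434 can then stay closed in the VM's firewall.
ssh -N -L 11434:localhost:11434 user@YOUR_GPU_VM_IP

# Then point the backend at the tunnel instead of the VM's public IP:
#   OPENAI_COMPAT_BASE_URL=http://localhost:11434
```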

### For Production

1. **Authentication:**
   - Add API key authentication to the backend
   - Use session-based auth for the frontend
   - Implement rate limiting
2. **Network Security:**
   - Use HTTPS/TLS everywhere
   - Set up proper firewall rules
   - Consider a reverse proxy (nginx/traefik)
3. **Infrastructure:**
   - Use container orchestration (Docker Compose/Kubernetes)
   - Set up monitoring and logging
   - Implement a backup strategy for conversations
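
As a sketch of the container route, a minimal Docker Compose file might look like the following; the image build contexts, ports, and volume paths are assumptions for illustration, not files shipped by this repo:

```yaml
services:
  backend:
    build: ./backend            # hypothetical build context
    ports:
      - "8000:8000"
    env_file: .env              # reuse the same configuration shown above
    volumes:
      - ./data:/app/data        # persist conversations for backup
  frontend:
    build: ./frontend           # hypothetical build context
    ports:
      - "80:80"
    depends_on:
      - backend
```

Mounting the conversation store as a named or bind volume is what makes the backup strategy above practical: backing up `./data` captures all conversations.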

## Deployment Scripts

### Quick Start (Local + Remote Ollama)

```bash
# 1. Start Ollama on the GPU VM (already running if systemd is configured)
# 2. On the local machine:
./start.sh
```

### Full Remote Deployment

See docs/DEPLOYMENT_FULL.md for complete remote deployment instructions.

## Troubleshooting

### Connection Timeouts

1. Check that Ollama is listening on all interfaces:

   ```bash
   # On the GPU VM
   sudo netstat -tlnp | grep 11434
   # Should show 0.0.0.0:11434, not 127.0.0.1:11434
   ```

2. Check firewall rules:

   ```bash
   # On the GPU VM
   sudo ufw status
   ```

3. Test connectivity:

   ```bash
   # From the local machine
   curl -v http://GPU_VM_IP:11434/api/tags
   ```

### Model Loading Issues

1. Check available VRAM:

   ```bash
   nvidia-smi
   ```

2. Adjust `OLLAMA_MAX_LOADED_MODELS` if needed
3. Check model sizes against available memory

## Performance Tuning

### Ollama Settings

In the systemd override on the GPU VM:

```ini
Environment="OLLAMA_KEEP_ALIVE=24h"        # Keep models loaded in memory
Environment="OLLAMA_MAX_LOADED_MODELS=3"   # Max models loaded concurrently
Environment="OLLAMA_NUM_PARALLEL=1"        # Parallel requests per model
```

### LLM Council Timeouts

Adjust in `.env`:

```bash
LLM_TIMEOUT_SECONDS=600.0              # For slow models
CHAIRMAN_TIMEOUT_SECONDS=600.0
OPENAI_COMPAT_TIMEOUT_SECONDS=600.0
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS=30.0
OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS=30.0
OPENAI_COMPAT_POOL_TIMEOUT_SECONDS=30.0
```