## Features Added
### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering prioritizes numeric references, then filename references, and falls back to all documents when no reference is given (see the sketch after this list)
- Autocomplete dropdown appears when typing @ with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
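A minimal sketch of that resolution order (numeric refs first, then filename refs, then all documents). The helper name, the document fields, and the substring-based "fuzzy" match are illustrative assumptions, not the exact `docs_context.py` implementation:

```python
import re

def resolve_references(message: str, docs: list[dict]) -> list[dict]:
    """Illustrative resolver: @N numeric refs win, then @filename refs,
    otherwise fall back to every uploaded document."""
    numeric_refs = {int(n) for n in re.findall(r"@(\d+)\b", message)}
    if numeric_refs:
        return [d for d in docs if d["number"] in numeric_refs]

    name_refs = [t.lower() for t in re.findall(r"@([\w.\-]+)", message) if not t.isdigit()]
    if name_refs:
        # "Fuzzy" is simplified to a substring match here; the real matcher may differ.
        return [d for d in docs if any(ref in d["filename"].lower() for ref in name_refs)]

    return docs  # no references: include all documents
```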
### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with the actual filenames (sketched below)
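The @N-to-filename substitution for exported reports can be sketched as a single regex pass; the helper and field names are again illustrative rather than the exact backend code:

```python
import re

def substitute_refs(report: str, docs: list[dict]) -> str:
    """Replace @N tokens with the referenced document's filename for export."""
    by_number = {d["number"]: d["filename"] for d in docs}

    def swap(match: re.Match) -> str:
        n = int(match.group(1))
        return by_number.get(n, match.group(0))  # leave unknown refs untouched

    return re.sub(r"@(\d+)\b", swap, report)
```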
### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode (better visibility)
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges
### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool); see the sketch after this list
- Added .env.example with all configuration options
- Updated timeout documentation
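Assuming the backend's HTTP client is `httpx` (the connect/write/pool split suggests it), the configurable timeouts could be wired roughly as below; the environment variable names are placeholders, and `.env.example` is the authoritative list:

```python
import os
import httpx

# Placeholder variable names; read the real ones from .env.example / config.py.
timeout = httpx.Timeout(
    float(os.getenv("HTTP_TIMEOUT", "300")),                 # default for all categories
    connect=float(os.getenv("HTTP_CONNECT_TIMEOUT", "10")),
    write=float(os.getenv("HTTP_WRITE_TIMEOUT", "30")),
    pool=float(os.getenv("HTTP_POOL_TIMEOUT", "10")),
)
client = httpx.AsyncClient(timeout=timeout)
```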
### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports `TEST_MESSAGE` and `TEST_DOCS` env vars (see the sketch after this list)
- Improved Makefile with dev and test-setup targets
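For orientation, a rough sketch of what the test-setup flow could look like; the endpoint path, payload shape, and defaults below are assumptions for illustration, not the actual `scripts/test_setup.py`:

```python
import os
import httpx

# All names below are illustrative assumptions (endpoint, payload, defaults).
API_BASE = os.getenv("API_BASE", "http://localhost:8000")
TEST_MESSAGE = os.getenv("TEST_MESSAGE", "Summarize @1 in two sentences.")
TEST_DOCS = [p.strip() for p in os.getenv("TEST_DOCS", "").split(",") if p.strip()]

def main() -> None:
    resp = httpx.post(f"{API_BASE}/api/conversations", json={"message": TEST_MESSAGE})
    resp.raise_for_status()
    print("Created test conversation:", resp.json())

if __name__ == "__main__":
    main()
```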
### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory
### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification)
- Documented required env for GPU offload on this VM:
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
## Technical Changes
### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: `PATCH /api/conversations/{id}/title` and `GET /api/conversations/search` (example requests below)
- Improved report generation with filename substitution
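Example requests against the new endpoints (the conversation id, JSON field, and query parameter names are assumptions for illustration):

```python
import httpx

BASE = "http://localhost:8000"  # adjust to wherever the backend runs

# Rename a conversation (field name "title" assumed).
httpx.patch(f"{BASE}/api/conversations/123/title", json={"title": "GPU sizing notes"})

# Search conversations by title and content (query parameter "q" assumed).
print(httpx.get(f"{BASE}/api/conversations/search", params={"q": "ollama"}).json())
```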
### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design
## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py
## Breaking Changes
None - all changes are backward compatible
## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
# GPU VM Setup - Quick Reference

## Quick Setup Steps

### 1. On GPU VM: Configure Ollama to Accept Remote Connections
```bash
# Create systemd override (quoted heredoc so backticks/variables are written literally)
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_KEEP_ALIVE=24h"
# Keep Ollama storage off root disk (recommended) and ensure the service user
# has a writable HOME (runners/keys/cache).
Environment="OLLAMA_MODELS=/mnt/data/ollama"
Environment="HOME=/mnt/data/ollama/home"
# IMPORTANT (GPU): on this VM, OLLAMA_LLM_LIBRARY=cuda caused Ollama to SKIP CUDA.
# Use the libdir selector instead.
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
# Ensure the dynamic linker can resolve Ollama's bundled CUDA + ggml libs.
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12"
EOF

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify
curl http://0.0.0.0:11434/api/tags
```
Verify GPU is actually used (on the GPU VM):

```bash
ollama run qwen2:latest "Write 80 words about GPUs."
ollama ps
watch -n 0.2 nvidia-smi
```
### 2. On GPU VM: Configure Firewall

```bash
# Allow port 11434 (adjust IP/subnet as needed)
sudo ufw allow from YOUR_LOCAL_IP to any port 11434

# Or allow from any source (less secure)
sudo ufw allow 11434/tcp
```
### 3. On GPU VM: Pull Required Models

```bash
ollama pull qwen2.5:7b
ollama pull llama3.1:8b
ollama pull qwen2.5:14b
ollama pull qwen2:latest
```
### 4. On Local Machine: Configure .env

```bash
USE_LOCAL_OLLAMA=false
OPENAI_COMPAT_BASE_URL=http://YOUR_GPU_VM_IP:11434

# Local (small) example:
# COUNCIL_MODELS=llama3.2:1b,qwen2.5:0.5b,gemma2:2b
# CHAIRMAN_MODEL=llama3.2:3b

# GPU (available models):
COUNCIL_MODELS=qwen2.5:7b,llama3.1:8b,qwen2:latest
CHAIRMAN_MODEL=qwen2.5:14b
```
### 5. Test Connection

```bash
# From local machine
curl http://YOUR_GPU_VM_IP:11434/api/tags
curl http://YOUR_GPU_VM_IP:11434/v1/models
```
## Troubleshooting

- Connection timeout: check that Ollama is listening on `0.0.0.0:11434` (not `127.0.0.1`)
- Firewall blocking: check `sudo ufw status` and allow port 11434
- CPU instead of GPU: run `ollama ps` and confirm it doesn't say `100% CPU`. If it does:
  - Ensure `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - Ensure `LD_LIBRARY_PATH` includes `/usr/local/lib/ollama` and `/usr/local/lib/ollama/cuda_v12`
  - Ensure `OLLAMA_MODELS` and `HOME` are on a disk with free space (a full root disk can break the runner/cache)
See DEPLOYMENT.md for detailed instructions.