## Features Added
### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents (resolution order sketched after this list)
- Autocomplete dropdown appears when typing @ with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
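
A minimal sketch of that resolution order, with hypothetical names (simple substring matching stands in for the real fuzzy matching in `docs_context.py`):

```python
# Hypothetical sketch of @-reference resolution, not the actual
# docs_context.py implementation; substring match stands in for fuzzy.
import re

def resolve_refs(message: str, documents: list[str]) -> list[str]:
    # Numeric refs (@1, @2, ...) take priority; @1 is the first document.
    numeric = [int(n) for n in re.findall(r"@(\d+)", message)]
    if numeric:
        return [documents[n - 1] for n in numeric if 0 < n <= len(documents)]
    # Next, filename refs: @report matches "report-2024.md".
    names = re.findall(r"@([\w.-]+)", message)
    hits = [d for d in documents if any(n.lower() in d.lower() for n in names)]
    # Fall back to all documents when nothing matches.
    return hits or documents
```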
### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with actual filenames (substitution sketched after this list)
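
A minimal sketch of that substitution, assuming the same ordered document list (hypothetical helper, not the project's actual report code):

```python
# Hypothetical sketch: swap @N references for filenames at export time.
import re

def substitute_refs(text: str, documents: list[str]) -> str:
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        # Leave the reference untouched if it points outside the list.
        return documents[n - 1] if 0 < n <= len(documents) else m.group(0)
    return re.sub(r"@(\d+)", repl, text)
```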
### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode (better visibility)
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges
### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool); see the sketch after this list
- Added .env.example with all configuration options
- Updated timeout documentation
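
Assuming the backend's HTTP calls go through httpx, the wiring might look roughly like this (env var names and defaults are illustrative, not necessarily those in `.env.example`):

```python
# Illustrative sketch only; the project's actual setting names may differ.
import os

import httpx

timeout = httpx.Timeout(
    300.0,  # default for anything not set below (e.g. read)
    connect=float(os.environ.get("HTTP_CONNECT_TIMEOUT", "5")),
    write=float(os.environ.get("HTTP_WRITE_TIMEOUT", "30")),
    pool=float(os.environ.get("HTTP_POOL_TIMEOUT", "5")),
)
client = httpx.AsyncClient(timeout=timeout)
```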
### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports TEST_MESSAGE and TEST_DOCS env vars (see the sketch after this list)
- Improved Makefile with dev and test-setup targets
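
The env-var handling in `scripts/test_setup.py` presumably reduces to something like this (the defaults shown are made up):

```python
# Illustrative sketch; the script's real defaults may differ.
import os

test_message = os.environ.get("TEST_MESSAGE", "Hello from test setup")
test_docs = [d.strip() for d in os.environ.get("TEST_DOCS", "").split(",") if d.strip()]
```

Invoked via the Makefile, e.g. `TEST_MESSAGE="Summarize @1" TEST_DOCS="a.md,b.md" make test-setup`.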
### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory
### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added `/v1/models` verification; sketched below)
- Documented required env for GPU offload on this VM:
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
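
The `/v1/models` check can be scripted as well; a minimal sketch (`GPU_VM_IP` is a placeholder for the VM's address):

```python
# Minimal verification sketch against Ollama's OpenAI-compatible endpoint.
import httpx

r = httpx.get("http://GPU_VM_IP:11434/v1/models", timeout=5.0)
r.raise_for_status()
print([m["id"] for m in r.json()["data"]])  # model IDs served by Ollama
```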
## Technical Changes
### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: `PATCH /api/conversations/{id}/title`, `GET /api/conversations/search` (rename endpoint sketched after this list)
- Improved report generation with filename substitution
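
Inferred from the endpoint path and the new `update_conversation_title` helper, the rename endpoint plausibly looks like this (request model and return shape are assumptions):

```python
# Sketch of the rename endpoint; details beyond the path and the
# storage.update_conversation_title call are assumptions.
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

import storage  # the project's storage.py

router = APIRouter()

class TitleUpdate(BaseModel):
    title: str

@router.patch("/api/conversations/{conversation_id}/title")
def rename_conversation(conversation_id: str, body: TitleUpdate):
    if not storage.update_conversation_title(conversation_id, body.title):
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"id": conversation_id, "title": body.title}
```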
### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design
## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py
## Breaking Changes
None - all changes are backward compatible
## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
# Professional Deployment Recommendations

## Recommended Architecture

### For Development/Personal Use
**Hybrid Approach (Recommended):**

```
┌─────────────────┐         ┌──────────────────┐
│  Local Machine  │         │      GPU VM      │
│                 │         │                  │
│  Frontend       │         │  Ollama Server   │
│  (React/Vite)   │         │  (LLM Models)    │
│                 │◄────────┤                  │
│  Backend        │  HTTP   │  Port 11434      │
│  (FastAPI)      │         │                  │
└─────────────────┘         └──────────────────┘
```

**Why this is best:**
- ✅ Fast development iteration
- ✅ Easy debugging (logs on local machine)
- ✅ GPU resources available remotely
- ✅ No complex deployment needed
- ✅ Can work offline (if models cached locally)
### For Production/Team Use

**Full Remote Deployment:**

```
┌─────────────────┐
│     Users       │
│   (Browsers)    │
└────────┬────────┘
         │ HTTPS
         ▼
┌─────────────────┐
│  Reverse Proxy  │
│ (nginx/traefik) │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
┌────────┐ ┌──────────┐
│Frontend│ │ Backend  │
│(Static)│ │(FastAPI) │
└────────┘ └────┬─────┘
                │ HTTP
                ▼
        ┌──────────────┐
        │    GPU VM    │
        │    Ollama    │
        └──────────────┘
```
## Comparison of Approaches

| Aspect | Hybrid (Local + Remote) | Full Remote | Production |
|--------|------------------------|-------------|------------|
| **Setup Complexity** | Low | Medium | High |
| **Development Speed** | Fast | Medium | Slow |
| **Security** | Medium | Medium | High |
| **Scalability** | Low | Medium | High |
| **Cost** | Low | Medium | High |
| **Best For** | Dev/Personal | Team/Internal | Public/Enterprise |
## Security Best Practices

### Development/Internal Network

1. **Network Isolation:**
   - Use private network/VPN
   - Restrict firewall to specific IPs
   - Consider SSH tunnel for Ollama

2. **Ollama Access:**
   ```bash
   # Only allow from specific IPs
   sudo ufw allow from YOUR_IP to any port 11434
   ```

3. **SSH Tunnel Alternative:**
   ```bash
   # More secure - no direct network exposure
   ssh -L 11434:localhost:11434 user@gpu-vm
   # Then use localhost:11434 in .env
   ```
### Production Deployment

1. **Authentication:**
   - Add API keys to the backend (sketched after this list)
   - Implement user sessions
   - Use OAuth2/JWT for the API

2. **Network Security:**
   - HTTPS/TLS everywhere
   - WAF (Web Application Firewall)
   - Rate limiting
   - DDoS protection

3. **Infrastructure:**
   - Container orchestration (Docker/K8s)
   - Service mesh for internal communication
   - Monitoring and alerting
   - Automated backups
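
A minimal sketch of the "API keys" item above, assuming a FastAPI dependency and an `X-API-Key` header (the header name, env var, and route are illustrative, not this project's code):

```python
# Illustrative API-key guard for the FastAPI backend; details are assumptions.
import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ["API_KEY"]  # set in the deployment environment

def require_api_key(x_api_key: str = Header(...)) -> None:
    # FastAPI maps the x_api_key parameter to the X-API-Key request header.
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.get("/api/conversations", dependencies=[Depends(require_api_key)])
def list_conversations() -> list:
    return []
```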
## Deployment Checklist

### Hybrid Setup (Recommended for Dev)

- [ ] GPU VM has Ollama installed
- [ ] Ollama configured to listen on 0.0.0.0:11434
- [ ] Firewall allows connections from local machine
- [ ] Models pulled on GPU VM
- [ ] Local `.env` configured with GPU VM IP
- [ ] Test connection: `curl http://GPU_VM_IP:11434/api/tags`
### Full Remote Setup

- [ ] Server/VM provisioned
- [ ] Frontend built and served (nginx/static host)
- [ ] Backend running as service (systemd/supervisor)
- [ ] Reverse proxy configured (nginx/traefik)
- [ ] SSL certificates installed
- [ ] Authentication implemented
- [ ] Monitoring set up
- [ ] Backup strategy in place
### Production Setup

- [ ] All of the Full Remote checklist
- [ ] Load balancing configured
- [ ] Database for conversations (optional upgrade)
- [ ] Logging and monitoring (Prometheus/Grafana)
- [ ] CI/CD pipeline
- [ ] Security audit
- [ ] Documentation for ops team
- [ ] Disaster recovery plan
## Cost Considerations

### Hybrid (Local + Remote GPU VM)
- **Cost**: GPU VM only (~$0.50-2/hour depending on GPU)
- **Best for**: Development, personal projects, small teams

### Full Remote
- **Cost**: GPU VM + application server (~$1-3/hour)
- **Best for**: Teams, internal tools

### Production
- **Cost**: $100-1000+/month depending on scale
- **Best for**: Public services, enterprise
## Migration Path

1. **Start**: Hybrid (local dev, remote GPU)
2. **Grow**: Full remote (when the team needs it)
3. **Scale**: Production (when going public/enterprise)
## Recommendation

**For your use case (development/personal):**

Use the **Hybrid approach**:
- Run frontend + backend locally
- Connect to Ollama on the GPU VM
- Use an SSH tunnel for extra security if needed
- Simple, fast, cost-effective

This gives you:
- Fast development iteration
- Easy debugging
- GPU resources when needed
- Minimal infrastructure complexity
- Low cost