## Features Added
### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents (see the sketch after this list)
- Autocomplete dropdown appears when typing `@`, with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
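
A minimal sketch of that resolution order (numeric refs, then filename refs, then all documents); the function name and the document dict shape are illustrative, not the exact `docs_context.py` API:

```python
import re
from difflib import get_close_matches

def resolve_references(message: str, docs: list[dict]) -> list[dict]:
    """Pick which uploaded docs to inject: @N first, then @filename, else all."""
    # Numeric references: @1, @2, ... map to upload order (1-based).
    numeric = {int(n) for n in re.findall(r"@(\d+)\b", message)}
    if numeric:
        return [d for d in docs if d["number"] in numeric]

    # Filename references: @report fuzzily matches e.g. "report-v2.md".
    names = [d["filename"] for d in docs]
    matched = set()
    for token in re.findall(r"@([\w.\-]+)", message):
        matched.update(get_close_matches(token, names, n=1, cutoff=0.4))
    if matched:
        return [d for d in docs if d["filename"] in matched]

    # No references: fall back to all documents.
    return docs
```
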
### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with actual filenames
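
The export substitution can be a single regex pass; this sketch assumes a `number -> filename` mapping rather than the exact report-generation code:

```python
import re

def replace_doc_refs(markdown: str, filenames_by_number: dict[int, str]) -> str:
    """Rewrite @1, @2, ... into the uploaded filenames for exported reports."""
    def _sub(match: re.Match) -> str:
        number = int(match.group(1))
        return filenames_by_number.get(number, match.group(0))

    return re.sub(r"@(\d+)\b", _sub, markdown)
```
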
### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode (better visibility)
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges
### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool); see the config sketch after this list
- Added .env.example with all configuration options
- Updated timeout documentation
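
A sketch of how `config.py` might read the new timeout settings and hand them to an `httpx` client; the default values and the use of `httpx` are assumptions, not confirmed from the code:

```python
import os

import httpx

def _float_env(name: str, default: float) -> float:
    """Read a float setting from the environment, falling back to a default."""
    value = os.getenv(name)
    return float(value) if value else default

# Default values below are illustrative, not the project's actual defaults.
READ_TIMEOUT = _float_env("OPENAI_COMPAT_TIMEOUT_SECONDS", 120.0)
CONNECT_TIMEOUT = _float_env("OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS", 10.0)
WRITE_TIMEOUT = _float_env("OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS", 30.0)
POOL_TIMEOUT = _float_env("OPENAI_COMPAT_POOL_TIMEOUT_SECONDS", 10.0)

# httpx lets connect/write/pool be tuned separately from the read timeout.
TIMEOUT = httpx.Timeout(
    READ_TIMEOUT,
    connect=CONNECT_TIMEOUT,
    write=WRITE_TIMEOUT,
    pool=POOL_TIMEOUT,
)
```
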
### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports TEST_MESSAGE and TEST_DOCS env vars (sketch after this list)
- Improved Makefile with dev and test-setup targets
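
A rough sketch of what `scripts/test_setup.py` does with those variables; the conversation-creation endpoint, response shape, and multipart field name shown here are assumptions based on the API described elsewhere in this PR:

```python
import os

import httpx

BASE_URL = os.getenv("BASE_URL", "http://localhost:8001")
TEST_MESSAGE = os.getenv("TEST_MESSAGE", "Hello, council!")
TEST_DOCS = [p for p in os.getenv("TEST_DOCS", "").split(",") if p]

with httpx.Client(base_url=BASE_URL, timeout=30.0) as client:
    # Create a conversation to test against (endpoint path and "id" key assumed).
    conv_id = client.post("/api/conversations").json()["id"]

    # Upload each markdown file listed in TEST_DOCS (comma-separated paths).
    for path in TEST_DOCS:
        with open(path, "rb") as fh:
            client.post(
                f"/api/conversations/{conv_id}/documents",
                files={"file": (os.path.basename(path), fh, "text/markdown")},
            )

    # The real script would then send TEST_MESSAGE through the streaming
    # message endpoint; printing it keeps this sketch short.
    print(f"Conversation {conv_id} ready; test message: {TEST_MESSAGE!r}")
```
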
### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory
### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification)
- Documented required env for GPU offload on this VM (see the drop-in example below):
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
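
As a concrete illustration, these variables would typically be set in a systemd drop-in for the Ollama unit; the drop-in path and unit name below are assumptions, only the environment values come from this PR:

```ini
# /etc/systemd/system/ollama.service.d/override.conf  (example path, not from the docs)
[Service]
Environment="OLLAMA_MODELS=/mnt/data/ollama"
Environment="HOME=/mnt/data/ollama/home"
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12"
```

Reload with `systemctl daemon-reload` and restart the service for the changes to take effect.
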
## Technical Changes
### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: `PATCH /api/conversations/{id}/title`, `GET /api/conversations/search` (rename sketch after this list)
- Improved report generation with filename substitution
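
A sketch of how the rename path might fit together, from the FastAPI handler down to the `storage.py` helper; the request model and storage internals are illustrative, not the actual implementation:

```python
import json
from pathlib import Path

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter()

class TitleUpdate(BaseModel):
    title: str

def update_conversation_title(conversation_id: str, title: str) -> bool:
    """storage.py-style helper: rewrite the conversation's JSON file with the new title."""
    path = Path("data/conversations") / f"{conversation_id}.json"
    if not path.exists():
        return False
    data = json.loads(path.read_text())
    data["title"] = title
    path.write_text(json.dumps(data, indent=2))
    return True

@router.patch("/api/conversations/{conversation_id}/title")
def rename_conversation(conversation_id: str, payload: TitleUpdate):
    """PATCH /api/conversations/{id}/title - set a new conversation title."""
    if not update_conversation_title(conversation_id, payload.title):
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"id": conversation_id, "title": payload.title}
```
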
### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design
## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py
## Breaking Changes
None - all changes are backward compatible
## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
## Architecture

### Overview

LLM Council is a local web app with:

- **Frontend**: React + Vite (`frontend/`) on `:5173`
- **Backend**: FastAPI (`backend/`) on `:8001`
- **Storage**: JSON conversations + uploaded markdown docs on disk (`data/`)
- **LLM Provider**: pluggable backend client

### Runtime data flow

1. UI sends a message to the backend (`/api/conversations/{id}/message/stream`).
2. Backend loads any uploaded markdown docs for the conversation and injects them as additional context.
3. Backend runs a 3-stage pipeline:
   - **Stage 1**: query each council model in parallel
   - **Stage 2**: anonymized peer review + ranking
   - **Stage 3**: chairman synthesis
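
A condensed sketch of that pipeline shape, assuming an async `chat(model, prompt)` callable from the provider layer; function and variable names are illustrative, not the actual `council.py` code:

```python
import asyncio

async def run_council(prompt: str, council_models: list[str],
                      chairman_model: str, chat) -> str:
    """Three-stage council: parallel answers, peer review, chairman synthesis."""
    # Stage 1: ask every council model in parallel.
    answers = await asyncio.gather(*(chat(m, prompt) for m in council_models))

    # Stage 2: each model reviews the anonymized answers of its peers.
    anonymized = "\n\n".join(f"Answer {i + 1}:\n{a}" for i, a in enumerate(answers))
    review_prompt = f"Rank these anonymous answers to: {prompt}\n\n{anonymized}"
    reviews = await asyncio.gather(*(chat(m, review_prompt) for m in council_models))

    # Stage 3: the chairman synthesizes answers and rankings into one reply.
    synthesis_prompt = (
        f"Question: {prompt}\n\nAnswers:\n{anonymized}\n\nReviews:\n"
        + "\n\n".join(reviews)
    )
    return await chat(chairman_model, synthesis_prompt)
```
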

### LLM provider layer

The backend uses OpenAI-compatible API servers (Ollama, vLLM, TGI, etc.).

Configuration:

- `USE_LOCAL_OLLAMA=true` - automatically sets the base URL to `http://localhost:11434`
- `OPENAI_COMPAT_BASE_URL` - set to your server URL (e.g., `http://remote-server:11434`)

The provider (`backend/openai_compat.py`) targets servers that expose:

- `POST /v1/chat/completions`
- `GET /v1/models`

The council orchestration uses the unified interface in `backend/llm_client.py`.
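
A minimal example of the kind of request the provider sends to such a server, using raw `httpx` for illustration rather than the actual `openai_compat.py` code:

```python
import httpx

BASE_URL = "http://localhost:11434"  # or the value of OPENAI_COMPAT_BASE_URL

def chat_completion(model: str, prompt: str) -> str:
    """Call an OpenAI-compatible /v1/chat/completions endpoint and return the reply."""
    response = httpx.post(
        f"{BASE_URL}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120.0,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Quick sanity check that the server is reachable and lists your models:
print(httpx.get(f"{BASE_URL}/v1/models", timeout=10.0).json())
```
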

### Document uploads and references

- Per-conversation markdown documents are stored under: `data/docs/<conversation_id>/`
- Documents are automatically numbered (1, 2, 3, etc.) based on upload order
- Documents can be referenced in prompts using:
  - Numeric references: `@1`, `@2`, `@3` (by upload order)
  - Filename references: `@filename` (fuzzy matching)
- Backend endpoints:
  - `GET /api/conversations/{id}/documents`
  - `POST /api/conversations/{id}/documents` (multipart file)
  - `GET /api/conversations/{id}/documents/{doc_id}` (preview/truncated)
  - `DELETE /api/conversations/{id}/documents/{doc_id}`
- Document context is automatically injected when referenced in user queries
- Export reports replace `@1`, `@2` references with actual filenames
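
For example, uploading a document and checking its assigned number looks roughly like this; the multipart field name `file` and the response shape are assumptions:

```python
import httpx

BASE_URL = "http://localhost:8001"
conv_id = "your-conversation-id"

with httpx.Client(base_url=BASE_URL) as client:
    # Upload a markdown document to the conversation.
    with open("design-notes.md", "rb") as fh:
        client.post(
            f"/api/conversations/{conv_id}/documents",
            files={"file": ("design-notes.md", fh, "text/markdown")},
        )

    # List documents to see the numbers assigned by upload order (@1, @2, ...).
    print(client.get(f"/api/conversations/{conv_id}/documents").json())
```

A prompt such as `@1 summarize this` or `@design-notes summarize this` would then inject that document into the council context.
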

### Conversation management

- Conversations stored as JSON files in `data/conversations/`
- Features:
  - Create, list, get, delete conversations
  - Rename conversations (inline editing)
  - Search conversations by title and message content
  - Export conversations as markdown reports
  - Auto-generate titles from first message

### Frontend features

- **Document autocomplete**: Type `@` to see numbered document list with autocomplete
- **Conversation search**: Search box filters conversations by title/content
- **Theme toggle**: Light/dark mode support
- **Streaming responses**: Real-time updates as models respond
- **Document preview**: View uploaded documents inline
- **Export reports**: Download conversations as markdown files

### Configuration

Primary runtime config is via `.env` (gitignored). Key settings:

- Model configuration: `COUNCIL_MODELS`, `CHAIRMAN_MODEL`
- Timeouts: `LLM_TIMEOUT_SECONDS`, `CHAIRMAN_TIMEOUT_SECONDS`, `OPENAI_COMPAT_TIMEOUT_SECONDS`
- HTTP client timeouts: `OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS`, `OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS`, `OPENAI_COMPAT_POOL_TIMEOUT_SECONDS`
- Document limits: `MAX_DOC_BYTES`, `MAX_DOC_PREVIEW_CHARS`

Useful endpoints:

- `GET /api/llm/status` and `GET /api/llm/status?probe=true`
- `GET /api/conversations/search?q=...` - Search conversations
- `PATCH /api/conversations/{id}/title` - Rename conversation
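
Example calls against these endpoints (the `title` request-body field for the PATCH is an assumption):

```python
import httpx

BASE_URL = "http://localhost:8001"

with httpx.Client(base_url=BASE_URL) as client:
    # Provider status, optionally probing the configured models.
    print(client.get("/api/llm/status", params={"probe": "true"}).json())

    # Search conversations by title or message content.
    print(client.get("/api/conversations/search", params={"q": "timeout"}).json())

    # Rename a conversation (request-body shape assumed).
    client.patch("/api/conversations/some-conversation-id/title",
                 json={"title": "GPU offload notes"})
```
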

### Security model (local dev)

This is currently built for local/private network usage.

If you deploy beyond localhost, add:

- auth (session/token)
- rate limits
- upload limits
- network restrictions / TLS