## Features Added
### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents
- Autocomplete dropdown appears when typing @ with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with actual filenames
### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode (better visibility)
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges
### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool)
- Added .env.example with all configuration options
- Updated timeout documentation
### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports TEST_MESSAGE and TEST_DOCS env vars
- Improved Makefile with dev and test-setup targets
### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory
### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification)
- Documented required env for GPU offload on this VM:
- `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
- `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
- `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
## Technical Changes
### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: PATCH /api/conversations/{id}/title, GET /api/conversations/search
- Improved report generation with filename substitution
### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design
## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py
## Breaking Changes
None - all changes are backward compatible
## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
LLM Council
The idea of this repo is that instead of asking a question to a single LLM, you can group multiple LLMs into your "LLM Council". This repo is a simple, local web app that essentially looks like ChatGPT except it sends your query to multiple LLMs via an OpenAI-compatible API (Ollama, vLLM, TGI, etc.), it then asks them to review and rank each other's work, and finally a Chairman LLM produces the final response.
In a bit more detail, here is what happens when you submit a query:
- Stage 1: First opinions. The user query is given to all LLMs individually, and the responses are collected. The individual responses are shown in a "tab view", so that the user can inspect them all one by one.
- Stage 2: Review. Each individual LLM is given the responses of the other LLMs. Under the hood, the LLM identities are anonymized so that the LLM can't play favorites when judging their outputs. The LLM is asked to rank them in accuracy and insight.
- Stage 3: Final response. The designated Chairman of the LLM Council takes all of the model's responses and compiles them into a single final answer that is presented to the user.
Vibe Code Alert
This project was 99% vibe coded as a fun Saturday hack because I wanted to explore and evaluate a number of LLMs side by side in the process of reading books together with LLMs. It's nice and useful to see multiple responses side by side, and also the cross-opinions of all LLMs on each other's outputs. You're not going to support it in any way, it's provided here as is for other people's inspiration and you don't intend to improve it. Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like.
Setup
1. Install Dependencies
The project uses uv for project management.
Backend:
uv sync
Frontend:
cd frontend
npm install
cd ..
2. Configure Ollama Server
LLM Council requires an OpenAI-compatible API server. The easiest way to get started is with Ollama running locally or on a remote server.
For local Ollama:
- Install and start Ollama: https://ollama.ai
- Pull some models:
ollama pull llama3.2:3b
ollama pull qwen2.5:3b
ollama pull gemma2:2b
3. Configure Environment
Create a .env file in the project root with your configuration:
For local Ollama:
USE_LOCAL_OLLAMA=true
COUNCIL_MODELS=llama3.2:3b,qwen2.5:3b,gemma2:2b
CHAIRMAN_MODEL=llama3.2:3b
MAX_TOKENS=1024
LLM_MAX_CONCURRENCY=1
For remote Ollama or other OpenAI-compatible server:
OPENAI_COMPAT_BASE_URL=http://your-server:11434
COUNCIL_MODELS=llama3.2:3b,qwen2.5:3b,gemma2:2b
CHAIRMAN_MODEL=llama3.2:3b
MAX_TOKENS=2048
LLM_MAX_CONCURRENCY=1
Optional timeout configuration:
LLM_TIMEOUT_SECONDS=120.0 # Default timeout for LLM queries
CHAIRMAN_TIMEOUT_SECONDS=180.0 # Timeout for chairman synthesis
TITLE_GENERATION_TIMEOUT_SECONDS=120.0 # Timeout for title generation
OPENAI_COMPAT_TIMEOUT_SECONDS=300.0 # Timeout for OpenAI-compatible server
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS=10.0 # HTTP connection timeout
OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS=10.0 # HTTP write timeout
OPENAI_COMPAT_POOL_TIMEOUT_SECONDS=10.0 # HTTP pool timeout
See .env.example for all available configuration options. Alternatively, you can edit backend/config.py directly to set defaults.
Running the Application
Option 1: Use the start script
./start.sh
Option 2: Use Makefile
make dev
Option 3: Run manually
Terminal 1 (Backend):
uv run python -m backend.main
Terminal 2 (Frontend):
cd frontend
npm run dev
Then open http://localhost:5173 in your browser.
Option 4: Test setup with pre-configured conversation
# Set in .env:
# TEST_MESSAGE="Your message"
# TEST_DOCS="doc1.md,doc2.md"
make test-setup
This creates a new conversation with today's date/time, uploads documents, and pre-fills the message in the UI (it does not auto-send).
Frontend theme default (optional)
By default, the UI theme is persisted in localStorage. If there is no saved theme yet, you can set a default theme via a Vite env var:
# Example (starts in dark mode if there's no localStorage value yet)
VITE_DEFAULT_THEME=dark make dev
Using Ollama on a Remote Server
If you have Ollama running on a remote server or VM:
- In your project
.env, set:
OPENAI_COMPAT_BASE_URL=http://your-server-ip:11434
COUNCIL_MODELS=llama3.2:3b,qwen2.5:3b,gemma2:2b
CHAIRMAN_MODEL=llama3.2:3b
MAX_TOKENS=2048
LLM_MAX_CONCURRENCY=1
- Verify connectivity from your machine:
curl http://your-server-ip:11434/api/tags
Using Other OpenAI-Compatible Servers (vLLM, TGI, etc.)
If you're running vLLM, TGI, or another OpenAI-compatible server:
-
Ensure your server exposes:
POST /v1/chat/completionsGET /v1/models
-
In your project
.env, set:
OPENAI_COMPAT_BASE_URL=http://your-server:port
COUNCIL_MODELS=your-model-1,your-model-2,your-model-3
CHAIRMAN_MODEL=your-model-1
MAX_TOKENS=2048
LLM_MAX_CONCURRENCY=1
# (optional) if your server requires auth:
# OPENAI_COMPAT_API_KEY=...
- Verify connectivity:
curl http://your-server:port/v1/models
Documentation
- Architecture - System architecture and design
- Deployment Guide - How to deploy with remote GPU VM
- Deployment Recommendations - Professional deployment options
Tech Stack
- Backend: FastAPI (Python 3.10+), async httpx, OpenAI-compatible API
- Frontend: React + Vite, react-markdown for rendering
- Storage: JSON files in
data/conversations/ - Package Management: uv for Python, npm for JavaScript
- LLM Backend: Ollama, vLLM, TGI, or any OpenAI-compatible server