## Architecture

### Overview

LLM Council is a local web app with:

- **Frontend**: React + Vite (`frontend/`) on `:5173`
- **Backend**: FastAPI (`backend/`) on `:8001`
- **Storage**: JSON conversations + uploaded markdown docs on disk (`data/`)
- **LLM Provider**: pluggable backend client

### Runtime data flow

1. The UI sends a message to the backend (`/api/conversations/{id}/message/stream`).
2. The backend loads any uploaded markdown docs for the conversation and injects them as additional context.
3. The backend runs a 3-stage pipeline:
   - **Stage 1**: query each council model in parallel
   - **Stage 2**: anonymized peer review + ranking
   - **Stage 3**: chairman synthesis

### LLM provider layer

The backend uses OpenAI-compatible API servers (Ollama, vLLM, TGI, etc.).

Configuration:

- `USE_LOCAL_OLLAMA=true` - automatically sets the base URL to `http://localhost:11434`
- `OPENAI_COMPAT_BASE_URL` - set to your server URL (e.g., `http://remote-server:11434`)

The provider (`backend/openai_compat.py`) targets servers that expose:

- `POST /v1/chat/completions`
- `GET /v1/models`

The council orchestration uses the unified interface in `backend/llm_client.py`.

### Document uploads and references

- Per-conversation markdown documents are stored under `data/docs//`
- Documents are automatically numbered (1, 2, 3, etc.) based on upload order
- Documents can be referenced in prompts using:
  - Numeric references: `@1`, `@2`, `@3` (by upload order)
  - Filename references: `@filename` (fuzzy matching)
- Backend endpoints:
  - `GET /api/conversations/{id}/documents`
  - `POST /api/conversations/{id}/documents` (multipart file)
  - `GET /api/conversations/{id}/documents/{doc_id}` (preview/truncated)
  - `DELETE /api/conversations/{id}/documents/{doc_id}`
- Document context is automatically injected when referenced in user queries
- Export reports replace `@1`, `@2` references with the actual filenames

### Conversation management

- Conversations are stored as JSON files in `data/conversations/`
- Features:
  - Create, list, get, and delete conversations
  - Rename conversations (inline editing)
  - Search conversations by title and message content
  - Export conversations as markdown reports
  - Auto-generate titles from the first message

### Frontend features

- **Document autocomplete**: type `@` to see a numbered document list with autocomplete
- **Conversation search**: a search box filters conversations by title/content
- **Theme toggle**: light/dark mode support
- **Streaming responses**: real-time updates as models respond
- **Document preview**: view uploaded documents inline
- **Export reports**: download conversations as markdown files

### Configuration

Primary runtime config is via `.env` (gitignored).
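A minimal `.env` might look like the following; the variable names come from the settings listed below, but every value (including the model names) is illustrative:

```shell
# Illustrative values only - adjust for your setup
USE_LOCAL_OLLAMA=true
COUNCIL_MODELS=llama3.1:8b,qwen2.5:7b,mistral:7b
CHAIRMAN_MODEL=llama3.1:70b
LLM_TIMEOUT_SECONDS=120
MAX_DOC_BYTES=1048576
```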
Key settings:

- Model configuration: `COUNCIL_MODELS`, `CHAIRMAN_MODEL`
- Timeouts: `LLM_TIMEOUT_SECONDS`, `CHAIRMAN_TIMEOUT_SECONDS`, `OPENAI_COMPAT_TIMEOUT_SECONDS`
- HTTP client timeouts: `OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS`, `OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS`, `OPENAI_COMPAT_POOL_TIMEOUT_SECONDS`
- Document limits: `MAX_DOC_BYTES`, `MAX_DOC_PREVIEW_CHARS`

Useful endpoints:

- `GET /api/llm/status` and `GET /api/llm/status?probe=true`
- `GET /api/conversations/search?q=...` - search conversations
- `PATCH /api/conversations/{id}/title` - rename a conversation

### Security model (local dev)

This is currently built for local/private network usage. If you deploy beyond localhost, add:

- auth (session/token)
- rate limits
- upload limits
- network restrictions / TLS
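The 3-stage pipeline described under "Runtime data flow" can be sketched as below. This is a simplified illustration, not the actual `backend/llm_client.py` interface; `ask_model` is a hypothetical stand-in for whatever function dispatches a single chat completion:

```python
import asyncio

async def ask_model(model: str, prompt: str) -> str:
    """Stand-in for one chat-completion call to one model."""
    return f"[{model}] answer to: {prompt}"

async def run_council(models: list[str], chairman: str, prompt: str) -> str:
    # Stage 1: query each council model in parallel
    answers = await asyncio.gather(*(ask_model(m, prompt) for m in models))

    # Stage 2: anonymized peer review - responses are numbered so
    # reviewers rank them without seeing which model wrote which
    anonymized = "\n".join(f"Response {i + 1}: {a}" for i, a in enumerate(answers))
    reviews = await asyncio.gather(
        *(ask_model(m, f"Rank these responses:\n{anonymized}") for m in models)
    )

    # Stage 3: the chairman synthesizes answers + reviews into one reply
    synthesis_prompt = (
        f"Question: {prompt}\n\nAnswers:\n{anonymized}\n\nReviews:\n"
        + "\n".join(reviews)
    )
    return await ask_model(chairman, synthesis_prompt)

result = asyncio.run(run_council(["model-a", "model-b"], "chairman", "What is 2+2?"))
```

The `asyncio.gather` calls capture the one structural point that matters here: stages are sequential, but the model calls inside each stage run concurrently.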
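The `@` references described under "Document uploads and references" could be resolved along these lines. `resolve_refs` and its ordered-filename input are hypothetical, and the fuzzy matching here uses `difflib`, which may differ from what the backend actually does:

```python
import re
from difflib import get_close_matches

def resolve_refs(text: str, filenames: list[str]) -> list[str]:
    """Map @1-style and @filename-style references to filenames.

    `filenames` is assumed to be ordered by upload time, so @1 is the
    first uploaded document. Fuzzy matching via difflib is illustrative.
    """
    resolved = []
    for ref in re.findall(r"@([\w.-]+)", text):
        if ref.isdigit():  # numeric reference: @1, @2, ... (1-based)
            idx = int(ref) - 1
            if 0 <= idx < len(filenames):
                resolved.append(filenames[idx])
        else:              # filename reference: fuzzy match
            match = get_close_matches(ref, filenames, n=1, cutoff=0.4)
            if match:
                resolved.append(match[0])
    return resolved

print(resolve_refs("compare @1 with @notes", ["spec.md", "notes.md"]))
# → ['spec.md', 'notes.md']
```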