llm_council/ARCHITECTURE.md

## Architecture

### Overview
LLM Council is a local web app with:
- **Frontend**: React + Vite (`frontend/`) on `:5173`
- **Backend**: FastAPI (`backend/`) on `:8001`
- **Storage**: JSON conversations + uploaded markdown docs on disk (`data/`)
- **LLM Provider**: pluggable backend client

### Runtime data flow
1. UI sends a message to backend (`/api/conversations/{id}/message/stream`).
2. Backend loads any uploaded markdown docs for the conversation and injects them as additional context.
3. Backend runs a 3-stage pipeline:
   - **Stage 1**: query each council model in parallel
   - **Stage 2**: anonymized peer review + ranking
   - **Stage 3**: chairman synthesis

### LLM provider layer
The backend uses OpenAI-compatible API servers (Ollama, vLLM, TGI, etc.).

Configuration:
- `USE_LOCAL_OLLAMA=true` - automatically sets base URL to `http://localhost:11434`
- `OPENAI_COMPAT_BASE_URL` - set to your server URL (e.g., `http://remote-server:11434`)

The provider (`backend/openai_compat.py`) targets servers that expose:
- `POST /v1/chat/completions`
- `GET /v1/models`

The council orchestration uses the unified interface in `backend/llm_client.py`.

### Document uploads and references
- Per-conversation markdown documents are stored under: `data/docs/<conversation_id>/`
- Documents are automatically numbered (1, 2, 3, etc.) based on upload order
- Documents can be referenced in prompts using:
  - Numeric references: `@1`, `@2`, `@3` (by upload order)
  - Filename references: `@filename` (fuzzy matching)
- Backend endpoints:
  - `GET /api/conversations/{id}/documents`
  - `POST /api/conversations/{id}/documents` (multipart file)
  - `GET /api/conversations/{id}/documents/{doc_id}` (preview/truncated)
  - `DELETE /api/conversations/{id}/documents/{doc_id}`
- Document context is automatically injected when referenced in user queries
- Export reports replace `@1`, `@2` references with actual filenames

### Conversation management
- Conversations stored as JSON files in `data/conversations/`
- Features:
  - Create, list, get, delete conversations
  - Rename conversations (inline editing)
  - Search conversations by title and message content
  - Export conversations as markdown reports
  - Auto-generate titles from first message

### Frontend features
- **Document autocomplete**: Type `@` to see numbered document list with autocomplete
- **Conversation search**: Search box filters conversations by title/content
- **Theme toggle**: Light/dark mode support
- **Streaming responses**: Real-time updates as models respond
- **Document preview**: View uploaded documents inline
- **Export reports**: Download conversations as markdown files

### Configuration
Primary runtime config is via `.env` (gitignored). Key settings:
- Model configuration: `COUNCIL_MODELS`, `CHAIRMAN_MODEL`
- Timeouts: `LLM_TIMEOUT_SECONDS`, `CHAIRMAN_TIMEOUT_SECONDS`, `OPENAI_COMPAT_TIMEOUT_SECONDS`
- HTTP client timeouts: `OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS`, `OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS`, `OPENAI_COMPAT_POOL_TIMEOUT_SECONDS`
- Document limits: `MAX_DOC_BYTES`, `MAX_DOC_PREVIEW_CHARS`

Useful endpoints:
- `GET /api/llm/status` and `GET /api/llm/status?probe=true`
- `GET /api/conversations/search?q=...` - Search conversations
- `PATCH /api/conversations/{id}/title` - Rename conversation

### Security model (local dev)
This is currently built for local/private network usage.
If you deploy beyond localhost, add:
- auth (session/token)
- rate limits
- upload limits
- network restrictions / TLS