# Architecture

## Overview

LLM Council is a local web app with:

- Frontend: React + Vite (`frontend/`) on `:5173`
- Backend: FastAPI (`backend/`) on `:8001`
- Storage: JSON conversations + uploaded markdown docs on disk (`data/`)
- LLM provider: pluggable backend client

## Runtime data flow

1. The UI sends a message to the backend (`/api/conversations/{id}/message/stream`).
2. The backend loads any uploaded markdown docs for the conversation and injects them as additional context.
3. The backend runs a 3-stage pipeline (sketched below):
   - Stage 1: query each council model in parallel
   - Stage 2: anonymized peer review + ranking
   - Stage 3: chairman synthesis
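
A minimal sketch of the pipeline's shape. `ask_model`, `review_answers`, and `synthesize` are hypothetical helpers standing in for the real orchestration (likely `backend/council.py`):

```python
import asyncio

async def run_council(prompt: str, council: list[str], chairman: str) -> str:
    # Stage 1: query every council model in parallel.
    answers = await asyncio.gather(*(ask_model(m, prompt) for m in council))

    # Stage 2: anonymize the answers (models see "Response A", "Response B",
    # ... rather than peer names), then have each model review and rank them.
    anonymized = {f"Response {chr(65 + i)}": a for i, a in enumerate(answers)}
    reviews = await asyncio.gather(
        *(review_answers(m, prompt, anonymized) for m in council)
    )

    # Stage 3: the chairman synthesizes a final answer from the original
    # answers plus the peer reviews.
    return await synthesize(chairman, prompt, answers, reviews)
```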

## LLM provider layer

The backend talks to OpenAI-compatible API servers (Ollama, vLLM, TGI, etc.).

Configuration:

- `USE_LOCAL_OLLAMA=true` automatically sets the base URL to `http://localhost:11434`
- `OPENAI_COMPAT_BASE_URL` points at your own server (e.g., `http://remote-server:11434`)

The provider (`backend/openai_compat.py`) targets servers that expose:

- `POST /v1/chat/completions`
- `GET /v1/models`

The council orchestration uses the unified interface in `backend/llm_client.py`.
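
A minimal sketch of a client for those two endpoints, using `httpx`. The base URL and timeouts are illustrative; the real provider logic lives in `backend/openai_compat.py`:

```python
import httpx

BASE_URL = "http://localhost:11434"  # or whatever OPENAI_COMPAT_BASE_URL is

def list_models() -> list[str]:
    resp = httpx.get(f"{BASE_URL}/v1/models", timeout=10.0)
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"]]

def chat(model: str, prompt: str) -> str:
    resp = httpx.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```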

## Document uploads and references

- Per-conversation markdown documents are stored under `data/docs/<conversation_id>/`.
- Documents are numbered automatically (1, 2, 3, etc.) by upload order.
- Documents can be referenced in prompts using (see the parsing sketch below):
  - Numeric references: `@1`, `@2`, `@3` (by upload order)
  - Filename references: `@filename` (fuzzy matching)
- Backend endpoints:
  - `GET /api/conversations/{id}/documents`
  - `POST /api/conversations/{id}/documents` (multipart file)
  - `GET /api/conversations/{id}/documents/{doc_id}` (preview/truncated)
  - `DELETE /api/conversations/{id}/documents/{doc_id}`
- Document context is injected automatically when a document is referenced in a user query.
- Exported reports replace `@1`, `@2` references with the actual filenames.
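
A sketch of how the references might be resolved, assuming the conversation's docs are a list of filenames in upload order. The regex and the `difflib` fuzzy match are illustrative stand-ins for the real logic in `backend/docs_context.py`:

```python
import difflib
import re

REF_RE = re.compile(r"@(\w[\w.-]*)")

def resolve_refs(prompt: str, docs: list[str]) -> list[str]:
    matched = []
    for ref in REF_RE.findall(prompt):
        if ref.isdigit():  # numeric reference: @1 is the first upload
            idx = int(ref) - 1
            if 0 <= idx < len(docs):
                matched.append(docs[idx])
        else:  # filename reference: fuzzy match against stored names
            matched.extend(difflib.get_close_matches(ref, docs, n=1, cutoff=0.4))
    return matched

# e.g. resolve_refs("compare @1 with @notes", ["plan.md", "notes.md"])
# -> ["plan.md", "notes.md"]
```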

## Conversation management

- Conversations are stored as JSON files in `data/conversations/`.
- Features:
  - Create, list, get, and delete conversations
  - Rename conversations (inline editing)
  - Search conversations by title and message content (see the sketch below)
  - Export conversations as markdown reports
  - Auto-generate titles from the first message
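
An illustrative sketch of title/content search over those JSON files. The field names (`title`, `messages`, `content`) are assumptions about the storage schema; the real implementation lives in `backend/storage.py`:

```python
import json
from pathlib import Path

CONV_DIR = Path("data/conversations")

def search_conversations(query: str) -> list[dict]:
    q = query.lower()
    hits = []
    for path in CONV_DIR.glob("*.json"):
        conv = json.loads(path.read_text())
        text = " ".join(
            [conv.get("title", "")]
            + [m.get("content", "") for m in conv.get("messages", [])]
        )
        if q in text.lower():
            hits.append(conv)
    return hits
```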

## Frontend features

- Document autocomplete: type `@` to see a numbered document list with autocomplete
- Conversation search: a search box filters conversations by title/content
- Theme toggle: light/dark mode support
- Streaming responses: real-time updates as models respond
- Document preview: view uploaded documents inline
- Export reports: download conversations as markdown files

## Configuration

Primary runtime config is via `.env` (gitignored; an illustrative example follows the list). Key settings:

- Model configuration: `COUNCIL_MODELS`, `CHAIRMAN_MODEL`
- Timeouts: `LLM_TIMEOUT_SECONDS`, `CHAIRMAN_TIMEOUT_SECONDS`, `OPENAI_COMPAT_TIMEOUT_SECONDS`
- HTTP client timeouts: `OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS`, `OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS`, `OPENAI_COMPAT_POOL_TIMEOUT_SECONDS`
- Document limits: `MAX_DOC_BYTES`, `MAX_DOC_PREVIEW_CHARS`
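
An illustrative `.env`. The model names and values are examples rather than defaults, and the comma-separated list format for `COUNCIL_MODELS` is an assumption; `.env.example` lists every option:

```
COUNCIL_MODELS=llama3.1:8b,qwen2.5:7b,mistral:7b
CHAIRMAN_MODEL=llama3.1:70b
USE_LOCAL_OLLAMA=true
LLM_TIMEOUT_SECONDS=120
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS=10
MAX_DOC_BYTES=1048576
```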

Useful endpoints (example calls below):

- `GET /api/llm/status` (and `GET /api/llm/status?probe=true`)
- `GET /api/conversations/search?q=...` searches conversations
- `PATCH /api/conversations/{id}/title` renames a conversation
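
Example calls with `httpx`. The base URL assumes the default backend port, and the `title` field name in the PATCH body is an assumption:

```python
import httpx

API = "http://localhost:8001"

# Search conversations by title/content.
results = httpx.get(f"{API}/api/conversations/search", params={"q": "ollama"})
print(results.json())

# Rename a conversation.
conv_id = "example-conversation-id"  # hypothetical ID
httpx.patch(f"{API}/api/conversations/{conv_id}/title",
            json={"title": "GPU setup notes"})
```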

## Security model (local dev)

This is currently built for local/private network usage. If you deploy beyond localhost, add:

- auth (session/token)
- rate limits
- upload limits
- network restrictions / TLS