# Architecture

## Overview

LLM Council is a local web app with:

- Frontend: React + Vite (`frontend/`) on `:5173`
- Backend: FastAPI (`backend/`) on `:8001`
- Storage: JSON conversations + uploaded markdown docs on disk (`data/`)
- LLM provider: pluggable backend client

## Runtime data flow

1. The UI sends a message to the backend (`/api/conversations/{id}/message/stream`).
2. The backend loads any uploaded markdown docs for the conversation and injects them as additional context.
3. The backend runs a 3-stage pipeline (sketched below):
   - Stage 1: query each council model in parallel
   - Stage 2: anonymized peer review + ranking
   - Stage 3: chairman synthesis
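
A minimal sketch of the pipeline's shape. `ask_model`, `review_answers`, and `synthesize` are hypothetical helpers standing in for the real orchestration (likely `backend/council.py`):

```python
import asyncio

async def run_council(prompt: str, council: list[str], chairman: str) -> str:
    # Stage 1: query every council model in parallel.
    answers = await asyncio.gather(*(ask_model(m, prompt) for m in council))

    # Stage 2: anonymize the answers (models see "Response A", "Response B",
    # ... rather than peer names), then have each model review and rank them.
    anonymized = {f"Response {chr(65 + i)}": a for i, a in enumerate(answers)}
    reviews = await asyncio.gather(
        *(review_answers(m, prompt, anonymized) for m in council)
    )

    # Stage 3: the chairman synthesizes a final answer from the original
    # answers plus the peer reviews.
    return await synthesize(chairman, prompt, answers, reviews)
```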

## LLM provider layer

The backend talks to OpenAI-compatible API servers (Ollama, vLLM, TGI, etc.).

Configuration:

- `USE_LOCAL_OLLAMA=true` automatically sets the base URL to `http://localhost:11434`
- `OPENAI_COMPAT_BASE_URL` points at your own server (e.g., `http://remote-server:11434`)

The provider (`backend/openai_compat.py`) targets servers that expose:

- `POST /v1/chat/completions`
- `GET /v1/models`

The council orchestration uses the unified interface in `backend/llm_client.py`.
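
A minimal sketch of a client for those two endpoints, using `httpx`. The base URL and timeouts are illustrative; the real provider logic lives in `backend/openai_compat.py`:

```python
import httpx

BASE_URL = "http://localhost:11434"  # or whatever OPENAI_COMPAT_BASE_URL is

def list_models() -> list[str]:
    resp = httpx.get(f"{BASE_URL}/v1/models", timeout=10.0)
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"]]

def chat(model: str, prompt: str) -> str:
    resp = httpx.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```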

## Document uploads and references

- Per-conversation markdown documents are stored under `data/docs/<conversation_id>/`.
- Documents are numbered automatically (1, 2, 3, etc.) by upload order.
- Documents can be referenced in prompts using (see the parsing sketch below):
  - Numeric references: `@1`, `@2`, `@3` (by upload order)
  - Filename references: `@filename` (fuzzy matching)
- Backend endpoints:
  - `GET /api/conversations/{id}/documents`
  - `POST /api/conversations/{id}/documents` (multipart file)
  - `GET /api/conversations/{id}/documents/{doc_id}` (preview/truncated)
  - `DELETE /api/conversations/{id}/documents/{doc_id}`
- Document context is injected automatically when a document is referenced in a user query.
- Exported reports replace `@1`, `@2` references with the actual filenames.
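
A sketch of how the references might be resolved, assuming the conversation's docs are a list of filenames in upload order. The regex and the `difflib` fuzzy match are illustrative stand-ins for the real logic in `backend/docs_context.py`:

```python
import difflib
import re

REF_RE = re.compile(r"@(\w[\w.-]*)")

def resolve_refs(prompt: str, docs: list[str]) -> list[str]:
    matched = []
    for ref in REF_RE.findall(prompt):
        if ref.isdigit():  # numeric reference: @1 is the first upload
            idx = int(ref) - 1
            if 0 <= idx < len(docs):
                matched.append(docs[idx])
        else:  # filename reference: fuzzy match against stored names
            matched.extend(difflib.get_close_matches(ref, docs, n=1, cutoff=0.4))
    return matched

# e.g. resolve_refs("compare @1 with @notes", ["plan.md", "notes.md"])
# -> ["plan.md", "notes.md"]
```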

## Conversation management

- Conversations are stored as JSON files in `data/conversations/`.
- Features:
  - Create, list, get, and delete conversations
  - Rename conversations (inline editing)
  - Search conversations by title and message content (see the sketch below)
  - Export conversations as markdown reports
  - Auto-generate titles from the first message
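
An illustrative sketch of title/content search over those JSON files. The field names (`title`, `messages`, `content`) are assumptions about the storage schema; the real implementation lives in `backend/storage.py`:

```python
import json
from pathlib import Path

CONV_DIR = Path("data/conversations")

def search_conversations(query: str) -> list[dict]:
    q = query.lower()
    hits = []
    for path in CONV_DIR.glob("*.json"):
        conv = json.loads(path.read_text())
        text = " ".join(
            [conv.get("title", "")]
            + [m.get("content", "") for m in conv.get("messages", [])]
        )
        if q in text.lower():
            hits.append(conv)
    return hits
```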

## Frontend features

- Document autocomplete: type `@` to see a numbered document list with autocomplete
- Conversation search: a search box filters conversations by title/content
- Theme toggle: light/dark mode support
- Streaming responses: real-time updates as models respond
- Document preview: view uploaded documents inline
- Export reports: download conversations as markdown files

## Configuration

Primary runtime config is via `.env` (gitignored; an illustrative example follows the list). Key settings:

- Model configuration: `COUNCIL_MODELS`, `CHAIRMAN_MODEL`
- Timeouts: `LLM_TIMEOUT_SECONDS`, `CHAIRMAN_TIMEOUT_SECONDS`, `OPENAI_COMPAT_TIMEOUT_SECONDS`
- HTTP client timeouts: `OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS`, `OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS`, `OPENAI_COMPAT_POOL_TIMEOUT_SECONDS`
- Document limits: `MAX_DOC_BYTES`, `MAX_DOC_PREVIEW_CHARS`
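
An illustrative `.env`. The model names and values are examples rather than defaults, and the comma-separated list format for `COUNCIL_MODELS` is an assumption; `.env.example` lists every option:

```
COUNCIL_MODELS=llama3.1:8b,qwen2.5:7b,mistral:7b
CHAIRMAN_MODEL=llama3.1:70b
USE_LOCAL_OLLAMA=true
LLM_TIMEOUT_SECONDS=120
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS=10
MAX_DOC_BYTES=1048576
```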

Useful endpoints (example calls below):

- `GET /api/llm/status` (and `GET /api/llm/status?probe=true`)
- `GET /api/conversations/search?q=...` searches conversations
- `PATCH /api/conversations/{id}/title` renames a conversation
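
Example calls with `httpx`. The base URL assumes the default backend port, and the `title` field name in the PATCH body is an assumption:

```python
import httpx

API = "http://localhost:8001"

# Search conversations by title/content.
results = httpx.get(f"{API}/api/conversations/search", params={"q": "ollama"})
print(results.json())

# Rename a conversation.
conv_id = "example-conversation-id"  # hypothetical ID
httpx.patch(f"{API}/api/conversations/{conv_id}/title",
            json={"title": "GPU setup notes"})
```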

## Security model (local dev)

This is currently built for local/private network usage. If you deploy beyond localhost, add:

- auth (session/token)
- rate limits
- upload limits
- network restrictions / TLS