## Features Added
### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents (see the sketch after this list)
- Autocomplete dropdown appears when typing @ with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
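A minimal sketch of how the reference resolution in `docs_context.py` could work; the function name, regex, and substring-based fuzzy rule are illustrative assumptions, not the actual implementation:

```python
import re

# Hypothetical sketch: resolve @N and @filename references against a
# conversation's ordered document list.
REF_PATTERN = re.compile(r"@(\w[\w.-]*)")

def resolve_references(message: str, doc_names: list[str]) -> list[str]:
    """Return the documents a message refers to.

    Priority: numeric refs (@1, @2) beat filename refs (@report),
    and with no refs at all, every document is in scope.
    """
    numeric: list[str] = []
    by_name: list[str] = []
    for token in REF_PATTERN.findall(message):
        if token.isdigit():
            idx = int(token) - 1  # @1 is the first document
            if 0 <= idx < len(doc_names):
                numeric.append(doc_names[idx])
        else:
            # Fuzzy match: case-insensitive substring on the filename.
            by_name.extend(n for n in doc_names if token.lower() in n.lower())
    return numeric or by_name or list(doc_names)
```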
### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with the actual filenames (illustrated below)
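A minimal sketch of that export-time substitution, assuming the exporter has the conversation's ordered document names; the function is illustrative, not the shipped code:

```python
import re

def substitute_doc_refs(report: str, doc_names: list[str]) -> str:
    """Rewrite @N tokens as the referenced filename.

    Out-of-range numbers are left untouched.
    """
    def _replace(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(doc_names):
            return doc_names[idx]
        return match.group(0)

    return re.sub(r"@(\d+)", _replace, report)

# substitute_doc_refs("See @1 and @2.", ["spec.md", "notes.md"])
# -> "See spec.md and notes.md."
```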
### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode for better visibility
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges
### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool); see the excerpt after this list
- Added .env.example with all configuration options
- Updated timeout documentation
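For reference, the timeout knobs this exposes, with their defaults taken from `config.py` (the actual `.env.example` lists more options than shown here):

```ini
# HTTP client timeouts (seconds); values shown are the config.py defaults
OPENAI_COMPAT_TIMEOUT_SECONDS=300
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS=10
OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS=10
OPENAI_COMPAT_POOL_TIMEOUT_SECONDS=10
# Per-stage LLM timeouts
LLM_TIMEOUT_SECONDS=120
CHAIRMAN_TIMEOUT_SECONDS=180
```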
### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports `TEST_MESSAGE` and `TEST_DOCS` env vars (example invocation below)
- Improved Makefile with dev and test-setup targets
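A likely invocation; the comma-separated `TEST_DOCS` format is an assumption:

```sh
# Create a test conversation with a custom message and documents.
TEST_MESSAGE="Summarize @1" TEST_DOCS="docs/a.md,docs/b.md" make test-setup
```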
### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory
### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification)
- Documented the environment required for GPU offload on this VM (combined into the drop-in sketch after this list):
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
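Taken together, these map onto a systemd drop-in along these lines. The unit name, drop-in path, and the `OLLAMA_HOST` binding are assumptions; the four environment values are verbatim from the list above:

```ini
# /etc/systemd/system/ollama.service.d/override.conf  (hypothetical path)
[Service]
Environment="OLLAMA_MODELS=/mnt/data/ollama"
Environment="HOME=/mnt/data/ollama/home"
Environment="OLLAMA_LLM_LIBRARY=cuda_v12"
Environment="LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12"
# Assumed binding so the VM serves remote clients on the standard port
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

After `systemctl daemon-reload && systemctl restart ollama`, the documented `/v1/models` check verifies the endpoint, e.g. `curl http://<vm-host>:11434/v1/models`.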
## Technical Changes
### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: `PATCH /api/conversations/{id}/title`, `GET /api/conversations/search` (rename endpoint sketched below)
- Improved report generation with filename substitution
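A sketch of the rename endpoint, assuming a FastAPI-style app (which the `{id}` path syntax suggests); the request model and the return value of `update_conversation_title` are assumptions:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

import storage  # update_conversation_title was added there in this PR

app = FastAPI()

class TitleUpdate(BaseModel):
    title: str

@app.patch("/api/conversations/{id}/title")
def rename_conversation(id: str, payload: TitleUpdate):
    # Assumes update_conversation_title returns falsy for unknown ids.
    if not storage.update_conversation_title(id, payload.title):
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"id": id, "title": payload.title}
```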
### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design
## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py
## Breaking Changes
None - all changes are backward compatible
## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features
The updated `config.py` (110 lines, 4.2 KiB, Python):
"""Configuration for the LLM Council."""
|
|
|
|
import os
|
|
from dotenv import load_dotenv
|
|
|
|
load_dotenv()
|
|
|
|
# Helpers
|
|
def _parse_int_env(name: str, default: int) -> int:
|
|
raw = os.getenv(name)
|
|
if raw is None or raw.strip() == "":
|
|
return default
|
|
try:
|
|
return int(raw.strip())
|
|
except ValueError:
|
|
return default
|
|
|
|
|
|
def _parse_float_env(name: str, default: float) -> float:
|
|
raw = os.getenv(name)
|
|
if raw is None or raw.strip() == "":
|
|
return default
|
|
try:
|
|
return float(raw.strip())
|
|
except ValueError:
|
|
return default
|
|
|
|
|
|
def _parse_list_env(name: str) -> list[str] | None:
|
|
"""
|
|
Parses a list from an env var.
|
|
|
|
Supported formats:
|
|
- Comma-separated: "a,b,c"
|
|
- Newline-separated: "a\\nb\\nc"
|
|
"""
|
|
raw = os.getenv(name)
|
|
if raw is None:
|
|
return None
|
|
raw = raw.strip()
|
|
if raw == "":
|
|
return []
|
|
|
|
# Allow either commas or newlines.
|
|
parts = []
|
|
for chunk in raw.replace("\r\n", "\n").split("\n"):
|
|
parts.extend(chunk.split(","))
|
|
return [p.strip() for p in parts if p.strip()]
|
|
|
|
|
|
# Council members - list of model identifiers (Ollama model names)
|
|
# Can be overridden via env var COUNCIL_MODELS (comma or newline separated).
|
|
_DEFAULT_COUNCIL_MODELS = [
|
|
"llama3.2:3b",
|
|
"qwen2.5:3b",
|
|
"gemma2:2b",
|
|
]
|
|
COUNCIL_MODELS = _parse_list_env("COUNCIL_MODELS") or _DEFAULT_COUNCIL_MODELS
|
|
|
|
# Chairman model - synthesizes final response
|
|
CHAIRMAN_MODEL = os.getenv("CHAIRMAN_MODEL") or "llama3.2:3b"
|
|
|
|
# Maximum tokens per request
|
|
# Default: 2048 tokens (reasonable for most responses)
|
|
# Increase if you need longer responses
|
|
MAX_TOKENS = _parse_int_env("MAX_TOKENS", 2048)
|
|
|
|
# Request timeout configuration (in seconds)
|
|
# Default timeout for general LLM queries (Stage 1: council responses)
|
|
# Used by llm_client.py and passed to openai_compat.query_model()
|
|
LLM_TIMEOUT_SECONDS = _parse_float_env("LLM_TIMEOUT_SECONDS", 120.0)
|
|
# Timeout for chairman synthesis (may need longer for complex responses)
|
|
CHAIRMAN_TIMEOUT_SECONDS = _parse_float_env("CHAIRMAN_TIMEOUT_SECONDS", 180.0)
|
|
# Timeout for title generation (short responses)
|
|
TITLE_GENERATION_TIMEOUT_SECONDS = _parse_float_env("TITLE_GENERATION_TIMEOUT_SECONDS", 120.0)
|
|
|
|
# OpenAI-compatible provider tuning (Ollama / vLLM / TGI)
|
|
# If USE_LOCAL_OLLAMA=true, automatically set base URL to localhost:11434 (convenience flag)
|
|
if os.getenv("USE_LOCAL_OLLAMA", "").strip().lower() in ("true", "1", "yes"):
|
|
_openai_compat_base_url = "http://localhost:11434"
|
|
else:
|
|
_openai_compat_base_url = os.getenv("OPENAI_COMPAT_BASE_URL")
|
|
OPENAI_COMPAT_BASE_URL = _openai_compat_base_url
|
|
|
|
# HTTP client timeout (fallback when timeout not explicitly passed to openai_compat functions)
|
|
# Used by: list_models() and as fallback in query_model() if called directly without timeout
|
|
# Should be >= LLM_TIMEOUT_SECONDS for safety, but list_models() is fast so can be lower
|
|
OPENAI_COMPAT_TIMEOUT_SECONDS = _parse_float_env("OPENAI_COMPAT_TIMEOUT_SECONDS", 300.0)
|
|
# HTTP client connection timeout (time to establish connection)
|
|
OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS = _parse_float_env("OPENAI_COMPAT_CONNECT_TIMEOUT_SECONDS", 10.0)
|
|
# HTTP client write timeout (time to send request)
|
|
OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS = _parse_float_env("OPENAI_COMPAT_WRITE_TIMEOUT_SECONDS", 10.0)
|
|
# HTTP client pool timeout (time to get connection from pool)
|
|
OPENAI_COMPAT_POOL_TIMEOUT_SECONDS = _parse_float_env("OPENAI_COMPAT_POOL_TIMEOUT_SECONDS", 10.0)
|
|
# Number of retries for failed requests (retryable HTTP errors: 408, 409, 425, 429, 500, 502, 503, 504)
|
|
OPENAI_COMPAT_RETRIES = _parse_int_env("OPENAI_COMPAT_RETRIES", 2)
|
|
# Exponential backoff base delay between retries (seconds) - actual delay is backoff * (2^attempt)
|
|
OPENAI_COMPAT_RETRY_BACKOFF_SECONDS = _parse_float_env("OPENAI_COMPAT_RETRY_BACKOFF_SECONDS", 0.5)
|
|
|
|
# Debug mode - show debug logs in console (set DEBUG=true in .env)
|
|
DEBUG = os.getenv("DEBUG", "").strip().lower() in ("true", "1", "yes")
|
|
|
|
# Markdown uploads (per-conversation)
|
|
DOCS_DIR = os.getenv("DOCS_DIR") or "data/docs"
|
|
MAX_DOC_BYTES = _parse_int_env("MAX_DOC_BYTES", 1_000_000) # 1MB
|
|
MAX_DOC_PREVIEW_CHARS = _parse_int_env("MAX_DOC_PREVIEW_CHARS", 20_000)
|
|
|
|
# Data directory for conversation storage
|
|
DATA_DIR = "data/conversations"
|