llm_council/scripts/test_model_timeout.py
Irina Levit 3546c04348
feat: Major UI/UX improvements and production readiness
## Features Added

### Document Reference System
- Implemented numbered document references (@1, @2, etc.) with autocomplete dropdown
- Added fuzzy filename matching for @filename references
- Document filtering now prioritizes numeric refs > filename refs > all documents (see the sketch after this list)
- Autocomplete dropdown appears when typing @, with keyboard navigation (Up/Down, Enter/Tab, Escape)
- Document numbers displayed in UI for easy reference
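
A minimal sketch of that resolution order, assuming documents are kept as an ordered list of filenames. The function name, regexes, and substring matching (standing in for the fuzzier matching described above) are illustrative, not the actual `docs_context.py` code:

```python
import re

def resolve_refs(message: str, documents: list[str]) -> list[str]:
    """Illustrative only: pick documents for a message, preferring
    numeric @N refs, then @filename refs, then all documents."""
    # @1, @2, ... are treated as 1-based indexes into the document list
    numeric = [int(n) for n in re.findall(r"@(\d+)", message)]
    if numeric:
        return [documents[n - 1] for n in numeric if 0 < n <= len(documents)]
    # @filename: case-insensitive substring match stands in for fuzzy matching
    names = re.findall(r"@([\w.\-]+)", message)
    if names:
        return [doc for doc in documents
                if any(name.lower() in doc.lower() for name in names)]
    return documents  # no refs at all: fall back to every document
```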

### Conversation Management
- Added conversation rename functionality with inline editing
- Implemented conversation search (by title and content)
- Search box always visible, even when no conversations exist
- Export reports now replace @N references with actual filenames
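
The export-time substitution is the inverse mapping. A sketch under the same 1-based assumption (`substitute_refs` is a made-up name, not the report generator's API):

```python
import re

def substitute_refs(text: str, documents: list[str]) -> str:
    """Sketch: replace each @N in a report with the N-th filename."""
    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        # Leave out-of-range refs untouched rather than guessing
        return documents[n - 1] if 0 < n <= len(documents) else match.group(0)
    return re.sub(r"@(\d+)", repl, text)
```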

### UI/UX Improvements
- Removed debug toggle button
- Improved text contrast in dark mode
- Made input textarea expand to full available width
- Fixed file text color for better readability
- Enhanced document display with numbered badges

### Configuration & Timeouts
- Made HTTP client timeouts configurable (connect, write, pool), sketched below
- Added .env.example with all configuration options
- Updated timeout documentation
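
As a sketch, the granular limits map onto `httpx.Timeout` roughly like this. Apart from `LLM_TIMEOUT_SECONDS`, which this repo already uses, the env var names and defaults are placeholders; the real ones are listed in `.env.example`:

```python
import os
import httpx

# LLM_TIMEOUT_SECONDS is real; the *_TIMEOUT names and defaults below
# are placeholders for illustration.
timeout = httpx.Timeout(
    float(os.environ.get("LLM_TIMEOUT_SECONDS", "120")),        # default / read
    connect=float(os.environ.get("LLM_CONNECT_TIMEOUT", "10")),
    write=float(os.environ.get("LLM_WRITE_TIMEOUT", "30")),
    pool=float(os.environ.get("LLM_POOL_TIMEOUT", "10")),
)
client = httpx.AsyncClient(timeout=timeout)
```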

### Developer Experience
- Added `make test-setup` target for automated test conversation creation
- Test setup script supports TEST_MESSAGE and TEST_DOCS env vars (see the sketch after this list)
- Improved Makefile with dev and test-setup targets
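
The script reads these with plain environment lookups, along these lines (the defaults and the comma-separated TEST_DOCS format are assumptions; see `scripts/test_setup.py` for the real behavior):

```python
import os

message = os.environ.get("TEST_MESSAGE", "Hello from test setup")  # invented default
docs = [d for d in os.environ.get("TEST_DOCS", "").split(",") if d]
```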

### Documentation
- Updated ARCHITECTURE.md with all new features
- Created comprehensive deployment documentation
- Added GPU VM setup guides
- Removed unnecessary markdown files (CLAUDE.md, CONTRIBUTING.md, header.jpg)
- Organized documentation in docs/ directory

### GPU VM / Ollama (Stability + GPU Offload)
- Updated GPU VM docs to reflect the working systemd environment for remote Ollama
- Standardized remote Ollama port to 11434 (and added /v1/models verification, sketched after this list)
- Documented required env for GPU offload on this VM:
  - `OLLAMA_MODELS=/mnt/data/ollama`, `HOME=/mnt/data/ollama/home`
  - `OLLAMA_LLM_LIBRARY=cuda_v12` (not `cuda`)
  - `LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/cuda_v12`
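
A quick verification from the council host, with the VM address left as a placeholder to fill in:

```python
import httpx

# Substitute the GPU VM's hostname or IP; 11434 is the standardized port.
resp = httpx.get("http://GPU_VM_HOST:11434/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json().get("data", [])])
```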

## Technical Changes

### Backend
- Enhanced `docs_context.py` with reference parsing (numeric and filename)
- Added `update_conversation_title` to storage.py
- New endpoints: PATCH /api/conversations/{id}/title, GET /api/conversations/search (sketched below)
- Improved report generation with filename substitution
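
A hedged sketch of the two endpoints, assuming a FastAPI app in `main.py` (the request body shape, the search helper, and the return conventions are assumptions; only `update_conversation_title` is named in this commit):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from backend import storage  # matches the backend.* imports used elsewhere

app = FastAPI()

class TitleUpdate(BaseModel):
    title: str

@app.patch("/api/conversations/{conversation_id}/title")
async def rename_conversation(conversation_id: str, body: TitleUpdate):
    # Assumes update_conversation_title returns None for an unknown id.
    if storage.update_conversation_title(conversation_id, body.title) is None:
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"id": conversation_id, "title": body.title}

@app.get("/api/conversations/search")
async def search_conversations(q: str):
    # search_conversations is a hypothetical helper; per this commit, the
    # search covers both titles and message content.
    return storage.search_conversations(q)
```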

### Frontend
- Removed debugMode state and related code
- Added autocomplete dropdown component
- Implemented search functionality in Sidebar
- Enhanced ChatInterface with autocomplete and improved textarea sizing
- Updated CSS for better contrast and responsive design

## Files Changed
- Backend: config.py, council.py, docs_context.py, main.py, storage.py
- Frontend: App.jsx, ChatInterface.jsx, Sidebar.jsx, and related CSS files
- Documentation: README.md, ARCHITECTURE.md, new docs/ directory
- Configuration: .env.example, Makefile
- Scripts: scripts/test_setup.py

## Breaking Changes
None; all changes are backward compatible

## Testing
- All existing tests pass
- New test-setup script validates conversation creation workflow
- Manual testing of autocomplete, search, and rename features

#!/usr/bin/env python3
"""Test script to diagnose model timeout issues."""
import asyncio
import time

import httpx

from backend.config import OPENAI_COMPAT_BASE_URL, LLM_TIMEOUT_SECONDS


async def test_model(model: str, max_tokens: int = 10):
    """Test a single model query."""
    print(f"\n{'='*60}")
    print(f"Testing model: {model}")
    print(f"Timeout: {LLM_TIMEOUT_SECONDS}s")
    print(f"Base URL: {OPENAI_COMPAT_BASE_URL}")
    print(f"{'='*60}")

    url = f"{OPENAI_COMPAT_BASE_URL}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": max_tokens,
    }

    start_time = time.time()
    try:
        async with httpx.AsyncClient(timeout=LLM_TIMEOUT_SECONDS) as client:
            print(f"[{time.time() - start_time:.1f}s] Sending request...")
            response = await client.post(url, json=payload)
            elapsed = time.time() - start_time
            print(f"[{elapsed:.1f}s] Response received: Status {response.status_code}")

            if response.status_code == 200:
                data = response.json()
                content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
                print(f"✓ Success! Response: {content[:100]}")
                return True
            else:
                print(f"✗ Error: {response.status_code}")
                print(f"  Response: {response.text[:200]}")
                return False
    except httpx.TimeoutException:
        elapsed = time.time() - start_time
        print(f"✗ Timeout after {elapsed:.1f}s (limit was {LLM_TIMEOUT_SECONDS}s)")
        return False
    except Exception as e:
        elapsed = time.time() - start_time
        print(f"✗ Error after {elapsed:.1f}s: {type(e).__name__}: {e}")
        return False


async def main():
    models = ["llama3.2:1b", "qwen2.5:0.5b", "gemma2:2b"]
    print("Testing models sequentially to diagnose timeout issues...")
    print(f"Current timeout setting: {LLM_TIMEOUT_SECONDS}s")

    results = {}
    for model in models:
        results[model] = await test_model(model)
        # Small delay between tests
        await asyncio.sleep(1)

    print(f"\n{'='*60}")
    print("Summary:")
    for model, success in results.items():
        status = "✓ PASS" if success else "✗ FAIL"
        print(f"  {model}: {status}")
    print(f"{'='*60}")


if __name__ == "__main__":
    asyncio.run(main())