# LLM Logging & Metrics

This module provides structured logging and metrics collection for LLM services.

## Features

- **Structured Logging**: JSON-formatted logs with all request details
- **Metrics Collection**: Track requests, latency, tokens, and errors
- **Agent-specific Metrics**: Separate metrics for work and family agents
- **Hourly Statistics**: Track trends over time
- **Error Tracking**: Log and track errors

## Usage

### Logging

```python
from monitoring.logger import get_llm_logger
import time

logger = get_llm_logger()

start_time = time.time()
# ... make LLM request ...
end_time = time.time()

logger.log_request(
    session_id="session-123",
    agent_type="family",
    user_id="user-1",
    request_id="req-456",
    prompt="What time is it?",
    messages=[...],
    tools_available=18,
    start_time=start_time,
    end_time=end_time,
    response={...},
    tools_called=["get_current_time"],
    model="phi3:mini-q4_0",
)
```

### Metrics

```python
from monitoring.metrics import get_metrics_collector

collector = get_metrics_collector()

# Record a request
collector.record_request(
    agent_type="family",
    success=True,
    latency_ms=450.5,
    tokens_in=50,
    tokens_out=25,
    tools_called=1,
)

# Get metrics
metrics = collector.get_metrics("family")
print(f"Total requests: {metrics['total_requests']}")
print(f"Average latency: {metrics['average_latency_ms']}ms")
```

## Log Format

Logs are stored in JSON format with the following fields:

- `timestamp`: ISO-format timestamp
- `session_id`: Conversation session ID
- `agent_type`: `"work"` or `"family"`
- `user_id`: User identifier
- `request_id`: Unique request ID
- `prompt`: User prompt (truncated to 500 characters)
- `messages_count`: Number of messages in context
- `tools_available`: Number of tools available
- `tools_called`: List of tools called
- `latency_ms`: Request latency in milliseconds
- `tokens_in`: Input tokens
- `tokens_out`: Output tokens
- `response_length`: Length of the response text
- `error`: Error message, if any
- `model`: Model name used

## Metrics

Metrics are tracked per agent:

- Total requests
- Successful/failed requests
- Average latency
- Total tokens (in/out)
- Tools called count
- Last request time

## Storage

- **Logs**: `data/logs/llm_YYYYMMDD.log` (JSON format)
- **Metrics**: `data/metrics/metrics_YYYYMMDD.json` (JSON format)

## Future Enhancements

- GPU usage monitoring (when available)
- Real-time dashboard
- Alerting for errors or high latency
- Cost estimation based on tokens
- Request rate limiting based on metrics
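
## Implementation Sketch

The internals of `monitoring.metrics` are not shown above, but the per-agent tracking it describes can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation; the class name, field names, and averaging logic are assumptions chosen to match the documented `record_request`/`get_metrics` interface:

```python
import time
from collections import defaultdict


class MetricsCollector:
    """Illustrative per-agent metrics aggregator (not the real module)."""

    def __init__(self):
        # One independent stats dict per agent_type, created on first use.
        self._stats = defaultdict(lambda: {
            "total_requests": 0,
            "successful_requests": 0,
            "failed_requests": 0,
            "total_latency_ms": 0.0,
            "tokens_in": 0,
            "tokens_out": 0,
            "tools_called": 0,
            "last_request_time": None,
        })

    def record_request(self, agent_type, success, latency_ms,
                       tokens_in=0, tokens_out=0, tools_called=0):
        s = self._stats[agent_type]
        s["total_requests"] += 1
        s["successful_requests" if success else "failed_requests"] += 1
        s["total_latency_ms"] += latency_ms
        s["tokens_in"] += tokens_in
        s["tokens_out"] += tokens_out
        s["tools_called"] += tools_called
        s["last_request_time"] = time.time()

    def get_metrics(self, agent_type):
        s = self._stats[agent_type]
        n = s["total_requests"]
        # Derive the average lazily so recording stays O(1).
        return {**s, "average_latency_ms": s["total_latency_ms"] / n if n else 0.0}
```

Keeping only running totals and deriving `average_latency_ms` on read keeps `record_request` cheap on the hot path, which matters when every LLM call is instrumented.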