# LLM Logging & Metrics

This module provides structured logging and metrics collection for LLM services.

## Features

- **Structured Logging**: JSON-formatted logs with all request details
- **Metrics Collection**: Track requests, latency, tokens, and errors
- **Agent-specific Metrics**: Separate metrics for the work and family agents
- **Hourly Statistics**: Track trends over time
- **Error Tracking**: Log and track errors

## Usage

### Logging
```python
from monitoring.logger import get_llm_logger
import time

logger = get_llm_logger()

start_time = time.time()
# ... make LLM request ...
end_time = time.time()

logger.log_request(
    session_id="session-123",
    agent_type="family",
    user_id="user-1",
    request_id="req-456",
    prompt="What time is it?",
    messages=[...],
    tools_available=18,
    start_time=start_time,
    end_time=end_time,
    response={...},
    tools_called=["get_current_time"],
    model="phi3:mini-q4_0"
)
```

### Metrics

```python
from monitoring.metrics import get_metrics_collector

collector = get_metrics_collector()

# Record a request
collector.record_request(
    agent_type="family",
    success=True,
    latency_ms=450.5,
    tokens_in=50,
    tokens_out=25,
    tools_called=1
)

# Get metrics
metrics = collector.get_metrics("family")
print(f"Total requests: {metrics['total_requests']}")
print(f"Average latency: {metrics['average_latency_ms']}ms")
```

## Log Format

Logs are stored in JSON format with the following fields:

- `timestamp`: ISO-format timestamp
- `session_id`: Conversation session ID
- `agent_type`: "work" or "family"
- `user_id`: User identifier
- `request_id`: Unique request ID
- `prompt`: User prompt (truncated to 500 chars)
- `messages_count`: Number of messages in context
- `tools_available`: Number of tools available
- `tools_called`: List of tools called
- `latency_ms`: Request latency in milliseconds
- `tokens_in`: Input tokens
- `tokens_out`: Output tokens
- `response_length`: Length of response text
- `error`: Error message, if any
- `model`: Model name used
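
Because each log line is a standalone JSON object, the files can be post-processed with nothing but the standard library. A minimal sketch (the helper name and the sample line are illustrative, not part of the module):

```python
import json

def filter_logs_by_agent(lines, agent_type):
    """Parse JSON log lines, keeping entries for one agent.

    Assumes each line is a JSON object using the fields listed
    above; lines that fail to parse are skipped.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("agent_type") == agent_type:
            entries.append(entry)
    return entries

# Example with a fabricated log line:
sample = '{"timestamp": "2024-01-01T12:00:00", "agent_type": "family", "latency_ms": 450.5}'
family = filter_logs_by_agent([sample], "family")
```

The same pattern works for ad-hoc latency or error analysis before a real dashboard exists.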
## Metrics

Metrics are tracked per agent:

- Total requests
- Successful/failed requests
- Average latency
- Total tokens (in/out)
- Tools-called count
- Last request time
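
Derived statistics such as a success rate can be computed from these counters. A sketch, assuming the per-agent dict exposes `total_requests` and `successful_requests` keys (the exact key names are an assumption):

```python
def success_rate(metrics):
    """Derive a success percentage from a per-agent metrics dict.

    Assumes 'total_requests' and 'successful_requests' counters,
    mirroring the per-agent fields listed above.
    """
    total = metrics.get("total_requests", 0)
    if total == 0:
        return 0.0
    return 100.0 * metrics.get("successful_requests", 0) / total

rate = success_rate({"total_requests": 20, "successful_requests": 19})  # 95.0
```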
## Storage

- **Logs**: `data/logs/llm_YYYYMMDD.log` (JSON format)
- **Metrics**: `data/metrics/metrics_YYYYMMDD.json` (JSON format)
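
The dated filenames can be reconstructed programmatically, which is handy when rotating or scanning older files. A sketch following the naming scheme above (the helper itself is illustrative):

```python
from datetime import datetime
from pathlib import Path

def paths_for(day: datetime):
    """Build the dated log and metrics paths for a given day."""
    stamp = day.strftime("%Y%m%d")
    return (
        Path("data/logs") / f"llm_{stamp}.log",
        Path("data/metrics") / f"metrics_{stamp}.json",
    )

log_path, metrics_path = paths_for(datetime(2024, 1, 31))
# data/logs/llm_20240131.log and data/metrics/metrics_20240131.json
```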
## Future Enhancements

- GPU usage monitoring (when available)
- Real-time dashboard
- Alerting on errors or high latency
- Cost estimation based on token counts
- Request rate limiting based on metrics