# LLM Logging & Metrics
This module provides structured logging and metrics collection for LLM services.
## Features
- **Structured Logging**: JSON-formatted logs with all request details
- **Metrics Collection**: Track requests, latency, tokens, and errors
- **Agent-specific Metrics**: Separate metrics for the work and family agents
- **Hourly Statistics**: Track trends over time
- **Error Tracking**: Log and track errors
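Hourly statistics can be kept with a simple time-bucketed counter. The sketch below is illustrative only; the `HourlyStats` class and the `"YYYY-MM-DD-HH"` bucket key format are assumptions, not this module's actual internals:

```python
from collections import defaultdict
from datetime import datetime, timezone

class HourlyStats:
    """Counts requests per hour so trends can be inspected over time."""

    def __init__(self):
        # Bucket key "YYYY-MM-DD-HH" (UTC) -> request count
        self.buckets = defaultdict(int)

    def record(self, ts: float) -> None:
        key = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d-%H")
        self.buckets[key] += 1

    def requests_in_hour(self, key: str) -> int:
        return self.buckets[key]
```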
## Usage

### Logging

```python
from monitoring.logger import get_llm_logger
import time

logger = get_llm_logger()

start_time = time.time()
# ... make LLM request ...
end_time = time.time()

logger.log_request(
    session_id="session-123",
    agent_type="family",
    user_id="user-1",
    request_id="req-456",
    prompt="What time is it?",
    messages=[...],
    tools_available=18,
    start_time=start_time,
    end_time=end_time,
    response={...},
    tools_called=["get_current_time"],
    model="phi3:mini-q4_0",
)
```
### Metrics

```python
from monitoring.metrics import get_metrics_collector

collector = get_metrics_collector()

# Record a request
collector.record_request(
    agent_type="family",
    success=True,
    latency_ms=450.5,
    tokens_in=50,
    tokens_out=25,
    tools_called=1,
)

# Get metrics
metrics = collector.get_metrics("family")
print(f"Total requests: {metrics['total_requests']}")
print(f"Average latency: {metrics['average_latency_ms']}ms")
```
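Derived figures such as success rate can be computed from the returned metrics dict. A small helper, sketched under assumed key names (`successful_requests` and `total_tokens_out` are not confirmed by this README, so check them against the actual `get_metrics` output):

```python
def summarize(metrics: dict) -> dict:
    """Derive rates from a raw metrics dict (key names assumed, not confirmed)."""
    total = metrics.get("total_requests", 0)
    ok = metrics.get("successful_requests", 0)
    return {
        "success_rate": ok / total if total else 0.0,
        "avg_tokens_out": metrics.get("total_tokens_out", 0) / total if total else 0.0,
    }
```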
## Log Format
Logs are stored in JSON format with the following fields:
- `timestamp`: ISO format timestamp
- `session_id`: Conversation session ID
- `agent_type`: `"work"` or `"family"`
- `user_id`: User identifier
- `request_id`: Unique request ID
- `prompt`: User prompt (truncated to 500 chars)
- `messages_count`: Number of messages in context
- `tools_available`: Number of tools available
- `tools_called`: List of tools called
- `latency_ms`: Request latency in milliseconds
- `tokens_in`: Input tokens
- `tokens_out`: Output tokens
- `response_length`: Length of response text
- `error`: Error message, if any
- `model`: Model name used
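An entry in this format could be assembled roughly as follows. This is a sketch, not the logger's actual implementation; `build_log_entry` is a hypothetical helper, and only the prompt truncation and millisecond latency fields are taken from the format above:

```python
import json
from datetime import datetime, timezone

def build_log_entry(prompt: str, start_time: float, end_time: float, **fields) -> str:
    """Serialize one request as a JSON log line, truncating the prompt to 500 chars."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt[:500],
        "latency_ms": round((end_time - start_time) * 1000, 1),
        **fields,  # e.g. session_id, agent_type, model, ...
    }
    return json.dumps(entry)
```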
## Metrics
Metrics are tracked per agent:
- Total requests
- Successful/failed requests
- Average latency
- Total tokens (in/out)
- Tools called count
- Last request time
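A collector tracking these per-agent figures can be sketched as below. This is illustrative only (the real `MetricsCollector` may store and name things differently); the point is the incremental running average, which avoids keeping every latency sample:

```python
from collections import defaultdict

class SimpleMetricsCollector:
    """Per-agent counters with a running latency average (illustrative sketch)."""

    def __init__(self):
        self.metrics = defaultdict(lambda: {
            "total_requests": 0,
            "successful_requests": 0,
            "failed_requests": 0,
            "average_latency_ms": 0.0,
            "total_tokens_in": 0,
            "total_tokens_out": 0,
        })

    def record_request(self, agent_type, success, latency_ms, tokens_in, tokens_out):
        m = self.metrics[agent_type]
        n = m["total_requests"]
        # Incremental running average: new_avg = (old_avg * n + x) / (n + 1)
        m["average_latency_ms"] = (m["average_latency_ms"] * n + latency_ms) / (n + 1)
        m["total_requests"] = n + 1
        m["successful_requests" if success else "failed_requests"] += 1
        m["total_tokens_in"] += tokens_in
        m["total_tokens_out"] += tokens_out

    def get_metrics(self, agent_type):
        return dict(self.metrics[agent_type])
```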
## Storage

- Logs: `data/logs/llm_YYYYMMDD.log` (JSON format)
- Metrics: `data/metrics/metrics_YYYYMMDD.json` (JSON format)
## Future Enhancements
- GPU usage monitoring (when available)
- Real-time dashboard
- Alerting for errors or high latency
- Cost estimation based on tokens
- Request rate limiting based on metrics