ilia/atlas

ilia bdbf09a9ac feat: Implement voice I/O services (TICKET-006, TICKET-010, TICKET-014)

✅ TICKET-006: Wake-word Detection Service
- Implemented wake-word detection using openWakeWord
- HTTP/WebSocket server on port 8002
- Real-time detection with configurable threshold
- Event emission for ASR integration
- Location: home-voice-agent/wake-word/

✅ TICKET-010: ASR Service
- Implemented ASR using faster-whisper
- HTTP endpoint for file transcription
- WebSocket endpoint for streaming transcription
- Support for multiple audio formats
- Auto language detection
- GPU acceleration support
- Location: home-voice-agent/asr/

✅ TICKET-014: TTS Service
- Implemented TTS using Piper
- HTTP endpoint for text-to-speech synthesis
- Low-latency processing (< 500ms)
- Multiple voice support
- WAV audio output
- Location: home-voice-agent/tts/

✅ TICKET-047: Updated Hardware Purchases
- Marked Pi5 kit, SSD, microphone, and speakers as purchased
- Updated progress log with purchase status

📚 Documentation:
- Added VOICE_SERVICES_README.md with complete testing guide
- Each service includes README.md with usage instructions
- All services ready for Pi5 deployment

🧪 Testing:
- Created test files for each service
- All imports validated
- FastAPI apps created successfully
- Code passes syntax validation

🚀 Ready for:
- Pi5 deployment
- End-to-end voice flow testing
- Integration with MCP server

Files Added:
- wake-word/detector.py
- wake-word/server.py
- wake-word/requirements.txt
- wake-word/README.md
- wake-word/test_detector.py
- asr/service.py
- asr/server.py
- asr/requirements.txt
- asr/README.md
- asr/test_service.py
- tts/service.py
- tts/server.py
- tts/requirements.txt
- tts/README.md
- tts/test_service.py
- VOICE_SERVICES_README.md

Files Modified:
- tickets/done/TICKET-047_hardware-purchases.md

Files Moved:
- tickets/backlog/TICKET-006_prototype-wake-word-node.md → tickets/done/
- tickets/backlog/TICKET-010_streaming-asr-service.md → tickets/done/
- tickets/backlog/TICKET-014_tts-service.md → tickets/done/

2026-01-12 22:22:38 -05:00

7.0 KiB


Tool-Calling Policy

This document defines the policy for when and how LLM agents should call tools in the Atlas voice agent system.

Overview

The tool-calling policy ensures that:

Tools are used appropriately and safely
High-risk actions require confirmation
Agents understand when to use tools vs. respond directly
Tool permissions are clearly defined

Tool Risk Categories

Low-Risk Tools (Always Allowed)

These tools provide information or perform safe operations that don't modify data or have external effects:

get_current_time - Read-only time information
get_date - Read-only date information
get_timezone_info - Read-only timezone information
convert_timezone - Read-only timezone conversion
weather - Weather information (calls an external API, but is read-only)
list_tasks - Read-only task listing
list_timers - Read-only timer listing
list_notes - Read-only note listing
read_note - Read-only note reading
search_notes - Read-only note searching

Policy: These tools can be called automatically without user confirmation.

Medium-Risk Tools (Require Context Confirmation)

These tools modify local data but don't have external effects:

add_task - Creates a new task
update_task_status - Moves tasks between columns
create_timer - Creates a timer
create_reminder - Creates a reminder
cancel_timer - Cancels a timer/reminder
create_note - Creates a new note
append_to_note - Modifies an existing note

Policy:

Can be called when the user explicitly requests the action
Should state what will be done before execution (e.g., "I'll add 'buy milk' to your todo list"); this is an announcement, not a request for approval
No explicit user approval token required, but agent should be confident about user intent

High-Risk Tools (Require Explicit Confirmation)

These tools have external effects or significant consequences:

Future tools (not yet implemented):
- send_email - Sends email to external recipients
- create_calendar_event - Creates calendar events
- modify_calendar_event - Modifies existing events
- set_smart_home_device - Controls smart home devices
- purchase_item - Makes purchases
- execute_shell_command - Executes system commands

Policy:

MUST require an explicit user confirmation token
Agent should explain what will happen
User must approve via client interface (not just LLM decision)
Confirmation token must be signed/validated
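One way to make a confirmation token "signed/validated" is an HMAC over the exact tool call plus an expiry time. The following is a minimal sketch under stated assumptions, not the Atlas implementation: the names `SECRET_KEY`, `issue_token`, and `validate_token`, the token layout, and the 60-second lifetime are all hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

# Assumption: a secret known only to the client interface and the tool server.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(tool: str, args: Dict[str, Any], ttl_seconds: int = 60,
                now: Optional[float] = None) -> str:
    """Called by the client interface after the user approves the action."""
    now = time.time() if now is None else now
    claims = {"tool": tool, "args": args, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    return payload.hex() + "." + _sign(payload)


def validate_token(token: str, tool: str, args: Dict[str, Any],
                   now: Optional[float] = None) -> bool:
    """Called by the tool server before executing a high-risk tool."""
    now = time.time() if now is None else now
    try:
        payload_hex, signature = token.split(".")
        payload = bytes.fromhex(payload_hex)
    except ValueError:
        return False  # malformed token
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False
    claims = json.loads(payload)
    # The token only authorizes the exact call the user approved.
    return claims["tool"] == tool and claims["args"] == args and claims["exp"] > now
```

Binding the token to the tool name and arguments means an approval for one email cannot be reused for a different recipient, and the expiry limits replay.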

Tool Permission Matrix

| Tool | Family Agent | Work Agent | Confirmation Required |
| --- | --- | --- | --- |
| `get_current_time` | ✅ | ✅ | No |
| `get_date` | ✅ | ✅ | No |
| `get_timezone_info` | ✅ | ✅ | No |
| `convert_timezone` | ✅ | ✅ | No |
| `weather` | ✅ | ✅ | No |
| `add_task` | ✅ (home only) | ✅ (work only) | Context |
| `update_task_status` | ✅ (home only) | ✅ (work only) | Context |
| `list_tasks` | ✅ (home only) | ✅ (work only) | No |
| `create_timer` | ✅ | ✅ | Context |
| `create_reminder` | ✅ | ✅ | Context |
| `list_timers` | ✅ | ✅ | No |
| `cancel_timer` | ✅ | ✅ | Context |
| `create_note` | ✅ (home only) | ✅ (work only) | Context |
| `read_note` | ✅ (home only) | ✅ (work only) | No |
| `append_to_note` | ✅ (home only) | ✅ (work only) | Context |
| `search_notes` | ✅ (home only) | ✅ (work only) | No |
| `list_notes` | ✅ (home only) | ✅ (work only) | No |
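The matrix can be encoded as a lookup that a dispatcher consults before every call. This is a sketch only: the names `Confirmation`, `SCOPED_TOOLS`, `AGENT_SCOPE`, and `check_tool`, and the agent identifiers `"family"` and `"work"`, are assumptions made for illustration. Only the tool names and confirmation levels come from the matrix.

```python
from enum import Enum


class Confirmation(Enum):
    NONE = "No"
    CONTEXT = "Context"


# Confirmation level per tool, copied from the matrix.
CONFIRMATION = {
    "get_current_time": Confirmation.NONE,
    "get_date": Confirmation.NONE,
    "get_timezone_info": Confirmation.NONE,
    "convert_timezone": Confirmation.NONE,
    "weather": Confirmation.NONE,
    "add_task": Confirmation.CONTEXT,
    "update_task_status": Confirmation.CONTEXT,
    "list_tasks": Confirmation.NONE,
    "create_timer": Confirmation.CONTEXT,
    "create_reminder": Confirmation.CONTEXT,
    "list_timers": Confirmation.NONE,
    "cancel_timer": Confirmation.CONTEXT,
    "create_note": Confirmation.CONTEXT,
    "read_note": Confirmation.NONE,
    "append_to_note": Confirmation.CONTEXT,
    "search_notes": Confirmation.NONE,
    "list_notes": Confirmation.NONE,
}

# Tools marked "home only" / "work only": each agent may touch only its own data.
SCOPED_TOOLS = {
    "add_task", "update_task_status", "list_tasks", "create_note",
    "read_note", "append_to_note", "search_notes", "list_notes",
}

# Hypothetical agent identifiers mapped to the data scope they may access.
AGENT_SCOPE = {"family": "home", "work": "work"}


def check_tool(agent: str, tool: str, scope: str) -> Confirmation:
    """Return the required confirmation level, or raise PermissionError."""
    if agent not in AGENT_SCOPE:
        raise PermissionError(f"unknown agent: {agent}")
    if tool not in CONFIRMATION:
        raise PermissionError(f"tool not in permission matrix: {tool}")
    if tool in SCOPED_TOOLS and scope != AGENT_SCOPE[agent]:
        raise PermissionError(f"{agent} agent cannot use {tool} on {scope} data")
    return CONFIRMATION[tool]
```

For example, `check_tool("family", "add_task", "home")` returns `Confirmation.CONTEXT`, while the same call with scope `"work"` raises `PermissionError`. Tools absent from the matrix are denied by default.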

Tool-Calling Guidelines

When to Call Tools

Always call tools when:

User explicitly requests information that requires a tool (e.g., "What time is it?")
User explicitly requests an action that requires a tool (e.g., "Add a task")
Tool would provide significantly better information than guessing
Tool is necessary to complete the user's request

Don't call tools when:

You can answer directly from context
User is asking a general question that doesn't require specific data
Tool call would be redundant (e.g., calling weather twice in quick succession)
User hasn't explicitly requested the action (applies to tools that modify data)

Tool Selection

Choose the most specific tool:

If user asks "What time is it?", use get_current_time (not get_date)
If user asks "Set a timer", use create_timer (not create_reminder)
If user asks "What's on my todo list?", use list_tasks with status filter

Combine tools when helpful:

If user asks "What's the weather and what time is it?", call both weather and get_current_time
If user asks "What tasks do I have and what reminders?", call both list_tasks and list_timers

Error Handling

When a tool fails:

Explain what went wrong in user-friendly terms
Suggest alternatives if available
Don't retry automatically unless it's a transient error
If it's a permission error, explain the limitation clearly

Example: "I couldn't access that file because it's outside my allowed directories. I can only access files in the home notes directory."
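The retry and messaging rules above can be sketched as a small wrapper. The exception classes `TransientToolError` and `ToolPermissionError`, the function `call_tool_safely`, and the single-retry default are hypothetical; the actual tool layer may signal errors differently.

```python
from typing import Any, Callable, Tuple


class TransientToolError(Exception):
    """Hypothetical: a failure worth retrying (timeout, dropped connection)."""


class ToolPermissionError(Exception):
    """Hypothetical: the agent is not allowed to perform this call."""


def call_tool_safely(tool: Callable[[], Any], max_retries: int = 1) -> Tuple[bool, Any]:
    """Run `tool`; return (True, result) or (False, user-facing message)."""
    retries = 0
    while True:
        try:
            return True, tool()
        except TransientToolError:
            # Only transient errors are retried, and only a bounded number of times.
            if retries >= max_retries:
                return False, ("That service isn't responding right now. "
                               "Please try again in a moment.")
            retries += 1
        except ToolPermissionError as exc:
            # Permission errors are explained, never retried.
            return False, f"I can't do that: {exc}"
        except Exception:
            # Anything else fails once with a generic, user-friendly message.
            return False, "Something went wrong while running that tool."
```

Returning a message instead of re-raising keeps stack traces out of the spoken response while still letting the caller branch on success.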

Confirmation Flow

For Medium-Risk Tools

Agent explains action: "I'll add 'buy groceries' to your todo list."
Agent calls tool: Execute the tool call
Agent confirms completion: "Done! I've added it to your todo list."

For High-Risk Tools (Future)

Agent explains action: "I'm about to send an email to john@example.com with subject 'Meeting Notes'. Should I proceed?"
Agent requests confirmation: Wait for user approval token
If approved: Execute tool call
If rejected: Acknowledge and don't execute
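The high-risk flow amounts to a two-step gate: propose, then resolve. Below is a minimal sketch assuming hypothetical names (`PendingAction`, `ConfirmationGate`) and assuming the `approved` flag is derived from a validated approval sent by the client interface, never from the LLM's own output.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class PendingAction:
    tool: str
    args: Dict[str, Any]
    description: str


class ConfirmationGate:
    """Holds one high-risk action until the user approves or rejects it."""

    def __init__(self, execute: Callable[[str, Dict[str, Any]], Any]) -> None:
        self._execute = execute
        self._pending: Optional[PendingAction] = None

    def propose(self, tool: str, args: Dict[str, Any], description: str) -> str:
        # Steps 1 and 2: explain the action and wait; nothing is executed yet.
        self._pending = PendingAction(tool, args, description)
        return f"I'm about to {description}. Should I proceed?"

    def resolve(self, approved: bool) -> str:
        if self._pending is None:
            return "There is nothing waiting for confirmation."
        # Clear the pending action first so one approval cannot run it twice.
        action, self._pending = self._pending, None
        if not approved:
            return "Okay, I won't do that."
        self._execute(action.tool, action.args)
        return "Done."
```

A caller would invoke `propose(...)`, speak the returned prompt, and call `resolve(...)` only once the client interface reports the user's decision.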

Tool Argument Validation

Before calling a tool:

Validate required arguments are present
Validate argument types match schema
Validate argument values are reasonable (e.g., duration > 0)
Sanitize user input if needed

If validation fails:

Don't call the tool
Explain what's missing or invalid
Ask user to provide correct information
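The checks above can be illustrated with a small validator that returns a list of problems rather than raising, so the agent can relay them to the user. The schema layout, the `create_timer` argument names (`duration_seconds`, `label`), and the function `validate_args` are assumptions for this sketch; the real tool schemas may differ.

```python
from typing import Any, Dict, List

# Hypothetical schema for create_timer: expected type, required flag, value check.
CREATE_TIMER_SCHEMA: Dict[str, Dict[str, Any]] = {
    "duration_seconds": {"type": int, "required": True, "check": lambda v: v > 0},
    "label": {"type": str, "required": False},
}


def validate_args(schema: Dict[str, Dict[str, Any]], args: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the call may proceed."""
    problems: List[str] = []
    for name in args:
        if name not in schema:
            problems.append(f"unknown argument '{name}'")
    for name, rule in schema.items():
        if name not in args:
            if rule.get("required", False):
                problems.append(f"missing required argument '{name}'")
            continue
        value = args[name]
        expected = rule["type"]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"argument '{name}' must be of type {expected.__name__}")
            continue
        check = rule.get("check")
        if check is not None and not check(value):
            problems.append(f"argument '{name}' has an unreasonable value: {value!r}")
    return problems
```

If the returned list is non-empty, the agent skips the tool call and asks the user for the missing or corrected values.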

Rate Limiting

Some tools have rate limits:

weather: 60 requests/hour (enforced by tool)
Other tools: No explicit limits, but use reasonably

Guidelines:

Don't call the same tool repeatedly in quick succession
Cache results when appropriate
If rate limit is hit, explain and suggest waiting
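A sliding-window limiter combined with a short-lived cache covers both guidelines: repeated questions are answered from the cache, and only real fetches count against the limit. This is a sketch under assumptions: the class names, the TTL, and the injectable `clock` are hypothetical, and the weather tool is described above as enforcing its own 60 requests/hour limit.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allows at most `max_calls` within any `window_seconds` span."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) >= self._max:
            return False
        self._calls.append(now)
        return True


class CachedTool:
    """Serves recent results from memory; only cache misses hit the limiter."""

    def __init__(self, fetch: Callable[[str], Any], limiter: SlidingWindowLimiter,
                 ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._limiter = limiter
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        if not self._limiter.allow():
            raise RuntimeError("rate limit reached; try again later")
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value
```

With `SlidingWindowLimiter(60, 3600)` this mirrors the 60 requests/hour figure. The agent can catch the `RuntimeError` and suggest waiting.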

Tool Result Handling

After tool execution:

Parse result: Extract relevant information from tool response
Format for user: Present result in user-friendly format
Provide context: Add relevant context or suggestions
Handle empty results: If no results, explain clearly

Example:

Tool returns: {"tasks": []}
Agent says: "You don't have any tasks in your todo list right now. Would you like me to add one?"
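The parse, format, and empty-result steps could look like the following for `list_tasks`. The `title` field and the function name `format_tasks` are assumptions; the source only shows the empty response `{"tasks": []}`.

```python
from typing import Any, Dict, List


def format_tasks(result: Dict[str, Any]) -> str:
    """Turn a list_tasks response into a sentence suitable for speech."""
    tasks: List[Dict[str, Any]] = result.get("tasks", [])
    if not tasks:
        # Empty results get an explicit explanation plus a helpful suggestion.
        return ("You don't have any tasks in your todo list right now. "
                "Would you like me to add one?")
    titles = [task["title"] for task in tasks]  # assumed field name
    if len(titles) == 1:
        return f"You have 1 task: {titles[0]}."
    if len(titles) == 2:
        listing = f"{titles[0]} and {titles[1]}"
    else:
        listing = ", ".join(titles[:-1]) + f", and {titles[-1]}"
    return f"You have {len(titles)} tasks: {listing}."
```

Keeping formatting in a plain function makes the spoken phrasing easy to test independently of the tool call.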

Escalation Rules

If user requests something you cannot do:

Explain the limitation clearly
Suggest alternatives if available
Don't attempt to bypass restrictions
Be helpful about what you CAN do

Example: "I can't access work files, but I can help you with home tasks and notes. Would you like me to create a note about what you need to do?"

Version

Version: 1.0
Last Updated: 2026-01-06
Applies To: Both Family Agent and Work Agent
