diff --git a/.gitignore b/.gitignore
index 684a756..9720f3b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@
 *.pyc
 dist/
 build/
+docs/
 *.egg-info/
 *.egg
 *.pyc
diff --git a/README.md b/README.md
index 71d425b..3440fdc 100644
--- a/README.md
+++ b/README.md
@@ -60,6 +60,12 @@

 ## 📦 Install

+**Install with [uv](https://github.com/astral-sh/uv)** (recommended for speed)
+
+```bash
+uv tool install nanobot-ai
+```
+
 **Install from PyPi**

 ```bash
@@ -74,6 +80,14 @@ cd nanobot
 pip install -e .
 ```

+**Install with uv**
+
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install nanobot-ai
+```
+
 ## 🚀 Quick Start

 > [!TIP]
@@ -331,22 +345,29 @@ nanobot/
 └── cli/ # 🖥️ Commands
 ```

-## 🗺️ Roadmap
+## 🤝 Contribute & Roadmap

+PRs welcome! The codebase is intentionally small and readable. 🤗
+
+**Roadmap** — Pick an item and [open a PR](https://github.com/HKUDS/nanobot/pulls)!
+
+- [x] **Voice Transcription** — Support for Groq Whisper (Issue #13)
 - [ ] **Multi-modal** — See and hear (images, voice, video)
 - [ ] **Long-term memory** — Never forget important context
 - [ ] **Better reasoning** — Multi-step planning and reflection
 - [ ] **More integrations** — Discord, Slack, email, calendar
 - [ ] **Self-improvement** — Learn from feedback and mistakes

-**Want to help?** Pick an item and [open a PR](https://github.com/HKUDS/nanobot/pulls)!
+### Contributors
+
+
+
+
 ---

 ## ⭐ Star History

-*Community Growth Trajectory*
-
@@ -357,12 +378,6 @@ nanobot/
----
-
-## 🤝 Contribute
-
-PRs welcome! The codebase is intentionally small and readable. 🤗
-

Thanks for visiting ✨ nanobot!

Views
diff --git a/bridge/src/whatsapp.ts b/bridge/src/whatsapp.ts
index 4185632..a3a82fc 100644
--- a/bridge/src/whatsapp.ts
+++ b/bridge/src/whatsapp.ts
@@ -160,6 +160,11 @@ export class WhatsAppClient {
       return `[Document] ${message.documentMessage.caption}`;
     }

+    // Voice/Audio message
+    if (message.audioMessage) {
+      return `[Voice Message]`;
+    }
+
     return null;
   }

diff --git a/nanobot/agent/context.py b/nanobot/agent/context.py
index aaba890..f70103d 100644
--- a/nanobot/agent/context.py
+++ b/nanobot/agent/context.py
@@ -1,5 +1,7 @@
 """Context builder for assembling agent prompts."""

+import base64
+import mimetypes
 from pathlib import Path
 from typing import Any

@@ -114,32 +116,53 @@ When remembering something, write to {workspace_path}/memory/MEMORY.md"""
         self,
         history: list[dict[str, Any]],
         current_message: str,
-        skill_names: list[str] | None = None
+        skill_names: list[str] | None = None,
+        media: list[str] | None = None,
     ) -> list[dict[str, Any]]:
         """
         Build the complete message list for an LLM call.
-        
+
         Args:
             history: Previous conversation messages.
             current_message: The new user message.
             skill_names: Optional skills to include.
-        
+            media: Optional list of local file paths for images/media.
+
         Returns:
             List of messages including system prompt.
         """
         messages = []
-        
+
         # System prompt
         system_prompt = self.build_system_prompt(skill_names)
         messages.append({"role": "system", "content": system_prompt})
-        
+
         # History
         messages.extend(history)
-        
-        # Current message
-        messages.append({"role": "user", "content": current_message})
-        
+
+        # Current message (with optional image attachments)
+        user_content = self._build_user_content(current_message, media)
+        messages.append({"role": "user", "content": user_content})
+
         return messages
+
+    def _build_user_content(self, text: str, media: list[str] | None) -> str | list[dict[str, Any]]:
+        """Build user message content with optional base64-encoded images."""
+        if not media:
+            return text
+
+        images = []
+        for path in media:
+            p = Path(path)
+            mime, _ = mimetypes.guess_type(path)
+            if not p.is_file() or not mime or not mime.startswith("image/"):
+                continue
+            b64 = base64.b64encode(p.read_bytes()).decode()
+            images.append({"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}})
+
+        if not images:
+            return text
+        return images + [{"type": "text", "text": text}]

     def add_tool_result(
         self,
diff --git a/nanobot/agent/loop.py b/nanobot/agent/loop.py
index 6fe2cfd..4a96b84 100644
--- a/nanobot/agent/loop.py
+++ b/nanobot/agent/loop.py
@@ -152,7 +152,8 @@ class AgentLoop:
         # Build initial messages (use get_history for LLM-formatted messages)
         messages = self.context.build_messages(
             history=session.get_history(),
-            current_message=msg.content
+            current_message=msg.content,
+            media=msg.media if msg.media else None,
         )

         # Agent loop
@@ -189,7 +190,8 @@ class AgentLoop:

                 # Execute tools
                 for tool_call in response.tool_calls:
-                    logger.debug(f"Executing tool: {tool_call.name}")
+                    args_str = json.dumps(tool_call.arguments)
+                    logger.debug(f"Executing tool: {tool_call.name} with arguments: {args_str}")
                     result = await self.tools.execute(tool_call.name, tool_call.arguments)
                     messages = self.context.add_tool_result(
                         messages, tool_call.id, tool_call.name, result
@@ -281,7 +283,8 @@ class AgentLoop:
                 )

                 for tool_call in response.tool_calls:
-                    logger.debug(f"Executing tool: {tool_call.name}")
+                    args_str = json.dumps(tool_call.arguments)
+                    logger.debug(f"Executing tool: {tool_call.name} with arguments: {args_str}")
                     result = await self.tools.execute(tool_call.name, tool_call.arguments)
                     messages = self.context.add_tool_result(
                         messages, tool_call.id, tool_call.name, result
diff --git a/nanobot/channels/base.py b/nanobot/channels/base.py
index d83367c..8f16399 100644
--- a/nanobot/channels/base.py
+++ b/nanobot/channels/base.py
@@ -72,7 +72,14 @@ class BaseChannel(ABC):
         if not allow_list:
             return True

-        return str(sender_id) in allow_list
+        sender_str = str(sender_id)
+        if sender_str in allow_list:
+            return True
+        if "|" in sender_str:
+            for part in sender_str.split("|"):
+                if part and part in allow_list:
+                    return True
+        return False

     async def _handle_message(
         self,
diff --git a/nanobot/channels/manager.py b/nanobot/channels/manager.py
index 04abf5f..73c3334 100644
--- a/nanobot/channels/manager.py
+++ b/nanobot/channels/manager.py
@@ -37,7 +37,9 @@ class ChannelManager:
             try:
                 from nanobot.channels.telegram import TelegramChannel
                 self.channels["telegram"] = TelegramChannel(
-                    self.config.channels.telegram, self.bus
+                    self.config.channels.telegram,
+                    self.bus,
+                    groq_api_key=self.config.providers.groq.api_key,
                 )
                 logger.info("Telegram channel enabled")
             except ImportError as e:
diff --git a/nanobot/channels/telegram.py b/nanobot/channels/telegram.py
index 840c250..23e1de0 100644
--- a/nanobot/channels/telegram.py
+++ b/nanobot/channels/telegram.py
@@ -85,9 +85,10 @@ class TelegramChannel(BaseChannel):

     name = "telegram"

-    def __init__(self, config: TelegramConfig, bus: MessageBus):
+    def __init__(self, config: TelegramConfig, bus: MessageBus, groq_api_key: str = ""):
         super().__init__(config, bus)
         self.config: TelegramConfig = config
+        self.groq_api_key = groq_api_key
         self._app: Application | None = None
         self._chat_ids: dict[str, int] = {}  # Map sender_id to chat_id for replies

@@ -199,8 +200,10 @@ class TelegramChannel(BaseChannel):
         user = update.effective_user
         chat_id = message.chat_id

-        # Get sender identifier (prefer username, fallback to user_id)
-        sender_id = str(user.username or user.id)
+        # Use stable numeric ID, but keep username for allowlist compatibility
+        sender_id = str(user.id)
+        if user.username:
+            sender_id = f"{sender_id}|{user.username}"

         # Store chat_id for replies
         self._chat_ids[sender_id] = chat_id
@@ -247,7 +250,20 @@ class TelegramChannel(BaseChannel):
                 await file.download_to_drive(str(file_path))

                 media_paths.append(str(file_path))
-                content_parts.append(f"[{media_type}: {file_path}]")
+
+                # Handle voice transcription
+                if media_type == "voice" or media_type == "audio":
+                    from nanobot.providers.transcription import GroqTranscriptionProvider
+                    transcriber = GroqTranscriptionProvider(api_key=self.groq_api_key)
+                    transcription = await transcriber.transcribe(file_path)
+                    if transcription:
+                        logger.info(f"Transcribed {media_type}: {transcription[:50]}...")
+                        content_parts.append(f"[transcription: {transcription}]")
+                    else:
+                        content_parts.append(f"[{media_type}: {file_path}]")
+                else:
+                    content_parts.append(f"[{media_type}: {file_path}]")
+
                 logger.debug(f"Downloaded {media_type} to {file_path}")
             except Exception as e:
                 logger.error(f"Failed to download media: {e}")
diff --git a/nanobot/channels/whatsapp.py b/nanobot/channels/whatsapp.py
index efbd3e1..c14a6c3 100644
--- a/nanobot/channels/whatsapp.py
+++ b/nanobot/channels/whatsapp.py
@@ -107,6 +107,11 @@ class WhatsAppChannel(BaseChannel):
             # Extract just the phone number as chat_id
             chat_id = sender.split("@")[0] if "@" in sender else sender

+            # Handle voice transcription if it's a voice message
+            if content == "[Voice Message]":
+                logger.info(f"Voice message received from {chat_id}, but direct download from bridge is not yet supported.")
+                content = "[Voice Message: Transcription not available for WhatsApp yet]"
+
             await self._handle_message(
                 sender_id=chat_id,
                 chat_id=sender,  # Use full JID for replies
diff --git a/nanobot/cli/commands.py b/nanobot/cli/commands.py
index 8dcc460..d293564 100644
--- a/nanobot/cli/commands.py
+++ b/nanobot/cli/commands.py
@@ -506,6 +506,7 @@ def cron_add(
     at: str = typer.Option(None, "--at", help="Run once at time (ISO format)"),
     deliver: bool = typer.Option(False, "--deliver", "-d", help="Deliver response to channel"),
     to: str = typer.Option(None, "--to", help="Recipient for delivery"),
+    channel: str = typer.Option(None, "--channel", help="Channel for delivery (e.g. 'telegram', 'whatsapp')"),
 ):
     """Add a scheduled job."""
     from nanobot.config.loader import get_data_dir
@@ -534,6 +535,7 @@ def cron_add(
         message=message,
         deliver=deliver,
         to=to,
+        channel=channel,
     )

     console.print(f"[green]✓[/green] Added job '{job.name}' ({job.id})")
@@ -624,11 +626,13 @@ def status():
         has_openrouter = bool(config.providers.openrouter.api_key)
         has_anthropic = bool(config.providers.anthropic.api_key)
         has_openai = bool(config.providers.openai.api_key)
+        has_gemini = bool(config.providers.gemini.api_key)
         has_vllm = bool(config.providers.vllm.api_base)

         console.print(f"OpenRouter API: {'[green]✓[/green]' if has_openrouter else '[dim]not set[/dim]'}")
         console.print(f"Anthropic API: {'[green]✓[/green]' if has_anthropic else '[dim]not set[/dim]'}")
         console.print(f"OpenAI API: {'[green]✓[/green]' if has_openai else '[dim]not set[/dim]'}")
+        console.print(f"Gemini API: {'[green]✓[/green]' if has_gemini else '[dim]not set[/dim]'}")
         vllm_status = f"[green]✓ {config.providers.vllm.api_base}[/green]" if has_vllm else "[dim]not set[/dim]"
         console.print(f"vLLM/Local: {vllm_status}")

diff --git a/nanobot/config/schema.py b/nanobot/config/schema.py
index e30fbb2..71e3361 100644
--- a/nanobot/config/schema.py
+++ b/nanobot/config/schema.py
@@ -50,7 +50,10 @@ class ProvidersConfig(BaseModel):
     anthropic: ProviderConfig = Field(default_factory=ProviderConfig)
     openai: ProviderConfig = Field(default_factory=ProviderConfig)
     openrouter: ProviderConfig = Field(default_factory=ProviderConfig)
+    groq: ProviderConfig = Field(default_factory=ProviderConfig)
+    zhipu: ProviderConfig = Field(default_factory=ProviderConfig)
     vllm: ProviderConfig = Field(default_factory=ProviderConfig)
+    gemini: ProviderConfig = Field(default_factory=ProviderConfig)


 class GatewayConfig(BaseModel):
@@ -89,19 +92,24 @@ class Config(BaseSettings):
         return Path(self.agents.defaults.workspace).expanduser()

     def get_api_key(self) -> str | None:
-        """Get API key in priority order: OpenRouter > Anthropic > OpenAI > vLLM."""
+        """Get API key in priority order: OpenRouter > Anthropic > OpenAI > Gemini > Zhipu > Groq > vLLM."""
         return (
             self.providers.openrouter.api_key or
             self.providers.anthropic.api_key or
             self.providers.openai.api_key or
+            self.providers.gemini.api_key or
+            self.providers.zhipu.api_key or
+            self.providers.groq.api_key or
             self.providers.vllm.api_key or
             None
         )

     def get_api_base(self) -> str | None:
-        """Get API base URL if using OpenRouter or vLLM."""
+        """Get API base URL if using OpenRouter, Zhipu or vLLM."""
         if self.providers.openrouter.api_key:
             return self.providers.openrouter.api_base or "https://openrouter.ai/api/v1"
+        if self.providers.zhipu.api_key:
+            return self.providers.zhipu.api_base
         if self.providers.vllm.api_base:
             return self.providers.vllm.api_base
         return None
diff --git a/nanobot/providers/litellm_provider.py b/nanobot/providers/litellm_provider.py
index 4e7305b..547626d 100644
--- a/nanobot/providers/litellm_provider.py
+++ b/nanobot/providers/litellm_provider.py
@@ -13,7 +13,7 @@ class LiteLLMProvider(LLMProvider):
     """
     LLM provider using LiteLLM for multi-provider support.

-    Supports OpenRouter, Anthropic, OpenAI, and many other providers through
+    Supports OpenRouter, Anthropic, OpenAI, Gemini, and many other providers through
     a unified interface.
     """

@@ -47,6 +47,12 @@ class LiteLLMProvider(LLMProvider):
                 os.environ.setdefault("ANTHROPIC_API_KEY", api_key)
             elif "openai" in default_model or "gpt" in default_model:
                 os.environ.setdefault("OPENAI_API_KEY", api_key)
+            elif "gemini" in default_model.lower():
+                os.environ.setdefault("GEMINI_API_KEY", api_key)
+            elif "zhipu" in default_model or "glm" in default_model or "zai" in default_model:
+                os.environ.setdefault("ZHIPUAI_API_KEY", api_key)
+            elif "groq" in default_model:
+                os.environ.setdefault("GROQ_API_KEY", api_key)

         if api_base:
             litellm.api_base = api_base
@@ -81,11 +87,24 @@ class LiteLLMProvider(LLMProvider):
         if self.is_openrouter and not model.startswith("openrouter/"):
             model = f"openrouter/{model}"

+        # For Zhipu/Z.ai, ensure prefix is present
+        # Handle cases like "glm-4.7-flash" -> "zhipu/glm-4.7-flash"
+        if ("glm" in model.lower() or "zhipu" in model.lower()) and not (
+            model.startswith("zhipu/") or
+            model.startswith("zai/") or
+            model.startswith("openrouter/")
+        ):
+            model = f"zhipu/{model}"
+
         # For vLLM, use hosted_vllm/ prefix per LiteLLM docs
         # Convert openai/ prefix to hosted_vllm/ if user specified it
         if self.is_vllm:
             model = f"hosted_vllm/{model}"

+        # For Gemini, ensure gemini/ prefix if not already present
+        if "gemini" in model.lower() and not model.startswith("gemini/"):
+            model = f"gemini/{model}"
+
         kwargs: dict[str, Any] = {
             "model": model,
             "messages": messages,
diff --git a/nanobot/providers/transcription.py b/nanobot/providers/transcription.py
new file mode 100644
index 0000000..8ce909b
--- /dev/null
+++ b/nanobot/providers/transcription.py
@@ -0,0 +1,65 @@
+"""Voice transcription provider using Groq."""
+
+import os
+from pathlib import Path
+from typing import Any
+
+import httpx
+from loguru import logger
+
+
+class GroqTranscriptionProvider:
+    """
+    Voice transcription provider using Groq's Whisper API.
+
+    Groq offers extremely fast transcription with a generous free tier.
+    """
+
+    def __init__(self, api_key: str | None = None):
+        self.api_key = api_key or os.environ.get("GROQ_API_KEY")
+        self.api_url = "https://api.groq.com/openai/v1/audio/transcriptions"
+
+    async def transcribe(self, file_path: str | Path) -> str:
+        """
+        Transcribe an audio file using Groq.
+
+        Args:
+            file_path: Path to the audio file.
+
+        Returns:
+            Transcribed text.
+        """
+        if not self.api_key:
+            logger.warning("Groq API key not configured for transcription")
+            return ""
+
+        path = Path(file_path)
+        if not path.exists():
+            logger.error(f"Audio file not found: {file_path}")
+            return ""
+
+        try:
+            async with httpx.AsyncClient() as client:
+                with open(path, "rb") as f:
+                    files = {
+                        "file": (path.name, f),
+                        "model": (None, "whisper-large-v3"),
+                    }
+                    headers = {
+                        "Authorization": f"Bearer {self.api_key}",
+                    }
+
+                    response = await client.post(
+                        self.api_url,
+                        headers=headers,
+                        files=files,
+                        timeout=60.0
+                    )
+
+                    response.raise_for_status()
+                    data = response.json()
+                    return data.get("text", "")
+
+        except Exception as e:
+            logger.error(f"Groq transcription error: {e}")
+            return ""
diff --git a/workspace/AGENTS.md b/workspace/AGENTS.md
index 0e5a055..a99a7b4 100644
--- a/workspace/AGENTS.md
+++ b/workspace/AGENTS.md
@@ -22,6 +22,16 @@ You have access to:
 - Use `memory/` directory for daily notes
 - Use `MEMORY.md` for long-term information

+## Scheduled Reminders
+
+When the user asks for a reminder at a specific time, use `exec` to run:
+```
+nanobot cron add --name "reminder" --message "Your message" --at "YYYY-MM-DDTHH:MM:SS" --deliver --to "USER_ID" --channel "CHANNEL"
+```
+Get USER_ID and CHANNEL from the current session (e.g., `8281248569` and `telegram` from `telegram:8281248569`).
+
+**Do NOT just write reminders to MEMORY.md** — that won't trigger actual notifications.
+
 ## Heartbeat Tasks

 `HEARTBEAT.md` is checked every 30 minutes. You can manage periodic tasks by editing this file:
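The LiteLLM model-name rules added in `litellm_provider.py` can be exercised in isolation. The sketch below is a hypothetical helper, `normalize_model`, that follows the OpenRouter, Zhipu and Gemini prefix steps in the diff's order and omits the vLLM branch. One deliberate difference: it skips the `gemini/` prefix when the name already starts with `openrouter/`, because with the ordering in the diff an OpenRouter model such as `google/gemini-2.0-flash` (an example name) would become `gemini/openrouter/google/gemini-2.0-flash`.

```python
def normalize_model(model: str, is_openrouter: bool = False) -> str:
    """Hypothetical helper: add LiteLLM provider prefixes (vLLM branch omitted)."""
    # OpenRouter: route the model through the openrouter/ prefix.
    if is_openrouter and not model.startswith("openrouter/"):
        model = f"openrouter/{model}"

    # Zhipu / Z.ai: "glm-4.7-flash" -> "zhipu/glm-4.7-flash".
    lowered = model.lower()
    if ("glm" in lowered or "zhipu" in lowered) and not model.startswith(
        ("zhipu/", "zai/", "openrouter/")
    ):
        model = f"zhipu/{model}"

    # Gemini: add gemini/ unless the name is already prefixed.
    # Skipping "openrouter/" names is this sketch's addition, not the diff's.
    if "gemini" in model.lower() and not model.startswith(("gemini/", "openrouter/")):
        model = f"gemini/{model}"
    return model


for name, via_openrouter in [
    ("glm-4.7-flash", False),
    ("gemini-2.0-flash", False),
    ("google/gemini-2.0-flash", True),
]:
    print(name, "->", normalize_model(name, is_openrouter=via_openrouter))
```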
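The reminder instructions in `workspace/AGENTS.md` derive `--to` and `--channel` from a session key such as `telegram:8281248569`. A minimal sketch of that derivation follows, assuming the key always has the form `<channel>:<user id>`. The helper name `build_reminder_command` is hypothetical; the flags are the ones shown in the diff (`--name`, `--message`, `--at`, `--deliver`, `--to`, `--channel`), and the message and timestamp are example values.

```python
import shlex


def build_reminder_command(session_key: str, message: str, at: str) -> list[str]:
    """Hypothetical helper: build the `nanobot cron add` argv from a session key.

    Assumes session keys look like "<channel>:<user id>", e.g. "telegram:8281248569".
    """
    channel, sep, user_id = session_key.partition(":")
    if not sep or not channel or not user_id:
        raise ValueError(f"expected '<channel>:<user id>', got {session_key!r}")
    return [
        "nanobot", "cron", "add",
        "--name", "reminder",
        "--message", message,
        "--at", at,  # ISO format, e.g. "2025-01-31T09:00:00"
        "--deliver",
        "--to", user_id,
        "--channel", channel,
    ]


cmd = build_reminder_command("telegram:8281248569", "Stand-up in 10 minutes", "2025-01-31T09:00:00")
print(shlex.join(cmd))
```

Returning an argv list rather than a single string keeps the message text safe from shell word-splitting; `shlex.join` is used here only to display the command.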