Documentation

Everything you need to install, configure, and use OpenYabby, the voice-first AI system that commands your machine with autonomous agent teams.

v0.1.3 · Open-source release · Node.js + Claude CLI
🦞

What is OpenYabby?

OpenYabby is an open-source, voice-first AI assistant with multi-agent orchestration. It combines OpenAI's Realtime API (WebRTC) for bidirectional voice with Claude CLI spawned as child processes for autonomous task execution.

Say "Yabby" and watch a team of AI agents plan, code, design, and deploy projects with full system access to your Mac (terminal, files, browser, AppleScript, GUI). No buttons, no typing. Just your voice.

🎤 Voice-First

Bidirectional audio via WebRTC. Wake word "Yabby" with client-side VAD. Speak naturally in any language.

🤖 Multi-Agent Teams

Lead agents create managers and sub-agents. Auto-orchestration handles coordination, reviews, and QA.

💻 Full System Access

Each agent runs as a Claude CLI process with access to bash, AppleScript, file system, and browser automation.

🧠 Persistent Memory

Mem0 extracts facts from conversation. Yabby learns who you are, your preferences, your context across sessions.

💬 Multi-Channel

Interact via voice, web dashboard, WhatsApp, Telegram, Slack, Discord, or Signal.

🔌 37 Connectors & growing

GitHub, Linear, Jira, Slack, Stripe, Google Sheets, and more via built-in integrations or MCP servers.

Tech Stack

Technology | Role
OpenAI Realtime API | Voice I/O via WebRTC
Claude CLI | Agent brain (task execution)
Node.js + Express | Backend server
PostgreSQL | Source of truth database
Redis | Cache, pub/sub, live status
Mem0 + Qdrant | Persistent memory & vector search
Whisper | Audio transcription (wake word)
Silero VAD | Client-side voice activity detection (ONNX)
Baileys | WhatsApp Web protocol
grammY | Telegram Bot API
📦

Installation

Prerequisites

  • Node.js 20+ (nodejs.org)
  • PostgreSQL: create a database named yabby
  • Redis running on localhost:6379
  • Claude CLI: npm install -g @anthropic-ai/claude-code
  • OpenAI API key with Realtime API access

Quick Start

Terminal
# Clone the repository
git clone https://github.com/OpenYabby/OpenYabby.git
cd OpenYabby

# Install dependencies
npm install

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start (migrations run automatically)
npm run dev
Recommended: Use npm run dev to start both Node.js (port 3000) and the Speaker verification service (port 3001). Open http://localhost:3000 in your browser.

Environment Variables

Variable | Required | Description
OPENAI_API_KEY | Yes | OpenAI key (Realtime API, Whisper, Mem0)
PG_HOST / PG_PORT / PG_DATABASE / PG_USER / PG_PASSWORD | Yes | PostgreSQL connection
REDIS_URL | Yes | Redis connection string
CLAUDE_CMD | No | Path to Claude CLI binary (default: claude)
PORT | No | Server port (default: 3000)
SPEAKER_SERVICE_URL | No | Speaker verification endpoint (default: http://localhost:3001)
SPEAKER_THRESHOLD | No | Voice matching strictness 0.0–1.0 (default: 0.25)
DISABLE_TUNNEL | No | Set true to skip relay tunnel
SANDBOX_ROOT | No | Override project sandbox location
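
As a rough illustration of how the defaults in the table resolve, the sketch below reads the variables from a plain mapping. The function name `load_config` and the shape of the returned dictionary are hypothetical; only the variable names and default values come from the table.

```python
import os

# Defaults copied from the table; everything else here is illustrative.
DEFAULTS = {
    "CLAUDE_CMD": "claude",
    "PORT": "3000",
    "SPEAKER_SERVICE_URL": "http://localhost:3001",
    "SPEAKER_THRESHOLD": "0.25",
}
REQUIRED = ["OPENAI_API_KEY", "PG_HOST", "PG_PORT", "PG_DATABASE",
            "PG_USER", "PG_PASSWORD", "REDIS_URL"]


def load_config(env=None):
    """Return a settings dict, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing required variables: " + ", ".join(missing))
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    cfg["PORT"] = int(cfg["PORT"])
    cfg["SPEAKER_THRESHOLD"] = float(cfg["SPEAKER_THRESHOLD"])
    cfg["DISABLE_TUNNEL"] = env.get("DISABLE_TUNNEL", "").lower() == "true"
    return cfg
```

Calling `load_config()` with no argument reads from the process environment, which is where values from .env would normally end up.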

Database Migrations

Migrations run automatically on startup. All migrations are idempotent (IF NOT EXISTS, ON CONFLICT). No manual migration step needed. Just start the server.
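
To show what "idempotent" means in practice, here is a minimal sketch using the two idioms named above. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the table and column names are hypothetical, not OpenYabby's actual schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; both accept these two idioms.
MIGRATION = """
CREATE TABLE IF NOT EXISTS app_settings (name TEXT PRIMARY KEY, value TEXT);
INSERT INTO app_settings (name, value) VALUES ('schema_version', '1')
    ON CONFLICT (name) DO NOTHING;
"""


def migrate(conn):
    """Apply the migration; safe to call on every startup."""
    conn.executescript(MIGRATION)
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # the second run changes nothing
rows = conn.execute("SELECT COUNT(*) FROM app_settings").fetchone()[0]
```

Because neither statement fails or duplicates data when repeated, running the same migration on every start is safe.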

Docker (Partial)

Terminal
# Start PostgreSQL + Redis via Docker
docker-compose up -d

# Then start Yabby natively (Claude CLI can't run in Docker)
npm run dev
Note: Claude CLI cannot run inside Docker containers. Use Docker only for PostgreSQL and Redis, then run the Node.js server natively.
📊

Web Dashboard

The OpenYabby web interface is a single-page application served at http://localhost:3000. It provides a complete control center for managing agents, projects, tasks, and voice interaction.

Pages

Page | Route | Description
Dashboard | / | Overview with running/done/error task stats, project cards with progress, and real-time activity feed
Projects | /projects | Grid of all projects with status filters, progress rings, inline editing
Tasks | /tasks | Full task manager with status filters, activity logs, pause/resume/kill controls
Simple Tasks | /simple-tasks | Simplified task view for quick monitoring
Agents | /agents | Agent directory with Yabby (main AI) pinned at top, standalone and project agents listed
Scheduling | /scheduled-tasks | Scheduled/recurring task management with cron and interval support
Channels | /channels | Messaging channel configuration (WhatsApp, Telegram, Slack, Discord, Signal)
Connectors | /connectors | External integrations catalog with MCP server management
Settings | /settings | Voice, memory, auth, project, and usage configuration

Features

  • Real-time updates via SSE (Server-Sent Events): tasks, heartbeats, notifications stream live
  • Global search (Ctrl+/): search across projects, tasks, and agents
  • Keyboard shortcuts: Ctrl+1 Dashboard, Ctrl+2 Tasks, Ctrl+3 Agents, Cmd+K toggle Yabby chat
  • Collapsible sidebar with state persisted to localStorage
  • Notification bell with dropdown for speaker notifications and plan reviews
  • i18n: supports French, English, Spanish, German
  • Dark/Light theme toggle
  • Onboarding wizard on first visit
🎤

Voice Commands

OpenYabby is designed voice-first. The entire system (apps, agents, projects, files, terminal) can be controlled hands-free through natural conversation. Voice is the primary way to interact with Yabby.

Getting Started: Activate Voice

1. Click the Mic

Click the pulsing orb (bottom-right of the dashboard) to activate your microphone and start the WebRTC voice session.

2. Say "Yabby"

The wake word activates the AI. Just say "Yabby" followed by your command. The orb pulses green when listening.

3. Always Listening

Once activated, the mic stays on. Yabby listens continuously. After 10 min of silence, it auto-suspends and listens for the wake word again.

Visual indicators: The orb at the bottom-right of the screen reflects the voice state. Pulsing purple = idle/wake word listening. Green glow = actively listening to you. Blue = Yabby is working on a task. Solid green = task completed.

Voice Pipeline: How It Works

1. Wake Word Detection

When the session is suspended, the browser listens using Silero VAD (ONNX model, runs locally in your browser). When speech is detected, audio is sent to the server for Whisper transcription. If "Yabby" is detected, the full voice session resumes instantly.

2. WebRTC Connection

The browser establishes a WebRTC connection via OpenAI's Realtime API. Audio streams bidirectionally: you speak, Yabby responds with voice. Tool calls are received via DataChannel.

3. Tool Execution

When Yabby decides to act, it calls tools (create project, spawn task, open app...). The call is dispatched client-side to the backend REST API. Results are sent back via DataChannel and Yabby announces the result by voice simultaneously.

4. Auto-Suspend

After 10 minutes of inactivity, the session suspends and wake word listening resumes. Active tasks extend the timeout automatically. Yabby won't fall asleep while your agents are working.

🎤 How voice works

Say "Yabby" once to wake up the voice session. After that, just speak naturally. No need to repeat "Yabby" before every command. The session stays active until you put it to sleep or 10 minutes of inactivity pass.

🚀

Build & Ship Projects

From idea to deployed project. One voice command kicks off the full orchestration pipeline.

"Build me a SaaS landing page with Stripe checkout"
Creates a project, assigns a lead agent who recruits designers and devs, sequences tasks, and delivers a working site
"Set up a REST API with auth, user CRUD, and PostgreSQL"
Lead agent creates the architecture, spins up backend + DB agents in parallel, runs tests, and delivers the project
"Refactor the auth module to use JWT"
Spawns a task in the project sandbox. The agent reads the existing code, rewrites auth, updates tests, and commits
⌨️

System & Dev Tools

Full terminal access. File operations, package management, git, servers, scripts, and more.

"Commit everything with message 'fix login flow'"
Executes git add . && git commit -m "fix login flow" in the current project sandbox
"Start the dev server"
Runs npm run dev in the project directory and confirms the server is up
"Open Safari"
Runs open -a Safari. Works with any installed Mac application
"Install ffmpeg via Homebrew"
Spawns a task that runs brew install ffmpeg and reports when done
"Take a screenshot"
Acquires the GUI lock, captures the screen, and saves it to the desktop
🤖

Manage Agents

Create agents, talk to them, switch context, check their status. All hands-free.

"Create an agent named Luna, role: crypto analyst"
Creates a standalone agent "Luna" ready to receive instructions. Agents persist across sessions.
"Tell Nolan to fix the login bug"
Sends the instruction to agent "Nolan" via talk_to_agent. A task spawns in Nolan's context to execute the fix.
"What's John's status?"
Checks John's heartbeat, active task, progress percentage, and reads the summary aloud
"Switch me to Sofia"
Switches voice context to agent Sofia (switch_to_agent). You now talk directly to her. Say "back" to switch back.
📁

Manage Projects

Create, monitor, and interact with full projects. Yabby handles the entire lifecycle.

"Create a new project for an e-commerce site"
Creates the project, assigns a lead agent, sets up the sandbox directory, and starts the discovery phase. The lead may ask you clarifying questions by voice.
"How's the MathCalc project going?"
Checks all agent heartbeats, task statuses, and progress. Reads a summary: "3 tasks done, 1 running, 65% overall"
"Open the project folder in VS Code"
Runs code ~/Desktop/Yabby\ Projects/e-commerce-abc123/ to open the project sandbox
"Approve the plan"
Approves the lead agent's pending plan review. Execution phase begins immediately.
🔄

Complete Voice Workflow

Commands chain naturally, like a conversation. Here's a full project workflow done entirely by voice:

🎬 Example: Building a recipe website by voice
🎤 "Yabby" (wake word)
→ Session starts. Yabby is listening.
🎤 "Create a project for a recipe website"
→ Project created. Lead agent "Marc" assigned. Discovery phase starts.
💬 Marc asks: "Light or dark theme? How many recipe categories?"
→ You answer by voice. Marc saves the answers and moves to planning.
📋 Marc submits a plan. A modal appears for your approval.
→ You say "Approve the plan" or click Approve. Execution begins.
🎤 "Tell the lead to start with the design"
→ Instruction forwarded to lead. He delegates to Sofia (Designer).
🎤 "Show me the project status"
→ "Sofia: design running (45%). Lucia: waiting. Karim: waiting. Overall: 15%"
🎤 "Open the project folder in VS Code"
→ VS Code opens with the project sandbox. You can inspect files while agents work.
🔔 Yabby: "The Recipe project is complete. QA passed, 12 files created."
→ Voice notification. Zero micromanagement required.
🎛️

Control Yabby

Pause, resume, interrupt. You're always in charge.

"Go to sleep"
Activates sleep_mode. Voice session suspends, wake word listening resumes. Tasks keep running.
"Yabby" (wake word to resume)
Wakes up from sleep. Full voice session resumes instantly. Then give your next command.
"Stop the current task"
Pauses the currently running task (SIGTERM), which you can resume later, or kills it (SIGKILL).
"Back"
If you're talking to an agent (Sofia, Marc...), this switches back to Yabby's main context via back_to_yabby.
🎨

Everyday Tasks & Mac Control

Full Mac access: AppleScript, bash, GUI control, screenshots, browser automation.

"Play some jazz on Spotify"
Opens Spotify via AppleScript and starts playing a jazz playlist
"Set the volume to 30%"
Adjusts system volume via osascript -e 'set volume output volume 30'
"Send an email to Pierre about tomorrow's meeting"
Composes and sends an email via AppleScript (Mail.app) or a connected email connector
"What's the weather in Paris?"
Fetches weather data via a web search or connected API and reads the forecast aloud
"Summarize the last 10 commits in my repo"
Runs git log --oneline -10 and reads back a digest of recent changes
💡 Tips & Best Practices
  • Speak naturally. No rigid syntax needed. "Can you open the terminal?" works as well as "Open the terminal".
  • Chain commands. After one command completes, just keep talking. No need to say "Yabby" again while the session is active.
  • Interrupt anytime. You can speak while Yabby is responding. It stops and listens to your new instruction.
  • Use any language. Yabby understands French, English, Spanish, German, and more. You can mix freely.
  • Monitor visually. The dashboard updates in real time. Watch task progress while you give voice commands.
  • Text fallback. Press Cmd+K to open the chat window. Type commands when you can't speak.
  • Agent conversations. Say "Switch me to [name]" to talk directly to an agent. Say "back" to return.
  • Background work. Say "go to sleep". Tasks and agents keep running. You'll get a voice notification when something finishes.

Speaker Verification (Optional)

An optional Python microservice (FastAPI + SpeechBrain ECAPA-TDNN) filters wake word detection to only your voice. This reduces false positives by 90%+ in multi-person environments.

  • Go to Settings → Speaker Verification
  • Record 3 voice samples saying "Yabby"
  • Once enrolled, only your voice triggers wake word detection
  • Embeddings are stored locally (never sent to cloud)
  • If the service is down, detection continues normally (fail-open)

Voice Configuration

Option | Values | Default
Voice | ash, ballad, coral, sage, verse, marin | marin
Noise Reduction | near_field, far_field, off | near_field
Turn Detection | server_vad, semantic_vad | server_vad
Mic Enabled | true / false | true

Text-Only Mode

You can also interact with Yabby by typing in the chat window (Cmd+K to toggle). Text messages are sent via DataChannel or trigger a text-only connection if the session is not active. Useful when you can't speak.

Noise Filter

Client-side regex filters block low-value utterances ("ok", "oui", "mmh", "non"... 44 patterns) from being sent to the voice model. Short audio clips (<0.2s or <3 words) are also discarded.
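
A minimal sketch of such a filter, assuming it runs on a transcript plus the clip duration. Only four of the 44 patterns are named in the text, so the regex below is a small illustrative subset, and the function name `should_forward` is hypothetical.

```python
import re

# A few of the patterns named in the text; the real list has 44 entries.
NOISE = re.compile(r"^\s*(ok|oui|mmh|non)[\s.!?]*$", re.IGNORECASE)
MIN_SECONDS = 0.2  # clips shorter than this are dropped
MIN_WORDS = 3      # utterances with fewer words are dropped


def should_forward(transcript, duration_s):
    """Return True if the utterance should reach the voice model."""
    if NOISE.match(transcript):
        return False
    if duration_s < MIN_SECONDS or len(transcript.split()) < MIN_WORDS:
        return False
    return True
```

Filtering before anything is sent keeps filler words from interrupting the model or triggering a spoken reply.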

🤖

Agents

Agents are the core workforce of OpenYabby. Each agent is a Claude CLI process with its own session, system prompt, and working directory.

Agent Tiers

Lead Agent

Project Director

Full API access. Orchestrates discovery, planning, execution, and QA. Creates teams, submits plans for approval, reports via voice.

Manager

Coordinator

Mid-level. Manages sub-agents, auto-triggered for review when children complete tasks. Reports to lead or parent manager.

Sub-Agent

Executor

Executes specific work (code, design, QA). Sends completion report to parent agent via agent-bus messaging.

Standalone Agents

You can create agents outside of any project. These are standalone agents that persist across sessions and can be instructed directly via voice or chat.

  • Standalone agents must have a valid human first name (e.g., "Marc", "Sofia", "Karim"). Technical names are rejected.
  • They get their own working directory and conversation thread
  • You can switch to an agent's context via voice: "Yabby, talk to Marc"
  • Agents can be suspended, activated, or deleted from the Agents page

Agent Communication

Agents communicate via Redis pub/sub (yabby:agent-bus channel). When a sub-agent completes a task, the orchestrator automatically triggers a review task on the parent manager (with 60-second debounce to batch simultaneous completions).
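
The debounce can be pictured with the sketch below, which batches completions per parent and releases them once 60 seconds have passed without a new one. This is an in-memory illustration, not the orchestrator's code: the class name, the method names, and the choice to restart the window on every completion are all assumptions.

```python
class ReviewDebouncer:
    """Batch child completions per parent until a quiet window has passed."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._pending = {}  # parent_id -> (last_event_time, [child_ids])

    def child_completed(self, parent_id, child_id, now):
        """Record a completion and restart the parent's quiet window."""
        _, children = self._pending.get(parent_id, (now, []))
        self._pending[parent_id] = (now, children + [child_id])

    def due_reviews(self, now):
        """Return {parent_id: [child_ids]} for parents whose window elapsed."""
        due = {p: kids for p, (t, kids) in self._pending.items()
               if now - t >= self.window_s}
        for parent_id in due:
            del self._pending[parent_id]
        return due
```

With this shape, two sub-agents finishing ten seconds apart produce one review task for the manager instead of two.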

Agent Voice Switching

You can talk directly to a specific agent. Yabby swaps the voice session instructions to that agent's context (same WebRTC connection persists). Say "Back to Yabby" to return.

📁

Projects

Projects are containers for multi-agent work. Each project gets an isolated sandbox directory at ~/Desktop/Yabby Projects/{name}-{id}/ (configurable), auto-initialized with src/, docs/, README, .gitignore, and git init.

Project Lifecycle (5 Phases)

1. Discovery

The lead agent asks clarifying questions before planning. Questions appear as modals or voice prompts. Supports text fields, dropdowns, and connector selection questions.

2. Planning

The lead writes a PLAN.md and submits it for your approval. You can approve, revise (with feedback), or cancel the project.

3. Execution

The lead creates agents (managers and sub-agents), assigns tasks, and coordinates parallel work. Agents have full system access within the sandbox.

4. Review

When sub-agents complete tasks, the orchestrator auto-triggers the lead for review. The lead inspects deliverables and decides next steps.

5. QA

Specialized QA agents run test plans. Corrections loop until clean. You get a voice notification when the project is complete. Zero micromanagement.

Creating a Project

There are several ways to create a project:

  • Voice: "Yabby, create a dark-mode portfolio with a blog section for me"
  • Dashboard: Click "New Project" on the Projects page
  • Channel: Send a project request via WhatsApp, Telegram, or any connected channel

Tasks

A task is a Claude CLI child process spawned to execute a specific instruction. Each task runs with full system access (bash, AppleScript, GUI, file system, browser).

Task Lifecycle

Status | Description
running | CLI process is active and executing
done | Task completed successfully
error | Task failed or process crashed
paused | Task paused (SIGTERM sent)
killed | Task force-killed (SIGKILL)
paused_llm_limit | Claude CLI hit daily quota. Auto-resumes when limit resets

Task Runners

OpenYabby supports multiple CLI runners for task execution:

Runner | Description
claude | Claude Code CLI (default). Full-featured, recommended
codex | OpenAI Codex CLI
aider | Aider (AI pair programming)
goose | Goose AI coding assistant
cline | Cline CLI
continue | Continue CLI
custom | Custom binary path

Logging

Every task produces two log files:

  • logs/{taskId}-activity.log: Structured activity log (tool calls, status changes)
  • logs/{taskId}-raw.log: Raw CLI output

Background Tasks (v0.1.3+)

When an agent calls Bash(run_in_background=true) for a long-running job (batch script, dev server, scraper), OpenYabby tracks it as a background task, separate from the parent CLI task. The bg process survives the CLI's exit and is monitored at the OS level via PID polling — the agent gets notified asynchronously when it ends, even minutes or hours later.

How tracking works

  1. A PreToolUse hook intercepts each Bash(run_in_background=true) call and writes a per-call bookkeeper script that captures the host PID and exit code via wait $C; rc=$?.
  2. The bookkeeper detaches via nohup so the bg child survives the CLI exit.
  3. A central bg-watcher polls kill -0 <pid> every 30 seconds. When the PID disappears, it reads the exit code and routes the agent's next-turn notification:
Final status | Trigger | Notification to agent
completed | exit code 0, not a service | [BG_COMPLETED]: success report
failed | non-zero exit or SIGKILL/OOM (no exit code captured) | [BG_FAILED]: diagnostic prompt
service_died | tagged with [bg:service], any exit | [BG_SERVICE_DIED]: neutral, ask what to do
orphaned | row was running at startup but PID is gone | none (server restart recovery)
stopped | killed by kill_bg_task tool or manual user action | none (intentional)
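
The liveness probe and the status mapping can be sketched as follows. `pid_alive` is the POSIX equivalent of `kill -0 <pid>`; `final_status` is a hypothetical helper that mirrors the table, and the precedence it gives to a user-initiated stop is an assumption. The orphaned case is left out because it is decided at server startup rather than by the watcher.

```python
import os


def pid_alive(pid):
    """POSIX equivalent of `kill -0 <pid>`: probe without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def final_status(exit_code, is_service=False, stopped_by_user=False):
    """Map an ended bg process to a status from the table."""
    if stopped_by_user:
        return "stopped"
    if is_service:
        return "service_died"
    if exit_code == 0:
        return "completed"
    return "failed"  # non-zero exit, or None when no exit code was captured
```

A watcher loop would call `pid_alive` on each tracked PID every 30 seconds and, on the first `False`, read the captured exit code and pass it to `final_status`.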

Service tag

For permanent services (Node/PHP/Streamlit dev servers, watchers), the agent appends [bg:service] to the tool's description:

Bash(
  command="node server.js",
  description="dev preview server :3000 [bg:service]",
  run_in_background=true
)

The hook drops a marker file so the watcher uses the [BG_SERVICE_DIED] wording instead of treating an exit as a "completion".

Introspection & control tools

Agents have 5 dedicated tools to inspect and control their bg tasks (callable via POST /api/tools/execute):

Tool | Purpose
list_bg_tasks | Compact list of the agent's bg tasks. Running first, status counts, optional filter.
bg_task_detail | Full info on one task: real-time PID liveness check, elapsed, exit code, exit signal, output path.
get_bg_task_log | Read the output. Modes: tail (default 4 KB), head, or grep with a pattern. Hard cap 64 KB per call.
kill_bg_task | Stop a running task: SIGTERM, then SIGKILL after 3 s if still alive. Marks the row stopped.
register_external_bg | Register a process spawned outside the run_in_background path (typical start.sh doing php -S ... & internally). Provide the PID + description + is_service flag.
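
To make the three get_bg_task_log modes concrete, here is a sketch that works on raw bytes. The function name `read_log` and its parameters are hypothetical; the 4 KB default and the 64 KB cap come from the table, and the `data` argument stands in for the task's output file.

```python
import re

TAIL_DEFAULT = 4 * 1024   # default tail size from the table
HARD_CAP = 64 * 1024      # per-call ceiling from the table


def read_log(data, mode="tail", size=TAIL_DEFAULT, pattern=None):
    """Return a slice of raw log bytes, never more than HARD_CAP."""
    size = min(size, HARD_CAP)
    if mode == "tail":
        return data[-size:] if size else b""
    if mode == "head":
        return data[:size]
    if mode == "grep":
        if pattern is None:
            raise ValueError("grep mode needs a pattern")
        rx = re.compile(pattern.encode())
        hits = [ln for ln in data.splitlines(keepends=True) if rx.search(ln)]
        return b"".join(hits)[:HARD_CAP]
    raise ValueError("unknown mode: " + mode)
```

Capping every mode keeps a single call from flooding the agent's context with a multi-megabyte log.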

Web UI

The Activity page shows a Background Tasks panel below the CLI events stream, with running-first sort, status badges, elapsed timer, service marker, and a 15-second auto-refresh. Both sections scroll independently.

Resilience

On server restart, the startup sweep doesn't blindly orphan rows in running state — it checks each row's PID with kill -0 first. Live bg jobs survive Yabby restarts (the watcher re-attaches automatically). Only truly-gone PIDs are marked orphaned.

Retry Detection

OpenYabby monitors task activity for infinite retry loops. When a pattern of repeated tool calls is detected (30 calls scanned), the system intervenes to unblock the stuck agent.
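
One simple way to detect such a loop is to count identical calls in the most recent window. The sketch below assumes that approach; the 30-call window comes from the text, while the repeat threshold and the function name are hypothetical, since the actual detection rule is not documented here.

```python
from collections import Counter

WINDOW = 30        # number of recent calls scanned, per the text
REPEAT_LIMIT = 5   # hypothetical threshold; the real value is not documented


def looks_stuck(calls, window=WINDOW, limit=REPEAT_LIMIT):
    """Return True when one identical call dominates the recent window.

    `calls` is a list of (tool_name, arguments_string) tuples, oldest first.
    """
    recent = calls[-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= limit
```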

GUI Lock

Tasks that need to interact with the GUI (screenshots, mouse clicks) must acquire a lock (yabby:gui_lock in Redis). Only one task can control the GUI at a time. Lock auto-expires after 5 minutes or when the task finishes.
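
The lock semantics can be sketched in memory as below. In Redis, the usual way to get "set only if absent, with an expiry" is `SET key value NX EX 300`; whether OpenYabby uses exactly that command is an assumption, and the class here is only a stand-in with an injectable clock so the expiry can be exercised without waiting.

```python
import time


class GuiLock:
    """In-memory stand-in for a Redis key set only-if-absent with a TTL."""

    def __init__(self, ttl_s=300, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, task_id):
        """Take the lock unless another task holds an unexpired one."""
        now = self.clock()
        if self.owner not in (None, task_id) and now < self.expires_at:
            return False
        self.owner, self.expires_at = task_id, now + self.ttl_s
        return True

    def release(self, task_id):
        """Release on task finish; only the holder can release."""
        if self.owner == task_id:
            self.owner = None
```

The expiry matters because a crashed task can never release the lock itself; without it, one failure would block all later GUI work.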

💬

WhatsApp

OpenYabby connects directly to WhatsApp Web via the Baileys protocol. No Meta Business API, no approval process, no cost. Just scan a QR code and you're live. Everything runs inside isolated groups so your personal chats are never touched.

Setup (2 minutes)

Getting WhatsApp connected is dead simple:

  1. Open the dashboard → Channels → Config
  2. Toggle WhatsApp to enabled → Save
  3. A QR code appears on screen
  4. On your phone: WhatsApp → Settings → Linked Devices → Link a Device
  5. Scan the QR code. That's it.
No phone number needed in the config. Yabby links to your WhatsApp session via QR — same as linking WhatsApp Web. Your credentials are stored locally in data/whatsapp-auth/ and persist across restarts (no re-scanning).

What Happens After Connection

The moment the QR code is scanned:

  1. Yabby automatically creates a group called “🤖 Yabby Assistant”
  2. A welcome message is sent to confirm the connection
  3. All Yabby interactions happen exclusively in this group
  4. Your personal chats, other groups, and DMs are completely ignored

If the server restarts, the existing group is recovered automatically from the database — no duplicates.

Per-Agent Groups

This is where it gets powerful. When you create an agent (via voice, API, or dashboard), Yabby automatically creates a dedicated WhatsApp group for that agent:

  • Group named “💬 [Role] [Name]” — e.g., “💬 Frontend Dev [Lucia]”
  • Messages in that group are routed directly to that specific agent
  • Each agent has its own isolated conversation thread
Add people to agent groups. Anyone you invite into an agent's WhatsApp group can communicate with that agent directly. Great for team collaboration: add a colleague, and they can chat with the agent too. All participants share the same conversation context.

Workspace Switching

You can change an agent's working directory on the fly:

API
POST /api/agents/:id/change-workspace
{
  "workspace_path": "/Users/me/Projects/my-app",
  "reason": "Switch to my-app repo"
}

When the workspace changes:

  • The agent's running task is gracefully stopped
  • A new session starts in the new directory
  • Previous conversation context is injected so the agent remembers what it was doing
  • All future tasks run in the new workspace
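
For scripted use, the request above could be built like this. The base URL assumes the default port 3000, the agent ID is a placeholder, and no authentication header is shown; if gateway auth is enabled, a token would presumably be needed, but its header format is not documented here.

```python
import json
import urllib.request


def build_change_workspace_request(agent_id, workspace_path, reason,
                                   base_url="http://localhost:3000"):
    """Build (but do not send) the POST request described above."""
    body = json.dumps({"workspace_path": workspace_path, "reason": reason})
    return urllib.request.Request(
        url=f"{base_url}/api/agents/{agent_id}/change-workspace",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_change_workspace_request(
    "agent-123", "/Users/me/Projects/my-app", "Switch to my-app repo")
# urllib.request.urlopen(req) would send it to a running server.
```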

Voice Messages

Send a voice note in the WhatsApp group and Yabby will auto-transcribe it via Whisper, then process it as a text instruction. The full voice pipeline works: your audio becomes an action.

Features

  • Group isolation: Yabby only responds in its dedicated groups, never in personal chats
  • Voice notes: Auto-transcribed via Whisper, treated as instructions
  • Message chunking: Long responses auto-split at 4096 characters
  • Deduplication: Redis-based tracking prevents double-processing
  • Spam filter: Short messages (“ok”, “oui”, emojis) debounced over 2 seconds
  • Notifications: Task completions, errors, and milestones broadcast to the group
  • Auto-reconnect: Exponential backoff (5s → 60s max, 10 attempts) on disconnect
  • Session persistence: Credentials survive restarts, no re-scanning
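
Message chunking from the list above can be sketched as follows. The 4096-character limit is from the text; preferring to break at whitespace rather than mid-word is an assumption, and `chunk_message` is a hypothetical name.

```python
def chunk_message(text, limit=4096):
    """Split text into pieces of at most `limit` characters.

    Prefers to break at the last newline or space inside the limit; that
    preference is an assumption, the text only states the 4096 limit.
    """
    chunks = []
    while len(text) > limit:
        cut = max(text.rfind("\n", 0, limit), text.rfind(" ", 0, limit))
        if cut <= 0:
            cut = limit  # no whitespace to break on: hard split
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```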

Slash Commands

Command | Description
/status | Show running, completed, and failed task counts
/new | Start a fresh conversation (clears history)
/reset | Same as /new
/help | List available commands

Tips

💡 Pro tips
  • You can create agents by voice (“Yabby, create an agent named Marc for frontend”) and the WhatsApp group is created automatically
  • Add your team members to agent groups — everyone can collaborate with the same agent
  • Switch workspaces anytime via the API or by asking Yabby in voice: “Switch Marc to my-app folder”
  • If the connection drops, it auto-recovers. To force a fresh login: stop the channel with clearSession: true
  • Voice notes sent in WhatsApp are auto-transcribed, so you can give instructions hands-free
✈️

Telegram

Connect a Telegram Bot to interact with Yabby from anywhere. Supports both text and voice messages.

Setup

  1. Create a bot via @BotFather on Telegram
  2. Copy the Bot Token
  3. Go to Channels → Config in the dashboard
  4. Enable Telegram and paste the bot token
  5. Start chatting with your bot on Telegram

Features

  • Text messages: Send instructions naturally, auto-chunked at 4096 chars
  • Voice messages: Send a voice note, Yabby transcribes it via Whisper and responds
  • Voice replies: Yabby can respond with voice notes (TTS → OGG/Opus format)
  • Audio file support: Attach audio files for transcription
  • Same tools as voice: Full function-calling loop with max 5 tool iterations
  • Slash commands: /status, /new, /reset, /help
📡

Other Channels

Beyond WhatsApp and Telegram, OpenYabby supports Discord, Slack, and Signal as messaging channels.

Channel | Library | Auth | Notes
Discord | discord.js | Bot Token | Full bot integration with slash commands
Slack | @slack/bolt | Bot Token + App Token | Workspace integration with mention gating
Signal | Signal API | Phone number + API URL | Privacy-focused, supports QR code linking

Channel Configuration

All channels are configured from the Channels → Config tab:

  • Enable/disable each channel independently
  • DM policy: open (anyone can talk) or closed (whitelist only)
  • Mention gating: in groups, Yabby only responds when @mentioned
  • Allowed users: whitelist with recent user suggestions

Unified Conversation

All channels share the same conversation context as voice. A fact mentioned via WhatsApp is remembered when you speak to Yabby by voice, and vice versa.

🔌

Connectors & MCP

Connectors let agents interact with external services. v0.1.3 ships with a growing catalog of 37 connectors organized by category, plus support for any MCP (Model Context Protocol) server.

Connector Categories

💻 Development

GitHub, Linear, Sentry, Git

📋 Project Management

Jira, Asana, Monday, ClickUp

📈 CRM

Salesforce, HubSpot, Pipedrive

📊 Data

Google Sheets, Airtable, Stripe

📨 Communication

Slack, Discord integrations

⚙️ Custom

Any MCP server or custom API

How Connectors Work

  1. Browse the catalog from the Connectors page
  2. Click a connector → enter credentials (API key, OAuth, etc.)
  3. Test the connection. Yabby validates credentials before saving
  4. Once connected, the connector's tools become available to all agents
  5. Connectors can be scoped to specific projects or kept global

MCP Servers

MCP (Model Context Protocol) allows agents to use external tool servers. OpenYabby can:

  • Auto-configure MCP servers from the catalog (command, args, env variables)
  • Custom MCP: Add any MCP server by specifying command and arguments
  • Bridge tools: MCP tool schemas are automatically converted to OpenAI function-calling format
  • .mcp.json generation: When a task spawns, Yabby generates a .mcp.json in the working directory so the CLI runner can access all connected MCP servers
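
As an illustration of the generated file, the sketch below renders a set of servers in the `mcpServers` layout that Claude Code reads from a project-level .mcp.json. Whether OpenYabby writes exactly these fields is an assumption, and the server name, package name, and token are placeholders.

```python
import json


def build_mcp_config(servers):
    """Render connected servers in the `mcpServers` layout of .mcp.json.

    `servers` maps a server name to its command, args, and env variables.
    """
    return json.dumps({
        "mcpServers": {
            name: {
                "command": spec["command"],
                "args": list(spec.get("args", [])),
                "env": dict(spec.get("env", {})),
            }
            for name, spec in servers.items()
        }
    }, indent=2)


# Hypothetical catalog entry; the package name and token are placeholders.
config_text = build_mcp_config({
    "example": {
        "command": "npx",
        "args": ["-y", "example-mcp-server"],
        "env": {"EXAMPLE_TOKEN": "<token>"},
    }
})
```

Writing `config_text` to `.mcp.json` in the task's working directory is what would let the CLI runner discover the server.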

Credential Security

All connector credentials are encrypted at rest using AES-256-GCM. The encryption key is derived from YABBY_SECRET (or auto-derived from OPENAI_API_KEY).

Scheduled Tasks

Schedule recurring tasks that Yabby executes automatically. The scheduler ticks every 30 seconds and supports three schedule types.

Schedule Types

Type | Config | Example
interval | Fixed time between runs | Every 2 hours, every 30 minutes
cron | Cron expression | 0 9 * * 1-5 (weekdays at 9am)
manual | No schedule | Triggered only via button click
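
For the interval type, the check made on each scheduler tick might look like the sketch below. The 30-second tick is from the text; the function names and the rule that a never-run task is immediately due are assumptions.

```python
from datetime import datetime, timedelta

TICK_SECONDS = 30  # scheduler tick from the text


def is_due(last_run, interval, now):
    """An interval task is due if it never ran or its interval has elapsed."""
    return last_run is None or now - last_run >= interval


def next_run(last_run, interval):
    """Earliest time the task becomes due again."""
    return last_run + interval
```

Because the scheduler only looks every 30 seconds, an interval task can start up to one tick later than its nominal time.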

Creating a Scheduled Task

  1. Go to the Scheduling page
  2. Click New Scheduled Task
  3. Enter: name, description, task prompt/template
  4. Choose schedule type and configure timing
  5. Optionally assign to a project or standalone agent
  6. Set max retries and retry delay

Features

  • Run history: View all past executions with status, task ID, and results
  • Manual trigger: Run any scheduled task on demand
  • Retry logic: Configurable max retries (default 3) with delay between attempts
  • Pause/Resume: Temporarily disable without deleting
  • Orphan recovery: Missed runs are detected and recovered on startup
  • Agent queue integration: For standalone agents, tasks are queued to preserve session state
⚙️

Settings

All configuration is managed through the Settings page, organized in 5 tabs. Changes are validated in real-time and hot-reloaded without server restart.

General

  • UI Language: French, English, Spanish, German
  • Speech Language: Language for voice recognition
  • Voice settings: Model, voice selection (6 voices), noise reduction, VAD type, mic toggle
  • Memory: Extraction model, embedder, extraction frequency (every N turns)
  • TTS Provider: Edge TTS (free), ElevenLabs, OpenAI, System
  • Task Runner: Claude, Codex, Aider, Goose, Cline, Continue, or custom binary
  • LLM providers: API keys for OpenAI, Anthropic, Google, Groq, Mistral, Ollama, OpenRouter

Speaker Verification

  • Enrollment status and calibration UI
  • Record 3 voice samples using browser microphone + Silero VAD
  • Delete enrollment to reset

Projects

  • Sandbox root: Where project files are stored (default: ~/Desktop/Yabby Projects)
  • Clean on archive: Delete files when a project is archived

Authentication

  • Enable/disable gateway auth
  • Gateway password for web access
  • Session TTL (days)
  • API token generation for programmatic access

Usage

  • 30-day cost summary across all LLM providers
  • Breakdown by provider: calls, input/output tokens, cost in USD
  • Daily usage chart
🏗️

Architecture

System Overview

Browser ──WebRTC──► OpenAI Realtime API ──► Bidirectional voice
   │                        │
   │             DataChannel (tool calls)
   │
   ▼                        ▼
Frontend (vanilla JS SPA)   Voice tool dispatch
   │                        │
   ▼                        ▼
Express Server ◄──────────────────── REST API
   │
   ├── Claude CLI (child processes) ──► Task execution
   ├── PostgreSQL + Redis ──► Persistent state
   ├── Mem0 (Qdrant + SQLite) ──► Memory
   ├── Redis pub/sub ──► Agent-bus messaging
   └── Channels ──► WhatsApp, Telegram, Slack, Discord, Signal

Voice Pipeline

Browser Mic ──► Silero VAD (client ONNX)
     │
     ▼ (voice detected)
Speaker Verify ──► Python/ECAPA-TDNN (optional)
     │
     ▼ (speaker match)
Whisper ──► Transcribe audio
     │
     ▼ (wake word match: /yab+[iy]e?/i)
WebRTC Session ──► Full bidirectional voice active
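
The wake word pattern in the pipeline is loose on purpose, so that common transcription variants still match. A quick sketch of the same pattern in Python (the function name is hypothetical; the regex itself is copied from the diagram):

```python
import re

# Pattern copied from the pipeline diagram: /yab+[iy]e?/i
WAKE_WORD = re.compile(r"yab+[iy]e?", re.IGNORECASE)


def contains_wake_word(transcript):
    """Return True if a Whisper transcript contains the wake word."""
    return WAKE_WORD.search(transcript) is not None
```

This accepts spellings such as "Yabby", "yabi", and "Yabbie", which is useful because speech-to-text rarely spells an invented name consistently.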

Key Design Patterns

  • Dual-write cache: All writes go to PostgreSQL + Redis simultaneously. Reads check Redis first (24h TTL), fallback to PG.
  • Soft delete: Nothing is actually deleted. Status is set to archived. All queries filter WHERE status != 'archived'.
  • Name resolution: Tools accept ID or name. Resolution: exact ID → exact name → ILIKE contains → fuzzy match → role match.
  • SSE + WebSocket: Both channels emit identical events. Frontend uses SSE; WebSocket for presence/typing.
  • Config hot-reload: Settings changes propagate via Redis pub/sub. No server restart needed.
  • Fail-open: Optional services (speaker verification, tunnel) fail gracefully. The system continues working.
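
The name resolution order can be sketched over an in-memory list. This is an illustration only: the real lookup runs against the database, the fuzzy step here uses `difflib` with an arbitrary 0.6 cutoff, and the function name and record shape are hypothetical.

```python
import difflib


def resolve_agent(query, agents):
    """Resolve `query` to an agent dict using the fallback order in the text.

    `agents` is a list of dicts with "id", "name", and "role" keys.
    """
    q = query.strip().lower()
    if not q:
        return None
    for agent in agents:                       # 1. exact ID
        if agent["id"] == query:
            return agent
    for agent in agents:                       # 2. exact name
        if agent["name"].lower() == q:
            return agent
    for agent in agents:                       # 3. substring, like ILIKE '%q%'
        if q in agent["name"].lower():
            return agent
    names = [a["name"].lower() for a in agents]
    close = difflib.get_close_matches(q, names, n=1, cutoff=0.6)  # 4. fuzzy
    if close:
        return agents[names.index(close[0])]
    for agent in agents:                       # 5. role
        if q in agent["role"].lower():
            return agent
    return None
```

Ordering the steps from strictest to loosest means a precise reference always wins over a lucky partial match, which matters when names come from speech transcription.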

21 Voice Tools

These are the base tools available to Yabby during voice sessions (agents and connectors add more):

Task Management

create_task, check_task_status, list_tasks, kill_task, pause_task, resume_task

Projects & Agents

create_project, list_projects, create_agent, assign_agent, list_agents, talk_to_agent

Communication

switch_to_agent, back_to_yabby, send_notification, sleep_mode

Connectors & Skills

list_connectors, use_connector, list_skills, attach_skill

Relay Tunnel

OpenYabby includes a WebSocket tunnel to relay.openyabby.com for mobile access. A tunnel code is assigned and persisted to .env. Proxies HTTP + WebSocket traffic to localhost with auto-reconnect (exponential backoff: 2s → 30s max). Disable with DISABLE_TUNNEL=true.
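
The reconnect schedule can be sketched as a capped exponential sequence. The 2s start and 30s ceiling come from the text; the doubling factor and the number of attempts are assumptions, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(initial_s=2.0, max_s=30.0, factor=2.0, attempts=6):
    """Yield reconnect delays that grow by `factor` and are capped at `max_s`."""
    delay = initial_s
    for _ in range(attempts):
        yield min(delay, max_s)
        delay *= factor
```

Under these assumptions, `list(backoff_delays())` gives waits of 2, 4, 8, 16, 30, and 30 seconds. The same helper would cover the WhatsApp reconnect settings (5s → 60s max, 10 attempts) as `backoff_delays(5, 60, attempts=10)`.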

Built with claws by the OpenYabby community.