🦞

What is OpenYabby?

OpenYabby is an open-source, voice-first AI assistant with multi-agent orchestration. It combines OpenAI's Realtime API (WebRTC) for bidirectional voice with Claude CLI spawned as child processes for autonomous task execution.

Say "Yabby" and watch a team of AI agents plan, code, design, and deploy projects with full system access to your Mac (terminal, files, browser, AppleScript, GUI). No buttons, no typing. Just your voice.

🎤 Voice-First

Bidirectional audio via WebRTC. Wake word "Yabby" with client-side VAD. Speak naturally in any language.

🤖 Multi-Agent Teams

Lead agents create managers and sub-agents. Auto-orchestration handles coordination, reviews, and QA.

💻 Full System Access

Each agent runs as a Claude CLI process with access to bash, AppleScript, file system, and browser automation.

🧠 Persistent Memory

Mem0 extracts facts from your conversations. Yabby learns who you are, your preferences, and your context across sessions.

💬 Multi-Channel

Interact via voice, web dashboard, WhatsApp, Telegram, Slack, Discord, or Signal.

🔌 37 Connectors & growing

GitHub, Linear, Jira, Slack, Stripe, Google Sheets, and more via built-in integrations or MCP servers.

Tech Stack

Technology	Role
OpenAI Realtime API	Voice I/O via WebRTC
Claude CLI	Agent brain (task execution)
Node.js + Express	Backend server
PostgreSQL	Source of truth database
Redis	Cache, pub/sub, live status
Mem0 + Qdrant	Persistent memory & vector search
Whisper	Audio transcription (wake word)
Silero VAD	Client-side voice activity detection (ONNX)
Baileys	WhatsApp Web protocol
grammY	Telegram Bot API

📦

Installation

Prerequisites

Node.js 20+ (nodejs.org)
PostgreSQL: create a database named yabby
Redis running on localhost:6379
Claude CLI: npm install -g @anthropic-ai/claude-code
OpenAI API key with Realtime API access

Quick Start

Terminal

# Clone the repository
git clone https://github.com/OpenYabby/OpenYabby.git
cd OpenYabby

# Install dependencies
npm install

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Start (migrations run automatically)
npm run dev

Recommended: Use npm run dev to start both the Node.js server (port 3000) and the speaker verification service (port 3001). Open http://localhost:3000 in your browser.

Environment Variables

Variable	Required	Description
`OPENAI_API_KEY`	Yes	OpenAI key (Realtime API, Whisper, Mem0)
`PG_HOST` / `PG_PORT` / `PG_DATABASE` / `PG_USER` / `PG_PASSWORD`	Yes	PostgreSQL connection
`REDIS_URL`	Yes	Redis connection string
`CLAUDE_CMD`	No	Path to Claude CLI binary (default: `claude`)
`PORT`	No	Server port (default: `3000`)
`SPEAKER_SERVICE_URL`	No	Speaker verification endpoint (default: `http://localhost:3001`)
`SPEAKER_THRESHOLD`	No	Voice matching strictness 0.0–1.0 (default: `0.25`)
`DISABLE_TUNNEL`	No	Set to `true` to disable the relay tunnel
`SANDBOX_ROOT`	No	Override project sandbox location
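A minimal sketch of how the server might validate these variables at startup. The variable names and defaults come from the table above; the `loadConfig` function and the shape of the object it returns are hypothetical, not OpenYabby's actual code.

```javascript
// Hypothetical config loader: checks required variables and applies
// the defaults listed in the environment variable table.
const REQUIRED_VARS = [
  "OPENAI_API_KEY",
  "PG_HOST",
  "PG_PORT",
  "PG_DATABASE",
  "PG_USER",
  "PG_PASSWORD",
  "REDIS_URL",
];

function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => env[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // SPEAKER_THRESHOLD must parse to a number in [0.0, 1.0].
  const speakerThreshold = Number(env.SPEAKER_THRESHOLD || "0.25");
  if (!(speakerThreshold >= 0 && speakerThreshold <= 1)) {
    throw new Error("SPEAKER_THRESHOLD must be between 0.0 and 1.0");
  }

  return {
    openaiApiKey: env.OPENAI_API_KEY,
    pg: {
      host: env.PG_HOST,
      port: Number(env.PG_PORT),
      database: env.PG_DATABASE,
      user: env.PG_USER,
      password: env.PG_PASSWORD,
    },
    redisUrl: env.REDIS_URL,
    claudeCmd: env.CLAUDE_CMD || "claude",
    port: Number(env.PORT || "3000"),
    speakerServiceUrl: env.SPEAKER_SERVICE_URL || "http://localhost:3001",
    speakerThreshold,
    disableTunnel: env.DISABLE_TUNNEL === "true",
    sandboxRoot: env.SANDBOX_ROOT, // undefined means "use the default location"
  };
}
```

Calling it once with `loadConfig(process.env)` makes a missing key fail at boot rather than on the first request that needs it.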

Database Migrations

Migrations run automatically on startup. All migrations are idempotent (IF NOT EXISTS, ON CONFLICT), so no manual migration step is needed. Just start the server.

Docker (Partial)

Terminal

# Start PostgreSQL + Redis via Docker
docker-compose up -d

# Then start Yabby natively (Claude CLI can't run in Docker)
npm run dev

Note: Claude CLI cannot run inside Docker containers. Use Docker only for PostgreSQL and Redis, then run the Node.js server natively.

📊

Web Dashboard

The OpenYabby web interface is a single-page application served at http://localhost:3000. It provides a complete control center for managing agents, projects, tasks, and voice interaction.

Pages

Page	Route	Description
Dashboard	`/`	Overview with running/done/error task stats, project cards with progress, and real-time activity feed
Projects	`/projects`	Grid of all projects with status filters, progress rings, inline editing
Tasks	`/tasks`	Full task manager with status filters, activity logs, pause/resume/kill controls
Simple Tasks	`/simple-tasks`	Simplified task view for quick monitoring
Agents	`/agents`	Agent directory with Yabby (main AI) pinned at top, standalone and project agents listed
Scheduling	`/scheduled-tasks`	Scheduled/recurring task management with cron and interval support
Channels	`/channels`	Messaging channel configuration (WhatsApp, Telegram, Slack, Discord, Signal)
Connectors	`/connectors`	External integrations catalog with MCP server management
Settings	`/settings`	Voice, memory, auth, project, and usage configuration

Features

Real-time updates via SSE (Server-Sent Events): tasks, heartbeats, notifications stream live
Global search (Ctrl+/): search across projects, tasks, and agents
Keyboard shortcuts: Ctrl+1 Dashboard, Ctrl+2 Tasks, Ctrl+3 Agents, Cmd+K toggle Yabby chat
Collapsible sidebar with state persisted to localStorage
Notification bell with dropdown for speaker notifications and plan reviews
i18n: supports French, English, Spanish, German
Dark/Light theme toggle
Onboarding wizard on first visit
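The live updates rely on the standard Server-Sent Events wire format. The sketch below shows how events could be framed and pushed to connected dashboards with plain `node:http` (the handler also works as an Express route); the `clients` set, the `broadcast` helper, and event names such as `task` are illustrative assumptions.

```javascript
// Open dashboard connections (one response object per browser tab).
const clients = new Set();

// One SSE frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Request handler usable with node:http (or as an Express route handler).
function sseHandler(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  clients.add(res);
  req.on("close", () => clients.delete(res));
}

// Push one event to every connected dashboard.
function broadcast(eventName, payload) {
  const frame = formatSseEvent(eventName, payload);
  for (const res of clients) res.write(frame);
}
```

On the browser side, `new EventSource(url)` with `addEventListener("task", ...)` receives these frames and reconnects automatically if the stream drops.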

🎤

Voice Commands

OpenYabby is designed voice-first. The entire system (apps, agents, projects, files, terminal) can be controlled hands-free through natural conversation. Voice is the primary way to interact with Yabby.

Getting Started: Activate Voice

1

Click the Mic

Click the pulsing orb (bottom-right of the dashboard) to activate your microphone and start the WebRTC voice session.

2

Say "Yabby"

The wake word activates the AI. Just say "Yabby" followed by your command. The orb glows green while Yabby is listening.

3

Always Listening

Once activated, the mic stays on. Yabby listens continuously. After 10 min of silence, it auto-suspends and listens for the wake word again.

Visual indicators: The orb at the bottom-right of the screen reflects the voice state. Pulsing purple = idle/wake word listening. Green glow = actively listening to you. Blue = Yabby is working on a task. Solid green = task completed.

Voice Pipeline: How It Works

1

Wake Word Detection

When the session is suspended, the browser listens using Silero VAD (ONNX model, runs locally in your browser). When speech is detected, audio is sent to the server for Whisper transcription. If "Yabby" is detected, the full voice session resumes instantly.

2

WebRTC Connection

The browser establishes a WebRTC connection via OpenAI's Realtime API. Audio streams bidirectionally: you speak, Yabby responds with voice. Tool calls are received via DataChannel.

3

Tool Execution

When Yabby decides to act, it calls tools (create project, spawn task, open app...). The call is dispatched client-side to the backend REST API. The result is sent back over the DataChannel, and Yabby announces it by voice.
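A hedged sketch of that client-side round trip, assuming the Realtime API's DataChannel events: the model emits `response.function_call_arguments.done`, the result goes back as a `function_call_output` conversation item, and `response.create` asks the model to speak about it. `dispatch` is a hypothetical stand-in for the call to OpenYabby's backend REST API.

```javascript
// Handle one parsed DataChannel event. dispatch(name, args) is a hypothetical
// backend call; send(obj) serializes and writes to the DataChannel.
async function handleRealtimeEvent(event, dispatch, send) {
  if (event.type !== "response.function_call_arguments.done") return false;

  let output;
  try {
    const args = JSON.parse(event.arguments || "{}");
    output = await dispatch(event.name, args);
  } catch (err) {
    // Report failures to the model instead of leaving the call unanswered.
    output = { error: String((err && err.message) || err) };
  }

  // Attach the tool result to the conversation, keyed by call_id.
  send({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(output),
    },
  });
  // Ask the model to respond, so it can announce the result by voice.
  send({ type: "response.create" });
  return true;
}
```

In the browser this would be wired as `dc.addEventListener("message", (e) => handleRealtimeEvent(JSON.parse(e.data), dispatch, (msg) => dc.send(JSON.stringify(msg))))`.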

4

Auto-Suspend

After 10 minutes of inactivity, the session suspends and wake word listening resumes. Active tasks extend the timeout automatically. Yabby won't fall asleep while your agents are working.
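One way to model this timeout, as a sketch: any active task refreshes the activity timestamp, which is how "active tasks extend the timeout" is interpreted here. `IdleMonitor` and its method names are hypothetical.

```javascript
const IDLE_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes

class IdleMonitor {
  constructor(onSuspend, timeoutMs = IDLE_TIMEOUT_MS) {
    this.onSuspend = onSuspend; // e.g. close the session, resume wake-word listening
    this.timeoutMs = timeoutMs;
    this.lastActivityMs = Date.now();
  }

  // Call whenever the user speaks or Yabby responds.
  touch(nowMs = Date.now()) {
    this.lastActivityMs = nowMs;
  }

  // Returns true (and calls onSuspend) once the session has been idle long enough.
  check(nowMs = Date.now(), activeTaskCount = 0) {
    if (activeTaskCount > 0) {
      this.lastActivityMs = nowMs; // running tasks keep pushing the deadline back
      return false;
    }
    if (nowMs - this.lastActivityMs >= this.timeoutMs) {
      this.onSuspend();
      return true;
    }
    return false;
  }
}
```

A `setInterval` in the voice client could call `check(Date.now(), runningTaskCount)` every few seconds.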

🎤 How voice works

Say "Yabby" once to wake up the voice session. After that, just speak naturally. No need to repeat "Yabby" before every command. The session stays active until you put it to sleep or 10 minutes of inactivity pass.

🚀

Build & Ship Projects

From idea to deployed project. One voice command kicks off the full orchestration pipeline.

"Build me a SaaS landing page with Stripe checkout"

→ Creates a project, assigns a lead agent who recruits designers and devs, sequences tasks, and delivers a working site

"Set up a REST API with auth, user CRUD, and PostgreSQL"

→ Lead agent creates the architecture, spins up backend + DB agents in parallel, runs tests, and delivers the project

"Refactor the auth module to use JWT"

→ Spawns a task in the project sandbox. The agent reads the existing code, rewrites auth, updates tests, and commits

⌨️

System & Dev Tools

Full terminal access. File operations, package management, git, servers, scripts, and more.

"Commit everything with message 'fix login flow'"

→ Executes git add . && git commit -m "fix login flow" in the current project sandbox

"Start the dev server"

→ Runs npm run dev in the project directory and confirms the server is up

"Open Safari"

→ Runs open -a Safari. Works with any installed Mac application

"Install ffmpeg via Homebrew"

→ Spawns a task that runs brew install ffmpeg and reports when done

"Take a screenshot"

→ Acquires the GUI lock, captures the screen, and saves it to the desktop

🤖

Manage Agents

Create agents, talk to them, switch context, check their status. All hands-free.

"Create an agent named Luna, role: crypto analyst"

→ Creates a standalone agent "Luna" ready to receive instructions. Agents persist across sessions.

"Tell Nolan to fix the login bug"

→ Sends the instruction to agent "Nolan" via talk_to_agent. A task spawns in Nolan's context to execute the fix.

"What's John's status?"

→ Checks John's heartbeat, active task, progress percentage, and reads the summary aloud

"Switch me to Sofia"

→ Switches voice context to agent Sofia (switch_to_agent). You now talk directly to her. Say "back" to switch back.

📁

Manage Projects

Create, monitor, and interact with full projects. Yabby handles the entire lifecycle.

"Create a new project for an e-commerce site"

→ Creates the project, assigns a lead agent, sets up the sandbox directory, and starts the discovery phase. The lead may ask you clarifying questions by voice.

"How's the MathCalc project going?"

→ Checks all agent heartbeats, task statuses, and progress. Reads a summary: "3 tasks done, 1 running, 65% overall"

"Open the project folder in VS Code"

→ Runs code ~/Desktop/Yabby\ Projects/e-commerce-abc123/ to open the project sandbox

"Approve the plan"

→ Approves the lead agent's pending plan review. Execution phase begins immediately.

🔄

Complete Voice Workflow

Commands chain naturally, like a conversation. Here's a full project workflow done entirely by voice:

🎬 Example: Building a recipe website by voice

🎤 "Yabby" (wake word)

→ Session starts. Yabby is listening.

🎤 "Create a project for a recipe website"

→ Project created. Lead agent "Marc" assigned. Discovery phase starts.

💬 Marc asks: "Light or dark theme? How many recipe categories?"

→ You answer by voice. Marc saves the answers and moves to planning.

📋 Marc submits a plan. A modal appears for your approval.

→ You say "Approve the plan" or click Approve. Execution begins.

🎤 "Tell the lead to start with the design"

→ Instruction forwarded to lead. He delegates to Sofia (Designer).

🎤 "Show me the project status"

→ "Sofia: design running (45%). Lucia: waiting. Karim: waiting. Overall: 15%"

🎤 "Open the project folder in VS Code"

→ VS Code opens with the project sandbox. You can inspect files while agents work.

🔔 Yabby: "The Recipe project is complete. QA passed, 12 files created."

→ Voice notification. Zero micromanagement required.

🎛️

Control Yabby

Pause, resume, interrupt. You're always in charge.

"Go to sleep"

→ Activates sleep_mode. Voice session suspends, wake word listening resumes. Tasks keep running.

"Yabby" (wake word to resume)

→ Wakes up from sleep. Full voice session resumes instantly. Then give your next command.

"Stop the current task"

→ Pauses the currently running task (SIGTERM) so you can resume it later, or kills it outright (SIGKILL).

"Back"

→ If you're talking to an agent (Sofia, Marc...), this switches back to Yabby's main context via back_to_yabby.

🎨

Everyday Tasks & Mac Control

Full Mac access: AppleScript, bash, GUI control, screenshots, browser automation.

"Play some jazz on Spotify"

→ Opens Spotify via AppleScript and starts playing a jazz playlist

"Set the volume to 30%"

→ Adjusts system volume via osascript -e 'set volume output volume 30'

"Send an email to Pierre about tomorrow's meeting"

→ Composes and sends an email via AppleScript (Mail.app) or a connected email connector

"What's the weather in Paris?"

→ Fetches weather data via a web search or connected API and reads the forecast aloud

"Summarize the last 10 commits in my repo"

→ Runs git log --oneline -10 and reads back a digest of recent changes

💡 Tips & Best Practices

Speak naturally. No rigid syntax needed. "Can you open the terminal?" works as well as "Open the terminal".
Chain commands. After one command completes, just keep talking. No need to say "Yabby" again while the session is active.
Interrupt anytime. You can speak while Yabby is responding. It stops and listens to your new instruction.
Use any language. Yabby understands French, English, Spanish, German, and more. You can mix languages freely.
Monitor visually. The dashboard updates in real time. Watch task progress while you give voice commands.
Text fallback. Press Cmd+K to open the chat window. Type commands when you can't speak.
Agent conversations. Say "Switch me to [name]" to talk directly to an agent. Say "back" to return.
Background work. Say "go to sleep". Tasks and agents keep running. You'll get a voice notification when something finishes.

Speaker Verification (Optional)

An optional Python microservice (FastAPI + SpeechBrain ECAPA-TDNN) restricts wake word detection to your voice. This reduces false positives by 90%+ in multi-person environments.

Go to Settings → Speaker Verification
Record 3 voice samples saying "Yabby"
Once enrolled, only your voice triggers wake word detection
Embeddings are stored locally (never sent to the cloud)
If the service is down, detection continues normally (fail-open)
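The real service runs ECAPA-TDNN in Python; this JavaScript sketch only illustrates the scoring step. It assumes the candidate embedding is compared by cosine similarity against the average of the enrollment samples and accepted at or above `SPEAKER_THRESHOLD` (default 0.25). The averaging and the function names are assumptions.

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Average the enrollment embeddings into one voiceprint.
function meanEmbedding(samples) {
  const mean = new Array(samples[0].length).fill(0);
  for (const sample of samples) {
    sample.forEach((x, i) => {
      mean[i] += x / samples.length;
    });
  }
  return mean;
}

// candidate === null means the verification service could not be reached.
function acceptWakeWord(candidate, enrolledSamples, threshold = 0.25) {
  if (candidate === null) return true; // fail-open: service down
  if (enrolledSamples.length === 0) return true; // nobody enrolled: no filtering
  return cosineSimilarity(candidate, meanEmbedding(enrolledSamples)) >= threshold;
}
```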

Voice Configuration

Option	Values	Default
Voice	ash, ballad, coral, sage, verse, marin	marin
Noise Reduction	near_field, far_field, off	near_field
Turn Detection	server_vad, semantic_vad	server_vad
Mic Enabled	true / false	true
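These settings map naturally onto a Realtime API `session.update` event. The sketch assumes the beta field names (`input_audio_noise_reduction`, `turn_detection`), which may differ in other API versions; the mic toggle is handled locally on the audio track rather than sent to the model. Helper names are hypothetical.

```javascript
// Allowed values from the Voice Configuration table.
const VOICES = ["ash", "ballad", "coral", "sage", "verse", "marin"];
const NOISE_REDUCTION = ["near_field", "far_field", "off"];
const TURN_DETECTION = ["server_vad", "semantic_vad"];

// Build a session.update event from the user's voice settings.
function buildSessionUpdate({
  voice = "marin",
  noiseReduction = "near_field",
  turnDetection = "server_vad",
} = {}) {
  if (!VOICES.includes(voice)) throw new Error(`Unknown voice: ${voice}`);
  if (!NOISE_REDUCTION.includes(noiseReduction)) {
    throw new Error(`Unknown noise reduction: ${noiseReduction}`);
  }
  if (!TURN_DETECTION.includes(turnDetection)) {
    throw new Error(`Unknown turn detection: ${turnDetection}`);
  }

  return {
    type: "session.update",
    session: {
      voice,
      // null switches noise reduction off
      input_audio_noise_reduction: noiseReduction === "off" ? null : { type: noiseReduction },
      turn_detection: { type: turnDetection },
    },
  };
}

// "Mic Enabled" is local: mute or unmute the outgoing audio track(s).
function setMicEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) track.enabled = enabled;
}
```

The event would be sent with `dataChannel.send(JSON.stringify(buildSessionUpdate({ voice: "coral" })))`.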

Text-Only Mode

You can also interact with Yabby by typing in the chat window (Cmd+K to toggle). When a voice session is active, text messages are sent over the DataChannel; otherwise they open a text-only connection. Useful when you can't speak.
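For the DataChannel path, a typed message can be sent as a user conversation item followed by a request for a response. This is a sketch using the Realtime API's client events; the text-only fallback connection is not shown, and the helper names are hypothetical.

```javascript
// Realtime API client events: add a user text message, then request a reply.
function textMessageEvents(text) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    { type: "response.create" },
  ];
}

// Send over an open RTCDataChannel (or anything with a compatible send()).
function sendTextMessage(dataChannel, text) {
  for (const event of textMessageEvents(text)) {
    dataChannel.send(JSON.stringify(event));
  }
}
```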

Noise Filter

Client-side regex filters block low-value utterances ("ok", "oui", "mmh", "non"... 44 patterns) from being sent to the voice model. Short audio clips (<0.2s or <3 words) are also discarded.
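A sketch of this filter, using a handful of illustrative patterns (the real list has 44 and is not reproduced here) together with the duration and word-count thresholds stated above. `shouldDiscard` and the punctuation handling are assumptions.

```javascript
// Illustrative subset of low-value utterance patterns.
const LOW_VALUE_PATTERNS = [/^ok(ay)?$/i, /^oui$/i, /^non$/i, /^m+h+$/i];

// Lowercase and drop trailing punctuation ("Mmh." -> "mmh").
function normalize(text) {
  return text.trim().toLowerCase().replace(/[.!?,…]+$/u, "").trim();
}

// True when the utterance should not be sent to the voice model.
function shouldDiscard(transcript, durationSec, { minDurationSec = 0.2, minWords = 3 } = {}) {
  if (durationSec < minDurationSec) return true; // too short to be speech
  const cleaned = normalize(transcript);
  if (LOW_VALUE_PATTERNS.some((re) => re.test(cleaned))) return true;
  const words = cleaned.split(/\s+/).filter(Boolean);
  return words.length < minWords;
}
```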

🤖

Agents

Agents are the core workforce of OpenYabby. Each agent is a Claude CLI process with its own session, system prompt, and working directory.

Agent Tiers

Lead Agent

Project Director

Full API access. Orchestrates discovery, planning, execution, and QA. Creates teams, submits plans for approval, reports via voice.

Manager

Coordinator

Mid-level. Manages sub-agents and is auto-triggered for review when they complete tasks. Reports to the lead or to a parent manager.

Sub-Agent

Executor

Executes specific work (code, design, QA). Sends completion report to parent agent via agent-bus messaging.

Standalone Agents

You can create agents outside of any project. These are standalone agents that persist across sessions and can be instructed directly via voice or chat.

Standalone agents must have a valid human first name (e.g., "Marc", "Sofia", "Karim"). Technical names are rejected.
They get their own working directory and conversation thread
You can switch to an agent's context via voice: "Yabby, talk to Marc"
Agents can be suspended, activated, or deleted from the Agents page

Agent Communication

Agents communicate via Redis pub/sub (yabby:agent-bus channel). When a sub-agent completes a task, the orchestrator automatically triggers a review task on the parent manager, with a 60-second debounce to batch simultaneous completions.
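A sketch of the debounce, assuming a trailing-edge window: each completion pushes the parent's deadline back 60 seconds, and a single review covering every batched task fires once the parent has been quiet that long. The Redis pub/sub subscriber that would call `recordCompletion` is omitted, and `ReviewBatcher` is hypothetical.

```javascript
const DEBOUNCE_MS = 60_000;

class ReviewBatcher {
  constructor(debounceMs = DEBOUNCE_MS) {
    this.debounceMs = debounceMs;
    this.pending = new Map(); // parentId -> { taskIds, dueAt }
  }

  // Record a child completion; each new completion pushes the deadline back.
  recordCompletion(parentId, taskId, nowMs) {
    const entry = this.pending.get(parentId) ?? { taskIds: [], dueAt: 0 };
    entry.taskIds.push(taskId);
    entry.dueAt = nowMs + this.debounceMs;
    this.pending.set(parentId, entry);
  }

  // Return one batched review per parent whose quiet period has elapsed.
  flushDue(nowMs) {
    const due = [];
    for (const [parentId, entry] of this.pending) {
      if (nowMs >= entry.dueAt) {
        due.push({ parentId, taskIds: entry.taskIds });
        this.pending.delete(parentId);
      }
    }
    return due;
  }
}
```

Calling `flushDue(Date.now())` on a short timer and spawning one review task per returned entry yields one review per burst of completions.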

Agent Voice Switching

You can talk directly to a specific agent. Yabby swaps the voice session instructions to that agent's context (same WebRTC connection persists). Say "Back to Yabby" to return.

📁

Projects

Projects are containers for multi-agent work. Each project gets an isolated sandbox directory at ~/Desktop/Yabby Projects/{name}-{id}/ (configurable), auto-initialized with src/, docs/, README, .gitignore, and git init.
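How the directory name is built can be sketched as below, assuming the project name is slugified (lowercased, accents stripped, non-alphanumeric runs collapsed to `-`), which matches the `e-commerce-abc123` example used elsewhere on this page. `slugify` and `sandboxDir` are hypothetical helpers; `SANDBOX_ROOT` overrides the default root.

```javascript
const os = require("node:os");
const path = require("node:path");

const DEFAULT_SANDBOX_ROOT =
  process.env.SANDBOX_ROOT || path.join(os.homedir(), "Desktop", "Yabby Projects");

// "Café Déjà Vu!" -> "cafe-deja-vu"
function slugify(name) {
  return name
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// {root}/{name}-{id}
function sandboxDir(projectName, projectId, root = DEFAULT_SANDBOX_ROOT) {
  return path.join(root, `${slugify(projectName)}-${projectId}`);
}
```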

Project Lifecycle (5 Phases)

1

Discovery

The lead agent asks clarifying questions before planning. Questions appear as modals or voice prompts. Supports text fields, dropdowns, and connector selection questions.

2

Planning

The lead writes a PLAN.md and submits it for your approval. You can approve, revise (with feedback), or cancel the project.

3

Execution

The lead creates agents (managers and sub-agents), assigns tasks, and coordinates parallel work. Agents have full system access within the sandbox.

4

Review

When sub-agents complete tasks, the orchestrator auto-triggers the lead for review. The lead inspects deliverables and decides next steps.

5

QA

Specialized QA agents run test plans. Corrections loop until clean. You get a voice notification when the project is complete. Zero micromanagement.

Creating a Project

There are several ways to create a project:

Voice: "Yabby, build me a dark-mode portfolio with a blog section"
Dashboard: Click "New Project" on the Projects page
Channel: Send a project request via WhatsApp, Telegram, or any connected channel

⚡

Tasks

A task is a Claude CLI child process spawned to execute a specific instruction. Each task runs with full system access (bash, AppleScript, GUI, file system, browser).

Task Lifecycle

Status	Description
`running`	CLI process is active and executing
`done`	Task completed successfully
`error`	Task failed or process crashed
`paused`	Task paused (SIGTERM sent)
`killed`	Task force-killed (SIGKILL)
`paused_llm_limit`	Claude CLI hit its daily quota. Auto-resumes when the limit resets

Task Runners

OpenYabby supports multiple CLI runners for task execution:

Runner	Description
`claude`	Claude Code CLI (default). Full-featured, recommended
`codex`	OpenAI Codex CLI
`aider`	Aider (AI pair programming)
`goose`	Goose AI coding assistant
`cline`	Cline CLI
`continue`	Continue CLI
`custom`	Custom binary path

Logging

Every task produces two log files:

logs/{taskId}-activity.log: Structured activity log (tool calls, status changes)
logs/{taskId}-raw.log: Raw CLI output

Background Tasks (v0.1.3+)

When an agent calls Bash(run_in_background=true) for a long-running job (batch script, dev server, scraper), OpenYabby tracks it as a background task, separate from the parent CLI task. The background process survives the CLI's exit and is monitored at the OS level via PID polling, so the agent is notified asynchronously when it ends, even minutes or hours later.

How tracking works

A PreToolUse hook intercepts each Bash(run_in_background=true) call and writes a per-call bookkeeper script that captures the host PID and exit code via wait $C; rc=$?.
The bookkeeper detaches via nohup so the bg child survives the CLI exit.
A central bg-watcher polls kill -0 <pid> every 30 seconds. When the PID disappears, it reads the exit code and routes the agent's next-turn notification:

Final status	Trigger	Notification to agent
`completed`	exit code 0, not a service	`[BG_COMPLETED]` — success report
`failed`	non-zero exit or SIGKILL/OOM (no exit code captured)	`[BG_FAILED]` — diagnostic prompt
`service_died`	tagged with `[bg:service]`, any exit	`[BG_SERVICE_DIED]` — neutral, ask what to do
`orphaned`	row was `running` at startup but PID is gone	none (server restart recovery)
`stopped`	killed by `kill_bg_task` tool or manual user action	none (intentional)
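A sketch of one watcher pass, following the table above: `kill -0` becomes `process.kill(pid, 0)` in Node, and each finished row is mapped to its final status and notification tag. The row shape, the `readExitCode` callback, and the helper names are assumptions.

```javascript
// `kill -0 <pid>` equivalent: signal 0 only checks that the process exists.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but owned by another user
  }
}

// Final status; exitCode is null when none was captured (SIGKILL, OOM).
function classifyBgExit({ exitCode, isService }) {
  if (isService) return "service_died";
  return exitCode === 0 ? "completed" : "failed";
}

const NOTIFICATION_TAGS = {
  completed: "[BG_COMPLETED]",
  failed: "[BG_FAILED]",
  service_died: "[BG_SERVICE_DIED]",
};

// One watcher pass over rows still marked `running`.
function pollOnce(rows, readExitCode, alive = isPidAlive) {
  const finished = [];
  for (const row of rows) {
    if (alive(row.pid)) continue;
    const status = classifyBgExit({ exitCode: readExitCode(row), isService: row.isService });
    finished.push({ id: row.id, status, tag: NOTIFICATION_TAGS[status] });
  }
  return finished;
}
```

The watcher would run `pollOnce` on a 30-second `setInterval`, then update each returned row and queue its tag for the agent's next turn.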

Service tag

For permanent services (Node/PHP/Streamlit dev servers, watchers), the agent appends [bg:service] to the tool's description:

Bash(
  command="node server.js",
  description="dev preview server :3000 [bg:service]",
  run_in_background=true
)

The hook drops a marker file so the watcher uses the [BG_SERVICE_DIED] wording instead of treating an exit as a "completion".

Introspection & control tools

Agents have 5 dedicated tools to inspect and control their bg tasks (callable via POST /api/tools/execute):

Tool	Purpose
`list_bg_tasks`	Compact list of the agent's bg tasks. Running first, status counts, optional filter.
`bg_task_detail`	Full info on one task: real-time PID liveness check, elapsed, exit code, exit signal, output path.
`get_bg_task_log`	Read the output. Modes: `tail` (default 4 KB), `head`, or `grep` with a pattern. Hard cap 64 KB per call.
`kill_bg_task`	Stop a running task: SIGTERM, then SIGKILL after 3 s if still alive. Marks the row `stopped`.
`register_external_bg`	Register a process spawned outside the `run_in_background` path (typically a `start.sh` that runs `php -S ... &` internally). Provide the PID, a description, and the `is_service` flag.
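The stop sequence of `kill_bg_task` (SIGTERM, then SIGKILL after 3 s) can be sketched in Node as follows. The polling interval and the returned outcome strings are assumptions, not the tool's actual output.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function pidExists(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === "EPERM";
  }
}

// SIGTERM first; escalate to SIGKILL if the process outlives the grace period.
async function stopBgTask(pid, graceMs = 3000, pollMs = 100) {
  try {
    process.kill(pid, "SIGTERM");
  } catch (err) {
    if (err.code === "ESRCH") return "already_gone";
    throw err;
  }
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    await sleep(pollMs);
    if (!pidExists(pid)) return "terminated";
  }
  try {
    process.kill(pid, "SIGKILL");
  } catch (err) {
    if (err.code === "ESRCH") return "terminated"; // exited just before the kill
    throw err;
  }
  return "killed";
}
```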

Web UI

The Activity page shows a Background Tasks panel below the CLI events stream, with running-first sort, status badges, elapsed timer, service marker, and a 15-second auto-refresh. Both sections scroll independently.

Resilience

On server restart, the startup sweep doesn't blindly mark `running` rows as orphaned; it first checks each row's PID with kill -0. Live background jobs survive Yabby restarts (the watcher re-attaches automatically). Only truly gone PIDs are marked orphaned.

Retry Detection

OpenYabby monitors task activity for infinite retry loops. It scans the last 30 tool calls, and when it detects a pattern of repeated calls, it intervenes to unblock the stuck agent.
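A minimal sketch of such a check, assuming a "repeated pattern" means the same tool called with identical input several times within the 30-call window. The threshold of 5 repeats and the signature format are hypothetical.

```javascript
const WINDOW = 30; // calls scanned
const MAX_REPEATS = 5; // hypothetical threshold

// Identify a call by tool name plus serialized input.
function callSignature(call) {
  return `${call.tool}:${JSON.stringify(call.input)}`;
}

function detectRetryLoop(calls, windowSize = WINDOW, maxRepeats = MAX_REPEATS) {
  const counts = new Map();
  for (const call of calls.slice(-windowSize)) {
    const signature = callSignature(call);
    const count = (counts.get(signature) ?? 0) + 1;
    counts.set(signature, count);
    if (count >= maxRepeats) return { looping: true, signature, count };
  }
  return { looping: false };
}
```

Because `JSON.stringify` preserves key order, inputs that differ only in key order count as different calls; a stricter version would normalize them first.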

GUI Lock

Tasks that need to interact with the GUI (screenshots, mouse clicks) must acquire a lock (yabby:gui_lock in Redis). Only one task can control the GUI at a time. Lock auto-expires after 5 minutes or when the task finishes.
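The lock semantics map onto Redis's `SET key value NX PX <ttl>`: set only if absent, with automatic expiry. To keep the sketch self-contained it uses an in-memory stand-in with an injectable clock. `FakeRedis` and the helper names are hypothetical, and a real release should do its ownership check atomically (for example, in a Lua script).

```javascript
const LOCK_KEY = "yabby:gui_lock";
const LOCK_TTL_MS = 5 * 60 * 1000; // auto-expire after 5 minutes

// In-memory stand-in for Redis that mimics `SET key value NX PX ttl`.
class FakeRedis {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.store = new Map();
  }
  setIfAbsent(key, value, ttlMs) {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > this.now()) return false; // still held
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
    return true;
  }
  get(key) {
    const entry = this.store.get(key);
    return entry && entry.expiresAt > this.now() ? entry.value : null;
  }
  del(key) {
    this.store.delete(key);
  }
}

function acquireGuiLock(redis, taskId) {
  return redis.setIfAbsent(LOCK_KEY, taskId, LOCK_TTL_MS);
}

// Only the owning task may release the lock.
function releaseGuiLock(redis, taskId) {
  if (redis.get(LOCK_KEY) !== taskId) return false;
  redis.del(LOCK_KEY);
  return true;
}
```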

💬

WhatsApp

OpenYabby connects directly to WhatsApp Web via the Baileys protocol. No Meta Business API, no approval process, no cost. Just scan a QR code and you're live. Everything runs inside isolated groups so your personal chats are never touched.

Setup (2 minutes)

Getting WhatsApp connected is dead simple:

Open the dashboard → Channels → Config
Toggle WhatsApp to enabled → Save
A QR code appears on screen
On your phone: WhatsApp → Settings → Linked Devices → Link a Device
Scan the QR code. That's it.

No phone number needed in the config. Yabby links to your WhatsApp session via QR — same as linking WhatsApp Web. Your credentials are stored locally in data/whatsapp-auth/ and persist across restarts (no re-scanning).

What Happens After Connection

The moment the QR code is scanned:

Yabby automatically creates a group called “🤖 Yabby Assistant”
A welcome message is sent to confirm the connection
All Yabby interactions happen exclusively in this group
Your personal chats, other groups, and DMs are completely ignored

If the server restarts, the existing group is recovered automatically from the database — no duplicates.

Per-Agent Groups

This is where it gets powerful. When you create an agent (via voice, API, or dashboard), Yabby automatically creates a dedicated WhatsApp group for that agent:

Group named “💬 [Role] [Name]” — e.g., “💬 Frontend Dev [Lucia]”
Messages in that group are routed directly to that specific agent
Each agent has its own isolated conversation thread

Add people to agent groups. Anyone you invite into an agent's WhatsApp group can communicate with that agent directly. Great for team collaboration: add a colleague, and they can chat with the agent too. All participants share the same conversation context.

Workspace Switching

You can change an agent's working directory on the fly:

API

POST /api/agents/:id/change-workspace
{
  "workspace_path": "/Users/me/Projects/my-app",
  "reason": "Switch to my-app repo"
}
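A small helper that builds this request for `fetch` (global in Node 18+ and browsers). It is a sketch: the base URL is assumed to be the local server, and any auth header required when gateway authentication is enabled is left out.

```javascript
// Build the URL and fetch() options for POST /api/agents/:id/change-workspace.
function buildChangeWorkspaceRequest(baseUrl, agentId, workspacePath, reason) {
  const url = new URL(`/api/agents/${encodeURIComponent(agentId)}/change-workspace`, baseUrl);
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ workspace_path: workspacePath, reason }),
    },
  };
}
```

Usage: `const { url, init } = buildChangeWorkspaceRequest("http://localhost:3000", agentId, newPath, reason); const res = await fetch(url, init);`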

When the workspace changes:

The agent's running task is gracefully stopped
A new session starts in the new directory
Previous conversation context is injected so the agent remembers what it was doing
All future tasks run in the new workspace

Voice Messages

Send a voice note in the WhatsApp group and Yabby will auto-transcribe it via Whisper, then process it as a text instruction. The full voice pipeline works: your audio becomes an action.

Features

Group isolation: Yabby only responds in its dedicated groups, never in personal chats
Voice notes: Auto-transcribed via Whisper, treated as instructions
Message chunking: Long responses auto-split at 4096 characters
Deduplication: Redis-based tracking prevents double-processing
Spam filter: Short messages (“ok”, “oui”, emojis) debounced over 2 seconds
Notifications: Task completions, errors, and milestones broadcast to the group
Auto-reconnect: Exponential backoff (5s → 60s max, 10 attempts) on disconnect
Session persistence: Credentials survive restarts, no re-scanning
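A sketch of the 4096-character chunking, assuming splits prefer the last newline, then the last space, and fall back to a hard cut. Length is measured in JavaScript string units (UTF-16), so a hard cut can in rare cases split an emoji; `chunkMessage` is hypothetical.

```javascript
// Split text into chunks of at most `limit` characters, breaking on
// newlines or spaces when possible.
function chunkMessage(text, limit = 4096) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    const window = rest.slice(0, limit);
    let cut = window.lastIndexOf("\n");
    if (cut <= 0) cut = window.lastIndexOf(" ");
    if (cut <= 0) cut = limit; // no break point: hard split
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^[\n ]/, ""); // drop the separator we split on
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```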

Slash Commands

Command	Description
`/status`	Show running, completed, and failed task counts
`/new`	Start a fresh conversation (clears history)
`/reset`	Same as `/new`
`/help`	List available commands

Tips

💡 Pro tips

You can create agents by voice (“Yabby, create an agent named Marc for frontend”) and the WhatsApp group is created automatically
Add your team members to agent groups — everyone can collaborate with the same agent
Switch workspaces anytime via the API or by asking Yabby in voice: “Switch Marc to my-app folder”
If the connection drops, it auto-recovers. To force a fresh login: stop the channel with clearSession: true
Voice notes sent in WhatsApp are auto-transcribed, so you can give instructions hands-free

✈️

Telegram

Connect a Telegram Bot to interact with Yabby from anywhere. Supports both text and voice messages.

Setup

Create a bot via @BotFather on Telegram
Copy the Bot Token
Go to Channels → Config in the dashboard
Enable Telegram and paste the bot token
Start chatting with your bot on Telegram

Features

Text messages: Send instructions naturally, auto-chunked at 4096 chars
Voice messages: Send a voice note; Yabby transcribes it via Whisper and responds
Voice replies: Yabby can respond with voice notes (TTS → OGG/Opus format)
Audio file support: Attach audio files for transcription
Same tools as voice: Full function-calling loop with max 5 tool iterations
Slash commands: /status, /new, /reset, /help
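The five-iteration cap bounds a standard function-calling loop. In this sketch `callModel` and `executeTool` are hypothetical callbacks, and the message objects use a simplified shape rather than any specific provider's format.

```javascript
// callModel(messages) -> { content, toolCalls: [{ id, name, args }] }  (hypothetical)
// executeTool(name, args) -> any JSON-serializable result             (hypothetical)
async function runToolLoop(messages, callModel, executeTool, maxIterations = 5) {
  for (let iteration = 0; iteration < maxIterations; iteration++) {
    const reply = await callModel(messages);
    const toolCalls = reply.toolCalls ?? [];
    // No tool requested: this is the final answer for the user.
    if (toolCalls.length === 0) return reply.content;

    messages.push({ role: "assistant", toolCalls });
    for (const call of toolCalls) {
      const result = await executeTool(call.name, call.args);
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result) });
    }
  }
  // The model kept asking for tools; stop instead of looping forever.
  return "Stopped: tool-iteration limit reached.";
}
```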

📡

Other Channels

Beyond WhatsApp and Telegram, OpenYabby supports Discord, Slack, and Signal as messaging channels.

Channel	Library	Auth	Notes
Discord	discord.js	Bot Token	Full bot integration with slash commands
Slack	@slack/bolt	Bot Token + App Token	Workspace integration with mention gating
Signal	Signal API	Phone number + API URL	Privacy-focused, supports QR code linking

Channel Configuration

All channels are configured from the Channels → Config tab:

Enable/disable each channel independently
DM policy: open (anyone can talk) or closed (whitelist only)
Mention gating: in groups, Yabby only responds when @mentioned
Allowed users: whitelist with recent user suggestions
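These rules can be expressed as a single predicate, sketched below. Whether the whitelist also applies inside groups is not specified here, so this version applies it to DMs only; the field names are assumptions.

```javascript
// config:  { dmPolicy: "open" | "closed", allowedUsers: string[], mentionGating: boolean }
// message: { senderId, isGroup, mentionsBot }
function shouldRespond(message, config) {
  if (message.isGroup) {
    // Mention gating: in groups, only answer when @mentioned.
    return !config.mentionGating || Boolean(message.mentionsBot);
  }
  if (config.dmPolicy === "open") return true; // anyone can talk
  return config.allowedUsers.includes(message.senderId); // closed: whitelist only
}
```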

Unified Conversation

All channels share the same conversation context as voice. A fact mentioned via WhatsApp is remembered when you speak to Yabby by voice, and vice versa.

🔌

Connectors & MCP

Connectors let agents interact with external services. v0.1.3 ships with a growing catalog of 37 connectors organized by category, plus support for any MCP (Model Context Protocol) server.

Connector Categories

💻 Development

GitHub, Linear, Sentry, Git

📋 Project Management

Jira, Asana, Monday, ClickUp

📈 CRM

Salesforce, HubSpot, Pipedrive

📊 Data

Google Sheets, Airtable, Stripe

📨 Communication

Slack, Discord integrations

⚙️ Custom

Any MCP server or custom API

How Connectors Work

Browse the catalog on the Connectors page
Click a connector → enter credentials (API key, OAuth, etc.)
Test the connection. Yabby validates credentials before saving
Once connected, the connector's tools become available to all agents
Connectors can be scoped to specific projects or kept global

MCP Servers

MCP (Model Context Protocol) allows agents to use external tool servers. OpenYabby can:

Auto-configure MCP servers from the catalog (command, args, env variables)
Custom MCP: Add any MCP server by specifying command and arguments
Bridge tools: MCP tool schemas are automatically converted to OpenAI function-calling format
.mcp.json generation: When a task spawns, Yabby generates a .mcp.json in the working directory so the CLI runner can access all connected MCP servers
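The bridge and `.mcp.json` steps can be sketched as follows: converting a tool from an MCP server's `tools/list` result (`name`, `description`, `inputSchema`) into an OpenAI Chat Completions function tool, and writing `.mcp.json` in the `mcpServers` layout Claude Code reads. The `server__tool` naming scheme is a hypothetical way to avoid collisions between servers; the Realtime API uses a flatter tool shape without the nested `function` object.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// MCP tool -> OpenAI function tool. Names are restricted to [a-zA-Z0-9_-], max 64 chars.
function mcpToolToOpenAI(serverName, tool) {
  const name = `${serverName}__${tool.name}`.replace(/[^a-zA-Z0-9_-]/g, "_").slice(0, 64);
  return {
    type: "function",
    function: {
      name,
      description: tool.description ?? "",
      parameters: tool.inputSchema ?? { type: "object", properties: {} },
    },
  };
}

// Write .mcp.json so the CLI runner started in workDir sees every connected server.
function writeMcpJson(workDir, servers) {
  const config = { mcpServers: {} };
  for (const server of servers) {
    config.mcpServers[server.name] = {
      command: server.command,
      args: server.args ?? [],
      env: server.env ?? {},
    };
  }
  const file = path.join(workDir, ".mcp.json");
  fs.writeFileSync(file, JSON.stringify(config, null, 2));
  return file;
}
```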

Credential Security

All connector credentials are encrypted at rest using AES-256-GCM. The encryption key is derived from YABBY_SECRET, or from OPENAI_API_KEY when YABBY_SECRET is not set.
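A sketch of AES-256-GCM credential storage with Node's built-in `crypto`. The key derivation is not documented here, so scrypt with a fixed application salt is an assumption, as is the storage layout (IV, then auth tag, then ciphertext, base64-encoded).

```javascript
const crypto = require("node:crypto");

// YABBY_SECRET first, OPENAI_API_KEY as fallback.
function secretFromEnv(env) {
  const secret = env.YABBY_SECRET || env.OPENAI_API_KEY;
  if (!secret) throw new Error("Set YABBY_SECRET (or OPENAI_API_KEY)");
  return secret;
}

// Assumed KDF: scrypt with a fixed app-specific salt -> 32-byte key.
function deriveKey(secret) {
  return crypto.scryptSync(secret, "openyabby-connectors", 32);
}

function encryptCredential(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptCredential(encoded, key) {
  const buf = Buffer.from(encoded, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the data or tag was tampered with, or the key is wrong.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Usage: `const key = deriveKey(secretFromEnv(process.env));` once at startup, then `encryptCredential` before writing and `decryptCredential` when a connector needs the value.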

⏰

Scheduled Tasks

Schedule recurring tasks that Yabby executes automatically. The scheduler ticks every 30 seconds and supports three schedule types.

Schedule Types

Type	Config	Example
`interval`	Fixed time between runs	Every 2 hours, every 30 minutes
`cron`	Cron expression	`0 9 * * 1-5` (weekdays at 9am)
`manual`	No schedule	Triggered only via button click
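A sketch of how a 30-second tick could decide whether a schedule is due. The cron matcher supports only the common syntax (`*`, values, ranges, lists, steps) and is a simplification, not the scheduler's actual parser; the `schedule` object shape is hypothetical.

```javascript
// Does one cron field match `value`? Supports "*", "n", "a-b", "a,b", "*/n", "a-b/n", "a/n".
function fieldMatches(field, value, min, max) {
  return field.split(",").some((part) => {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : Number(stepText);
    let lo;
    let hi;
    if (range === "*") {
      [lo, hi] = [min, max];
    } else if (range.includes("-")) {
      [lo, hi] = range.split("-").map(Number);
    } else {
      lo = Number(range);
      hi = stepText === undefined ? lo : max; // "a/n" means "from a, every n"
    }
    return value >= lo && value <= hi && (value - lo) % step === 0;
  });
}

// 5 fields: minute hour day-of-month month day-of-week (0 = Sunday).
// Simplification: all fields are ANDed; classic cron ORs day-of-month and
// day-of-week when both are restricted.
function cronMatches(expr, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expr.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getMinutes(), 0, 59) &&
    fieldMatches(hour, date.getHours(), 0, 23) &&
    fieldMatches(dayOfMonth, date.getDate(), 1, 31) &&
    fieldMatches(month, date.getMonth() + 1, 1, 12) &&
    fieldMatches(dayOfWeek, date.getDay(), 0, 6)
  );
}

// Called on every scheduler tick (every 30 s).
function isDue(schedule, lastRunMs, now) {
  if (schedule.type === "manual") return false; // only a button click triggers it
  if (schedule.type === "interval") {
    return lastRunMs == null || now.getTime() - lastRunMs >= schedule.intervalMs;
  }
  // cron: fire once per matching minute, even though two ticks land in it
  const minuteStartMs = Math.floor(now.getTime() / 60000) * 60000;
  return cronMatches(schedule.cron, now) && (lastRunMs == null || lastRunMs < minuteStartMs);
}
```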

Creating a Scheduled Task

Go to the Scheduling page
Click New Scheduled Task
Enter: name, description, task prompt/template
Choose schedule type and configure timing
Optionally assign to a project or standalone agent
Set max retries and retry delay

Features

Run history: View all past executions with status, task ID, and results
Manual trigger: Run any scheduled task on demand
Retry logic: Configurable max retries (default 3) with a delay between attempts
Pause/Resume: Temporarily disable without deleting
Orphan recovery: Missed runs are detected and recovered on startup
Agent queue integration: For standalone agents, tasks are queued to preserve session state

⚙️

Settings

All configuration is managed through the Settings page, organized in 5 tabs. Changes are validated in real time and hot-reloaded without a server restart.

General

UI Language: French, English, Spanish, German
Speech Language: Language for voice recognition
Voice settings: Model, voice selection (6 voices), noise reduction, VAD type, mic toggle
Memory: Extraction model, embedder, extraction frequency (every N turns)
TTS Provider: Edge TTS (free), ElevenLabs, OpenAI, System
Task Runner: Claude, Codex, Aider, Goose, Cline, Continue, or custom binary
LLM providers: API keys for OpenAI, Anthropic, Google, Groq, Mistral, Ollama, OpenRouter

Speaker Verification

Enrollment status and calibration UI
Record 3 voice samples using browser microphone + Silero VAD
Delete enrollment to reset

Projects

Sandbox root: Where project files are stored (default: ~/Desktop/Yabby Projects)
Clean on archive: Delete files when a project is archived

Authentication

Enable/disable gateway auth
Gateway password for web access
Session TTL (days)
API token generation for programmatic access

Usage

30-day cost summary across all LLM providers
Breakdown by provider: calls, input/output tokens, cost in USD
Daily usage chart

🏗️

Architecture

System Overview

Browser ──WebRTC──► OpenAI Realtime API ──► Bidirectional voice
   │                        │
   │                        │ DataChannel (tool calls)
   ▼                        ▼
Frontend (vanilla JS SPA)   Voice tool dispatch
   │                        │
   ▼                        ▼
Express Server ◄─────────── REST API
   │
   ├── Claude CLI (child processes) ──► Task execution
   ├── PostgreSQL + Redis ──► Persistent state
   ├── Mem0 (Qdrant + SQLite) ──► Memory
   ├── Redis pub/sub ──► Agent-bus messaging
   └── Channels ──► WhatsApp, Telegram, Slack, Discord, Signal

Voice Pipeline

Browser Mic ──► Silero VAD (client ONNX)
   │
   ▼ (voice detected)
Speaker Verify ──► Python/ECAPA-TDNN (optional)
   │
   ▼ (speaker match)
Whisper ──► Transcribe audio
   │
   ▼ (wake word match: /yab+[iy]e?/i)
WebRTC Session ──► Full bidirectional voice active
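The wake-word step can be sketched with the documented regex. `commandAfterWakeWord` is a hypothetical helper that keeps whatever follows the wake word; the pattern has no word boundaries, so it also matches inside longer words.

```javascript
// Documented wake-word pattern: "yab", one or more extra "b", then i/y, optional e.
const WAKE_WORD = /yab+[iy]e?/i;

function containsWakeWord(transcript) {
  return WAKE_WORD.test(transcript);
}

// "Yabby, open Safari" -> "open Safari"; null when there is no wake word.
function commandAfterWakeWord(transcript) {
  const match = WAKE_WORD.exec(transcript);
  if (!match) return null;
  return transcript
    .slice(match.index + match[0].length)
    .replace(/^[\s,.!?:;-]+/, "") // drop punctuation right after the wake word
    .trim();
}
```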

Key Design Patterns

Dual-write cache: All writes go to PostgreSQL and Redis simultaneously. Reads check Redis first (24h TTL) and fall back to PostgreSQL on a miss.
Soft delete: Nothing is actually deleted. Status is set to archived. All queries filter WHERE status != 'archived'.
Name resolution: Tools accept ID or name. Resolution: exact ID → exact name → ILIKE contains → fuzzy match → role match.
SSE + WebSocket: Both channels emit identical events. The frontend uses SSE for events and WebSocket for presence and typing indicators.
Config hot-reload: Settings changes propagate via Redis pub/sub. No server restart needed.
Fail-open: Optional services (speaker verification, tunnel) fail gracefully. The system continues working.
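A sketch of the name-resolution chain over an in-memory list. Substring matching stands in for SQL `ILIKE '%query%'`, and a Levenshtein distance of at most 2 is an assumed definition of "fuzzy match"; the real implementation may differ.

```javascript
// Classic dynamic-programming edit distance.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...new Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

function closestByName(query, entities, maxDistance) {
  let best = null;
  let bestDistance = Infinity;
  for (const entity of entities) {
    const distance = levenshtein(query, entity.name.toLowerCase());
    if (distance < bestDistance) {
      best = entity;
      bestDistance = distance;
    }
  }
  return bestDistance <= maxDistance ? best : null;
}

// exact ID → exact name → contains → fuzzy → role; the first hit wins.
function resolveEntity(query, entities, maxDistance = 2) {
  const q = query.trim().toLowerCase();
  return (
    entities.find((e) => e.id === query) ??
    entities.find((e) => e.name.toLowerCase() === q) ??
    entities.find((e) => e.name.toLowerCase().includes(q)) ??
    closestByName(q, entities, maxDistance) ??
    entities.find((e) => (e.role ?? "").toLowerCase().includes(q)) ??
    null
  );
}
```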

21 Voice Tools

These are the base tools available to Yabby during voice sessions (agents and connectors add more):

Task Management

create_task, check_task_status, list_tasks, kill_task, pause_task, resume_task

Projects & Agents

create_project, list_projects, create_agent, assign_agent, list_agents, talk_to_agent

Communication

switch_to_agent, back_to_yabby, send_notification, sleep_mode

Connectors & Skills

list_connectors, use_connector, list_skills, attach_skill

Relay Tunnel

OpenYabby includes a WebSocket tunnel to relay.openyabby.com for mobile access. A tunnel code is assigned and persisted to .env. The tunnel proxies HTTP and WebSocket traffic to localhost and reconnects automatically with exponential backoff (2s → 30s max). Disable it with DISABLE_TUNNEL=true.
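The reconnect delays can be sketched as classic exponential backoff. Doubling is an assumption (only the 2 s start and 30 s cap are stated), and the same helper covers the WhatsApp settings (5 s start, 60 s cap, 10 attempts) by passing different options. `connect` is a hypothetical callback that opens the connection.

```javascript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelay(attempt, baseMs = 2000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry `connect` until it resolves or maxAttempts is exhausted.
async function connectWithBackoff(connect, { baseMs = 2000, maxMs = 30000, maxAttempts = Infinity } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) break; // no point waiting after the last try
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, baseMs, maxMs)));
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```

For the tunnel the defaults apply; the WhatsApp channel would call `connectWithBackoff(connect, { baseMs: 5000, maxMs: 60000, maxAttempts: 10 })`.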
