Why I built OpenYabby
I started OpenYabby because I was annoyed.
Not at the AI. Claude Code was already producing better diffs than I would have, in half the time. I was annoyed at myself, sitting at my desk, typing the same kind of prompt for the fifth time that week. "Take this codebase. Find the bug in the migration. Tell me before you change anything." Type, wait, read, type, wait, read. The AI got fast. I didn't.
That feeling is the whole reason this project exists. It's open source today. Here's why I built it, what's underneath, and what I learned along the way that I wish someone had told me when I started.
The actual problem isn't the LLM
I've been building software for a long time. A lot of full-stack work, a lot of AI lately, a fair amount of weird complex stuff in between (custom CRMs, SaaS apps, mobile apps, a few things I can't even talk about because of NDAs). I write JavaScript and TypeScript every day, Python a lot of the time, and PHP and SQL when the job calls for it. I'm self-taught. Nobody handed me a CS degree. I learned by shipping things that broke and figuring out why.
That kind of background teaches you something specific. Most of the time, the bottleneck isn't the part you're staring at. It's two layers up, in the workflow you've stopped noticing.
When generative AI got good enough to actually write code, I noticed something. The new bottleneck wasn't the model. The new bottleneck was me.
- I was the one re-typing context every time I switched tasks.
- I was the one opening three terminal tabs to run two parallel features and acting as the manual orchestrator between them.
- I was the one forgetting what I'd asked Claude to do yesterday because I'd closed the session.
- I was the one who couldn't safely walk away from a long task because I didn't trust the agent to come back to me when it hit a real decision point.
That's a workflow problem, not a model problem. And workflow problems compound. I was losing maybe two hours a day just to coordination friction. That's a quarter of my workday gone to copy-pasting context between windows.
Voice was the unlock
The honest origin of OpenYabby is mundane. I was in the kitchen reheating leftover pasta and I realized I wanted to ask my Mac something while my hands were full. Not "play music." Not "set a timer." Something like "Yabby, the failing test in that PR. Is it the migration order or the seed data? Don't change anything yet, just tell me."
That's not a Siri sentence. That's a colleague sentence. I realized I wanted my Mac to be a colleague.
The voice loop turned out to be the easy part. OpenAI's Realtime API gives you near-instant back-and-forth voice in the browser. You hand the model a list of things it's allowed to do, and it calls them when the conversation calls for it. Two weekends of fiddling and I had a working prototype that could create projects, spawn tasks, and report status. All by voice.
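For readers who haven't used function calling: the "list of things it's allowed to do" is a set of tool descriptions with JSON Schema parameters, plus local code that runs whichever tool the model picks. Below is a minimal TypeScript sketch of that shape. The tool names (`create_project`, `spawn_task`, `report_status`) and the in-memory `projects` map are hypothetical, not OpenYabby's actual code. The exact envelope the Realtime API expects for tool definitions should be checked against OpenAI's reference rather than taken from here.

```typescript
// Hypothetical tool list in the JSON Schema style that function-calling APIs use.
type ToolSpec = {
  type: "function";
  name: string;
  description: string;
  parameters: Record<string, unknown>;
};

const tools: ToolSpec[] = [
  {
    type: "function",
    name: "create_project",
    description: "Create a new project with a short name.",
    parameters: { type: "object", properties: { name: { type: "string" } }, required: ["name"] },
  },
  {
    type: "function",
    name: "spawn_task",
    description: "Start a task inside an existing project.",
    parameters: {
      type: "object",
      properties: { project: { type: "string" }, goal: { type: "string" } },
      required: ["project", "goal"],
    },
  },
  {
    type: "function",
    name: "report_status",
    description: "Summarize the status of a project.",
    parameters: { type: "object", properties: { project: { type: "string" } }, required: ["project"] },
  },
];

// Local handlers: the model only ever sees the specs above, never this code.
type Handler = (args: Record<string, string>) => string;
const projects = new Map<string, string[]>();

const handlers: Record<string, Handler> = {
  create_project: ({ name }) => {
    projects.set(name, []);
    return `created ${name}`;
  },
  spawn_task: ({ project, goal }) => {
    const tasks = projects.get(project);
    if (!tasks) return `unknown project ${project}`;
    tasks.push(goal);
    return `queued "${goal}" in ${project}`;
  },
  report_status: ({ project }) => {
    const tasks = projects.get(project);
    return tasks ? `${project}: ${tasks.length} task(s)` : `unknown project ${project}`;
  },
};

// Called when the model emits a function call: a tool name plus JSON-encoded arguments.
function dispatch(name: string, argsJson: string): string {
  const handler = handlers[name];
  if (!handler) return `no such tool: ${name}`;
  return handler(JSON.parse(argsJson));
}
```

Unknown tool names come back as a plain message instead of throwing, so the model hears about its mistake and the voice session keeps going.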
The hard part started immediately after.
What's actually hard about voice-first agents
Three problems took me months to solve. Each one would have killed the project if I hadn't worked through it.
1. Stopping agents from chasing their own tails
The first multi-agent setup I built had a lead agent that could spawn sub-agents. The lead would say "go fix this," the sub-agent would try, fail, report back, the lead would think for 15 seconds, spawn another sub-agent to "review" the failure, the reviewer would propose another fix, that would spawn another agent, and so on.
I left it running one Saturday afternoon, came back a couple of hours later, and there were 47 agents in the queue all reviewing each other's reviews. My OpenAI bill jumped by €18 in those two hours. It was a small Cambrian explosion of agents arguing about a typo.
The fix was harder than I expected. There's a small watchdog now that keeps an eye on what each agent has been doing. If it spots the same kind of action repeating, it pauses the agent and forces it to either ask me a question or stop. Boring guardrail, but it ended my €18 weekends. The code is in the repo if you're curious.
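The watchdog idea can be sketched as a sliding window of recent actions per agent. This is a minimal version under my own assumptions, not the code in the repo: `LoopWatchdog`, the window size of 6, the threshold of 3, and the idea of passing a coarse action signature like `spawn_subagent:review` are all hypothetical.

```typescript
type Verdict = "continue" | "pause";

// Hypothetical loop watchdog: remembers each agent's recent action signatures
// and flags an agent once one signature repeats too often inside its window.
class LoopWatchdog {
  private history = new Map<string, string[]>();
  private windowSize: number; // how many recent actions to remember per agent
  private maxRepeats: number; // repeats of one signature in the window that trigger a pause

  constructor(windowSize = 6, maxRepeats = 3) {
    this.windowSize = windowSize;
    this.maxRepeats = maxRepeats;
  }

  record(agentId: string, action: string): Verdict {
    const recent = this.history.get(agentId) ?? [];
    recent.push(action);
    if (recent.length > this.windowSize) recent.shift(); // drop the oldest action
    this.history.set(agentId, recent);
    const repeats = recent.filter((a) => a === action).length;
    return repeats >= this.maxRepeats ? "pause" : "continue";
  }

  // Clear the window once a paused agent has asked its question or been stopped.
  reset(agentId: string): void {
    this.history.delete(agentId);
  }
}
```

Coarse signatures are the point: if every signature included the full prompt text, no two "review the review" actions would ever look alike.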
2. Letting agents touch your real Mac without breaking it
The whole pitch is "the agent does work on your machine." That means real bash, real AppleScript, real GUI control when the bash route isn't available. Which means at some point an agent is going to try to do something destructive while you're using your computer.
I lost a Spotify playlist this way. An agent tried to "test" something via AppleScript, ended up sending keystrokes to my front window, which happened to be Spotify, which deleted a playlist. I rebuilt the playlist. Then I built the GUI lock.
The GUI lock is a shared "do not disturb" flag the agents have to claim before they're allowed to touch the keyboard or mouse. Only one agent can hold it at a time. If an agent crashes while it has it, the flag releases on its own after a few minutes. Boring solution, but it's what makes everything else safe.
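A lock that frees itself when its holder crashes is usually built as a lease with an expiry time. Here is a minimal in-process sketch; `GuiLock`, the three-minute default, and the method names are assumptions. A real shared flag has to live somewhere every agent process can see, which this sketch doesn't model. The clock is passed in so the expiry is easy to test.

```typescript
// Hypothetical GUI lock: one lease on keyboard/mouse control at a time.
// A crashed holder never calls release(), so the lease simply expires after ttlMs.
class GuiLock {
  private holder: string | null = null;
  private expiresAt = 0;
  private ttlMs: number;

  constructor(ttlMs = 3 * 60_000) {
    this.ttlMs = ttlMs;
  }

  tryAcquire(agentId: string, now = Date.now()): boolean {
    const free = this.holder === null || now >= this.expiresAt;
    if (!free && this.holder !== agentId) return false;
    this.holder = agentId;
    this.expiresAt = now + this.ttlMs; // acquiring again also renews the lease
    return true;
  }

  release(agentId: string): void {
    if (this.holder === agentId) {
      this.holder = null;
      this.expiresAt = 0;
    }
  }

  currentHolder(now = Date.now()): string | null {
    return this.holder !== null && now < this.expiresAt ? this.holder : null;
  }
}
```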
3. Memory that actually works the next morning
The first version of OpenYabby had no persistent memory. Every voice session started from zero. I'd come back the next day, say "where were we on that learning app?" and Yabby would say "What learning app?" Excruciating.
The fix was a memory layer that quietly extracts facts from our conversations every few turns and tucks them away so Yabby can recall them the next morning. I tried a cheaper model for the extraction first to save money. It missed French names. Kept calling my partner "the user's friend" because the cheap model apparently wasn't trained on enough French data. I switched to a slightly bigger one and left a comment in the config telling future-me not to undo this decision. If you ever wonder why open-source projects have weird-looking pinned decisions in their code, it's because someone got burned.
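Here's a minimal sketch of the extract-every-few-turns pattern, assuming a pluggable `Extractor` function (in practice an LLM call). Everything named here, including `ConversationMemory` and the default of four turns, is hypothetical. Recall uses plain keyword overlap as a stand-in for the vector store, and the tokenizer splits only on whitespace and punctuation so accented French names stay intact.

```typescript
// An extractor turns a batch of conversation turns into standalone facts.
// In practice this is an LLM call; here it's whatever function you pass in.
type Extractor = (turns: string[]) => string[];

class ConversationMemory {
  private pending: string[] = [];
  private facts: string[] = [];
  private extract: Extractor;
  private everyNTurns: number;

  constructor(extract: Extractor, everyNTurns = 4) {
    this.extract = extract;
    this.everyNTurns = everyNTurns;
  }

  addTurn(text: string): void {
    this.pending.push(text);
    if (this.pending.length >= this.everyNTurns) this.flush();
  }

  // Run extraction on the buffered turns and keep any facts not already stored.
  flush(): void {
    if (this.pending.length === 0) return;
    for (const fact of this.extract(this.pending)) {
      if (!this.facts.includes(fact)) this.facts.push(fact);
    }
    this.pending = [];
  }

  // Keyword-overlap recall (a stand-in for vector search). Splitting on
  // whitespace and punctuation only keeps accented letters inside words.
  recall(query: string): string[] {
    const words = query.toLowerCase().split(/[\s?!.,;:'"]+/).filter((w) => w.length > 3);
    return this.facts.filter((f) => words.some((w) => f.toLowerCase().includes(w)));
  }
}
```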
What runs underneath
The full stack, for the curious.
- Voice. OpenAI's Realtime API. The browser captures the audio, the server keeps things tidy.
- Wake word. A small speaker-verification model so Yabby only wakes for me, not for whoever else is in the room.
- Runners. Six interchangeable coding assistants under the hood (Claude Code, Codex, Aider, Goose, Cline, Continue). Default is Claude Code because it's the most polished, but I've built features end-to-end with each one.
- Persistence. A regular database for state, a fast in-memory store for live status, and a vector store for memory. A fresh install on someone else's machine just works.
- Channels. Discord, Slack, Telegram, WhatsApp, and Signal. Same agents respond from any channel. Same conversation context.
- Connectors. 37 in the catalog (Notion, Linear, GitHub, Stripe, calendar APIs, that kind of thing). Anyone can plug in their own.
None of those individually is novel. The novel part, and the part I'm proudest of, is that you can pick up an OpenYabby agent, swap the coding assistant it's using, and it keeps working. That took an embarrassing number of weekends to get right, because every assistant has its own opinion about how it should be talked to.
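Swapping assistants without the agent noticing is the adapter pattern: the agent talks to one small interface, and each assistant gets an adapter behind it. The sketch below shows only that shape. `Runner`, `Agent`, and `FakeRunner` are hypothetical, and the fake adapters return canned strings. A real adapter would launch the assistant's CLI or call its API, and I'm deliberately not guessing at any of their flags here.

```typescript
interface Task {
  goal: string;
  workdir: string;
}

interface RunResult {
  runner: string;
  ok: boolean;
  summary: string;
}

// The only thing the agent knows about a coding assistant.
interface Runner {
  readonly name: string;
  run(task: Task): Promise<RunResult>;
}

// Stand-in adapter. A real one would translate Task into one assistant's
// conventions and parse its output back into a RunResult.
class FakeRunner implements Runner {
  readonly name: string;
  private respond: (goal: string) => string;

  constructor(name: string, respond: (goal: string) => string) {
    this.name = name;
    this.respond = respond;
  }

  async run(task: Task): Promise<RunResult> {
    return { runner: this.name, ok: true, summary: this.respond(task.goal) };
  }
}

class Agent {
  private runner: Runner;

  constructor(runner: Runner) {
    this.runner = runner;
  }

  // Swapping is just replacing the adapter; the agent's own logic is untouched.
  swapRunner(next: Runner): void {
    this.runner = next;
  }

  async work(goal: string): Promise<string> {
    const result = await this.runner.run({ goal, workdir: "." });
    return `${result.runner}: ${result.summary}`;
  }
}
```

The hard part in practice lives inside each adapter, which is where "every assistant has its own opinion about how it should be talked to" gets absorbed.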
What I got wrong
I want to be honest about this part, because the launch posts will sand the edges off.
For a while I built the wrong thing. I thought voice was the product. It isn't. Voice is the input modality. The product is the multi-agent orchestration. I had to throw away a lot of UI work when I realized the dashboard was where users would actually live.
I was too clever about how agents take turns. The first version had a sophisticated way of figuring out which task should run before which. Beautiful, totally over-engineered. The current version is just a number on each task: same number means run together, lower number means run first. That's it. It covers 95% of what you actually need. The fancy version is in the git history somewhere. I should delete it.
I should have open-sourced earlier. I sat on this for a couple of months longer than I needed to because I wanted it to be "ready." It isn't ready. It's v0.1.0. There are bugs in the channel handler I know about right now and haven't fixed. There are TODOs in the spawner. Mac-only. Five major limitations in the README. But shipped beats perfect, and waiting another two months would have meant shipping into a different market. Anthropic and OpenAI both shipped agentic features recently that overlap with this. The window is now.
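The one-number ordering rule described above fits in a few lines: group tasks by their number, sort the groups, and run each group concurrently before moving to the next. A minimal sketch, with hypothetical names (`QueuedTask`, `order`, `toWaves`, `runWaves`):

```typescript
interface QueuedTask {
  id: string;
  order: number; // same number runs together, lower number runs first
}

// Group tasks into "waves" by their order number, lowest number first.
function toWaves(tasks: QueuedTask[]): QueuedTask[][] {
  const byOrder = new Map<number, QueuedTask[]>();
  for (const t of tasks) {
    const group = byOrder.get(t.order) ?? [];
    group.push(t);
    byOrder.set(t.order, group);
  }
  return Array.from(byOrder.keys())
    .sort((a, b) => a - b)
    .map((k) => byOrder.get(k)!);
}

// Run each wave concurrently and wait for it to finish before starting the next.
async function runWaves(tasks: QueuedTask[], exec: (t: QueuedTask) => Promise<void>): Promise<void> {
  for (const wave of toWaves(tasks)) {
    await Promise.all(wave.map(exec));
  }
}
```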
What it's actually for
I use OpenYabby every working day for three things.
Long parallel work that I can walk away from. Refactoring across multiple modules. Spinning up new microservices. Migration scripts. I describe the goal, the lead agent plans, asks me to approve the plan, then runs the work. I get a summary when it's done. If something blocks, the lead pauses and asks me a question (voice or modal popup). I don't have to babysit.
Creative iteration where I'm thinking out loud. "Yabby, what if we tried X for the search ranking? Show me three approaches with the tradeoffs, don't write any code yet." Then we talk about it. Then I tell it to implement one.
Triaging the morning backlog. Channel notifications, GitHub issues, scheduled tasks that ran overnight. Yabby summarizes, I prioritize verbally, and the agent team handles execution.
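The plan, approve, run, and pause-on-question loop from the first of those uses is essentially a small state machine. A minimal sketch follows; the phase names, the `TaskEvent` shape, and `step` are hypothetical, and a real lead agent presumably carries much more state than this.

```typescript
type Phase = "planning" | "awaiting_approval" | "running" | "blocked" | "done" | "rejected";

type TaskEvent =
  | { kind: "plan_ready"; plan: string }
  | { kind: "approve" }
  | { kind: "reject" }
  | { kind: "question"; text: string }
  | { kind: "answer"; text: string }
  | { kind: "finished"; summary: string };

interface TaskState {
  phase: Phase;
  plan?: string;
  pendingQuestion?: string;
  summary?: string;
}

// Pure transition function: any event that doesn't fit the current phase is an error.
function step(state: TaskState, event: TaskEvent): TaskState {
  switch (state.phase) {
    case "planning":
      if (event.kind === "plan_ready") return { ...state, phase: "awaiting_approval", plan: event.plan };
      break;
    case "awaiting_approval":
      if (event.kind === "approve") return { ...state, phase: "running" };
      if (event.kind === "reject") return { ...state, phase: "rejected" };
      break;
    case "running":
      if (event.kind === "question") return { ...state, phase: "blocked", pendingQuestion: event.text };
      if (event.kind === "finished") return { ...state, phase: "done", summary: event.summary };
      break;
    case "blocked":
      // A real system would hand the answer back to the agent; this sketch just clears the question.
      if (event.kind === "answer") return { ...state, phase: "running", pendingQuestion: undefined };
      break;
  }
  throw new Error(`unexpected ${event.kind} while ${state.phase}`);
}
```

Nothing runs until an explicit `approve`, and a question parks the task in `blocked` instead of letting the agent guess, which is what makes walking away safe.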
What it isn't for. Writing tweets, generating marketing copy, "AI girlfriend" use cases, or replacing a human collaborator who can actually disagree with you. Be sceptical of anyone who tells you their multi-agent system can do those things. Mine can't, and I'd be lying if I said otherwise.
Why open source
Three reasons, in order of honesty.
The selfish one. I want this to outlive my interest in it. Open-sourcing software you actually use is the closest thing to insurance against your own attention drift. If I get bored someday, someone else can fork it.
The strategic one. The thing that makes voice-first agents valuable is exactly the thing that makes them hard to centralize. Your machine, your microphone, your data, your AppleScript bridge that touches your actual files. Self-hosted and open source is the only honest deployment model for this kind of tool. Anyone shipping a closed-source equivalent will eventually have to ship a cloud version, and the cloud version will have to compromise on the local-system access part. So the only way for this category to exist properly is for someone to ship the open-source reference implementation. Might as well be me.
The political one. The agent-orchestration layer is going to be one of the most important pieces of personal software in the next decade. I don't want to use a closed version of it. I don't want my friends to either.
What's next
Linux and Windows ports are the obvious next thing. The blocker is the part that lets agents drive the keyboard and mouse. Mac has a clean way to do it. Linux and Windows don't. The other obvious thing is running the voice part on your own machine, so it doesn't need an internet round-trip. There's a credible path to it, but it's a real project, not a weekend.
Beyond that, I want to see what people actually build with this before I decide where to invest. The roadmap in the README is six items long on purpose. I'd rather ship six things well than six hundred half-things.
If you build something with OpenYabby, or if you read the code and want to push back on a design choice, or if you just want to lurk in the Discord and watch, I'd love to have you.
Try OpenYabby
Open source, MIT licensed, runs on macOS. About 15 minutes to set up via ./setup.sh.