Zacharias Malguitou, Lou Bichard
April 21, 2026 · AI · Engineering

Building a software factory: Week 1, zero to product

Five days. Over 130 PRs merged. 12,202 lines of code. No human-written code.
Here's what we learned in week one of the software factory livestream.

Everybody is talking about software factories. Few have built one. Fewer still show the process. So we are documenting the whole thing in public: empty GitHub repo to self-shipping product, live, every day.

The product is Memo, a Notion-like note-taking app. But this is not really about the product. It is about the processes that cover each step of the SDLC autonomously. Humans steer intent. Ona does the heavy lifting and maintains the codebase: initial buildout, ongoing maintenance, feature additions. All in public.

The question we want to answer: can agents take a product from idea to execution on their own? Not just write code, but handle the full lifecycle.

Week one is done. Here is what happened.

By the numbers

Day 1: 0 PRs, 0 LOC
Day 2: 17 PRs, 217 LOC
Day 3: 54 PRs, 7,848 LOC
Day 4: 89 PRs, 10,410 LOC
Day 5: 132 PRs, 12,202 LOC

Day 1: The rules of the factory

0 PRs. 0 LOC.

We started with an empty repo and one question: what does a software factory actually need?

The answer is a set of automations that chain together to cover every stage of the SDLC. Planning breaks a spec into issues. Build picks up issues and writes code. Review checks every PR before it merges. Verification smoke-tests after deployment. Operations monitors production errors and triages them back into the build loop. Each stage hands off to the next without human intervention.

The rules: no human-written code. Human input is limited to the product spec, automation design, and review when agents escalate. Everything else is the factory's job.

We wrote AGENTS.md, a single file that tells every agent how to behave in the repo: code style, architecture decisions, testing expectations, PR conventions. Think of it as the factory floor manual. The quality of this file directly correlates with the quality of what the factory produces.
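As a sketch of what a factory floor manual might look like, here is an illustrative excerpt. The specific rules below are hypothetical examples of the four categories named above, not the actual AGENTS.md:

```markdown
# AGENTS.md (illustrative excerpt)

## Code style
- TypeScript strict mode; no `any` without a justifying comment.

## Architecture
- Server components by default; client components only where interactivity requires them.

## Testing
- Every feature PR includes tests covering its acceptance criteria.

## PR conventions
- One issue per PR. Link the issue in the description. Keep diffs small and reviewable.
```

The payoff of writing rules like these down is that every agent, on every task, starts from the same conventions.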

The stack: Next.js 16, Supabase for auth and Postgres, Sentry for error monitoring, Vercel for deployment. All orchestrated through Ona.

Day 2: First scaffold, first automation

17 PRs merged. 217 LOC.

The first test: can the factory actually produce working code?

We pointed an agent at the product spec and told it to scaffold the app. Within minutes, the first PRs started landing. Next.js 16, Supabase auth, Sentry error tracking, a testing framework, CI pipeline. 17 PRs merged by end of day. The repo went from empty to a deployable app with auth, monitoring, and automated checks.

Then we turned on the first automation: the PR Reviewer. From this point on, no code merges without an agent reviewing it first. This is the piece that turns "agents writing code" into something closer to a production line. Every PR gets checked against the conventions documented in our markdown files, tested, and either approved or sent back with comments.

The factory had its first moving part.

Day 3: Spec to working app in a day

54 PRs merged. 7,848 LOC.

The scaffold from Day 2 gave us a deployable shell: auth, monitoring, CI. But no product features. On Day 3, we fed the factory a detailed product spec: what Memo should do, what it should achieve, how it should look and feel.

The Feature Planner, one of our core automations, broke the spec into sequential GitHub issues with acceptance criteria and dependency chains. The Feature Builder picked them up and implemented them. The PR Reviewer reviewed each PR as it came in. By stream time, over 50 PRs had merged autonomously. The app was live with workspaces, pages, a Lexical block editor, full-text search, markdown import/export, and member invites.
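The shape of the planner's output matters: each issue needs acceptance criteria and an explicit dependency chain so the builder can sequence work correctly. A minimal sketch of what that might look like (the data shapes and sample issues below are hypothetical, not Ona's actual format):

```python
# Hypothetical sketch: issues with acceptance criteria and dependencies,
# plus an ordering pass so prerequisites are built first.
from dataclasses import dataclass, field


@dataclass
class PlannedIssue:
    title: str
    acceptance_criteria: list
    depends_on: list = field(default_factory=list)  # titles of prerequisite issues


issues = [
    PlannedIssue("Workspaces", ["User can create and switch workspaces"]),
    PlannedIssue("Pages", ["Pages live inside a workspace"], depends_on=["Workspaces"]),
    PlannedIssue("Full-text search", ["Search returns matching pages"], depends_on=["Pages"]),
]


def build_order(issues):
    """Order issues so every dependency comes before its dependents.

    Assumes the dependency graph is acyclic, which a real planner
    would have to validate.
    """
    done, ordered = set(), []
    while len(ordered) < len(issues):
        for issue in issues:
            if issue.title not in done and all(d in done for d in issue.depends_on):
                done.add(issue.title)
                ordered.append(issue.title)
    return ordered
```

A builder agent can then pick issues off the front of this ordering without ever hitting an unmet dependency.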

Detailed spec to working product in under a day. No human touched the code.

Chris (Ona's CTO) walked through the two-loop automation pattern: one set of automations creates work (planning, triage), another does the work (build, review, merge). Progressive escalation: low-risk changes auto-merge, high-risk changes get flagged for human review.
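Progressive escalation boils down to a routing decision per PR. A minimal sketch under assumed risk rules (the path list and line threshold are illustrative, not the factory's actual policy):

```python
# Illustrative sketch of progressive escalation: low-risk changes
# auto-merge, high-risk changes are flagged for a human.
HIGH_RISK_PATHS = ("migrations/", "auth/", ".github/")  # assumed risk rules


def route_pr(changed_files, lines_changed):
    """Decide whether a PR can auto-merge or needs human review."""
    touches_risky = any(f.startswith(HIGH_RISK_PATHS) for f in changed_files)
    if touches_risky or lines_changed > 500:
        return "flag-for-human-review"
    return "auto-merge"
```

The interesting design question is where the threshold lives: too permissive and risky changes slip through; too strict and the human becomes the bottleneck again.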

"Think of agents like an afterburner you strap onto your organization. Either you withstand the acceleration or you come undone in midair."

Christian Weichel, Ona CTO and Co-founder

Day 4: Under the hood

89 PRs merged. 10,410 LOC.

By Day 4, the factory had been running for two days and we had not actually shown what is inside it. Time to open it up.

14 automations run the factory, organized into five layers: planning, build, quality, verification, and operations.

Each layer feeds the next. The output of planning is the input of build. The output of build is the input of quality. Failures in verification loop back to build.
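The verification-to-build feedback loop can be sketched as a retry cycle with an escalation ceiling (the stage functions and retry limit here are hypothetical, not the real automation code):

```python
# Minimal sketch: build output feeds verification; a verification
# failure loops the work back to build, and repeated failure
# escalates to a human.
def run_pipeline(spec, build, verify, max_retries=3):
    """Build → verify, looping failures back to build."""
    for attempt in range(1, max_retries + 1):
        artifact = build(spec)
        if verify(artifact):
            return artifact
        # Feed the failure back as amended input for the next build pass.
        spec = f"{spec} (fix attempt {attempt})"
    raise RuntimeError("escalate to human review")
```

The escalation ceiling is what keeps a stuck agent from burning cycles indefinitely, which is exactly the failure mode described on Day 4.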

We showed a real PR where agents wrote code, reviewed each other's comments, resolved review feedback, and merged. No human involved at any point. That is the part that makes it a factory rather than just "agents writing code": the quality layer runs autonomously too.

It is not all on autopilot. Sometimes you have to step in and tune the machines. We hit an agent stuck on a PR, asking for human guidance. In a single Ona conversation, we fixed the PR and improved the automation config so it would not get stuck the same way again. The factory learned from the failure.

"You're no longer in the loop. You're on the loop. You're designing the feedback loops as opposed to implementing them."

Day 5: What the factory catches (and what it misses)

132 PRs merged. 12,202 LOC.

Ona COO Philipp Pietsch joined for the Week 1 finale. He had been stress-testing the app and showed up with 12 bugs: drag-and-drop issues, invisible checkboxes, broken hyperlinks, a slash menu that jumped when scrolling.

The factory had been fixing bugs too. Just not his. Through Sentry, it had caught and fixed runtime errors no human reported: a Lexical editor error breaking link editing, hydration mismatches, network retry gaps, a Safari router bug. The factory sees what crashes. It does not see what looks wrong.

This is where the quality controls in our harness earn their keep. We use quality.md as a self-assessment file the factory maintains, grading every feature area. When the backlog runs empty, the Feature Planner reads quality.md and creates issues for anything below standard. It drove real improvements: test coverage went from zero to a working suite, error handling moved from console.error to proper Sentry capture. But it has a ceiling. It grades what you can measure in code, not what you can only see by using the product.
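To make this concrete, here is an illustrative excerpt of what a self-assessment file like quality.md might contain. The feature areas, grades, and notes below are hypothetical, not the factory's actual file:

```markdown
# quality.md (illustrative excerpt)

| Feature area   | Grade | Notes                                      |
| -------------- | ----- | ------------------------------------------ |
| Auth           | A     | Covered by integration tests               |
| Block editor   | B     | Link editing fixed; drag-and-drop untested |
| Error handling | C     | Migrating console.error to Sentry capture  |
```

When the backlog empties, anything graded below standard becomes a new issue for the Feature Builder to pick up.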

What we learned

Speed is not the bottleneck. Over 130 PRs in five days. The initial feature set shipped in under a day. The factory builds faster than you can steer it.

The quality layer is what makes it a factory. Agents reviewing agents, verifying deployments, triaging errors. Without this, you have agents writing code. With it, you have a production line.

The human role shifts, not shrinks. Build, review, merge, deploy, verify all run autonomously. What does not run autonomously is taste, product direction, and redesigning the system when it hits a new class of failure. The human moves from writing code to designing the factory and deciding what good looks like.

Be specific. The bugs Philipp found were not factory failures. They were spec gaps. The more precise the input, the less you rely on a human to catch what is wrong on the other end.

What's next

The factory can build. Now it needs to build well. We have one week left of daily livestreams. We will be looking at design systems, feedback loops, and the feature roadmap.

The repo is public. The app is live. We are streaming daily through April 25.

Watch the streams · Explore the repo · Try the app

Built with Ona.


