Over two days and thirteen sessions, speakers from Stripe, Uber, Monzo, Cloudflare, and more converged on the same architecture for production agent infrastructure.
When we first floated the idea of a summit dedicated entirely to background agents, the honest question inside the team was: is this a real category yet? The term barely existed a year ago. We had a microsite, a thesis, and a handful of companies we knew were building this way, but we weren't sure the broader market was ready for it.
Then the registrations started coming in from CEOs, CTOs, VPs of Engineering at Fortune 500 companies, solution architects at major automotive OEMs, analyst firms, and platform teams at some of the largest pharmaceutical and financial services companies in the world. By the time we went live on May 6, the audience had outgrown the category we thought we were building for.
Over two days and thirteen sessions, we heard from speakers at Stripe, Harvey, Uber, Monzo, Cloudflare, AWS, Genentech, incident.io, Tessl, and more. The recordings are now available on demand.
This post is what we took away from it: the patterns that emerged when we stepped back and looked at all 13 talks together.
The summit opened with a question we'd been hearing on every call for months: "We rolled out Copilot. Engineers love it. Why hasn't cycle time improved?"
That question is the reason the summit exists. The gap between individual developer speed and organizational velocity is the defining problem of this moment in software engineering. Coding assistants made engineers faster. But faster engineers don't fix the bottleneck when the bottleneck is coordination, review queues, legacy migrations, and the 47 repos that need the same security patch applied, reviewed, and merged.
We built background-agents.com to name this problem, which we called the false summit. The virtual summit was our attempt to bring together the teams that had already climbed past it.
The first session set the tone. Alistair Gray from Stripe walked through Minions, their one-shot coding agents running on a 30-million-line Ruby codebase. He said something that stuck with us for the rest of the event: "Dev boxes were a strategy credit." Stripe built reproducible cloud environments years before they pointed an agent at a codebase. That infrastructure was built for developers. The agents just inherited it.
Then Nikhil Ramakrishnan from Uber described Minion (yes, Stripe's is Minions, Uber's is Minion, and no, they didn't coordinate), their background agent that now accounts for 11% of generated PRs. Different company, different stack, same pattern: cloud-based isolated environments, event triggers, fleet orchestration.
Joey Wang from Harvey then showed Spectre; Cole Murray showed Open Inspect, which is modeled after Ramp's internal system; and Rajesh Bhatia described Cloudflare's stack.
None of these teams talked to each other, but all of them landed on the same five primitives: sandboxed environments, context connectivity, triggers, fleet orchestration, and governance.
We expected debate. What we got was convergence, and that's the single most important signal from the summit. These teams arrived at the same architecture independently, solving the same problem under different constraints.
Some sessions confirmed what we already believed, and a few changed our minds.
Uber started with boring work. Nikhil Ramakrishnan offered the most useful correction of the summit: "If your AI program is still chasing novelty, start with the boring work that steals engineering focus." Uber didn't start with code generation. They started with migration tooling, CI improvements, and review routing. The developer platform investments they made years ago became the substrate for useful agents. The lesson: don't chase the flashy use case. Chase the one your engineers dread.
Monzo proved that constraints help. Suhail Patel described how Monzo's opinionated platform (3,000+ microservices in a monorepo, static analysis, data-flow controls) made AI adoption more practical, not less. The engineering system already had strong defaults, and agents inherited those defaults. For any team worried that regulation or strict architecture will slow AI adoption, Monzo's story says the opposite. Strong defaults are an accelerant.
Patrick Debois coined a new lifecycle. The person who coined "DevOps" stood up and argued that context now needs its own engineering discipline. He called it the Context Development Lifecycle: generate context (specs, AGENTS.md, MCP connections), evaluate it (LLM-as-judge, task-key unit tests), distribute it (skill registries, versioned packages), and observe it (agent logs, production errors fed back). His line that landed hardest: "People are writing documentation for the first time in their lives. Because it helps them in their job." When context directly improves agent output, the incentive to write things down flips.
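To make Debois's four stages concrete, here is a minimal Python sketch of one pass through the loop. Everything in it is hypothetical. `ContextArtifact`, `ContextRegistry` and the helper functions are illustrative names. The evaluate step is a keyword check standing in for LLM-as-judge or unit tests. The registry is an in-memory dict rather than a real skill registry.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContextArtifact:
    """A hypothetical unit of agent context, such as the text of an AGENTS.md file."""
    name: str
    body: str
    version: str = ""


@dataclass
class ContextRegistry:
    """In-memory stand-in for a skill registry keyed by (name, version)."""
    published: dict[tuple[str, str], ContextArtifact] = field(default_factory=dict)

    def publish(self, artifact: ContextArtifact) -> str:
        # Distribute: version by content hash so agents can pin an exact revision.
        artifact.version = hashlib.sha256(artifact.body.encode()).hexdigest()[:12]
        self.published[(artifact.name, artifact.version)] = artifact
        return artifact.version


def generate(name: str, rules: list[str]) -> ContextArtifact:
    # Generate: render team conventions as a bulleted context file.
    return ContextArtifact(name, "\n".join(f"- {rule}" for rule in rules))


def evaluate(artifact: ContextArtifact, required_terms: list[str]) -> bool:
    # Evaluate: a deterministic check standing in for LLM-as-judge or unit tests.
    return all(term in artifact.body for term in required_terms)


def observe(rules: list[str], production_errors: list[str]) -> list[str]:
    # Observe: turn production errors into new rules, skipping ones already recorded.
    new_rules = dict.fromkeys(f"Avoid: {err}" for err in production_errors)
    return rules + [rule for rule in new_rules if rule not in rules]
```

A full cycle works like this:

1. Generate a file and check that it passes evaluation.
2. Publish it.
3. Pass production errors through `observe`.
4. Regenerate the file.

The second publish produces a new content-hashed version, so agents pinned to the old one are unaffected until they opt in.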
Shardul Vaidya showed the endgame. His session on "dark factories" (borrowed from manufacturing: a factory with no humans, so why keep the lights on) was the most forward-looking talk. He built a working factory and demoed it live. An orchestrator decomposes requirements into a DAG, dispatches coding agents into isolated sandboxes, runs them through verification gates, and loops failures back through rework. Over 300 commits. The UI itself was built entirely by the factory. His key distinction: "When a task fails, the full context of why it failed is fed forward into the next attempt. That's not a retry. That's a rework." Most teams aren't here yet. But the path from background agents to software factory is shorter than it looks.
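The orchestration loop Vaidya described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not his implementation:

- `run_factory`, the `agent` and `gate` callables, and the DAG format are hypothetical.
- Sandboxes are not modeled.
- Tasks run one at a time in dependency order, using the standard library's `graphlib.TopologicalSorter`.

What the sketch does show is the retry/rework distinction: each attempt receives every prior failure reason instead of starting from a blank slate.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional

# Hypothetical agent interface: given a task and prior failure notes, return an artifact.
Agent = Callable[[str, list[str]], str]
# Hypothetical verification gate: return None on pass, or an explanation of the failure.
Gate = Callable[[str, str], Optional[str]]


def run_factory(dag: dict[str, set[str]], agent: Agent, gate: Gate,
                max_attempts: int = 3) -> dict[str, str]:
    """Build every task in dependency order, reworking failures with their context.

    `dag` maps each task to the set of tasks it depends on.
    """
    results: dict[str, str] = {}
    for task in TopologicalSorter(dag).static_order():
        failures: list[str] = []  # accumulated "why it failed" notes for this task
        for _ in range(max_attempts):
            artifact = agent(task, failures)  # each attempt sees all prior failures
            reason = gate(task, artifact)
            if reason is None:
                results[task] = artifact
                break
            failures.append(reason)  # rework, not retry: feed the reason forward
        else:
            raise RuntimeError(f"{task} failed verification: {failures}")
    return results
```

A production orchestrator would more likely call `TopologicalSorter.get_ready()` and `done()` to dispatch independent tasks to parallel sandboxes, and would pass upstream results to downstream tasks. The sequential loop keeps the sketch short.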
Security moved below the agent. Two sessions tackled this, and both arrived at the same place. Leo Di Donato and Lorenzo Fontana (co-creators of Falco) showed that agents reason around rules written at their own level. Editable command policies are too easy to bypass. Stephen Parkinson from Nono demonstrated a three-layer model: enforce what an agent can do at the kernel, attest the files and policies that steer it, and decide how headless agents request expanded capabilities. The takeaway for regulated industries: prompt-level guardrails are not governance. Infrastructure-level enforcement is.
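Of Parkinson's three layers, attestation is the easiest to illustrate briefly. The sketch assumes a hypothetical setup in which a trusted launcher, not the agent, does three things:

- It holds a signing key.
- It records SHA-256 digests of the policy and context files a human approved.
- It checks those digests before each run.

This is not Nono's implementation, and the function names are ours. Kernel-level enforcement and capability requests are out of scope for a short Python example.

```python
import hashlib
import hmac
import json
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def attest(paths: list[Path]) -> dict[str, str]:
    """Record approved digests for the files that steer an agent."""
    return {str(p): digest(p) for p in paths}


def sign(manifest: dict[str, str], key: bytes) -> str:
    """HMAC the manifest so an agent cannot rewrite a policy and the manifest together."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(manifest: dict[str, str], signature: str, key: bytes) -> list[str]:
    """Return files that drifted from the approved manifest; reject a forged manifest."""
    if not hmac.compare_digest(sign(manifest, key), signature):
        raise ValueError("manifest signature mismatch")
    drifted = []
    for name, expected in manifest.items():
        path = Path(name)
        actual = digest(path) if path.exists() else None  # a deleted file counts as drift
        if actual != expected:
            drifted.append(name)
    return drifted
```

In this design, the launcher calls `verify` before starting a headless agent and refuses to start if the returned list is non-empty. Because the key lives outside the agent's reach, editing a policy file is detectable even if the agent also edits the manifest.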
Three things we didn't see coming:
Genomics. Xiucheng Quek from Genentech showed background agents operating across scientific domain knowledge, domain-specific file formats, and cloud jobs that fan out across thousands of instances. He surfaced a counterintuitive finding: highly specific skills can make agents slower. Specialization has diminishing returns when the context window fills up with domain knowledge that crowds out reasoning. We expected background agents to be a software engineering story. Genentech showed us it's a compute story that happens to start with code.
Who showed up. We built this for platform engineers and engineering leaders. We got that, plus Fortune 500 pharma companies, global automotive OEMs, Tier-1 financial institutions, major consulting firms, and industry analysts. Background agents aren't a developer tool decision anymore. They're an infrastructure decision that lands on the VP of Engineering's desk.
Agents change who writes code. Joey Wang from Harvey showed Spectre treating a Slack thread as a shared workspace where engineers, PMs, researchers, and legal experts align around one agent run. Cole Murray reinforced this: "We're starting to see PMs contributing code and able to deliver on their own product specifications." Lawrence Jones from incident.io showed the agent as a team member in Slack, fitting the rhythms of responders and engineers. Background agents change who can participate in the codebase, not just how fast engineers work. We didn't plan for this to be a theme, but it became one.
One number from the summit deserves its own section: 93%.
Rajesh Bhatia described how Cloudflare moved from assisted coding to delegated engineering across 93% of R&D. Ninety-three percent of their engineering organization, not a pilot team.
The how: platform primitives, identity controls, context systems, and code-review gates. Nothing exotic. The same five primitives, applied consistently. If your rollout is stuck with a few early adopters, this is the session to watch.
The summit answered the "what does production agent infrastructure look like?" question more clearly than we expected. The five primitives, the convergence, and the demand from engineering leaders outside the usual developer tooling audience all held up across every session.
All 13 sessions are available on demand. If you're short on time, start with three:
For the full framework, we synthesized the case studies, the five primitives, a maturity model, and the build-vs-buy decision into one reference: An engineering leader's guide to background agents.
We went into this summit wondering if "background agents" was a real category. The answer is yes. The teams that moved first on cloud development environments are now the teams shipping agents at scale, and the infrastructure investment they made years before the agent era is paying compound returns.
We're already planning what comes next. If you watched the sessions and have thoughts on what we should cover, we'd like to hear them.