November 2025
The real bottleneck of AI software engineering
We need to stop celebrating how fast AI can write code and start worrying about who is going to maintain it. The uncomfortable truth of this cycle is that we are currently using LLMs to generate the largest wave of legacy code in history. As the "magic" of benchmarks settles into a predictable curve and the reality of "Software 3.0" sets in, the bottleneck has shifted.
The problem isn't generating syntax; it's managing the crushing semantic load of a system that can type faster than you can think. We believe the future belongs to teams who treat AI not as a junior developer, but as an infrastructure layer for mass-refactoring and policy enforcement.
TL;DR
• AWS re:Invent: Meet us in Las Vegas, Dec 1–4, at Booth 632 to walk through live Ona use cases like Java/.NET migrations and policy-driven updates inside your perimeter.
• Benchmark signals: Data shows most LLM leaderboard gains roll into a single general capability factor.
• Agents are still hard: A breakdown of real design and engineering pain points in production agents.
• Launching Automations: Drive large, cross-repo initiatives instantly, without the months of manual coordination they require today.
• Prebuilds: Cut startup time by orders of magnitude as environments start from a ready-to-work snapshot.
Burnham analyzes a large table of Gemini 3 benchmark results and shows that most benchmark variation collapses into a single 'general capability' factor. Using the Epoch Capabilities Index and PCA, he then surfaces a smaller second component that tracks a Claude-style profile: strong on agentic and OS-style tasks, weaker on some vision and math benchmarks. The takeaway is that leaderboards mostly measure one dominant ability and can be misleading.
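To make the method concrete, here is a minimal sketch of that kind of analysis, assuming only a models-by-benchmarks score matrix; the numbers below are made up for illustration, not taken from Burnham's data.

```python
# Minimal sketch: does one "general capability" factor dominate a leaderboard?
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rows: models, columns: benchmarks (hypothetical scores in [0, 100]).
scores = np.array([
    [88, 74, 91, 62, 70],
    [85, 71, 89, 60, 72],
    [79, 66, 83, 55, 64],
    [72, 58, 76, 49, 57],
    [65, 52, 70, 44, 50],
])

# Standardize each benchmark so no single scale dominates, then run PCA.
X = StandardScaler().fit_transform(scores)
pca = PCA(n_components=2)
pca.fit(X)

# If the first ratio is large, the leaderboard is mostly ranking one thing;
# a smaller second component captures profile differences (e.g. agentic vs.
# vision/math strengths).
print("explained variance ratio:", pca.explained_variance_ratio_)
```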
Ronacher walks through concrete lessons from building production agents. He describes why generic agent SDK abstractions break once you add real tools, how explicit cache management on platforms like Anthropic makes costs and behavior more predictable, and how reinforcement inside the loop becomes a core design tool. He also stresses strict failure isolation, shared file-system-style state between tools, and the difficulty of testing and evals, which still lag behind the rest of the stack.
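As one concrete illustration of the failure-isolation point, here is a minimal, framework-agnostic sketch; `run_tool` and the flaky tool below are hypothetical and not taken from Ronacher's post or any specific SDK.

```python
# Strict failure isolation: a tool crash becomes structured data the model
# can react to, instead of an exception that kills the agent loop.
import json
import traceback

def call_tool_isolated(run_tool, name, args):
    try:
        return {"tool": name, "ok": True, "result": run_tool(name, args)}
    except Exception as exc:
        return {
            "tool": name,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
            "traceback": traceback.format_exc(limit=3),
        }

# Example: the loop keeps going even when a tool blows up.
def flaky_tool(name, args):
    raise RuntimeError("upstream API timed out")

print(json.dumps(call_tool_isolated(flaky_tool, "search_docs", {"q": "cache"}), indent=2))
```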
This thread centers on a veteran engineer's view that the real bottleneck in software is semantic load, not typing speed. The post leans on Fred Brooks' 'No Silver Bullet' idea and argues that essential complexity lives in requirements, ambiguity resolution, and system design, while coding is mostly transcription. Top comments reinforce that writing good tickets, clarifying requirements, and capturing domain knowledge are still where most teams struggle.
Hadfield and Koh survey how autonomous AI agents could interact with humans and with each other inside markets and institutions. They outline questions around incentives, contracts, liability, and governance when economic activity increasingly runs through software actors that can negotiate, execute, and adapt on their own. They highlight that designing agent systems is as much an economics and law problem as it is an engineering one, especially once agents start making commitments and trading on behalf of organizations.
Oh contrasts deterministic 'Software 1.0' code with 'Software 2.0' and '3.0' systems that rely on LLMs and agents. He describes how unit tests behave differently when the 'computer' changes its mind, why engineers need to think in terms of semantic similarity instead of exact matches, and how latency starts to reflect depth of reasoning rather than just inefficiency. The article pushes engineers to treat prompts, datasets, and autonomy levels as first-class design decisions.
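For the testing point, here is a rough sketch of what asserting on semantic similarity instead of exact string equality can look like; the `embed` function is a placeholder for whatever embedding model you already use, and the threshold is an assumption you would tune.

```python
# Assert that a model's answer means roughly the same thing as a reference,
# even when the wording differs between runs.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assert_semantically_close(embed, expected, actual, threshold=0.85):
    score = cosine_similarity(embed(expected), embed(actual))
    assert score >= threshold, f"similarity {score:.2f} below {threshold}"

# Usage, with a real embedding function in place of `embed`:
# assert_semantically_close(embed,
#     expected="The invoice total is 42 dollars.",
#     actual="Total due on the invoice: $42.")
```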
Gemini 3 is here, but how are users reacting? This thread mixes benchmark excitement with implementation details and skepticism. Commenters point to large gains on math and reasoning benchmarks and speculate that verified search and backtracking, not just bigger models, drive the improvement.
Max Kanat-Alexander of Capital One focuses on core aspects of agent scaffolding (development environments, inputs, and review quality) as 'no regrets' investments that help both humans and agents. He stresses that high-quality code review is critical, since rubber-stamped PRs slowly degrade the codebase and, over time, the agents that rely on it. His message is clear: improving environments and review discipline is urgent work for AI-assisted development.
If you're attending re:Invent next month, we'd love to meet you in person. Book time with our team to walk through concrete Ona use cases like large-scale code migrations, cross-repository refactors, and policy-driven updates that run inside your perimeter. Use this slot as a working session with us if you need to make near-term decisions on agent platforms or how to move forward with your AI SDLC initiatives.
Book a meeting ahead of time or stop by Booth 632 on the main floor to meet the team and see Ona's newest capabilities in action.
With Automations, we give engineering teams a way to drive large, cross-repo changes from one place instead of coordinating dozens of projects by hand. You define a workflow once (by combining prompts, scripts, and integrations) and we run it inside the same production-grade development environments your engineers use, with full logs and human review where you need it. Teams use Automations for CVE sweeps, documentation and config updates, and more, without relying on local setups or CI glue.
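For a sense of the manual work this replaces, here is a deliberately naive sketch of the hand-rolled pattern: looping one scripted change over many repositories. The repo names, `apply_fix.sh`, and the branch name are hypothetical, and this is not Ona's API; it is just the chore that Automations is meant to absorb.

```python
# The hand-rolled version of a cross-repo change: clone, patch, push, repeat.
import subprocess

REPOS = ["org/service-a", "org/service-b", "org/service-c"]  # hypothetical
BRANCH = "automation/cve-fix"                                # hypothetical

def run(cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

for repo in REPOS:
    name = repo.split("/")[-1]
    run(["git", "clone", f"git@github.com:{repo}.git", name])
    run(["git", "checkout", "-b", BRANCH], cwd=name)
    run(["bash", "../apply_fix.sh"], cwd=name)  # e.g. bump a vulnerable dependency
    run(["git", "commit", "-am", "chore: apply security fix"], cwd=name)
    run(["git", "push", "origin", BRANCH], cwd=name)
    # ...then open a pull request per repo and chase reviews by hand.
```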
Prebuilds cut environment startup time by running your dev container build, lifecycle commands, and automation tasks ahead of time and storing the result as a snapshot. When someone starts an environment for a project with prebuilds enabled, they land in a ready-to-work state instead of waiting for setup to finish.
In this webinar, our Field CTO Lou Bichard walks through what it takes to run agent-driven migrations safely at scale. We focus on concrete requirements: secure execution environments, governance and audit, data sovereignty, agent quality checks, and where to keep humans in the loop. The session is aimed at teams that currently manage migrations with spreadsheets and emails and want a clear view of when agents are ready to take on CVE remediation, language migrations, and platform updates.
Slash commands let you standardize common prompts across your organization so people do not have to remember or paste long instructions. Admins can define commands like `/review-code` once in Settings, including the underlying prompt, and everyone can trigger them from Ona Agent's chat by typing `/` and selecting the right entry. This keeps reviews, test strategies, and other recurring tasks consistent while still allowing engineers to add context inline for each use.
AWS re:Invent – Las Vegas, Dec 1–4, 2025
We'll be at Booth 632 on the expo floor, walking through real migrations, background agents, and Enterprise Runner setups. Use the event page to book a working session with our team while you're in Vegas.
May your builds stay swift and your agents never drift,
Your friends at Ona