Increased runner capacity for large deployments
Large runners now use 16 vCPU and 32 GB of memory, up from 8 vCPU and 16 GB. This fixes CPU saturation observed on busy runners handling hundreds of concurrent environments. No configuration changes are needed. The new sizing takes effect after the infrastructure upgrade.
Infrastructure upgrade required
The upgrade updates the Fargate task definition and adds a memory-based scaling policy for the proxy service.
To upgrade, go to Settings > Runners, select your runner, open the three-dot menu, and click Upgrade runner. See the upgrade documentation for step-by-step instructions.
What else is in this release
Improvements
- Proxy service now scales on memory utilization in addition to CPU, preventing exhaustion from long-lived connections.
- Shell history is now shared across terminal tabs within the same environment for bash and zsh.
- Ona agent sessions resume immediately when a devcontainer rebuild finishes.
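The combined CPU and memory scaling for the proxy can be pictured with a simplified target-tracking model. The sketch below is an illustration only, not Ona's or AWS's actual policy. `desiredReplicas`, the 60% targets in the example, and the min/max bounds are all hypothetical. Each metric proposes a replica count, and the larger proposal wins. As a result, memory pressure from long-lived connections can trigger a scale-out even while CPU stays low.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical, simplified target-tracking rule.
// Each metric proposes a replica count proportional to how far it is
// from its target. The larger proposal wins, so a memory spike alone can
// trigger a scale-out, and scale-in only happens when both metrics allow it.
func desiredReplicas(current int, cpuPct, memPct, targetCPU, targetMem float64, minR, maxR int) int {
	byCPU := math.Ceil(float64(current) * cpuPct / targetCPU)
	byMem := math.Ceil(float64(current) * memPct / targetMem)
	want := int(math.Max(byCPU, byMem))
	if want < minR {
		return minR
	}
	if want > maxR {
		return maxR
	}
	return want
}

func main() {
	// Long-lived connections: CPU is low, but memory is well above target.
	fmt.Println(desiredReplicas(2, 20, 90, 60, 60, 2, 5)) // prints: 3
}
```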
Availability Zone capacity fallback
Environment launches now automatically retry in a different Availability Zone when one runs out of EC2 capacity, instead of failing immediately. Fallback subnets are tried in random order to distribute load evenly. This eliminates a class of launch failures observed during high-concurrency workloads, where a single AZ exhausts its instance capacity while others remain available.
What else is in this release
Improvements
- Environments are no longer incorrectly reported as stopped during shard handoff on multi-replica runners, preventing orphaned VMs.
- Agent executions are no longer orphaned or duplicated after shard handoff. Reconcilers are drained on lost shards and pending work is re-discovered on the new owner.
- Environments stopped by the disconnected timeout now restart correctly when a user sends a new message to an agent, instead of hanging indefinitely.
- Supervisor restart no longer fails when orphaned child processes hold the SSH proxy port. All processes in the supervisor cgroup are now terminated on stop.
- Load balancer health checks verify the proxy is serving HTTP responses instead of only checking TCP connectivity.
- Security dependency upgrades address critical and high-severity CVEs in pgx, go-jose, jsonparser, and OpenTelemetry exporters.
Security: Ubuntu 26.04 and CVE reduction
Environment VMs now run Ubuntu 26.04 with kernel 7.0, reducing total CVEs from 6,731 to 275 (a 96% reduction). The Docker stack is bumped to 29.4.3, BuildKit to v0.29.0, and all rootfs binaries are compiled with Go 1.25.10, fixing 12 additional Go stdlib CVEs.
What else is in this release
Improvements
- Environments no longer get stuck in STOPPING state. Snapshot preparation gives up after 10 minutes on transport errors instead of retrying indefinitely, and batch stop failures fall back to stopping instances individually.
- Environments stopped by the disconnected timeout are no longer restarted by the agent reconciler, fixing a ~36-minute bounce loop.
- On dual-disk runners, the data disk resize now completes before content initialization starts, fixing ENOSPC errors with large container images.
- Warm pool claims work correctly across workers on multi-replica runners, preventing unnecessary cold launches.
- Prebuild environments start with a clean data disk instead of inheriting stale base snapshots.
- Bitbucket repository search and organization listing work again after Bitbucket deprecated cross-workspace APIs.
- Agent goal status now reaches the dashboard correctly.
- Inline image data in agent conversations is offloaded to blob storage before entering live streams and history, reducing bandwidth.
- Runner updates apply with zero downtime — new Fargate tasks are healthy before old ones drain.
- Agent executions are picked up immediately after shard handoff on multi-replica runners, instead of waiting up to 1 hour.
- Agent conversation streams are protected against corruption during shard handoffs on multi-replica runners.
- Agent conversation history loads up to 10x faster for long conversations.
Faster startup and credential redaction
Environment startup is faster. Disk warming for startup-critical paths now runs in parallel, host binaries (docker, containerd, runc, node, buildkitd) are pre-warmed alongside data disk paths, and warm pool scaling targets adapt dynamically to EBS snapshot size so large prebuilds are fully hydrated before instances are claimed.
Credentials printed to process output (AWS keys, GitHub tokens, bearer tokens, basic-auth URLs) are now redacted before they reach environment status messages, on-disk state, logs, and tracing spans.
This release also patches CVE-2026-5450 (Critical, glibc) along with four High-severity CVEs in glibc and OpenSSL via a base image digest bump.
What else is in this release
New
- Automation services support a configurable readiness timeout. Environments where the supervisor fails to start are now stopped instead of hanging indefinitely.
- The SCM organization list in the project creation flow supports pagination and search for GitLab.
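Credential redaction of the kind described at the top of this release is often done by rewriting each output line with patterns. The sketch below is illustrative only. Its regular expressions are assumptions covering common formats of the four credential types listed: AWS access key IDs, GitHub tokens with `ghp_`-style prefixes, bearer tokens, and passwords in basic-auth URLs. They are not Ona's actual rule set.

```go
package main

import (
	"fmt"
	"regexp"
)

// redactionRule pairs a pattern with its replacement. These patterns are
// illustrative assumptions, not Ona's actual rules.
type redactionRule struct {
	re   *regexp.Regexp
	repl string
}

var rules = []redactionRule{
	// AWS access key IDs (long-term AKIA... and temporary ASIA...).
	{regexp.MustCompile(`\b(?:AKIA|ASIA)[0-9A-Z]{16}\b`), "[REDACTED]"},
	// GitHub tokens with the ghp_/gho_/ghu_/ghs_/ghr_ prefixes.
	{regexp.MustCompile(`\bgh[pousr]_[A-Za-z0-9]{36,}\b`), "[REDACTED]"},
	// Bearer tokens: keep the scheme word, drop the token.
	{regexp.MustCompile(`(?i)(bearer\s+)[A-Za-z0-9\-._~+/]+=*`), "${1}[REDACTED]"},
	// Basic-auth URLs: keep the scheme and user, drop the password.
	{regexp.MustCompile(`(://[^/\s:@]+:)[^/\s@]+@`), "${1}[REDACTED]@"},
}

// redact applies every rule in order and returns the sanitized line.
func redact(line string) string {
	for _, r := range rules {
		line = r.re.ReplaceAllString(line, r.repl)
	}
	return line
}

func main() {
	fmt.Println(redact("cloning https://alice:s3cret@git.example.com/repo.git"))
	// prints: cloning https://alice:[REDACTED]@git.example.com/repo.git
}
```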
Improvements
- Prebuild snapshots correctly take precedence over base snapshots on dual-disk environments, fixing cases where prebuild data was discarded.
- The prebuild executor’s git identity is cleared from the data disk before snapshot, preventing identity leakage to environments started from that prebuild.
- Binary downloads use atomic writes to prevent truncated files. SHA-256 mismatches are retried automatically.
- The supervisor recovers from stale git config lock files left after an unclean shutdown, instead of entering a panic loop.
- File watch self-healing for the security agent works correctly in Docker-in-Docker environments, including after devcontainer rebuilds.
- AWS DescribeImages API calls are scoped to owned AMIs, reducing hundreds of paginated API calls per sync cycle to a handful.
- The runner proxy auto-scales (2-5 replicas) and uses larger task sizes for large runners.
- Environment logs remain accessible after instance termination.
- Updated VM images for AWS runners.
- CloudFormation descriptions updated to use Ona branding.
Performance and operational improvements
This release improves startup performance, reliability, and operational visibility for EC2 runners. For visibility, it introduces a managed metrics pipeline that exports runner metrics for monitoring runner health, environment lifecycle, and resource utilization. Every payload is written to S3 for auditing. Contact your account team to enable it.
New
- Terminals are now killed when the dev container is rebuilt, preventing unresponsive sessions after a rebuild.
- When multiple MCP servers expose tools with the same name, tool names are automatically prefixed with the server name to prevent silent overwrites.
Improvements
- Environment startup is faster. Independent supervisor initialization steps now run concurrently, and disk pre-warming runs for all instances with startup-critical paths prioritized.
- SCM context parsing uses ETag-based caching, reducing latency for repeated operations.
- Environments with a configured idle timeout now auto-stop correctly when all SSH connections close.
- OAuth token refresh is more resilient. The token cache is invalidated on permanent errors, and retries use exponential backoff.
- The “All Changes” diff view no longer shows stale or empty results when starting environments from pull requests.
- Git status parsing correctly handles renamed files, fixing broken tree rendering and diff fetching.
- Devcontainer features referenced by local path no longer break the cache key computation.
- Instances under memory pressure now receive stop commands promptly.
- The ReadFile API no longer returns stale content due to cache collisions.
- CORS headers are now set on the in-environment browser proxy, fixing silent failures for cross-origin requests.
- Agent SCM tool registration errors are no longer fatal, preventing empty system prompts when tool setup fails.
- The runner-side agent now shows the “MCP servers taking longer than expected” warning.
- GitHub PR agent reactions fire reliably when mentioning the agent.
- Core dumps are disabled at supervisor startup, preventing potential secret leakage.
- Updated Node.js to v24.14.1 (security) and BuildKit to v0.28.1.
Faster startup and reliability improvements
Environment startup is 1-2 seconds faster. Automation trigger API calls now run in parallel instead of sequentially, and the devcontainer reconciler caches configuration reads in steady state, saving an additional ~130ms per cycle.
What else is in this release
Improvements
- Automation-triggered agent executions no longer get stuck in a waiting state when the agent attempts to ask for user input. The request is rejected immediately so the agent can proceed autonomously.
- File watch self-healing now works correctly in all configurations. The discovery agent starts when watch mode is enabled, and the path denylist updates after a denylisted file is unlinked and recreated.
- BPF watch-only mode emits WATCH_WRITE and WATCH_MMAP events correctly when untouchable mode is off.
- Fixed a runner manager startup panic when multiple managed runners run in the same process.
- Updated VM images for AWS runners.
- Security dependency update: go-jose/v4 bumped to v4.1.4 (fixes GHSA-78h2-9frx-2jm8).
Warm pools now GA
Warm pools keep pre-initialized EC2 instances running from the latest prebuild snapshot. When you create an environment, Ona claims an instance that is already running with the snapshot loaded instead of launching a new one. Startup drops from minutes to around 10 seconds.
Enable warm pools per environment class in your project’s prebuild settings. The runner dynamically scales the pool between 0 and your configured maximum (up to 10 in the dashboard, up to 20 via the CLI) based on demand. It also handles replenishment and automatic snapshot rotation when new prebuilds complete.
Requires an Enterprise plan. Currently available on EC2 runners only. See the warm pools documentation for prerequisites and setup instructions.
Infrastructure upgrade required
This release requires a CloudFormation stack update. The full update takes ~30 minutes. Your data and environments are preserved. Running environments reconnect automatically after the update completes.
Before you upgrade
- Note your Prometheus metrics settings. The upgrade resets them. You will re-enter them afterward. See Custom metrics pipeline.
- Internet Gateway users (no NAT gateway): You must set Assign Public IP to true in the Network Configuration section during the CloudFormation parameter review step.
- Templates from January 2025 or earlier: Either stop and discard existing environments before upgrading, or add port 22 to your security group first.
Upgrade steps
- Go to Settings > Runners and select your runner
- Open the three-dot menu and click Upgrade runner
- Follow the dialog to update your CloudFormation stack
- Re-enter your Prometheus metrics settings after the update completes
What else is in this release
New
- Fargate replaces EC2 instances for the runner service. No more AMI allowlisting or update bottlenecks.
- MemoryDB persists Ona agent conversations in real time, with S3 as a durable backup. This is a new billable AWS resource in your account.
- Runner sizing lets you choose between small and large infrastructure via a CloudFormation parameter. Select large if your organization runs many concurrent agent sessions.
- Runner update windows let you control when your runner applies updates. Set a maintenance window to avoid disruptions during peak hours.
Improvements
- Environment startup is faster thanks to earlier Docker socket activation and optimized content initialization.
- Runner updates no longer cause brief user disconnects. The proxy now runs as a separate service.
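The warm pool sizing described at the top of this release can be approximated with a clamp plus a replenishment count. This is a hypothetical rule for illustration. The demand signal (`recentClaims`) and both helper names are assumptions. Only the 0-to-maximum range and the 10/20 limits come from the release note.

```go
package main

import "fmt"

// cliMax is the largest configurable maximum per the release note (set via
// the CLI; the dashboard caps the setting at 10).
const cliMax = 20

// warmPoolTarget is a hypothetical sizing rule. It keeps about as many idle
// instances as were claimed in the most recent demand window, scales down
// to 0 when nothing is claimed, and never exceeds the configured maximum.
func warmPoolTarget(recentClaims, configuredMax int) int {
	if configuredMax > cliMax {
		configuredMax = cliMax
	}
	target := recentClaims
	if target < 0 {
		target = 0
	}
	if target > configuredMax {
		target = configuredMax
	}
	return target
}

// toLaunch returns how many instances must be started to refill the pool.
func toLaunch(target, idle int) int {
	if idle >= target {
		return 0
	}
	return target - idle
}

func main() {
	target := warmPoolTarget(7, 5)           // demand exceeds the configured maximum of 5
	fmt.Println(target, toLaunch(target, 2)) // prints: 5 3
}
```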