Warm pools now available on GCP
Warm pools keep pre-initialized Compute Engine instances in a suspended state, ready to resume when you create an environment. Instead of provisioning a new VM and loading the prebuild snapshot from scratch, Ona claims a suspended instance and resumes it. Startup drops from minutes to around 10 seconds.

Enable warm pools per environment class in your project’s prebuild settings. The runner dynamically scales the pool between your configured minimum and maximum based on demand, and rotates instances automatically when new prebuilds complete.

Requires an Enterprise plan. See the warm pools documentation for prerequisites and setup instructions.

Infrastructure upgrade required
This release requires a Terraform module upgrade to v2.0.0 to enable warm pools and apply IAM changes.

New IAM permissions added to the runner custom role:

| Permission | Purpose |
|---|---|
| `compute.autoscalers.create` | Manage MIG autoscalers for dynamic warm pool scaling |
| `compute.autoscalers.delete` | Clean up autoscalers when warm pools are removed |
| `compute.autoscalers.get` | Read autoscaler state during reconciliation |
| `compute.autoscalers.update` | Adjust autoscaler targets as demand changes |
| `compute.instanceGroupManagers.use` | Required for autoscaler to manage MIG instances |
| `compute.instances.listReferrers` | Discover which MIG owns a VM during warm pool operations |
| `compute.instances.resume` | Resume suspended warm pool VMs on claim |
| `monitoring.timeSeries.create` | Publish scaling metrics that drive the autoscaler |
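If you manage the runner custom role yourself, a small check like the one below may help confirm that every permission in the table is present. It is a minimal sketch, not part of the Ona tooling: the function name `missing_permissions` and the variable `REQUIRED_PERMISSIONS` are made up for illustration, and it assumes you pipe in a permission listing such as the output of `gcloud iam roles describe`.

```shell
# Permissions added to the runner custom role in this release (from the table above).
REQUIRED_PERMISSIONS="compute.autoscalers.create
compute.autoscalers.delete
compute.autoscalers.get
compute.autoscalers.update
compute.instanceGroupManagers.use
compute.instances.listReferrers
compute.instances.resume
monitoring.timeSeries.create"

# Hypothetical helper: reads a permission listing on stdin (YAML list items,
# semicolon-separated, or one per line) and prints each required permission
# that does not appear in it. Prints nothing when all are present.
missing_permissions() {
  # Pull out every dotted identifier, one per line, whatever the input layout.
  present="$(grep -oE '[A-Za-z0-9]+(\.[A-Za-z0-9]+)+' || true)"
  for perm in $REQUIRED_PERMISSIONS; do
    # -x: whole-line match, -F: fixed string, so dots are not wildcards.
    printf '%s\n' "$present" | grep -qxF "$perm" || printf '%s\n' "$perm"
  done
}
```

For example, `gcloud iam roles describe ROLE_ID --project=PROJECT_ID | missing_permissions` lists whatever still needs to be added, where `ROLE_ID` and `PROJECT_ID` are placeholders for your own values.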
- The project-level `iam.serviceAccounts.actAs` and `iam.serviceAccounts.getAccessToken` permissions have been removed from the runner custom role.
- Instead, the runner SA is granted `roles/iam.serviceAccountUser` on three specific service accounts: `runner_sa`, `environment_vm_sa`, and `proxy_vm_sa`. This limits impersonation to only the SAs the runner attaches to instances.
- The runner assets bucket role has been elevated from `roles/storage.objectViewer` to `roles/storage.objectAdmin` to support writing managed metrics audit payloads.
- Unused service accounts (`build_cache`, `secret_manager`, `pubsub_processor`) are removed.
- Environment UDP egress is now restricted to DNS, NTP, and QUIC.
Upgrade steps
- Update the `version` constraint in your `main.tf` module block to `v2.0.0`. See the release page for details.
- Run `terraform init -upgrade` to fetch the new module.
- Run `terraform plan -out=tfplan` and review the changes, paying attention to IAM and firewall rule updates.
- Run `terraform apply tfplan`.
- If you use pre-created service accounts, you must:
  - Add the new custom role permissions listed above.
  - Grant `roles/iam.serviceAccountUser` on the `runner_sa`, `environment_vm_sa`, and `proxy_vm_sa` service accounts to the runner SA.
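The command-line part of the upgrade can be sketched as a small script. This is one possible way to wrap the steps, not an official procedure. The helper names (`run`, `upgrade_module`, `grant_service_account_user`) and the `DRY_RUN` switch are assumptions made for illustration, and the script assumes the version constraint in `main.tf` has already been updated. The `terraform` commands are the ones listed above. The `gcloud iam service-accounts add-iam-policy-binding` call is one way to make the `roles/iam.serviceAccountUser` grant when service accounts are pre-created.

```shell
# Hypothetical helper: echo a command, then execute it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Fetch the new module, save a plan, and apply exactly that saved plan.
# Run from the directory that contains main.tf.
upgrade_module() {
  run terraform init -upgrade || return 1
  run terraform plan -out=tfplan || return 1
  if [ "${DRY_RUN:-0}" != "1" ]; then
    # Pause so the IAM and firewall rule changes can be reviewed first.
    printf 'Apply the saved plan? [y/N] '
    read -r answer
    [ "$answer" = "y" ] || return 1
  fi
  run terraform apply tfplan
}

# Pre-created service accounts only: grant the runner SA
# roles/iam.serviceAccountUser on each target SA.
# $1 = runner SA email, remaining arguments = target SA emails.
grant_service_account_user() {
  runner_email="$1"
  shift
  for target_email in "$@"; do
    run gcloud iam service-accounts add-iam-policy-binding "$target_email" \
      --member="serviceAccount:${runner_email}" \
      --role="roles/iam.serviceAccountUser" || return 1
  done
}
```

Setting `DRY_RUN=1` before calling either function prints the commands without executing them, which can be useful for a review before touching real infrastructure. Pass the email addresses of your own `runner_sa`, `environment_vm_sa`, and `proxy_vm_sa` accounts to `grant_service_account_user`.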
What else is in this release
New
- Managed metrics pipeline lets you export runner metrics via Prometheus `remote_write` for monitoring runner health, environment lifecycle, and resource utilization. Contact your account team to enable it.
- Quota and capacity errors from GCP are now surfaced as clear machine failure messages instead of generic errors.
- Automation services support a configurable readiness timeout, preventing services from hanging indefinitely when a health check never passes.
- Orphaned MIGs, autoscalers, instance templates, and warm pool instances are automatically cleaned up, preventing resource leaks.
Improvements
- Environment startup is faster. Supervisor initialization steps now run concurrently, disk pre-warming prioritizes startup-critical paths, and git configuration runs in fewer round trips.
- Warm pool claim reliability is improved. The runner picks the oldest available instance, skips in-flight instances, and recovers the default network route after resuming a suspended VM.
- Async VM creation failures are now surfaced via Pub/Sub instead of silently failing.
- Log line ordering within the same timestamp is now preserved.
- The agent operations proxy is more resilient to transient connection failures.
- Prebuild snapshots no longer carry stale git identity from the prebuild executor.
- File watch self-healing works correctly when a denylisted file is unlinked and recreated inside Docker-in-Docker.
- The runner recovers gracefully from stale gitconfig lock files.
Security
- Updated `go-jose/v4` to v4.1.4 (High severity, GHSA-78h2-9frx-2jm8).
- Updated `go.opentelemetry.io/otel/sdk` to v1.43.0 (High severity).
- Updated Node.js to v24.14.1 (High severity).
- Updated base container images and Prometheus for CVE fixes.
- Go toolchain bumped to go1.26.2 (fixes CVE-2026-27143).