# GCP Runner Releases

> Version releases and infrastructure updates for the GCP runner

<Update label="20260608.995" description="June 9, 2026">
  ## Dual-disk recovery and capacity fallback

  GCP runners can recover dual-disk environments more reliably after the original VM is gone. Environments with preserved data disks or completed data snapshots can be listed as stopped and started again with their data intact.

  This release also adds an optional cross-zone restart path. When `enable_cross_zone_restart` is enabled in the Terraform module, a stopped dual-disk environment can fall back to a ready data snapshot in another zone if the original zone has no VM capacity.

  ## Infrastructure upgrade required

  This release requires a Terraform module upgrade to [v2.0.3](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.3) ([Terraform Registry](https://registry.terraform.io/modules/gitpod-io/ona-runner/google/latest)).

  Key infrastructure changes:

  * The runner custom role now includes disk label, disk update, disk snapshot, snapshot cleanup, snapshot read, and autoscaler list permissions used by dual-disk recovery and warm-pool cleanup.
  * The root module and `examples/runner-with-networking` wrapper include an optional `enable_cross_zone_restart` input. The default is `false`, so existing deployments keep their current behavior unless you enable it.
  * The runner VM cloud-init passes the cross-zone restart setting to the GCP runner process.

  If you use pre-created service accounts or custom IAM roles, add the new permissions documented in the module release before applying the new runner version.

  #### Upgrade steps

  1. Update the `version` constraint in your `main.tf` module block to `v2.0.3`. See the [release page](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.3) for details.
  2. Run `terraform init -upgrade` to fetch the new module.
  3. Run `terraform plan -out=tfplan` and review the IAM and runner VM metadata changes.
  4. Run `terraform apply tfplan`.
  5. Optional: set `enable_cross_zone_restart = true` if you want stopped dual-disk environments to retry in another zone after a capacity error.
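
  For reference, steps 1 and 5 might look like the module block below in `main.tf`. This is a minimal sketch that assumes you consume the module from the Terraform Registry, where version constraints are conventionally written without the Git tag's `v` prefix. The `ona_runner` module label is hypothetical, and your existing inputs are elided.

  ```hcl
  # Hypothetical module label; keep the label your configuration already uses.
  module "ona_runner" {
    source  = "gitpod-io/ona-runner/google"
    version = "2.0.3" # assumes a Registry source (tag v2.0.3)

    # ... your existing module inputs stay unchanged ...

    # Optional, defaults to false. Lets a stopped dual-disk environment
    # restart from a ready data snapshot in another zone after a capacity error.
    enable_cross_zone_restart = true
  }
  ```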

  Full walkthrough: [Upgrade GCP runner infrastructure](/ona/runners/gcp/update-runner#updating-infrastructure)

  ## What else is in this release

  <AccordionGroup>
    <Accordion title="New" icon="sparkles">
      * Warm pools can use dual-disk prebuild snapshots and preserve claimed data disks for assigned environments.
    </Accordion>

    <Accordion title="Improvements" icon="arrow-up-right">
      * Warm-pool assignment returns sooner because the runner no longer waits for GCP label, metadata, and disk auto-delete operations to finish before reporting the claimed instance.
      * Hyperdisk data disks restored from snapshots can temporarily use higher provisioned performance during hydration, then return to baseline after the GCP rate-limit window.
      * Data disk discovery polls more frequently while waiting for hot-attached disks, reducing tail latency during dual-disk startup.
      * GCP VM start, stop, delete, and resume operations now log enough detail at INFO level to diagnose common GCP API failures from support bundles.
      * Runner request streams reconnect faster after transient backend unavailability.
      * Credential proxy setup now blocks less of environment startup, and private repository cloning still works before content initialization.
      * Consumed dual-disk data snapshots are cleaned up after the environment is running.
    </Accordion>

    <Accordion title="Security" icon="shield-halved">
      * The GCP VM image now uses updated Google guest-agent and OS Config agent builds with patched gRPC dependencies.
      * VM image build components were rebuilt with patched Go toolchains, and Docker Engine was updated to 29.5.3.
      * Runner Go dependencies including `golang.org/x/crypto`, `golang.org/x/net`, and `github.com/cloudflare/circl` were updated to address fixable CVEs.
    </Accordion>
  </AccordionGroup>
</Update>

<Update label="20260527.1096" description="May 27, 2026">
  ## Updated VM image with Ubuntu 26.04

  Environment VMs now run Ubuntu 26.04 with kernel 7.0 and Docker 29.4.3. This upgrade reduces the total CVE count from 6,731 to 275 (a 96% reduction). The remaining CVEs are in upstream binaries (NVIDIA toolkit, Google guest agent) that we do not compile.

  No action is required. The new VM image is applied automatically when environments start.

  ## What else is in this release

  <AccordionGroup>
    <Accordion title="New" icon="sparkles">
      * Shell command history is now synced across terminal tabs within the same environment for bash and zsh.
      * Agents resume automatically when a devcontainer rebuild completes, instead of waiting for the next periodic check.
    </Accordion>

    <Accordion title="Improvements" icon="arrow-up-right">
      * Warm pool resume on GCP is now non-blocking. The dashboard shows environment status immediately instead of waiting up to two minutes for the VM to resume.
      * Warm pool instances are claimed only when running, avoiding resume timeout failures on suspended instances.
      * Data disk resize completes before content initialization in dual-disk mode, preventing out-of-space errors with large container images.
      * Environments stopped by inactivity timeout are no longer restarted in a loop by the agent reconciler.
      * Container service status correctly reports as stopped when the devcontainer stops, instead of showing stale running status.
      * The agent reconciler now waits for devcontainer readiness during rebuilds instead of failing with connection errors.
      * Bitbucket repository search and workspace listing work correctly after Bitbucket deprecated cross-workspace APIs.
      * Conversation chunk reads are batched with concurrent fan-out, reducing page load latency by up to 10x.
    </Accordion>

    <Accordion title="Security" icon="shield-halved">
      * Upgraded OTel exporters, gRPC, go-jose (High severity), and jsonparser (High severity) to address known CVEs.
      * Go bumped to 1.25.10 in VM build scripts.
      * The legacy credential proxy MITM architecture has been replaced with eBPF-based request rewriting.
    </Accordion>
  </AccordionGroup>
</Update>

<Update label="20260508.526" description="May 8, 2026">
  ## Zone failover for capacity errors

  If a GCP zone lacks capacity to create an environment, the runner now retries in a different zone automatically. This reduces the impact of zonal capacity exhaustion, though it does not eliminate it entirely. Runners configured with multiple zones benefit most.

  ## Infrastructure upgrade required

  This release requires a Terraform module upgrade to [v2.0.1](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.1) ([Terraform Registry](https://registry.terraform.io/modules/gitpod-io/ona-runner/google/latest)).

  Key infrastructure changes:

  * SSH access restricted to IAP-only (port 22 no longer open to `0.0.0.0/0`).
  * Shielded VM hardening enabled with Secure Boot, vTPM, and integrity monitoring. Project-wide SSH keys blocked on runner and proxy VMs.
  * Flow logging added to security-critical firewall rules.
  * Memory and CPU limits added to all Docker containers on the runner VM.
  * TLS certificate rotation fixed for the auth proxy.
  * Honeycomb API key removed from Terraform configuration and VM metadata.
  * Managed metrics direct push enabled for the metrics pipeline.
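
  To help you recognize the SSH, Shielded VM, and flow-logging changes when you review `terraform plan`, here is a rough sketch of equivalent settings written with standard Google provider resources. It is illustrative only and is not the module's actual code. Every resource name, plus the network, zone, machine type, and image, is a hypothetical placeholder. `35.235.240.0/20` is the source range Google documents for IAP TCP forwarding.

  ```hcl
  # Illustrative only: names, network, zone, machine type, and image are placeholders.

  # Allow SSH (TCP 22) only from Google's IAP TCP forwarding range.
  resource "google_compute_firewall" "allow_ssh_from_iap" {
    name          = "example-allow-ssh-from-iap"
    network       = "example-network"
    direction     = "INGRESS"
    source_ranges = ["35.235.240.0/20"]

    allow {
      protocol = "tcp"
      ports    = ["22"]
    }

    # Firewall rule flow logging.
    log_config {
      metadata = "INCLUDE_ALL_METADATA"
    }
  }

  resource "google_compute_instance" "example_vm" {
    name         = "example-vm"
    machine_type = "e2-standard-2"
    zone         = "us-central1-a"

    boot_disk {
      initialize_params {
        image = "debian-cloud/debian-12"
      }
    }

    network_interface {
      network = "example-network"
    }

    # Shielded VM: Secure Boot, vTPM, and integrity monitoring.
    shielded_instance_config {
      enable_secure_boot          = true
      enable_vtpm                 = true
      enable_integrity_monitoring = true
    }

    # Ignore project-wide SSH keys on this VM.
    metadata = {
      block-project-ssh-keys = "true"
    }
  }
  ```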

  #### Upgrade steps

  1. Update the `version` constraint in your `main.tf` module block to `v2.0.1`. See the [release page](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.1) for details.
  2. Run `terraform init -upgrade` to fetch the new module.
  3. Run `terraform plan -out=tfplan` and review the changes, paying attention to firewall and shielded VM settings.
  4. Run `terraform apply tfplan`.

  Full walkthrough: [Upgrade GCP runner infrastructure](/ona/runners/gcp/update-runner#updating-infrastructure)

  ## What else is in this release

  <AccordionGroup>
    <Accordion title="New" icon="sparkles">
      * The Terraform module version used to provision your runner infrastructure is now displayed on the runner details page in the dashboard.
      * External user IDs are now resolved for Bitbucket and GitLab auth tokens, enabling user attribution in Insights across all SCM providers.
    </Accordion>

    <Accordion title="Improvements" icon="arrow-up-right">
      * Environments that fail to start within 10 minutes (supervisor never connects) are now stopped automatically instead of staying in "starting" indefinitely.
      * Workspace folder path is correctly reported during environment creation when dotfiles are configured.
      * Supervisor retries asset downloads on SHA-256 mismatch instead of failing permanently.
      * File watch self-healing works reliably under Docker-in-Docker (fuse-overlayfs) after file unlink and recreate.
      * Agent conversations no longer stall silently when the model pauses mid-turn.
    </Accordion>

    <Accordion title="Security" icon="shield-halved">
      * Credentials (AWS keys, GitHub tokens, basic-auth URLs, bearer tokens, JWTs) are now redacted from environment status messages, on-disk state files, and process-output logs.
    </Accordion>
  </AccordionGroup>
</Update>

<Update label="20260504.828" description="May 4, 2026">
  ## Warm pools now available on GCP

  [Warm pools](/ona/projects/warm-pools) keep pre-initialized Compute Engine instances in a suspended state, ready to resume when you create an environment. Instead of provisioning a new VM and loading the prebuild snapshot from scratch, Ona claims a suspended instance and resumes it. Startup drops from minutes to around 10 seconds.

  Enable warm pools per environment class in your project's prebuild settings. The runner dynamically scales the pool between your configured minimum and maximum based on demand, and rotates instances automatically when new prebuilds complete.

  Requires an [Enterprise plan](https://ona.com/pricing). See the [warm pools documentation](/ona/projects/warm-pools) for prerequisites and setup instructions.

  ## Infrastructure upgrade required

  This release requires a Terraform module upgrade to [v2.0.0](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.0) to enable warm pools and apply IAM changes.

  **New IAM permissions added to the runner custom role:**

  | Permission                          | Purpose                                                  |
  | ----------------------------------- | -------------------------------------------------------- |
  | `compute.autoscalers.create`        | Manage MIG autoscalers for dynamic warm pool scaling     |
  | `compute.autoscalers.delete`        | Clean up autoscalers when warm pools are removed         |
  | `compute.autoscalers.get`           | Read autoscaler state during reconciliation              |
  | `compute.autoscalers.update`        | Adjust autoscaler targets as demand changes              |
  | `compute.instanceGroupManagers.use` | Required for autoscaler to manage MIG instances          |
  | `compute.instances.listReferrers`   | Discover which MIG owns a VM during warm pool operations |
  | `compute.instances.resume`          | Resume suspended warm pool VMs on claim                  |
  | `monitoring.timeSeries.create`      | Publish scaling metrics that drive the autoscaler        |
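
  If you manage the runner custom role yourself, for example alongside pre-created service accounts, one way to add these permissions is sketched below. It makes several assumptions. The `existing_runner_permissions` variable stands in for the permission list your role grants today. The `project_id` variable, `role_id`, and `title` are hypothetical, so reuse the values from your existing role definition.

  ```hcl
  # Hypothetical inputs: your project and the permissions your role already grants.
  variable "project_id" {
    type = string
  }

  variable "existing_runner_permissions" {
    type = list(string)
  }

  locals {
    # Permissions added to the runner custom role in module v2.0.0.
    added_permissions = [
      "compute.autoscalers.create",
      "compute.autoscalers.delete",
      "compute.autoscalers.get",
      "compute.autoscalers.update",
      "compute.instanceGroupManagers.use",
      "compute.instances.listReferrers",
      "compute.instances.resume",
      "monitoring.timeSeries.create",
    ]
  }

  # role_id and title are hypothetical placeholders.
  resource "google_project_iam_custom_role" "runner" {
    project     = var.project_id
    role_id     = "onaRunnerCustom"
    title       = "Ona runner (custom)"
    permissions = distinct(concat(var.existing_runner_permissions, local.added_permissions))
  }
  ```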

  **IAM role binding changes:**

  * The project-level `iam.serviceAccounts.actAs` and `iam.serviceAccounts.getAccessToken` permissions have been **removed** from the runner custom role.
  * Instead, the runner SA is granted `roles/iam.serviceAccountUser` on three specific service accounts: `runner_sa`, `environment_vm_sa`, and `proxy_vm_sa`. This limits impersonation to only the SAs the runner attaches to instances.
  * The runner assets bucket role has been elevated from `roles/storage.objectViewer` to `roles/storage.objectAdmin` to support writing managed metrics audit payloads.
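
  For pre-created service accounts, the per-SA `roles/iam.serviceAccountUser` grants could be written as shown below. The variables holding the project ID and the service account emails are hypothetical. `google_service_account_iam_member` adds one member to one role without overwriting the other bindings on each service account.

  ```hcl
  # Hypothetical inputs: emails of your pre-created service accounts.
  variable "project_id" {
    type = string
  }

  variable "runner_sa_email" {
    type = string
  }

  variable "environment_vm_sa_email" {
    type = string
  }

  variable "proxy_vm_sa_email" {
    type = string
  }

  locals {
    # Service accounts the runner attaches to instances.
    attachable_sas = {
      runner_sa         = var.runner_sa_email
      environment_vm_sa = var.environment_vm_sa_email
      proxy_vm_sa       = var.proxy_vm_sa_email
    }
  }

  # Grant the runner SA roles/iam.serviceAccountUser on each SA individually
  # instead of project-wide actAs.
  resource "google_service_account_iam_member" "runner_can_use" {
    for_each           = local.attachable_sas
    service_account_id = "projects/${var.project_id}/serviceAccounts/${each.value}"
    role               = "roles/iam.serviceAccountUser"
    member             = "serviceAccount:${var.runner_sa_email}"
  }
  ```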

  **Other infrastructure changes:**

  * Unused service accounts (`build_cache`, `secret_manager`, `pubsub_processor`) are removed.
  * Environment UDP egress is now restricted to DNS, NTP, and QUIC.
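
  Expressed as VPC firewall rules, a UDP egress restriction like this could look roughly like the sketch below. It assumes the standard UDP ports for DNS (53), NTP (123), and QUIC (443). The rule names, network, target tag, and priorities are hypothetical, and the module's actual rules may differ. In VPC firewalls a lower priority number takes precedence, so the allow rule wins over the broader deny.

  ```hcl
  # Illustrative only: names, network, tag, and priorities are placeholders.
  resource "google_compute_firewall" "env_udp_egress_allow" {
    name               = "example-env-udp-egress-allow"
    network            = "example-network"
    direction          = "EGRESS"
    priority           = 1000
    destination_ranges = ["0.0.0.0/0"]
    target_tags        = ["example-environment"]

    allow {
      protocol = "udp"
      ports    = ["53", "123", "443"] # DNS, NTP, QUIC
    }
  }

  resource "google_compute_firewall" "env_udp_egress_deny" {
    name               = "example-env-udp-egress-deny"
    network            = "example-network"
    direction          = "EGRESS"
    priority           = 65000 # evaluated after the allow rule above
    destination_ranges = ["0.0.0.0/0"]
    target_tags        = ["example-environment"]

    deny {
      protocol = "udp"
    }
  }
  ```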

  #### Upgrade steps

  1. Update the `version` constraint in your `main.tf` module block to `v2.0.0`. See the [release page](https://github.com/gitpod-io/terraform-google-ona-runner/releases/tag/v2.0.0) for details.
  2. Run `terraform init -upgrade` to fetch the new module.
  3. Run `terraform plan -out=tfplan` and review the changes, paying attention to IAM and firewall rule updates.
  4. Run `terraform apply tfplan`.
  5. If you use [pre-created service accounts](/ona/runners/gcp/setup#pre-created-service-accounts), you must:
     * Add the new custom role permissions listed above.
     * Grant `roles/iam.serviceAccountUser` on the `runner_sa`, `environment_vm_sa`, and `proxy_vm_sa` service accounts to the runner SA.

  Full walkthrough: [Upgrade GCP runner infrastructure](/ona/runners/gcp/update-runner#updating-infrastructure)

  ## What else is in this release

  <AccordionGroup>
    <Accordion title="New" icon="sparkles">
      * Managed metrics pipeline lets you export runner metrics via Prometheus `remote_write` for monitoring runner health, environment lifecycle, and resource utilization. Contact your account team to enable it.
      * Quota and capacity errors from GCP are now surfaced as clear machine failure messages instead of generic errors.
      * Automation services support a configurable readiness timeout, preventing services from hanging indefinitely when a health check never passes.
      * Orphaned MIGs, autoscalers, instance templates, and warm pool instances are automatically cleaned up, preventing resource leaks.
    </Accordion>

    <Accordion title="Improvements" icon="arrow-up-right">
      * Environment startup is faster. Supervisor initialization steps now run concurrently, disk pre-warming prioritizes startup-critical paths, and git configuration runs in fewer round trips.
      * Warm pool claim reliability is improved. The runner picks the oldest available instance, skips in-flight instances, and recovers the default network route after resuming a suspended VM.
      * Async VM creation failures are now surfaced via Pub/Sub instead of silently failing.
      * Log line ordering within the same timestamp is now preserved.
      * The agent operations proxy is more resilient to transient connection failures.
      * Prebuild snapshots no longer carry stale git identity from the prebuild executor.
      * File watch self-healing works correctly when a denylisted file is unlinked and recreated inside Docker-in-Docker.
      * The runner recovers gracefully from stale gitconfig lock files.
    </Accordion>

    <Accordion title="Security" icon="shield-halved">
      * Updated `go-jose/v4` to v4.1.4 (High severity, GHSA-78h2-9frx-2jm8).
      * Updated `go.opentelemetry.io/otel/sdk` to v1.43.0 (High severity).
      * Updated Node.js to v24.14.1 (High severity).
      * Updated base container images and Prometheus for CVE fixes.
      * Go toolchain bumped to go1.26.2 (fixes CVE-2026-27143).
    </Accordion>
  </AccordionGroup>
</Update>
