Runner Monitoring and Metrics

Available on the Enterprise tier. Contact sales to learn more.

Ona runners expose Prometheus metrics for runner health, environment lifecycle, and resource utilization.

Enabling metrics collection

1. Go to Settings → Runners
2. Select your runner
3. Toggle Enable metrics collection
4. Enter your configuration:

Parameter	Required	Description
Metrics collector URL	Yes	Prometheus remote write endpoint
Username	No	Basic auth username
Password	No	Basic auth password

5. Click Save Configuration

Your credentials are encrypted at rest and transmitted securely. They are never exposed in logs or the dashboard.

Metrics begin flowing as soon as the configuration is saved. Make sure outbound HTTPS (port 443) from the runner to your endpoint is allowed. Network requirements: AWS | GCP
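If you do not yet have a remote write endpoint, one option is a self-hosted Prometheus with its remote write receiver enabled. The Docker Compose fragment below is a minimal sketch under stated assumptions: the service name, image tag, and port mapping are illustrative and not Ona requirements, and it assumes a reverse proxy in front of it terminates HTTPS on port 443 and handles basic auth.

```yaml
# Hypothetical docker-compose.yml fragment; names and ports are assumptions.
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      # Overriding the command replaces the image defaults, so the config path is restated.
      - --config.file=/etc/prometheus/prometheus.yml
      # Accept remote write requests on the /api/v1/write path.
      - --web.enable-remote-write-receiver
    ports:
      - "9090:9090"
```

With a setup like this, the Metrics collector URL would point at the HTTPS address of your proxy followed by `/api/v1/write`.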

What to monitor

Runners expose many metrics, but not all require your attention. Some indicate issues you can resolve directly in your cloud account. Others signal problems that require Ona support. The rest provide visibility into usage and system health.

Act on these

These metrics reflect infrastructure you control. Set up alerts and respond directly.

Metric	What it means	What to do
`up == 0`	Runner is unreachable	Check network connectivity, security groups, and firewall rules
`gitpod_gateway_proxy_up == 0`	Proxy is down	Check runner logs and network configuration
`gitpod_gateway_proxy_http_requests_total`	Proxy request errors	Filter for `status_code` 4xx/5xx to identify failing requests; may indicate misconfigured clients or network issues

Contact support for these

These metrics indicate issues within the runner itself. You can’t resolve them directly, but they help you know when to reach out.

Metric	What it means	What to tell support
`gitpod_runnerkit_function_errors_total`	Total internal operation failures	Share the error rate trend and affected time window
`workqueue_unfinished_work_seconds`	Processing is stuck (value stays elevated)	Note how long it’s been elevated and any correlated symptoms
`environment_error_errors_total`	Environment creation/operation failures	Include the `error_code` and `component` labels from the metric
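To know when to reach out, you could alert on environment errors grouped by the labels support asks for. This is a sketch only: the alert name, the threshold of 5 errors, and the 15-minute window are assumptions to tune against your own baseline.

```yaml
- alert: EnvironmentErrorsElevated   # hypothetical alert name
  expr: |
    # Threshold and window are assumptions, not Ona recommendations.
    sum by (error_code, component) (
      increase(environment_error_errors_total[15m])
    ) > 5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Environment errors: {{ $labels.error_code }} in {{ $labels.component }}"
    runbook: "Contact Ona support with the error_code and component labels"
```

Grouping by `error_code` and `component` puts the values support needs directly into the alert notification.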

Informational

These metrics provide visibility but don’t typically require action.

Metric	What it shows
`gitpod_runnerkit_active_instances`	Current environment count by state
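For usage dashboards, a recording rule can precompute the environment count per state. The group and rule names below are hypothetical; the expression uses only the `state` label and the common `stack` label.

```yaml
groups:
  - name: ona_runner_usage   # hypothetical group name
    rules:
      # Environments per runner stack and state, summed across containers.
      - record: stack_state:gitpod_runnerkit_active_instances:sum
        expr: sum by (stack, state) (gitpod_runnerkit_active_instances)
```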

Example alerts

These alerts cover high-signal scenarios that directly impact your users. Runner and proxy availability determine whether users can access environments at all. Proxy error rates indicate active failures during environment connections. They are the first signals to check when something goes wrong.

Runner unreachable

Check your network configuration, security groups, and firewall rules.

- alert: RunnerUnreachable
  expr: up == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Runner is unreachable"
    runbook: "Check network connectivity and security groups"

Proxy error rate elevated

Indicates users are experiencing failures connecting to environments.

- alert: ProxyErrorRateElevated
  expr: |
    sum(rate(gitpod_gateway_proxy_http_requests_total{status_code=~"5.."}[5m]))
    / sum(rate(gitpod_gateway_proxy_http_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Proxy 5xx error rate above 5%"
    runbook: "Check proxy logs and backend connectivity"

High error rate (contact support)

This alert signals an issue you can’t fix directly. Contact support with the time window and error details.

- alert: HighErrorRate
  expr: |
    rate(gitpod_runnerkit_function_errors_total[5m])
    / rate(gitpod_runnerkit_function_calls_total[5m]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Runner error rate elevated"
    runbook: "Contact Ona support with time window and error_code labels"

Available metrics

All metrics include these common labels:

Label	Description
`stack`	Runner stack name (e.g., `Ona-AWS-US-East---Enterprise`)
`account_id`	Cloud provider account ID
`region`	Deployment region
`instance`	Container hostname
`job`	Prometheus job name (`ec2_runner`, `runner_manager`, `proxy`)

The tables below list additional metric-specific labels where applicable.

Standard

Common metrics available on all runners.

Metric	Type	Labels	Description
`up`	Gauge	-	Target health (1 = up, 0 = down)
`gitpod_gateway_proxy_up`	Gauge	-	Proxy health (1 = up, 0 = down)
`gitpod_ec2_runner_version_info`	Gauge	`version`	Runner version (AWS)
`gitpod_runner_version`	Gauge	`version`, `kind`	Runner version (GCP)

Environment (`gitpod_runnerkit_*`)

Metrics for environment lifecycle operations including creation, supervision, and state management.

Metric	Type	Labels	Description
`gitpod_runnerkit_active_instances`	Gauge	`state`	Environments by state
`gitpod_runnerkit_environment_operation_duration_seconds`	Histogram	`operation`	Operation duration
`gitpod_runnerkit_function_calls_total`	Counter	`function`	Function calls
`gitpod_runnerkit_function_duration_seconds`	Histogram	`function`	Function duration
`gitpod_runnerkit_function_errors_total`	Counter	`function`	Function errors
`gitpod_runnerkit_supervisor_status_events_total`	Counter	-	Supervisor events
`gitpod_runnerkit_supervisor_watch_starts_total`	Counter	-	Watch starts
`gitpod_runnerkit_supervisor_watch_closes_total`	Counter	`reason`	Watch closes
`gitpod_runnerkit_supervisor_watch_duration_seconds`	Histogram	-	Watch duration
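Histogram metrics are usually read as quantiles. The sketch below records a 95th percentile per operation; it assumes the histograms are exposed as classic Prometheus histograms with `_bucket` series, and the group and rule names are hypothetical.

```yaml
groups:
  - name: ona_runner_latency   # hypothetical group name
    rules:
      - record: operation:gitpod_runnerkit_environment_operation_duration_seconds:p95
        expr: |
          # Assumes classic histogram buckets (the _bucket suffix and le label).
          histogram_quantile(
            0.95,
            sum by (le, operation) (
              rate(gitpod_runnerkit_environment_operation_duration_seconds_bucket[5m])
            )
          )
```

The same pattern should apply to `gitpod_runnerkit_function_duration_seconds` by swapping the metric name and grouping by `function`.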

Snapshots (`snapshot_*`)

Metrics for environment snapshot operations used for persistence and restore.

Metric	Type	Labels	Description
`snapshot_reconcile_duration_seconds`	Histogram	`phase`, `result`	Processing time by phase
`snapshot_in_progress`	Gauge	`phase`	Active snapshots
`snapshot_timeouts_total`	Counter	-	Timeouts
`snapshot_deletions_total`	Counter	`result`	Deletions

Work queue (`workqueue_*`)

Internal task queue metrics. A growing `workqueue_depth` or a persistently high `workqueue_unfinished_work_seconds` may indicate that the runner is falling behind on processing. Contact support if these remain elevated.

Metric	Type	Labels	Description
`workqueue_depth`	Gauge	`name`	Queue depth
`workqueue_adds_total`	Counter	`name`	Items added
`workqueue_queue_duration_seconds`	Histogram	`name`	Time in queue
`workqueue_work_duration_seconds`	Histogram	`name`	Processing time
`workqueue_unfinished_work_seconds`	Gauge	`name`	Stuck work indicator
`workqueue_longest_running_processor_seconds`	Gauge	`name`	Longest processor
`workqueue_retries_total`	Counter	`name`	Retries
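An alert on stuck work could look like the following sketch. The alert name, the 300-second threshold, and the 15-minute hold are assumptions; choose values that reflect what "elevated" means for your runner.

```yaml
- alert: WorkQueueStuck   # hypothetical alert name
  # Fires when any queue reports unfinished work older than the assumed threshold.
  expr: max by (stack, name) (workqueue_unfinished_work_seconds) > 300
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Work queue {{ $labels.name }} has stuck work"
    runbook: "Contact Ona support with the queue name and how long the value has been elevated"
```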

Errors (`environment_error_*`)

Tracks environment-level errors. Use the `error_code` and `component` labels when reporting issues to support.

Metric	Type	Labels	Description
`environment_error_errors_total`	Counter	`instance_id`, `error_code`, `component`	Errors by instance/code/component

Gateway proxy (`gitpod_gateway_proxy_*`)

The gateway proxy handles all traffic between users and environments. On AWS deployments, it runs as a container named `proxy` in the same ECS task as the runner.

HTTP requests

Metric	Type	Labels	Description
`gitpod_gateway_proxy_http_requests_total`	Counter	`protocol`, `status_code`	Total HTTP requests processed
`gitpod_gateway_proxy_http_request_duration_seconds`	Histogram	`protocol`	Request duration
`gitpod_gateway_proxy_http_requests_in_flight`	Gauge	`protocol`	Active requests
`gitpod_gateway_proxy_http_request_size_bytes`	Histogram	`protocol`	Request size
`gitpod_gateway_proxy_http_response_size_bytes`	Histogram	`protocol`	Response size

Connections

Metric	Type	Labels	Description
`gitpod_gateway_proxy_http_connections_in_flight`	Gauge	`protocol`	Active connections
`gitpod_gateway_proxy_http_connection_errors_total`	Counter	`protocol`, `error_type`	Connection errors
`gitpod_gateway_proxy_http_connection_duration_seconds`	Histogram	`protocol`	Connection duration

Backend

Metric	Type	Labels	Description
`gitpod_gateway_proxy_http_backend_request_duration_seconds`	Histogram	`protocol`	Backend request duration
`gitpod_gateway_proxy_http_backend_failures_total`	Counter	`error_type`	Backend failures
`gitpod_gateway_proxy_http_backend_connections_in_flight`	Gauge	`protocol`	Active backend connections

DNS

Metric	Type	Labels	Description
`gitpod_gateway_proxy_dns_resolution_duration_seconds`	Histogram	-	DNS resolution time
`gitpod_gateway_proxy_dns_cache_hits_total`	Counter	-	DNS cache hits
`gitpod_gateway_proxy_dns_cache_misses_total`	Counter	-	DNS cache misses
`gitpod_gateway_proxy_dns_errors_total`	Counter	`error_type`	DNS errors
`gitpod_gateway_proxy_gitpod_proxy_dns_negative_cache_hits_total`	Counter	-	Negative cache hits
`gitpod_gateway_proxy_gitpod_proxy_dns_failures_by_code_total`	Counter	`code`	DNS failures by HTTP status
`gitpod_gateway_proxy_gitpod_proxy_dns_cache_invalidations_total`	Counter	-	Cache invalidations
`gitpod_gateway_proxy_gitpod_proxy_dns_cache_invalidations_batch_total`	Counter	-	Batch cache invalidations
`gitpod_gateway_proxy_gitpod_proxy_environment_not_found_total`	Counter	-	Requests for non-existent environments
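The hit and miss counters can be combined into a cache hit ratio, which is easier to chart than two raw counters. This is a sketch with hypothetical group and rule names.

```yaml
groups:
  - name: ona_proxy_dns   # hypothetical group name
    rules:
      # Fraction of DNS lookups served from cache over the last 5 minutes.
      - record: stack:gitpod_gateway_proxy_dns_cache_hit_ratio:rate5m
        expr: |
          sum by (stack) (rate(gitpod_gateway_proxy_dns_cache_hits_total[5m]))
          /
          (
            sum by (stack) (rate(gitpod_gateway_proxy_dns_cache_hits_total[5m]))
            + sum by (stack) (rate(gitpod_gateway_proxy_dns_cache_misses_total[5m]))
          )
```

When there are no lookups in the window, the division yields no usable value, so gaps in the resulting series are expected during idle periods.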

TLS

Metric	Type	Labels	Description
`gitpod_gateway_proxy_tls_handshake_duration_seconds`	Histogram	`protocol_version`	TLS handshake duration
`gitpod_gateway_proxy_tls_errors_total`	Counter	`error_type`	TLS errors

Security

Metric	Type	Labels	Description
`gitpod_gateway_proxy_suspicious_requests_total`	Counter	`type`	Suspicious requests detected

GCP-specific (`gitpod_gcp_*`)

Metrics specific to GCP runner deployments, tracking compute and Redis connectivity.

Metric	Type	Labels	Description
`gitpod_gcp_compute_network_errors_total`	Counter	`error_type`	Network errors
`gitpod_gcp_compute_redis_connection_errors_total`	Counter	-	Redis errors
`gitpod_gcp_compute_redis_connection_health`	Gauge	-	Redis health
`gitpod_gcp_network_connection_health`	Gauge	-	Network health

Troubleshooting

Metrics not appearing?

Check network connectivity to your endpoint
Verify authentication credentials
Check runner logs for errors

Network requirements: AWS | GCP

High cardinality?

Aggregate at runner level instead of per-environment
Adjust retention for high-volume metrics
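Aggregating at runner level can be done with a recording rule that drops the per-environment label. The sketch below uses hypothetical group and rule names and sums away `instance_id` from the environment error counter.

```yaml
groups:
  - name: ona_runner_cardinality   # hypothetical group name
    rules:
      # Keeps stack, error_code, and component; drops instance_id and other labels.
      - record: stack_error_code_component:environment_error_errors:rate5m
        expr: |
          sum by (stack, error_code, component) (
            rate(environment_error_errors_total[5m])
          )
```

Dashboards and alerts can then query the recorded series. The raw per-environment series are still ingested unless your metrics backend is configured to drop them.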
