Available on the Enterprise tier. Contact sales to learn more.
Ona runners expose Prometheus metrics for runner health, environment lifecycle, and resource utilization.

Enabling metrics collection

  1. Go to Settings → Runners
  2. Select your runner
  3. Toggle Enable metrics collection
  4. Enter your configuration:
Parameter | Required | Description
Metrics collector URL | Yes | Prometheus remote write endpoint
Username | No | Basic auth username
Password | No | Basic auth password
  5. Click Save Configuration
Your credentials are encrypted at rest and transmitted securely. They are never exposed in logs or the dashboard.
Metrics start flowing as soon as the configuration is saved. Ensure outbound HTTPS (port 443) from the runner to your endpoint is allowed. Network requirements: AWS | GCP

What to monitor

Runners expose many metrics, but not all require your attention. Some indicate issues you can resolve directly in your cloud account. Others signal problems that require Ona support. The rest provide visibility into usage and system health.

Act on these

These metrics reflect infrastructure you control. Set up alerts and respond directly.
Metric | What it means | What to do
up == 0 | Runner is unreachable | Check network connectivity, security groups, and firewall rules
gitpod_gateway_proxy_up == 0 | Proxy is down | Check runner logs and network configuration
gitpod_gateway_proxy_http_requests_total | Proxy request errors | Filter on the status_code label for 4xx/5xx values to identify failing requests; may indicate misconfigured clients or network issues
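The gitpod_gateway_proxy_up row above maps directly to an alert rule. The following is a minimal sketch in the same fragment style as the example alerts on this page; the alert name, the 5m hold period, and the severity are illustrative assumptions to tune for your setup, not Ona recommendations.

```yaml
# Sketch: fires when the gateway proxy reports itself down.
# Alert name, "for" duration, and severity are assumptions.
# Place under the "rules:" key of a Prometheus rule group.
- alert: ProxyDown
  expr: gitpod_gateway_proxy_up == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Gateway proxy is down"
    runbook: "Check runner logs and network configuration"
```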

Contact support for these

These metrics indicate issues within the runner itself. You can’t resolve them directly, but they help you know when to reach out.
Metric | What it means | What to tell support
gitpod_runnerkit_function_errors_total | Total internal operation failures | Share the error rate trend and affected time window
workqueue_unfinished_work_seconds | Processing is stuck (value stays elevated) | Note how long it’s been elevated and any correlated symptoms
environment_error_errors_total | Environment creation/operation failures | Include the error_code and component labels from the metric

Informational

These metrics provide visibility but don’t typically require action.
Metric | What it shows
gitpod_runnerkit_active_instances | Current environment count by state
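For dashboards, the per-state environment count can be precomputed with a recording rule. This is a sketch only; the rule name is a hypothetical example, and the rule assumes you want totals across all runner instances.

```yaml
# Sketch: total environments per state, summed across all series.
# The recorded series name is hypothetical; pick one that fits your conventions.
- record: state:gitpod_runnerkit_active_instances:sum
  expr: sum by (state) (gitpod_runnerkit_active_instances)
```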

Example alerts

These alerts cover high-signal scenarios that directly impact your users. Runner and proxy availability determine whether users can access environments at all. Proxy error rates indicate active failures during environment connections. These are the first things to know about when something goes wrong.

Runner unreachable

Check your network configuration, security groups, and firewall rules.
- alert: RunnerUnreachable
  expr: up == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Runner is unreachable"
    runbook: "Check network connectivity and security groups"

Proxy error rate elevated

This alert fires when more than 5% of proxy requests return a 5xx status, which indicates users are experiencing failures connecting to environments.
- alert: ProxyErrorRateElevated
  expr: |
    sum(rate(gitpod_gateway_proxy_http_requests_total{status_code=~"5.."}[5m]))
    / sum(rate(gitpod_gateway_proxy_http_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Proxy 5xx error rate above 5%"
    runbook: "Check proxy logs and backend connectivity"

High error rate (contact support)

This alert signals an issue you can’t fix directly. Contact support with the time window and error details.
- alert: HighErrorRate
  expr: |
    rate(gitpod_runnerkit_function_errors_total[5m])
    / rate(gitpod_runnerkit_function_calls_total[5m]) > 0.1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Runner error rate elevated"
    runbook: "Contact Ona support with time window, affected function labels, and any error_code labels from environment_error_errors_total"

Available metrics

All metrics include these common labels:
Label | Description
stack | Runner stack name (e.g., Ona-AWS-US-East---Enterprise)
account_id | Cloud provider account ID
region | Deployment region
instance | Container hostname
job | Prometheus job name (ec2_runner, runner_manager, proxy)
The tables below list additional metric-specific labels where applicable.

Standard

Common metrics available on all runners.
Metric | Type | Labels | Description
up | Gauge | - | Target health (1 = up, 0 = down)
gitpod_gateway_proxy_up | Gauge | - | Proxy health (1 = up, 0 = down)
gitpod_ec2_runner_version_info | Gauge | version | Runner version (AWS)
gitpod_runner_version | Gauge | version, kind | Runner version (GCP)

Environment (gitpod_runnerkit_*)

Metrics for environment lifecycle operations including creation, supervision, and state management.
Metric | Type | Labels | Description
gitpod_runnerkit_active_instances | Gauge | state | Environments by state
gitpod_runnerkit_environment_operation_duration_seconds | Histogram | operation | Operation duration
gitpod_runnerkit_function_calls_total | Counter | function | Function calls
gitpod_runnerkit_function_duration_seconds | Histogram | function | Function duration
gitpod_runnerkit_function_errors_total | Counter | function | Function errors
gitpod_runnerkit_supervisor_status_events_total | Counter | - | Supervisor events
gitpod_runnerkit_supervisor_watch_starts_total | Counter | - | Watch starts
gitpod_runnerkit_supervisor_watch_closes_total | Counter | reason | Watch closes
gitpod_runnerkit_supervisor_watch_duration_seconds | Histogram | - | Watch duration
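Histogram metrics such as gitpod_runnerkit_environment_operation_duration_seconds are usually read as quantiles rather than raw buckets. The sketch below computes a 95th percentile per operation. It assumes the histogram is exposed in the classic Prometheus form (a _bucket series with an le label); the recorded series name, the quantile, and the 5m window are illustrative choices.

```yaml
# Sketch: p95 environment operation duration per operation over 5 minutes.
# Assumes classic histogram buckets (<metric>_bucket with an "le" label).
# The recorded series name is hypothetical.
- record: operation:gitpod_runnerkit_environment_operation_duration_seconds:p95_5m
  expr: |
    histogram_quantile(
      0.95,
      sum by (le, operation) (
        rate(gitpod_runnerkit_environment_operation_duration_seconds_bucket[5m])
      )
    )
```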

Snapshots (snapshot_*)

Metrics for environment snapshot operations used for persistence and restore.
Metric | Type | Labels | Description
snapshot_reconcile_duration_seconds | Histogram | phase, result | Processing time by phase
snapshot_in_progress | Gauge | phase | Active snapshots
snapshot_timeouts_total | Counter | - | Timeouts
snapshot_deletions_total | Counter | result | Deletions

Work queue (workqueue_*)

Internal task queue metrics. A growing workqueue_depth or high workqueue_unfinished_work_seconds may indicate the runner is falling behind on processing. Contact support if these remain elevated.
Metric | Type | Labels | Description
workqueue_depth | Gauge | name | Queue depth
workqueue_adds_total | Counter | name | Items added
workqueue_queue_duration_seconds | Histogram | name | Time in queue
workqueue_work_duration_seconds | Histogram | name | Processing time
workqueue_unfinished_work_seconds | Gauge | name | Stuck work indicator
workqueue_longest_running_processor_seconds | Gauge | name | Longest processor
workqueue_retries_total | Counter | name | Retries
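To notice a queue that stays elevated without watching a dashboard, an alert along these lines may help. It is a sketch under stated assumptions: the 300-second threshold and the 15m hold period are placeholders with no documented baseline behind them, so calibrate both against what your runner normally reports.

```yaml
# Sketch: fires when a queue's unfinished work stays above a threshold.
# The 300s threshold, 15m duration, and alert name are assumptions.
- alert: WorkQueueStuck
  expr: max by (name) (workqueue_unfinished_work_seconds) > 300
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Work queue {{ $labels.name }} has elevated unfinished work"
    runbook: "Contact Ona support with how long the value has been elevated and any correlated symptoms"
```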

Errors (environment_error_*)

Tracks environment-level errors. Use the error_code and component labels when reporting issues to support.
Metric | Type | Labels | Description
environment_error_errors_total | Counter | instance_id, error_code, component | Errors by instance/code/component
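When preparing a support report, it can help to summarize errors by the two labels support asks for. A minimal sketch; the one-hour window and the recorded series name are assumptions.

```yaml
# Sketch: error count over the last hour, grouped by error_code and component.
# The recorded series name and the 1h window are hypothetical choices.
- record: error_code_component:environment_error_errors:increase1h
  expr: |
    sum by (error_code, component) (
      increase(environment_error_errors_total[1h])
    )
```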

Gateway proxy (gitpod_gateway_proxy_*)

The gateway proxy handles all traffic between users and environments. It runs as a container named proxy in the same ECS task as the runner (AWS deployments).

HTTP requests

Metric | Type | Labels | Description
gitpod_gateway_proxy_http_requests_total | Counter | protocol, status_code | Total HTTP requests processed
gitpod_gateway_proxy_http_request_duration_seconds | Histogram | protocol | Request duration
gitpod_gateway_proxy_http_requests_in_flight | Gauge | protocol | Active requests
gitpod_gateway_proxy_http_request_size_bytes | Histogram | protocol | Request size
gitpod_gateway_proxy_http_response_size_bytes | Histogram | protocol | Response size

Connections

Metric | Type | Labels | Description
gitpod_gateway_proxy_http_connections_in_flight | Gauge | protocol | Active connections
gitpod_gateway_proxy_http_connection_errors_total | Counter | protocol, error_type | Connection errors
gitpod_gateway_proxy_http_connection_duration_seconds | Histogram | protocol | Connection duration

Backend

Metric | Type | Labels | Description
gitpod_gateway_proxy_http_backend_request_duration_seconds | Histogram | protocol | Backend request duration
gitpod_gateway_proxy_http_backend_failures_total | Counter | error_type | Backend failures
gitpod_gateway_proxy_http_backend_connections_in_flight | Gauge | protocol | Active backend connections

DNS

Metric | Type | Labels | Description
gitpod_gateway_proxy_dns_resolution_duration_seconds | Histogram | - | DNS resolution time
gitpod_gateway_proxy_dns_cache_hits_total | Counter | - | DNS cache hits
gitpod_gateway_proxy_dns_cache_misses_total | Counter | - | DNS cache misses
gitpod_gateway_proxy_dns_errors_total | Counter | error_type | DNS errors
gitpod_gateway_proxy_gitpod_proxy_dns_negative_cache_hits_total | Counter | - | Negative cache hits
gitpod_gateway_proxy_gitpod_proxy_dns_failures_by_code_total | Counter | code | DNS failures by HTTP status
gitpod_gateway_proxy_gitpod_proxy_dns_cache_invalidations_total | Counter | - | Cache invalidations
gitpod_gateway_proxy_gitpod_proxy_dns_cache_invalidations_batch_total | Counter | - | Batch cache invalidations
gitpod_gateway_proxy_gitpod_proxy_environment_not_found_total | Counter | - | Requests for non-existent environments
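The cache hit and miss counters above can be combined into a single hit ratio, which is easier to chart than two raw counters. This sketch assumes every lookup increments exactly one of the two counters, which the table does not confirm; the recorded series name is hypothetical.

```yaml
# Sketch: DNS cache hit ratio over 5 minutes (hits / (hits + misses)).
# Assumes each lookup counts as either a hit or a miss, never both.
- record: gitpod_gateway_proxy:dns_cache_hit_ratio:rate5m
  expr: |
    sum(rate(gitpod_gateway_proxy_dns_cache_hits_total[5m]))
    /
    (
      sum(rate(gitpod_gateway_proxy_dns_cache_hits_total[5m]))
      + sum(rate(gitpod_gateway_proxy_dns_cache_misses_total[5m]))
    )
```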

TLS

Metric | Type | Labels | Description
gitpod_gateway_proxy_tls_handshake_duration_seconds | Histogram | protocol_version | TLS handshake duration
gitpod_gateway_proxy_tls_errors_total | Counter | error_type | TLS errors

Security

Metric | Type | Labels | Description
gitpod_gateway_proxy_suspicious_requests_total | Counter | type | Suspicious requests detected

GCP-specific (gitpod_gcp_*)

Metrics specific to GCP runner deployments, tracking compute and Redis connectivity.
Metric | Type | Labels | Description
gitpod_gcp_compute_network_errors_total | Counter | error_type | Network errors
gitpod_gcp_compute_redis_connection_errors_total | Counter | - | Redis errors
gitpod_gcp_compute_redis_connection_health | Gauge | - | Redis health
gitpod_gcp_network_connection_health | Gauge | - | Network health

Troubleshooting

Metrics not appearing?
  1. Check network connectivity to your endpoint
  2. Verify authentication credentials
  3. Check runner logs for errors
Network requirements: AWS | GCP

High cardinality?
  • Aggregate at runner level instead of per-environment
  • Adjust retention for high-volume metrics