Network misconfigurations are the most common cause of issues. See access requirements first.

CloudFormation stack fails

Symptoms: ROLLBACK_COMPLETE or ROLLBACK_IN_PROGRESS with errors like Parameter validation failed: parameter value for EC2RunnerInstancesSubnet does not exist.
Fix: Ensure you select a VPC, at least one availability zone, and subnets across multiple AZs.
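
To see why a stack rolled back and to confirm the subnets you selected, the AWS CLI can be used roughly as follows. This is a minimal sketch: the function names are hypothetical helpers, and the stack name, VPC ID, and region in the usage comments are placeholders, not values from this guide.

```shell
# Print the resources whose status ends in FAILED, with the reason CloudFormation recorded.
failed_stack_events() {
  local stack_name="$1" region="$2"
  aws cloudformation describe-stack-events \
    --stack-name "$stack_name" \
    --region "$region" \
    --query "StackEvents[?ends_with(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatusReason]" \
    --output text
}

# List every subnet in a VPC with its availability zone, so you can check
# that the subnets exist and span more than one AZ.
subnets_by_az() {
  local vpc_id="$1" region="$2"
  aws ec2 describe-subnets \
    --region "$region" \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query "Subnets[].[SubnetId, AvailabilityZone]" \
    --output text
}

# Usage (placeholder values):
#   failed_stack_events my-runner-stack us-east-1
#   subnets_by_az vpc-0123456789abcdef0 us-east-1
```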

Runner task fails

Symptoms:
  • CREATE_FAILED with ECS Deployment Circuit Breaker was triggered
  • ResourceInitializationError in task logs
  • Cannot pull images or access AWS services
Fix:
  • Verify VPC has Internet Gateway or NAT Gateway
  • Update route tables (public → IGW, private → NAT)
  • For private subnets, add VPC endpoints for Secrets Manager, S3, ECR
  • Check security groups allow outbound HTTPS
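
The routing and endpoint checks above can be sketched with the AWS CLI. The helper names are hypothetical and the subnet and VPC IDs in the usage comments are placeholders.

```shell
# Show the routes of the route table explicitly associated with a subnet.
# A public subnet should have a 0.0.0.0/0 route to an igw-... target,
# a private subnet a 0.0.0.0/0 route to a nat-... target.
# A subnet with no explicit association uses the VPC's main route table,
# in which case this returns nothing.
subnet_routes() {
  local subnet_id="$1"
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$subnet_id" \
    --query "RouteTables[].Routes[].[DestinationCidrBlock, GatewayId, NatGatewayId]" \
    --output text
}

# List the service names of the VPC endpoints in a VPC, to compare against
# the Secrets Manager, S3, and ECR endpoints that private subnets need.
vpc_endpoint_services() {
  local vpc_id="$1"
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query "VpcEndpoints[].ServiceName" \
    --output text
}

# Usage (placeholder values):
#   subnet_routes subnet-0123456789abcdef0
#   vpc_endpoint_services vpc-0123456789abcdef0
```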

Instance type not available

Symptoms: Error like “m6i.xlarge is not available in us-east-1e”
Fix:
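
One way to investigate is to list which availability zones in the region offer the instance type. This is a sketch with a hypothetical helper name. It is an assumption, not something this page states, that using subnets in one of the listed zones or choosing a different instance type resolves the error.

```shell
# Print the availability zones in a region that offer a given instance type.
azs_offering_instance_type() {
  local instance_type="$1" region="$2"
  aws ec2 describe-instance-type-offerings \
    --region "$region" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$instance_type" \
    --query "InstanceTypeOfferings[].Location" \
    --output text
}

# Usage, with the values from the example error message:
#   azs_offering_instance_type m6i.xlarge us-east-1
```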

Unexpected costs

Symptoms: Unexpected AWS charges, or continued billing after deleting a runner.
Fix:
  • See managing costs to identify resources
  • After deleting a runner, verify the CloudFormation stack is fully deleted
  • Check for residual EC2 instances or EBS volumes and delete manually
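
The cleanup checks above might look like this with the AWS CLI. This is a sketch with hypothetical helper names. The listings cover every instance and volume in the region, not only those a runner created, so review the output before deleting anything.

```shell
# Print the stack status. If the stack is fully deleted, the CLI instead
# reports an error saying the stack does not exist.
stack_status() {
  local stack_name="$1" region="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --region "$region" \
    --query "Stacks[0].StackStatus" \
    --output text
}

# List EBS volumes that are not attached to any instance.
unattached_volumes() {
  local region="$1"
  aws ec2 describe-volumes \
    --region "$region" \
    --filters "Name=status,Values=available" \
    --query "Volumes[].[VolumeId, Size, CreateTime]" \
    --output text
}

# List EC2 instances that are running or stopped.
remaining_instances() {
  local region="$1"
  aws ec2 describe-instances \
    --region "$region" \
    --filters "Name=instance-state-name,Values=running,stopped" \
    --query "Reservations[].Instances[].[InstanceId, InstanceType, State.Name]" \
    --output text
}

# Usage (placeholder values):
#   stack_status my-runner-stack us-east-1
#   unattached_volumes us-east-1
#   remaining_instances us-east-1
```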

SSM access blocked

Symptoms:
  • Environments fail with AWS account policy blocks ssm:SendCommand
  • Runner marked as degraded
  • Slow startup (cache credentials can’t refresh)
Cause: Service Control Policies (SCPs) blocking SSM access. The runner needs ssm:SendCommand and ssm:GetCommandInvocation permissions.
Fix: Ask your AWS administrator to add an exception for the runner’s IAM role:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ssm:SendCommand", "ssm:GetCommandInvocation"],
    "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ssm:*:*:command/*"]
  }]
}

Network connectivity issues

Checklist:
  • Security groups: port 29222 (SSH), outbound HTTPS, port 22999 (internal)
  • Route tables: public subnets → IGW, private subnets → NAT
  • Network ACLs: not blocking required traffic
  • DNS: VPC DNS resolution enabled, can resolve app.gitpod.io
Test connectivity:
# Health endpoint (should return 200)
curl -v https://<your-domain>/_health

# Required endpoints
curl -I https://app.gitpod.io
curl -I https://public.ecr.aws
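
To check several endpoints in one pass, the curl calls above can be wrapped in a small loop that prints only the HTTP status code for each URL. This is a sketch: the helper names and the 10-second timeout are assumptions, and `<your-domain>` remains a placeholder for your own value.

```shell
# Print the HTTP status code for a URL. curl exits non-zero if the request
# fails (DNS error, timeout, refused connection).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Print "<status><TAB><url>" for each URL, or "unreachable" when curl fails.
check_endpoints() {
  local url status
  for url in "$@"; do
    status="$(http_status "$url")" || status="unreachable"
    printf '%s\t%s\n' "$status" "$url"
  done
}

# Usage:
#   check_endpoints "https://<your-domain>/_health" https://app.gitpod.io https://public.ecr.aws
```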

Restart runner after network changes

After changing security groups, route tables, or VPC endpoints, restart the runner:
Console: ECS console → Clusters → your cluster → Services → Update → check Force new deployment
CLI:
aws ecs update-service --cluster YOUR_CLUSTER_NAME --service YOUR_SERVICE_NAME --force-new-deployment
Verify: Check runner shows “Connected” in Settings → Runners, then test creating an environment.
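
If you script the restart, you can also block until ECS reports the service as stable before checking the runner status. This is a sketch with a hypothetical helper name. `aws ecs wait services-stable` polls the service and exits non-zero if it does not stabilize within the waiter's limit.

```shell
# Force a new deployment, then wait for the service to reach a steady state.
restart_and_wait() {
  local cluster="$1" service="$2"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment \
    --query "service.serviceName" \
    --output text \
  && aws ecs wait services-stable \
    --cluster "$cluster" \
    --services "$service"
}

# Usage:
#   restart_and_wait YOUR_CLUSTER_NAME YOUR_SERVICE_NAME
```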

Getting help

Use the support chat (bubble icon in bottom-right). Include:
  • Runner ID and version (from Settings → Runners... menu)
  • CloudFormation stack name and region
  • Runner logs from CloudWatch (ECS task logs)
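
To collect the runner logs for a support request, something like the following may help. This is a sketch: the helper names are hypothetical, the log group name depends on your stack and is not given on this page, and `aws logs tail` requires AWS CLI v2.

```shell
# List log group names in a region so you can find the runner's ECS task logs.
list_log_groups() {
  local region="$1"
  aws logs describe-log-groups \
    --region "$region" \
    --query "logGroups[].logGroupName" \
    --output text
}

# Save the last hour of a log group to a file you can attach to the chat.
save_recent_logs() {
  local log_group="$1" region="$2" out_file="$3"
  aws logs tail "$log_group" --region "$region" --since 1h > "$out_file"
}

# Usage (placeholder values):
#   list_log_groups us-east-1
#   save_recent_logs YOUR_LOG_GROUP_NAME us-east-1 runner-logs.txt
```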