Size your AWS runner infrastructure before deployment. The two factors that determine your configuration are the number of environments you plan to run and how many will be active at the same time.
Runner sizes
The AWS runner CloudFormation template includes a RunnerSize parameter that controls the runner control plane infrastructure. Choose the size that matches your expected workload.
| | Small | Large |
|---|---|---|
| Total environments | Up to 5,000 | 5,000+ |
| Concurrent running | Up to 300 | 300+ |
| Availability zones | 2 | 3 or more |
| EC2 subnet | /20 per AZ (4,096 IPs) | /16 per AZ using CGNAT range |
| LB subnet | /28 per AZ | /28 per AZ |
| Management plane connection | NAT gateway or VPC endpoint | VPC endpoint (PrivateLink) |
| Managed metrics | Recommended | Required |
| Runner scaling | Not needed | Recommended |
If you are unsure which size to start with, choose small. You can switch to large later by updating the RunnerSize CloudFormation parameter without redeploying the stack.
The RunnerSize parameter controls the runner control plane (orchestrator, proxy, cache). It does not affect the size of environment VMs. Environment VM sizing is configured through environment classes.
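The size decision in the table above comes down to two thresholds. The following Python sketch encodes them. `choose_runner_size` is a hypothetical helper, not part of any Ona tool. It assumes the "Up to" limits are inclusive, and it does not account for heavy agent workloads, which can justify large at lower environment counts.

```python
def choose_runner_size(total_environments: int, concurrent_running: int) -> str:
    """Return "small" or "large" from the documented sizing thresholds.

    Small covers up to 5,000 total environments and up to 300 running at
    the same time; exceeding either limit calls for large.
    """
    if total_environments < 0 or concurrent_running < 0:
        raise ValueError("counts must be non-negative")
    if total_environments > 5_000 or concurrent_running > 300:
        return "large"
    return "small"


if __name__ == "__main__":
    print(choose_runner_size(1_200, 80))   # small
    print(choose_runner_size(4_000, 450))  # large: more than 300 running at once
```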
Small runner
Set RunnerSize to small in the CloudFormation template. This is the default.
The small configuration supports up to 5,000 total environments and 300 running at the same time. It is the right starting point for most deployments.
Infrastructure provisioned
| Component | Specification |
|---|---|
| Runner Fargate task | 1 vCPU, 3 GB memory |
| Proxy Fargate task | 0.5 vCPU, 1 GB memory |
| ECS host instance | c6i.large (2 vCPU, 4 GB) |
| Cache (MemoryDB) | db.t4g.small |
Network layout
Use 2 availability zones with one EC2 subnet and one load balancer subnet per AZ.
Stopped environments are stopped EC2 instances. Stopped instances retain their private IP address, so your subnets must be large enough to hold all environments, not only the ones that are running.
| Runner Name | Region | AZs | EC2 Subnet | LB Subnet | Environment Capacity |
|---|---|---|---|---|---|
| us-east | us-east-1 | 2 | /20 (4,096 IPs) | /28 (16 IPs) | ~8,187 |
Select your region based on the recommended latency thresholds for your users. If this layout meets your needs, proceed to setup.
Capacity formula: (Subnet IPs per AZ x Number of AZs) - ~5 management IPs
| EC2 Subnet | IPs per AZ | With 2 AZs | With 3 AZs |
|---|---|---|---|
| /21 | 2,048 | ~4,091 | ~6,139 |
| /20 | 4,096 | ~8,187 | ~12,283 |
| /19 | 8,192 | ~16,379 | ~24,571 |
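The capacity formula can be checked with a few lines of Python. This minimal sketch assumes the fixed overhead of about 5 management IPs used in the formula and reproduces the rows of the table above.

```python
def environment_capacity(prefix_length: int, num_azs: int, management_ips: int = 5) -> int:
    """(Subnet IPs per AZ x Number of AZs) - ~5 management IPs."""
    if not 0 <= prefix_length <= 32:
        raise ValueError("prefix_length must be between 0 and 32")
    if num_azs < 1:
        raise ValueError("num_azs must be at least 1")
    ips_per_az = 2 ** (32 - prefix_length)  # a /20 has 2**12 = 4,096 addresses
    return ips_per_az * num_azs - management_ips


if __name__ == "__main__":
    for prefix in (21, 20, 19):
        two, three = environment_capacity(prefix, 2), environment_capacity(prefix, 3)
        print(f"/{prefix}: ~{two:,} with 2 AZs, ~{three:,} with 3 AZs")
```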
Multi-region example
Each runner is deployed into a single AWS region and a single AWS account. To serve users in multiple regions, deploy one runner per region.
| Runner Name | Region | AZs | EC2 Subnet | LB Subnet | Environment Capacity |
|---|---|---|---|---|---|
| us-east | us-east-1 | 2 | /20 (4,096 IPs) | /28 (16 IPs) | ~8,187 |
| us-west | us-west-2 | 2 | /21 (2,048 IPs) | /28 (16 IPs) | ~4,091 |
| europe | eu-west-1 | 2 | /21 (2,048 IPs) | /28 (16 IPs) | ~4,091 |
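To sanity-check a multi-region plan, you can compute each runner's capacity and compare it against your estimate of total environments for that region. In this sketch, `RunnerPlan` and `undersized` are hypothetical names, and the per-runner estimates are invented example numbers, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerPlan:
    """One runner is deployed into one AWS region (and one AWS account)."""
    name: str
    region: str
    azs: int
    ec2_prefix: int  # EC2 subnet prefix length per AZ, e.g. 20 for a /20

    def capacity(self, management_ips: int = 5) -> int:
        # (Subnet IPs per AZ x Number of AZs) - ~5 management IPs
        return 2 ** (32 - self.ec2_prefix) * self.azs - management_ips


def undersized(plans: list[RunnerPlan], expected: dict[str, int]) -> list[str]:
    """Names of runners whose capacity is below the expected total environments."""
    return [p.name for p in plans if p.capacity() < expected.get(p.name, 0)]


plans = [
    RunnerPlan("us-east", "us-east-1", azs=2, ec2_prefix=20),
    RunnerPlan("us-west", "us-west-2", azs=2, ec2_prefix=21),
    RunnerPlan("europe", "eu-west-1", azs=2, ec2_prefix=21),
]
# Hypothetical estimates of total (running + stopped) environments per runner.
expected = {"us-east": 6_000, "us-west": 3_000, "europe": 4_500}

for p in plans:
    print(f"{p.name} ({p.region}): ~{p.capacity():,} environments")
print("undersized:", undersized(plans, expected))
```

With these example numbers, the europe runner's /21 subnets (~4,091 environments) fall short of the 4,500 estimate, which would point to larger subnets for that region.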
Connectivity
The runner must reach the Ona management plane and several AWS services. For small deployments, a NAT gateway provides the simplest path. See Networking for all connectivity options.
For lower latency and to avoid NAT gateway data processing charges, you can connect the runner to the management plane over PrivateLink. This is optional for small runners but recommended if your subnets already use VPC endpoints for AWS services.
Large runner
Set RunnerSize to large in the CloudFormation template.
The large configuration is designed for deployments that exceed 5,000 total environments or 300 concurrent running environments. It provisions more CPU, memory, and cache capacity for the runner control plane, and supports horizontal scaling of the runner service.
Heavy agent workloads (many concurrent AI agent sessions) increase runner control plane load independently of environment count. If you observe high CPU utilization on the runner Fargate task with fewer than 5,000 environments, switch to large.
Infrastructure provisioned
| Component | Specification |
|---|---|
| Runner Fargate task | 4 vCPU, 16 GB memory per replica (2 to 5 replicas with autoscaling) |
| Proxy Fargate task | 2 vCPU, 4 GB memory |
| ECS host instance | c6i.2xlarge (8 vCPU, 16 GB) |
| Cache (MemoryDB) | db.t4g.medium |
To enable horizontal scaling, set EnableRunnerScaling to true in the CloudFormation template. The runner service starts with 2 replicas and scales up to 5 based on CPU and memory utilization.
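If you manage the stack with the AWS SDK for Python (boto3), switching to large and enabling scaling is a stack update with new parameter values. This is a sketch under several assumptions: you supply your own stack name, `EnableRunnerScaling` accepts the string `"true"`, and the template needs the IAM capabilities listed. `build_update_parameters` and `switch_to_large` are hypothetical helpers. Parameters omitted from an update may fall back to their template defaults, so every other existing parameter is passed with `UsePreviousValue`.

```python
def build_update_parameters(existing_keys: list[str], overrides: dict[str, str]) -> list[dict]:
    """Override some parameters and keep the current value of all others."""
    params = [{"ParameterKey": key, "ParameterValue": value} for key, value in overrides.items()]
    params += [
        {"ParameterKey": key, "UsePreviousValue": True}
        for key in existing_keys
        if key not in overrides
    ]
    return params


def switch_to_large(stack_name: str) -> str:
    """Update an existing runner stack in place; returns the CloudFormation stack ID."""
    import boto3  # imported here so build_update_parameters works without boto3 installed

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    existing = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    response = cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=build_update_parameters(
            existing,
            {"RunnerSize": "large", "EnableRunnerScaling": "true"},  # assumed value format
        ),
        # Assumption: adjust to whatever capabilities your runner template requires.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]
```

Call `switch_to_large("my-runner-stack")` with your own stack name (the one shown is a placeholder), then follow the update in the CloudFormation console.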
Network layout
Use 3 or more availability zones. At this scale, spreading environments across more AZs is important for two reasons:
- Instance capacity. When many environments start at the same time, EC2 RunInstances calls concentrate per AZ. More AZs reduce the chance of hitting InsufficientInstanceCapacity in any single zone.
- Fault tolerance. Losing one AZ still leaves two or more zones operational, which matters when hundreds of environments are running.
For EC2 subnets, use a CGNAT range (100.64.0.0/10). EC2 subnets do not need to be routable because environments connect outbound through NAT or proxy. CGNAT provides a large address space without consuming your organization’s routable IP allocation.
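Python's standard `ipaddress` module is enough to carve per-AZ EC2 subnets out of the CGNAT range. This sketch simply takes the first /16 blocks in order; `cgnat_subnets_per_az` is a hypothetical helper, and which blocks you actually use, and how they map to AZs, is a choice for your network team.

```python
import ipaddress
from itertools import islice

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


def cgnat_subnets_per_az(num_azs: int, prefix: int = 16) -> list[ipaddress.IPv4Network]:
    """Return the first `num_azs` subnets of size /prefix inside the CGNAT range."""
    if prefix < CGNAT.prefixlen or prefix > 32:
        raise ValueError(f"prefix must be between /{CGNAT.prefixlen} and /32")
    chosen = list(islice(CGNAT.subnets(new_prefix=prefix), num_azs))
    if len(chosen) < num_azs:
        raise ValueError("not enough subnets of that size in 100.64.0.0/10")
    return chosen


if __name__ == "__main__":
    for az_index, subnet in enumerate(cgnat_subnets_per_az(3)):
        print(f"AZ {az_index + 1}: {subnet} ({subnet.num_addresses:,} addresses)")
```

Keep in mind that an AWS VPC CIDR block can be at most /16, so a /16 subnet per AZ means associating a separate /16 CIDR block with the VPC for each AZ.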
| Runner Name | Region | AZs | EC2 Subnet | LB Subnet | Environment Capacity |
|---|---|---|---|---|---|
| production | eu-central-1 | 3 | /16 (65,536 IPs) per AZ using CGNAT | /28 (16 IPs) | ~196,603 |
A /16 per AZ is generous. Size the subnets for your expected peak total of environments (running and stopped), with room for growth. Err on the large side: CGNAT ranges are free to use, and expanding subnets after deployment is complex.
Management plane connection
For large runners, connect to the Ona management plane over PrivateLink instead of routing through the public internet.
Without PrivateLink, all runner-to-management-plane traffic traverses a NAT gateway. At high environment counts, this adds latency and incurs NAT gateway data processing charges (for example, $0.045/GB in us-east-1; rates vary by region). PrivateLink keeps this traffic within the AWS network.
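To put the NAT gateway charge in context, multiply the monthly volume of runner-to-management-plane traffic by the data processing rate. This sketch defaults to the $0.045/GB figure quoted above; NAT pricing varies by region, and the estimate ignores the NAT gateway hourly charge, data transfer charges, and the cost of the PrivateLink endpoint itself. The 2,000 GB volume is an invented example.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # rate quoted in this guide; check your region's pricing


def monthly_nat_processing_cost(gb_per_month: float, usd_per_gb: float = NAT_PROCESSING_USD_PER_GB) -> float:
    """NAT gateway data processing charge only (no hourly or data transfer charges)."""
    if gb_per_month < 0:
        raise ValueError("gb_per_month must be non-negative")
    return gb_per_month * usd_per_gb


if __name__ == "__main__":
    example_gb = 2_000  # hypothetical monthly traffic volume
    print(f"~${monthly_nat_processing_cost(example_gb):,.2f} per month in NAT data processing")
```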
If you use app.gitpod.io: Create a VPC endpoint to the Ona management plane service. See Networking: VPC endpoints for setup instructions. When private DNS is enabled on the endpoint, app.gitpod.io resolves to private IPs inside your VPC and the runner connects directly over PrivateLink.
If you use a custom domain: The custom domain page describes how to set up a VPC endpoint and load balancer for access to the management plane through your domain. The custom-domain path already goes through your Network Load Balancer. To also keep the runner’s API traffic private, use split-horizon DNS so the runner VPC resolves your custom domain to the private load-balancer endpoint instead of the public DNS record or external proxy path. See Route runner API traffic privately for the supported DNS patterns.
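Whichever path you use, you can confirm from a host inside the runner VPC that the management plane hostname resolves to private addresses. This sketch uses only the Python standard library. It treats RFC 1918 addresses and the CGNAT range as VPC-internal, which is an assumption about how your VPC is addressed; `is_vpc_internal` and `resolves_privately` are hypothetical helpers.

```python
import ipaddress
import socket

CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_vpc_internal(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    # RFC 1918 ranges report is_private; the CGNAT range does not, so check it explicitly.
    return ip.is_private or (ip.version == 4 and ip in CGNAT)


def resolve_ipv4(hostname: str) -> list[str]:
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def resolves_privately(hostname: str) -> bool:
    """True only if every IPv4 address the name resolves to is VPC-internal."""
    addresses = resolve_ipv4(hostname)
    return bool(addresses) and all(is_vpc_internal(a) for a in addresses)
```

Run `print(resolves_privately("app.gitpod.io"))`, or pass your custom domain, from inside the runner VPC. `False` means the name still resolves to public addresses, so check private DNS on the endpoint or your split-horizon DNS configuration.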
Managed metrics
Enable Ona managed metrics on large runners. Managed metrics give the Ona team visibility into runner health, which enables proactive detection of resource exhaustion, elevated error rates, and degraded performance. Without metrics, Ona cannot identify issues until you report them.
Reserved capacity
At this scale, consider EC2 Capacity Reservations or Savings Plans for your most-used environment instance types. Reservations guarantee instance availability in your AZs and reduce costs compared to on-demand pricing.
Planning steps
1. Select regions
Choose AWS regions with optimal latency for your users. Plan subnet sizes for each region before deploying.
2. Estimate environments per region
For each region, estimate the maximum number of environments including:
- Current users and expected growth
- Peak concurrent usage patterns
- Agent and automation workloads (each agent session runs in its own environment)
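The factors above can be combined into a rough per-region estimate. The model below is a hypothetical sketch, not an Ona formula: every field is an input you estimate yourself, and the example values are invented. It tracks total environments (which drive subnet size, because stopped environments keep their IP) separately from peak running environments (which, together with the total, drive runner size).

```python
import math
from dataclasses import dataclass


@dataclass
class RegionEstimate:
    users: int                    # current users in this region
    growth_factor: float          # e.g. 1.5 for 50% expected growth
    environments_per_user: float  # environments kept per user (running + stopped)
    peak_active_fraction: float   # share of users with a running environment at peak
    agent_environments: int       # environments created by agents and automation
    agent_peak_concurrent: int    # agent sessions running at the same time at peak

    def total_environments(self) -> int:
        human = self.users * self.growth_factor * self.environments_per_user
        return math.ceil(human + self.agent_environments)

    def peak_running(self) -> int:
        human = self.users * self.growth_factor * self.peak_active_fraction
        return math.ceil(human + self.agent_peak_concurrent)


example = RegionEstimate(
    users=800, growth_factor=1.5, environments_per_user=3.0,
    peak_active_fraction=0.25, agent_environments=400, agent_peak_concurrent=50,
)
print(example.total_environments(), example.peak_running())
```

With these example inputs, the region needs room for about 4,000 total environments, which fits small, but the 350 peak running environments exceed the 300 limit for small, so this region would call for a large runner.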
3. Choose availability zones
| Deployment size | Recommended AZs |
|---|---|
| Small (up to 5,000 environments) | 2 |
| Large (5,000+ environments) | 3 or more |
One EC2 subnet and one load balancer subnet are required per AZ.
4. Plan subnet sizes
EC2 subnets
Each environment uses one IP address. Stopped environments are stopped EC2 instances and retain their IP address, so plan for the total number of environments (running and stopped), not only the concurrent running count.
| Consideration | Details |
|---|---|
| IP per environment | 1 (retained while stopped) |
| Management overhead | ~5 IPs |
| Minimum size | /28 (10 environments) |
| Capacity formula | (Subnet IPs per AZ x Number of AZs) - ~5 management IPs |
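Working the capacity formula backwards gives the smallest EC2 subnet per AZ for a target environment count. This sketch assumes the ~5 management IPs from the formula and limits results to /28 through /16, the subnet sizes AWS allows. The `headroom` multiplier is a hypothetical convenience for planning generously, and `smallest_prefix_for` is an illustrative name.

```python
import math


def smallest_prefix_for(total_environments: int, num_azs: int,
                        headroom: float = 1.0, management_ips: int = 5) -> int:
    """Return the largest prefix length (smallest subnet per AZ) that fits the target.

    Searches from /28 (the minimum size) up to /16, the largest AWS subnet.
    """
    target = math.ceil(total_environments * headroom)
    for prefix in range(28, 15, -1):  # /28, /27, ..., /16
        capacity = 2 ** (32 - prefix) * num_azs - management_ips
        if capacity >= target:
            return prefix
    raise ValueError("target exceeds /16 per AZ; add availability zones")


if __name__ == "__main__":
    print(smallest_prefix_for(4_000, 2))                # 21
    print(smallest_prefix_for(4_000, 2, headroom=1.5))  # 20
```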
EC2 subnets can use non-routable CIDR ranges. For large deployments, use CGNAT (100.64.0.0/10) to avoid IP exhaustion. Plan generously because expanding subnets after deployment is complex.
For public subnets, enable auto-assign public IP.
Load balancer subnets
For the Network Load Balancer:
- Must be routable from your internal network
- /28 (16 IPs) is sufficient for all deployment sizes
- One subnet per AZ
- Does not affect environment capacity
Next steps