How to manage the connections between your VPC and on-premises networks over a private connection using AWS Direct Connect and Transit Gateway

Pierre RAFFA
9 min read · Mar 30, 2024

Direct Connect is a method of connecting to cloud services such as Amazon Web Services (AWS) directly, bypassing the public internet.
It provides a dedicated network connection between the customer’s network and the cloud service provider’s data center, offering increased security, reliability, and potentially improved performance compared to accessing the services over the internet.

Direct Connect (DX) implies a connection in which one party initiates traffic directly towards another. Which party initiates the connection can vary depending on the specific technology or service involved, or on an agreement with the other party, and this choice results in a different architecture for routing the traffic.

In our case, we initiate the connection from AWS towards the other party’s data center. If the other parties were the ones initiating the connection, the architecture would also require load balancers to distribute the traffic across multiple instances in order to provide high availability.

High-level overview of how AWS Direct Connect interfaces with on-premises facilities (source: AWS)

Direct Connect Locations

From the AWS documentation, to use AWS Direct Connect at an AWS Direct Connect location, your on-premises network must meet one of the following conditions:

  • Your network is colocated with an existing AWS Direct Connect location. For more information about available AWS Direct Connect locations, see AWS Direct Connect Product Details.
  • You are working with an AWS Direct Connect partner who is a member of the AWS Partner Network (APN). For information, see APN Partners Supporting AWS Direct Connect.
  • You are working with an independent service provider to connect to AWS Direct Connect.

When using DX to connect your on-premises locations to AWS, BGP is a requirement.

What is Border Gateway Protocol (BGP)?

It’s a routing protocol used to exchange routing information between different autonomous systems (ASes) on the internet, each identified by a unique AS number.
AS numbers, or ASNs, are unique 16-bit numbers between 1 and 65534, or 32-bit numbers between 131072 and 4294967294.
On AWS, a private ASN must be in the 64512 to 65534 range.
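
As a quick illustration of these ranges, here is a small Python helper (purely illustrative, not an AWS API call) that checks whether a candidate ASN falls into the private range accepted by AWS:

```python
# Illustrative helper: check whether an ASN can be used as a private ASN on AWS.
# The range below follows the figures quoted above (64512-65534); always confirm
# against the AWS documentation for the service you are configuring.
AWS_PRIVATE_ASN_RANGE = range(64512, 65535)  # upper bound exclusive -> 65534

def is_aws_private_asn(asn: int) -> bool:
    return asn in AWS_PRIVATE_ASN_RANGE

print(is_aws_private_asn(64620))  # True
print(is_aws_private_asn(65535))  # False: 65535 is reserved
```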

Border Gateway Protocol (BGP) selects a path based on several criteria, primarily aiming to find the “best” path to reach a particular destination.

BGP is the protocol used by internet service providers (ISPs) and large networks to route traffic between each other.

Example of BGP Visualization

Border Gateway Protocol (BGP) works using a mechanism called peering. Establishing connectivity between AWS and on-premises facilities is achievable with BGP peering.

BGP Peering

In AWS, BGP peering consists of the exchange of routing information between customer networks and AWS services such as Direct Connect (or VPN connections).

BGP peering parameters in AWS typically include:

  • BGP Autonomous System Number (ASN)
  • BGP neighbour (peer) IP addresses for the peering session, plus the IP ranges to advertise
  • BGP MD5 authentication key (AWS generates one if you don’t provide your own)

These parameters are configured to establish a secure and reliable BGP session, enabling efficient routing and communication between the customer’s network and AWS resources.
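
As a rough illustration of where these parameters end up, here is a minimal boto3 sketch of creating a private virtual interface; the connection ID, VLAN, ASN, auth key, and peer addresses are hypothetical placeholders, not values from this article’s setup.

```python
import boto3

dx = boto3.client("directconnect", region_name="us-east-1")

# The BGP peering parameters map onto the fields of a virtual interface.
# All IDs and values below are hypothetical.
response = dx.create_private_virtual_interface(
    connectionId="dxcon-EXAMPLE",               # your Direct Connect connection
    newPrivateVirtualInterface={
        "virtualInterfaceName": "private-vif-partner1",
        "vlan": 101,                             # agreed with the partner/provider
        "asn": 65010,                            # customer-side (on-premises) ASN
        "authKey": "my-bgp-md5-key",             # optional: AWS generates one if omitted
        "amazonAddress": "169.254.10.1/30",      # BGP neighbour IPs (peering /30)
        "customerAddress": "169.254.10.2/30",
        "virtualGatewayId": "vgw-EXAMPLE",       # or directConnectGatewayId=...
    },
)
print(response["virtualInterfaceState"])
```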

AWS Direct Connect connections

From the AWS documentation, AWS Direct Connect enables you to establish a dedicated network connection between your network and one of the AWS Direct Connect locations.

There are two types of connections:

  • Dedicated Connection: A physical Ethernet connection associated with a single customer. Customers can request a dedicated connection through the AWS Direct Connect console, the CLI, or the API. For more information, see Dedicated connections.
    The port speed values are 1 Gbps, 10 Gbps, and 100 Gbps.
  • Hosted Connection: A physical Ethernet connection that an AWS Direct Connect Partner provisions on behalf of a customer. Customers request a hosted connection by contacting a partner in the AWS Direct Connect Partner Program, who provisions the connection. For more information, see Hosted connections.
    The port speed values are 50 Mbps, 100 Mbps, 200 Mbps, 300 Mbps, 400 Mbps, 500 Mbps, 1 Gbps, 2 Gbps, 5 Gbps, and 10 Gbps.
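
For completeness, here is a hedged boto3 sketch of requesting a dedicated connection and listing existing ones; the location code and connection name are assumptions for illustration (a hosted connection would instead be provisioned for you by the partner).

```python
import boto3

dx = boto3.client("directconnect", region_name="us-east-1")

# Request a dedicated connection (location code and name are hypothetical).
connection = dx.create_connection(
    location="EqDC2",                      # a Direct Connect location code
    bandwidth="10Gbps",                    # dedicated: 1Gbps, 10Gbps or 100Gbps
    connectionName="my-dedicated-connection",
)

# List existing connections and their state (requested, pending, available, ...).
for conn in dx.describe_connections()["connections"]:
    print(conn["connectionId"], conn["connectionName"], conn["connectionState"])
```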

Infrastructure Architecture

Our current architecture consists of an EKS cluster with nodes/pods distributed across multiple Availability Zones, provisioned in us-east-1.
The VPC CIDR block, from which the subnets are carved, is 10.59.0.0/16.
Each pod will be able to initiate the connection with the other party via Direct Connect.
ℹ️ The connection needs to be entirely private from end to end.

The other parties may come with some constraints:

  • The private AS number we pick might already be in use on their side.
  • The IPs we advertise through BGP must not already be in use, to avoid routing issues.

The EC2 instances provisioned for the EKS cluster are the source when routing traffic to Direct Connect, and their IPs are in the range 10.59.0.0/16. That is therefore the range we would need to advertise from AWS.

⚠️ But this range can overlap with an existing range in the on-premises parties’ infrastructure. In that case, they might suggest another range.
And obviously, we won’t re-address our whole infrastructure to match their IP range, especially when each on-premises party has its own constraints.

To overcome this constraint, we perform an IP translation by provisioning intermediate private NAT gateways in subnets taken from the suggested range.
The NAT gateways then become the new source when routing the traffic to Direct Connect, and theirs is the range we need to advertise from AWS!
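
Here is a minimal boto3 sketch of that idea, assuming a hypothetical subnet carved out of the agreed range; the private NAT gateway’s IP becomes the source address seen over Direct Connect.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision a *private* NAT gateway in a subnet belonging to the agreed range
# (the subnet ID is hypothetical).
nat = ec2.create_nat_gateway(
    SubnetId="subnet-nat-az1-EXAMPLE",   # subnet inside e.g. 100.100.0.0/24
    ConnectivityType="private",          # no Elastic IP, no internet egress
)["NatGateway"]

# Wait until the NAT gateway is available before pointing routes at it.
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat["NatGatewayId"]])
print(nat["NatGatewayId"])
```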

⚠️ The new range MUST be added to the VPC as an additional VPC CIDR block, but bear in mind that AWS places some restrictions on additional VPC CIDR blocks:
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-cidr-blocks.html#add-cidr-block-restrictions

In our case, the VPC CIDR block is within the range 10.0.0.0/8 and these permitted/restricted associations will apply:

Permitted/restricted associations when the VPC CIDR block is within 10.0.0.0/8

Considering the AWS restrictions above, we finally agreed with one party to advertise 100.100.0.0/24, and with the other party 10.248.136.138/26.
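
Associating such a range as a secondary CIDR is a one-liner; here is a sketch with a hypothetical VPC ID and the 100.100.0.0/24 range mentioned above:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Add the agreed range as an additional (secondary) CIDR block on the VPC.
assoc = ec2.associate_vpc_cidr_block(
    VpcId="vpc-EXAMPLE",          # hypothetical VPC ID
    CidrBlock="100.100.0.0/24",   # range agreed with the first party
)
print(assoc["CidrBlockAssociation"]["CidrBlockState"]["State"])  # e.g. "associating"
```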

1./ Solution using a Virtual Private Gateway (VGW) and Private VIF:

Explanation about the routing:

  1. The pods on the EC2 instances initiate the connection by making a request to one of the IPs advertised by the other parties (ip1, ip2, etc. from the diagram above).
  2. The route table associated with the private app subnets (blue in the architecture design) is set up to route the traffic to the NAT gateway sitting in the same AZ, either nat-1 or nat-2 depending on the IP used for the request.
  3. The route table associated with the NAT subnets (purple in the architecture design) is set up to route the traffic to the VGW.
  4. The VGW forwards the traffic to the Direct Connect gateway (DXGW), which advertises 100.100.0.0/24 and 10.248.136.138/26.
  5. The DXGW routes the traffic to the correct private VIF depending on the IP used for the request (a minimal provisioning sketch follows this list).
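
Here is a minimal boto3 sketch of provisioning this first solution; all IDs and ASNs are hypothetical placeholders, not the real values of this setup.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
dx = boto3.client("directconnect", region_name="us-east-1")

# 1. Create a virtual private gateway and attach it to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1", AmazonSideAsn=64512)["VpnGateway"]
ec2.attach_vpn_gateway(VpcId="vpc-EXAMPLE", VpnGatewayId=vgw["VpnGatewayId"])

# 2. Create the Direct Connect gateway and associate the VGW with it.
dxgw = dx.create_direct_connect_gateway(
    directConnectGatewayName="dxgw-partner1",
    amazonSideAsn=64512,
)["directConnectGateway"]
dx.create_direct_connect_gateway_association(
    directConnectGatewayId=dxgw["directConnectGatewayId"],
    gatewayId=vgw["VpnGatewayId"],
)

# 3. The private VIF (see the earlier sketch) is then created with
#    directConnectGatewayId=dxgw["directConnectGatewayId"] so it terminates on the DXGW.
```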

⚠️ This solution works pretty well but comes with some limitations:

  • Only one VGW can be attached to a VPC.
  • A VGW can be associated with only one Direct Connect gateway (DXGW).
  • With a VGW, we can only advertise VPC CIDRs.
  • Since only one DXGW is possible when using a VGW, only one ASN is possible (a specific ASN can be a requirement from some parties).

2./ Solution using a Transit Gateway (TGW) and Transit VIF:

A transit gateway is a network transit hub in cloud computing that enables centralized management and routing of traffic between virtual private clouds (VPCs), VPNs, and on-premises networks within a cloud provider’s environment.

It simplifies network architecture by acting as a hub for connecting multiple VPCs and VPNs, allowing for scalable and efficient communication between resources.

In this architecture, the TGW will act as a hub connecting the VPC and the two Direct Connect gateways.
This solution provides more flexibility and is more suitable as a long-term solution, as we can finally have multiple Direct Connect gateways and multiple AS numbers 🎉

The Transit Gateway would be nothing without its route tables, so let’s talk about how to configure them to make the routing decisions.
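
As a concrete starting point, here is a boto3 sketch (hypothetical IDs and ASN) that creates the TGW with default route table association/propagation disabled, since we manage the route tables explicitly, and attaches the VPC through the NAT subnets.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the TGW as a hub, managing its route tables explicitly.
tgw = ec2.create_transit_gateway(
    Description="hub for partner connectivity over Direct Connect",
    Options={
        "AmazonSideAsn": 64512,                     # hypothetical Amazon-side ASN
        "DefaultRouteTableAssociation": "disable",
        "DefaultRouteTablePropagation": "disable",
    },
)["TransitGateway"]

# Attach the VPC via the NAT subnets (hypothetical IDs).
vpc_attachment = ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-EXAMPLE",
    SubnetIds=["subnet-nat-az1-EXAMPLE", "subnet-nat-az2-EXAMPLE"],
)["TransitGatewayVpcAttachment"]
print(vpc_attachment["TransitGatewayAttachmentId"])
```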

In the architecture above, there are 3 Transit Gateway route tables (magenta in the architecture design):

  1. Route table associated with the VPC attachment, with DXGW (partner1 and partner2) propagations.
    => the traffic coming from the VPC will be routed to the DXGWs
  2. Route table associated with the DXGW attachment (partner1), with a VPC propagation.
    => the response coming from the DXGW (partner1) will be routed back to the VPC
  3. Route table associated with the DXGW attachment (partner2), with a VPC propagation.
    => the response coming from the DXGW (partner2) will be routed back to the VPC

Note that the TGW does not use any default route table.
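
One related point: each Direct Connect gateway has to be associated with the TGW, and the allowed prefixes on that association are the ranges advertised to the on-premises party. Here is a hedged sketch with hypothetical IDs, reusing the 10.248.129.64/26 range mentioned later for partner2:

```python
import boto3

dx = boto3.client("directconnect", region_name="us-east-1")

# Associate a Direct Connect gateway with the TGW; the allowed prefixes are
# what gets advertised over BGP to the on-premises party (IDs are hypothetical).
dx.create_direct_connect_gateway_association(
    directConnectGatewayId="dxgw-partner2-EXAMPLE",
    gatewayId="tgw-EXAMPLE",
    addAllowedPrefixesToDirectConnectGateway=[
        {"cidr": "10.248.129.64/26"},   # additional VPC CIDR advertised for partner2
    ],
)
```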

And now the propagations, because that’s a very important point here!

When dynamic routing is used with a VPN attachment or a Direct Connect gateway attachment, the routes learned from the on-premises router through BGP can be propagated to any of the transit gateway route tables (a VPC attachment propagates its CIDR blocks instead). This is really useful, as the TGW route tables are updated automatically whenever new IPs are advertised!

In other words, the 3 TGW route tables you see in the infrastructure above are populated automatically:

  • by the VPC propagation, with all of the VPC’s CIDR blocks
  • by the Direct Connect gateway propagations, with the IP ranges advertised by the on-premises parties.

That’s great, but how do we set up the TGW route tables so that the routes are automatically propagated?

Here is an example with the TGW route table associated with the VPC attachment, in order to get the Direct Connect routes propagated automatically.

1./ Create a new TGW Route table attached to the TGW

2./ Associate the route table with the VPC attachment:

TGW Route table associated to the VPC to route the traffic to the Direct Connect Gateway (partner2)

3./ Create a propagation by selecting the Direct Connect Gateway attachment

Same TGW Route table with Direct Connect Gateway Propagation

4./ Take a look at the routes automatically propagated (as soon as the transit VIF is up)

Same TGW Route table with the route propagated automatically (ip range advertised by the on-premises party)

The same applies to the TGW route table associated with the Direct Connect gateway attachment and its VPC propagation.
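
The same steps can be scripted. Here is a boto3 sketch equivalent to the console walkthrough above, with hypothetical attachment IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Create a TGW route table.
rtb = ec2.create_transit_gateway_route_table(
    TransitGatewayId="tgw-EXAMPLE",
)["TransitGatewayRouteTable"]
rtb_id = rtb["TransitGatewayRouteTableId"]

# 2. Associate it with the VPC attachment.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=rtb_id,
    TransitGatewayAttachmentId="tgw-attach-vpc-EXAMPLE",
)

# 3. Enable propagation from the Direct Connect gateway attachment.
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId=rtb_id,
    TransitGatewayAttachmentId="tgw-attach-dxgw-partner2-EXAMPLE",
)

# 4. Inspect the routes propagated once the transit VIF is up.
routes = ec2.search_transit_gateway_routes(
    TransitGatewayRouteTableId=rtb_id,
    Filters=[{"Name": "type", "Values": ["propagated"]}],
)["Routes"]
for route in routes:
    print(route["DestinationCidrBlock"], route["Type"], route["State"])
```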

More info about the Transit Gateway and its Route Tables: https://docs.aws.amazon.com/vpc/latest/tgw/how-transit-gateways-work.html

Explanation about the routing:

  1. The pods on the EC2 instances initiate the connection by making a request to one of the IPs advertised by the other parties (ip1, ip2, etc. from the diagram above). 10.248.129.64/26 is the additional VPC CIDR block we advertise for partner2.
  2. The route table associated with the private app subnets (blue) is set up to route the traffic to the NAT gateway sitting in the same AZ, either nat-1 or nat-2 depending on the IP used for the request.
  3. The route table associated with the NAT subnets (purple) is set up to route the traffic to the TGW (see the route-entry sketch after this list).
  4. As per the TGW route table associated with the VPC, the TGW forwards the traffic to the correct Direct Connect gateway (DXGW), and the traffic is finally routed to the on-premises party via the transit VIF.
  5. The on-premises party processes the request and returns the response via the transit VIF over the DXGW.
  6. The DXGW forwards the traffic back to the TGW.
  7. As per the TGW route table associated with the DXGW, the TGW forwards the response back to the VPC, and more precisely to the NAT gateway.
  8. The NAT gateway forwards the response back to the EC2 instance where the pod that initiated the connection sits.
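
To tie steps 2 and 3 back to concrete configuration, here is a sketch of the two VPC route entries involved; the IDs are hypothetical and 192.0.2.0/24 stands in for one of the partner-advertised IPs.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# App subnets (blue): send partner-bound traffic to the private NAT gateway.
ec2.create_route(
    RouteTableId="rtb-private-app-az1-EXAMPLE",
    DestinationCidrBlock="192.0.2.0/24",   # placeholder for a partner-advertised IP range
    NatGatewayId="nat-az1-EXAMPLE",
)

# NAT subnets (purple): send the translated traffic to the transit gateway.
ec2.create_route(
    RouteTableId="rtb-nat-az1-EXAMPLE",
    DestinationCidrBlock="192.0.2.0/24",
    TransitGatewayId="tgw-EXAMPLE",
)
```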

I hope this will help you build private connectivity between your infrastructure in AWS and on-premises facilities.
Next time, I will cover Direct Connect for high resiliency, which is highly recommended for critical workloads.
