In our last post we pulled back the curtain on some of the sneakiest hidden costs AWS has to offer. This time we’re doing the same for data ingress.
There’s no way to succeed in a vacuum in today’s tech climate. Whether you’re cornering a niche or trying for a larger slice of the market, you need to be constantly importing, analyzing, and acting on data in order to get the edge on your competitors and provide the best service you can.
But what happens when a seemingly random section of the data you import into your network incurs extra charges that you hadn’t budgeted for?
Don’t get caught in that situation - let us teach you all about data ingress to give you the tools you need to succeed with AWS. In this post we’ll cover:
- What is data ingress?
- Data ingress vs data egress
- How much internal AWS data ingress costs
- How to reduce data ingress charges
- How to manage data ingress
Let’s get started.
What is data ingress?
Data ingress is the name for when data enters a network from another network. It doesn’t matter what kind of information is entering the network - the act of it entering at all is data ingress.
It’s the opposite of data egress, which is when data leaves a network for another location. We’ll cover this more in the next section of this post.
As with data egress, it doesn’t matter what kind of network the data is entering. Whether you’re copying data onto a USB stick, receiving a regular email from a mailing list, downloading a file from a client’s Google Drive, or importing the latest month’s marketing data into your data warehouse, all of this and more is still classed as data ingress.
Got it? Great, because it’s about to get a lot more confusing.
Data ingress vs data egress
The difference between data ingress and data egress should be self-explanatory, right? Ingress equals data going in, egress is data going out. We get it.
Except data ingress means that wherever you’re getting the data from is also experiencing data egress. You can’t have one without the other, so you need to understand both.
Think of it this way: imagine that the two networks passing data between them are two separate bubbles. When data travels from one to the other, a line goes out of one bubble and into the other. Data egress is when that line leaves the soapy barrier of the first bubble, and data ingress is when the line passes through the border of the second bubble.
They’re both part of the same process.
Let’s run through a few examples to show what we mean.
First off, let’s say that you have a SaaS app which runs on EC2 instances and utilizes S3 for its storage needs. To keep this simple, we’ll also say that all of these are within the same AWS Region and Availability Zone, and that they’re also within the same AWS account.
With this setup you’re not creating any artificial data boundaries within your SaaS app - it’s an entirely closed network, so data ingress (and egress) don’t occur on your end.
The same can’t be said for when your clients use your app.
In order for your app to be used by clients, information needs to be sent from your closed network to the wider internet, as they need to access images, information, and so on. In other words, your client experiences data ingress due to importing all of this from your network. You thus experience data egress.
It’s a two-way street - you can’t have data ingress without data egress.
Now imagine that your app has been successful and you’ve had to get a more complicated setup in order to keep growing and meet demand.
You’re still running your app via EC2 instances and S3 storage buckets, but you’ve expanded your operation a little. You’ve set up virtual private clouds (VPCs) to meet growing client security requirements, and you’ve distributed your workloads across several Availability Zones as extra insurance against disasters.
Let’s also say that you’ve boosted your global response times by replicating your clusters over several AWS Regions, and that you can easily analyze all of your costs because you’ve split individual features into their own AWS accounts. You’re cruising as far as customer service, app responsiveness, and added security go.
However, your costs have gone through the roof due to data transfer.
First off, whenever your data crosses between Availability Zones you incur a data ingress charge (one of the few times that ingress is charged in AWS). Every improvement you’ve made also creates a new network boundary within your overall AWS configuration. This means not only that data ingress is happening to import information across these boundaries, but also that data egress and its associated charges are being incurred.
We’ll do one more example, and this time we’ll make it a little more complicated to really drive the idea home.
Take the same SaaS app from the previous example. You’ve expanded your operations for security and availability and you’re consistently paying the same price for your data ingress and egress charges.
Now picture that you want to add more functionality to your app, but your team is wary of changing (and potentially breaking) its core architecture. That’s why you start sending the data from all of your app’s regional clusters to a single workload in AWS Lambda for processing.
Here’s where the complexities of how data is transferred between AWS services come in.
First, your data needs to be entered into AWS Lambda from each of your EC2 VPCs. Once inside Lambda the workload executes and the results are returned to where the data originally came from.
This leaves us with two examples of data ingress. The first is when your data enters AWS Lambda to be used in the workload, and the second is when the results are returned to the source of the original data. Neither of these incurs a charge but, once again, the simultaneous data egress does.
When data ingress happens to put data into Lambda, egress is occurring to export the data from your EC2 VPCs. Egress also happens to extract the results from Lambda before the ingress of those results returning to the data sources.
You can’t get away from it - any discussion of data ingress must include consideration of the necessary data egress for it to occur. Neither can exist without the other.
How much internal AWS data ingress costs
The first thing to bear in mind when it comes to data ingress charges in AWS is that there is only one situation where data ingress itself incurs a cost. Whenever you transfer data across Availability Zones or VPC peering connections, you’ll incur a $0.01 per GB ingress cost and an identical egress cost.
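To make that concrete, here is a minimal Python sketch of the arithmetic. It assumes the $0.01 per GB figure quoted above applies once on the way out and once on the way in; the function and constant names are our own, and you should check current AWS pricing before relying on the rate.

```python
# Assumed rate, taken from the figure quoted in this post (USD per GB).
# It is charged once for egress and once for ingress on the same transfer.
CROSS_AZ_RATE_PER_GB = 0.01


def cross_az_transfer_cost(gb_transferred: float) -> float:
    """Estimate the total charge for moving data between two Availability Zones."""
    if gb_transferred < 0:
        raise ValueError("gb_transferred must be non-negative")
    egress_cost = gb_transferred * CROSS_AZ_RATE_PER_GB
    ingress_cost = gb_transferred * CROSS_AZ_RATE_PER_GB
    return egress_cost + ingress_cost


if __name__ == "__main__":
    # 500 GB crossing an AZ boundary is billed on both sides of the transfer.
    print(f"500 GB across AZs: ${cross_az_transfer_cost(500):.2f}")
```

In other words, the effective price of a cross-AZ transfer under these assumptions is double the headline rate, because both directions of the same movement are billed.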
However, as we’ve already stated, it would be foolish to therefore think that those are the only costs associated with data ingress.
Remember that data ingress can’t occur without data egress of some kind. This is where the hidden costs really start to rack up, as while one side (ingress) isn’t charged, the other very often is.
For a full rundown of the data egress prices your data ingress will also incur, check out our post on the topic.
To summarize, the first 100 GB of data per month transferred out from EC2 instances to the public internet are free, the next 150 TB per month cost between $0.0900 per GB and $0.0700 per GB (reducing over three pricing bands as you transfer more), and anything beyond that costs $0.0500 per GB.
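If you want to estimate this yourself, the banded pricing can be sketched in a few lines of Python. The free allowance and the $0.09, $0.07, and $0.05 rates come from the summary above. The band sizes (10 TB, 40 TB, 100 TB), the middle $0.085 rate, the 1 TB = 1024 GB conversion, and the choice to start the paid bands after the free allowance are all assumptions on our part, so treat the output as a rough guide rather than a quote.

```python
GB_PER_TB = 1024  # assumption: binary conversion between TB and GB

# Each entry is (band size in GB, USD per GB), applied in order.
# Band sizes and the 0.085 rate are assumptions; verify against current pricing.
EGRESS_BANDS = [
    (100, 0.0),                # free monthly allowance
    (10 * GB_PER_TB, 0.09),    # first paid band
    (40 * GB_PER_TB, 0.085),   # second paid band (assumed rate)
    (100 * GB_PER_TB, 0.07),   # third paid band
    (float("inf"), 0.05),      # everything beyond the bands above
]


def internet_egress_cost(gb_out: float) -> float:
    """Estimate the monthly cost of sending gb_out GB from EC2 to the internet."""
    if gb_out < 0:
        raise ValueError("gb_out must be non-negative")
    remaining = gb_out
    total = 0.0
    for band_gb, rate in EGRESS_BANDS:
        in_band = min(remaining, band_gb)  # portion of traffic billed at this rate
        total += in_band * rate
        remaining -= in_band
        if remaining <= 0:
            break
    return total


if __name__ == "__main__":
    for gb in (50, 200, 5 * GB_PER_TB):
        print(f"{gb} GB out: ${internet_egress_cost(gb):.2f}")
```

Keeping the bands in a single list means you only have to edit one place when the rates for your region differ.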
There are a few extra charges for things like data egress from S3 Multi-Region Access Points, internet acceleration for those points, and S3 Transfer Acceleration. There is also a $0.02 per GB charge for egress from EC2 instances to AWS GovCloud (US).
For example, let’s say that you’re only interested in data ingress into your AWS accounts, so there will naturally be no egress charges to get data out onto the public internet. This means that you’ll only have to worry about the ingress and egress charges for moving data between Availability Zones and VPCs, and any egress involving those extra S3-related charges above or transferring data from EC2 instances to AWS GovCloud (US).
Finally, remember that data ingress doesn’t just mean that you could incur extra egress charges depending on where the data comes from. It also means that the cost for your AWS accounts may go up in general.
Think about it: if you’re importing data into your accounts without deleting other data, there’s more data being held or processed by every service you use. Depending on each service’s pricing model, you could end up with a higher bill simply because you’re using more of whatever that service charges for.
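Here is a rough Python sketch of that effect for a storage service billed per GB per month. The function name is hypothetical and the rate you pass in is a placeholder, not a quoted AWS price. The only point is that a steady stream of ingress with no deletion produces a bill that grows every month.

```python
def cumulative_storage_charges(
    monthly_ingest_gb: float,
    months: int,
    rate_per_gb_month: float,
    starting_gb: float = 0.0,
) -> list[float]:
    """Return the storage charge for each month when ingested data is never deleted."""
    stored_gb = starting_gb
    charges = []
    for _ in range(months):
        stored_gb += monthly_ingest_gb               # new data arrives, nothing is purged
        charges.append(stored_gb * rate_per_gb_month)
    return charges


if __name__ == "__main__":
    # Placeholder rate of 0.02 USD per GB-month, used purely for illustration.
    for month, charge in enumerate(cumulative_storage_charges(100, 6, 0.02), start=1):
        print(f"Month {month}: ${charge:.2f}")
```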
How to reduce data ingress charges
Because data ingress charges are so closely tied to data egress charges, the tips for reducing your costs are much the same as for egress:
- Limit data transfer where possible
- Limit VPCs, AZs, and AWS Regions
- Look for the cheapest regions and/or zones
- Monitor and purge data you don’t need
Limiting data transfer will allow you to avoid data transfer costs in general, be they egress or ingress. If you don’t perform the action you won’t be charged for doing it, so this method cuts off the source of your charges at the root. While it isn’t always practical to do this, you could at least try to make sure that any data traveling across Availability Zones (AZs) is absolutely necessary.
Reducing the number of VPCs, AZs, and AWS Regions is similarly simple, but could be more difficult to do if your architecture is particularly complex. Scaling back your operations like this will incur fewer charges for data ingress and egress across networks, but you should always weigh the potential savings against the impact it will have on your availability and performance.
If you want to limit the number of networks (VPCs, etc.) you’re working with while maintaining higher performance, a compromise would be to move your operations to cheaper regions and/or zones. While Amazon charges the same amount for data ingress no matter what, data egress charges can vary massively depending on the regions you’re dealing with, so there are potential savings to be made by moving to a cheaper region.
Finally, you can monitor and purge the data which you don’t need. As stated in the previous section, data ingress can cause your AWS accounts to get bloated with new or extra information which, in turn, causes the general price of your operations to increase. Trimming things down to only contain the important or necessary data can avoid this price hike by using fewer resources.
How to manage data ingress
Data ingress is slightly easier to manage than data egress, as there are fewer charges related to it and far less of it occurring. For example, data egress happens all the time when your clients are trying to load information from your app or website, but ingress to your networks happens in much lower quantities unless you’re actively moving data around.
However, the only way to get it under control is to know exactly how much data ingress is occurring, where and when it’s happening, and how much you’re getting charged for it. That’s why you need to be effectively managing your AWS bill overall.
The AWS Billing console and AWS Cost Explorer are a great start for seeing your bill and some of the related details, but if you really want to get things under control you should also use Aimably’s AWS Spend Transparency Software. This will let you see all of the details that AWS’s native tools simply don’t let you dive into. You can also use cost allocation tags within AWS Cost Explorer to isolate your data ingress costs specifically - it’s a great way to see where your costs are coming from at a glance.
However, in trying to manage your data ingress more thoughtfully and effectively, don’t forget that you’re not just trying to reduce costs. It’s vital that any cost-cutting measures also take into account the impact they will have on your app and services, not just for clients but for your own team as well.
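If you’d rather pull these numbers programmatically, the Cost Explorer API can be queried with boto3. The sketch below groups costs by usage type instead of by cost allocation tag, which is a different approach from the tagging described above. It assumes that data transfer usage types contain the text "Bytes" (for example "DataTransfer-Regional-Bytes"), ignores pagination, and uses helper names of our own, so adapt it to what actually appears on your bill.

```python
from collections import defaultdict


def fetch_usage_type_costs(start: str, end: str) -> dict:
    """Query Cost Explorer for monthly cost grouped by usage type.

    Needs boto3 plus AWS credentials allowed to call Cost Explorer.
    Dates are "YYYY-MM-DD" strings; the end date is exclusive.
    """
    import boto3  # imported here so the summarizer below works without boto3

    client = boto3.client("ce")
    return client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )


def summarize_transfer_costs(response: dict, marker: str = "Bytes") -> dict:
    """Total the cost per usage type whose name contains `marker`.

    The default marker is an assumption about how transfer usage types are named.
    """
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            if marker in usage_type:
                totals[usage_type] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)
```

A typical call would be `summarize_transfer_costs(fetch_usage_type_costs("2024-01-01", "2024-02-01"))`, where the dates are only examples. Sorting the result by value quickly shows which transfer paths are costing you the most.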
There’s no silver bullet for this. That’s why it’s time to get started yourself and experiment to see what works!