Cloud cost management
- infrahead
- Aug 12, 2024
- 3 min read
Updated: Aug 13, 2024
Did you know that cloud providers practice oversubscription? They sell you hardware that they have yet to provision. They do so because they know you will not use all the resources you pay for. Maybe not you personally, but most of their clients don’t. That is where their earnings come from.
We can assume that most companies misuse the cloud. Change management is essential for cost-saving and mature engineering, and unfortunately, most companies lack it. The first consequence is financial waste. A client of Infraheads DevOps Agency was paying $1m monthly to Azure and had too many provisioned MSSQL servers; during onboarding, every new DBA could find an unused server costing ~$7000 monthly. Some of their engineers thought the cost could be reduced by at least 50%.
At some point, they assigned a tag to each team and made it mandatory to add the team tag to every resource for cost management. However, the tag accepted any arbitrary value. Even if most people did not abuse that, nothing enforced valid values, so the policy was only half done.
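One way to finish such a policy is to validate the tag value against a known list of teams before a resource is accepted. The sketch below is a minimal illustration in Python. The tag key `team`, the `ALLOWED_TEAMS` set, and the function name are hypothetical; in a real setup the check would more likely live in a cloud policy engine or a CI step.

```python
from __future__ import annotations

# Hypothetical allow-list; in practice this would come from your team registry.
ALLOWED_TEAMS = {"payments", "platform", "data"}


def validate_team_tag(tags: dict[str, str], allowed: set[str] = ALLOWED_TEAMS) -> list[str]:
    """Return a list of problems with a resource's tags (empty list means OK)."""
    problems = []
    team = tags.get("team")
    if team is None:
        # The tag is mandatory, so its absence is a violation.
        problems.append("missing required tag 'team'")
    elif team.strip().lower() not in allowed:
        # A mandatory tag with a free-form value is only half a policy.
        problems.append(f"unknown team {team!r}")
    return problems


if __name__ == "__main__":
    print(validate_team_tag({"team": "payments"}))
    print(validate_team_tag({"team": "asdf"}))
    print(validate_team_tag({"env": "prod"}))
```

The point of the design is that a resource is rejected both when the tag is missing and when its value is not a known team, which closes the gap described above.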
Inadequate resources may also be provisioned due to inaccurate technical calculations, which is normal. People can overcommit, monitor, and adjust shortly after. But visibility suffers when everyone is free to do anything in the web console and people prioritize their deadlines. Thanks to the broken windows theory, we can predict where that leads: small, tolerated messes invite bigger ones. "Punishing" minor sins has proven to be an efficient approach, but determining the "punishment" is always a complex problem. Whatever it is, people need to accept its fairness. Good luck with figuring that out. What we propose instead is proper public cloud management practices.
Here are the recommendations from Infraheads DevOps Agency:

Restrict write access to all resources for everyone:
This may sound like an aggressive step, and people will complain. The most common argument will be, “What if we need to change something when prod is down?!” This is a classic case of the Allegory of the Cave. Don’t be afraid of that resistance.
Identify the roles and people who can modify different tiers of your cloud infrastructure and reflect that hierarchy in your Git configuration. If you use Terraform, it should be applied from the main branch. Beware of a gotcha there: Terraform code can be “plannable” but not “applicable,” meaning the plan succeeds while the apply still fails. Make sure to have a well-defined and well-rehearsed validation and rollback process.
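The rule above (only certain roles may change certain tiers, and only from the main branch) can be sketched as a small gate function. This is an illustration under stated assumptions, not a description of any particular tool: the tier names, role names, and the `ApplyRequest` type are all hypothetical, and in practice the same rule is usually expressed through branch protection and code-owner settings in your Git hosting.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping of infrastructure tiers to the roles allowed to change them.
TIER_WRITERS = {
    "network": {"platform-lead"},
    "database": {"platform-lead", "dba"},
    "application": {"platform-lead", "dba", "developer"},
}


@dataclass(frozen=True)
class ApplyRequest:
    role: str             # role of the person who approved the change
    tier: str             # infrastructure tier the change touches
    branch: str           # Git branch the apply would run from
    plan_succeeded: bool  # whether the plan step finished without errors


def may_apply(req: ApplyRequest) -> tuple[bool, str]:
    """Decide whether an apply may proceed, and explain why."""
    if req.branch != "main":
        return False, "apply is only allowed from main"
    if req.role not in TIER_WRITERS.get(req.tier, set()):
        return False, f"role {req.role!r} may not modify tier {req.tier!r}"
    if not req.plan_succeeded:
        return False, "plan failed"
    # A clean plan does not guarantee a clean apply, so rollback must stay ready.
    return True, "ok"


if __name__ == "__main__":
    print(may_apply(ApplyRequest("dba", "database", "main", True)))
    print(may_apply(ApplyRequest("developer", "network", "main", True)))
```

Note that the gate treats a successful plan as necessary but not sufficient, which is why the validation and rollback process still matters.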
Use GitOps:
Make sure all the changes are reflected in Git. Tools like Atlantis can help you set up GitOps with Terraform for cloud management.
Fragment your IaC:
Terraform checks the resources in the state files during the plan. The longer the list of resources, the longer the plan takes. Remember, engineers hate that. Waiting for something to be applied is the worst experience for your cloud engineers. It makes them super unproductive. Practice fragmented infrastructure and keep fragments simple and stupid.
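To see which fragments have grown too large, you could count the resources each one tracks. The sketch below assumes you have captured the output of `terraform state list` (one resource address per line) for each fragment. The fragment names and the threshold of 50 resources are arbitrary assumptions for illustration, not a recommendation from Terraform.

```python
from __future__ import annotations


def oversized_fragments(state_lists: dict[str, str], max_resources: int = 50) -> dict[str, int]:
    """Map each fragment that exceeds max_resources to its resource count.

    state_lists maps a fragment name to the captured text output of
    `terraform state list` for that fragment.
    """
    counts = {
        name: sum(1 for line in output.splitlines() if line.strip())
        for name, output in state_lists.items()
    }
    return {name: n for name, n in counts.items() if n > max_resources}


if __name__ == "__main__":
    # Hypothetical fragments with a deliberately low threshold for the demo.
    captured = {
        "network": "aws_vpc.main\naws_subnet.a\naws_subnet.b\n",
        "dns": "aws_route53_zone.main\n",
    }
    print(oversized_fragments(captured, max_resources=2))
```

A fragment flagged here is a candidate for splitting, so that each plan only has to refresh a short list of resources.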
Don’t use Terragrunt:
Terraform is not DRY by nature. Having a few repeating fragments of code is fine for small infrastructures. The team behind Terraform is highly mature, so trust their decisions. If you want to use Terragrunt, you probably don’t understand the rationale behind the absence of the feature you need in Terraform. Gruntwork is also a team of brilliant engineers, but they look at the world through the prism of their 300,000 lines of infrastructure-testing Go code. How many companies do you know that test their infra programmatically?
Be prepared:
Invest in your team’s knowledge and ensure it comes from hands-on practice. Periodically simulate downtimes and emergencies in the testing environment to prepare for the battlefield.
Summary:
Instead of paying for the tools that scan your clouds to find unused resources, apply simple practices of proper cloud management. You can benefit even more from doing that from the very beginning.
At Infrahead DevOps Agency, we have professionally certified AWS and Azure engineers who can fix your cloud management. Cost-cutting is just one of the benefits you’ll get.