This work is licensed under CC BY 4.0. Use or adaptation requires attribution.

Wrangling Runaway AWS RDS/SQL Costs

This anomaly investigation began when I noticed an increase in AWS RDS cost. Upon review, one thing stood out: the increase was for a particular on-demand instance, which was odd because all of my production instances were covered by Reserved Instances (RIs).

This story was originally told as part of Amy Ashby’s FinOps X ‘23 talk.

The data was telling me that my production instances had basically doubled, which was odd. I couldn’t find the extra instances anywhere in the console, and yet the number of hours these instances were running had doubled.
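The check described here can be sketched as a comparison between the instance-hours billed under the production tag and the hours that reservations could cover. This is a minimal illustration, not the method used in the talk: the `DailyUsage` record, its field names, and the sample numbers are all hypothetical, and it deliberately ignores how RIs are actually matched to instances (class, engine, region, size flexibility).

```python
from dataclasses import dataclass


@dataclass
class DailyUsage:
    """Hypothetical daily summary built from billing data."""
    day: str
    tagged_hours: float   # instance-hours billed under the production tag
    reserved_count: int   # number of Reserved Instances active that day


def uncovered_hours(usage: DailyUsage, hours_per_day: float = 24.0) -> float:
    """Hours that exceed what the reservations could cover (simplified)."""
    covered = usage.reserved_count * hours_per_day
    return max(0.0, usage.tagged_hours - covered)


def flag_anomalies(records: list[DailyUsage]) -> list[str]:
    """Return the days on which production-tagged usage spilled to on-demand."""
    return [r.day for r in records if uncovered_hours(r) > 0]


# Made-up sample data: four reserved instances, usage doubles on the second day.
records = [
    DailyUsage("day-1", tagged_hours=96.0, reserved_count=4),
    DailyUsage("day-2", tagged_hours=192.0, reserved_count=4),
]
flagged = flag_anomalies(records)  # only "day-2" exceeds the covered hours
```

If every production instance is meant to be reserved, any day that shows up in the flagged list is worth a closer look.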

Further investigation showed that somebody had taken a snapshot of an RDS instance in production, restored it, and brought the tags along with it.
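A restored copy like this can be surfaced by looking for instances that carry the production tag but are not on the list of instances you expect. The sketch below works on plain dictionaries shaped like the `DBInstances` entries returned by boto3's `describe_db_instances` (`DBInstanceIdentifier` plus a `TagList` of `Key`/`Value` pairs). The tag key `env`, the value `production`, and the instance names are assumptions made for illustration.

```python
def tags_as_dict(instance: dict) -> dict:
    """Flatten an RDS-style TagList into a plain {key: value} mapping."""
    return {t["Key"]: t["Value"] for t in instance.get("TagList", [])}


def unexpected_production_instances(
    instances: list[dict],
    expected_ids: list[str],
    tag_key: str = "env",            # assumed tag key
    tag_value: str = "production",   # assumed tag value
) -> list[str]:
    """Return identifiers tagged as production that are not on the expected list."""
    expected = set(expected_ids)
    return sorted(
        inst["DBInstanceIdentifier"]
        for inst in instances
        if tags_as_dict(inst).get(tag_key) == tag_value
        and inst["DBInstanceIdentifier"] not in expected
    )


# Hypothetical inventory: the restored copy has a different name but the same tags.
inventory = [
    {"DBInstanceIdentifier": "orders-db",
     "TagList": [{"Key": "env", "Value": "production"}]},
    {"DBInstanceIdentifier": "orders-db-restore-test",
     "TagList": [{"Key": "env", "Value": "production"}]},
    {"DBInstanceIdentifier": "sandbox-db",
     "TagList": [{"Key": "env", "Value": "dev"}]},
]
suspects = unexpected_production_instances(inventory, expected_ids=["orders-db"])
```

With this sample inventory, `suspects` contains only `orders-db-restore-test`, the instance whose name says "restore" while its tags still say production.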

Having a data-focused discussion with engineering

Even though the restored instance was named differently in the console, its tags said it was production. That was the issue in this particular case. With this data clearly identifying the cause of the anomaly, I could have a conversation with my engineering team. My questions for them included:

  • Do these instances need to be this size?
  • Can we downsize these?
  • Can we update this process to avoid costs like these?

We weren’t sure at first whether the restore had been a manual process or an automated one. But with further conversation and work, we made sure that when snapshots are restored in the future, the tags are updated appropriately.
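One way to picture "updating the tags appropriately" is a small helper that a restore script or runbook could use to build the tag set for a restored copy: it starts from the source instance's tags and overrides the ones that should not carry over. This is only a sketch of the idea, not the process the team adopted. The tag keys (`env`, `restored-from`) and values are hypothetical, and how the resulting list is applied to the restored instance is left to whatever tooling performs the restore.

```python
def tags_for_restored_copy(source_tags: list[dict], overrides: dict) -> list[dict]:
    """Copy the source tags, then replace or add the override values.

    Tags use the RDS-style [{"Key": ..., "Value": ...}] shape on input and output.
    """
    merged = {t["Key"]: t["Value"] for t in source_tags}
    merged.update(overrides)
    # Sort by key so the output is stable and easy to review.
    return [{"Key": k, "Value": v} for k, v in sorted(merged.items())]


# Hypothetical example: the copy keeps its team tag but is no longer "production".
source_tags = [
    {"Key": "env", "Value": "production"},
    {"Key": "team", "Value": "data"},
]
restored_tags = tags_for_restored_copy(
    source_tags,
    overrides={"env": "restore-test", "restored-from": "orders-db"},
)
```

Recording where the copy came from in a separate tag keeps the lineage visible without letting the copy be counted as production in cost reports.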