Database maintenance
Scheduled Maintenance Report for Ifeelgoods
Postmortem

Database Maintenance, IOPS and RDS SSD burst quota

The Required Migration

The planned database maintenance consisted of running an ALTER operation on a fairly large MySQL table (20M+ records). We were switching the encoding of certain columns to utf8mb4 to accommodate a richer set of characters, as required by some of our new customers.

The Planning

We use RDS for our database needs, and a rough calculation showed that the migration would trigger around 5-6M write IOs. Various benchmarks we ran in the past, along with previous migrations, showed that our RDS database (m3.xlarge) can handle 2,000-2,500 write IOPS for a burst of around 50 minutes. So at this point we thought we were safe (i.e. we could get the migration done during the planned window).
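The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this section. The function name and the best/worst pairing of the ranges are illustrative assumptions, not the calculation we actually ran.

```python
def migration_minutes(total_write_ios: float, write_iops: float) -> float:
    """Minutes needed to issue total_write_ios at a sustained write_iops rate."""
    return total_write_ios / write_iops / 60


# Figures quoted in the planning estimate above.
best_case = migration_minutes(5_000_000, 2_500)   # fewest IOs, fastest rate
worst_case = migration_minutes(6_000_000, 2_000)  # most IOs, slowest rate
BURST_MINUTES = 50  # burst duration observed in earlier benchmarks

print(f"best case:  {best_case:.1f} min")
print(f"worst case: {worst_case:.1f} min")
print(f"fits in burst window: {worst_case <= BURST_MINUTES}")
```

With these numbers the worst case lands exactly on the 50-minute burst window, so the plan had no margin if the IO estimate turned out to be low.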

Getting it Done (or NOT...)

We launched the migration and monitored the write IOPS, which rose quickly to 2,500. Our RDS instance was humming along, but because the table was locked, all insertions into it were rejected, which triggered our alerts in New Relic. After 1 hour, we realised that the total number of IOs required to finish the migration was not going to fit within the burst quota. The write IOPS crashed from 2,500 down to 500, and we were faced with a migration that was going to take an extra 1h30, which meant an extra 1h30 of failing every API call that results in an insert. This is clearly unacceptable given our SLAs, so we decided to roll back the migration and tackle it differently.
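To put the throttling in perspective, here is a rough sketch of what the remaining work amounted to. It assumes the 1h30 estimate was derived from the throttled rate of 500 write IOPS. The helper name is hypothetical and the result is an inference from the figures above, not a measured value.

```python
def remaining_write_ios(minutes_left: float, write_iops: float) -> float:
    """Write IOs implied by minutes_left of work at a given write_iops rate."""
    return minutes_left * 60 * write_iops


# Assumption: 1h30 remaining at the throttled rate of 500 write IOPS.
remaining = remaining_write_ios(90, 500)

# The same amount of work at the 2,500 IOPS burst rate, for comparison.
minutes_at_burst = remaining / 2_500 / 60

print(f"remaining write IOs: {remaining:,.0f}")
print(f"at burst rate:       {minutes_at_burst:.0f} min")
print(f"at throttled rate:   90 min")
```

Under that assumption, work that would have taken under 20 minutes at the burst rate stretched to 90 minutes once the burst quota ran out.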

Takeaways

  • Make sure the migration fits in the burst quota if not using Provisioned IOPS. The burst quota is impacted by the DB storage size and by whether or not the EBS volumes are SSD-backed.
  • Experiment with tools like this one from Percona
  • Use Provisioned IOPS now that some tables are this big
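The first takeaway can be turned into a pre-flight check. The sketch below assumes the credit-bucket model AWS documents for General Purpose (gp2) SSD volumes: a 5.4 million I/O credit bucket, refilled at a baseline of 3 IOPS per GiB with a minimum of 100 IOPS. The 167 GiB volume size is hypothetical, picked only because it gives a baseline near the 500 IOPS we observed. The function names are illustrative, and the model ignores the gp2 burst ceiling and any RDS-specific storage layout.

```python
GP2_CREDIT_BUCKET = 5_400_000   # I/O credits in a full gp2 bucket (assumed model)
GP2_BASELINE_IOPS_PER_GIB = 3   # credits earned per second per GiB
GP2_MIN_BASELINE_IOPS = 100     # floor for small volumes


def baseline_iops(size_gib: int) -> int:
    """Sustained IOPS available once the credit bucket is empty."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_BASELINE_IOPS_PER_GIB * size_gib)


def burst_minutes(size_gib: int, demanded_iops: float) -> float:
    """Minutes a full bucket lasts when demanded_iops is sustained."""
    base = baseline_iops(size_gib)
    if demanded_iops <= base:
        return float("inf")  # never drains the bucket
    return GP2_CREDIT_BUCKET / (demanded_iops - base) / 60


def fits_in_burst(total_ios: float, size_gib: int, demanded_iops: float) -> bool:
    """True if total_ios can be issued at demanded_iops before credits run out."""
    base = baseline_iops(size_gib)
    if demanded_iops <= base:
        return True
    seconds = total_ios / demanded_iops
    credits_spent = seconds * (demanded_iops - base)
    return credits_spent <= GP2_CREDIT_BUCKET


SIZE_GIB = 167  # hypothetical volume size, not taken from our setup
print(f"baseline: {baseline_iops(SIZE_GIB)} IOPS")
print(f"burst at 2,500 IOPS lasts {burst_minutes(SIZE_GIB, 2_500):.0f} min")
print(f"6M IOs fit:  {fits_in_burst(6_000_000, SIZE_GIB, 2_500)}")
print(f"10M IOs fit: {fits_in_burst(10_000_000, SIZE_GIB, 2_500)}")
```

A check like this only helps if the IO estimate is realistic, so it is worth running it with a generous multiple of the estimated write IOs before committing to a maintenance window.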
Posted Feb 26, 2015 - 18:47 CET

Completed
This maintenance has been cut short due to an unforeseen performance issue that extended it beyond our planned window. Please visit the postmortem page for more details.
Posted Feb 26, 2015 - 18:28 CET
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Feb 26, 2015 - 17:00 CET
Scheduled
The maintenance has been partially completed, and we will need to schedule another maintenance window to complete it.
Posted Feb 26, 2015 - 11:34 CET
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Feb 26, 2015 - 09:02 CET
Scheduled
We're going to perform a second maintenance on our databases.
A service interruption of 30 min is expected.
Posted Feb 20, 2015 - 12:24 CET