Database maintenance

Scheduled Maintenance Report for Ifeelgoods

Postmortem

Database Maintenance, IOPS and RDS SSD burst quota

The Required Migration

The planned database maintenance consisted of running an ALTER operation on a fairly large MySQL table (20M+ records). We were switching the encoding of certain columns to utf8mb4 to accommodate a richer set of characters, as required by some of our new customers.

The Planning

We use RDS for our database needs, and a rough calculation showed that the migration would trigger around 5-6M write I/Os. Various benchmarks we ran in the past, along with previous migrations, showed that our RDS database (m3.xlarge) can sustain 2,000-2,500 write IOPS for a burst of around 50 minutes. At that point we believed we were safe (i.e. we could get the migration done during the planned window).
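As a rough sanity check of the planning figures above, the burst capacity can be compared against the estimated I/O count. This is a minimal sketch: the function names are illustrative, and it assumes the write rate stays constant for the whole burst window, which real workloads rarely do.

```python
def burst_capacity_ios(write_iops: float, burst_minutes: float) -> float:
    """Total write I/Os served at a constant rate over the burst window."""
    return write_iops * burst_minutes * 60


def fits_in_burst(required_ios: float, write_iops: float, burst_minutes: float) -> bool:
    """True if the estimated I/O count fits within the burst capacity."""
    return required_ios <= burst_capacity_ios(write_iops, burst_minutes)


# Figures taken from the planning paragraph: 5-6M write I/Os,
# 2,000-2,500 write IOPS, burst of around 50 minutes.
low_capacity = burst_capacity_ios(2000, 50)
high_capacity = burst_capacity_ios(2500, 50)

print(f"capacity range: {low_capacity:,.0f} to {high_capacity:,.0f} I/Os")
print("worst case fits:", fits_in_burst(6_000_000, 2000, 50))
```

On these numbers the capacity is 6M to 7.5M I/Os, so the 5-6M estimate fits, but the worst case (6M I/Os at 2,000 IOPS) leaves no margin at all if the estimate turns out to be low.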

Getting it Done (or NOT...)

We launched the migration and monitored the write IOPS, which rose quickly to about 2,500. The RDS instance was humming along and, because the table was locked, all insertions into it were rejected, which triggered our alerts in New Relic. After 1h, we realised that the total number of I/Os required to finish the migration was not going to fit within the burst quota. Write IOPS dropped from about 2,500 to 500, and we were faced with a migration that was going to take an extra 1h30, which meant an extra 1h30 of failing every API call that results in an insert. This was clearly unacceptable given our SLAs, so we decided to roll back the migration and tackle it differently.
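The figures reported in this section give a rough idea of how far off the original estimate was. The sketch below is only back-of-the-envelope arithmetic: it assumes the peak was roughly 2,500 write IOPS (consistent with the planning figures), that the rate was sustained for the full first hour, and that the remaining 1h30 would have run at a steady 500 IOPS. The variable names are illustrative.

```python
def ios_served(iops: float, minutes: float) -> float:
    """Write I/Os served at a constant rate over the given duration."""
    return iops * minutes * 60


# About 1h at roughly 2,500 write IOPS before the burst quota ran out.
written_during_burst = ios_served(2500, 60)

# Estimated remaining work: 1h30 at the post-burst rate of 500 IOPS.
still_needed = ios_served(500, 90)

implied_total = written_during_burst + still_needed

# How long the remaining work would have taken had the burst rate held.
minutes_left_at_burst_rate = still_needed / 2500 / 60

print(f"written during burst: {written_during_burst:,.0f}")
print(f"still needed:         {still_needed:,.0f}")
print(f"implied total:        {implied_total:,.0f}")
print(f"minutes left at burst rate: {minutes_left_at_burst_rate:.0f}")
```

Under these assumptions the migration needed on the order of 11.7M write I/Os, roughly twice the 5-6M estimate, and the work that would have taken 1h30 at 500 IOPS corresponds to about 18 minutes at the burst rate.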

Takeaways

Make sure the migration fits in the burst quota if not using Provisioned IOPS. The burst quota depends on the DB storage size and on whether the EBS volumes are SSD-backed.
Experiment with tools like this one from Percona.
Use Provisioned IOPS now that some tables are this big.
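To illustrate how storage size affects the burst quota, here is a minimal sketch of the credit-bucket model that AWS documented for General Purpose SSD (gp2) volumes: a bucket of 5.4M I/O credits, a baseline of 3 IOPS per GiB (minimum 100), and a burst ceiling of 3,000 IOPS. Whether the instance in this incident used gp2 storage, and its storage size, are not stated above, so the 100 GiB volume below is purely hypothetical.

```python
# Assumed gp2 parameters; check current AWS documentation before relying on them.
GP2_CREDIT_BUCKET = 5_400_000      # I/O credits in a full bucket
GP2_BASELINE_IOPS_PER_GIB = 3      # credits earned per second per GiB
GP2_MIN_BASELINE_IOPS = 100        # floor for small volumes
GP2_BURST_IOPS = 3000              # maximum rate while credits remain


def baseline_iops(size_gib: float) -> float:
    """Sustained IOPS a volume of this size gets once credits are exhausted."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_BASELINE_IOPS_PER_GIB * size_gib)


def burst_seconds(size_gib: float, demand_iops: float) -> float:
    """Seconds a full credit bucket lasts at a constant demand."""
    demand = min(demand_iops, GP2_BURST_IOPS)
    base = baseline_iops(size_gib)
    if demand <= base:
        return float("inf")  # demand is covered by the baseline, no credits drained
    # Credits drain at (demand - baseline) per second while bursting.
    return GP2_CREDIT_BUCKET / (demand - base)


# Hypothetical 100 GiB volume driven at the full burst rate.
print(f"baseline: {baseline_iops(100):.0f} IOPS")
print(f"burst lasts: {burst_seconds(100, 3000) / 60:.1f} minutes")
```

In this model a larger volume has a higher baseline, so the bucket drains more slowly and the post-burst rate is less punishing; Provisioned IOPS sidesteps the bucket entirely.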

Posted Feb 26, 2015 - 18:47 CET

Completed

This maintenance has been cut short due to an unforeseen performance issue that extended it beyond our planned window. Please visit the postmortem page for more details.

Posted Feb 26, 2015 - 18:28 CET

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Posted Feb 26, 2015 - 17:00 CET

Scheduled

The maintenance has been partially completed, and we will need to schedule another maintenance window to complete it.

Posted Feb 26, 2015 - 11:34 CET

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Posted Feb 26, 2015 - 09:02 CET

Scheduled

We're going to perform a second maintenance on our databases.
A service interruption of 30 min is expected.

Posted Feb 20, 2015 - 12:24 CET