
Past three days ruined by AWS SSD volumes with standard IOPS

April 22, 2015

Since Sunday, I have occasionally been seeing a load of around 50-60 on one of my large instances. My previous post was about the same issue. At first I thought DB connections were the problem, so I increased the limit from the default 150 to 1000!

By yesterday evening I thought the problem was solved, and I was happy.

But then, yet again, I saw loads of around 90 late in the night. It was too late and I was tired, so I decided to sleep on it.

This morning, after doing my round of strength exercises, I came back to debug the problem. The machines seemed fine at the time, so I had to wait, and I spent that time reading up on MySQL performance and scalability.

One thing I have been worried about, right from the beginning, is storing the MySQL data files (MyISAM in my case) on EBS.

But the general consensus, going by various blogs and Server Fault Q&As, has been to use EBS.

So today I manually listed some files on the EBS volume holding the MySQL data, and it was slow.

One SQL query on the same instance was also taking forever to run. A search that normally completes in milliseconds took around 30 minutes.

Suspecting that the EBS volume could be the culprit, but not very sure, I went and opened the one-hour graph for the volume in question. And lo and behold!

It was indeed showing an average read latency of around 1000 ms!! (Figure below.)

And the graph for write latency looked similar:

And the queue length (note that this parameter effectively translates to load, as it is related to iowait):
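For the record, here is a sketch of how the per-operation latency in these graphs is derived from the raw CloudWatch EBS counters. It assumes the standard `VolumeTotalReadTime` (seconds spent by reads completed in the period) and `VolumeReadOps` (number of completed reads) metrics over a 5-minute period; the sample numbers are illustrative, not taken from my graphs.

```python
# Average per-read latency from CloudWatch EBS counters:
# latency = VolumeTotalReadTime / VolumeReadOps for the period.

def avg_read_latency_ms(volume_total_read_time_s, volume_read_ops):
    """Average read latency in milliseconds over one CloudWatch period."""
    if volume_read_ops == 0:
        return 0.0
    return volume_total_read_time_s / volume_read_ops * 1000.0

# Illustrative: 9000 s of accumulated read time across 9000 reads
# in a 5-minute period (i.e. 30 IOPS with a deep queue).
print(avg_read_latency_ms(9000.0, 9000))  # 1000.0 ms per read
```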

So looking at the graphs, it was clear that the default provisioning of a 10 GB volume was putting a ceiling of 30 IOPS on it (General Purpose SSD volumes get a baseline of 3 IOPS per GB, and once the burst credits run out, throughput falls back to that baseline).

And the queue length was 30-40. Which meant that what I needed at crawl(!) loads (yes, when the Google crawler comes crawling) was something below 100 IOPS.
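As a back-of-the-envelope check, Little's law ties the three graphs together: average latency is queue length divided by throughput. A sketch, using the rough numbers from the graphs above:

```python
# Little's law: average time in system W = L / X
# (L = average queue length, X = throughput in ops/sec).
# With the volume capped at its 30 IOPS baseline and a queue
# length of ~30, each request waits about a full second,
# matching the ~1000 ms read latency graph.

def avg_latency_ms(queue_length, iops):
    """Estimated per-request latency in milliseconds via Little's law."""
    return queue_length / iops * 1000.0

print(avg_latency_ms(30, 30))  # 1000.0 ms
```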

Now, provisioning 100 IOPS costs around 7 USD a month. I went with 200 IOPS for an extra safety margin, so it costs me about 14 USD/month.

So I did that. I baked the new volume into the image, and after relaunching it seems to work well.

Please note that the small bluish worm in the right-hand corner of the above images is the new provisioned-IOPS volume running at 200 IOPS.

I am confident it should fix this problem!

Noteworthy is the learning (not always painless) that comes with scaling things in the cloud.


From → Uncategorized
