A Few Tips and Tricks for AWS EFS for Small Storage Sizes

by Tim Gray | Jun 8, 2018


AWS EFS (Elastic File System) is AWS's managed network file store for EC2 servers (and on-prem too, if you are that way inclined). It's a really handy tool for sharing configuration files, data or anything else you want between clusters of EC2 instances, or for keeping data around in case of an EC2 failure. It works well but can have performance issues when used incorrectly. So here are some good things to know when using AWS EFS.

Understanding EFS Performance

EFS’s IO is based on two things: continuous throughput and burst throughput.
Burst throughput is based on a token system, where any IO over your continuous throughput limit consumes tokens and any time spent under that limit slowly regenerates them. Once you are out of tokens you are limited to your continuous throughput. It's also good to note that a new EFS drive starts with a large initial token balance (about 2.1 TiB according to the AWS documentation), so it behaves as if it held far more data than it does until those tokens are spent. This means that testing can be skewed for new drives.
Both of these limits are based on how much data you have stored in your drive: the more data, the more performance you get allocated. This has a major impact on people who store only a small amount of data but need a lot of throughput.
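
To get a feel for how stored size maps to continuous throughput, here is a rough bash sketch. It assumes the figure AWS has published for bursting mode, 50 KiB/s of continuous throughput per GiB stored. That number is an assumption taken from AWS's documentation rather than anything measured in this post, and the function names are made up for illustration, so check the current EFS docs before relying on it.

```shell
#!/usr/bin/env bash
# Rough estimate of EFS continuous throughput from stored data size.
# ASSUMPTION: 50 KiB/s of continuous throughput per GiB stored (bursting mode).
BASELINE_KIB_PER_GIB=50

# baseline_kib_per_sec <stored_gib>
# Prints the estimated continuous throughput in KiB/s.
baseline_kib_per_sec() {
  local stored_gib=$1
  echo $(( stored_gib * BASELINE_KIB_PER_GIB ))
}

# padding_gib_needed <target_kib_per_sec> <stored_gib>
# Prints how many extra GiB would have to be stored to reach the target.
padding_gib_needed() {
  local target=$1 stored=$2
  # Round up: a partial GiB still has to be stored in full.
  local required=$(( (target + BASELINE_KIB_PER_GIB - 1) / BASELINE_KIB_PER_GIB ))
  local extra=$(( required - stored ))
  if [ "$extra" -lt 0 ]; then
    extra=0
  fi
  echo "$extra"
}

baseline_kib_per_sec 10       # 10 GiB stored -> prints 500
padding_gib_needed 5120 10    # want 5 MiB/s with 10 GiB stored -> prints 93
```

So under that assumption a 10 GiB drive only gets around 500 KiB/s once its tokens run out, which is why small drives feel slow.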

Helping Throughput for Small Data Sizes

So small data sizes won't produce a great result on EFS, so how do we help? Well, the easiest way to get more performance is to pad out your EFS drive with nonsense data. This is quite easy to do with Ubuntu and bash:

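dd if=/dev/urandom of=<DRIVE_DIRECTORY>/dummyFile0 bs=1M count=1024

This command will grab 1GB of random data and stuff it into <DRIVE_DIRECTORY>/dummyFile0. The size of the file is count*bs bytes, here 1024 blocks of 1 MiB each. Note that dd does not evaluate arithmetic, so count has to be a plain number rather than an expression like 1024*1024.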
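
If you need more than one file's worth of padding, the same dd call can be wrapped in a small loop. This is only a sketch: pad_dir and its arguments are hypothetical names, it assumes GNU dd as shipped with Ubuntu, and the dummyFile naming simply follows the command above.

```shell
#!/usr/bin/env bash
# Write several fixed-size filler files instead of one huge one.
# pad_dir is a hypothetical helper, not part of any AWS tooling.

# pad_dir <directory> <file_count> <mib_per_file>
pad_dir() {
  local dir=$1 count=$2 mib=$3
  local i=0
  while [ "$i" -lt "$count" ]; do
    # bs=1M count=N writes N blocks of 1 MiB, i.e. N MiB per file.
    dd if=/dev/urandom of="$dir/dummyFile$i" bs=1M count="$mib"
    i=$(( i + 1 ))
  done
}

# Example, commented out so nothing gets written by accident:
# pad_dir <DRIVE_DIRECTORY> 10 1024    # ten 1 GiB files
```

Splitting the padding into several files means you can delete them one at a time as your real data grows, rather than having to drop all the padding at once.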

General Purpose vs Max I/O

When creating an EFS drive you get the option of General Purpose or Max I/O for your drive, but what's the difference?
Max I/O has the highest maximum throughput, but each file operation has a higher latency.
General Purpose has low-latency access but a lower maximum performance than Max I/O.
If you have a small number of large files I recommend Max I/O, but for every other case General Purpose is the best fit. AWS also recommends you try General Purpose first in every case and move to Max I/O only if it becomes an issue.
So that's a few things on AWS EFS; I hope they come in handy.
Until next time…
Tim Gray
Coffee to Code
 


2 Comments
  1. Lokesh Jawane

    Hi,
    We have around 520 GB of data on EFS (small files, but a large number of them). Whenever we try to get the file count or the size of a directory on EFS, it takes more than an hour, and often it hangs for even longer than that.
    The EFS configuration is General Purpose performance mode with bursting throughput.
    What could be the issue here?

  2. Tim Gray

    Hi Lokesh
    As there is an overhead for each file operation, your issue is probably the large number of files rather than the total size. You could try storing these files in an archive (.tar.gz or similar) or using another type of storage.
    Hope that helps
    Tim
