Nginx Split Proxy Cache Across Multiple Drives

One of the websites we run serves a massive amount of static content. However, each page request needs to query a MySQL database to retrieve the storage location of the files. To reduce the load on our servers, we implemented an Nginx reverse proxy with caching. As our site grew, we quickly outgrew the system we were hosting the proxy on. At its peak time of day, this site now serves over 600 requests per second for small image files averaging 15KB each. This generates a massive amount of disk I/O: the cache disk was pegged at 100% utilization around the clock, and performance was starting to suffer.

Our solution was to replace the cache hard drives with solid state drives: a total of three 60GB SSDs. We wanted to use these to distribute the load but didn't want to introduce any type of RAID or pooling overhead. We were surprised to learn that nginx provides a neat little way to load balance these cache drives.

First, we began by declaring our cache directories within the http block:

proxy_cache_path /mnt/sdb1 levels=1:2 keys_zone=cachedisk1:100m inactive=24h max_size=55g;
proxy_cache_path /mnt/sdc1 levels=1:2 keys_zone=cachedisk2:100m inactive=24h max_size=55g; 
proxy_cache_path /mnt/sdd1 levels=1:2 keys_zone=cachedisk3:100m inactive=24h max_size=55g;
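
The proxy_cache_path entries above assume each SSD is mounted separately at /mnt/sdb1, /mnt/sdc1, and /mnt/sdd1. As a sketch (device names, ext4, and the noatime options are assumptions; adjust for your own hardware), the corresponding /etc/fstab entries might look like:

/dev/sdb1  /mnt/sdb1  ext4  noatime,nodiratime  0 0
/dev/sdc1  /mnt/sdc1  ext4  noatime,nodiratime  0 0
/dev/sdd1  /mnt/sdd1  ext4  noatime,nodiratime  0 0

Mounting with noatime avoids an extra metadata write on every cache read, which matters at this request rate.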

Next, we used a directive called "split_clients". Before we go any further, I would like to point out that you will need to download the latest version of nginx from nginx.org. The versions included in the CentOS 6 repositories do not support the "split_clients" directive. It took me several hours of poking around with the configuration to figure this out.

Anyway, we will be using the "$request_uri" variable to make sure that each piece of content always ends up on the same disk. You can use any combination of variables here. Nginx simply hashes the result behind the scenes to determine which disk it should go on. We are still within the http block at this point.
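To make the hashing behavior concrete, here is a small sketch of the idea in Python. Note this is only an illustration: nginx actually uses MurmurHash2 over the 32-bit space for split_clients, and CRC32 is substituted here just to keep the example runnable. The bucket boundaries mirror the 33/33/34 split used below.

```python
import zlib

# Cumulative percentage boundaries, mirroring the split_clients block:
# first 33% of the hash space -> cachedisk1, next 33% -> cachedisk2, rest -> cachedisk3.
BUCKETS = [(33.0, "cachedisk1"), (66.0, "cachedisk2"), (100.0, "cachedisk3")]

def pick_cache(request_uri: str) -> str:
    # Hash the URI into the 32-bit space (nginx uses MurmurHash2; CRC32 here
    # is only a stand-in to illustrate the mapping).
    h = zlib.crc32(request_uri.encode("utf-8")) & 0xFFFFFFFF
    percent = h / 0x100000000 * 100  # position of the hash within 0-100%
    for upper, zone in BUCKETS:
        if percent < upper:
            return zone
    return BUCKETS[-1][1]

# The same URI always hashes to the same disk:
assert pick_cache("/images/logo.png") == pick_cache("/images/logo.png")
```

The key property is determinism: because the bucket is a pure function of the hashed key, a given URI is only ever cached on one of the three disks, so no object is duplicated across drives.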

split_clients $request_uri $cachedisk {
  33% "cachedisk1";
  33% "cachedisk2";
  34% "cachedisk3";
}

Lastly, in your server block, enable proxy caching and tell it to use the "$cachedisk" zone we defined above.

proxy_pass              http://backend;  # placeholder; point this at your own upstream
proxy_redirect          off;
proxy_set_header        Host $host;
proxy_set_header        X-Real-IP $remote_addr;
proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cache             $cachedisk;
proxy_cache_min_uses    2;
proxy_cache_key         $request_uri;

It's that simple. Restart nginx and you should be good to go. Hopefully this saves someone else the time it took me to research and figure this out. This method is a lot more efficient than the common "symlink" method.
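
If you want to verify that requests are actually being served from cache, one option is to expose nginx's built-in $upstream_cache_status variable in a response header. The header name here is arbitrary; the variable is real:

add_header X-Cache-Status $upstream_cache_status;

Then check a URL twice with something like curl -sI http://yoursite/images/foo.jpg and watch the X-Cache-Status value. Keep in mind that with proxy_cache_min_uses set to 2 as above, the first couple of requests for a given URI will report MISS before the object is cached and responses switch to HIT.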
