Thursday, January 13, 2011

No-Fuss Amazon CloudFront CDN for Your Rails Stack

A Content Delivery Network is a great way to ease the workload on your server and speed up page loads for your users. Amazon's CloudFront helps you do this by serving static assets (images, stylesheets, javascripts, etc.) hosted in S3 buckets - you stick the files in the buckets and point your URLs to the proper CloudFront location, and Amazon does the rest.

If you have a dynamic application where your assets change with each deploy or images are created on the fly, you need a way to keep things in sync with CloudFront. For Rails applications, there are a few tools out there for keeping your /public directory in sync with your S3 bucket but I found them all to lie somewhere between buggy and broken. There must be a better way!

O, Fortuna. Just as I was struggling with syncing solutions, Paul Stamatiou posted an article on this very topic: Thoughts on Origin Pull, S3 and CloudFront. It turns out that CloudFront recently added the ability to define a custom origin as the source of static assets instead of an S3 bucket. Yes! What? Here's how it works.

After setting up a CloudFront distribution and a CNAME record pointing to that distribution, you can reference an image on your page like this:

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/images/stun_gun.png" />  

The first time this is requested, CloudFront will see that it doesn't have this resource yet and go to your origin to retrieve the image. In the case of an S3 bucket, it would pull the image from the bucket with the key "images/stun_gun.png". In the case of a custom origin, however, it just routes the "/images/stun_gun.png" request through to your application, so the image is served as usual by the application and cached by CloudFront. The great part is that you didn't have to do anything special to tell CloudFront about this particular asset - it's a pull, not a push, which eliminates the syncing issues.
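To make the pull model concrete, here is a toy Ruby sketch of a pull-through cache. It is only an illustration of the idea, not how CloudFront is actually implemented; the PullThroughCache class and the stand-in origin lambda are hypothetical names made up for this example.

```ruby
# Toy model of an origin-pull cache. This is NOT CloudFront's real
# implementation; the class and the origin lambda are illustrative only.
class PullThroughCache
  attr_reader :origin_hits

  # origin: any callable that maps a request path to a response body.
  def initialize(origin)
    @origin = origin
    @store = {}
    @origin_hits = 0
  end

  # Serve from the cache when possible; otherwise pull from the origin once
  # and remember the result for later requests.
  def fetch(path)
    return @store[path] if @store.key?(path)

    @origin_hits += 1
    @store[path] = @origin.call(path)
  end
end

# Hypothetical origin standing in for the application behind CloudFront.
origin = ->(path) { "contents of #{path}" }
cache = PullThroughCache.new(origin)

cache.fetch("/images/stun_gun.png") # miss: pulled from the origin
cache.fetch("/images/stun_gun.png") # hit: served from the cache
puts cache.origin_hits              # prints 1
```

Nothing is ever pushed to the cache: the first request for a path triggers the pull, and every later request for that path is answered without touching the origin.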

After a few (relatively) painless setup steps, you can pretty much forget about it. Continue developing your application, deploy new assets and new versions of existing assets, and the plumbing will keep everything up to date.

Implementation Details

1) Create CloudFront distributions
The AWS Developer console does not yet have a function for creating a distribution with a custom origin - you can only point to S3 buckets. Once again, the internet comes to the rescue. Custom origin creation is only supported via the API, and I found instructions for using a Perl script to make the request in the article Creating a Custom Origin Server for Amazon CloudFront. Follow the link for instructions; my request XML looks a little something like this:

 <?xml version="1.0" encoding="UTF-8"?>  
  <DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-11-01/">  
   <CustomOrigin>  
     <DNSName>www.bubbaganoush.com</DNSName>  
     <HTTPPort>80</HTTPPort>  
     <HTTPSPort>443</HTTPSPort>  
     <OriginProtocolPolicy>match-viewer</OriginProtocolPolicy>  
   </CustomOrigin>  
   <CallerReference>1294874303</CallerReference>  
   <CNAME>assets0.bubbaganoush.com</CNAME>  
   <Enabled>true</Enabled>  
  </DistributionConfig>   

Because some browsers limit how many simultaneous requests can go to one domain (further reading here), we will actually create a total of four distributions. Run the Perl script once for each, making sure to increment the CallerReference number and CNAME number. Lastly, I created CNAME records for each of these distributions: assets0.bubbaganoush.com, assets1.bubbaganoush.com, assets2.bubbaganoush.com, and assets3.bubbaganoush.com.
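If editing the XML by hand four times feels error-prone, a small Ruby helper can generate the four request bodies. This is only a sketch: the distribution_config method and its base_reference parameter are names I made up, and it only builds the XML strings; submitting them to the API is still done as described in the linked article.

```ruby
# Builds the request XML for one distribution. `index` selects the CNAME
# (assets0..assets3) and is added to `base_reference` so that every
# request gets a distinct CallerReference. Both names are hypothetical.
def distribution_config(index, base_reference: 1294874303)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-11-01/">
      <CustomOrigin>
        <DNSName>www.bubbaganoush.com</DNSName>
        <HTTPPort>80</HTTPPort>
        <HTTPSPort>443</HTTPSPort>
        <OriginProtocolPolicy>match-viewer</OriginProtocolPolicy>
      </CustomOrigin>
      <CallerReference>#{base_reference + index}</CallerReference>
      <CNAME>assets#{index}.bubbaganoush.com</CNAME>
      <Enabled>true</Enabled>
    </DistributionConfig>
  XML
end

# One config per distribution: assets0 through assets3.
configs = (0..3).map { |i| distribution_config(i) }
puts configs.first
```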

2) Rails Configuration

Update: Carl pointed out that the CNAME-based asset_host will not work for HTTPS requests; for HTTPS requests we must point directly to a real hostname for which we have a registered SSL certificate. This has been corrected in the snippet below.


Now all we have to do is configure the Rails application to route asset requests to CloudFront via the config.action_controller.asset_host directive. In production.rb,

 config.action_controller.asset_host = Proc.new { |source, request|
  if request.ssl?
    "https://jekdi56jkdlkje787.cloudfront.net"  # you must have SSL cert for this domain!
  else
    "http://assets#{source.hash % 4}.bubbaganoush.com"  
  end
 }  

The #{source.hash % 4} component spreads the requests among the four distributions that were created in step 1.
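One caveat: Ruby does not guarantee that String#hash returns the same value across processes or machines, so the same asset can end up on different hostnames depending on which process renders the page. A minimal sketch of a stable alternative, using the MD5 digest suggested in the comments, follows. The HOST_COUNT constant and the two helper methods are hypothetical names for illustration.

```ruby
require "digest/md5"

# Number of assetsN hostnames created in step 1.
HOST_COUNT = 4

# Maps an asset path to an index in 0...HOST_COUNT. The MD5 digest depends
# only on the string contents, so every process and server agrees on it.
def asset_host_index(source)
  Digest::MD5.hexdigest(source).to_i(16) % HOST_COUNT
end

# Hypothetical helper that builds the non-SSL asset host for a path.
def asset_host_for(source)
  "http://assets#{asset_host_index(source)}.bubbaganoush.com"
end

puts asset_host_for("/images/stun_gun.png")
```

Inside the asset_host Proc, the non-SSL branch would then use asset_host_index(source) in place of source.hash % 4. Keeping each asset on one hostname also means CloudFront and browsers cache it once rather than once per hostname.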

3) Extra credit - Asset Versioning
Once CloudFront retrieves an asset, it will store it for 24 hours. If you update a file but don't change the name, CloudFront won't know to check for an updated version. What to do? Rails normally handles this by appending a version number as a query parameter to the asset request, as in

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/images/stun_gun.png?75783847" />  

Sounds great, except CloudFront ignores query parameters and therefore all version requests will look the same. No problem, we just have to make the version number a part of the URL, which we can do via the config.action_controller.asset_path directive in production.rb.

 config.action_controller.asset_path = proc { |asset_path|  
  "/rel-#{RELEASE_NUMBER}#{asset_path}"  
 }  

There's probably a better way to do this, but I calculate RELEASE_NUMBER from the release path by putting the following in environment.rb:

 RELEASE_NUMBER = Dir.pwd.gsub(/.*\//,'')  
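As one possible alternative, File.basename extracts the last path component without a regular expression. The sketch below assumes a Capistrano-style layout in which each release lives in a timestamped directory; the release_number_from method and the example path are hypothetical.

```ruby
# Returns the last component of a release directory path.
def release_number_from(path)
  File.basename(path)
end

# Hypothetical Capistrano-style release path, for illustration only.
example_path = "/var/www/bubbaganoush/releases/20110113094500"

puts release_number_from(example_path)
# prints "20110113094500"
```

In environment.rb this would read RELEASE_NUMBER = File.basename(Dir.pwd). Either way, the value must consist only of digits for the rewrite rule later in this post to match.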

Now the asset URLs will look like:

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/rel-75783847/images/stun_gun.png" />  

And we get the same versioned effect. With each deploy, all static assets will have a new unique URL which will force CloudFront to get the latest version. But not so fast, you say, how will the app know what to do with that URL? Via an Apache RewriteRule, of course. Create a rule in your vhost conf to strip out the version part of the URL, since the Rails app won't need it to serve the latest version of the asset.

 RewriteRule ^/rel-\d+/(images|javascripts|stylesheets)/(.*)$ /$1/$2 [L]  
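To see what the rule does to a request path, here is the same pattern applied in plain Ruby. This is only a simulation for checking the regular expression; the real rewriting happens in Apache, and the VERSIONED_ASSET constant and strip_release_prefix method are names invented for this sketch.

```ruby
# Same pattern as the RewriteRule: a numeric release prefix followed by
# one of the three static asset directories.
VERSIONED_ASSET = %r{\A/rel-\d+/(images|javascripts|stylesheets)/(.*)\z}

# Strips the "/rel-<digits>" prefix from asset paths and leaves every
# other path untouched.
def strip_release_prefix(path)
  path.sub(VERSIONED_ASSET, '/\1/\2')
end

puts strip_release_prefix("/rel-75783847/images/stun_gun.png")
# prints "/images/stun_gun.png"
puts strip_release_prefix("/users/42")
# prints "/users/42"
```

Because the pattern requires digits after "rel-", a non-numeric RELEASE_NUMBER would not match and the request would fall through to the application unchanged.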

And that's it! We now have a no-fuss CDN layer on top of a Rails stack. It has been set, and now we can forget.

8 comments:

  1. Jeff, I think there's a bug here in that you can't use CNAMEs and HTTPS together in this way. For https you'd have to use the real distribution hostnames (xxx.cloudfront.net) in config.action_controller.asset_host.

  2. Excellent post! Definitely the best way to use cloudfront with rails. One thing to watch out for, the cache option does not work with the javascript include tag helper. Rails tries to locate the javascript file under the rel-RELEASE_NUMBER directory which obviously does not exist.

  3. @carl, thanks for pointing that out. I've updated the instructions to point to a real host for HTTPS requests.

  4. @Karl, the method as I described is working for me, although I am using the asset_packager plugin instead of calling javascript_include_tag directly.

    I'm not sure that makes a difference though, as it's apache and not rails that retrieves the static javascript asset, and apache is taken care of via the RewriteRule.

    Did you have trouble getting this to work with javascript_include_tag?

  5. Jeff thanks a bunch for writing this guide, it helped me a lot. Sidenote: for older Rails (like 2.2.2 that I use) there's no 'asset_path' property but '/rel-xxx' can be appended directly to asset_host property.

  6. Curious if anyone has done this using Nginx instead of Apache? Quite possible that I'm missing something, but a 'rewrite' in Nginx does just that, it redirects and doesn't pass thru, so the URL gets rewritten and a 302 (or 301) is returned.

    This prevents cloudfront from actually working.

    I'm new to Nginx, so I'm still trying to find the workaround, hopefully one exists.

    Thanks.

  7. Ok, I figured it out. For those of you using Nginx, you can do this and it will work.

    Add a new location and alias that does the same thing as the rewrite:


    location ~ ^/rel-\d+/(assets|images|javascripts|stylesheets)/(.*)$ {
    alias /path/to/your/site/public/$1/$2;
    }

    Note that the alias needs to be a FULL path to your site and the files, it is not relative.

  8. using source.hash will only work as long as you deploy to only one server.

    if you have multiple production servers, don't use String.hash, as it will return different hashes on all your servers (more info here: http://stackoverflow.com/questions/6783811/why-is-ruby-string-hash-inconsistent-across-machines )

    you can use Digest::MD5.hexdigest(source).to_i(16) instead of source.hash - this will return the same digest across all your production servers
