Friday, April 15, 2011

When Will Amazon Choose Our Own Adventure?


This might be the paranoid data geek in me, but the first thing that came to mind when learning about Amazon's WhisperSync capability was the incredible data collection opportunity it presents. As it's advertised, WhisperSync is helpful to the Kindle user because it keeps your books in sync across multiple reading devices. Start a book on your Kindle, read to page 23, and put it down. Later while in line at the supermarket, open the Kindle app on your phone and it syncs to your last-read location, starting you right there on page 23.


Did you see what just happened?


For this to work, a timestamped data point has to be logged and sent to Amazon every time you turn a page. Over time, this means they amass an incredibly detailed reading profile of every Kindle/WhisperSync user. This profile would, presumably, extend beyond merely which books you read; the data will show which books you do or don't finish, and the relative speed with which you finish particular books, chapters, and sentences. General reading habits are there for the scraping too - are you a nighttime reader? Self-help books at lunch? Teen fiction on cross-country flights? Just as web cookies and advanced click tracking mechanisms have tilted the scales of web UX development from art towards science, WhisperSync brings an exciting new dimension to book retail.


Of course, I don't presume to think that I'm the first person to realize this. Amazon has buildings full of ridiculously smart engineers and marketing types, and it's a safe bet that they're doing something more interesting with this data than simply reminding you what page you were on and then sending it to /dev/null. But since these hypothetical uses weren't detailed in the literature that came with my Kindle, I thought it would be fun to speculate on a few things that Amazon could do with a few billion page numbers and timestamps.


Product Suggestions


Amazon is already the king of this and they wouldn't necessarily advertise it if they were doing so, but the data on how vigorously you consume one book vs. another would go a long way towards recommending what you should read next.


Example: Carson Parsons is an avid reader of business books. Martin Smartypants has written a hot new book on management techniques, and Carson's purchase history combined with the popularity of the author would suggest that the book should go high on his list of recommendations. Should be a good match, right?


Not so fast - Carson's reading history recorded in Amazon's new (hypothetical) AwesoSuggest system indicates that his pages-per-minute takes a significant hit when the reading difficulty level goes too high. In fact, with the last "difficult" book he attempted, Carson had a spirited start before slowing to a chapter a week, having to re-read several sections, then down to a few pages a night, and finally abandoning it completely before reaching the halfway point. Amazon would do better to recommend something more appropriate to Carson's reading level, say, "21 Ways to Liven Up Your Letterhead."


Consumption Data


If for some reason Amazon couldn't get enough value out of this data on their own, imagine what a useful source of feedback it would be to authors and publishers to know the speed and manner in which readers get through their books.


Example: Mae Donahue authors a series of romance novels in which decent North American ladies fall inexplicably in love with alluring European men. For her newest novel, she tries something new, introducing her readers for the first time to the concept of unrequited love. Sales are sluggish and word of mouth is unfavorable, echoing the general sentiment of "it's just not as good as the last one." Worse yet, pre-sales for the next installment are way down. What can she do?


Enter Amazon's (hypothetical) ReaderHabits service. For a small fee, Ms. Donahue gets a rich breakdown of how her every book is consumed, down to the sentence. The report would show that among her normal readers, pages-per-minute was typical until page 37 when the lead character decides not to take the risk with the Spaniard whose English is "no so good", deciding instead to spend the next three weeks studying for her GRE. Reading pace slows to a trickle, with 1/3 of her readers giving up completely. Amazon can pinpoint exactly where things went wrong and how severely it turned readers off.


That, Mae Donahue, is quantitative feedback.
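
For the data geeks, here is a rough sketch of how a drop-off report like that could be computed. Everything in it is an assumption for illustration: the event format (`[reader_id, page]` pairs), the method name, and the idea that "gave up" simply means the furthest page reached is short of the end of the book.

```ruby
# Hypothetical page-turn log: each event is [reader_id, page].
# Returns a hash of page => number of readers whose furthest page
# was that page, ignoring readers who reached the end of the book.
def abandonment_by_page(events, total_pages)
  furthest = Hash.new(0)
  events.each do |reader, page|
    furthest[reader] = [furthest[reader], page].max
  end

  counts = Hash.new(0)
  furthest.each_value do |page|
    counts[page] += 1 if page < total_pages
  end
  counts
end
```

A spike in the counts at one page is the "where things went wrong" signal; a real report would also need the timestamps to measure reading pace.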


A/B Testing (AKA "Choose Our Own Adventure")


Here's where it gets interesting. It's one thing to measure the effects of the choices an author makes in telling a story, but how about testing story ideas on a real audience? Can you optimize a book?


Users unfamiliar with modern web product development might be surprised to learn that the green "Click Here" button they see on a website is actually a blue button for their neighbor who, by chance, saw a different iteration of a randomized trial. Websites optimize all the time by showing variations of a button, color scheme, page layout, messaging, etc., to different users until a large enough sample is taken to achieve significance and demonstrate a clear winner.


See where I'm going with this?


Example: Author Trent Dentley is struggling with the idea of killing off a major character in Chapter 4 of his adventure novel. With the deadline approaching, he seeks guidance from his publisher. The publisher wisely suggests that Trent write three versions of the chapter and submit them to Amazon's new (hypothetical) OptimalNovel program. Early Kindle downloaders of the book will randomly see different versions of Chapter 4, and based on how they react to the plot development (do they keep reading, do they speed up to see what happens next, do they stop altogether?), the most compelling version will be selected and become the "official" version that all future readers get when they download the book.


The tested changes don't have to be this drastic, of course. The game breaks down when users realize that they are test subjects. A simpler example might be an author who is trying to create memorable characters. By testing different combinations of name, description, and backstory, the memorability of a character can be measured by watching how much readers have to flip back several pages or do a search to remind themselves who a particular character is when he reappears after an extended absence in the story.
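
A minimal sketch of the mechanics behind a trial like the Chapter 4 one, under some loud assumptions: the variant labels, method names, and result format are all made up, readers are bucketed by hashing a reader ID (a stand-in for true random assignment, so the same reader always gets the same chapter), and the "winner" is just the highest keep-reading rate with no significance test.

```ruby
require "zlib"

VARIANTS = %w[A B C].freeze # three hypothetical versions of Chapter 4

# Deterministically assign a reader to a variant so that the same
# reader sees the same version on every device.
def variant_for(reader_id, variants = VARIANTS)
  variants[Zlib.crc32(reader_id.to_s) % variants.size]
end

# results maps variant => { started: Integer, finished: Integer }.
# Returns the variant with the highest share of readers who kept going.
def most_compelling(results)
  results.max_by { |_variant, r| r[:finished].to_f / r[:started] }.first
end
```

A real program would wait for a large enough sample before declaring a winner, as the button example above describes.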





Amazon, despite their massive library, Segway sales, and efficient packaging, is in the data business. Their established online presence gives them a stranglehold on the "where" and "how" of people purchasing consumable media, and their extended reach with WhisperSync (and more recently Cloud Player for music) now provides the feedback for what happens after consumers press that 1-Click.


How much did you enjoy your last book? Maybe you can't even put a finger on it, but the numbers won't lie. "Pageturner", once just an expression, is now quantifiable.


Follow me on Twitter for more speculative non-fiction paranoia: @wooswiff





Thursday, January 13, 2011

No-Fuss Amazon CloudFront CDN for Your Rails Stack

A Content Delivery Network is a great way to ease the workload on your server and speed up page loads for your users. Amazon's CloudFront helps you do this by serving static assets (images, stylesheets, javascripts, etc.) hosted in S3 buckets - you stick the files in the buckets and point your URLs to the proper CloudFront location, and they do the rest.

If you have a dynamic application where your assets change with each deploy or images are created on the fly, you need a way to keep things in sync with CloudFront. For Rails applications, there are a few tools out there for keeping your /public directory in sync with your S3 bucket but I found them all to lie somewhere between buggy and broken. There must be a better way!

O, Fortuna. Just as I was struggling with syncing solutions, Paul Stamatiou posted an article on this very topic: Thoughts on Origin Pull, S3 and CloudFront. It turns out that CloudFront just recently added the ability for you to define a custom origin for the source of static assets instead of an S3 bucket. Yes! What? Here's how it works.

After setting up a CloudFront distribution and a CNAME record pointing to that distribution, you can reference an image on your page like this:

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/images/stun_gun.png" />  

The first time this is called, CloudFront will see that it doesn't have this resource yet and go to your origin to retrieve the image. In the case of an S3 bucket, it would pull the image from the bucket with the key "images/stun_gun.png". In the case of a custom origin, however, it just routes the "/images/stun_gun.png" request through to your application, so the image is served as usual by the application and cached by CloudFront. The great part is that you didn't have to do anything special to tell CloudFront about this particular asset - it's a pull, not a push, which eliminates the syncing issues.
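
If it helps to see the pull behavior in code, here is a toy model of it. This is not how CloudFront is implemented and `PullThroughCache` is a name I made up; it only illustrates the fetch-on-first-request idea described above.

```ruby
# Toy model of an origin-pull cache: on a miss, ask the origin for the
# path and keep the response; on a hit, answer from the local store.
class PullThroughCache
  attr_reader :origin_hits

  def initialize(&origin)
    @origin = origin # block that fetches a path from the origin app
    @store = {}
    @origin_hits = 0
  end

  def get(path)
    @store.fetch(path) do
      @origin_hits += 1
      @store[path] = @origin.call(path)
    end
  end
end
```

Requesting the same path twice reaches the origin only once, which is why nothing has to be pushed ahead of time.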

After a few (relatively) painless setup steps, you can pretty much forget about it. Continue developing your application, deploy new assets and new versions of existing assets, and the plumbing will keep everything up to date.

Implementation Details

1) Create CloudFront distributions
The AWS Developer console does not yet have a function to create a distribution with a custom origin - you can only point to S3 buckets. Once again, the internet comes to the rescue. Custom origin creation is only supported via the API, and I found instructions for using a Perl script to make the request in the article Creating a Custom Origin Server for Amazon CloudFront. Follow the link for instructions; my request XML looks a little something like this:

 <?xml version="1.0" encoding="UTF-8"?>  
  <DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-11-01/">  
   <CustomOrigin>  
     <DNSName>www.bubbaganoush.com</DNSName>  
     <HTTPPort>80</HTTPPort>  
     <HTTPSPort>443</HTTPSPort>  
     <OriginProtocolPolicy>match-viewer</OriginProtocolPolicy>  
   </CustomOrigin>  
   <CallerReference>1294874303</CallerReference>  
   <CNAME>assets0.bubbaganoush.com</CNAME>  
   <Enabled>true</Enabled>  
  </DistributionConfig>   

Because some browsers have a limit for how many simultaneous requests can go to one domain (further reading here), we will actually create a total of four distributions. Run the Perl script once for each, making sure to increment the CallerReference number and CNAME number. Lastly, I created CNAME records for each of these distributions: assets0.bubbaganoush.com, assets1.bubbaganoush.com, assets2.bubbaganoush.com, and assets3.bubbaganoush.com.
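
To avoid hand-editing the XML four times, something like the following could generate the four request bodies. It is only a sketch: `distribution_config` and `BASE_REFERENCE` are names I made up, the values are copied from the example above, and it builds the XML strings without sending anything to the API.

```ruby
# CallerReference of the first distribution, from the example above.
BASE_REFERENCE = 1294874303

# Build the DistributionConfig XML for asset host number `index`,
# bumping both the CallerReference and the CNAME.
def distribution_config(index)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-11-01/">
      <CustomOrigin>
        <DNSName>www.bubbaganoush.com</DNSName>
        <HTTPPort>80</HTTPPort>
        <HTTPSPort>443</HTTPSPort>
        <OriginProtocolPolicy>match-viewer</OriginProtocolPolicy>
      </CustomOrigin>
      <CallerReference>#{BASE_REFERENCE + index}</CallerReference>
      <CNAME>assets#{index}.bubbaganoush.com</CNAME>
      <Enabled>true</Enabled>
    </DistributionConfig>
  XML
end

configs = (0..3).map { |i| distribution_config(i) }
```

Each element of `configs` can then be written to a file and handed to the Perl script from the linked article.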

2) Rails Configuration

Update: Carl pointed out that the asset_host directive will not work for HTTPS requests; for HTTPS requests we must point directly to a real hostname for which we have registered an SSL certificate. This has been corrected in the snippet below.


Now all we have to do is configure the Rails application to route asset requests to CloudFront via the config.action_controller.asset_host directive. In production.rb,

 config.action_controller.asset_host = Proc.new { |source, request|
  if request.ssl?
    "https://jekdi56jkdlkje787.cloudfront.net"  # you must have SSL cert for this domain!
  else
    "http://assets#{source.hash % 4}.bubbaganoush.com"  
  end
 }  

The #{source.hash % 4} component spreads the requests among the four distributions that were created in step 1.
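
One caveat worth knowing: on Ruby 1.9 and later, String#hash is seeded per process, so the same asset can land on a different host from one app process to the next, which undercuts caching. A possible workaround, sketched here with a made-up helper name, is to use a checksum that is stable everywhere, such as CRC32 from the standard library:

```ruby
require "zlib"

ASSET_HOST_COUNT = 4 # matches the four distributions from step 1

# Map an asset path to a host number that is the same in every
# process and on every server.
def asset_host_index(source)
  Zlib.crc32(source) % ASSET_HOST_COUNT
end
```

In the proc above, the non-SSL branch would then read "http://assets#{asset_host_index(source)}.bubbaganoush.com".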

3) Extra credit - Asset Versioning
Once CloudFront retrieves an asset, it will store it for 24 hours. If you update a file but don't change the name, CloudFront won't know to check for an updated version. What to do? Rails normally handles this by appending a version number as a query parameter to the asset request, as in

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/images/stun_gun.png?75783847" />  

Sounds great, except CloudFront ignores query parameters and therefore all version requests will look the same. No problem, we just have to make the version number a part of the URL, which we can do via the config.action_controller.asset_path directive in production.rb.

 config.action_controller.asset_path = proc { |asset_path|  
  "/rel-#{RELEASE_NUMBER}#{asset_path}"  
 }  

There's probably a better way to do this, but I calculate RELEASE_NUMBER from the release path by putting the following in environment.rb:

 RELEASE_NUMBER = Dir.pwd.gsub(/.*\//,'')  
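
A slightly tidier way to get the same value might be File.basename. The helper name and the example path below are made up, and this assumes the app runs from a release directory whose name is all digits (a timestamped release folder, for example), since the rewrite rule further down only matches digits.

```ruby
# Return the last component of the release path, e.g. the timestamped
# directory name of the current release.
def release_number(release_path)
  File.basename(release_path)
end

# In environment.rb this would become:
#   RELEASE_NUMBER = release_number(Dir.pwd)
```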

Now the asset URLs will look like:

 <img alt="Stun Gun" src="http://assets0.bubbaganoush.com/rel-75783847/images/stun_gun.png" />  

And we get the same versioned effect. With each deploy, all static assets will have a new unique URL which will force CloudFront to get the latest version. But not so fast, you say, how will the app know what to do with that URL? Via an Apache RewriteRule, of course. Create a rule in your vhost conf to strip out the version part of the URL, since the Rails app won't need it to serve the latest version of the asset.

 RewriteRule ^/rel-\d+/(images|javascripts|stylesheets)/(.*)$ /$1/$2 [L]  
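
To sanity-check the pattern without restarting Apache, the same rewrite can be mirrored in a few lines of Ruby. This is a sketch for testing only, with made-up names; Apache still does the real work.

```ruby
# Same pattern as the RewriteRule: /rel-<digits>/<asset dir>/<rest>
VERSIONED_ASSET = %r{\A/rel-\d+/(images|javascripts|stylesheets)/(.*)\z}

# Strip the release prefix from a versioned asset path; leave any
# other path untouched.
def strip_release_prefix(path)
  match = VERSIONED_ASSET.match(path)
  match ? "/#{match[1]}/#{match[2]}" : path
end
```

Note that the rule only matches when the release part is all digits, so a non-numeric RELEASE_NUMBER would fall through unrewritten.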

And that's it! We now have a no-fuss CDN layer on top of a Rails stack. It has been set, and now we can forget.

Sunday, June 27, 2010

Fragconteur QOTD Shows Added: Simpsons and Seinfeld

Fragconteur Update.

Maybe it wasn't wise to start with what some have called "The most underrated show of all time." The Wire is certainly a phenomenal achievement, but hardly anybody watched the damned thing, so that kind of limits the audience for a Twitter mishmash.

So how about a couple of shows that people have seen: The Simpsons and Seinfeld.





 




I've staggered the posting times for each show, so if you sign up for all three you'll get a little sprinkle of each throughout the day.  There are so many quotes from The Simpsons that one will go out in the morning and a second in the evening.  Enjoy.

I Must be an Adult

Times sure have changed.  I walked by this treasure the other day and didn't even blink.



Used to be the only thought that would cross through my mind would be "Can I carry these both home on my back, or should I make two trips?"  Now it's "Ooh hey, I should take a picture of this so I can blog about it."

Impressive Early Growth for Fragconteur

The entire Fragconteur team is pleased to announce an impressive stat from our early days:  Our membership has already doubled after just one month in existence, going from 1 user to 2.  Take a look at this chart here:


So how long until we catch up to the big boys?  Quantcast estimates Google's monthly traffic at 159.4 Million users.  That means that Fragconteur needs to double in size only about 26 more times.  At our current growth rate, that should happen in approximately August of 2012.  Scared, Schmidt?

Stay tuned for an upcoming release announcement for Fragconteur.  I can tell you that both users are very excited for these new features.

The host, she has a-moved



It's time again to update those links.  This blog was hosted at one place, now it's hosted at another place.  I'll change the domain name over a little later today, but anyone who subscribed to the RSS feed will need to re-RSS.

The Wire Quote-of-the-Day, powered by Fragconteur


Introducing: The Wire (Fragmented) Quote of the Day

Just about anyone who has seen HBO's The Wire will tell you that it's the best TV show ever made. I heard this claim a few times before ever watching it and thought it an absurd thing to say. Everyone was right, of course, and I now count myself among the Wire evangelists. I'll save my glowing reviews for another post; this is about a tool that needed to exist and may be of interest to other fans out there.

The Wire is eminently quotable, and it occurred to me that I would enjoy a daily quote from the show in my inbox or Twitter feed. Apart from a couple of YouTube compilations and a Wikiquote page, there was nothing available to give me what I wanted, so I explored the idea of writing something to post quotes on Twitter. This would be something far more advanced than any quote-of-the-day application - instead of attribution via byline, the lines needed to be "spoken" by the characters, which would mean tweets appearing to come from separate sources. Furthermore, most of the quotes would be a couple of lines back and forth between characters, which meant building something to take in lines of dialogue as input and post them in a pre-determined order. Lastly, a scripting interface would be needed to write these out ahead of time and let the tool pick a random quote every day to post.

Enter Fragconteur. Fragconteur is a new storytelling tool where dialogue can be drafted and scheduled by a storyteller to be spoken across the internet in the form of Twitter updates, Facebook posts, blog entries, etc. The first use is to facilitate my Wire obsession, but in the hands of a talented craftsman, the storytelling possibilities are endless.

Here's how it works:

  • Step 1: Create a story, characters, and dialogue.
    Story Tool Admin

  • Step 2: "Schedule" the delivery of lines. Choose Where (Twitter, Facebook, etc.) and When each line should be spoken.

  • Step 3: Watch the dialogue unfold across the internet!
    Twitter screenshot
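
For the curious, the three steps above could be modeled roughly like this. It is a sketch, not Fragconteur's actual code: the struct names, the per-line delay field, and both methods are assumptions made for illustration, and nothing here talks to Twitter.

```ruby
# One scripted line of dialogue, spoken `delay_seconds` after the
# previous line in the same exchange.
Line = Struct.new(:character, :text, :delay_seconds)

# One outgoing update: which account speaks, what it says, and when.
Post = Struct.new(:account, :text, :post_at)

# Turn a scripted exchange into timed posts, preserving line order.
def schedule(lines, start_time)
  offset = 0
  lines.map do |line|
    offset += line.delay_seconds
    Post.new(line.character, line.text, start_time + offset)
  end
end

# Pick the exchange to post today; pass a seeded Random for
# repeatable picks.
def quote_of_the_day(quotes, rng = Random.new)
  quotes.sample(random: rng)
end
```

Each Post would then be handed to whatever client publishes it for that character's account.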

So give a follow to the Twitter list to get your daily Wire fix. Or if you're interested in writing your own stories, send me an email to get an account.

Fragconteur | Fragment the Conversation.