
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’s see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka as a queue between our stats producers and consumers. This makes the system more reliable and robust, and it makes new features easier to build because they can be added as additional stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
  • We’re replacing MongoDB with KairosDB, a time-series database built on Cassandra. This gives us linear scalability, tunable replication, and impressive write throughput.
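To make the pre-aggregation step more concrete, here is a minimal Ruby sketch of the rollup idea. It is not our actual Storm topology. The event fields (`metric`, `tags`, `timestamp_ms`, `value`), the 60-second bucket, and the `rollup` helper are all hypothetical. Because KairosDB lacks an atomic upsert, two writers cannot safely increment the same data point, so counts are summed upstream and each (metric, tags, bucket) combination is written once instead of once per raw event.

```ruby
# Hypothetical sketch: event shape and bucket size are illustrative assumptions.
BUCKET_MS = 60_000

# Round a millisecond timestamp down to the start of its bucket.
def bucket_start(timestamp_ms, bucket_ms = BUCKET_MS)
  timestamp_ms - (timestamp_ms % bucket_ms)
end

# Collapse raw events into one summed data point per (metric, tags, bucket),
# so the database sees one write per key instead of one write per event.
def rollup(events, bucket_ms = BUCKET_MS)
  totals = Hash.new(0)
  events.each do |e|
    key = [e[:metric], e[:tags], bucket_start(e[:timestamp_ms], bucket_ms)]
    totals[key] += e[:value]
  end
  totals.map do |(metric, tags, ts), sum|
    { metric: metric, tags: tags, timestamp_ms: ts, value: sum }
  end
end

events = [
  { metric: "page.views", tags: { "site" => "a" }, timestamp_ms: 1_400_000_000_123, value: 1 },
  { metric: "page.views", tags: { "site" => "a" }, timestamp_ms: 1_400_000_030_456, value: 1 },
  { metric: "page.views", tags: { "site" => "b" }, timestamp_ms: 1_400_000_010_000, value: 1 }
]

points = rollup(events) # three raw events become two data points
```

In a real topology, logic like this could live in a bolt, with a fields grouping on metric and tags so that a single bolt instance owns each key. Deciding when a bucket is closed and safe to flush is left out here for brevity.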

Last week, I discussed the last two components in this pipeline, Storm and KairosDB, at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, it has some limitations: millisecond time granularity and a lack of atomic upsert operations, which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.

If you want a peek into how these pieces fit together, peep the slides.


Posted in Tutorials.

Should I max out my 401(k) or pay down student loans?

This is a common question new graduates ask. Although I graduated two years ago, I didn’t really run the numbers until recently… and boy, am I disappointed in past-Cody for not doing this sooner.

The spreadsheet I used to answer this question for myself is below (but with fake numbers :-). Punch in your own numbers to see how much money you can save by increasing your 401(k) contributions. Now that I know better, I’m saving an extra $3,000 each year by maxing out my 401(k). How much can you save?


Posted in Tutorials.

Hacking Twitter Competitions: Automatically Tracking Followers Count

Just before Christmas, a Chicago food truck decided to give away free sandwiches for a year to its 1000th Twitter follower.

Cheesies_Truck Competition Tweet

You know I had to try.

After checking a few times over a 15-minute period or so, I noticed that the follower count was increasing very slowly. I knew I wouldn’t have the diligence to keep checking, so I decided to write a script that would do the check and notify me every ten minutes or so. Since I’m on a Mac, I decided to use Growl for these notifications.

In this post, I’ll walk you through how to automatically check a Twitter user’s follower count and get a Growl notification periodically.
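The core of such a script is a simple polling loop. Below is a minimal Ruby sketch, not the code from the full post: the `FollowerWatcher` class is hypothetical, and how the follower count is fetched and how the notification is displayed are injected as lambdas, since the post’s Twitter and Growl code isn’t shown in this excerpt.

```ruby
# Hypothetical polling sketch; fetch_count and notify are injected stand-ins
# for the real Twitter lookup and Growl notification.
class FollowerWatcher
  def initialize(target:, fetch_count:, notify:, interval_seconds: 600,
                 sleeper: ->(seconds) { sleep(seconds) })
    @target = target
    @fetch_count = fetch_count
    @notify = notify
    @interval_seconds = interval_seconds # ten minutes by default
    @sleeper = sleeper
  end

  # Fetch once and send a notification; returns true once the target is reached.
  def check
    count = @fetch_count.call
    remaining = @target - count
    if remaining <= 0
      @notify.call("Target reached: #{count} followers")
      true
    else
      @notify.call("#{count} followers (#{remaining} to go)")
      false
    end
  end

  # Poll until the target is reached or max_checks runs out.
  def run(max_checks: Float::INFINITY)
    checks = 0
    while checks < max_checks
      checks += 1
      return true if check
      @sleeper.call(@interval_seconds)
    end
    false
  end
end
```

Assuming Growl’s `growlnotify` command-line tool is installed, the `notify` lambda could be something like `->(msg) { system("growlnotify", "-m", msg, "Followers") }`. Injecting the clock-like `sleeper` keeps the loop testable without actually waiting ten minutes.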

What You Need Continued…

Posted in Tutorials.

Add to Goodreads from Amazon

Tired of splitting your reading wish list between Amazon and Goodreads? Me too. Here’s an “Add to Goodreads” bookmarklet. Just highlight the code and drag it to your bookmark bar. You might have to right-click->Edit to give it a title like “Add to Goodreads”. This should work from Amazon product detail pages where you would otherwise click “Add to Wish List”.

Instead of adding books to your Amazon Wish List, you can now add them to Goodreads. Yay!
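The bookmarklet itself isn’t reproduced in this excerpt, and a real bookmarklet would be JavaScript. As a hedged illustration of one possible lookup step only, this Ruby sketch pulls the 10-character ASIN out of an Amazon product URL (from the common `/dp/` or `/gp/product/` path forms) and builds a Goodreads search URL for it. For many print books the ASIN is the ISBN-10, which Goodreads search accepts. The helper names are hypothetical, and this is not necessarily how the original bookmarklet works.

```ruby
require "uri"

# Amazon product paths commonly carry the ASIN after /dp/ or /gp/product/.
ASIN_PATTERN = %r{/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|\z)}

# Hypothetical helper: returns the ASIN string, or nil if none is found.
def asin_from_amazon_url(url)
  match = ASIN_PATTERN.match(URI.parse(url).path) # path excludes the query string
  match && match[1]
end

# Hypothetical helper: builds a Goodreads search URL for the given query.
def goodreads_search_url(query)
  "https://www.goodreads.com/search?q=#{URI.encode_www_form_component(query)}"
end
```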

Happy reading!

Posted in Tutorials.

Dependency Injection in Sinatra

Dependency injection (DI) is a very common development practice in many languages, but it’s never been huge in Ruby. Part of that is because Ruby is dynamic enough that it doesn’t really need dependency injection the way, say, Java does. But I’d argue that Ruby can greatly benefit from DI. Do you use a singleton configuration object? Or worse, other singleton objects, especially ones with mutable state?

def some_method(*args)
  foo =

Mutable singletons have ripple effects across the app and make it very difficult (and scary) to evolve. Even mostly-read configuration objects introduce tight and often invisible/forgotten coupling between objects. Continued…
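As a rough illustration of the coupling described above, here is a minimal plain-Ruby sketch contrasting a method that reaches into a global configuration object with one that receives its dependency through the constructor. All names (`AppConfig`, `SingletonReportMailer`, `ReportMailer`, `smtp_host`) are hypothetical.

```ruby
# Hypothetical global configuration object, read from anywhere in the app.
module AppConfig
  SETTINGS = { smtp_host: "smtp.example.com" }.freeze

  def self.[](key)
    SETTINGS.fetch(key)
  end
end

# Singleton style: the dependency is hidden inside the method body.
class SingletonReportMailer
  def deliver(report)
    "sending #{report} via #{AppConfig[:smtp_host]}" # invisible coupling
  end
end

# Injected style: the dependency is an explicit constructor argument,
# so callers (and tests) decide what the mailer talks to.
class ReportMailer
  def initialize(smtp_host:)
    @smtp_host = smtp_host
  end

  def deliver(report)
    "sending #{report} via #{@smtp_host}"
  end
end
```

In a Sinatra app, one option is to build the injected object once at boot, register it with `set :mailer, ReportMailer.new(smtp_host: ...)`, and use `settings.mailer` inside routes, so the wiring lives in one visible place.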

Posted in Tutorials.

Book Sprint Interview

A few days ago Dave Wendland at BrightTag interviewed me about the Book Sprint process. Although I didn’t mention it in the announcement, this is the process used to co-author Developing an iOS 7 Edge in a weekend.

What is a Book Sprint, and how did you find out about it?

“I didn’t know about Book Sprints until Troy Mott from Bleeding Edge Press explained the concept to me. It’s when Continued…

Posted in News.

First Book: Developing an iOS 7 Edge

If you know any iOS developers looking to learn more about all the sweet new iOS 7 stuff, I can personally recommend this book since I co-authored it :)

The summary reads:

Many of the features added to iOS 6 were incremental updates over iOS 5. This is not the case for iOS 7. Apple’s release of iOS 7 brought substantial improvements for both applications and application developers. This book attempts to highlight the features that will be most widely applicable, including upgrading from iOS 6 to iOS 7, making apps more accessible, refreshing content in the background, using the new transition and physics-based animations, building on the new maps APIs, and enhancing your development workflow with the new build and testing improvements. The introduction of iOS 7 is set to change the way users think about native applications as well as how developers think about building them.

Don’t take my word for it. Check out all the positive reviews it’s already getting on Twitter. Continued…

Posted in News.

Lesson Learned: Circuit Breakers

I just finished reading Release It! by Michael T. Nygard. Unfortunately, I didn’t learn about circuit breakers until the app featured in the “Intro to Streams” series (part 1, part 2) was complete. Let’s walk through the streaming example again and add a circuit breaker to protect the integration point. Continued…
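For readers new to the pattern, here is a minimal Ruby sketch of a circuit breaker in the general closed/open/half-open style that Release It! describes. It is not the code from the full post. The `CircuitBreaker` class, its thresholds, and the injectable clock are hypothetical choices for illustration.

```ruby
# Raised when calls are rejected because the circuit is open.
class CircuitOpenError < StandardError; end

# Hypothetical sketch: closed -> open after repeated failures -> half-open
# after a timeout, allowing one trial call through.
class CircuitBreaker
  attr_reader :state

  def initialize(failure_threshold: 3, reset_timeout: 30.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @reset_timeout = reset_timeout # seconds to stay open before a trial call
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  # Run the block through the breaker; fail fast while the circuit is open.
  def call
    if @state == :open
      raise CircuitOpenError, "circuit open" if @clock.call - @opened_at < @reset_timeout
      @state = :half_open # timeout elapsed: let one trial call through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  private

  # A failed trial call, or too many failures while closed, opens the circuit.
  def record_failure
    @failures += 1
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end

  # Any success closes the circuit and clears the failure count.
  def reset
    @failures = 0
    @state = :closed
  end
end
```

Wrapping the integration point as `breaker.call { fetch_from_stream }` (where `fetch_from_stream` is a hypothetical stand-in) means that once the remote side is clearly down, callers get an immediate `CircuitOpenError` instead of piling up on slow timeouts.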

Posted in Tutorials.

Greenfoot: Teaching Java to 5th Graders

Once upon a time, I would have said that it’s impossible to teach 5th graders to program in Java. Even the most basic hello world requires exposure to complex concepts: the print statement must be wrapped in a method with very specific modifiers and parameters, which in turn must be wrapped in a class and compiled. Enter Greenfoot.

When helping to teach a class for the Northwestern CTD weekend program, I was introduced to Greenfoot for teaching and learning Java. After my first day of class, I was so inspired by the educational possibilities of Greenfoot that I wrote a little Breakout clone to show the kids the next day what they could do with Greenfoot.

Rather than using the classic programming education sequence, from hello world to user input, string manipulation, file I/O, and so on, Greenfoot instructors Continued…

Posted in Education.

Cocoapod lint error: [xcodebuild] No such file or directory

I just started learning how to write my own Cocoapod yesterday. There’s a great tutorial to get you started. However, I ran into an issue when trying to lint my new spec. It looked like this:

$ pod spec lint MyAwesomeLibrary.podspec
 -> MyAwesomeLibrary (0.0.1)
    - ERROR | [iOS] [xcodebuild]  2013-05-29 23:03:08.370 xcodebuild[91499:3f03] error: Error Domain=NSPOSIXErrorDomain Code=2 "Non-zero exit code 255 returned from shell command: /usr/bin/gcc-4.2 -v -E -dM -arch armv7 -isysroot /Applications/ -x objective-c -c /dev/null 2>&1" UserInfo=0x4001c4e60 {NSLocalizedDescription=Non-zero exit code 255 returned from shell command: /usr/bin/gcc-4.2 -v -E -dM -arch armv7 -isysroot /Applications/ -x objective-c -c /dev/null 2>&1, NSLocalizedFailureReason=No such file or directory}
    - ERROR | [iOS] [xcodebuild]  2013-05-29 23:03:08.457 xcodebuild[91499:3f03] error: Error Domain=NSPOSIXErrorDomain Code=2 "Non-zero exit code 1 returned from shell command: /usr/bin/gcc-4.2 -v -E -dM -arch armv7s -isysroot /Applications/ -x objective-c -c /dev/null 2>&1" UserInfo=0x4018946a0 {NSLocalizedDescription=Non-zero exit code 1 returned from shell command: /usr/bin/gcc-4.2 -v -E -dM -arch armv7s -isysroot /Applications/ -x objective-c -c /dev/null 2>&1, NSLocalizedFailureReason=No such file or directory}
Analyzed 1 podspec.
[!] The spec did not pass validation.


Posted in Tutorials.
