Skip to content

Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’ see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
  • We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.

If you want a peek into how these pieces fit together, peep the slides.

View Larger (and Presenter Notes).

This is still a work-in-progress, so if you have any ideas or suggestions for more effectively tuning or scaling this system, hit me up! :)

How does your company handle high-volume time-series data?

Posted in Tutorials.

One Response

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

Continuing the Discussion

  1. Mongo Multi-Key Index Performance – Cody A. Ray linked to this post on November 7, 2014

    […] as its one of the most well-known issues currently. A few months ago I gave a presentation on in Building a Scalable Distributed Stats System which describes a work-around for this issue when there’s a small number […]

Some HTML is OK

or, reply to this post via trackback.


Log in here!