Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’s see what each of these pieces gives us:

Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
We’re replacing MongoDB with KairosDB, a time-series database built on Cassandra. This gives us linear scalability, tunable replication, and high write throughput.
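To make the pre-aggregation step concrete, here is a minimal Python sketch of the idea. It is not the actual Storm topology: the event shape `(metric, timestamp_ms, value)`, the one-minute bucket size, and the names `bucket_start` and `pre_aggregate` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bucket_start(timestamp_ms, bucket_ms=60_000):
    """Round a millisecond timestamp down to the start of its bucket."""
    return timestamp_ms - (timestamp_ms % bucket_ms)


def pre_aggregate(events, bucket_ms=60_000):
    """Sum raw (metric, timestamp_ms, value) events per (metric, bucket).

    Hypothetical helper: many raw events collapse into one total per
    metric and time bucket, so the database sees far fewer writes.
    """
    totals = defaultdict(int)
    for metric, timestamp_ms, value in events:
        key = (metric, bucket_start(timestamp_ms, bucket_ms))
        totals[key] += value
    return dict(totals)


# Four raw events: three fall in one minute, the fourth in the next.
raw_events = [
    ("page.views", 1_401_148_800_123, 1),
    ("page.views", 1_401_148_815_456, 1),
    ("page.views", 1_401_148_859_999, 1),
    ("page.views", 1_401_148_860_000, 1),
]

aggregated = pre_aggregate(raw_events)
for (metric, start_ms), total in sorted(aggregated.items()):
    print(metric, start_ms, total)
```

Here four raw events become two writes. At real volumes the ratio is much larger, which is where the savings in write capacity would come from.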

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and a lack of atomic upsert operations, which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
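To see why the lack of an atomic upsert makes counting hard, consider this toy Python simulation. It assumes a store in which a second write to the same metric and timestamp simply replaces the first. `OverwritingStore` is a hypothetical in-memory stand-in, not the KairosDB client API, and the metric name and counts are made up.

```python
class OverwritingStore:
    """Toy store: writing to an existing (metric, timestamp) replaces the value."""

    def __init__(self):
        self.points = {}

    def put(self, metric, timestamp_ms, value):
        # No atomic increment is available, only "set this value".
        self.points[(metric, timestamp_ms)] = value


def write_independently(store, partial_counts, metric, timestamp_ms):
    """Each worker writes its own partial count; the last write wins."""
    for count in partial_counts:
        store.put(metric, timestamp_ms, count)


def write_after_combining(store, partial_counts, metric, timestamp_ms):
    """Combine the partial counts first, then issue a single write."""
    store.put(metric, timestamp_ms, sum(partial_counts))


METRIC = "signups"                 # hypothetical metric name
TIMESTAMP_MS = 1_401_148_800_000   # one shared time bucket
partials = [40, 25, 35]            # counts seen by three separate workers

naive = OverwritingStore()
write_independently(naive, partials, METRIC, TIMESTAMP_MS)

combined = OverwritingStore()
write_after_combining(combined, partials, METRIC, TIMESTAMP_MS)

print("independent writes:", naive.points[(METRIC, TIMESTAMP_MS)])
print("combined write:   ", combined.points[(METRIC, TIMESTAMP_MS)])
```

Under these assumptions the independent writers lose 65 of the 100 events, while the single combined write keeps the full count. That is one reason to route all events for a given metric and time bucket through one aggregator before they reach the database.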

If you want a peek into how these pieces fit together, peep the slides.

This is still a work-in-progress, so if you have any ideas or suggestions for more effectively tuning or scaling this system, hit me up! 🙂

How does your company handle high-volume time-series data?

Posted in Tutorials.

By codyaray – May 27, 2014
