Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.
Let’s see what each of these pieces gives us:
- Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
- We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
- Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
- We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.
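To make the pre-aggregation step concrete, here is a minimal sketch in plain Python of the kind of rollup that happens before the database write. It is not our Storm topology: the `pre_aggregate` function, the event tuple layout, and the one-minute bucket size are all assumptions made for illustration. The point is only that many raw events collapse into one data point per metric per time bucket.

```python
from collections import defaultdict


def pre_aggregate(events, bucket_ms=60_000):
    """Sum raw (metric, timestamp_ms, value) events into fixed time buckets.

    Hypothetical helper for illustration; the bucket size is an assumption.
    """
    totals = defaultdict(int)
    for metric, timestamp_ms, value in events:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp_ms - (timestamp_ms % bucket_ms)
        totals[(metric, bucket_start)] += value
    # One data point per (metric, bucket) instead of one per raw event.
    return sorted(
        (metric, start, total) for (metric, start), total in totals.items()
    )


# Made-up sample events: (metric name, timestamp in ms, count).
raw_events = [
    ("page.views", 1_400_000_000_123, 1),
    ("page.views", 1_400_000_015_456, 1),
    ("page.views", 1_400_000_061_789, 1),
    ("clicks", 1_400_000_002_000, 1),
]

data_points = pre_aggregate(raw_events)
for metric, bucket_start, total in data_points:
    print(metric, bucket_start, total)
```

Here four raw events become three writes. With real traffic the ratio is far larger, which is where the savings in write capacity come from.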
Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.
Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and a lack of atomic upsert operations, both of which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
If you want a peek into how these pieces fit together, peep the slides.
Libertarians? Greens? Lock ’em Out!
The 2014 midterm elections seem bigger than those of prior years, with more ads, robo-calls, and social media posts. During this turmoil, I learned a number of new things about the leading political parties that disgust me. At the top of this list was the Republicans’ and Democrats’ efforts at controlling access to the ballot.
The idea of “ballot access” control is that third parties will undercut votes from the “Big 2” parties. Specifically, the belief is that a vote for the Libertarian party is likely one less vote for the Republicans; likewise, the Democrats could lose votes to Green candidates. So, the story goes, it’s in the best interest of the two predominant parties to restrict other parties from being present on the ballot at all.
Posted in Commentary.
By codyaray – November 4, 2014