Skip to content


Simplify Deployment with Infrastructure Manifest (Part 1)

This is Part 1 in a short series about using a Manifest of your infrastructure for automation.

  • Part 1: Build the Infrastructure Manifest
  • Part 2: Manifest-Based Application Deployment

At the last few DevOps conferences I’ve attended, the lunch-time discussion have revolved around tying your test, build, and deploy workflows to your cloud infrastructure. A lot of people are trying to bend tools like Chef for this purpose and are generally unhappy with the result.

After a lot of trial and error, the strategy that we currently use at Signal is to

  • completely specify your infrastructure definitions in a simple JSON manifest; and
  • use your Cloud API(s) to transform this functional definition into a working Manifest which details all hosts in your infrastructure and which applications they run

Once we have the Manifest, Continued…

Posted in Tutorials.


Mongo Multi-Key Index Performance

tl;dr – Mongo multi-key indexes don’t scale well with high-cardinality arrays. This is the same bad behavior that KairosDB is experiencing with its own indexes.

This post explores a technique for using MongoDB as a general-purpose time-series database. It was investigated as a possible temporary workaround until KairosDB adds support for high-cardinality tags. In particular, this describes using a MongoDB multi-key index for associating arbitrary key-value metadata with time-series metrics in MongoDB and the associated performance problems.

Motivation

We’ve been using KairosDB in production for close to a year now. KairosDB is a general-purpose time-series database built on Cassandra. Each “metric” consists of a name, value, and a set of associated “tags” (key-value metadata). This metadata is extremely useful as it provides structured metadata for slicing, filtering, and grouping the stats.

The main issue restricting us from adopting it more widely is its poor support for high-cardinality tags; that is, tag keys with a large number of distinct values, such as IP addresses or other unique identifiers. Unfortunately, these types of values are also a prime use case for tags in the first place. You can read all about this issue on the KairosDB user group, as its one of the most well-known issues currently. A few months ago I gave a presentation on in Building a Scalable Distributed Stats System which describes a work-around for this issue when there’s a small number of high-cardinality tag keys.

However, the new use case requires a set of high-cardinality keys which is dynamic and unknown a priori. Since the KairosDB team is looking into fixing this issue but hasn’t actually resolved it, I wanted to investigate whether we could use MongoDB temporarily as a backing store behind the Kairos API. Continued…

Posted in Tutorials.


Libertarians? Greens? Lock ’em Out!

The 2014 midterm elections seem to be bigger than prior years with more ads, robo-calls, and social media posts. During this turmoil, I learned a number of new things about the leading political parties that disgust me. At the top of this list was the Republicans’ and Democrats’ efforts at controlling access to the ballot.

The idea of “ballot access” control is that third parties will undercut votes from the “Big 2″ parties. Specifically, the belief is that a vote for the Libertarian party is likely one less vote for the Republicans; likewise, the Democrats could lose votes to Green candidates. So, the story goes, its in the best interest of the two predominant parties to restrict other parties from being present on the ballot at all.

Continued…

Posted in Commentary.


Keep Out The Vote

In the chaos leading up to Election Day on Tuesday, we’ve all been inundated with Get Out The Vote messages from both parties.

This is supposed to be the parties’ way of encouraging citizens’ active participation in our great democratic society. So when Pretty Nerd and I got a call from one of Rauner’s people, we were initially pleasant and politely informed them that we were already voting, though not for Rauner. Imagine our surprise when, lo and behold, Rauner’s campaign caller responded with “just don’t go to the polls then.” She repeated this statement Continued…

Posted in Commentary.


Custom JMeter Samplers and Config Elements

tl;dr – Writing custom JMeter plugins doesn’t have to be complicated. This tutorial describes the process of developing a custom Sampler and Config Element. We develop a Kafka Producer Sampler and example Synthetic Load Generator Config Element. If you just want to send messages from JMeter to Kafka or see an example of generating synthetic traffic, you can go straight to the source.

So you want to load test a non-HTTP system. At first, you don’t think your favorite load testing tool, JMeter, will be of any help. But you remember that its open source and supposedly extensible. Let’s see if we can do this.

For my use case, I wanted a simple way to load test a system which reads its requests from Kafka. This has two requirements:

  1. read or generate synthetic requests (messages)
  2. publish the messages to a Kafka topic

For step 1, if I wanted to pre-generate all the requests, I could use the CSV Data Set Config to read them into JMeter. However, this would require generating a sufficiently-large request set for each test scenario. I preferred to let JMeter generate the actual request from a simple configuration describing the traffic distribution. This configuration could also be generated from real data to effectively simulate the shape of the data coming into the system. Thus, step 1 required development of a new “Config Element” in JMeter.

For step 2, there was no existing option for sending data to Kafka. But now you have one, so just use the Kafka Producer Sampler from kafkameter.

Let’s dig in.

Continued…

Posted in Tutorials.


Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’ see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
  • We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.

If you want a peek into how these pieces fit together, peep the slides.

Continued…

Posted in Tutorials.


Should I max out my 401(k) or pay down student loans?

This is a common question new graduates ask. Although I graduated two years ago, I didn’t really run the numbers until recently… and boy am I disappointed in past-Cody for not doing this sooner.

The spreadsheet I used to answer this question for myself is below (but with fake numbers :-). Punch in your own numbers to see how much money you can save by increasing your 401(k) contributions. Now that I know better, I’m saving an extra $3,000 each year by maxing out my 401(k). How much can you save?

Continued…

Posted in Personal Finance, Tutorials.


Hacking Twitter Competitions: Automatically Tracking Followers Count

Just before Christmas, a Chicago Food Truck decided to give away free sandwiches for a year to their 1000th Twitter follower.

Cheesies_Truck Competition Tweet

You know I had to try.

After checking it a few times over a 15 minute period or so, I noticed that the follower count was increasing very slowly. I knew I wouldn’t have the diligence to continue checking, so I decided to write a script that would do the check and notify me every ten minutes or so. Since I’m on a Mac, I decided to use Growl for these notifications.

In this post, I’ll walk you through how to automatically check a Twitter user’s follower count and get a Growl notification periodically.

What You Need Continued…

Posted in Tutorials.


Add to Goodreads from Amazon

Tired of splitting your reading wish-list between Amazon and GoodReads? Me too. Here’s an “Add to GoodReads” bookmarklet. Just highlight the code and drag it to your bookmark bar. You might have to right-click->Edit to give it a title like “Add to GoodReads”. This should work from Amazon product detail pages where you would otherwise click “Add to Wish List”.

Instead of adding books to your Amazon Wish List, you can now add them to Goodreads instead. Yay!

Happy reading!

Posted in Tutorials.


Dependency Injection in Sinatra

Dependency injection (DI) is a very common development practice in many languages, but its never been huge in Ruby. Part of that is because Ruby is dynamic enough that it doesn’t really need dependency injection like, say, Java. But I argue that Ruby can greatly benefit from DI. Do you use a singleton configuration object? Or worse, other singleton objects, especially those with mutable state?

def some_method(*args)
  foo = MyApp.configuration.foo
end

Mutable singletons have ripple effects across the app and make it very difficult (and scary) to evolve. Even mostly-read configuration objects introduce tight and often invisible/forgotten coupling between objects. Continued…

Posted in Tutorials.




Log in here!