12 Factor Microservices on Docker

Of late I’ve been interested in the intersection of containerization and 12factor apps.

This led me to piecing together a small proof-of-concept 12factor microservice on top of Docker. I’ve published this effort as a two-part tutorial on PacktPub.

  1. Introduction and the first 4 Factors
  2. The Remaining 8 Factors and PaaS Preview

My company has slowly started experimenting with containers… now to move our (stateless) apps toward 12factor! :)

Posted in Tutorials.

Auto-Load Jinja2 Macros

Last night I was up hacking away on a side project for work. Some of my favorite “quick wins” involve automating and standardizing our operational infrastructure. This project involved generating HAProxy configs from Jinja templates instead of painstakingly managing each server’s (or server group’s) configuration by hand.

In the latest iteration, I was trying to automatically load a common macro in every template without requiring every template to specify it. This is equivalent to adding

{% from 'macros.txt' import macro1, macro2 with context %}

at the top of every template file. But of course I don’t want to repeat myself everywhere.

This article shows two techniques for achieving this goal, depending on Continued…
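As a preview, here's a minimal sketch of one such technique: loading the macro template once and exposing its macros as globals on the `Environment`, so every template can call them without the explicit `{% from ... import ... %}` line. The template contents here are hypothetical stand-ins (the real project renders HAProxy configs from files on disk), and note this simple version doesn't carry `with context` semantics.

```python
from jinja2 import Environment, DictLoader

# Hypothetical templates; 'macros.txt' defines the shared macros.
templates = {
    "macros.txt": (
        "{% macro backend(name, port) %}"
        "server {{ name }} 127.0.0.1:{{ port }}"
        "{% endmacro %}"
    ),
    "haproxy.cfg": "{{ backend('web1', 8080) }}",
}

env = Environment(loader=DictLoader(templates))

# Expose every public macro from macros.txt as a global, so templates
# can call them without importing anything themselves.
macro_module = env.get_template("macros.txt").module
for name in dir(macro_module):
    if not name.startswith("_"):
        env.globals[name] = getattr(macro_module, name)

print(env.get_template("haproxy.cfg").render())
# -> server web1 127.0.0.1:8080
```

`Template.module` evaluates the template once and exposes its exported names (macros included) as attributes, which is what makes this trick work.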

Posted in Tutorials.

Calculating Cash at Closing

As you might’ve gathered from my post on Analyzing Investment Real Estate a couple months ago, I’m looking to buy an investment property. Well, lo and behold, I’ve had an offer accepted for my first property! That, however, is not the subject of this post. After talking with multiple lenders and having them all spitball me different numbers, I think I finally understand what goes into the cash due at closing (at least in Chicago). Get ready for another spreadsheet!

Download my Cash at Closing Spreadsheet

The cash due at closing is composed of five categories:

  1. Down Payment is the initial equity or ownership that you’ll have in the property. The amount of your down payment determines the type of financing, the interest rates, and many other aspects of your purchase.
  2. Lending Fees include the underwriting fee, appraisal cost, and the first year of insurance. Lenders require insurance to protect their investment. Unlike property taxes which are paid in arrears, Continued…

Posted in Tutorials.

Analyzing Investment Real Estate

Over the last couple months, I’ve been learning about investing in Real Estate. Like most people new to this field, I’m initially focusing on residential rentals. This entails learning what differentiates the good from the bad, how to diversify Real Estate investments, how to evaluate properties, and, most critically, how to analyze their returns. In this post, I’m going to explain the four pillars of Real Estate investment returns, detail some of my favorite measures of investment performance, and share my Rental Analysis Spreadsheet. Feel free to jump straight to the spreadsheet; I won’t stop you. :)

Download my Rental Analysis Spreadsheet

Real Estate presents four different mechanisms for investment returns: Continued…

Posted in Tutorials.

Mongo Multi-Key Index Performance

tl;dr – Mongo multi-key indexes don’t scale well with high-cardinality arrays. This is the same bad behavior that KairosDB is experiencing with its own indexes.

This post explores a technique for using MongoDB as a general-purpose time-series database. It was investigated as a possible temporary workaround until KairosDB adds support for high-cardinality tags. In particular, this describes using a MongoDB multi-key index for associating arbitrary key-value metadata with time-series metrics in MongoDB and the associated performance problems.


We’ve been using KairosDB in production for close to a year now. KairosDB is a general-purpose time-series database built on Cassandra. Each “metric” consists of a name, value, and a set of associated “tags” (key-value metadata). These tags are extremely useful, providing structured dimensions for slicing, filtering, and grouping the stats.

The main issue restricting us from adopting it more widely is its poor support for high-cardinality tags; that is, tag keys with a large number of distinct values, such as IP addresses or other unique identifiers. Unfortunately, these types of values are also a prime use case for tags in the first place. You can read all about this issue on the KairosDB user group, as it’s one of the project’s most well-known issues. A few months ago I gave a presentation, Building a Scalable Distributed Stats System, which describes a workaround for this issue when there’s a small number of high-cardinality tag keys.

However, the new use case requires a set of high-cardinality keys which is dynamic and unknown a priori. Since the KairosDB team is looking into fixing this issue but hasn’t actually resolved it, I wanted to investigate whether we could use MongoDB temporarily as a backing store behind the Kairos API. Continued…
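To see why a multi-key index struggles here, it helps to picture what it stores. This is a toy illustration (plain Python, not MongoDB internals, and the document shape is a hypothetical example): for an index on an array field, Mongo writes one index entry per array element, so every metric document with N tags costs N index entries, and high-cardinality tags make those entries nearly all unique.

```python
# Toy model of a multi-key index on a 'tags' array: one index entry
# per array element, keyed by the element's value.

def multikey_index_entries(doc):
    """Expand one document into the (key, doc_id) entries a multi-key
    index on 'tags' would store."""
    return [(tag, doc["_id"]) for tag in doc["tags"]]

# Hypothetical time-series metric: key-value metadata flattened into an array.
metric = {
    "_id": 1,
    "name": "cpu.load",
    "value": 0.73,
    "tags": ["host=10.0.0.17", "dc=us-east-1", "app=stats-writer"],
}

entries = multikey_index_entries(metric)
print(len(entries))   # one index entry per tag, for every single datapoint
```

With unique identifiers (like that IP address) in the tags, the index grows in lockstep with the raw data and each entry points at only a handful of documents, which is exactly the scaling behavior the tl;dr warns about.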

Posted in Tutorials.

Libertarians? Greens? Lock ’em Out!

The 2014 midterm elections seem to be bigger than prior years with more ads, robo-calls, and social media posts. During this turmoil, I learned a number of new things about the leading political parties that disgust me. At the top of this list was the Republicans’ and Democrats’ efforts at controlling access to the ballot.

The idea of “ballot access” control is that third parties will undercut votes from the “Big 2” parties. Specifically, the belief is that a vote for the Libertarian party is likely one less vote for the Republicans; likewise, the Democrats could lose votes to Green candidates. So, the story goes, it’s in the best interest of the two predominant parties to restrict other parties from being present on the ballot at all.


Posted in Commentary.

Keep Out The Vote

In the chaos leading up to Election Day on Tuesday, we’ve all been inundated with Get Out The Vote messages from both parties.

This is supposed to be the parties’ way of encouraging citizens’ active participation in our great democratic society. So when Pretty Nerd and I got a call from one of Rauner’s people, we were initially pleasant and politely informed them that we were already voting, though not for Rauner. Imagine our surprise when, lo and behold, Rauner’s campaign caller responded with “just don’t go to the polls then.” She repeated this statement Continued…

Posted in Commentary.

Custom JMeter Samplers and Config Elements

tl;dr – Writing custom JMeter plugins doesn’t have to be complicated. This tutorial describes the process of developing a custom Sampler and Config Element. We develop a Kafka Producer Sampler and example Synthetic Load Generator Config Element. If you just want to send messages from JMeter to Kafka or see an example of generating synthetic traffic, you can go straight to the source.

So you want to load test a non-HTTP system. At first, you don’t think your favorite load testing tool, JMeter, will be of any help. But you remember that it’s open source and supposedly extensible. Let’s see if we can do this.

For my use case, I wanted a simple way to load test a system which reads its requests from Kafka. This has two requirements:

  1. read or generate synthetic requests (messages)
  2. publish the messages to a Kafka topic

For step 1, if I wanted to pre-generate all the requests, I could use the CSV Data Set Config to read them into JMeter. However, this would require generating a sufficiently large request set for each test scenario. I preferred to let JMeter generate the actual requests from a simple configuration describing the traffic distribution. This configuration could also be generated from real data to effectively simulate the shape of the data coming into the system. Thus, step 1 required development of a new “Config Element” in JMeter.
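The core of that Config Element is just weighted sampling from a configured distribution. The real plugin is written in Java, but the idea can be sketched in a few lines of Python (the request templates and weights below are hypothetical):

```python
# Generate synthetic requests from a declarative traffic distribution,
# instead of pre-generating a giant CSV of requests.
import random

# Hypothetical traffic shape: request template -> relative weight.
distribution = {
    '{"op": "read"}': 0.8,
    '{"op": "write"}': 0.2,
}

def synthetic_requests(dist, n, seed=42):
    """Draw n requests, each template chosen with probability
    proportional to its weight. Seeded for reproducible test runs."""
    rng = random.Random(seed)
    templates = list(dist)
    weights = [dist[t] for t in templates]
    return [rng.choices(templates, weights=weights)[0] for _ in range(n)]

batch = synthetic_requests(distribution, 1000)
print(batch[0])
```

Because the distribution is just data, it can be derived from production traffic to mirror the real shape of incoming requests, which is the whole point of the Config Element.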

For step 2, there was no existing option for sending data to Kafka. But now you have one, so just use the Kafka Producer Sampler from kafkameter.

Let’s dig in.


Posted in Tutorials.

Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’s see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
  • We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.
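The pre-aggregation step in Storm is worth a closer look, since it's what slashes the write load. Here's a minimal sketch of the idea (plain Python rather than a Storm bolt, with hypothetical metric names): roll raw events up into one-minute buckets keyed by (metric, tags), then flush one datapoint per bucket instead of one write per event.

```python
# Pre-aggregate raw events into per-minute buckets before writing to
# the time-series database, so N events in a minute become one write.
from collections import defaultdict

BUCKET_SECONDS = 60

def bucket(ts):
    """Truncate a unix timestamp down to its minute bucket."""
    return ts - (ts % BUCKET_SECONDS)

class PreAggregator:
    def __init__(self):
        self.sums = defaultdict(float)

    def add(self, metric, tags, ts, value):
        # Sort the tags so equivalent tag sets hash to the same bucket key.
        key = (metric, tuple(sorted(tags.items())), bucket(ts))
        self.sums[key] += value

    def flush(self):
        """Emit one datapoint per (metric, tags, bucket) — this is what
        would actually be written to the database."""
        out = [(m, dict(t), b, v) for (m, t, b), v in self.sums.items()]
        self.sums.clear()
        return out

agg = PreAggregator()
for ts in (100, 110, 119):                 # three raw events, same minute
    agg.add("requests", {"host": "web1"}, ts, 1.0)
print(agg.flush())   # a single aggregated write instead of three
```

In the real pipeline a Storm bolt plays the role of `PreAggregator`, flushing on a timer; the database then sees write traffic proportional to the number of distinct series per minute, not the raw event rate.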

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.

If you want a peek into how these pieces fit together, peep the slides.


Posted in Tutorials.

Should I max out my 401(k) or pay down student loans?

This is a common question new graduates ask. Although I graduated two years ago, I didn’t really run the numbers until recently… and boy am I disappointed in past-Cody for not doing this sooner.

The spreadsheet I used to answer this question for myself is below (but with fake numbers :-). Punch in your own numbers to see how much money you can save by increasing your 401(k) contributions. Now that I know better, I’m saving an extra $3,000 each year by maxing out my 401(k). How much can you save?
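If you'd rather see the intuition than open a spreadsheet, here's a toy version of the comparison (hypothetical rates and amounts, not my real numbers, and ignoring the 401(k)'s pre-tax advantage, which tilts things even further toward the 401(k) when your investment return beats your loan rate):

```python
# Where does an extra $100/month do more work: a 401(k) earning an
# assumed 7%, or a student loan charging an assumed 5%?

def fv_monthly(monthly, annual_rate, years):
    """Future value of a level monthly contribution (ordinary annuity)."""
    r = annual_rate / 12
    n = years * 12
    return monthly * (((1 + r) ** n - 1) / r)

extra = 100.0
in_401k = fv_monthly(extra, 0.07, 10)   # assumed 7% average market return
vs_loan = fv_monthly(extra, 0.05, 10)   # assumed 5% student loan rate

print(f"401(k): ${in_401k:,.0f} vs. loan prepayment value: ${vs_loan:,.0f}")
```

The same dollars compound at whichever rate you point them at, so the gap between the two rates, compounded over a decade, is the prize. Punch your real rates and tax bracket into the spreadsheet for the full picture.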


Posted in Tutorials.
