
Building Logstash Pipelines using @metadata

It’s common for many companies to run multiple applications on a single physical host or virtual machine, and each application usually has its own log file. A local Logstash instance can read all of these messages, process them, and forward them to Elasticsearch (or another Logstash, a message queue, anywhere really). You can even organize one Logstash config file per application, complete with input, filters, and output. So what’s the problem?

How do I ensure that my filters/output only run on the right input?

A common practice is to add a “tags” field on the input and check for it in the filters and output. If you’re diligent about removing this tags field in the output, this can work… but ain’t nobody got time for that. Unfortunately, what often happens is that the field is forgotten and ends up in your data downstream. Yuck. So what’s a better pattern?

Logstash 1.5 added the ability to attach metadata to an event through the special @metadata field, which outputs don’t write out with the rest of the event. This provides the building block for what I like to call the “Logstash Pipeline Pattern”. We can use this metadata to form an independent Logstash pipeline (input/filters/output) for every application on the host without running multiple instances of Logstash.
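To make the idea concrete without a running Logstash, here is a toy Python model of the pattern (an illustration only, not how Logstash is implemented). Each input stamps a hypothetical `app` key into the event's metadata, each filter/output pair only acts on events whose metadata matches, and the metadata is dropped on serialization, so nothing leaks downstream. In an actual Logstash config the same idea is usually expressed by setting `[@metadata][app]` with `add_field` on the input and wrapping filters and outputs in `if [@metadata][app] == "..."` conditionals.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Event:
    """Toy stand-in for a Logstash event: regular fields plus metadata."""
    fields: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)

    def serialize(self) -> Dict[str, Any]:
        # Mirrors the @metadata idea: only regular fields reach the output.
        return dict(self.fields)

def make_input(app: str) -> Callable[[str], Event]:
    """Build a reader that tags every event it produces with its owning app."""
    def read(line: str) -> Event:
        return Event(fields={"message": line}, metadata={"app": app})
    return read

def run_pipelines(
    events: List[Event],
    filters: Dict[str, Callable[[Event], None]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Send each event only through the filter and output for its own app."""
    outputs: Dict[str, List[Dict[str, Any]]] = {app: [] for app in filters}
    for event in events:
        app = event.metadata.get("app")
        if app in filters:
            filters[app](event)                     # per-app "filter" stage
            outputs[app].append(event.serialize())  # per-app "output" stage
    return outputs

if __name__ == "__main__":
    nginx_in, api_in = make_input("nginx"), make_input("api")
    events = [nginx_in("GET /index.html"), api_in("login ok")]
    pipelines = {
        "nginx": lambda e: e.fields.update(source="nginx"),
        "api": lambda e: e.fields.update(level="INFO"),
    }
    print(run_pipelines(events, pipelines))
```

Because routing happens on metadata rather than on a regular field, there is no tag to remember to strip before the output.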

Here’s what this looks like in practice. Continued…

Posted in Tutorials.

How To Ingest App Metrics from Slack into Elasticsearch

Recently I started helping Cardbucks, a very early-stage startup team. They’re running pretty bare-bones during their early market-fit experiments and haven’t set up any application monitoring or business intelligence solution for their users yet. However, they’ve been logging all user actions to a Slack room from Day One, which is awesome. So for a hack day, I built a bot to scrape the historical messages as well as ingest all new incoming metrics from Slack into Elasticsearch.

Ingest Real-Time Metrics

The first step was to find a simple bot framework that lets me receive new messages in (near) real time. The Slack team has generously provided the skeleton with python-rtmbot. This is a callback-based bot engine, so we need only write a simple plugin and configure it with our Slack token to ingest metrics from the real-time message stream.
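Here is a hypothetical Python sketch of the core of such a plugin: turning one Slack `message` event into a document ready for Elasticsearch. It assumes a made-up `action key=value ...` message format; the `type`, `channel`, `text`, and `ts` keys are standard fields of Slack message events. Hooking it into python-rtmbot (through its `process_message` handler) and indexing the result with an Elasticsearch client are left out, since the plugin interface has changed between rtmbot versions.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical metric format, e.g. "signup user=alice plan=pro".
METRIC_RE = re.compile(r"^(?P<action>\w+)(?P<pairs>(?:\s+\w+=\S+)*)\s*$")

def message_to_doc(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a Slack message event into a flat document, or None to skip it."""
    if event.get("type") != "message" or "text" not in event:
        return None
    match = METRIC_RE.match(event["text"])
    if match is None:
        return None  # not a metric line in the assumed format
    doc: Dict[str, Any] = {
        # Slack's "ts" is a string of epoch seconds with a fractional part.
        "@timestamp": datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc).isoformat(),
        "channel": event.get("channel"),
        "action": match.group("action"),
    }
    for pair in match.group("pairs").split():
        key, value = pair.split("=", 1)
        doc[key] = value
    return doc

if __name__ == "__main__":
    sample = {"type": "message", "channel": "C0123", "user": "U0456",
              "text": "signup user=alice plan=pro", "ts": "1431043200.000000"}
    print(message_to_doc(sample))
```

The same function works for the historical backfill: feed it each message from the channel history instead of the live stream.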

Below is a simple example of how we did this for the Cardbucks team.  Continued…

Posted in Tutorials.

Guide to the Southwest Companion Pass

Many of us have large travel and lifestyle aspirations for our families, so I wanted to share a travel tip that I put to use for my own family last year. If you haven’t noticed, Southwest has just added a bunch of international flights to beach, mountain, and island destinations like Puerto Rico, Costa Rica, and Belize. Here are a few tips that will let you fly there for wayyyy cheaper.

While this tip itself won’t change your lifestyle, hopefully it’ll give you enough juice to explore a few of these locations without breaking the bank. See what you like, start living a couple of your dreams, and give yourself more room to chart your course ahead.

Of course, some people aren’t comfortable with the tactics I’m about to outline, since they involve wise use of credit card bonuses and a bit of manufactured spending. So it may not be your thing, and I understand. But I thought I’d share anyway, just in case. :)

What You’ll Need

  1. Reasonable Credit (and Willingness to Use It)
  2. A Good Plan (and Expenses You Can Pay With Credit Cards)

Goal #1: Southwest Points

Southwest Points are awesome because Southwest doesn’t charge drastically more points even for far-flung flights. My wife and I booked honeymoon tickets last year to Costa Rica for 16,800 points each! (Plus $53.03 each in taxes and government fees, still not too shabby.) Continued…

Posted in Tutorials.

12 Factor Microservices on Docker

Of late I’ve been interested in the intersection of containerization and 12factor apps.

This led me to piece together a small proof-of-concept 12factor microservice on top of Docker. I’ve published this effort as a two-part tutorial on PacktPub.

  1. Introduction and the first 4 Factors
  2. The Remaining 8 Factors and PaaS Preview

My company has slowly started experimenting with containers… now to move our (stateless) apps toward 12factor! :)

Posted in Tutorials.

Auto-Load Jinja2 Macros

Last night I was up hacking away on a side project for work. Some of my favorite “quick wins” involve automating and standardizing our operational infrastructure. This project involved generating HAProxy configs from Jinja templates instead of painstakingly managing each server’s (or server group’s) configuration by hand.

In the latest iteration, I was trying to automatically load a common macro in every template without requiring every template to specify it. This is equivalent to adding

{% from 'macros.txt' import macro1, macro2 with context %}

at the top of every template file. But of course I don’t want to repeat myself everywhere.
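One possible approach (not necessarily either of the techniques from the full post) is a custom Jinja2 loader that prepends the import line to every template's source before it is compiled. `BaseLoader.get_source` is Jinja2's actual loader hook; the `MacroImportLoader` class, the template contents, and the use of `DictLoader` in place of a file-system loader are hypothetical choices for this sketch.

```python
from jinja2 import BaseLoader, DictLoader, Environment

# The line we would otherwise paste at the top of every template.
MACRO_IMPORT = "{% from 'macros.txt' import macro1, macro2 with context %}\n"

class MacroImportLoader(BaseLoader):
    """Wrap another loader and prepend MACRO_IMPORT to every template source,
    except the macro file itself (importing it into itself would recurse)."""

    def __init__(self, inner: BaseLoader, macro_file: str = "macros.txt"):
        self.inner = inner
        self.macro_file = macro_file

    def get_source(self, environment, template):
        source, filename, uptodate = self.inner.get_source(environment, template)
        if template != self.macro_file:
            source = MACRO_IMPORT + source
        return source, filename, uptodate

# Hypothetical templates; in practice `inner` might be a FileSystemLoader.
templates = DictLoader({
    "macros.txt": (
        "{% macro macro1(name) %}server {{ name }}{% endmacro %}\n"
        "{% macro macro2(port) %}bind *:{{ port }}{% endmacro %}\n"
    ),
    "haproxy.cfg": "{{ macro1('web1') }}\n{{ macro2(port) }}",
})

# trim_blocks drops the newline that follows the injected {% from %} tag.
env = Environment(loader=MacroImportLoader(templates), trim_blocks=True)

if __name__ == "__main__":
    print(env.get_template("haproxy.cfg").render(port=80))
```

Since the import is injected at load time, templates stay free of boilerplate, and changing the macro list means editing a single constant.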

This article shows two techniques for achieving this goal, depending on Continued…

Posted in Tutorials.

Calculating Cash at Closing

As you might’ve gathered from my post on Analyzing Investment Real Estate a couple months ago, I’m looking to buy an investment property. Well, lo and behold, I’ve had an offer accepted for my first property! That, however, is not the subject of this post. After talking with multiple lenders and having them all spitball me different numbers, I think I finally understand what goes into the cash due at closing (at least in Chicago). Get ready for another spreadsheet!

Download my Cash at Closing Spreadsheet

The cash due at closing is composed of five categories:

  1. Down Payment is the initial equity or ownership that you’ll have in the property. The amount of your down payment determines the type of financing, the interest rates, and many other aspects of your purchase.
  2. Lending Fees include the underwriting fee, appraisal cost, and the first year of insurance. Lenders require insurance to protect their investment. Unlike property taxes which are paid in arrears, Continued…
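As a rough illustration of the arithmetic, here is a Python sketch covering only the two categories named above; the excerpt cuts off before the other three, so they appear as a placeholder to fill in from your lender’s estimate. All dollar figures, the 25% down payment, and the function names are hypothetical.

```python
from typing import Dict

def down_payment(price: float, down_pct: float) -> float:
    """Initial equity: a fraction of the purchase price."""
    return price * down_pct

def lending_fees(underwriting: float, appraisal: float, annual_insurance: float) -> float:
    """Lender-side costs, including the first year of insurance paid up front."""
    return underwriting + appraisal + annual_insurance

def cash_at_closing(categories: Dict[str, float]) -> float:
    """Cash due at closing is the sum of every category."""
    return sum(categories.values())

# Hypothetical example numbers.
categories = {
    "down_payment": down_payment(250_000, 0.25),   # 62,500
    "lending_fees": lending_fees(895, 450, 1_200),  # 2,545
    "other_categories": 0.0,  # placeholder for the remaining three categories
}

if __name__ == "__main__":
    print(f"Cash due at closing: ${cash_at_closing(categories):,.2f}")
```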

Posted in Tutorials.

Analyzing Investment Real Estate

Over the last couple months, I’ve been learning about investing in Real Estate. Like most people new to this field, I’m initially focusing on residential rentals. This entails learning what differentiates the good from the bad, how to diversify Real Estate investments, how to evaluate properties, and, most critically, how to analyze their returns. In this post, I’m going to explain the four pillars of Real Estate investment returns, detail some of my favorite measures of investment performance, and share my Rental Analysis Spreadsheet. Feel free to jump straight to the spreadsheet; I won’t stop you. :)
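The excerpt doesn’t list which measures the spreadsheet uses, so as an assumption here are two widely used ones, capitalization (cap) rate and cash-on-cash return, as a Python sketch. The property and all its numbers are hypothetical.

```python
def net_operating_income(annual_rent: float, operating_expenses: float) -> float:
    """Income after operating costs, before any mortgage payments."""
    return annual_rent - operating_expenses

def cap_rate(noi: float, price: float) -> float:
    """NOI relative to purchase price; ignores how the property is financed."""
    return noi / price

def cash_on_cash(noi: float, annual_debt_service: float, cash_invested: float) -> float:
    """Pre-tax cash flow after mortgage payments, relative to the cash you put in."""
    return (noi - annual_debt_service) / cash_invested

if __name__ == "__main__":
    # Hypothetical property: $200k price, $50k down plus $5k closing costs.
    noi = net_operating_income(annual_rent=24_000, operating_expenses=9_600)
    print(f"Cap rate:     {cap_rate(noi, 200_000):.1%}")
    print(f"Cash-on-cash: {cash_on_cash(noi, 9_000, 55_000):.1%}")
```

Cap rate lets you compare properties independent of financing, while cash-on-cash shows what your leverage actually earns on the money out of your pocket.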

Download my Rental Analysis Spreadsheet
(To get your own: Go to File > Make a Copy)

Real Estate presents four different mechanisms for investment returns: Continued…

Posted in Tutorials.

Simplify Deployment with Infrastructure Manifest (Part 1)

This is Part 1 in a short series about using a Manifest of your infrastructure for automation.

  • Part 1: Build the Infrastructure Manifest
  • Part 2: Manifest-Based Application Deployment

At the last few DevOps conferences I’ve attended, the lunch-time discussions have revolved around tying your test, build, and deploy workflows to your cloud infrastructure. A lot of people are trying to bend tools like Chef for this purpose and are generally unhappy with the result.

After a lot of trial and error, the strategy that we currently use at Signal is to

  • completely specify your infrastructure definitions in a simple JSON manifest; and
  • use your Cloud API(s) to transform this functional definition into a working Manifest which details all hosts in your infrastructure and which applications they run
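Here is a minimal Python sketch of that transformation, assuming a made-up definition format (roles mapped to the apps they run). The `list_hosts` callable is a hypothetical stand-in for whatever Cloud API lookup you use, such as finding instances tagged with a role; it is not a real SDK call, and the example wires in a fake one.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical infrastructure definition: roles and the apps each role runs.
DEFINITION: Dict[str, Any] = json.loads("""
{
  "roles": {
    "web":    {"apps": ["nginx", "frontend"]},
    "worker": {"apps": ["queue-consumer"]}
  }
}
""")

def build_manifest(definition: Dict[str, Any],
                   list_hosts: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Map every live host to the apps it should run."""
    manifest: Dict[str, List[str]] = {}
    for role, spec in definition["roles"].items():
        for host in list_hosts(role):  # stand-in for a Cloud API query
            manifest.setdefault(host, []).extend(spec["apps"])
    return manifest

def fake_cloud(role: str) -> List[str]:
    """Fake Cloud API lookup used only for this example."""
    return {"web": ["web-1", "web-2"], "worker": ["worker-1"]}.get(role, [])

if __name__ == "__main__":
    print(json.dumps(build_manifest(DEFINITION, fake_cloud), indent=2))
```

Keeping the definition purely functional (roles and apps, no hostnames) means the working Manifest can be regenerated whenever hosts come and go.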

Once we have the Manifest, Continued…

Posted in Tutorials.

Mongo Multi-Key Index Performance

tl;dr – Mongo multi-key indexes don’t scale well with high-cardinality arrays. This is the same bad behavior that KairosDB is experiencing with its own indexes.

This post explores a technique for using MongoDB as a general-purpose time-series database. It was investigated as a possible temporary workaround until KairosDB adds support for high-cardinality tags. In particular, this describes using a MongoDB multi-key index for associating arbitrary key-value metadata with time-series metrics in MongoDB and the associated performance problems.
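The excerpt doesn’t show the schema, so the following Python sketch assumes one plausible layout: tags flattened into an array of `key=value` strings, with a compound index whose `tags` component makes it multikey. It only builds the documents, index spec, and query filter (no server needed), and all names are hypothetical. The `index_entries` helper shows why array size matters: a multikey index stores one key per array element, so every tag on every data point becomes its own index entry.

```python
from typing import Any, Dict, List

def metric_doc(name: str, ts: int, value: float, tags: Dict[str, str]) -> Dict[str, Any]:
    """Flatten tags into "key=value" strings so one array field holds them all."""
    return {
        "name": name,
        "ts": ts,
        "value": value,
        "tags": sorted(f"{k}={v}" for k, v in tags.items()),
    }

# Compound index; because "tags" holds arrays, MongoDB makes it multikey.
# With pymongo this spec could be passed to collection.create_index(INDEX_SPEC).
INDEX_SPEC = [("name", 1), ("tags", 1), ("ts", 1)]

def tag_query(name: str, tags: Dict[str, str], start: int, end: int) -> Dict[str, Any]:
    """Filter for points of one metric carrying *all* given tags in [start, end)."""
    return {
        "name": name,
        "tags": {"$all": [f"{k}={v}" for k, v in sorted(tags.items())]},
        "ts": {"$gte": start, "$lt": end},
    }

def index_entries(docs: List[Dict[str, Any]]) -> int:
    """One index key per array element: each document adds len(tags) entries."""
    return sum(len(d["tags"]) for d in docs)

if __name__ == "__main__":
    doc = metric_doc("cpu", 1_000, 0.5, {"host": "web1", "dc": "east"})
    print(doc)
    print(tag_query("cpu", {"host": "web1"}, 0, 2_000))
```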


We’ve been using KairosDB in production for close to a year now. KairosDB is a general-purpose time-series database built on Cassandra. Each “metric” consists of a name, value, and a set of associated “tags” (key-value metadata). These tags are extremely useful because they provide structure for slicing, filtering, and grouping the stats.

The main issue restricting us from adopting it more widely is its poor support for high-cardinality tags; that is, tag keys with a large number of distinct values, such as IP addresses or other unique identifiers. Unfortunately, these types of values are also a prime use case for tags in the first place. You can read all about this issue on the KairosDB user group, as it’s currently one of the best-known issues. A few months ago I gave a presentation, Building a Scalable Distributed Stats System, which describes a workaround for this issue when there’s a small number of high-cardinality tag keys.

However, the new use case requires a set of high-cardinality keys which is dynamic and unknown a priori. Since the KairosDB team is looking into fixing this issue but hasn’t actually resolved it, I wanted to investigate whether we could use MongoDB temporarily as a backing store behind the Kairos API. Continued…

Posted in Tutorials.

Libertarians? Greens? Lock ’em Out!

The 2014 midterm elections seem to be bigger than prior years with more ads, robo-calls, and social media posts. During this turmoil, I learned a number of new things about the leading political parties that disgust me. At the top of this list was the Republicans’ and Democrats’ efforts at controlling access to the ballot.

The idea of “ballot access” control is that third parties will undercut votes from the “Big 2” parties. Specifically, the belief is that a vote for the Libertarian party is likely one less vote for the Republicans; likewise, the Democrats could lose votes to Green candidates. So, the story goes, it’s in the best interest of the two predominant parties to restrict other parties from being present on the ballot at all.


Posted in Commentary.
