
Calculating Cash at Closing

As you might’ve gathered from my post on Analyzing Investment Real Estate a couple months ago, I’m looking to buy an investment property. Well, lo and behold, I’ve had an offer accepted for my first property! That, however, is not the subject of this post. After talking with multiple lenders and having them all spitball me different numbers, I think I finally understand what goes into the cash due at closing (at least in Chicago). Get ready for another spreadsheet!

Download my Cash at Closing Spreadsheet

The cash due at closing is composed of five categories:

  1. Down Payment is the initial equity or ownership that you’ll have in the property. The amount of your down payment determines the type of financing, the interest rates, and many other aspects of your purchase.
  2. Lending Fees include the underwriting fee, appraisal cost, and the first year of insurance. Lenders require insurance to protect their investment. Unlike property taxes, which are paid in arrears, insurance is paid in advance; since insurance is often billed annually, the first year is due at closing. You must have arranged for homeowners insurance before the loan can close. When you sell the property, the annual premium will be prorated for the months used and the remainder should be refunded by the insurance company.
  3. Misc Fees include the title company fees, attorney fees, and any applicable city/state transfer taxes for transferring the title. The title company fees include insurance and recording charges. The title insurance protects against title defects, liens, or other matters. The recording charges are fees paid for updating the public record for the property. In addition to the administrative recording charge, some counties impose a tax or surcharge on transferring the title of a property.
  4. Prepaid Items include 3-to-6 months of property taxes in escrow, 3 months of insurance escrow, and interest from the closing date to the end of the month. These items are not expenses per se but rather future costs that are being paid up-front or stored in an escrow account until the payment is due sometime in the future. Taxes are paid in arrears (after-the-fact), so you simply pay a few months in advance to serve as a buffer in case you fall behind. This is sometimes true for insurance as well; even though you’ve paid for the current term of a year, another several months of costs may be requested to help cover the next payment in case you fall behind near the end of your term and are unable to catch-up before the next term begins.
  5. Credits include credits for the seller’s prorated taxes, any earnest money already paid, and any negotiated seller credits toward closing. The tax credit covers the seller’s portion of this year’s taxes, since that expense has been incurred prior to closing but not yet paid. Although not technically a credit, any money you’ve paid into escrow as earnest money must be deducted from the remaining amount due at closing. Lastly, if you were able to negotiate any credit from the seller to help cover closing costs, this must be deducted from the amount due as well.

Cash at Closing


This spreadsheet works for both conventional and FHA loans, including calculations of up-front MIP and monthly PMI if applicable. The transfer tax rate is currently hardcoded to Chicago rates (0.75% of purchase price), and it assumes you’ll escrow 3 months of insurance in addition to paying for the first year upfront.

Download my Cash at Closing Spreadsheet

You’ll need to talk to several lenders to get estimates for each of these fees, rates, and expenses. The example shown is for a fictional three-unit purchased for $300k using a 10%-down, 30-year FHA loan. Although these aren’t my exact numbers, they’re at least in the ballpark for recent quotes in Chicago. For example, every Real Estate attorney with whom I’ve spoken so far has quoted a flat rate in the $600-$700 range.
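To make the arithmetic concrete, here’s a minimal Python sketch of the five categories for a $300k, 10%-down purchase. Every figure below is a made-up but ballpark assumption, not a number from the spreadsheet; plug in your own lender quotes.

```python
# All dollar figures and rates below are illustrative assumptions.
purchase_price = 300_000
down_payment = 0.10 * purchase_price            # 1. Down Payment
loan_amount = purchase_price - down_payment     # up-front FHA MIP is often financed into the loan

# 2. Lending Fees: underwriting + appraisal + first year of insurance
annual_insurance = 1_800
lending_fees = 900 + 500 + annual_insurance

# 3. Misc Fees: title/recording + attorney + Chicago transfer tax (0.75%)
misc_fees = 1_500 + 650 + 0.0075 * purchase_price

# 4. Prepaid Items: 6 months of tax escrow, 3 months of insurance escrow,
#    plus interest from the closing date to month's end (15 days assumed)
annual_taxes = 6_000
interest_rate = 0.045
prepaid_items = (6 * annual_taxes / 12
                 + 3 * annual_insurance / 12
                 + loan_amount * interest_rate * 15 / 365)

# 5. Credits: earnest money already paid + seller's prorated taxes
credits = 3_000 + 2_000

cash_at_closing = down_payment + lending_fees + misc_fees + prepaid_items - credits
print(f"Cash due at closing: ${cash_at_closing:,.2f}")
```

With these made-up inputs the total lands in the mid-$30k range, which is why the down payment alone is never the full story.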

A few other useful tidbits I’ve learned:

  • A conventional loan for a duplex requires 20% down even for owner-occupants.
  • A conventional loan for a triplex (3-unit) or quadplex (4-unit) requires 25% down.
  • Owner-occupants can use FHA to put as little as 3.5% down for a 2-4 unit.
  • If you put 5% or less with FHA, the percentage paid for PMI increases from 0.8% per year to 0.85% per year.
  • Summary: my first investment will hopefully be a 3-unit owner-occupied property with 10% down 30-year fixed FHA.
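For quick reference, those rules of thumb can be encoded as a small function. These reflect the quotes I heard at the time, so verify the actual requirements with your own lender.

```python
def min_down_payment(units, loan_type, owner_occupied):
    """Minimum down payment fraction per the rules of thumb above.

    Rules as quoted to me at the time -- not a lending guarantee.
    """
    if loan_type == "FHA" and owner_occupied and 1 <= units <= 4:
        return 0.035                 # FHA, owner-occupied 1-4 unit
    if loan_type == "conventional":
        if units == 2:
            return 0.20              # duplex, even for owner-occupants
        if units in (3, 4):
            return 0.25              # triplex or quadplex
    raise ValueError("scenario not covered by the rules above")
```

For my planned 3-unit owner-occupied FHA purchase, the floor would be 3.5%, though I’m planning on 10% down.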

Disclaimer: this is not a substitute for getting a Good Faith Estimate from one or more lenders prior to applying for a loan. I am not a lawyer, accountant, or your mom. Do your own homework.

If I’ve made any mistakes or you have any suggestions, let me know!

Posted in Tutorials.

Analyzing Investment Real Estate

Over the last couple months, I’ve been learning about investing in Real Estate. Like most new to this field, I’m initially focusing on residential rentals. This entails learning what differentiates the good from the bad, how to diversify Real Estate investments, how to evaluate properties, and, most critically, how to analyze their returns. In this post, I’m going to explain the four pillars of Real Estate investment returns, detail some of my favorite measures of investment performance, and share my Rental Analysis Spreadsheet. Feel free to jump straight to the spreadsheet; I won’t stop you. :)

Download my Rental Analysis Spreadsheet

Real Estate presents four different mechanisms for investment returns: Continued…

Posted in Tutorials.

Mongo Multi-Key Index Performance

tl;dr – Mongo multi-key indexes don’t scale well with high-cardinality arrays. This is the same bad behavior that KairosDB is experiencing with its own indexes.

This post explores a technique for using MongoDB as a general-purpose time-series database. It was investigated as a possible temporary workaround until KairosDB adds support for high-cardinality tags. In particular, this describes using a MongoDB multi-key index for associating arbitrary key-value metadata with time-series metrics in MongoDB and the associated performance problems.


We’ve been using KairosDB in production for close to a year now. KairosDB is a general-purpose time-series database built on Cassandra. Each “metric” consists of a name, value, and a set of associated “tags” (key-value metadata). This metadata is extremely useful as it provides structured metadata for slicing, filtering, and grouping the stats.

The main issue restricting us from adopting it more widely is its poor support for high-cardinality tags; that is, tag keys with a large number of distinct values, such as IP addresses or other unique identifiers. Unfortunately, these types of values are also a prime use case for tags in the first place. You can read all about this issue on the KairosDB user group, as it’s one of the project’s most well-known issues. A few months ago I gave a presentation, Building a Scalable Distributed Stats System, which describes a work-around for this issue when there’s a small number of high-cardinality tag keys.

However, the new use case requires a set of high-cardinality keys which is dynamic and unknown a priori. Since the KairosDB team is looking into fixing this issue but hasn’t actually resolved it yet, I wanted to investigate whether we could use MongoDB temporarily as a backing store behind the Kairos API. Why MongoDB? Because it’s easy to use, we know how to scale it (even if it’s painful), and atomic increments are a powerful bonus.

MongoDB Schema

The first task in evaluating MongoDB for this general-purpose use case is to propose a schema of sorts; we need something flexible enough to use the same underlying model and update operations as KairosDB while allowing efficient querying using MongoDB indexes. The initial schema looked something like:

  "timestamp": ,
  "name": ,
  "value": ,
  "tags": [

You might be wondering why “tags” is an array of strings rather than a true subdocument. The answer is indexing. Ideally, we could use a hashed index on a proper “tags” subdocument; however, as the documentation states, “you cannot create compound indexes that have hashed index fields.” Instead, we try to use a multi-key index on an array of values. We can combine this multi-key index on tags with the timestamp and name to create a compound index by which we can query for specific metrics. If we call our collection metrics, then we create the index like so:

db.metrics.ensureIndex({'timeStamp': 1, 'name': 1, 'tags': 1})

Query Plan Explanation

Before we went any further with this proof-of-concept, I wanted to understand whether these indices were likely to be performant for our query and update operations. If you don’t know about MongoDB’s explain() operator, I want you to stop what you’re doing right now and go read: cursor.explain()

Finished? Good. Hopefully you can see where we’re going with this now. We’ll execute an example query with various documents in the database and let MongoDB walk us through the query operations.

Let’s get a baseline with an empty collection, using the same find criteria we’ll use throughout this exercise.

db.metrics.find({'timeStamp': 1234, 'name': 'metric1', 'tags': ['tag1=val1', 'tag2=val2']}).explain()

Amongst the output, you should see

"cursor" : "BtreeCursor timeStamp_1_name_1_tags_1 multi",
"isMultiKey" : true,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,

This confirms that we’re using our new multikey compound index. Although the indexOnly=false line may look scary, it means that there are fields to be returned that aren’t in the index; namely, the value itself is stored in the document and must be consulted.  This StackOverflow article helped me understand this output field better.

Let’s review the most important fields for our use case. From the documentation:

  • n is the number of documents that match the query
  • nscanned is the total number of index entries scanned
  • nscannedObjects is the total number of documents scanned

Since there are no index entries or documents yet, all three values are 0 initially.

Okay, now let’s add the first metric.

db.metrics.update({'timeStamp':1234,'name':'metric1','tags':['tag1=val1','tag2=val2']}, {'$inc':{'value':1}}, {upsert:true})

Here we’re just atomically incrementing the value field by one. Let’s run the same explain request to see what the query plan looks like now.

"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 2,

We see that the query now scans two index entries and one document. That seems reasonable.

What if we insert a record with a new name but the same tags?

db.metrics.update({'timeStamp':1234,'name':'metric2','tags':['tag1=val1','tag2=val2']}, {'$inc':{'value':1}}, {upsert:true})

The query plan for the original document would still look like

"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 2,

Great, so it seems like the query criteria is good at only selecting the single correct document.

Now let’s insert a record with the same name but a different value for “tag2”.

db.metrics.update({'timeStamp':1234,'name':'metric1','tags':['tag1=val1','tag2=other']}, {'$inc':{'value':1}}, {upsert:true})

Let’s look at the query plan now.

"n" : 1,
"nscannedObjects" : 2,
"nscanned" : 3,

Uh-oh. This doesn’t look too good. Adding one new value for the second tag increased the number of scanned index entries and documents by one.

What happens if we add a new value for “tag1” instead?

db.metrics.update({'timeStamp':1234,'name':'metric1','tags':['tag1=other','tag2=val2']}, {'$inc':{'value':1}}, {upsert:true})

Let’s look at the query plan now.

"n" : 1,
"nscannedObjects" : 2,
"nscanned" : 3,

Well, that’s not so bad. It’s the same as the previous case. So, in the worst case, the number of scans increases linearly with the number of tag permutations.

If you continue with this exercise, you’ll start to see the pattern. Essentially, each new tag in the tags array adds a new entry into the index. Since it’s doing a range search on the tags, the cost depends on where the new tag entry falls in the index. If it’s the last tag, it’s going to fall near or at the end, depending on the new and previous values of the final tag.
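To build intuition for that pattern, here’s a toy Python model: each document contributes one sorted (timeStamp, name, tag) index entry per array element, and a query on the array scans everything between the smallest and largest matching tag value under the prefix. This is only a sketch (Mongo’s real cursor bounds are tighter, as the explain output above shows), but it makes the fan-out visible.

```python
import bisect

# Toy model of a multi-key index like timeStamp_1_name_1_tags_1: one sorted
# (timeStamp, name, tag, doc_id) entry per array element.
def index_entries(docs):
    return sorted(
        (d["timeStamp"], d["name"], tag, i)
        for i, d in enumerate(docs)
        for tag in d["tags"]
    )

def entries_scanned(entries, ts, name, tags):
    # In this model, a query on the array scans the whole range between the
    # smallest and largest matching tag value for the (timeStamp, name) prefix.
    lo = bisect.bisect_left(entries, (ts, name, min(tags)))
    hi = bisect.bisect_right(entries, (ts, name, max(tags), float("inf")))
    return hi - lo

docs = [{"timeStamp": 1234, "name": "metric1", "tags": ["tag1=val1", "tag2=val2"]}]
base = entries_scanned(index_entries(docs), 1234, "metric1", ["tag1=val1", "tag2=val2"])

# Add a second document with a new value for tag2; the scanned range grows.
docs.append({"timeStamp": 1234, "name": "metric1", "tags": ["tag1=val1", "tag2=other"]})
grown = entries_scanned(index_entries(docs), 1234, "metric1", ["tag1=val1", "tag2=val2"])
print(base, grown)
```

Every new distinct tag value widens the range of index entries that falls between the query’s tag bounds, which is exactly the linear growth the explain output showed.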

What we’ve learned is that Mongo multi-key indexes don’t scale well with high-cardinality arrays. Since that leaves us in the same position as Cassandra-backed KairosDB, it’s back to the drawing board for me.

It feels like others must have solved these stats problems before. From real-time pre-write aggregation to attaching high-cardinality metadata, we must be reinventing the wheel. Is everything that is good at these tasks proprietary?

What systems do you use for real-time stats?

Posted in Tutorials.

Libertarians? Greens? Lock ’em Out!

The 2014 midterm elections seem to be bigger than prior years with more ads, robo-calls, and social media posts. During this turmoil, I learned a number of new things about the leading political parties that disgust me. At the top of this list was the Republicans’ and Democrats’ efforts at controlling access to the ballot.

The idea of “ballot access” control is that third parties will undercut votes from the “Big 2” parties. Specifically, the belief is that a vote for the Libertarian party is likely one less vote for the Republicans; likewise, the Democrats could lose votes to Green candidates. So, the story goes, it’s in the best interest of the two predominant parties to keep other parties off the ballot entirely.


Posted in Commentary.

Keep Out The Vote

In the chaos leading up to Election Day on Tuesday, we’ve all been inundated with Get Out The Vote messages from both parties.

This is supposed to be the parties’ way of encouraging citizens’ active participation in our great democratic society. So when Pretty Nerd and I got a call from one of Rauner’s people, we were initially pleasant and politely informed them that we were already voting, though not for Rauner. Imagine our surprise when, lo and behold, Rauner’s campaign caller responded with “just don’t go to the polls then.” She repeated this statement Continued…

Posted in Commentary.

Custom JMeter Samplers and Config Elements

tl;dr – Writing custom JMeter plugins doesn’t have to be complicated. This tutorial describes the process of developing a custom Sampler and Config Element. We develop a Kafka Producer Sampler and example Synthetic Load Generator Config Element. If you just want to send messages from JMeter to Kafka or see an example of generating synthetic traffic, you can go straight to the source.

So you want to load test a non-HTTP system. At first, you don’t think your favorite load testing tool, JMeter, will be of any help. But then you remember that it’s open source and supposedly extensible. Let’s see if we can do this.

For my use case, I wanted a simple way to load test a system which reads its requests from Kafka. This has two requirements:

  1. read or generate synthetic requests (messages)
  2. publish the messages to a Kafka topic

For step 1, if I wanted to pre-generate all the requests, I could use the CSV Data Set Config to read them into JMeter. However, this would require generating a sufficiently-large request set for each test scenario. I preferred to let JMeter generate the actual request from a simple configuration describing the traffic distribution. This configuration could also be generated from real data to effectively simulate the shape of the data coming into the system. Thus, step 1 required development of a new “Config Element” in JMeter.

For step 2, there was no existing option for sending data to Kafka. But now you have one, so just use the Kafka Producer Sampler from kafkameter.
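For illustration, here’s roughly what those two steps look like outside JMeter, as a Python sketch using the kafka-python client (the real kafkameter plugin is Java). The request names, weights, topic, and broker address are all invented for this example.

```python
import json
import random

# Step 1: generate synthetic requests from a simple configuration describing
# the traffic distribution (names and weights here are invented).
def generate_requests(distribution, n, seed=42):
    rng = random.Random(seed)
    names = list(distribution)
    weights = list(distribution.values())
    return [{"name": rng.choices(names, weights)[0], "value": 1} for _ in range(n)]

# Step 2: publish the messages to a Kafka topic. Requires `pip install
# kafka-python` and a running broker; topic and server are placeholders.
def publish(requests, topic="load-test-requests", servers="localhost:9092"):
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=servers)
    for req in requests:
        producer.send(topic, json.dumps(req).encode("utf-8"))
    producer.flush()
```

The appeal of the Config Element approach is visible even in this sketch: the traffic shape lives in one small dictionary rather than a pre-generated CSV for every test scenario.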

Let’s dig in.


Posted in Tutorials.

Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’s see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB. This drastically decreases the required write capacity at the database level.
  • We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.

If you want a peek into how these pieces fit together, peep the slides.


Posted in Tutorials.

Should I max out my 401(k) or pay down student loans?

This is a common question new graduates ask. Although I graduated two years ago, I didn’t really run the numbers until recently… and boy am I disappointed in past-Cody for not doing this sooner.

The spreadsheet I used to answer this question for myself is below (but with fake numbers :-). Punch in your own numbers to see how much money you can save by increasing your 401(k) contributions. Now that I know better, I’m saving an extra $3,000 each year by maxing out my 401(k). How much can you save?
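As a back-of-the-envelope illustration (not the spreadsheet itself), here’s the first-year comparison in Python. The marginal tax rate, employer match, and loan APR are all hypothetical; punch in your own.

```python
# Hypothetical inputs -- substitute your own numbers.
marginal_tax_rate = 0.25
employer_match = 0.50        # 50 cents per dollar, assumed under the match cap
loan_apr = 0.068

extra = 1_000  # extra dollars directed one way or the other this year

# Option A: pre-tax 401(k) contribution
tax_deferred = extra * marginal_tax_rate      # taxes you don't pay this year
match = extra * employer_match                # free money from the employer
first_year_benefit_401k = tax_deferred + match

# Option B: extra payment against the student-loan principal
first_year_benefit_loan = extra * loan_apr    # interest avoided this year

print(first_year_benefit_401k, first_year_benefit_loan)
```

With any employer match at all, Option A tends to win by a wide margin in year one, which is why running the numbers sooner would have paid off.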


Posted in Tutorials.

Hacking Twitter Competitions: Automatically Tracking Followers Count

Just before Christmas, a Chicago Food Truck decided to give away free sandwiches for a year to their 1000th Twitter follower.

Cheesies_Truck Competition Tweet

You know I had to try.

After checking it a few times over a 15 minute period or so, I noticed that the follower count was increasing very slowly. I knew I wouldn’t have the diligence to continue checking, so I decided to write a script that would do the check and notify me every ten minutes or so. Since I’m on a Mac, I decided to use Growl for these notifications.

In this post, I’ll walk you through how to automatically check a Twitter user’s follower count and get a Growl notification periodically.
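The skeleton of such a script might look something like the Python sketch below. The follower-count fetch is stubbed out (Twitter’s API requires authentication), and the growlnotify command-line tool is assumed to be installed; the target and polling interval match the scenario above.

```python
import subprocess
import time

TARGET = 1000  # the giveaway goes to the 1000th follower

def near_target(count, target=TARGET, window=25):
    # Worth a notification once we're within `window` followers of the prize.
    return target - window <= count < target

def notify(message, title="Follower watch"):
    # Growl's command-line tool; assumed installed on the Mac.
    subprocess.run(["growlnotify", "-t", title, "-m", message])

def get_follower_count(user):
    # Placeholder: fetching the live count needs an authenticated Twitter
    # API call, so it's left as a stub in this sketch.
    raise NotImplementedError

def watch(user="Cheesies_Truck", interval=600):
    while True:
        count = get_follower_count(user)
        if near_target(count):
            notify(f"{user} is at {count} followers -- almost time!")
        time.sleep(interval)  # check every ten minutes
```

The full walkthrough, including the actual follower lookup, continues in the post.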

What You Need Continued…

Posted in Tutorials.

Add to Goodreads from Amazon

Tired of splitting your reading wish-list between Amazon and GoodReads? Me too. Here’s an “Add to GoodReads” bookmarklet. Just highlight the code and drag it to your bookmark bar. You might have to right-click->Edit to give it a title like “Add to GoodReads”. This should work from Amazon product detail pages where you would otherwise click “Add to Wish List”.

Instead of adding books to your Amazon Wish List, you can now add them to Goodreads instead. Yay!

Happy reading!

Posted in Tutorials.
