
Personal Capital’s Tactical Weighting Approach

Last week I had the free Personal Capital consultation. My “advisor” had run a portfolio analysis based on my aggregated information in the PC dashboard and wanted to share the results. I was intrigued by their so-called “Tactical Weighting” portfolio allocation approach and want to discuss it here today.

Personal Portfolio Review

To set the stage for the need for Tactical Weighting, let’s discuss a few things that my advisor highlighted as a bit worrisome:

  • cash holdings: I’m holding too much cash for my risk profile, both within investment accounts and in savings
  • geographic concentration: I’m almost entirely invested in U.S. equities (besides cash and real estate holdings)
  • sector concentration: I’m heavily weighted toward technology and financials
  • market-cap concentration: I’m heavily weighted toward the largest U.S. companies


Posted in Commentary.

Fastest way to un-cap a MongoDB capped collection

A couple of weeks ago I converted a MongoDB collection to a capped collection. This is for an event archive where we only need to keep the last month of events stored locally. The issue is that this data grows unbounded for a long time until we manually free up disk space. Enter capped collections, which create a fixed-size collection by automatically removing the oldest documents. Unfortunately, I didn’t realize that our application would also update existing documents. Updates which cause a document to grow will fail. Bummer.

Now we need to roll back to uncapped collections and find another way to manage the database size (ahem, cron job). The recommended approach using “copyTo(…)” is deprecated in newer versions and agonizingly slow for a large 100GB+ data set in older versions. (This database was still running 2.4. I know.)

For a basic benchmark, using “copyTo(…)” took about 10 hours to copy ~80% of a 100GB capped collection. But the catch is that there’s no progress indicator at all. The database is locked during the copy so you can’t even look at the size of the new collection, and there’s nothing useful printed in the logs. I only know it completed that much because I halted the copy and then looked. Probably should’ve let it finish, but I didn’t know if it was making any progress, much less that close.

Following the advice of a kind stranger in IRC (#mongodb), I decided to try mongodump and mongorestore.

The dump was fast and showed progress the whole time. (Took 22 minutes total to dump.)

PROD root@myhost:/data/uncap # mongodump -d mydb -c mycoll
connected to:
Sun Feb  5 00:02:28.879 DATABASE: mydb   to     dump/mydb
Sun Feb  5 00:02:28.880         mydb.mycoll to dump/mydb/mycoll.bson
Sun Feb  5 00:02:31.004                 Collection File Writing Progress: 868400/67569879       1%  (objects)
Sun Feb  5 00:24:13.004                 Collection File Writing Progress: 67480900/67569879     99% (objects)
Sun Feb  5 00:24:14.203                  67569879 objects
Sun Feb  5 00:24:14.203         Metadata for mydb.mycoll to dump/mydb/mycoll.metadata.json

But then the restore created a new capped collection. Dratz!

Fortunately, the dump includes a metadata file in JSON format.

cat dump/mydb/mycoll.metadata.json
{
  "options": {
    "capped": true,
    "size": 107374182400
  },
  "indexes": [
    {
      "v": 1,
      "key": {
        "_id": 1
      },
      "ns": "mydb.mycoll",
      "name": "_id_"
    }
  ]
}

Go ahead and remove that “options” section which specifies the capped collection size. Now restore.
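
If you would rather script that edit than do it by hand, here is a minimal Python sketch. It assumes the metadata file is plain JSON with a top-level “options” key, as in the dump shown here; the function name is mine, not part of any MongoDB tooling.

```python
import json


def strip_collection_options(metadata_text: str) -> str:
    """Return the metadata JSON with the top-level "options" key removed.

    Assumption: "options" is where the dump records "capped" and "size",
    as in the metadata file shown in this post.
    """
    metadata = json.loads(metadata_text)
    metadata.pop("options", None)  # no-op if the key is already gone
    return json.dumps(metadata, indent=2)


# Example input mirroring the metadata file from the dump.
example = (
    '{"options": {"capped": true, "size": 107374182400}, '
    '"indexes": [{"v": 1, "key": {"_id": 1}, '
    '"ns": "mydb.mycoll", "name": "_id_"}]}'
)

print(strip_collection_options(example))
```

To use it for real, read dump/mydb/mycoll.metadata.json, pass the text through the function, and write the result back before running the restore. Keep a copy of the original file in case you need it.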

PROD root@myhost:/data/uncap # mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson
connected to:
Sun Feb  5 00:29:41.473 dump/mydb/mycoll.bson
Sun Feb  5 00:29:41.473         going into namespace [mydb.mycoll_tmp]
Sun Feb  5 00:29:44.060                 Progress: 51202216/106169934834 0%      (bytes)
Sun Feb  5 00:29:47.007                 Progress: 106497873/106169934834        0%      (bytes)
Sun Feb  5 01:57:19.065                 Progress: 106159626025/106169934834     99%     (bytes)
67569879 objects found
Sun Feb  5 01:57:19.637         Creating index: { key: { _id: 1 }, ns: "mydb.mycoll_tmp", name: "_id_" }

So it automatically creates the indices for us. (If you have a lot of indices, this takes a long time.)

Check it out now.

rs-prod:PRIMARY> db.mycoll.isCapped()
rs-prod:PRIMARY> db.mycoll_tmp.isCapped()
rs-prod:PRIMARY> db.mycoll.count()
rs-prod:PRIMARY> db.mycoll_tmp.count()

Perfect! Now we can just drop the old collection and rename the new collection.


Lessons Learned:

  • Read the fine print on capped collections before deciding they’re perfect.
  • Things that use eval internally (ahem, copyTo) can be painfully slow.
  • Sometimes doing a dump, manually tweaking a config, and then restoring is fastest.


Posted in Tutorials.

Restricting uploads to public PyPI

Many companies use an internal PyPI server for storing their proprietary Python packages. This makes managing Python libraries and application dependencies so much easier. But unfortunately this also makes it easy for people to accidentally upload their private code to the public PyPI.

Lucky for us, there’s a cool extension to setuptools called restricted_pkg! Unlucky for us, it leaves something to be desired in terms of user experience. Let’s say we have an example library called xl which uses restricted_pkg to prevent accidental uploads. Continued…

Posted in Tutorials.

AWS User Policy for Single S3 Bucket

A common requirement is to have a backup service or script that uploads objects to S3 for storage. Since it’s good practice to scope user permissions as narrowly as possible, this leads to creating separate “api users” in Amazon for each service. Each user is only given permission for the buckets it needs to access. Unfortunately, the Resource URIs for AWS are non-intuitive and you have to remember to whitelist both the bucket and its contents. If you’re kind, you’ll also allow listing all buckets to make navigating through the UI or other tools possible.

    "Version": "2012-10-17",
    "Statement": [
            "Effect": "Allow",
            "Action": [
            "Resource": [
            "Effect": "Allow",
            "Action": [
            "Resource": [
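
As a rough illustration of the shape described above, here is a small Python sketch that builds a two-statement policy for one bucket. The bucket name “my-backup-bucket”, the function name, and the exact list of object actions are my assumptions for the example; trim the actions to whatever your service actually needs.

```python
import json


def single_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document scoped to one S3 bucket.

    Assumption: the service needs to list, read, write, and delete
    objects. Adjust the action list for your own use case.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets UIs and other tools show the bucket list.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": ["arn:aws:s3:::*"],
            },
            {
                # Both the bucket itself and its contents must be listed:
                # ListBucket applies to the bucket ARN, the object
                # actions apply to the "/*" ARN.
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
        ],
    }


# Hypothetical bucket name, for illustration only.
print(json.dumps(single_bucket_policy("my-backup-bucket"), indent=4))
```

The printed JSON can be pasted into the policy editor for the user. Building it from a function just keeps the bucket and bucket/* ARNs in sync.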

#protip #selfreference

Posted in Tutorials.

How To See A Process’s Environment in Linux

One of our sysadmins recently taught me that we can see the environment with which a process is launched by looking in /proc. Whoa! That’s helpful.

Unfortunately, the entries in the environ file are null-terminated, so it is not pleasant to read or pipe into other commands. So here’s a handy one-liner to print them “properly” for easier inspection or command chaining.

cat /proc/{pid}/environ | xargs --null --max-args=1 echo
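
If you need the same thing from a script, here is a minimal Python sketch of the idea: split the file on NUL bytes and then on the first “=”. The function names are mine, and it assumes a Linux /proc filesystem and permission to read the target process’s environ file.

```python
def parse_environ(raw: bytes) -> dict:
    """Turn the NUL-separated contents of an environ file into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        # Split on the first "=" only; values may contain "=" themselves.
        name, _, value = entry.decode(errors="replace").partition("=")
        env[name] = value
    return env


def read_process_environ(pid) -> dict:
    """Read /proc/<pid>/environ; pass "self" for the current process."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read())
```

For example, read_process_environ("self") returns the environment the Python interpreter itself was launched with.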

#protip #selfreference

Posted in Tech Reference.

How Much Does Tax Deferral Save You?

Ever wondered exactly how tax deferral saves you money? Although deferring your income is often cited as a “good thing”, the most common explanation given is a tax arbitrage between your current tax rate and a presumably lower tax rate in retirement. Many people even go so far as to say that if your tax rate is the same in retirement then it’s equivalent, excluding the time value of money. Let’s see if this is actually true.

I’ve put together a small scenario which compares a one-time contribution to a deductible Traditional IRA to a standard investment account (non-tax-advantaged) which is then allowed to grow for the next two decades.

Tax Deferral Advantage

Wow! We realized a 31% gain using a tax-deferred investment option given the same tax rate! (30% in this example)
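
To make the mechanics concrete, here is a small Python sketch of one way to model the comparison. It is not the spreadsheet behind the number above: the 7% return, the $10,000 of pre-tax income, and the choice to tax the standard account’s gains every year at the same rate are all my assumptions, so the advantage it prints will not necessarily match the figure in this post.

```python
def traditional_ira(pretax: float, annual_return: float,
                    tax_rate: float, years: int) -> float:
    """Invest the full pre-tax amount, grow untaxed, pay tax on withdrawal."""
    return pretax * (1 + annual_return) ** years * (1 - tax_rate)


def taxable_account(pretax: float, annual_return: float,
                    tax_rate: float, years: int) -> float:
    """Pay tax up front, then pay tax on each year's gains.

    Assumption: gains are taxed annually at the same rate, which
    reduces the effective growth rate to annual_return * (1 - tax_rate).
    """
    after_tax = pretax * (1 - tax_rate)
    return after_tax * (1 + annual_return * (1 - tax_rate)) ** years


# Hypothetical inputs, for illustration only.
pretax, annual_return, tax_rate, years = 10_000, 0.07, 0.30, 20

ira = traditional_ira(pretax, annual_return, tax_rate, years)
taxable = taxable_account(pretax, annual_return, tax_rate, years)
print(f"Traditional IRA: {ira:,.2f}")
print(f"Taxable account: {taxable:,.2f}")
print(f"Deferral advantage: {ira / taxable - 1:.1%}")
```

Under this model the starting balance and the final tax rate are the same in both accounts. The whole difference comes from the yearly tax drag on growth, which is why the advantage exists even when your tax rate never changes.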

So how does this deferral advantage change with different tax rates, investment returns, and holding periods? Continued…

Posted in Tutorials.

Life Insurance for the Financial Independence Crowd

As my wife and I move along on our journey to Financial Independence (FI) while thinking about starting a family, we’ve been wondering about getting life insurance. It feels like one of those things we “should” do, but does it really make sense for us? Does it make sense for anyone pursuing Financial Independence?

Life insurance is a bewildering, fear-driven world that, until recently, I didn’t know enough about to think systematically. This post is going to introduce a few systems that have helped me think about life insurance: how to determine how much you need, how seeking FI impacts your needs, and how to build an affordable life insurance plan.

Posted in Tutorials.

Mapping SinceDB Files to Logstash File Input

Sometimes you need to know which SinceDB files map to which file inputs for Logstash. This could be to track down a bug with the file input plugin or to force Logstash to reparse a specific file. The contents of a SinceDB file look like this:

479 0 64515 31175

Not very intuitive, is it?

A little googling will show you that the first field in this file is an inode number. A little more searching will show you how to map from an inode number back to a file path. The rest of this post shows how to put together a little two-liner that will just print the map of all SinceDB files to the monitored files.
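
As a rough Python illustration of the same idea (not the two-liner itself), the sketch below parses SinceDB lines and matches their inode numbers against a list of candidate log files. The function names are mine, and the reading of the four fields as inode, major device number, minor device number, and byte offset is my understanding of the file input’s format. The sketch also ignores the device numbers, so it assumes all candidate files live on one filesystem.

```python
import os
from collections import namedtuple

SinceDBEntry = namedtuple("SinceDBEntry", "inode major minor position")


def parse_sincedb_line(line: str) -> SinceDBEntry:
    """Parse one line such as "479 0 64515 31175" into named fields."""
    inode, major, minor, position = (int(x) for x in line.split())
    return SinceDBEntry(inode, major, minor, position)


def map_sincedb_to_paths(sincedb_lines, candidate_paths) -> dict:
    """Map each SinceDB entry to the candidate path with the same inode.

    Entries with no matching file map to None. Inodes are only unique
    per filesystem, so mixed-filesystem candidates could collide.
    """
    by_inode = {
        os.stat(path).st_ino: path
        for path in candidate_paths
        if os.path.exists(path)
    }
    result = {}
    for line in sincedb_lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = parse_sincedb_line(line)
        result[entry] = by_inode.get(entry.inode)
    return result
```

In practice candidate_paths would be the files matched by the path globs in your file inputs, and sincedb_lines would come from reading each SinceDB file.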


Posted in Tutorials.

Free StatusPage Hosted On Github & Amazon

After having clients call for status updates during a production outage earlier this week, I started thinking more about the classic “status page.” This is for a side business, so I don’t really want to pay $30/mo for this functionality. How could I do something high quality but for low cost?

Luckily, I found an open source statuspage that gets me most of the way there. It allows for hosting on Github Pages, thus decoupling my production infrastructure from the status infrastructure. (It’s not very useful if your status page goes down at the same time as your production stuff, so it’s best to decouple them as much as possible.)

Unfortunately, this project requires manually running a shell command every time we create, update, comment on, or close an issue. While this is OK, in the heat of the moment, the fewer things I need to remember to do the better. I saw an option to pay the creators $30/year to automate this for you, but clicking the link took me to a dead site. Plus I wanted to play with some of the newer AWS stuff anyway.

So without further ado, let’s walk through using Amazon to automate updates to your status page… for free!


Posted in Tutorials.

Building Logstash Pipelines using @metadata

It’s common for many companies to run multiple applications on a single physical host or virtual machine. Each of the applications usually has its own log file. A local Logstash can be used to read all of these messages, process them, and forward them to Elasticsearch (or another Logstash or a message queue, anywhere really). You can even logically organize one Logstash config file per application, complete with input, filters, and output. So what’s the problem?

How do I ensure that my filters/output only run on the right input?

A common practice is to add a “tags” field on the input and check for it in the filters and output. If you’re diligent about removing this tags field in the output, this can work… but ain’t nobody got time for that. Unfortunately, what often happens is that the field is forgotten and ends up in your data downstream. Yuck. So what’s a better pattern?

Logstash 1.5 added the ability to add metadata to an event. This provides the building block for what I like to call the “Logstash Pipeline Pattern”. We can use this metadata to form an independent logstash pipeline (input/filters/output) for every application on the host without running multiple instances of logstash.

Here’s what this looks like in practice. Continued…

Posted in Tutorials.
