A couple of weeks ago I converted a MongoDB collection to a capped collection. This was for an event archive where we only need to keep the last month of events stored locally. The issue is that the data grows unbounded until we manually free up disk space. Enter capped collections, which maintain a fixed-size collection by automatically removing the oldest documents. Unfortunately, I didn’t realize that our application would also update existing documents. On a capped collection, any update that causes a document to grow will fail. Bummer.
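For context, here’s roughly what that failure mode looks like in the mongo shell (collection and field names here are made up for illustration):

```javascript
// Create a 1GB capped collection (sizes are in bytes).
db.createCollection("events", { capped: true, size: 1024 * 1024 * 1024 })

db.events.insert({ _id: 1, type: "login", tags: [] })

// In-place updates are fine, but an update that grows the document
// fails on a capped collection — on 2.4 the error reads something
// like "failing update: objects in a capped ns cannot grow".
db.events.update({ _id: 1 }, { $push: { tags: "suspicious" } })
```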
Now we need to roll back to an uncapped collection and find another way to manage the database size (ahem, cron job). The recommended approach, “copyTo(…)”, is deprecated in newer versions and agonizingly slow on a large 100GB+ data set in older versions. (This database was still running 2.4. I know.)
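For the record, the replacement cron job amounts to a one-line remove by timestamp — a sketch, assuming each event document carries a `ts` date field (and, ideally, an index on it so the range scan is cheap):

```javascript
// Delete events older than 30 days; run nightly from cron, e.g.:
//   mongo mydb --eval "var cutoff = ...; db.mycoll.remove(...)"
var cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
db.mycoll.remove({ ts: { $lt: cutoff } });
```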
As a basic benchmark, “copyTo(…)” took about 10 hours to copy ~80% of a 100GB capped collection. The catch is that there’s no progress indicator at all: the database is locked during the copy, so you can’t even look at the size of the new collection, and nothing useful is printed in the logs. I only know it got that far because I halted the copy and then looked. I probably should’ve let it finish, but I had no way to tell whether it was making any progress, much less that it was that close to done.
Following the advice of a kind stranger in IRC (#mongodb), I decided to try mongodump and mongorestore.
The dump was fast and showed progress the whole time. (Took 22 minutes total to dump.)
```
PROD root@myhost:/data/uncap # mongodump -d mydb -c mycoll
connected to: 127.0.0.1:27017
Sun Feb 5 00:02:28.879 DATABASE: mydb to dump/mydb
Sun Feb 5 00:02:28.880   mydb.mycoll to dump/mydb/mycoll.bson
Sun Feb 5 00:02:31.004   Collection File Writing Progress: 868400/67569879  1% (objects)
...
Sun Feb 5 00:24:13.004   Collection File Writing Progress: 67480900/67569879  99% (objects)
Sun Feb 5 00:24:14.203   67569879 objects
Sun Feb 5 00:24:14.203   Metadata for mydb.mycoll to dump/mydb/mycoll.metadata.json
```
But then the restore created a new capped collection. Dratz!
Fortunately, the dump includes a metadata file in JSON format.
```
cat dump/mydb/mycoll.metadata.json
{
  "options": { "capped": true, "size": 107374182400 },
  "indexes": [
    { "v": 1, "key": { "_id": 1 }, "ns": "mydb.mycoll", "name": "_id_" }
  ]
}
```
Go ahead and remove that “options” section which specifies the capped collection size. Now restore.
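After the edit, the metadata file should look something like this (with no “options” section, mongorestore should create a plain, uncapped collection):

```
cat dump/mydb/mycoll.metadata.json
{
  "indexes": [
    { "v": 1, "key": { "_id": 1 }, "ns": "mydb.mycoll", "name": "_id_" }
  ]
}
```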
```
PROD root@myhost:/data/uncap # mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson
connected to: 127.0.0.1:27018
Sun Feb 5 00:29:41.473 dump/mydb/mycoll.bson
Sun Feb 5 00:29:41.473   going into namespace [mydb.mycoll_tmp]
Sun Feb 5 00:29:44.060   Progress: 51202216/106169934834  0% (bytes)
Sun Feb 5 00:29:47.007   Progress: 106497873/106169934834  0% (bytes)
Sun Feb 5 01:57:19.065   Progress: 106159626025/106169934834  99% (bytes)
67569879 objects found
Sun Feb 5 01:57:19.637   Creating index: { key: { _id: 1 }, ns: "mydb.mycoll_tmp", name: "_id_" }
```
So it automatically creates the indices for us. (If you have a lot of indices, this takes a long time.)
Check it out now.
```
rs-prod:PRIMARY> db.mycoll.isCapped()
true
rs-prod:PRIMARY> db.mycoll_tmp.isCapped()
false
rs-prod:PRIMARY> db.mycoll.count()
9876543210
rs-prod:PRIMARY> db.mycoll_tmp.count()
9876543210
```
Perfect! Now we can just drop the old collection and rename the new collection.
```
db.mycoll.drop()
db.mycoll_tmp.renameCollection('mycoll')
db.mycoll.count()
9876543210
```
Lessons Learned:
- Read the fine print on capped collections before deciding they’re perfect.
- Things that use eval internally (ahem, copyTo) can be painfully slow.
- Sometimes doing a dump, manually tweaking a config, and then restoring is fastest.
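Put together, the whole uncapping procedure is a short script. This is a sketch using the names from the example above — edit to taste, and try it on a copy first:

```shell
# Dump the capped collection (fast, with progress output).
mongodump -d mydb -c mycoll

# Strip the "options" section from the metadata so the restore
# creates an uncapped collection. A Python one-liner is safer than
# sed for editing JSON.
python -c '
import json
f = "dump/mydb/mycoll.metadata.json"
meta = json.load(open(f))
meta.pop("options", None)
json.dump(meta, open(f, "w"))
'

# Restore into a temporary collection alongside the original.
mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson

# Then swap the collections in the mongo shell:
#   db.mycoll.drop()
#   db.mycoll_tmp.renameCollection('mycoll')
```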