A couple of weeks ago I converted a MongoDB collection to a capped collection. This was for an event archive where we only need to keep the last month of events stored locally. The issue is that the data grows unbounded until we manually free up disk space. Enter capped collections, which maintain a fixed-size collection by automatically removing the oldest documents. Unfortunately, I didn’t realize that our application would also update existing documents. On a capped collection, any update that causes a document to grow will fail. Bummer.
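For context, here’s roughly what that failure mode looks like in the mongo shell (collection and field names here are made up for illustration):

```javascript
// Create a 1GB capped collection (sizes are in bytes).
db.createCollection("events", { capped: true, size: 1024 * 1024 * 1024 })

db.events.insert({ _id: 1, type: "login", tags: [] })

// In-place updates are fine, but an update that grows the document
// fails on a capped collection — on 2.4 the error reads something
// like "failing update: objects in a capped ns cannot grow".
db.events.update({ _id: 1 }, { $push: { tags: "suspicious" } })
```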
Now we need to roll back to an uncapped collection and find another way to manage the database size (ahem, cron job). The recommended approach, “copyTo(…)”, is deprecated in newer versions and agonizingly slow on a large 100GB+ data set in older versions. (This database was still running 2.4. I know.)
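For the record, the replacement cron job amounts to a one-line remove by timestamp — a sketch, assuming each event document carries a `ts` date field (and, ideally, an index on it so the range scan is cheap):

```javascript
// Delete events older than 30 days; run nightly from cron, e.g.:
//   mongo mydb --eval "var cutoff = ...; db.mycoll.remove(...)"
var cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
db.mycoll.remove({ ts: { $lt: cutoff } });
```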
As a basic benchmark, “copyTo(…)” took about 10 hours to copy ~80% of a 100GB capped collection. The catch is that there’s no progress indicator at all: the database is locked during the copy, so you can’t even look at the size of the new collection, and nothing useful is printed in the logs. I only know it got that far because I halted the copy and then looked. I probably should’ve let it finish, but I had no way to tell whether it was making any progress, much less that it was that close to done.
Following the advice of a kind stranger in IRC (#mongodb), I decided to try mongodump and mongorestore.
The dump was fast and showed progress the whole time. (Took 22 minutes total to dump.)
```
PROD root@myhost:/data/uncap # mongodump -d mydb -c mycoll
connected to: 127.0.0.1:27017
Sun Feb 5 00:02:28.879 DATABASE: mydb to dump/mydb
Sun Feb 5 00:02:28.880   mydb.mycoll to dump/mydb/mycoll.bson
Sun Feb 5 00:02:31.004   Collection File Writing Progress: 868400/67569879  1% (objects)
...
Sun Feb 5 00:24:13.004   Collection File Writing Progress: 67480900/67569879  99% (objects)
Sun Feb 5 00:24:14.203   67569879 objects
Sun Feb 5 00:24:14.203   Metadata for mydb.mycoll to dump/mydb/mycoll.metadata.json
```
But then the restore created a new capped collection. Dratz!
Fortunately, the dump includes a metadata file in JSON format.
```
cat dump/mydb/mycoll.metadata.json
{
  "options": { "capped": true, "size": 107374182400 },
  "indexes": [
    { "v": 1, "key": { "_id": 1 }, "ns": "mydb.mycoll", "name": "_id_" }
  ]
}
```
Go ahead and remove that “options” section which specifies the capped collection size. Now restore.
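After the edit, the metadata file should look something like this (with no “options” section, mongorestore should create a plain, uncapped collection):

```
cat dump/mydb/mycoll.metadata.json
{
  "indexes": [
    { "v": 1, "key": { "_id": 1 }, "ns": "mydb.mycoll", "name": "_id_" }
  ]
}
```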
```
PROD root@myhost:/data/uncap # mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson
connected to: 127.0.0.1:27018
Sun Feb 5 00:29:41.473 dump/mydb/mycoll.bson
Sun Feb 5 00:29:41.473   going into namespace [mydb.mycoll_tmp]
Sun Feb 5 00:29:44.060   Progress: 51202216/106169934834  0% (bytes)
Sun Feb 5 00:29:47.007   Progress: 106497873/106169934834  0% (bytes)
Sun Feb 5 01:57:19.065   Progress: 106159626025/106169934834  99% (bytes)
67569879 objects found
Sun Feb 5 01:57:19.637   Creating index: { key: { _id: 1 }, ns: "mydb.mycoll_tmp", name: "_id_" }
```
So it automatically creates the indices for us. (If you have a lot of indices, this takes a long time.)
Check it out now.
```
rs-prod:PRIMARY> db.mycoll.isCapped()
true
rs-prod:PRIMARY> db.mycoll_tmp.isCapped()
false
rs-prod:PRIMARY> db.mycoll.count()
9876543210
rs-prod:PRIMARY> db.mycoll_tmp.count()
9876543210
```
Perfect! Now we can just drop the old collection and rename the new collection.
```
db.mycoll.drop()
db.mycoll_tmp.renameCollection('mycoll')
db.mycoll.count()
9876543210
```
Lessons Learned:
- Read the fine print on capped collections before deciding they’re perfect.
- Things that use eval internally (ahem, copyTo) can be painfully slow.
- Sometimes doing a dump, manually tweaking a config, and then restoring is fastest.
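Put together, the whole uncapping procedure is a short script. This is a sketch using the names from the example above — edit to taste, and try it on a copy first:

```shell
# Dump the capped collection (fast, with progress output).
mongodump -d mydb -c mycoll

# Strip the "options" section from the metadata so the restore
# creates an uncapped collection. A Python one-liner is safer than
# sed for editing JSON.
python -c '
import json
f = "dump/mydb/mycoll.metadata.json"
meta = json.load(open(f))
meta.pop("options", None)
json.dump(meta, open(f, "w"))
'

# Restore into a temporary collection alongside the original.
mongorestore -d mydb -c mycoll_tmp dump/mydb/mycoll.bson

# Then swap the collections in the mongo shell:
#   db.mycoll.drop()
#   db.mycoll_tmp.renameCollection('mycoll')
```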