Snail in a Turtleneck
Kristina Chodorow's Blog
Kristina Chodorow's Blog
Jan 25th
This is a rant from my role as a driver developer and person who gives support on the mailing list/IRC.
Command helpers are terrible. They confuse users, result in a steeper learning curve, and make MongoDB’s interface seem arbitrary.
The basics: what are command helpers?
Command helpers are wrappers for database commands. Database commands are everything other than CRUD (create, retrieve, update, delete) that you can do with MongoDB. This includes things like dropping a collection, doing a MapReduce, adding a member to a replica set, seeing what arguments you started mongod with, and finding out if the last write operation succeeded. They’re everywhere, if you’ve used MongoDB, you’ve run a database command (even if you weren’t aware of it).
So, what are command helpers? These are wrappers around the raw command, turning something like db.adminCommand({serverStatus:1}) into db.serverStatus(). This makes it slightly quicker to run and look “nicer” than the command. However, there are honey bunches of reasons that they’re a bad idea and should be avoided whenever possible.
Database helpers are unportable
Helpers are extremely unportable. If you know how to run db.serverStatus() in the shell, that’s great, but all you know is how to do it in the shell. If you know how to run the serverStatus command, you know how to get the server status in every language you’ll ever use.
Similarly, each language handles command options differently. Take a command like group: the shell helper chooses one order of options (a single argument “options”, incidentally) and the Python driver chooses another (“key”, “condition”, “initial”, “reduce”, “finalize”) and the PHP driver another (“key”, “initial”, “reduce”, “options”). If you just learn the group command itself, you can execute it in any language you please.
This affects almost everyone using MongoDB, as almost everyone uses at least two languages (JavaScript and something else). I have seen hundreds of questions of the form “How do I run <shell function> using my driver?” If these users knew it was a database command (and knew what a database command was), they wouldn’t have to ask.
Database helpers lock you to a certain API, often an out-of-date one
Suppose the database changes the options for a command. All of the drivers that support helpers for that command are suddenly out-of-date. Conversely, if you have a recent version of a driver and an old version of the database, you can have helpers for features that don’t exist yet or have different options.
An example of old driver/new database: MapReduce’s options changed in version 1.7.4. As far as I know, none of the drivers support the new options, yet.
You can’t support database helpers for everything
Next, there’s just the sheer volume of database commands, which makes it impossible to implement helpers for all of them. Everyone has their favorites: aggregation is important to some people, administration helpers are important to others, etc. If all of them had helpers, not only would there be a ridiculous number of methods polluting the API documentation, but it would leads to tons of compatibility problems between the driver and the database (as mentioned above).
Database helpers conceal what’s going on, giving users less options
Finally, using command helpers keeps people from understanding what’s actually going on, which is pointless and can lead to problems. It’s pointless to conceal the gory details because the details aren’t very gory: all database commands are queries. This means you can deconstruct command helpers as follows (example in PHP):
// the command helper $db->lastError(); // is the same as $db->command(array("getlasterror" => 1)); // is the same as $db->selectCollection('$cmd')->findOne(array("getlasterror" => 1)); // is the same as $db->selectCollection('$cmd')->find(array("getlasterror" => 1))->limit(1)->getNext();
Every command helper is just a find() in disguise! This means you can do (almost) anything with a database command that you could with a query.
This gives you more control. Not only can you use whatever options you want, you can do a few other things:
Finally, if you’re using an unfamiliar driver, you might not know what its helpers are called but all drivers have a find() method, so you can always use that.
Exceptions
There are a couple command helpers worth implementing. I think that count and drop (at both the database and collection levels) are common enough to be worth having helpers for. Also, at a higher level (e.g., frameworks on top of the driver and admin GUIs) I think helpers are absolutely fine. However, as someone who has been maintaining a driver and supporting users for the last few years, I think that, at a driver level, command helpers are a terrible idea.
Jan 19th

Rollin' rollin' rollin', keep that oplog rollin'
If you’re using replica sets, you can get into a situation where you have conflicting data. MongoDB will roll back conflicting data, but it never throws it out.
Let’s take an example, say you have three servers: A (arbiter), B, and C. You initialize A, B, and C:
$ mongo B:27017/foo > rs.initiate() > rs.add("C:27017") { "ok" : 1 } > rs.addArb("A:27017") { "ok" : 1 }
Now do a couple of writes to the master (say it’s B).
> B = connect("B:27017/foo") > B.bar.insert({_id : 1}) > B.bar.insert({_id : 2}) > B.bar.insert({_id : 3})
Then C gets disconnected (if you’re trying this out, you can just hit Ctrl-C—in real life, this might be caused by a network partition). B handles some more writes:
> B.bar.insert({_id : 4}) > B.bar.insert({_id : 5}) > B.bar.insert({_id : 6})
Now B gets disconnected. C gets reconnected and the arbiter elects it master, so it starts handling writes.
> C = connect("C:27017/foo") > C.bar.insert({_id : 7}) > C.bar.insert({_id : 8}) > C.bar.insert({_id : 9})
But now B gets reconnected. B has data that C doesn’t have and C has data that B doesn’t have! What to do? MongoDB chooses to roll back B’s data, since it’s “further behind” (B’s latest timestamp is before C’s latest timestamp).
If we query the databases after the millisecond or so it takes to roll back, they’ll be the same:
> C.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 } > B.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 }
Note that the data B wrote and C didn’t is gone. However, if you look in B’s data directory, you’ll see a rollback directory:
$ ls /data/db journal local.0 local.1 local.ns mongod.lock rollback foo.0 foo.1 foo.ns _tmp $ ls /data/db/rollback foo.bar.2011-01-19T18-27-14.0.bson
If you look in the rollback directory, there will be a file for each rollback MongoDB has done. You can examine what was rolled back with the bsondump utility (comes with MongoDB):
$ bsondump foo.bar.2011-01-19T18-27-14.0.bson { "_id" : 4 } { "_id" : 5 } { "_id" : 6 } Wed Jan 19 13:33:32 3 objects found
If these won’t conflict with your existing data, you can add them back to the collection with mongorestore.
$ mongorestore -d foo -c bar foo.bar.2011-01-19T18-27-14.0.bson connected to: 127.0.0.1 Wed Jan 19 13:36:27 foo.bar.2011-01-19T18-27-14.0.bson Wed Jan 19 13:36:27 going into namespace [foo.bar] Wed Jan 19 13:36:27 3 objects found
Note that you need to specify -d foo and -c bar to get it into the correct collection. If it would conflict, you could restore it into another collection and do a more delicate merge operation.
Now, if you do a find, you’ll get all of the documents:
> B.bar.find() { "_id" : 1 } { "_id" : 2 } { "_id" : 3 } { "_id" : 7 } { "_id" : 8 } { "_id" : 9 } { "_id" : 4 } { "_id" : 5 } { "_id" : 6 }
Hopefully this sort of thing can tide most people over until MongoDB supports multi-master.
Jan 4th

Choosing the right shard key for a MongoDB cluster is critical: a bad choice will make you and your application miserable. Shard Carks is a cooperative strategy game to help you choose a shard key. You can try out a few pre-baked strategies I set up online (I recommend reading this post first, though). Also, this won’t make much sense unless you understand the basics of sharding.
Mapping from MongoDB to Shard Carks
This game maps the pieces of a MongoDB cluster to the game “pieces:”
Before play begins, the dealer orders the deck to mimic the application being modeled. For this example, we’ll pretend we’re programming a news application, where users are mostly concerned with the latest few weeks of information. Since the data is “ascending” through time, it can be modeled by sorting the cards in ascending order: two through ace for spades, then two through ace of hearts, then diamonds, then clubs for the first deck. Multiple decks can be used to model longer periods of time.
Once the decks are prepared, the players decide on a shard key: the criteria used for chunk ranges. The shard key can be any deterministic strategy that an independent observer could calculate. Some examples: order dealt, suit, or deck number.
Gameplay
The game begins with Player 1 having a single chunk (chunk1). chunk1 has 0 cards and the shard key range [-∞, ∞).
Each turn, the dealer flips over a card, computes the value for the shard key, figures out which player has a chunk range containing that key, and hands the card to that player. Because the first card’s shard key value will obviously fall in the range [-∞, ∞), it will go to Player 1, who will add it to chunk1. The second and the third cards go to chunk1, too. When the fourth card goes to chunk1, the chunk is full (chunks can only hold up to four cards) so the player splits it into two chunks: one with a range of [-∞, midchunk1), the other with a range of [midchunk1, ∞), where midchunk1 is the midpoint shard key value for the cards in chunk1, such that two cards will end up in one chunk and two cards will end up in the other.
The dealer flips over the next card and computes the shard key’s value. If it’s in the [-∞, midchunk1) range, it will be added to that chunk. If it’s in the [midchunk1, ∞) range, it will be added to that chunk.
Splitting
Whenever a chunk gets four cards, the player splits the chunk into two 2-card chunks. If a chunk has the range [x, z), it can be split into two chunks with ranges [x, y), [y, z), where x < y < z.
Balancing
All of the players should have roughly the same number of chunks at all times. If, after splitting, Player A ends up with more chunks than Player B, Player A should pass one of their chunks to Player B.
The goals of the game are for no player to be overwhelmed and for the gameplay to remain easy indefinitely, even if more players and dealers are added. For this to be possible, the players have to choose a good shard key. There are a few different strategies people usually try:
Sample Strategy 1: Let George Do It
The players huddle together and come up with a plan: they’ll choose “order dealt” as the shard key.
The dealer starts dealing out cards: 2 of spades, 3 of spades, 4 of spades, etc. This works for a bit, but all of the cards are going to one player (he has the [x, ∞) chunk, and each card’s shard key value is closer to ∞ than the preceding card’s). He’s filling up chunks and passing them to his friends like mad, but all of the incoming cards are added to this single chunk. Add a few more dealers and the situation becomes completely unmaintainable.
Ascending shard keys are equivalent to this strategy: ObjectIds, dates, timestamps, auto-incrementing primary keys.
Sample Strategy 2: More Scatter, Less Gather

When George falls over dead from exhaustion, the players regroup and realize they have to come up with a different strategy. “What if we go the other direction?” suggests one player. “Let’s have the shard key be the MD5 hash of the order dealt, so it’ll be totally random.” The players agree to give it a try.
The dealer begins calculating MD5 hashes with his calculator watch. This works great at first, at least compared to the last method. The cards are dealt at a pretty much even rate to all of the players. Unfortunately, once each player has a few dozen chunks in front of them, things start to get difficult. The dealer is handing out cards at a swift pace and the players are scrambling to find the right chunk every time the dealer hands them a card. The players realize that this strategy is just going to get more unmanageable as the number of chunks grows.
Sharding keys equivalent to this strategy: MD5 hashes, UUIDs. If you shard on a random key, you lose data locality benefits.
Sample Strategy 3: Combo Plate

What the players really want is something where they can take advantage of the order (like the first strategy) and distribute load across all of the players (like the second strategy). They figure out a trick: couple a coarsely-grained order with the random element. “Let’s say everything in a given deck is ‘equal,’” one player suggests. “If all of the cards in a deck are equivalent, we’ll need a way of splitting chunks, so we’ll also use the MD5 hash as a secondary criteria.”
The dealer passes the first four cards to Player 1. Player 1 splits his chunk and passes the new chunk to Player 2. Now the cards are being evenly distributed between Player 1 and Player 2. When one of them gets a full chunk again, they split it and hand a chunk to Player 3. After a few more cards, the dealer will be evenly distributing cards among all of the players because within a given deck, the order the players are getting the cards is random. Because the cards are being split in roughly ascending order, once a deck has finished, the players can put aside those cards and know that they’ll never have to pick them up again.
This strategy manages to both distribute load evenly and take advantage of the natural order of the data.
Applying Strategy 3 to MongoDB
For many applications where the data is roughly chronological, a good shard key is:
{<coarse timestamp> : 1, <search criteria> : 1}
The coarseness of the timestamp depends on your data: you want a bunch of chunks (a chunk is 200MB) to fit into one “unit” of timestamp. So, if 1 month’s worth of writes is 30GB, 1 month is a good granularity and your shard key could start with {"month" : 1}. If one month’s worth of data is 1 GB you might want to use the year as your timestamp. If you’re getting 500GB/month, a day would work better. If you’re inserting 5000GB/sec, sub-second timestamps would qualify as “coarse.”
If you only use a coarse granularity, you’ll end up with giant unsplittable chunks. For example, say you chose the shard key {"year" : 1}. You’d get one chunk per year, because MongoDB wouldn’t be able to split chunks based on any other criteria. So you need another field to target queries and prevent chunks from getting too big. This field shouldn’t really be random, as in Strategy 3, though. It’s good to group data by the criteria you’ll be looking for it by, so a good choice might be username, log level, or email, depending on your application.
Warning: this pattern is not the only way to choose a shard key and it won’t work well for all applications. Spend some time monitoring, figuring out what your application’s write and read patterns are. Setting up a distributed system is hard and should not be taken lightly.
If you’re going to be sharding and you’re not sure what shard key to choose, try running through a few Shark Carks scenarios with coworkers. If a certain player start getting grouchy because they’re having to do twice the work or everyone is flailing around trying to find the right cards, take note and rethink your strategy. Your servers will be just as grumpy and flailing, only at 3am.
If you don’t have anyone easily persuadable around, I made a little web application for people to try out the strategies mentioned above. The source is written in PHP and available on Github, so feel free to modify. (Contribute back if you come up with something cool!)
And that’s how you choose a shard key.
Dec 26th
A dongle is a USB thingy (as you can see, I’m very qualified in this area) that lets you connect your computer to the internet wherever you go. It uses the same type of connection your cellphone data plan uses (3G or 4G).
A few months ago, Clear asked if they could send me a free sample dongle, as I am such a prestigious tech blogger. And I, being a sucker for free things (take note marketers) agreed to try out their dongle. And I have to say, it’s been pretty cool having free wifi wherever I go. The good bits:
But… there aren’t a whole lot of places I go where I don’t have free wifi already. Almost all of the coffeeshops and bookstores (and even bars) I go to already advertise free wifi. I used the dongle maybe once a week. I’ll miss it when my free trial runs out, but I won’t miss it $55-per-month-worth.
Also, I should be able to get the same sort of behavior by tethering my cellphone–if Sprint didn’t cripple their cellphones to prevent you from tethering. I actually don’t like having a phone, period, so when my contract runs out I’ll probably get a phone with just a data plan and a less douchey carrier.
So, my conclusions are: it’s super handy, but my cellphone should really be able to serve the same function. But that’s just me, and it is really cool being able to go online anywhere.
Dec 9th
This post covers a couple “toolbox” topics that are easy to brush up on before the technical interview.
I recently read a post that drove me nuts, written by someone looking for a job. They said:
I can’t seem to crack the on-site coding interviews… [Interviews are geared towards] those who can suavely implement a linked list code library (inserting, deleting, reversing) as well as a data structure using that linked list (i.e. a stack) on a white board, no syntax errors, compilable, all error paths covered, interfaces cleanly buttoned up. Lather, rinse, repeat for binary search trees and sorting algorithms.
These are a programmer’s multiplication tables! If someone asked me “what’s 6×15?” on an interview, I wouldn’t throw my hands up and complain that I learned it 20 years ago, I’d be fucking thrilled that they had given me such a softball question.
Believe me, if you can’t figure out my basic algorithm questions, you do not want me to ask my “fun” questions.
If you’re looking for a job, I’d recommend accepting that interviewers want to see you know your multiplication tables and spend a few hours cramming if you need to. Make sure you have a basic toolbox set up in your brain, covering a couple basic categories:
If you are applying for a language-specific job, the interviewer will probably ask you about some specifics. A good interviewer shouldn’t try to trap you with obscure language trivia, but make sure you’re familiar with the basics. So, if you’re applying for, say, a Java position, get comfortable with java.lang, java.util, how garbage collection works, basic synchronization, and know that Strings are immutable.
Protip: when I was looking for a job, every single place I interviewed asked me about Java’s public/protected/private keywords. Nearly all of them asked about final, too.
Don’t freak out if you get up to the board and can’t remember whether it’s foo.toString() or (String)foo, or if you forget a semicolon. Any reasonable interviewer knows that it’s hard to program on a whiteboard and doesn’t expect compiler-ready code. On the other hand, if your resume says you’ve been doing C for 10 years and you allocate an array of chars as char *x[], we expect you to laugh and understand your mistake when we point it out (I know I might do something like that out of nerves, so I wouldn’t hold it against you as long as you understood the problem).
Good luck out there. Remember that, if a company brings you in for an interview, they want to hire you. Do everything you can to let them!
Nov 30th

NYU's asbestos-filled math and CS building where I spent my undergrad
I started programming when I was 20. My original college plan was to major in mathematics and become a saxophonist (I didn’t feel like starving while I tried to make it as a musician).
Luckily, I had a crush on a computer science major so I tagged along with him to a programming team meeting. Progteam blew my mind: programming was like math, only fun! Majoring in math made me feel smart and dignified, but it was never like “Wow, this it fun.” It was more like “Ow, my brain hurts, but I guess it’s building brain muscles…”
It turned out I was good at computer science, so I decided (somewhat randomly) that I was getting into MIT for grad school, dammit. I knew they’d want to see research, so I asked a professor to mentor an independent research project. Over the next year, I did researched a classic optimization algorithm and wrote a paper on an algorithm I came up with to improve its performance for certain cases.
The problem was that, when the time came to apply to grad school, I wasn’t sure I wanted to go to at all anymore. I had liked learning about optimization and coming up with a new algorithm, but I had hated research, in and of itself. I asked my parents for advice.
“Just apply,” they said. “Keep your options open.”

The computer science building at MIT
Grad school had been my goal for a while, so I applied to a couple of PhD programs. I half hoped that they would all reject me and make the choice easier. Of course, they all accepted me, even MIT (poor me</sarcasm>). I thought about it some more and told my parents that I still didn’t think I wanted to go.
“Just try a semester,” they said. “You can always leave if you don’t like it.”
I ended up accepting Columbia, not MIT. I had really liked every professor I met at Columbia, which I figured would give me more advisor options. Unfortunately, I continued to hate research and I was thoroughly sick of school. The next three months the most miserable of my life.
“Just stick it out,” said my parents. “Until you get a master’s degree, at least.”
I finally put my foot down. Usually they have good advice but I realized that this was their thing, not mine. I dropped out of grad school and got a job I loved. My parents were happy that I was happy and got over the disappointment that I would never be Dr. Chodorow. I’m still at the same job and couldn’t be happier.
So, in the spirit of Thanksgiving, I’m really thankful that I lucked into discovering computer science. Math kind of sucks.
Nov 3rd

A demo of Firesheep, courtesy of a fellow bus rider
If you use an open wifi network, people around you can see what you’re doing. They not only can look at your accounts, but log in as you with a double click. Even if you’re non-technical (especially if you’re non-technical!) you should know how this works and how to protect your accounts. Here’s what’s happening:
When you use wireless internet you are sending information through the air from your computer to a router* somewhere. This information is like broadcasting your own little radio station: it can be picked up and seen by anyone in the area. The problem is, your radio station is broadcasting you checking email, updating your OkCupid profile, writing stupid messages to friends on Facebook… activities that you don’t want random “listeners” to know about.

To keep your radio station private, websites support encoding all of the data you send so it looks like gibberish to anyone on the outside. So, when you sign into Gmail (or Amazon or Chase) your computer turns your username and password into gibberish and sends it into the air. The website receives the message, decodes the gibberish, and says “Now that you’ve given me your credentials, I’ll assume you’re Joe Shmoe if you give me the unlikely combination of digits ’874328972387498234′ every time you make a request.” And then most sites stop encoding anything.
So, when you post a status update to your wall, you send along “874328972387498234″ as clear as day and Facebook says “Aha, it’s you. Okay, I’ll post that.”
However, remember that you’re broadcasting this on your own personal radio station. Well, someone finally built a tuner, called Firesheep. If you have Firesheep installed and you sit down in a coffeeshop (or anywhere with an open wifi network), you are logged in as everyone around you to every site the other patrons are visiting.
Important takeaways for non-geeks:
The code for Firesheep is open source and available on Github. You can try it out by starting up Firefox, downloading Firesheep, going to File->Open File and selecting the file you just downloaded. You may have to select View->Sidebar->Firesheep if it doesn’t pop up automatically.
That’s it, it’s ready to start capturing data from other people on your wifi network.
* Geeks: I know it’s not necessarily a router, but most lay people know that a router is where internet comes out and it’s close enough.
Oct 27th
Part 3 of the replication internals series: three handy tricks.
This is the third post in a three-part series on replication. See also parts 1 (replication internals) and 2 (getting to know your oplog).
MongoDB has a type of query that behaves like the tail -f command: it shows you new data as it’s written to a collection. This is great for the oplog, where you want to see new records as they pop up and don’t want to query over and over.
If you want this type of ongoing query, MongoDB returns a tailable cursor. When this cursor gets to the end of the result set it will hang around and wait for more elements to be added to the collection. As they’re added, the cursor will return them. If no elements are added for a while, the cursor will time out and the client has to requery if they want more results.
Using your knowledge of the oplog’s format, you can use a tailable cursor to do a long pull for activities in a certain collection, of a certain type, at a certain time… almost any criteria you can imagine.
Using the oplog for crash recovery
Suppose your database goes down, but you have a fairly recent backup. You could put a backup into production, but it’ll be a bit behind. You can bring it up-to-date using your oplog entries.
If you use the trigger mechanism (described above) to capture the entire oplog and send it to a non-capped collection on another server, you can then use an oplog replayer to play the oplog over your dump, bringing it as up-to-date as possible.
Pick a time pre-dump and start replaying the oplog from there. It’s okay if you’re not sure exactly when the dump was taken because the oplog is idempotent: you can apply it to your data as many times as you want and your data will end up the same.
Also, warning: I haven’t tried out the oplog replayer I linked to, it’s just the first one I found. There are a few different ones out there and they’re pretty easy to write.
Creating non-replicated collections
The local database contains data that is local to a given server: it won’t be replicated anywhere. This is one reason why it holds all of the replication info.
local isn’t reserved for replication stuff: you can put your own data there, too. If you do a write in the local database and then check the oplog, you’ll notice that there’s no record of the write. The oplog doesn’t track changes to the local database, since they won’t be replicated.
And now I’m all replicated out. If you’re interested in learning more about replication, check out the core documentation on it. There’s also core documentation on tailable cursors and language-specific instructions in the driver documentation.
Oct 15th
Don’t: contact the startup before you know what they do. I’ve recruited at a couple college job fairs and almost everyone comes up and says, “Hi, I’m a masters student in computer science and I’m looking for a job. Can I give you my resume?” Yes, you can, and I’ll put it on the pile of 200 other resumes.
Also, please don’t walk me through your resume line-by-line: it’s boring. I’ll hate you and I won’t be able to think of a polite way of cutting you off.
Do: say, “I love MongoDB! I’ve been using it with Ruby for <some project> and I would love to work on it full time! I’m really interested in replication/sharding/geospatial/etc. stuff!” Keep in mind: you’re talking to startup employees. Working is our life (which sounds depressing, but we’re doing what we love). It’s annoying to have people apply who are looking for a job, any job, and obviously don’t give a crap what we do.
Startups tend to get romanticized (and I’m about to romanticize them out the wazoo), but working at one definitely isn’t for everyone. The salary isn’t as good, the job security is going to suck, it’s tons more work and investment than a “normal” company, and in all likelihood, after pouring your heart and soul into it for years, it’ll flop.
On the other hand, working at a startup is awesome. You get to do everything: I’ve done C socket programming and jQuery and everything in between. I’m two years out of school and manage release cycles and user communities. I’ve gotten to travel everywhere from Belgium to Brazil and written a book.

It’s a great match if you like being independent: not the Rambo-”don’t tie me down, baby”-independent, the “::snerk::, I like dinosaurs so I wrote a research paper on sauropods”-independent. You have to be willing to work hard under your own steam.
Your resume
Don’t: have a boring resume.
Your resume should prove that we are fools if we don’t bring you in for an interview.
If yours doesn’t, think about what your dream job would look for on your resume. Open source development? Independent research? A penchant for robot design? Now go out and get that stuff on your resume.
Don’t use fluffy language, your resume is going to be read by programmers, not managers. “Did in-depth research to enable optimization of processes” is going to make us groan. “Made a genome-crunching aggregation script 50 times faster by researching how Java memory allocation works” is going to make us go “cool!” Have you done other optimization research? Do you like benchmarking? Do you know a lot about Java internals? Heck, tell us about the human genome.
Your interview is going to be a lot more fun for everyone involved (and much more likely to actually occur) if you make us think, “this person sounds really interesting, I want to talk to them.”
When I was in college I had no idea what I wanted to do, other than a vague idea of “solving interesting problems.” So, you don’t exactly have to be dedicated to the cause to get a job at a startup. Just express some enthusiasm for what they do, write a kick-ass resume, and the rest is up to your technical ability.
Oh, and by the way: if you’re looking for an awesome job, 10gen is recruiting!
Oct 14th
Our application could do billions of writes and the oplog has to record them all, but we don’t want our entire disk consumed by the oplog. To prevent this, MongoDB makes the oplog a fixed-size, or capped, collection (the oplog is actually the reason capped collections were invented).
When you start up the database for the first time, you’ll see a line that looks like:
Mon Oct 11 14:25:21 [initandlisten] creating replication oplog of size: 47MB... (use --oplogSize to change)
Your oplog is automatically allocated to be a fraction of your disk space. As the message suggests, you may want to customize it as you get to know your application.
Protip: you should make sure you start up arbiter processes with
--oplogSize 1, so that the arbiter doesn’t preallocate an oplog. There’s no harm in it doing so, but it’s a waste of space as the arbiter will never use it.
Implications of using a capped collection
The oplog is a fixed size so it will eventually fill up. At this point, it’ll start overwriting the oldest entries, like a circular queue.
It’s usually fine to overwrite the oldest operations because the slaves have already copied and applied them. Once everyone has an operation there’s no need to keep it around. However, sometimes a slave will fall very far behind and “fall off” the end of the oplog: the latest operation it knows about is before the earliest operation in the master’s oplog.
oplog time ->
<..............................>
^ ^ ^ ^
| | | |
slave master
If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.
Resyncing
On a resync or an initial sync, the slave will make a note of the master’s current oplog time and call the copyDatabase command on all of the master’s databases. Once all of the master’s databases have been copied over, the slave makes a note of the time. Then it applies all of the oplog operations from the time the copy started up until the end of the copy.
Once it has completed the copy and run through the operations that happened during the copy, it is considered resynced. It can now begin replicating normally again. If so many writes occur during the resync that the slave’s oplog cannot hold them all, you’ll end up in the “need to resync” state again. If this occurs, you need to allocate a larger oplog and try again (or try it at a time when the system is has less traffic).
Next up: using the oplog in your application.