Kristina Chodorow's Blog
Linux
––thursday #5: diagnosing high readahead
May 10th
Having readahead set too high can slow your database to a crawl. This post discusses why that is and how you can diagnose it.
The #1 sign that readahead is too high is that MongoDB isn’t using as much RAM as it should be. If you’re running Mongo Monitoring Service (MMS), take a look at the “resident” size on the “memory” chart. Resident memory can be thought of as “the amount of space MongoDB ‘owns’ in RAM.” Therefore, if MongoDB is the only thing running on a machine, we want resident size to be as high as possible. On the chart below, resident is ~3GB:

Is 3GB good or bad? Well, it depends on the machine. If the machine only has 3.5GB of RAM, I’d be pretty happy with 3GB resident. However, if the machine has, say, 15GB of RAM, then we’d like close to 15GB of the data to be in there (the “mapped” field is (sort of) data size, so I’m assuming we have 60GB of data).
Assuming we’re accessing a lot of this data, we’d expect MongoDB’s resident set size to be 15GB, but it’s only 3GB. If we try turning down readahead, the resident size jumps to 15GB and our app starts going faster. But why is this?
Let’s take an example: suppose all of our docs are 512 bytes in size (readahead is set in 512-byte increments, called sectors, so 1 doc = 1 sector makes the math easier). If we have 60GB of data then we have ~120 million documents (60GB of data/(512 bytes/doc)). The 15GB of RAM on this machine should be able to hold ~30 million documents.
Our application accesses documents randomly across our data set, so we’d expect MongoDB to eventually “own” (have resident) all 15GB of RAM, as 1) it’s the only thing running and 2) it’ll eventually fetch at least 15GB of the data.
Now, let’s set our readahead to 100 (100 512-byte sectors, aka 100 documents): blockdev --setra 100 /dev/sda (substitute your own device for /dev/sda). What happens when we run our application?
Picture our disk as looking like this, where each o is a document:
...
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
...
// keep going for millions more o's
Let’s say our app requests a document. We’ll mark it with “x” to show that the OS has pulled it into memory:
...
ooooooooooooooooooooooooo
ooooxoooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
ooooooooooooooooooooooooo
...
See it on the third line there? But that’s not the only doc that’s pulled into memory: readahead is set to 100 so the next 99 documents are pulled into memory, too:
... ooooooooooooooooooooooooo ooooxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx xxxxooooooooooooooooooooo ooooooooooooooooooooooooo ooooooooooooooooooooooooo ...

Is your OS returning this with every document?
Now we have 100 docs in memory, but remember that our application is accessing documents randomly: the likelihood that the next document we access is in that block of 100 docs is almost nil. At this point, there’s 50KB of data in RAM (512 bytes * 100 docs = 51,200 bytes) and MongoDB’s resident size has only increased by 512 bytes (1 doc).
Our app will keep bouncing around the disk, reading docs from here and there and filling up memory with docs MongoDB never asked for until RAM is completely full of junk that’s never been used. Then, it’ll start evicting things to make room for new junk as our app continues to make requests.
Working this out, there’s a 25% chance of our app requesting a doc that’s already in memory (15GB of RAM/60GB of data), so 75% of the requests are going to go to disk. Say we’re doing 2 requests a sec. Then 1 hour of requests is 2 requests/sec * 3600 seconds/hour = 7200 requests, 5400 of which are going to disk (.75 * 7200). If each request pulls back 50KB, that’s roughly 270MB read from disk/hour. If we set readahead to 0, so that each request reads only the 512-byte doc it asked for, we’ll have under 3MB read from disk/hour.
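Here is a rough Python sketch of that back-of-the-envelope math, in case you want to plug in your own numbers. The function name and parameters are made up for illustration, and the model is the simplified one from this post (1 doc = 1 sector, purely random access), not a measurement of real kernel behavior.

```python
SECTOR_BYTES = 512  # blockdev counts readahead in 512-byte sectors


def disk_bytes_per_hour(requests_per_sec, miss_rate, readahead_sectors):
    """Bytes read from disk per hour under the post's simplified model."""
    requests_per_hour = requests_per_sec * 3600
    disk_requests = requests_per_hour * miss_rate
    # Assumption: every miss reads at least the one sector that was asked for,
    # even when readahead is set to 0.
    sectors_per_miss = max(readahead_sectors, 1)
    return disk_requests * sectors_per_miss * SECTOR_BYTES


# 15GB of RAM over 60GB of data: 25% of requests hit memory, 75% miss.
miss_rate = 1 - 15 / 60

high = disk_bytes_per_hour(2, miss_rate, 100)
low = disk_bytes_per_hour(2, miss_rate, 0)

print(f"readahead 100: {high / 1e6:.1f} MB/hour")
print(f"readahead 0:   {low / 1e6:.1f} MB/hour")
print(f"amplification: {high / low:.0f}x")
```

The interesting number is the last one: in this model the extra disk traffic scales directly with the readahead setting, no matter how many requests per second you plug in.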
Which brings us to the next symptom of a too-high readahead: unexpectedly high disk IO. Because most of the data we want isn’t in memory, we keep having to go to disk, dragging shopping-carts full of junk into RAM, perpetuating the high disk io/low resident mem cycle.
The general takeaway is that a DB is not a “normal” workload for an OS. The default settings may screw you over.
––thursday #4: blockdev
Apr 5th
Disk IO is slow. You just won’t believe how vastly, hugely, mind-bogglingly slow it is. I mean, you may think your network is slow, but that’s just peanuts to disk IO.
The image below helps visualize how slow (post continues below).
(Originally found on Hacker News and inspired by Gustavo Duarte’s blog.)
The kernel knows how slow the disk is and tries to be smart about accessing it. It not only reads the data you requested, it also returns a bit more. This way, if you’re reading through a file or watching a movie (sequential access), your system doesn’t have to go to disk as frequently because you’re pulling more data back than you strictly requested each time.
You can see how far the kernel reads ahead using the blockdev tool:
$ sudo blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0     80026361856   /dev/sda
rw   256   512  4096       2048     80025223168   /dev/sda1
rw   256   512  4096          0   2000398934016   /dev/sdb
rw   256   512  1024       2048        98566144   /dev/sdb1
rw   256   512  4096     194560      7999586304   /dev/sdb2
rw   256   512  4096   15818752     19999490048   /dev/sdb3
rw   256   512  4096   54880256   1972300152832   /dev/sdb4
Readahead is listed in the “RA” column. As you can see, I have two disks (sda and sdb) with readahead set to 256 on each. But what unit is that 256? Bytes? Kilobytes? Dolphins? If we look at the man page for blockdev, it says:
$ man blockdev
...
--setra N
    Set readahead to N 512-byte sectors.
...
This means that my readahead is 512 bytes * 256 = 131072 bytes, or 128KB. That means that, whenever I read from disk, the disk is actually reading at least 128KB of data, even if I only requested a few bytes.
So what value should you set your readahead to? Please don’t set it to a number you find online without understanding the consequences. If you Google for “blockdev setra”, the first result uses blockdev --setra 65536, which translates to 32MB of readahead. That means that, whenever you read from disk, the disk is actually doing 32MB worth of work. Please do not set your readahead this high if you’re doing a lot of random-access reads and writes, as all of the extra IO can slow things down a lot (and if you’re low on memory, you’ll be forcing the kernel to fill up your RAM with data you won’t need).
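If you don’t feel like doing the sector math in your head, here’s a small Python sketch of the conversion. The helper names (readahead_bytes, human) are my own invention for this example; the only fact it relies on is the 512-byte sector size from the man page.

```python
SECTOR_BYTES = 512  # per the blockdev man page, RA is counted in 512-byte sectors


def readahead_bytes(ra_sectors):
    """Convert a blockdev RA value (in sectors) to bytes."""
    return ra_sectors * SECTOR_BYTES


def human(n_bytes):
    """Format a byte count using 1024-based units, e.g. 131072 -> '128KB'."""
    for unit in ("B", "KB", "MB", "GB"):
        if n_bytes < 1024 or unit == "GB":
            return f"{n_bytes:g}{unit}"
        n_bytes /= 1024


for ra in (256, 65536):
    print(f"RA {ra} = {human(readahead_bytes(ra))}")
```

Feeding it the RA column from blockdev --report is a quick sanity check before you change anything.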
Getting a good readahead value can help disk IO issues to some extent, but if you are using MongoDB (in particular), please consider your typical document size and access patterns before changing your blockdev settings. I’m not recommending any particular value because what’s perfect for one application/machine can be death for another.
I’m really enjoying these –thursday posts because every week people have commented with different/better/interesting ways of doing what I talked about (or ways of telling the difference between stalagmites and stalactites), which is really cool. So I’m throwing this out there: how would you figure out what a good readahead setting is? Next week I’m planning to do iostat for –thursday which should cover this a bit, but please leave a comment if you have any ideas.
––thursday #2: diff ‘n patch
Mar 15th
I’m trying something new: every Thursday I’ll do a short post on how to do something with the command line.
I always seem to either create or apply patches in the wrong direction. It’s like stalagmites vs. stalactites, which I struggled with until I heard the mnemonic: “Stalagmites might hang from the ceiling… but they don’t.”
Moving right along, you can use diff to get line-by-line changes between any two files. Generally I use git diff because I’m dealing with a git repo, so that’s what I’ll use here.
Let’s get a diff of MongoDB between version 2.0.2 and 2.0.3.
$ git clone git://github.com/mongodb/mongo.git
$ cd mongo
$ git diff r2.0.2..r2.0.3 > mongo.patch
This takes all of the changes between 2.0.2 and 2.0.3 (r2.0.2..r2.0.3) and dumps them into a file called mongo.patch (that’s the > mongo.patch part).
Now, let’s get the code from 2.0.2 and apply mongo.patch, effectively making it 2.0.3 (this is kind of a silly example but if you’re still with me after the stalagmite thing, I assume you don’t mind silly examples):
$ git checkout r2.0.2
Note: checking out 'r2.0.2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 514b122... BUMP 2.0.2
$
$ patch -p1 < mongo.patch
What intuitive syntax!
What does the -p1 mean? How many forward slashes to remove from the path in the patch, of course.
To take an example, if you look at the last 11 lines of the patch, you can see that it is the diff for the file that changes the version number. It looks like this:
$ tail -n 11 mongo.patch
--- a/util/version.cpp
+++ b/util/version.cpp
@@ -38,7 +38,7 @@ namespace mongo {
* 1.2.3-rc4-pre-
* If you really need to do something else you'll need to fix _versionArray()
*/
- const char versionString[] = "2.0.2";
+ const char versionString[] = "2.0.3";
// See unit test for example outputs
static BSONArray _versionArray(const char* version){
Note the a/util/version.cpp and b/util/version.cpp. These indicate the file the patch should be applied to, but there are no a or b directories in the MongoDB repository. The a and b prefixes indicate that one is the previous version and one is the new version. And -p says how many slashes to strip from this path. An example may make this clearer:
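- -p0: “apply this patch to a/util/version.cpp” (which doesn’t exist). Note that this is not the same as leaving -p off: with no -p at all, GNU patch ignores the directories and uses just the file’s base name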
- -p1: “apply this patch to util/version.cpp” ← bingo, that’s what we want
- -p2: “apply this patch to version.cpp” (which doesn’t exist)
So, we use -p1, because that makes the patch’s paths match the actual directory structure. If someone sent you a patch and the path is something like /home/bob/bobsStuff/foo.txt, your name is not Bob, and you’re just trying to patch foo.txt, you’d probably want to use -p4.
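To make the slash-counting concrete, here’s a tiny Python sketch that mimics what -p does to a path. strip_components is a name I made up; this is only an approximation of the rule (“drop the smallest prefix containing N slashes”), not patch’s actual code, and it ignores corner cases such as runs of adjacent slashes or asking to strip more slashes than the path has.

```python
def strip_components(path, n):
    """Drop the smallest prefix of `path` that contains n slashes."""
    # split() with maxsplit=n cuts at the first n slashes only;
    # the last piece is whatever remains after that prefix.
    return path.split("/", n)[-1]


for n in range(3):
    print(f"-p{n}: {strip_components('a/util/version.cpp', n)}")

print("-p4:", strip_components("/home/bob/bobsStuff/foo.txt", 4))
```

Note that the leading slash in Bob’s absolute path counts as one of the four, which is why it takes -p4 and not -p3 to get down to foo.txt.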
On the plus side, if you’re using patches generated by git, they’re super-easy to apply. Git chose the intuitive verb “apply” to patch a file. If you have a patch generated by git diff, you can patch your current tree with:
$ git apply mongo.patch
So, aside from the stupid choice of verbiage, that is generally easier.
Did I miss anything? Get anything wrong? Got a suggestion for next week? Leave a comment below and let me know!
––thursday #1: screen
Mar 8th
I’m trying something new: every Thursday I’ll go over how to do something with the command line. Let me know what you think.
If you are using a modern-ish browser, you probably use tabs to keep multiple things open at once: your email, your calendar, whatever you’re actually doing, etc. You can do the same thing with the shell using screen: in a single terminal, you can compile a program while you’re editing a file and watching another process out of the corner of your eye.
Note that screen is super handy when SSH’d into a box. SSH in once, then start screen and open up all of the windows you need.
Using screen
To start up screen, run:
$ screen
Now your shell will clear and screen will give you a welcome message.
Screen version 4.00.03jw4 (FAU) 2-May-06
Copyright (c) 1993-2002 Juergen Weigert, Michael Schroeder
Copyright (c) 1987 Oliver Laumann
...
[Press Space or Return to end.]
As it says at the bottom, just hit Return to clear the welcome message. Now you’ll see an empty prompt and you can start working normally.
Let’s say we have three things we want to do:
- Run top
- Edit a file
- Tail a log
Go ahead and start up top:
$ top
Well, now we need to edit a file but top’s using the shell. What to do now? Just create a new window. While top is still running, hit ^A c (I’m using ^A as shorthand for Control-a, so this means “hit Control-a, then hit c”) to create a new window. The new window gets put right on top of the old one, so you’ll see a fresh shell and be at the prompt again. But where did top go? Not to worry, it’s still there. We can switch back to it with ^A n or ^A p (next or previous window).
Now we can start up our editor and begin editing a file. But now we want to tail a file, so we create another new window with ^A c and run our tail -f filename. We can continue to use ^A n and ^A p to switch between the three things we’re doing (and open more windows as necessary).
Availability
screen seems pretty ubiquitous: it has been on every Linux machine I’ve ever tried running it on and even OS X (although it may be part of Xcode, I haven’t checked).
Note for Emacs Users
^A is an annoying escape key, as it is also the go-to-beginning-of-line shortcut in Emacs (and the shell). To fix this, create a .screenrc file and add one line to change this to something else:
# use ^T
escape ^Tt
# or ^Y
escape ^Yy
The escape sequence is 3 characters: caret, T, and t. (It is not using the single special character “^T”.) The traditional escape key is actually Ctrl-^, as the caret is the one character Emacs doesn’t use for anything. In a .screenrc file, this results in the rather bizarre string:
escape ^^^^
…which makes sense when you think about it, but looks a bit weird.
Odds and Ends
As long as you’re poking at the .screenrc file, you might want to turn off the welcome message, too:
startup_message off
Run ^A ? anytime for help, or check out the manual’s list of default bindings.
Did I miss anything? Get anything wrong? Got a suggestion for next week? Leave a comment below and let me know!
Playing with Virtual Memory
Aug 30th

Linux: the developer's personal gentleman
When you run a process, it needs some memory to store things: its heap, its stack, and any libraries it’s using. Linux provides and cleans up memory for your process like an extremely conscientious butler. You can (and generally should) just let Linux do its thing, but it’s a good idea to understand the basics of what’s going on.
One easy way (I think) to understand this stuff is to actually look at what’s going on using the pmap command. pmap shows you memory information for a given process.
For example, let’s take a really simple C program that prints its own process id (PID) and pauses:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main() {
    printf("run `pmap %d`\n", getpid());
    pause();
}
Save this as mem_munch.c. Now compile and run it with:
$ gcc mem_munch.c -o mem_munch
$ ./mem_munch
run `pmap 25681`
The PID you get will probably be different than mine (25681).
At this point, the program will “hang.” This is because of the pause() function, and it’s exactly what we want. Now we can look at the memory for this process at our leisure.
Open up a new shell and run pmap, replacing the PID below with the one mem_munch gave you:
$ pmap 25681
25681:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
00007fcf5af88000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b112000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b311000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b315000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fcf5b316000     24K rw---    [ anon ]
00007fcf5b31c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b512000     12K rw---    [ anon ]
00007fcf5b539000     12K rw---    [ anon ]
00007fcf5b53c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fcf5b53d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff7efd8000    132K rw---    [ stack ]
00007fff7efff000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             3984K
This output is how memory “looks” to the mem_munch process. If mem_munch asks the operating system for 00007fcf5af88000, it will get libc. If it asks for 00007fcf5b31c000, it will get the ld library.
This output is a bit dense and abstract, so let’s look at how some more familiar memory usage shows up. Change our program to put some memory on the stack and some on the heap, then pause.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>

int main() {
    int on_stack, *on_heap;

    // local variables are stored on the stack
    on_stack = 42;
    printf("stack address: %p\n", &on_stack);

    // malloc allocates heap memory
    on_heap = (int*)malloc(sizeof(int));
    printf("heap address: %p\n", on_heap);

    printf("run `pmap %d`\n", getpid());
    pause();
}
Now compile and run it:
$ gcc mem_munch.c -o mem_munch
$ ./mem_munch
stack address: 0x7fff497670bc
heap address: 0x1b84010
run `pmap 11972`
Again, your exact numbers will probably be different than mine.
Before you kill mem_munch, run pmap on it:
$ pmap 11972
11972:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
0000000001b84000    132K rw---    [ anon ]
00007f3ec4d98000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec4f22000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5121000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5125000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007f3ec5126000     24K rw---    [ anon ]
00007f3ec512c000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec5322000     12K rw---    [ anon ]
00007f3ec5349000     12K rw---    [ anon ]
00007f3ec534c000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007f3ec534d000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007fff49747000    132K rw---    [ stack ]
00007fff497bb000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             4116K
Note that there’s a new entry between the final mem_munch section and libc-2.13.so. What could that be?
# from pmap
0000000001b84000 132K rw--- [ anon ]
# from our program
heap address: 0x1b84010
The addresses are almost the same. That block ([ anon ]) is the heap. (pmap labels blocks of memory that aren’t backed by a file [ anon ]. We’ll get into what being “backed by a file” means in a sec.)
The second thing to notice:
# from pmap
00007fff49747000 132K rw--- [ stack ]
# from our program
stack address: 0x7fff497670bc
And there’s your stack!
One other important thing to notice: this is how memory “looks” to your program, not how memory is actually laid out on your physical hardware. Look at how much memory mem_munch has to work with. According to pmap, mem_munch can address memory between address 0x0000000000400000 and 0xffffffffff600000 (well, actually 0x00007fffffffffff, beyond that is special). For those of you playing along at home, that’s almost 17 million terabytes of address space (or about 128 terabytes if you stop at 0x00007fffffffffff). That’s a lot of memory. (If your computer has that kind of memory, please leave your address and times you won’t be at home.)
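For anyone who wants to check the size of that address range, here’s a quick Python sketch. The two endpoints come from the pmap output above; the 0x00007fffffffffff cutoff is my assumption about where user-space addresses end on a typical x86-64 Linux box, and I’m counting a terabyte as 2^40 bytes.

```python
TIB = 2 ** 40  # one terabyte, counted in powers of two

lowest = 0x0000000000400000    # first mapping pmap showed for mem_munch
highest = 0xffffffffff600000   # last mapping pmap showed
user_top = 0x00007fffffffffff  # assumed top of user space on x86-64 Linux

# Distance between the first and last addresses pmap listed.
full_span_tib = (highest - lowest) / TIB
# Distance from the first address to the assumed end of user space.
user_span_tib = (user_top + 1 - lowest) / TIB

print(f"first to last mapping: {full_span_tib:,.0f} TB")
print(f"first mapping to top of user space: {user_span_tib:,.2f} TB")
```

Either way you slice it, it’s far more than the RAM in the machine, which is the whole point.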
So, the amount of memory the program can address is kind of ridiculous. Why does the computer do this? Well, lots of reasons, but one important one is that this means you can address more memory than you actually have on the machine and let the operating system take care of making sure the right stuff is in memory when you try to access it.
Memory Mapped Files
Memory mapping a file basically tells the operating system to load the file so the program can access it as an array of bytes. Then you can treat a file like an in-memory array.
For example, let’s make a (pretty stupid) random number generator by creating a file full of random numbers, then mmap-ing it and reading off random numbers.
First, we’ll create a big file called random (note that this creates a 1GB file, so make sure you have the disk space and be patient, it’ll take a little while to write):
$ dd if=/dev/urandom bs=1024 count=1000000 of=/home/user/random
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 123.293 s, 8.3 MB/s
$ ls -lh random
-rw-r--r-- 1 user user 977M 2011-08-29 16:46 random
Now we’ll mmap random and use it to generate random numbers.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <sys/mman.h>

int main() {
    char *random_bytes;
    FILE *f;
    int offset = 0;

    // open "random" for reading
    f = fopen("/home/user/random", "r");
    if (!f) {
        perror("couldn't open file");
        return -1;
    }

    // we want to inspect memory before mapping the file
    printf("run `pmap %d`, then press <enter>", getpid());
    getchar();

    random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);

    if (random_bytes == MAP_FAILED) {
        perror("error mapping the file");
        return -1;
    }

    while (1) {
        printf("random number: %d (press <enter> for next number)", *(int*)(random_bytes+offset));
        getchar();

        offset += 4;
    }
}
If we run this program, we’ll get something like:
$ ./mem_munch
run `pmap 12727`, then press <enter>
The program hasn’t done anything yet, so the output of running pmap will basically be the same as it was above (I’ll omit it for brevity). However, if we continue running mem_munch by pressing enter, our program will mmap random.
Now if we run pmap it will look something like:
$ pmap 12727
12727:   ./mem_munch
0000000000400000      4K r-x--  /home/user/mem_munch
0000000000600000      4K r----  /home/user/mem_munch
0000000000601000      4K rw---  /home/user/mem_munch
000000000147d000    132K rw---    [ anon ]
00007fe261c6f000 976564K r--s-  /home/user/random
00007fe29d61c000   1576K r-x--  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d7a6000   2044K -----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a5000     16K r----  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9a9000      4K rw---  /lib/x86_64-linux-gnu/libc-2.13.so
00007fe29d9aa000     24K rw---    [ anon ]
00007fe29d9b0000    132K r-x--  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dba6000     12K rw---    [ anon ]
00007fe29dbcc000     16K rw---    [ anon ]
00007fe29dbd0000      4K r----  /lib/x86_64-linux-gnu/ld-2.13.so
00007fe29dbd1000      8K rw---  /lib/x86_64-linux-gnu/ld-2.13.so
00007ffff29b2000    132K rw---    [ stack ]
00007ffff29de000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total           980684K
This is very similar to before, but with an extra line (the /home/user/random mapping), which kicks up virtual memory usage a bit (from 4MB to 980MB).
However, let’s re-run pmap with the -x option. This shows the resident set size (RSS): only 4KB of random are resident. Resident memory is memory that’s actually in RAM. There’s very little of random in RAM because we’ve only accessed the very start of the file, so the OS has only pulled the first bit of the file from disk into memory.
$ pmap -x 12727
12727:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       0 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
000000000147d000       0       4       4 rw---    [ anon ]
00007fe261c6f000       0       4       0 r--s-  random
00007fe29d61c000       0     288       0 r-x--  libc-2.13.so
00007fe29d7a6000       0       0       0 -----  libc-2.13.so
00007fe29d9a5000       0      16      16 r----  libc-2.13.so
00007fe29d9a9000       0       4       4 rw---  libc-2.13.so
00007fe29d9aa000       0      16      16 rw---    [ anon ]
00007fe29d9b0000       0     108       0 r-x--  ld-2.13.so
00007fe29dba6000       0      12      12 rw---    [ anon ]
00007fe29dbcc000       0      16      16 rw---    [ anon ]
00007fe29dbd0000       0       4       4 r----  ld-2.13.so
00007fe29dbd1000       0       8       8 rw---  ld-2.13.so
00007ffff29b2000       0      12      12 rw---    [ stack ]
00007ffff29de000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684     508     100
If the virtual memory size (the Kbytes column) is all 0s for you, don’t worry about it. That’s a bug in Debian/Ubuntu’s -x option. The total is correct, it just doesn’t display correctly in the breakdown.
You can see that the resident set size, the amount that’s actually in memory, is tiny compared to the virtual memory. Your program can access any memory within a billion bytes of 0x00007fe261c6f000, but if it accesses anything past 4KB, it’ll probably have to go to disk for it*.
What if we modify our program so it reads the whole file/array of bytes?
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <sys/mman.h>

int main() {
    char *random_bytes;
    FILE *f;
    int offset = 0;

    // open "random" for reading
    f = fopen("/home/user/random", "r");
    if (!f) {
        perror("couldn't open file");
        return -1;
    }

    random_bytes = mmap(0, 1000000000, PROT_READ, MAP_SHARED, fileno(f), 0);

    if (random_bytes == MAP_FAILED) {
        printf("error mapping the file\n");
        return -1;
    }

    for (offset = 0; offset < 1000000000; offset += 4) {
        int i = *(int*)(random_bytes+offset);

        // to show we're making progress
        if (offset % 1000000 == 0) {
            printf(".");
        }
    }

    // at the end, wait for signal so we can check mem
    printf("\ndone, run `pmap -x %d`\n", getpid());
    pause();
}
Now the resident set size is almost the same as the virtual memory size:
$ pmap -x 5378
5378:   ./mem_munch
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000       0       4       4 r-x--  mem_munch
0000000000600000       0       4       4 r----  mem_munch
0000000000601000       0       4       4 rw---  mem_munch
0000000002271000       0       4       4 rw---    [ anon ]
00007fc2aa333000       0  976564       0 r--s-  random
00007fc2e5ce0000       0     292       0 r-x--  libc-2.13.so
00007fc2e5e6a000       0       0       0 -----  libc-2.13.so
00007fc2e6069000       0      16      16 r----  libc-2.13.so
00007fc2e606d000       0       4       4 rw---  libc-2.13.so
00007fc2e606e000       0      16      16 rw---    [ anon ]
00007fc2e6074000       0     108       0 r-x--  ld-2.13.so
00007fc2e626a000       0      12      12 rw---    [ anon ]
00007fc2e6290000       0      16      16 rw---    [ anon ]
00007fc2e6294000       0       4       4 r----  ld-2.13.so
00007fc2e6295000       0       8       8 rw---  ld-2.13.so
00007fff037e6000       0      12      12 rw---  [ stack ]
00007fff039c9000       0       4       0 r-x--    [ anon ]
ffffffffff600000       0       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB          980684  977072     104
Now if we access any part of the file, it will be in RAM already. (Probably. Until something else kicks it out.) So, our program can access a gigabyte of memory, but the operating system can lazily load it into RAM as needed.
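If you’d rather poke at this without compiling anything, here’s a rough Python sketch of the same “map first, touch later” idea. The function names are hypothetical, it assumes a Unix-like system (the resource module isn’t available elsewhere), and it reports peak resident size through getrusage instead of pmap, so treat the numbers as a ballpark.

```python
import mmap
import resource
import tempfile


def peak_rss():
    """Peak resident set size so far (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def map_then_touch(size_bytes=64 * 1024 * 1024):
    """mmap a scratch file, then read one byte per page, sampling peak RSS."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size_bytes)  # scratch file of the right length, no real data
        with mmap.mmap(f.fileno(), size_bytes, access=mmap.ACCESS_READ) as m:
            after_map = peak_rss()      # mapped, but nothing touched yet
            total = 0
            for offset in range(0, size_bytes, mmap.PAGESIZE):
                total += m[offset]      # touching a page makes the OS load it
            after_touch = peak_rss()    # now the touched pages count as resident
    return after_map, after_touch


if __name__ == "__main__":
    before, after = map_then_touch()
    print(f"peak RSS after mmap:  {before}")
    print(f"peak RSS after touch: {after}")
```

On a machine with memory to spare, the second number should come out noticeably bigger than the first, by roughly the size of the mapping.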
And that’s why your virtual memory is so damn high when you’re running MongoDB.
Left as an exercise to the reader: try running pmap on a mongod process before it’s done anything, once you’ve done a couple operations, and once it’s been running for a long time.
* This isn’t strictly true**. The kernel actually says, “If they want the first N bytes, they’re probably going to want some more of the file” so it’ll load, say, the first dozen KB of the file into memory but only tell the process about 4KB. When your program tries to access memory that is in RAM, but that it didn’t know was in RAM, it’s called a minor page fault (as opposed to a major page fault, when it actually has to hit disk to load new info).
** This note is also not strictly true. In fact, the whole file will probably be in memory before you map anything because you just wrote the thing with dd. So you’ll just be doing minor page faults as your program “discovers” it.
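You can watch the two kinds of page faults from Python, too. This is a hedged sketch: count_faults_while_reading is a made-up name, it assumes a Unix-like system, and since it writes the scratch file right before mapping it (just like the dd example), you should expect mostly minor faults and few or no major ones.

```python
import mmap
import os
import resource
import tempfile


def count_faults_while_reading(size_bytes=4 * 1024 * 1024):
    """Return (minor, major) page faults incurred while touching a mapped file."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(size_bytes))  # freshly written, so likely still cached
        f.flush()
        with mmap.mmap(f.fileno(), size_bytes, access=mmap.ACCESS_READ) as m:
            before = resource.getrusage(resource.RUSAGE_SELF)
            checksum = 0
            for offset in range(0, size_bytes, mmap.PAGESIZE):
                checksum ^= m[offset]    # first touch of each page may fault
            after = resource.getrusage(resource.RUSAGE_SELF)
    minor = after.ru_minflt - before.ru_minflt  # page was already in RAM
    major = after.ru_majflt - before.ru_majflt  # page had to come from disk
    return minor, major


if __name__ == "__main__":
    minor, major = count_faults_while_reading()
    print(f"minor faults: {minor}, major faults: {major}")
```

To see major faults you’d need the file to be out of the page cache when you map it, which is exactly the situation a database with more data than RAM finds itself in.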
Installing Linux on a MacBook Air
Nov 7th
It’s not a clean victory, but I got Linux onto my MacBook Air.
When I first got my Air, I launched the Ubuntu install disk and followed the instructions on the Ubuntu wiki. Unfortunately, these instructions are apparently for the MacBook Air 1,1, and I had a MacBook Air 2,1. The Linux kernel froze in the middle of initializing.
After a couple, ahem, weeks of playing around with kernel parameters, I got it to a point where I realized it was Ubuntu, not Linux, that was screwing up, so I decided to try some other distro. I got a Debian network install CD (the full install is 31 CDs!) and tried it. It booted into the installer fine, and started merrily installing the system. I suddenly realized I had a doctor’s appointment, and had a terrible premonition that, by the time I got back, something would have gone wrong.
My premonition was correct. When I returned, the CD had stopped working. I checked it for errors, and it was fine. However, every time I started the computer now, the CD drive would make an ominous clicking noise and pop open. If I held it closed, it would make a downright alarming snapping noise. And rEFIt couldn’t even recognize it.
So, I installed VMware Fusion on the Mac partition, and installed Linux on that. I’m trying to look on the bright side: I get OS X power management, wireless, and sound with a Linux environment.
