Wednesday, November 11, 2009

NFS Load Tweaks: a Brief Guide for the Interested Enthusiast

I was asked about the mystery of NFS server tweaking in a dteam meeting, so I thought I'd compile this brief blog post.
As with all actions, there are two steps: first, gather your information, second, act on this information.

1) Determining your current NFS load statistics.

NFS logs useful information in its /proc entry...


> cat /proc/net/rpc/nfsd

rc 0 28905480 1603148913
fh 133 0 0 0 0
io 3663786355 2268252
th 63 362541 16645.121 3156.556 747.974 280.920 148.129 100.155 61.480 42.249 40.829 90.461
ra 256 1069115586 4089582 3055815 2625032 2228952 2114496 1983622 1765372 1743563 1610465 89609536
net 1634942152 0 1634971040 2214677
rpc 1630024431 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 1573543 1535237104 8743056 1545350887 1532645717 29571823 1179900114 9214599 6691508 538717 366274 0 2801854 39816 505310 4298 2486034 62181794 53164 2414727 0 986878
proc4 2 0 0

This somewhat arcane looking output is full of variously useful
statistics about your nfs daemon.
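
As a rough illustration of how this output can be picked apart, here is a minimal Python sketch that splits each line into its label and its numbers. The function name parse_nfsd_stats and the SAMPLE string (a few lines copied from the output above) are purely illustrative, not part of any NFS tooling.

```python
def parse_nfsd_stats(text):
    """Map each leading label (rc, fh, io, th, ...) to its list of numeric fields."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        label, values = parts[0], parts[1:]
        # Some fields (the th histogram) are fractional; the rest are integers.
        stats[label] = [float(v) if "." in v else int(v) for v in values]
    return stats


# A few lines copied from the sample output above (illustrative only).
SAMPLE = """\
rc 0 28905480 1603148913
fh 133 0 0 0 0
io 3663786355 2268252
"""

stats = parse_nfsd_stats(SAMPLE)
read_bytes, written_bytes = stats["io"]
print("stale file handles:", stats["fh"][0])
print("bytes read:", read_bytes, "bytes written:", written_bytes)
```

On a live server you would presumably feed it the contents of /proc/net/rpc/nfsd instead of SAMPLE.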

The "rc" (reply cache) field gives the counts of cache hits, misses
and "nocache" (requests which bypassed the cache).

The "fh" (file handle) field's most important entry is the first - the
number of stale file handles in the system. If you have flaky NFS, for
example, this will be non-zero.

The io field is simple cumulative io (read, and then written) in bytes.

The "th" (threads) field is the most interesting field for NFS load
optimisation. The first entry is the total number of nfsd threads
running. The second is a count of how often all threads were in use
at once (which means your NFS was maxed out in active connections). The
remaining 10 entries are a histogram of NFS thread utilisation, in
seconds (it seems to be hard to get NFS to reset this; restarting the
daemon definitely doesn't). Plotting this gives you an idea of how
much time your NFS server spends in various load states.
Ideally, you want the last entry (90-100% use) to be comfortably in
the tail of your distribution...
If you have indications that your server spends a lot of its time with
all threads in use, you should increase the maximum number of threads
- powers of 2 are recommended.
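
If you don't fancy plotting, a quick way to eyeball the histogram is to turn the ten entries into shares of the total. The sketch below does that for the "th" line from the sample output above; thread_histogram is a made-up helper name, and it assumes the layout described here (thread count, the all-threads-busy figure, then ten histogram buckets).

```python
# The "th" line from the sample output above, joined onto one line.
TH_LINE = ("th 63 362541 16645.121 3156.556 747.974 280.920 148.129 "
           "100.155 61.480 42.249 40.829 90.461")


def thread_histogram(th_line):
    """Return (threads, all_busy, shares) for a 'th' line.

    shares[i] is the fraction of recorded time spent with between
    i*10% and (i+1)*10% of the threads busy.
    """
    fields = th_line.split()
    if fields[0] != "th" or len(fields) != 13:
        raise ValueError("expected 'th' followed by 12 numbers")
    threads = int(fields[1])
    all_busy = int(fields[2])
    seconds = [float(x) for x in fields[3:]]
    total = sum(seconds)
    shares = [s / total if total else 0.0 for s in seconds]
    return threads, all_busy, shares


threads, all_busy, shares = thread_histogram(TH_LINE)
print("threads:", threads, "all-busy figure:", all_busy)
for i, share in enumerate(shares):
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}% of threads busy: {share:7.2%}")
```

For the sample line, the bulk of the time lands in the lowest bucket and the 90-100% bucket is well under 1%, which is the shape you are hoping for.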

The "ra" (read-ahead cache) field gives similar results, but for the
read-ahead cache. The first number is the size of the cache, the next
10 are a histogram showing how far into the cache entries were found
(so, the first number is the number of times an entry was read from
the first 10% of the cache), and the last is for cache misses.
Obviously, if you're getting a lot of cache misses *and* most of your
cache hits fall in the later (deeper) histogram bins, it's worth
increasing the cache size. (Conversely, if nearly all hits fall in the
first few bins and there are few cache misses, you may be able to
manage with a smaller cache.)
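
The same sort of quick check works for the read-ahead cache. The sketch below takes the "ra" line from the sample output and works out the miss ratio and how much of the hit count sits in the deeper half of the cache. read_ahead_summary is an illustrative name, and the split at the halfway bin is an arbitrary choice for the example, not a documented threshold.

```python
# The "ra" line from the sample output above, joined onto one line.
RA_LINE = ("ra 256 1069115586 4089582 3055815 2625032 2228952 2114496 "
           "1983622 1765372 1743563 1610465 89609536")


def read_ahead_summary(ra_line):
    """Return (cache_size, miss_ratio, deep_share) for an 'ra' line.

    miss_ratio is misses / (hits + misses); deep_share is the fraction
    of hits found in the last five of the ten depth bins.
    """
    fields = ra_line.split()
    if fields[0] != "ra" or len(fields) != 13:
        raise ValueError("expected 'ra' followed by 12 numbers")
    cache_size = int(fields[1])
    hits = [int(x) for x in fields[2:12]]
    misses = int(fields[12])
    total_hits = sum(hits)
    lookups = total_hits + misses
    miss_ratio = misses / lookups if lookups else 0.0
    deep_share = sum(hits[5:]) / total_hits if total_hits else 0.0
    return cache_size, miss_ratio, deep_share


cache_size, miss_ratio, deep_share = read_ahead_summary(RA_LINE)
print(f"cache size: {cache_size}")
print(f"miss ratio: {miss_ratio:.2%}")
print(f"hits in deeper half of cache: {deep_share:.2%}")
```

Here almost all hits come from the very front of the cache, so by the reasoning above this particular server shows no sign of needing a larger read-ahead cache.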

The remaining fields (net, rpc and the per-version proc lines) are
network and RPC call counters, which are less relevant for our purposes.

2) Optimising your NFS.

The most important thing to ensure is that there are enough
resources for the peak load on your NFS service. NFS will spawn new
threads to handle new active connections, and if its max-threads limit
is too low, you'll get brown-outs under high load.
Starting at least four instances of nfsd per processor (and, on modern
processors, up to 8 per core) is recommended as a sensible
configuration. You can set this on the command line for the nfsd
service by simply using the bare number as an option.
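
To make the arithmetic concrete, here is a small Python sketch of the rule of thumb above: 4 to 8 threads per core, optionally rounded up to a power of 2 as suggested earlier. Both function names are invented for this example, and the use of os.cpu_count() as the core count is an assumption (it counts logical CPUs, which may not be what you want on hyperthreaded machines).

```python
import os


def suggested_nfsd_threads(cores=None, per_core_low=4, per_core_high=8):
    """Return the (low, high) thread-count range for the 4-8 per core rule."""
    cores = cores or os.cpu_count() or 1
    return cores * per_core_low, cores * per_core_high


def round_up_to_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


low, high = suggested_nfsd_threads(cores=4)
print("4 cores: between", low, "and", high, "threads")
print("24 threads rounded up:", round_up_to_power_of_two(24))
```

Whatever number you settle on is then the bare number handed to nfsd as described above.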

And, of course, if you can bear the risk of data-loss (or silent data
corruption!) on sudden server loss, setting the export option "async"
trivially increases your network throughput by removing the need for
confirmation and syncing of writes between clients and server.
See the NFS config faq at:
for more details.

You may also wish to do the standard setting of packet sizes with
respect to MTU that you would normally do for a network-based
protocol. The general process (and some more details) are covered at:
