Category: ZFS

ZFS Resources / Links

I see some of the same questions come across a variety of mailing lists. Often the questions are phrased differently, but they are essentially asking the same things over and over again. This is not a bad thing, as new people are introduced to ZFS and start asking about it. I find that I am sending people to a number of links where people who know more about ZFS than I do have already answered the questions. To simplify handing out all these URLs, I am gathering them up here for reference in one place. Expect this list to grow as I find more good write-ups on ZFS and ZFS-related topics.

These will all open in a new tab or window.

Matthew Ahrens on RAIDz stripe width

Richard Elling on MTTDL and ZFS configurations (MTTDL == Mean Time To Data Loss, a relative measure of how safe your data will be)

My ZFS Resilver Observations from replacing a drive in 2014

My ZFS Performance vs ZPOOL Layout results from testing I did while at a client in 2010

FreeBSD Wiki ZFS Tuning Guide (FreeBSD specific, slightly dated; has some recommendations but does not fully explain them)

Solaris Internals ZFS Best Practices Guide (this one is very old but still has good information, even if it is Solaris specific)

Solaris Internals ZFS Evil Tuning Guide (this one is also very old, but still contains some very good information on how ZFS works)

ZFS Protects Data Again

One of the servers I manage is essentially a file server. There are two SSD boot devices and 16 data drives. The configuration is five 3-way mirrors plus one hot spare. At one point, four of the 16 drives go offline all at once. My first thought is a controller gone bad. There are four controllers in play: the onboard with six ports and three add-in cards with four ports each. But looking at the device names, they are not consecutive, as would be expected of the four ports on one controller. So it must be something else, and I don’t buy coincidence.

The hardware is a SuperMicro chassis with a SAS/SATA disk backplane. Looking at the backplane, I can see that one of the four power connectors does not look right. The pin on the yellow lead has backed itself out of the Molex shell. So I shut down the server and remove the connector. The pin with the yellow lead falls out. I reseat the pin, making sure that it locks in place, and plug it back in. I restart the server.

ZFS now sees all of its drives and automatically resilvers the drives with missing data. There was about 3.5 GB of data to resilver, and it took about one minute. No drama, no loss of data; this is the way storage systems should work.
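As a rough sanity check on those numbers (taking "about 3.5 GB in about one minute" at face value), that works out to a healthy resilver rate:

```python
# Back-of-the-envelope resilver rate for the incident above.
# Assumption: 3.5 GB (binary GiB) moved in roughly 60 seconds.
data_gib = 3.5
seconds = 60

rate_mib_s = data_gib * 1024 / seconds
print(f"{rate_mib_s:.0f} MiB/s")  # about 60 MiB/s
```

That is quick because ZFS only resilvers the blocks that changed while the drives were offline, not the whole disk.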

ZFS Resilver Observations

As resilver performance has been discussed on the ZFS mailing list recently (subscribe here), I figured I would post my most recent observations.

My home server is an HP ProLiant MicroServer N36L (soon to be an N54L) with 8 GB RAM, a Marvell-based eSATA card (I forget which one), and a StarTech 4-drive external enclosure (which uses port multipliers). The system is running FreeBSD 9.1-RELEASE-p7.

The zpool in question is a five-drive RAIDz2 made up of 1 TB drives, a mix of Seagate and HGST. Note that drives ada0 through ada2 are in the external enclosure and ada3 through ada6 are internal to the system. So ada0 – ada2 are behind the port multiplier, while ada3 – ada6 each have an individual SATA port.
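A quick way to reason about space in a layout like this: RAIDz2 spends two drives' worth of capacity on parity, so usable space is roughly (N − 2) drives. A minimal sketch, ignoring allocation padding and metadata overhead:

```python
# RAIDz2 rule of thumb: two drives' worth of parity per vdev.
# This ignores allocation padding and metadata overhead (an approximation).
def raidz2_usable_tb(n_drives: int, drive_tb: float) -> float:
    return (n_drives - 2) * drive_tb

print(raidz2_usable_tb(5, 1.0))  # 3.0 TB usable from five 1 TB drives
```

Note that for raidz vdevs, zpool list reports the raw pool size including parity, which is why the zpool list output later in this post shows 4.53T (five ~0.91 TiB drives) rather than the usable figure.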

One of the HGST drives failed, and I swapped in a convenient 2 TB HGST I had for another project that has not started yet. Normally I have a hot spare, but I have yet to RMA the last failed drive and cycle the hot spare back in. So the current state is a RAIDz2 resilvering.

# zpool status export
  pool: export
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Aug  1 13:41:48 2014
        248G scanned out of 2.84T at 57.1M/s, 13h14m to go
        49.4G resilvered, 8.53% done
config:

	NAME                       STATE     READ WRITE CKSUM
	export                     DEGRADED     0     0     0
	  raidz2-0                 DEGRADED     0     0     0
	    replacing-0            UNAVAIL      0     0     0
	      3166455989486803094  UNAVAIL      0     0     0  was /dev/ada5p1
	      ada6p1               ONLINE       0     0     0  (resilvering)
	    ada5p1                 ONLINE       0     0     0
	    ada4p1                 ONLINE       0     0     0
	    ada2p1                 ONLINE       0     0     0
	    ada1p1                 ONLINE       0     0     0

errors: No known data errors
#
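The estimate in that status output hangs together: assuming the G/T figures are binary units (GiB/TiB), as ZFS reports them, the remaining data divided by the scan rate lands right where zpool says it will:

```python
# Sanity-check the resilver ETA from the zpool status output above.
# Assumption: G and T are binary units (GiB, TiB), as ZFS reports them.
scanned_gib = 248
total_gib = 2.84 * 1024   # 2.84T scanned target
rate_mib_s = 57.1

remaining_s = (total_gib - scanned_gib) * 1024 / rate_mib_s
h = int(remaining_s // 3600)
m = int(remaining_s % 3600 // 60)
print(f"{h}h{m}m")  # about 13h15m; zpool's 13h14m differs only by rounding
```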

As expected, the missing drive is being replaced by the new drive. Here are some throughput numbers from zpool iostat -v 60:

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    291     37  35.6M   163K
  raidz2                   2.84T  1.69T    291     37  35.6M   163K
    replacing                  -      -      0    331      0  12.0M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    207      0  12.0M
    ada5p1                     -      -    227      9  11.9M  53.4K
    ada4p1                     -      -    236      9  11.9M  53.6K
    ada2p1                     -      -    195      9  11.9M  53.8K
    ada1p1                     -      -    197      9  11.9M  53.8K
-------------------------  -----  -----  -----  -----  -----  -----

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    292     29  35.5M   127K
  raidz2                   2.84T  1.69T    292     29  35.5M   127K
    replacing                  -      -      0    321      0  11.9M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    206      0  11.9M
    ada5p1                     -      -    225      9  11.9M  40.3K
    ada4p1                     -      -    235      9  11.9M  40.1K
    ada2p1                     -      -    196      9  11.9M  40.2K
    ada1p1                     -      -    196      8  11.9M  40.2K
-------------------------  -----  -----  -----  -----  -----  -----

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    276     31  33.1M   114K
  raidz2                   2.84T  1.69T    276     31  33.1M   114K
    replacing                  -      -      0    305      0  11.1M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    197      0  11.1M
    ada5p1                     -      -    211      7  11.1M  35.0K
    ada4p1                     -      -    221      7  11.1M  35.0K
    ada2p1                     -      -    181      7  11.1M  35.0K
    ada1p1                     -      -    183      7  11.1M  34.8K
-------------------------  -----  -----  -----  -----  -----  -----

And here are some raw disk drive numbers from iostat -x -w 60:

                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     211.7   9.2 10038.9    48.4    0   2.2  19 
ada2     209.4   9.1 10038.9    48.2    0   2.3  19 
ada3       0.0   4.0     0.0    23.7    0   0.2   0 
ada4     240.9   9.1 10041.6    48.3    0   0.8   9 
ada5     233.1   9.1 10041.1    48.2    0   0.9  11 
ada6       0.0 176.4     0.0  9994.3    4  18.5  85 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     191.2   7.6  9472.1    33.5    0   3.6  26 
ada2     189.2   7.4  9474.6    33.5    0   3.9  27 
ada3       0.0   3.8     0.0    27.9    0   0.2   0 
ada4     220.1   7.5  9475.1    33.3    0   1.4  13 
ada5     222.5   7.4  9476.8    33.4    0   1.1  12 
ada6       0.0 170.2     0.0  9460.4    4  18.7  83 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     224.5   6.4  9949.5    20.5    2   2.0  19 
ada2     221.4   6.4  9950.6    20.3    2   2.2  20 
ada3       0.0   4.5     0.0    35.6    0   0.2   0 
ada4     249.6   6.3  9947.8    20.5    1   0.8  10 
ada5     243.7   6.3  9947.7    20.5    2   0.8  11 
ada6       0.0 172.4     0.0  9875.7    3  19.1  86 

Do not try to correlate the two sets of numbers, as the samples were taken at different times, but the general picture of the resilver is fairly accurate. The zpool is about 62% full:

# zpool list 
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export    4.53T  2.84T  1.69T    62%  1.00x  DEGRADED  -
#

So the resilver is limited to roughly the performance of one of the five drives in the RAIDz2 zpool. About 10 MB/sec and 170 random IOPS is not bad for a single 7,200 RPM SATA drive. I have been told by those smarter than I not to expect more than about 100 random IOPS from a single spindle, so 170 seems like a win.
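The iostat samples also hint at why the throughput is what it is: dividing the read bandwidth by the read rate gives the average I/O size, which comes out well below a full 128 KB record:

```python
# Average read size implied by the first iostat sample above (ada1):
# kr/s divided by r/s gives KB per read.
kr_per_s = 10038.9
r_per_s = 211.7

print(f"{kr_per_s / r_per_s:.1f} KB per read")  # about 47.4 KB
```

Small, scattered reads like that are what make a resilver look like random I/O to the spindles rather than a sequential copy.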

When all was said and done, this was the final result:

# zpool status export
  pool: export
 state: ONLINE
  scan: resilvered 580G in 16h55m with 0 errors on Sat Aug  2 06:37:02 2014
config:

	NAME        STATE     READ WRITE CKSUM
	export      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    ada6p1  ONLINE       0     0     0
	    ada5p1  ONLINE       0     0     0
	    ada4p1  ONLINE       0     0     0
	    ada2p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0

errors: No known data errors
#

So it took almost 17 hours to resilver 580 GB, which is the amount of data + parity + metadata on one of the five drives in the RAIDz2. The total amount of space allocated is 2.84 TB, as shown in the zpool list output above.
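Those final numbers line up with the per-drive observations, assuming binary units as before: one fifth of the 2.84T allocation is almost exactly the 580G resilvered, and the average rate over 16h55m matches the ~10 MB/sec seen in the iostat samples:

```python
# Check the final resilver numbers against the per-drive observations.
# Assumption: 580G and 2.84T are binary units (GiB, TiB).
resilvered_gib = 580
elapsed_s = 16 * 3600 + 55 * 60          # 16h55m

rate_mib_s = resilvered_gib * 1024 / elapsed_s
share_gib = 2.84 * 1024 / 5              # one drive's slice of the allocation

print(f"{rate_mib_s:.1f} MiB/s average")   # about 9.8 MiB/s
print(f"{share_gib:.0f} GiB per drive")    # about 582 GiB, i.e. the 580G resilvered
```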

Your mileage may vary…