PK1048

ZFS Protects Data Again

One of the servers I manage is essentially a file server. There are two SSD boot devices and 16 data drives, configured as five 3-way mirrors plus one hot spare. At one point, four of the 16 drives go offline all at once. My first thought is a controller gone bad. There are four controllers in play: the onboard controller with six ports and three add-in cards with four ports each. But the device names of the four missing drives are not consecutive, as they would be if all four hung off the ports of one controller. So it must be something else, and I don’t buy coincidence.

The hardware is a SuperMicro chassis with a SAS/SATA disk backplane. Looking at the backplane, I can see that one of its four power connectors does not look right: the pin on the yellow lead has backed itself out of the Molex shell. So I shut down the server and remove the connector, and the pin with the yellow lead falls out. I reseat the pin, making sure that it locks in place, plug the connector back in, and restart the server.

ZFS now sees all of its drives and automatically resilvers the ones with missing data. There was about 3.5 GB of data to resilver, and it took about one minute. No drama, no data loss: this is the way storage systems should work.

ZFS Resilver Observations

As resilver performance has been discussed on the ZFS mailing list recently, I figured I would post my most recent observations.

My home server is an HP ProLiant MicroServer N36L (soon to be an N54L) with 8 GB RAM, a Marvell-based eSATA card (I forget which one), and a StarTech 4-drive external enclosure (which uses port multipliers). The system is running FreeBSD 9.1-RELEASE-p7.

The zpool in question is a five-drive RAIDz2 made up of 1 TB drives, a mix of Seagate and HGST. Note that drives ada0 through ada2 are in the external enclosure and ada3 through ada6 are internal to the system. So ada0 – ada2 are behind the port multiplier, and ada3 – ada6 each have their own SATA port.

One of the HGST drives failed, and I swapped in a convenient 2 TB HGST that I had set aside for another project that has not started yet. Normally I have a hot spare, but I have yet to RMA the last failed drive and cycle the hot spare back in. So the current state is a degraded RAIDz2, resilvering onto the new drive.

# zpool status export
  pool: export
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Aug  1 13:41:48 2014
        248G scanned out of 2.84T at 57.1M/s, 13h14m to go
        49.4G resilvered, 8.53% done
config:

	NAME                       STATE     READ WRITE CKSUM
	export                     DEGRADED     0     0     0
	  raidz2-0                 DEGRADED     0     0     0
	    replacing-0            UNAVAIL      0     0     0
	      3166455989486803094  UNAVAIL      0     0     0  was /dev/ada5p1
	      ada6p1               ONLINE       0     0     0  (resilvering)
	    ada5p1                 ONLINE       0     0     0
	    ada4p1                 ONLINE       0     0     0
	    ada2p1                 ONLINE       0     0     0
	    ada1p1                 ONLINE       0     0     0

errors: No known data errors
#
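The time-to-go figure in that output can be checked by hand. The small Python sketch below assumes that zpool status projects the remaining time as (total − scanned) divided by the average scan rate so far, and that its "T", "G", and "M" suffixes are binary units; both are my assumptions about how the estimate is derived, and the variable names are purely illustrative.

```python
# Figures as printed by `zpool status` above.
# Assumption: suffixes are binary units (T = TiB, G = GiB, M = MiB).
TOTAL_GIB = 2.84 * 1024     # "2.84T" to scan in total
SCANNED_GIB = 248           # "248G scanned"
RATE_MIB_S = 57.1           # "at 57.1M/s"

# Assumption: estimate = remaining data / average scan rate so far.
remaining_mib = (TOTAL_GIB - SCANNED_GIB) * 1024
seconds_left = remaining_mib / RATE_MIB_S

# Convert whole seconds to hours and minutes, like the "13h14m" display.
hours, rem = divmod(int(seconds_left), 3600)
print(f"estimated time to go: {hours}h{rem // 60:02d}m")
```

This lands within a minute of the 13h14m that zpool printed; the difference comes from the rounding in the printed figures. The resilver actually took longer than this projection, since the scan rate did not stay constant.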

As expected, the missing drive is being replaced by the new drive. Here are some throughput numbers from zpool iostat -v 60:

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    291     37  35.6M   163K
  raidz2                   2.84T  1.69T    291     37  35.6M   163K
    replacing                  -      -      0    331      0  12.0M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    207      0  12.0M
    ada5p1                     -      -    227      9  11.9M  53.4K
    ada4p1                     -      -    236      9  11.9M  53.6K
    ada2p1                     -      -    195      9  11.9M  53.8K
    ada1p1                     -      -    197      9  11.9M  53.8K
-------------------------  -----  -----  -----  -----  -----  -----

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    292     29  35.5M   127K
  raidz2                   2.84T  1.69T    292     29  35.5M   127K
    replacing                  -      -      0    321      0  11.9M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    206      0  11.9M
    ada5p1                     -      -    225      9  11.9M  40.3K
    ada4p1                     -      -    235      9  11.9M  40.1K
    ada2p1                     -      -    196      9  11.9M  40.2K
    ada1p1                     -      -    196      8  11.9M  40.2K
-------------------------  -----  -----  -----  -----  -----  -----

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
export                     2.84T  1.69T    276     31  33.1M   114K
  raidz2                   2.84T  1.69T    276     31  33.1M   114K
    replacing                  -      -      0    305      0  11.1M
      3166455989486803094      -      -      0      0      0      0
      ada6p1                   -      -      0    197      0  11.1M
    ada5p1                     -      -    211      7  11.1M  35.0K
    ada4p1                     -      -    221      7  11.1M  35.0K
    ada2p1                     -      -    181      7  11.1M  35.0K
    ada1p1                     -      -    183      7  11.1M  34.8K
-------------------------  -----  -----  -----  -----  -----  -----

And here are some raw disk drive numbers from iostat -x -w 60:

                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     211.7   9.2 10038.9    48.4    0   2.2  19 
ada2     209.4   9.1 10038.9    48.2    0   2.3  19 
ada3       0.0   4.0     0.0    23.7    0   0.2   0 
ada4     240.9   9.1 10041.6    48.3    0   0.8   9 
ada5     233.1   9.1 10041.1    48.2    0   0.9  11 
ada6       0.0 176.4     0.0  9994.3    4  18.5  85 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     191.2   7.6  9472.1    33.5    0   3.6  26 
ada2     189.2   7.4  9474.6    33.5    0   3.9  27 
ada3       0.0   3.8     0.0    27.9    0   0.2   0 
ada4     220.1   7.5  9475.1    33.3    0   1.4  13 
ada5     222.5   7.4  9476.8    33.4    0   1.1  12 
ada6       0.0 170.2     0.0  9460.4    4  18.7  83 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0       0.0   0.0     0.0     0.0    0   0.0   0 
ada1     224.5   6.4  9949.5    20.5    2   2.0  19 
ada2     221.4   6.4  9950.6    20.3    2   2.2  20 
ada3       0.0   4.5     0.0    35.6    0   0.2   0 
ada4     249.6   6.3  9947.8    20.5    1   0.8  10 
ada5     243.7   6.3  9947.7    20.5    2   0.8  11 
ada6       0.0 172.4     0.0  9875.7    3  19.1  86 

Do not try to correlate the two sets of numbers, as the samples were taken at different times, but together they give a fairly accurate general picture of the resilver. The zpool is about 62% full:

# zpool list 
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export    4.53T  2.84T  1.69T    62%  1.00x  DEGRADED  -
#

So the resilver is limited roughly by the performance of one of the five drives in the RAIDz2 zpool: the new drive (ada6) is 83% to 86% busy writing, while the surviving drives are only 9% to 27% busy reading. About 10 MB/sec and 170 I/Ops of RANDOM writes is not bad for a single 7,200 RPM SATA drive. I have been told by those smarter than I not to expect more than about 100 I/Ops of random I/O from a single spindle, so 170 seems like a win.
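To put numbers on that, here is a small Python sketch that takes the first iostat -x -w 60 sample above (values copied by hand) and works out the average transfer size per operation and the busy percentage for each active drive. The SAMPLE table and the avg_kb_per_op helper are hypothetical names of my own, just for illustration.

```python
# Values copied by hand from the first `iostat -x -w 60` sample above.
# Each entry: (reads/s, writes/s, KB read/s, KB written/s, %busy)
SAMPLE = {
    "ada1": (211.7, 9.2, 10038.9, 48.4, 19),
    "ada2": (209.4, 9.1, 10038.9, 48.2, 19),
    "ada4": (240.9, 9.1, 10041.6, 48.3, 9),
    "ada5": (233.1, 9.1, 10041.1, 48.2, 11),
    "ada6": (0.0, 176.4, 0.0, 9994.3, 85),
}


def avg_kb_per_op(kb_per_s: float, ops_per_s: float) -> float:
    """Average transfer size in KB per operation (0.0 if there were no ops)."""
    return kb_per_s / ops_per_s if ops_per_s else 0.0


for dev, (r, w, kr, kw, busy) in SAMPLE.items():
    print(f"{dev}: read {avg_kb_per_op(kr, r):5.1f} KB/op, "
          f"write {avg_kb_per_op(kw, w):5.1f} KB/op, {busy}% busy")
```

For the new drive this works out to roughly 57 KB per write at about 85% busy, while the surviving drives read roughly 42 to 48 KB per operation and spend most of their time idle, which fits the single-drive bottleneck described above.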

When all was said and done, this was the final result:

# zpool status export
  pool: export
 state: ONLINE
  scan: resilvered 580G in 16h55m with 0 errors on Sat Aug  2 06:37:02 2014
config:

	NAME        STATE     READ WRITE CKSUM
	export      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    ada6p1  ONLINE       0     0     0
	    ada5p1  ONLINE       0     0     0
	    ada4p1  ONLINE       0     0     0
	    ada2p1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0

errors: No known data errors
#

So it took almost 17 hours to resilver 580 GB, which is the amount of data, parity, and metadata on one of the five drives in the RAIDz2. The total amount of space allocated is 2.84 TB, as shown in the zpool list output above.
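The arithmetic behind that is easy to check. This Python sketch uses the ALLOC figure from zpool list (which, for RAIDz, includes parity) and the resilver summary from the final zpool status, and assumes the "T" and "G" suffixes are binary units (TiB and GiB). The variable names are illustrative only.

```python
# Figures from `zpool list` and the final `zpool status` above.
# Assumption: ZFS suffixes are binary units, so "T" = TiB and "G" = GiB.
ALLOC_TIB = 2.84                  # ALLOC column, data + parity + metadata
DRIVES = 5                        # five-drive RAIDz2
RESILVERED_GIB = 580              # "resilvered 580G"
ELAPSED_S = 16 * 3600 + 55 * 60   # "in 16h55m"

# Allocation is spread evenly across the vdev, so one drive holds ~1/5 of it.
per_drive_gib = ALLOC_TIB * 1024 / DRIVES

# Average write rate onto the replacement drive over the whole resilver.
avg_mib_per_s = RESILVERED_GIB * 1024 / ELAPSED_S

print(f"allocated per drive: {per_drive_gib:.0f} GiB")
print(f"average resilver rate: {avg_mib_per_s:.1f} MiB/s")
```

That comes to about 582 GiB per drive, matching the 580G resilvered, and an average of just under 10 MiB/s, in line with the roughly 10 MB/sec seen in the iostat samples.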

Your mileage may vary…

Japanese Industrial Video Formats of the 1970s

When I was in high school (1978 through 1981), we had a small closed-circuit TV station with B&W cameras, video recorders, and a small switcher. Much of the equipment was Sony or old broadcast cast-offs (Conrac monitors, a Tektronix waveform monitor, etc.).

The video recorders we had were all Sony. Starting with the biggest, the EV-200 was an EIAJ 1″ B&W helical scan recorder with mechanical transport control; in other words, a big Rewind – Stop – Play/Rec – Fast Forward lever or knob. The tape wrapped 180 degrees around the drum.

Next was the EV-340, which was also EIAJ 1″ but had electronic control and an optional ColorPack (which recorded color using the color-under technique). I do not recall this machine ever working well, and it was rarely used. I never saw it work in color.

Then we got the EIAJ 1/2″ AV-3650, which was a marvel because it could edit: both Assemble Edit and Insert Edit. Its mechanical transport control meant that it could not be driven by any sort of edit controller; you could only manually drop cleanly into record (assemble edit), or cleanly punch into and drop out of record while playing (insert edit).

The AV-3400 was a portable EIAJ 1/2″ recorder that came with a portable camera (all B&W). It could even run for a while (an hour, if memory serves) from a built-in rechargeable battery!

At some point we got a newfangled industrial (not home) BetaMax with a mechanical tuner and large mechanical “piano key” controls. It recorded color!

My senior year, we recorded the Presidential Inauguration (Ronald Reagan) and then showed it during every class period the next day for all the Social Studies classes. We did the recording on three machines: the EV-200, the AV-3650, and the BetaMax. For playback we started by rotating through all three, but after the third period we decided to use the EV-200 for all the remaining playback because, in B&W (which is what all the classroom TV sets were), it looked the best of the three. The AV-3650 looked slightly soft, and the BetaMax was much softer because of all the filtering it needed to handle the color component.

So even in 1981 I was comparing video formats and picking the best looking.