One of the servers I manage is essentially a file server. There are two SSD boot devices and 16 data drives. The configuration is five 3-way mirrors plus one hot spare. At one point, four of the 16 drives go offline all at once. My first thought is a controller gone bad. There are four controllers in play: the onboard controller with six ports and three add-in cards with four ports each. But the device names of the missing drives are not consecutive, as they would be if all four hung off the same controller. So it must be something else, and I don't buy coincidence.
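Whether four drives dropping at once takes the pool down depends on how they fall across the mirrors: a 3-way mirror keeps serving data as long as any one of its three members is online. The Python sketch below models that rule. It assumes hypothetical device names `disk0` through `disk14`, grouped into mirrors of three consecutive disks. The hot spare and ZFS's real state handling are not modeled, and the state labels only loosely echo what `zpool status` reports.

```python
# Simplified model of the pool: five 3-way mirrors.
# Device names are hypothetical placeholders; the hot spare is not modeled.
MIRRORS = [[f"disk{3 * m + i}" for i in range(3)] for m in range(5)]


def vdev_state(members: list[str], offline: set[str]) -> str:
    """State of one mirror vdev given the set of offline devices."""
    online = [d for d in members if d not in offline]
    if not online:
        return "UNAVAIL"    # every copy of this vdev's data is gone
    if len(online) < len(members):
        return "DEGRADED"   # at least one full copy is still readable
    return "ONLINE"


def pool_state(vdevs: list[list[str]], offline: set[str]) -> str:
    """The pool is only as healthy as its worst top-level vdev."""
    states = {vdev_state(v, offline) for v in vdevs}
    for worst in ("UNAVAIL", "DEGRADED"):
        if worst in states:
            return worst
    return "ONLINE"


if __name__ == "__main__":
    # Four drives offline, each in a different mirror: the pool keeps running.
    print(pool_state(MIRRORS, {"disk0", "disk4", "disk8", "disk12"}))
    # Four drives offline, three of them in the same mirror: that vdev is lost.
    print(pool_state(MIRRORS, {"disk0", "disk1", "disk2", "disk3"}))
```

Run as a script, this prints `DEGRADED` and then `UNAVAIL`. The point of the model is that losing four of 15 mirror members is survivable only if no single mirror loses all three of its disks.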
The hardware is a Supermicro chassis with a SAS/SATA disk backplane. Looking at the backplane, I can see that one of its four power connectors does not look right: the pin on the yellow lead has backed itself out of the Molex shell. So I shut down the server and remove the connector. The pin with the yellow lead falls out. I reseat the pin, making sure that it locks in place, and plug the connector back in. I restart the server.
ZFS now sees all of its drives and automatically resilvers the ones with missing data. There was about 3.5 GB of data to resilver, and it took about one minute. No drama, no loss of data: this is the way storage systems should work.