I’ve two identical and identically configured fileservers with LSI megaraidsas controllers. They’ve each got 12x 3TB (haha, that’s 2794GiB to you, sunshine) disks, configured as a RAID6 array over disks 0-10, with 11 as the hot spare.
At some point in the last mumble months, disk 11 went to state “BAD” (as reported by megaraidsas-status). On both machines. This makes me very suspicious.
There’s a command called megacli, with possibly the nastiest set of command line arguments ever seen this side of an H.P. Lovecraft novel, that permits inspection and modification of the controller/disk state.
megacli -PDList -a0
allows us to find out what disks exist in which enclosures. “-a0” means “on the first controller”, and is needed in most megacli commands. Alternatively, “-aALL” applies the command to every controller, which is handy if you have more than one.
This revealed, in both cases, that disk 11 in enclosure “245” had
Firmware state: Unconfigured(bad)
And yet all the entries corresponding to the disk’s SMART or error state suggested no detected errors. You can confirm that you’re looking at the right disk by using the command
megacli -PdInfo -PhysDrv '[245:11]' -a0
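Scanning the full -PDList output by eye gets tedious with 12 disks, so here is a minimal sketch of a filter that prints only the drives in the “Unconfigured(bad)” state. It assumes the output carries the usual “Enclosure Device ID:”, “Slot Number:” and “Firmware state:” labels, one per line; the function name list_bad_drives is made up for this example.

```shell
#!/bin/sh
# Sketch: print "[enclosure:slot]" for each drive reported as Unconfigured(bad).
# Reads `megacli -PDList -a0` output on stdin.
# Assumption: fields appear as "Enclosure Device ID:", "Slot Number:" and
# "Firmware state:", with the enclosure and slot listed before the state.
list_bad_drives() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace / CR
        /^Enclosure Device ID:/ { enc = $2 }     # remember the current enclosure
        /^Slot Number:/         { slot = $2 }    # remember the current slot
        /^Firmware state:/ && $2 ~ /^Unconfigured\(bad\)/ {
            printf "[%s:%s]\n", enc, slot
        }
    '
}

# Usage (on a machine with the controller):
#   megacli -PDList -a0 | list_bad_drives
```

The printed form matches what -PhysDrv expects, so each line can be pasted straight into the -PdInfo command above.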
The magic for making the disk happy again is
megacli -PDMakeGood -PhysDrv '[245:11]' -a0
which actually marks it OK (the Firmware state should now switch to “Unconfigured(good)”), but the drive is still marked as
Foreign State: Foreign
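To see both the firmware and the foreign state of every drive at a glance, something like the following sketch could summarise the -PDList output, one line per drive. As before, the field labels are assumed to match what megacli printed here, and drive_states is a hypothetical name.

```shell
#!/bin/sh
# Sketch: one summary line per drive with its firmware and foreign state.
# Reads `megacli -PDList -a0` output on stdin.
# Assumption: each drive record starts with "Enclosure Device ID:" and
# contains "Slot Number:", "Firmware state:" and "Foreign State:" lines.
drive_states() {
    awk -F': *' '
        # Print the record collected so far, then reset for the next drive.
        function emit() {
            if (slot != "")
                printf "[%s:%s] firmware=%s foreign=%s\n", enc, slot, fw, foreign
            slot = ""; fw = "?"; foreign = "?"
        }
        { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace / CR
        /^Enclosure Device ID:/ { emit(); enc = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/      { fw = $2 }
        /^Foreign State:/       { foreign = $2 }
        END { emit() }
    '
}

# Usage (on a machine with the controller):
#   megacli -PDList -a0 | drive_states
```

A field that never shows up for a drive is reported as “?”, rather than guessing a value.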
So now we want to check for foreign drives (i.e. ones that might have been inadvertently imported from another RAID array):
megacli -CfgForeign -Scan -a0
And now if we’re happy with what we’re doing, we clear all the “foreign drive” states for that controller:
megacli -CfgForeign -Clear -a0
We can now make the drive a hotspare again.
megacli -PDHSP -Set -PhysDrv '[245:11]' -a0
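Since this had to be done on two machines, the four commands above can be collected into a small script. This is only a sketch under a few assumptions: the names MEGACLI, DRY_RUN, run and restore_hotspare are invented here, and it defaults to a dry run that prints the commands instead of executing them, because clearing foreign state affects the whole controller.

```shell
#!/bin/sh
# Sketch: replay the recovery steps from this post for a single drive.
# MEGACLI, DRY_RUN, run and restore_hotspare are names invented for this example.
MEGACLI=${MEGACLI:-megacli}   # path to the megacli binary
DRY_RUN=${DRY_RUN:-1}         # 1 = only print the commands, anything else = run them

# Either print a command line or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restore_hotspare '[enclosure:slot]' [adapter]
restore_hotspare() {
    drive=$1             # e.g. '[245:11]'
    adapter=${2:-0}      # controller index, as in -a0

    run "$MEGACLI" -PDMakeGood -PhysDrv "$drive" "-a$adapter" || return 1
    run "$MEGACLI" -CfgForeign -Scan "-a$adapter"             || return 1
    # Clears ALL foreign state on this controller, not just this drive.
    run "$MEGACLI" -CfgForeign -Clear "-a$adapter"            || return 1
    run "$MEGACLI" -PDHSP -Set -PhysDrv "$drive" "-a$adapter" || return 1
}

# Usage: restore_hotspare '[245:11]' 0
```

With DRY_RUN left at 1 the function only echoes the command lines, which gives a chance to read the -CfgForeign -Scan output by hand before setting DRY_RUN=0 and letting it clear anything.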
I’m very suspicious that this happened on two different pieces of hardware, identically. I’m wondering if the hotspare is getting spun down due to idleness, and this is then confusing the controller. If it happens again, I’ll pursue that – in the meantime, there are other complexities to deal with.