Blog

Ponderings of a kind

This is my own personal blog; each article is an XML document, and the code powering it is hand-cranked in XQuery and XSLT. It is fairly simple and has evolved only as I have needed additional functionality. I plan to Open Source the code once it is a bit more mature; however, if you would like a copy in the meantime, drop me a line.


Data 1, Disk 0

NAS disk failure

(Image: NAS disk failure output)

I had finally built my NAS and had it happily working away in the background for a couple of weeks, but it would seem that failure has struck: one of the disks forming the ZFS RAIDZ2 storage pool has failed! Whilst I am sure this seems a little ironic, or sounds like an advert for ZFS commissioned by Sun, I can only try to reassure you that this is not the case.
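RAIDZ2 keeps two disks' worth of parity, so a single failed disk leaves the pool degraded but still readable. As a rough illustration, the hypothetical helpers below (`raidz_summary`, `pool_survives`) make some simplifying assumptions. They assume equal-sized disks in a single RAIDZ vdev and ignore ZFS metadata, padding and reserved space. They estimate usable capacity and check whether a given number of failed disks is still within the parity level. This is a back-of-the-envelope sketch, not how ZFS itself accounts for space.

```python
def raidz_summary(num_disks: int, disk_size_tb: float, parity: int = 2) -> dict:
    """Rough capacity / fault-tolerance estimate for one RAIDZ vdev.

    Hypothetical helper: assumes equal-sized disks and ignores ZFS metadata,
    padding and reserved space, so real usable capacity will be lower.
    """
    if num_disks <= parity:
        raise ValueError("a RAIDZ vdev needs more disks than parity devices")
    return {
        "usable_tb": (num_disks - parity) * disk_size_tb,  # parity disks hold no user data
        "tolerated_failures": parity,  # RAIDZ2 -> any 2 disks may fail
    }


def pool_survives(failed_disks: int, parity: int = 2) -> bool:
    """Data stays available while the number of failed disks <= parity level."""
    return failed_disks <= parity


# Illustrative numbers only (four 1 TB disks), not the hardware in this post.
print(raidz_summary(4, 1.0))
print(pool_survives(1))  # one failed disk, as in this post
```

With one failed disk the RAIDZ2 pool is still within its parity budget. A second failure would also be survivable, but a third would not. That is why the failed disk should be replaced promptly rather than left degraded.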

Recently the NAS crashed unexpectedly (no network response whatsoever). I am still unsure of the cause, but have not had the time to investigate further. However, after powering the NAS off (ouch!) and back on again, I took a quick look at the zpool status to make sure my data was intact. The bad news was that the pool was reported as "degraded", with details of a failed disk; the good news, however (and the whole point behind this setup), was that my data was fine :-)
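Spotting a degraded pool can also be automated by scanning the text that `zpool status` prints. The hypothetical function `parse_zpool_status` below is a minimal sketch. It assumes the output of a single pool in the usual layout: a `state:` line, then a device table headed `NAME STATE READ WRITE CKSUM`. It returns the pool state plus every table entry that is not `ONLINE`. The sample output, pool name `tank` and device names are invented for illustration; they are not the output from this NAS.

```python
import re

# Device/pool states that appear in the STATE column of `zpool status`.
ZFS_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_zpool_status(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (pool_state, [(name, state), ...]) for table entries not ONLINE.

    Assumes the text is the `zpool status` output for a single pool.
    """
    state_match = re.search(r"^\s*state:\s*(\S+)", text, re.MULTILINE)
    pool_state = state_match.group(1) if state_match else "UNKNOWN"

    problems = []
    in_config = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # header row of the device table
            continue
        if in_config and len(fields) >= 2 and fields[1] in ZFS_STATES:
            if fields[1] != "ONLINE":
                problems.append((fields[0], fields[1]))
        elif in_config and fields and fields[0].endswith(":"):
            in_config = False  # a new section such as "errors:" ends the table
    return pool_state, problems


# Invented example output (hypothetical pool and device names).
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  FAULTED      0     0     0  too many errors
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors
"""

state, problems = parse_zpool_status(SAMPLE)
print(state, problems)
```

Note that the pool and the `raidz2` vdev are reported as `DEGRADED` alongside the failed disk. A single bad disk shows up at every level of the tree above it.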

I am fairly new to ZFS, so I made some enquiries with the seasoned professionals in #opensolaris on FreeNode to make sure that the errors I was seeing were definitely hardware related and not a misconfiguration on my part. I was surprised that such a new disk would fail so soon, but I was pointed to something called the "bathtub curve", which can be seen in chapter 4.2 of this paper. The "bathtub curve" describes high failure rates at the beginning of a product's life (infant mortality) and again at the end (wear-out), with a lower rate in between. The statistics gathered in a further paper by Google, entitled "Failure Trends in a Large Disk Drive Population", also seem to back this up to a certain extent.
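One common way to model the bathtub curve is to sum three competing failure modes, each with a Weibull hazard (instantaneous failure rate) h(t) = (k/λ)(t/λ)^(k-1). A shape k < 1 gives a falling rate (infant mortality), k = 1 a constant rate (random failures), and k > 1 a rising rate (wear-out). The sketch below uses made-up parameters in years, chosen only to make the shape visible. They are not fitted to any real drive data, including that in either paper mentioned above.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1); requires t > 0."""
    if t <= 0:
        raise ValueError("t must be positive (h(t) diverges at t = 0 when k < 1)")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative (made-up) parameters as (shape k, scale lambda in years).
INFANT = (0.5, 10.0)   # k < 1: failure rate falls as early defects are weeded out
RANDOM = (1.0, 50.0)   # k = 1: constant background failure rate
WEAR_OUT = (5.0, 8.0)  # k > 1: failure rate climbs as components age


def bathtub_hazard(t: float) -> float:
    """Overall hazard of independent competing failure modes is their sum."""
    return sum(weibull_hazard(t, k, lam) for k, lam in (INFANT, RANDOM, WEAR_OUT))


for years in (0.1, 1.0, 2.5, 5.0, 8.0):
    print(f"{years:>4} years: hazard ~ {bathtub_hazard(years):.3f} per year")
```

With these numbers the rate starts high, dips to its lowest point a few years in, and climbs again near the end of the range. A disk failing within weeks of installation sits on the left-hand wall of the tub.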

Overall, I was glad to be reassured that this was a hardware failure and not a mistake on my part, and, most importantly, that I lost no data. The failed disk will shortly be replaced by the supplier; let's hope the replacement lasts a little longer.

Adam Retter posted on Sunday, 12th July 2009 at 17.50 (GMT+01:00)
Updated: Sunday, 12th July 2009 at 17.50 (GMT+01:00)

tags: ZFS, ZPOOL, fail, NAS, disk, OpenSolaris

Comments (1)


I have a similar NAS box, as described in my blog. I have also encountered unresponsiveness before, but solved it with a custom network card driver. I am not sure which network chip your MSI board uses, but my Intel board was using a Realtek chip and the OpenSolaris driver was not up to par.
My setup is even riskier: my main pool is directly off the SATA ports on the motherboard. It has run perfectly for months (since May 2009) without any issues or errors.

My pools off the USB2-eSATA have more issues. Occasionally there are some errors that ZFS simply fixes. I think that if I had used any other file system, my data would have been corrupted. No data lost so far. Check one of my posts where I thought I had lost everything, but a ZFS export and import solved the problem...
