<blog:entry xmlns:xh="http://www.w3.org/1999/xhtml" xmlns:blog="http://www.adamretter.org.uk/blog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.adamretter.org.uk/blog http://www.adamretter.org.uk/blog/entry.xsd" status="published" id="d9baffb6-419b-400e-88ec-be092ea46c14">
    <blog:article timestamp="2009-07-12T17:50:00.000+01:00" last-updated="2009-07-12T17:50:00.000+01:00" author="Adam Retter">
        <blog:title>Data 1, Disk 0</blog:title>
        <blog:sub-title>NAS disk failure</blog:sub-title>
        <blog:article-content>
            <xh:p>
                <xh:a href="blog/images/nas-disk-failure.jpg" title="click for full-size image">
                    <xh:img class="left" src="blog/images/nas-disk-failure_small.jpg" alt="NAS disk failure output" title="NAS disk failure output"/>
</xh:a>Having finally <xh:a href="http://www.adamretter.org.uk/blog/entries/diy-nas-build.xml" title="Building my DIY NAS (DIY NAS part 3 of 3)">built my NAS</xh:a> and had it happily working away in the background for a couple of weeks, it would seem that failure has struck; one of the disks forming the ZFS RAIDZ2 storage pool has failed! Whilst I am sure this seems a little ironic, or sounds like a commissioned advert for ZFS by <xh:a href="http://www.sun.com" title="Sun Microsystems">Sun</xh:a>, I can only try to reassure you that this is not the case.</xh:p>
            <xh:p>Recently I experienced an unexpected crash with the NAS (no network response whatsoever); I am still unsure of the cause and have not had the time to investigate further. However, after powering the NAS off (ouch!) and back on again, I did take a quick look to make sure my data was intact by checking the zpool status. The bad news was that the pool status was reported as "degraded", with details of a failed disk; the good news, however (and the whole point behind this setup), was that <xh:span style="font-weight: bold">my data was fine</xh:span> :-)</xh:p>
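            <xh:p>For the curious, the degraded pool status looked something like the following; note that the pool name "tank" and the device names here are illustrative rather than a verbatim copy of my console:</xh:p>
            <xh:pre>$ zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  UNAVAIL      0     0     0  cannot open
            c1t3d0  ONLINE       0     0     0

errors: No known data errors</xh:pre>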
            <xh:p>I am fairly new to ZFS, so I made some enquiries with the seasoned professionals in <xh:a href="irc://irc.freenode.net/opensolaris" title="IRC #opensolaris on FreeNode">#opensolaris on FreeNode</xh:a> to make sure that the errors I was seeing were definitely hardware related and not a misconfiguration on my part. Whilst I was surprised that such a new disk would fail so soon, I was pointed to something called the "bathtub curve", which can be seen in chapter 4.2 of this <xh:a href="http://usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html" title="Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?">paper</xh:a>. The "bathtub curve" describes how a product's failure rates are high at the beginning of its life (infant mortality) and again at the end (wear-out); the statistics gathered in a further paper by Google, entitled "<xh:a href="http://labs.google.com/papers/disk_failures.html" title="Failure Trends in a Large Disk Drive Population">Failure Trends in a Large Disk Drive Population</xh:a>", also seem to back this up to a certain extent.</xh:p>
            <xh:p>Overall I was glad to be reassured that this was a hardware failure and not a mistake on my part, and most importantly that I lost no data. The failed disk will shortly be replaced by the supplier; let's hope the replacement lasts a little longer.</xh:p>
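            <xh:p>For reference, once the replacement disk arrives, swapping it into the pool should just be a case of something along these lines (again, the pool and device names are illustrative), after which ZFS will resilver the new disk from the remaining ones:</xh:p>
            <xh:pre># replace the failed disk with the new one in the same slot
$ pfexec zpool replace tank c1t2d0

# then watch the resilver progress until the pool is ONLINE again
$ zpool status tank</xh:pre>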
        </blog:article-content>
    </blog:article>
    <blog:tags>
        <blog:tag>ZFS</blog:tag>
        <blog:tag>ZPOOL</blog:tag>
        <blog:tag>fail</blog:tag>
        <blog:tag>NAS</blog:tag>
        <blog:tag>disk</blog:tag>
        <blog:tag>OpenSolaris</blog:tag>
    </blog:tags>
</blog:entry>