Yeep...
I pulled those numbers when I was tired, and you can see it in the math. Erf.
The unit currently shows 36,516 hours of usage - a little over four years of spindle time. I have a note from about three years ago when I noticed the unit had rolled over. I assumed it rolled at 65,535, but I never did the math on it.
However, I had noted the rollover at 100,000 hours. I was troubleshooting a ReiserFS corruption issue at the time that later turned out to be the fault of a combination of issues.
The units in question were purchased on the same day in late '01 or early '02. Hitachi claims the serial numbers are invalid, and IBM just links to Hitachi for lookups.
Technical information on the SMART side of things is nearly impossible to find. Everyone who had one of these series exploded over having a "deathstar", but none of the GXP120s or GXP180s ever suffered from the bad run / bad plant / possible oxidization issues that the GXP75s - the actual Death Stars - had.
I doubt most of them understood that glass-platter drives will *shatter* if you drop them.
But I digress. My notes were incorrect. 65,535 was obviously not the rollover point, and neither was 100,000. (And my math on that 100,000 was done horribly. Erp.)
My best guess is that it rolls over somewhere in the 40,000-hour range: the firmware SMART log has entries recorded at ~34k hours followed by entries recorded at ~14k hours, and so on.
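If you know a counter reading from just before the wrap, one from just after, and roughly how much wall-clock time passed in between, you can back out the rollover point. Here's a minimal sketch in Python - note the 20,000 elapsed hours is a made-up placeholder to illustrate the arithmetic, not a number from my logs:
Code:
# Back-of-the-envelope rollover estimate for a wrapping Power_On_Hours counter.
# The elapsed-hours figure below is a hypothetical placeholder, not from my logs.

def estimate_rollover(before_wrap, after_wrap, elapsed_hours):
    # counter after the wrap = before_wrap + elapsed_hours - rollover
    return before_wrap + elapsed_hours - after_wrap

print(36516 / (24 * 365.25))                   # current reading -> ~4.17 years
print(estimate_rollover(34000, 14000, 20000))  # -> 40000, i.e. the ~40k ballpark
With real before/after readings and the real elapsed time you'd get an actual figure instead of a ballpark.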
I will say that these units have basically run 24/7, in a RAID array or as a pair, since I purchased them in late '01 / early '02.
They have yet to develop a single bad sector.
Here's a dump of the SMART information from one of the drives:
Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 060 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 108 108 024 Pre-fail Always - 263 (Average 244)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 994
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 020 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 095 095 000 Old_age Always - 36517
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 969
192 Power-Off_Retract_Count 0x0032 098 098 050 Old_age Always - 2509
193 Load_Cycle_Count 0x0012 098 098 050 Old_age Always - 2509
194 Temperature_Celsius 0x0002 196 196 000 Old_age Always - 28 (Lifetime Min/Max 14/54)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 36042 -
# 2 Extended offline Completed without error 00% 33671 -
# 3 Extended offline Completed without error 00% 14417 -
# 4 Short offline Completed without error 00% 14416 -
As you can see, I've never really paid attention to these drives. They're "backed up", and are old enough that I use them as emergency fallbacks in case the Seagates go bad... but I've started to debate swapping the Seagates out now and binning them.
For comparison, here's the output from one of the Seagates:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 115 093 006 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 12
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 054 045 030 Pre-fail Always - 150331351537
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 445
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 25
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 Unknown_Attribute 0x003a 100 100 000 Old_age Always - 0
190 Unknown_Attribute 0x0022 072 048 045 Old_age Always - 605683740
194 Temperature_Celsius 0x0022 028 052 000 Old_age Always - 28 (Lifetime Min/Max 0/26)
195 Hardware_ECC_Recovered 0x001a 061 059 000 Old_age Always - 174058699
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 40 -
# 2 Extended offline Completed without error 00% 24 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
As you can see, I was running SMART tests before I'd even had these drives for 48 hours, because "Stuff was wrong, yo."
OEMs like Seagate do not take kindly to being told that a /pair/ of drives has irregular data-read spindle speeds indicative of misaligned heads and/or internal formatting or track-spacing issues. Note that the lifetime high temperature (26) is below the current temperature (28) - that shouldn't be possible.
These drives are currently less than 50% full. I will be binning them as soon as I can stick Hitachi 1TB drives in there.
As for the counters themselves, after doing the math and trying to work out what happened, I'm starting to wonder whether the drives lose usage data if they're left powered down for too long. They have been bounced from state to state a couple of times and were once off for two weeks while I waited for the rest of the components to catch up.
Hrm. 'Tis a puzzle I don't know the answer to. I can guarantee they've run 24/7 for the last five years, they were originally in a server that was up 24/7, and I've used them in a number of 24/7 roles since then. I've no idea what the "correct" numbers should be, though.
I should probably do an hourly check on them and log the value.
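Something like the following would do it - a minimal sketch, assuming smartmontools is installed, with /dev/hda and /var/log/poh-hda.log as placeholder paths for whatever the real device and log file end up being:
Code:
#!/usr/bin/env python3
# Minimal sketch of an hourly Power_On_Hours logger. Assumes smartmontools is
# installed; the device node and log path are placeholders, not my real setup.
import subprocess, time

DEVICE = "/dev/hda"
LOGFILE = "/var/log/poh-hda.log"

def read_power_on_hours(device):
    # "smartctl -A" prints the vendor attribute table shown in the dumps above.
    out = subprocess.check_output(["smartctl", "-A", device], text=True)
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            return int(fields[9])  # RAW_VALUE column
    return None

if __name__ == "__main__":
    poh = read_power_on_hours(DEVICE)
    with open(LOGFILE, "a") as log:
        log.write("%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), poh))
Drop that in cron at the top of every hour, and a sudden backwards jump in the logged value would pin down the rollover point exactly.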
-----
JonnyGURU: It's entirely possible the distributor didn't handle them properly. I snagged them as an emergency replacement for a WD that died after 8 months. (Controller fried. The controller was half-fried when I bought it. WD wouldn't RMA it even though the internal thermistor recorded constant temps of 70°C while the drive was cool to the touch. Why I ever buy *anything* but Hitachis, I have no idea.)
Ironically, at this point I have a good enough working relationship with the company I use as a distributor that I can walk in with any piece of equipment, at any price tag, simply say "It's bad," and have no questions asked. Still, it is definitely not a top-tier distributor.
In fact, given that I assemble my servers out of what are essentially home PC components and get roughly 99% availability out of them, I tend to be /really/ picky.
Sadly, I qualify as merely a home hobbyist. I've never had the opportunity to try sticking a server I built into a truly high-load situation. Someday, maybe.
OTOH, my statistics graphs show without a doubt that my home server is roughly 5x as reliable as my broadband ISP's Ciscos. Isn't that irony?