MTBF is short for mean time between failures, and it’s the most common specification related to drive reliability. This value, usually measured in hours, is meant to represent the average amount of time that will pass between random failures on a drive of a given type. It is usually in the range of 300,000 to 1,200,000 hours for modern drives (with the range increasing every few years) and is specified for almost every drive.

To be interpreted properly, the MTBF figure is intended to be used in conjunction with the useful service life of the drive, the typical amount of time before the drive enters the period where failures due to component wear-out increase. MTBF only applies to the aggregate analysis of large numbers of drives; it says nothing about a particular unit. If the MTBF of a model is 500,000 hours and the service life is five years, this means that a drive of that type is supposed to last for five years, and that of a large group of drives operating within this timeframe, on average they will accumulate 500,000 hours of total run time (amongst all the drives) before the first failure of any drive. Or, you can think of it this way: if you used one of these drives and replaced it every five years with another identical one, in theory you should go about 57 years, on average, before one of them fails (500,000 hours divided by 8,760 hours per year).
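The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the population calculation assumes a constant failure rate within the service life, continuous 24-hour operation, and failed drives being replaced immediately.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the drive runs continuously


def mtbf_in_years(mtbf_hours):
    """Convert an MTBF figure in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures(mtbf_hours, drive_count, years):
    """Expected number of failures in a group of drives.

    Assumes a constant failure rate and that the drives stay within
    their service life for the whole period (an illustrative model).
    """
    total_run_hours = drive_count * years * HOURS_PER_YEAR
    return total_run_hours / mtbf_hours


if __name__ == "__main__":
    # The 500,000-hour example from the text
    print(round(mtbf_in_years(500_000), 1))
    # A hypothetical group of 1,000 such drives running for one year
    print(round(expected_failures(500_000, 1000, 1), 2))
```

Under these assumptions, a group of 1,000 drives rated at 500,000 hours would be expected to see roughly 17 or 18 failures in a year of continuous use, which is a more practical way to read the specification than the 57-year figure.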

There are in fact two different types of MTBF figures. When a manufacturer is introducing a new drive to the market, it obviously has not been in use in the real world, so they have no data on how the drive will perform. Still, they can’t just shrug and say “who knows?”, because many customers want to know what the reliability of the drive is likely to be. To this end, the companies calculate what is called a theoretical MTBF figure. This number is based primarily upon the analysis of historical data, for example the historical failure rate of other drives similar to the one being placed on the market, and the failure rate of the components used in the new model. It’s important to realize that these MTBF figures are estimates based on a theoretical model of reality, and thus are limited by the constraints of that model. There are typically assumptions made for the MTBF figure to be valid: the drive must be properly installed, it must be operating within allowable environmental limits, and so on. Theoretical MTBF figures also cannot typically account for “random” or unusual conditions, such as a temporary quality problem during the manufacturing of a particular lot of a specific type of drive.

After a particular model of drive has been on the market for a while, say a year, the actual failures of the drive can be analyzed and a calculation made to determine the drive’s operational MTBF. This figure is derived by analyzing field returns for a drive model and comparing them to the installed base for the model and how long the average drive in the field has been running. Operational MTBFs are typically lower than theoretical MTBFs because they include some “human element” and “unforeseeable” problems not accounted for in theoretical MTBF. Despite being arguably more accurate, operational MTBF is rarely discussed as a reliability specification, because most manufacturers don’t publish it and because most people only look at the MTBFs of new drives, for which operational figures are not yet available.
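As a rough sketch of how an operational figure could be derived from the quantities just described, the following Python divides the total run time accumulated in the field by the number of field returns. The function name and the sample numbers are hypothetical, and actual manufacturers’ methods are not public and are likely more involved.

```python
def operational_mtbf(installed_base, avg_running_hours, field_returns):
    """Estimate operational MTBF in hours from field data.

    installed_base:    number of drives of the model in the field
    avg_running_hours: how long the average drive has been running
    field_returns:     number of failed drives returned

    A simplified model: total accumulated run time divided by failures.
    """
    if field_returns <= 0:
        raise ValueError("at least one field return is needed for an estimate")
    total_run_hours = installed_base * avg_running_hours
    return total_run_hours / field_returns


if __name__ == "__main__":
    # Made-up figures: 100,000 drives, 4,000 hours each on average, 1,000 returns
    print(operational_mtbf(100_000, 4_000, 1_000))
```

With these made-up inputs the estimate comes to 400,000 hours. Note that the result is only as good as the return data: failures that are never reported would inflate the figure.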

The key point to remember when looking at any MTBF figure is that it is meant to be an average, based on testing done on many hard disks over a relatively short period of time. Despite the theoretical numbers sometimes seeming artificially high, they do have value when put in proper perspective; a drive with a much higher MTBF figure is probably going to be more reliable than one with a much lower figure. As with most specifications, small differences don’t count for much; given that these are theoretical numbers anyway, 350,000 hours is not much different from 300,000.

In the real world, the actual amount of time between failures will depend on many factors, including the operating conditions of the drive and how it is used. Ultimately, however, luck is also a factor, so back up your data regularly.
