Cache and Cache Circuitry of Hard Disk Drives
Hard Disk Cache Circuitry and Operation
All modern hard disks contain an integrated cache, also often called a buffer. The purpose of this cache is not dissimilar to that of other caches used in the PC, even though it is not normally thought of as part of the regular PC cache hierarchy. The function of the cache is to act as a buffer between a relatively fast device and a relatively slow one. For hard disks, the cache is used to hold the results of recent reads from the disk, and also to “pre-fetch” information that is likely to be requested in the near future: for example, the sector or sectors immediately following the one just requested.
The use of a cache improves the performance of any hard disk by reducing the number of physical accesses to the platters on repeated reads, and by allowing data to stream from the disk uninterrupted when the bus is busy. Most modern hard disks have between 512 KiB and 2 MiB of internal cache memory, although some high-performance SCSI drives have as much as 16 MiB, more than many whole PCs have!
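The benefit of read caching with pre-fetch can be sketched in a few lines of Python. This is a toy model, not actual drive firmware: the `ReadCache` class, the prefetch depth, and the dictionary standing in for the platters are all illustrative assumptions.

```python
# Toy model of a drive's read cache with read-ahead (pre-fetch).
# All names and sizes here are illustrative, not real firmware.

class ReadCache:
    def __init__(self, backing, prefetch=4):
        self.backing = backing        # maps sector number -> data (the "platters")
        self.prefetch = prefetch      # how many sectors to read ahead
        self.cache = {}               # cached sectors
        self.physical_reads = 0       # accesses that actually hit the platters

    def read(self, sector):
        if sector not in self.cache:
            # Cache miss: fetch the requested sector plus the next few,
            # since sequential access is likely.
            self.physical_reads += 1
            for s in range(sector, sector + 1 + self.prefetch):
                if s in self.backing:
                    self.cache[s] = self.backing[s]
        return self.cache[sector]

disk = {n: f"data-{n}" for n in range(16)}
cache = ReadCache(disk)
for n in range(8):                    # sequential read of 8 sectors
    cache.read(n)
print(cache.physical_reads)           # prints 2: eight reads, two physical accesses
```

The point of the sketch is the ratio: eight sequential sector reads cost only two physical accesses, because each miss pulls in the sectors likely to be requested next.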
Note: When someone speaks generically about a “disk cache”, they are usually not referring to this small memory area inside the hard disk, but rather to a cache of system memory set aside to buffer accesses to the disk system.
Hard Disk Cache Size
In the last couple of years, hard disk manufacturers have dramatically increased the size of the hard disk buffers in their products. Even as recently as the late 1990s, 256 to 512 KiB was common on consumer drives, and it was not unusual to find only 512 KiB buffers on even some SCSI units (though many had from 1 MiB to 4 MiB). Today, 2 MiB buffers are common on retail IDE/ATA drives, and some SCSI drives are now available with a whopping 16 MiB!
I believe there are two main reasons for this dramatic increase in buffer sizes. The first is that memory prices have dropped precipitously over the last few years. With the cost of memory only about $1 per MiB today, it doesn’t cost much to increase the amount the manufacturers put into their drives. The second is related to marketing: hard disk purchasers have a perception that doubling or quadrupling the size of the buffer will have a great impact on the performance of the hardware.
The size of the disk’s cache is important to its overall impact in improving the performance of the system, for the same reason that adding system memory will improve system performance, and that increasing the system cache will improve performance as well. However, the attention that the size of the hard disk buffer is getting today is largely unwarranted. It has become yet another “magic number” of the hardware world that is tossed around too loosely and overemphasized by salespeople. In fact, a benchmarking comparison done by StorageReview.com showed very little performance difference between 512 KiB and 1 MiB buffer versions of the same Maxtor hard drive.
So, where does this leave us? Basically, with the realization that the size of the buffer is important only to an extent, and that only large differences (4 MiB vs. 512 KiB) are likely to have a significant impact on performance. Also remember that the size of the drive’s internal buffer will be small on most systems compared to the amount of system memory set aside by the operating system for its disk cache. These two caches, the one inside the drive and the one the operating system uses to avoid having to deal with the drive at all, perform a similar function, and really work together to improve performance.
Hard Disk Write Caching
Caching reads from the hard disk and caching writes to the hard disk are similar in some ways, but very different in others. They are the same in their overall objective: to decouple the fast PC from the slow mechanics of the hard disk. The key difference is that a write involves a change to the hard disk, while a read does not.
With no write caching, every write to the hard disk involves a performance hit while the system waits for the drive to access the correct location on the platters and write the data. As mentioned in the general discussion of the cache circuitry and operation, this takes at least 10 milliseconds on most drives, which is a long time in the computer world and really slows down performance as the system waits for the hard disk. This mode of operation is called write-through caching. (The contents of the area written are actually put into the cache in case they need to be read again later, but the write to the disk always occurs at the same time.)
When write caching is enabled, when the system sends a write to the hard disk, the logic circuit records the write in its much faster cache, and then immediately sends back an acknowledgement to the operating system saying, in essence, “all done!” The rest of the system can then proceed on its merry way without having to sit around waiting for the actuator to position and the disk to spin, and so on. This is called write-back caching, because the data is stored in the cache and only “written back” to the platters later on.
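The difference between the two modes can be illustrated with a toy simulation. The `Drive` class, the 10 ms disk-access figure (taken from the discussion above), and the cache latency are illustrative assumptions, not measurements of any real drive.

```python
# Toy contrast of write-through vs. write-back caching. The 10 ms
# access time mirrors the figure quoted above; all numbers are illustrative.

DISK_ACCESS_MS = 10    # time to position the actuator and write the platters
CACHE_ACCESS_MS = 0.1  # time to store the data in cache RAM

class Drive:
    def __init__(self, write_back):
        self.write_back = write_back
        self.pending = []             # writes acknowledged but not yet on the platters
        self.platters = {}
        self.host_wait_ms = 0         # total time the host spends blocked

    def write(self, sector, data):
        if self.write_back:
            # Write-back: record the write in cache and immediately
            # acknowledge ("all done!"), even though nothing is on disk yet.
            self.pending.append((sector, data))
            self.host_wait_ms += CACHE_ACCESS_MS
        else:
            # Write-through: the host waits for the physical write.
            self.platters[sector] = data
            self.host_wait_ms += DISK_ACCESS_MS

wt = Drive(write_back=False)
wb = Drive(write_back=True)
for n in range(100):
    wt.write(n, b"x")
    wb.write(n, b"x")
print(wt.host_wait_ms, wb.host_wait_ms)  # the write-through host waits ~100x longer
```

Note that after the loop the write-back drive has 100 pending writes sitting only in cache, which is exactly the risk discussed next.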
Write-back functionality of course improves performance. There’s a catch, however. The drive reports “all done” when it really isn’t done: the data isn’t on the disk at all, it’s only in the cache. The hard disk’s logic circuits begin to write the data to the disk, but of course this takes some time. The hard disk is using a variant of that old “the check is in the mail” trick you might hear when you call someone to remind them of that loan they were supposed to pay back three weeks ago.
Now, this isn’t really a problem most of the time, as long as the power stays on. Since cache memory is volatile, if the power goes out, its contents are lost. If there were any pending writes in the cache that were not written to the disk yet, they are gone forever. Worse, the rest of the system has no way to know this, because when it is told by the hard disk “all done”, it can’t really know what that means. So not only is some data lost, the system doesn’t even know which data, or even that it happened. The end result can be file consistency problems, operating system corruption, and so on. (Of course, this problem doesn’t affect cached reads at all. They can be discarded at any time.)
Due to this risk, in some situations write caching is not used at all. This is especially true for applications where high data integrity is critical. Due to the improvement in performance that write caching offers, however, it is increasingly being used despite the risk, and the risk is being mitigated through the use of additional technology. The most common technique is simply ensuring that the power does not go off! In high-end server environments, with their uninterruptible power supplies and even redundant power supplies, having unflushed cached writes is much less of a concern. For added peace of mind, better drives that employ write caching have a “write flush” feature that tells the drive to immediately write to disk any pending writes in its cache. This command would typically be sent when the system detects a power interruption, before the UPS batteries run out, or just before the system is shut down for any other reason.
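The failure mode and the flush remedy can be sketched together. This is a conceptual illustration, assuming a made-up `WriteBackCache` class; it is not how any particular drive or the ATA flush command is actually implemented.

```python
# Toy illustration of why unflushed write-back caches are risky, and
# what a "write flush" buys you. All names here are illustrative.

class WriteBackCache:
    def __init__(self):
        self.pending = []    # acknowledged writes not yet on the platters
        self.platters = {}

    def write(self, sector, data):
        self.pending.append((sector, data))   # host is told "all done!"

    def flush(self):
        # The "write flush" command: commit every pending write to disk.
        while self.pending:
            sector, data = self.pending.pop(0)
            self.platters[sector] = data

    def power_loss(self):
        self.pending.clear()  # cache RAM is volatile: pending writes vanish

safe = WriteBackCache()
unsafe = WriteBackCache()
for drive in (safe, unsafe):
    drive.write(0, "journal entry")

safe.flush()         # orderly shutdown: the data reaches the platters
unsafe.power_loss()  # power cut first: the acknowledged write is simply gone
# safe.platters now holds the data; unsafe.platters is empty, and the
# host has no way of knowing its acknowledged write was lost.
```

The key observation is the last one: after the power loss, nothing in the system records which acknowledged writes never made it to the platters, which is why the result is file consistency problems rather than a clean error.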