The ext2 or second extended filesystem is a file system for the Linux kernel. It was initially designed by Rémy Card as a replacement for the extended file system (ext). ext2 was the default filesystem in several Linux distributions, including Debian and Red Hat Linux, until supplanted more recently by ext3, which is almost completely compatible with ext2 and is a journaling file system. ext2 is still the filesystem of choice for flash-based storage media (such as SD cards, SSDs, and USB flash drives) since its lack of a journal minimizes the number of writes. Flash devices have only a limited number of write cycles.
The space in ext2 is split up in blocks, and organized into block groups, analogous to cylinder groups in the Unix File System. This is done to reduce external fragmentation and minimize the number of disk seeks when reading a large amount of consecutive data.
Each block group may contain a copy of the superblock and block group descriptor table, and all block groups contain a block bitmap, an inode bitmap, an inode table and followed by the actual data blocks.
The superblock contains important information that is crucial to the booting of the operating system, thus backup copies are made in multiple block groups in the file system. However, typically only the first copy of it, which is found at the first block of the file system, is used in the booting.
The group descriptor stores the location of the block bitmap, inode bitmap and the start of the inode table for every block group and these, in turn are stored in a group descriptor table.
Example of ext2 inode structure:
The ext3 or third extended filesystem is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper and later in a February 1999 kernel mailing list posting, and the filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.
Although its performance (speed) is less attractive than competing Linux filesystems such as JFS, ReiserFS and XFS, it has a significant advantage in that it allows in-place upgrades from the ext2 file system without having to back up and restore data. Ext3 also uses less CPU power than ReiserFS and XFS. It is also considered safer than the other Linux file systems due to its relative simplicity and wider testing base.
The ext3 file system adds, over its predecessor:
* A Journaling file system
* Online file system growth
* Htree indexing for larger directories. An HTree is a specialized version of a B-tree (not to be confused with the H tree fractal).
Without these, any ext3 file system is also a valid ext2 file system. This has allowed well-tested and mature file system maintenance utilities for maintaining and repairing ext2 file systems to also be used with ext3 without major changes. The ext2 and ext3 file systems share the same standard set of utilities, e2fsprogs, which includes a fsck tool. The close relationship also makes conversion between the two file systems (both forward to ext3 and backward to ext2) straightforward.
While in some contexts the lack of “modern” filesystem features such as dynamic inode allocation and extents could be considered a disadvantage, in terms of recoverability this gives ext3 a significant advantage over file systems with those features. The file system metadata is all in fixed, well-known locations, and there is some redundancy inherent in the data structures that may allow ext2 and ext3 to be recoverable in the face of significant data corruption, where tree-based file systems may not be recoverable.
The ext4 or fourth extended filesystem is a journaling file system developed as the successor to ext3.
Large file system
The ext4 filesystem can support volumes with sizes up to 1 Exbibyte and files with sizes up to 16 tebibytes.
Extents are introduced to replace the traditional block mapping scheme used by ext2/3 filesystems. An extent is a range of contiguous physical blocks, improving large file performance and reducing fragmentation. A single extent in ext4 can map up to 128MB of contiguous space with a 4KB block size. There can be 4 extents stored in the inode. When there are more than 4 extents to a file, the rest of the extents are indexed in an Htree.
The ext4 filesystem is backward compatible with ext3 and ext2, making it possible to mount ext3 and ext2 filesystems as ext4. This will already slightly improve performance, because certain new features of ext4 can also be used with ext3 and ext2, such as the new block allocation algorithm.
The ext3 file system is partially forward compatible with ext4, that is, an ext4 filesystem can be mounted as an ext3 partition (using “ext3” as the filesystem type when mounting). However, if the ext4 partition uses extents (a major new feature of ext4), then the ability to mount the file system as ext3 is lost.
The ext4 filesystem allows for pre-allocation of on-disk space for a file. The current methodology for this on most file systems is to write the file full of 0s to reserve the space when the file is created. This method would no longer be required for ext4; instead, a new fallocate() system call was added to the Linux kernel for use by filesystems, including ext4 and XFS, that have this capability. The space allocated for files such as these would be guaranteed and would likely be contiguous. This has applications for media streaming and databases.
Ext4 uses a filesystem performance technique called allocate-on-flush, also known as delayed allocation. It consists of delaying block allocation until the data is going to be written to the disk, unlike some other file systems, which may allocate the necessary blocks before that step. This improves performance and reduces fragmentation by improving block allocation decisions based on the actual file size.
Break 32,000 subdirectory limit
In ext3 the number of subdirectories that a directory can contain is limited to 32,000. This limit has been raised to 64,000 in ext4, and with the “dir_nlink” feature it can go beyond this (although it will stop increasing the link count on the parent). To allow for continued performance given the possibility of much larger directories, Htree indexes (a specialized version of a B-tree) are turned on by default in ext4. This feature is implemented in Linux kernel 2.6.23. Htree is also available in ext3 when the dir_index feature is enabled.
Ext4 uses checksums in the journal to improve reliability, since the journal is one of the most used files of the disk. This feature has a side benefit; it can safely avoid a disk I/O wait during the journaling process, improving performance slightly. The technique of journal checksumming was inspired by a research paper from the University of Wisconsin titled IRON File Systems (specifically, section 6, called “transaction checksums”).
There are a number of proposals for an online defragmenter, but that support is not yet included in the mainline kernel. Even with the various techniques used to avoid fragmentation, a long lived file system does tend to become fragmented over time. Ext4 will have a tool which can defragment individual files or entire file systems.
Faster file system checking
In ext4, unallocated block groups and sections of the inode table are marked as such. This enables e2fsck to skip them entirely on a check and greatly reduce the time it takes to check a file system of the size ext4 is built to support. This feature is implemented in version 2.6.24 of the Linux kernel.
Ext4 allocates multiple blocks for a file in a single operation, which reduces fragmentation by attempting to choose contiguous blocks on the disk. The multiblock allocator is active when using O_DIRECT or if delayed allocation is on. This allows the file to have many dirty blocks submitted for writes at the same time, unlike the existing kernel mechanism of submitting each block to the filesystem separately for allocation.
As computers become faster in general and as Linux becomes used more for mission critical applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamps measured in nanoseconds. In addition, 2 bits of the expanded timestamp field are added to the most significant bits of the seconds field of the timestamps to defer the year 2038 problem for an additional 204 years.
Ext4 also adds support for date-created timestamps. But, as Theodore Ts’o points out, while it is easy to add an extra creation-date field in the inode (thus technically enabling support for date-created timestamps in ext4), it is more difficult to modify or add the necessary system calls, like stat() (which would probably require a new version), and the various libraries that depend on them (like glibc). These changes would require coordination of many projects. So, even if ext4 developers implement initial support for creation-date timestamps, this feature will not be available to user programs for now.