Retesting ZFS with recordsize set correctly

We believe it's important to test things the way they come out of the box. The shipping defaults should be sane defaults, and that's a good place for everyone to start from.

While the ZFS defaults are reasonably sane, fio doesn't interact with disks in quite the same way most users normally do. Most user interaction with storage can be characterized by reading and writing files in their entirety, and that's not what fio does. When you ask fio to show you random read and write behavior, it creates one very large file for each testing process (eight of them, in today's tests), and each process seeks around within its own large file.

With the default recordsize=128K, ZFS will store a 4KiB file in an undersized record, which only occupies a single 4KiB sector, and later reads of that file will also only need to light up a single 4KiB sector on disk. But when performing 4KiB random I/O with fio, since the 4KiB requests are pieces of a very large file, ZFS must read (and write) the requests in full-sized 128KiB increments.

Although the impact is somewhat smaller, the default 128KiB recordsize also penalizes large-file access. After all, it's not exactly optimal to store and retrieve an 8MiB digital photo in 64 128KiB blocks rather than only 8 1MiB blocks.

In this section, we're going to zfs set recordsize=4K for the 4KiB random I/O tests, and zfs set recordsize=1M for the 1MiB random I/O tests.
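For anyone following along at home, the retune-and-retest loop looks something like the sketch below. It's illustrative rather than a record of our exact harness: the dataset name (tank/fio) and the fio parameters shown are assumptions, not the settings behind the published numbers.

# hypothetical test dataset; substitute your own pool and dataset name
zfs set recordsize=4K tank/fio
zfs get recordsize tank/fio     # confirm the property took effect

# eight worker processes, each doing 4KiB random reads within its own large file
fio --name=4k-randread --directory=/tank/fio \
    --rw=randread --bs=4k --size=1g --numjobs=8 \
    --ioengine=posixaio --iodepth=8 --group_reporting

One wrinkle worth remembering: changing recordsize only affects records written afterward, so fio's test files need to be created (or recreated) after the property is set, not recycled from an earlier run.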
Is ZFS getting "special treatment" here?

An experienced sysadmin might reasonably object to ZFS being given special treatment while mdraid is left to its default settings. But there's a reason for that, and it's not just "we really like ZFS."

While you can certainly tune chunk size on a kernel RAID array, any such tuning affects the entire device globally. If you tune a 48TB mdraid10 for 4KiB I/O, it's going to absolutely suck at 1MiB I/O, and similarly, a 48TB mdraid10 tuned for 1MiB I/O will perform horribly at 4KiB I/O. To fix that, you must destroy the entire array and any filesystems and data on it, recreate everything from scratch, and restore your data from backup. Even then, the rebuilt array can still only be tuned for one performance use case.

In sharp contrast, if you've got a 48TB ZFS pool, you can set recordsize per dataset, and datasets can be created and destroyed as easily as folders. If your ZFS server has 20TiB of random user-saved files (most of which are several MiB, such as photos, movies, and office documents) along with a 2TiB MySQL database, each can coexist peacefully and simply:

zfs create pool/files
zfs set recordsize=1M pool/files
zfs create pool/mysql
zfs set recordsize=16K pool/mysql

Just like that, you've created what look like "folders" on the server which are optimized for the workloads to be found within. If your users create a bunch of 4KiB files, that's fine; each 4KiB file will still only occupy a single sector, while the larger files reap the benefit of similarly large logical block sizes. Meanwhile, the MySQL database gets a recordsize which perfectly matches its own internal 16KiB page size, optimizing performance there without hurting it on the rest of the server.
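If you're not sure what page size your database engine actually uses, it's worth checking before you pick a recordsize. A minimal sketch, assuming a stock InnoDB-backed MySQL and the pool/mysql dataset above:

# InnoDB reports its page size in bytes; 16384 means the default 16KiB
mysql -e "SHOW VARIABLES LIKE 'innodb_page_size';"

# confirm the dataset's recordsize matches
zfs get recordsize pool/mysql

If the two disagree, a single zfs set recordsize= on the dataset brings them back in line for all future writes.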
If you install a PostgreSQL instance later, you can tune for its default 8KiB page size just as easily:

zfs create pool/postgres
zfs set recordsize=8K pool/postgres

And if you later re-tune your MySQL instance to use a larger or smaller page size, you can re-tune your ZFS dataset to match. If all you do is change recordsize, the already-written data won't change, but any new writes to the database will follow the dataset's new recordsize parameter. (If you want to re-write the existing data structure, you also need to do a block-for-block copy of it, e.g. with the mv command.)
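Keep in mind that a mv within a single dataset is just a rename and rewrites nothing; data is only re-stored at the new recordsize when it crosses a filesystem boundary, and on ZFS each dataset is its own filesystem. A minimal sketch of that round trip, with a hypothetical table file and a hypothetical scratch dataset standing in for whatever staging space you have, and assuming the database is shut down first:

# hop to another dataset and back; each hop is a full copy,
# and the copy back in lands at the dataset's current recordsize
mv /pool/mysql/bigtable.ibd /pool/scratch/
mv /pool/scratch/bigtable.ibd /pool/mysql/

Both hops are full copies, so budget for the free space and the downtime before rewriting a large database this way.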
ZFS recordsize=1M: large blocks for large files

In this section, we're going to re-run our earlier 1MiB read, write, and sync write workloads against ZFS datasets with recordsize=1M set. We want to reiterate that this is a pretty friendly configuration for any normal "directory full of files" type of situation; ZFS will write smaller files in smaller blocks automatically. You really only need smaller recordsize settings in special cases with a lot of random access inside large files, such as database binaries and VM images. We know everybody loves to see big performance numbers, so let's look at some.