submitted 8 months ago* (last edited 8 months ago) by rutrum@lm.paradisus.day to c/linux@lemmy.ml

You know, ZFS, ButterFS (btrfs... it's actually "better", right?), and I'm sure more.

I think I have ext4 on the home computer I installed Ubuntu on 5 years ago. How does the choice of file system play a role? Is that old hat now? Surely something like ext4 still has its place.

I see a lot of talk around filesystems, but I've never found a great resource that distinguishes them at a level that assumes I don't know much. Can anyone give some insight into how file systems work and why these new filesystems, which appear as highlights and selling points in most distros, are better than older ones?

Edit: and since we are talking about filesystems, it might be nice to describe or mention how concepts like RAID or LUKS are related.

[-] aksdb@feddit.de 144 points 8 months ago* (last edited 8 months ago)

As with every software/product: they have different features.

ZFS is not really hip. It's pretty old. But also pretty solid. Unfortunately it's licensed in a way that is maybe incompatible with the GPL, so no one wants to take the risk of trying to get it into Linux. So in the Linux world it is always a third-party add-on. In the BSD or Solaris world, though ...

btrfs has similar goals as ZFS (more on that soon) but has been developed right inside the kernel all along, so it typically works out of the box. It has a bit of a complicated history with its stability/reliability, from which it still suffers (the history, not the stability). Many/most people run it with zero problems, some will still cite problems they had in the past, and some apparently still have problems.

bcachefs is also looming around the corner and might tackle problems differently, bringing us all the nice features with fewer bugs (optimism, yay). But it's an even younger FS than btrfs, so only time will tell.

ext4 is an iteration on ext3, which is an iteration on ext2. So it's pretty fucking stable and heavily battle-tested.

Now why even care? ZFS, btrfs and bcachefs are filesystems following the COW philosophy (copy on write), meaning you might lose a bit of performance but win on reliability. It also allows easily enabling snapshots, which all three bring you out of the box. So you can basically say "mark the current state of the filesystem with tag/label/whatever 'x'" and every subsequent change (since changes are copies) will not touch the old snapshots, allowing you to easily roll back a whole partition. (Of course that takes up space, but only incrementally.)
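
To make the snapshot part concrete, here's a tiny Python sketch of the idea (purely illustrative; the real on-disk structures of ZFS/btrfs/bcachefs look nothing like this):

```python
# Toy copy-on-write model: the "filesystem" is a mapping from block number
# to data; a snapshot is just a frozen reference to such a mapping.
# Writes never modify a mapping a snapshot points to -- they build a new one.

class CowToy:
    def __init__(self):
        self.current = {}     # block number -> data
        self.snapshots = {}   # label -> frozen mapping

    def write(self, block, data):
        # Copy on write: the old mapping stays untouched, the change lands
        # in a fresh mapping, so existing snapshots still see the old data.
        self.current = {**self.current, block: data}

    def snapshot(self, label):
        # Cheap and instant: no data is copied, we only keep a reference.
        self.snapshots[label] = self.current

    def rollback(self, label):
        # Rolling back just re-points "current" at the old mapping.
        self.current = self.snapshots[label]

fs = CowToy()
fs.write(0, "working config")
fs.snapshot("before-update")
fs.write(0, "broken config")
fs.rollback("before-update")
print(fs.current[0])  # -> "working config"
```

The space cost is only whatever changed since the snapshot was taken, which is the "only incrementally" part.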

They also bring native support for different RAID levels, making additional layers like mdadm unnecessary. In the case of ZFS and bcachefs, you also have native encryption, making LUKS obsolete.

For typical desktop use: ext4 is totally fine. Snapshots are extremely convenient if something breaks, since you can basically revert the changes with a single command. They don't replace a backup strategy, though, so in the end you should have some data security measures in place anyway.

*Edit: forgot a word.

[-] excitingburp@lemmy.world 41 points 8 months ago

Btw, COW isn't necessarily (and isn't, at least for ZFS) a performance trade-off. Data isn't really copied; new data is simply written elsewhere on the disk (and the old data is not marked as free space).

Ultimately it just means "the data behaves as though it was copied," and there are many ways to achieve that without actually copying anything.

[-] teawrecks@sopuli.xyz 9 points 8 months ago

So let me give an example, and you tell me if I understand. If you change 1MB in the middle of a 1GB file, the filesystem is smart enough to only allocate a new 1MB chunk and update its tables to say "the first 0.5GB lives in the same old place, then 1MB is over here at this new location, and the remaining 0.5GB is still at the old location"?
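
Here's roughly what I'm picturing, as a toy Python sketch (purely illustrative on my part; I assume the real extent/record structures look nothing like this):

```python
# The file is described by an extent map: (offset, length, physical location).
# Sizes are in MiB to keep the numbers readable.

# A 1 GiB file stored as one contiguous run of blocks:
extents = [(0, 1024, "old blocks")]

# Overwrite 1 MiB at offset 512 MiB: the filesystem writes the new megabyte
# somewhere else and only splits the map -- the untouched data stays put.
extents = [
    (0,   512, "old blocks"),            # first 512 MiB, unchanged
    (512,   1, "newly written blocks"),  # the rewritten 1 MiB
    (513, 511, "old blocks + 513 MiB"),  # the rest, unchanged
]

# Total new data written: about 1 MiB plus updated metadata, not 1 GiB.
```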

If that's how it works, would this over time result in a single file being spread out in different physical blocks all over the place? I assume sequential reads of a file stored contiguously would usually have better performance than random reads of a file stored all over the place, right? Maybe not for modern SSDs...but also fragmentation could become a problem, because now you have a bunch of random 1MB chunks that are free.

I know ZFS encourages regular "scrubs" that I thought just checked for data integrity, but maybe it also takes the opportunity to defrag and re-serialize? I also don't know if the other filesystems have a similar operation.

[-] d3Xt3r@lemmy.nz 6 points 8 months ago* (last edited 8 months ago)

Not OP, but yes, that's pretty much how it works. (ZFS scrubs do not defragment data, however.)

Fragmentation isn't really a problem for several reasons.

  • Some (most?) COW filesystems have mechanisms to mitigate fragmentation. ZFS, for instance, uses a special allocation strategy to minimize fragmentation and can reallocate data during certain operations like resilvering or rebalancing.

  • ZFS doesn't even have a traditional defrag command. Because of its design and the way it handles file storage, a typical defrag process is not applicable or even necessary in the same way it is with traditional filesystems.

  • Btrfs also handles chunk allocation efficiently and generally doesn't require defragmentation. Although it does have a defrag command, it's almost never used, unless you have a special reason to (eg: a program that reads raw sectors of a file and needs the data to be contiguous).

  • Fragmentation is only really an issue for spinning disks. However, that is no longer a concern for most spinning-disk users because:

    • Most home users who still have spinning disks use them for archival/long-term storage/media that rarely changes (eg: photos, movies, other infrequently accessed data), so fragmentation rarely occurs here, and even if it does, it's not a concern.
    • Power users typically have a DAS or NAS setup where spinning disks are in a RAID config with striping, so the spread of data across multiple sectors actually has an advantage in averaging out read times (no file is completely stuck in the slow regions of a disk). Any performance loss is also generally negated because a single file can typically be read from two or more drives simultaneously, depending on the redundancy config.

  • Enterprise users also almost always use a RAID (or similar) setup, so the same as above applies. They also use filesystems like ZFS, which employ heavy caching mechanisms, typically backed by SSDs/NVMes, so again, fragmentation isn't really an issue.

[-] teawrecks@sopuli.xyz 3 points 8 months ago

Cool, good to know. I'd be interested to learn how they mitigate fragmentation, though. It's not clear to me how COW could mitigate the copy cost without fragmentation, but I'm certain people smarter than me have been thinking about the problem for my whole life. I know spinning disks have their own set of limitations, but even SSDs perform better on sequential reads over random reads, so it seems like the preference would still be to not split a file up too much.

[-] chaorace 1 points 8 months ago

That's interesting! I was under the same impression as the OP, in terms of thinking that "COW" was a single monolithic "thing". It makes me realize I don't actually understand what's going on at the nuts-and-bolts level very well.

If you've the time and knowledge, could you talk more about that? In modern implementations, is COW a true upgrade with no practical downsides? If so, what's up with the popular myth that there's a fundamental performance tradeoff? Moving forward should we start thinking of COW as an innovation which obsoletes older technologies or should it continue to be considered something of a sidegrade?

[-] ReversalHatchery@beehaw.org 12 points 8 months ago* (last edited 8 months ago)

In case of ZFS and bcachefs, you also have native encryption, making LUKS obsolete.

I don't think that it makes LUKS obsolete. LUKS encrypts the entire partition, but ZFS (and BTRFS too, as far as I know) only encrypts the data and some of the metadata; the rest is kept as it is.

https://openzfs.github.io/openzfs-docs/man/v2.2/8/zfs-load-key.8.html#Encryption

Data that is not encrypted can be modified from the outside (the checksums have to be updated, of course), which can mean anything from a virus on a dual-booted OS to an intruder/thief/whatever.
If you have read about the recent LogoFAIL attack, something similar could happen by modifying the technical data of a filesystem, but it may already be bad enough if they just swap the names of two of your snapshots, if they just want to cause trouble.

But otherwise this is a good summary.

[-] lemann@lemmy.one 4 points 8 months ago

BTRFS has encryption now? Yay!! I have been wrapping it inside a LUKS partition for years at this point...

[-] KiranWells@pawb.social 1 points 8 months ago

They said bcachefs; I don't think BTRFS has it, at least it didn't when I last checked.

[-] Fizz@lemmy.nz 6 points 8 months ago

So ext4 is the best for desktop gaming performance?

[-] aksdb@feddit.de 28 points 8 months ago

It likely has an edge. But I think on SSDs the advantage is negligible. Also, games keep the most performance-critical stuff in memory anyway, so the only thing you could optimize is read performance when changing scenes.

Here are some comparisons: https://www.phoronix.com/news/Linux-5.14-File-Systems

But again ... practically you can likely ignore the difference for desktop usage (also gaming). The workloads where it matters are typically on servers with high throughput where latencies accumulate quickly.

[-] flashgnash@lemm.ee 10 points 8 months ago

Having tried NTFS, ext4 and btrfs, the difference is not noticeable (though NTFS is buggy on Linux)

Btrfs, I believe, has compression built in, so it's good for large libraries, but realistically ext4 is the easiest and simplest option, so I just use that nowadays.

[-] Cwilliams@beehaw.org 9 points 8 months ago

Well, that's because any support for it is unofficial. NTFS is made for Windows.

[-] MonkderZweite@feddit.ch 7 points 8 months ago

And proprietary and an old piece of garbage.

[-] Cwilliams@beehaw.org 6 points 8 months ago

I didn't want to sound too harsh, but yeah.

[-] chaorace 1 points 8 months ago

That's an odd way to put it. NTFS is proprietary, yes, but it's also fundamentally a stable technology which remains eternally forwards/backwards compatible with systems which have long since been frozen in time (modern NTFS is 100% identical to the NTFS used with Windows XP 20+ years ago). In this sense it's a lot like FAT32, which you may or may not be surprised to hear was very strongly patented and never officially opened up.

When you get down to it, the key issue is that NTFS is built for the NT kernel. The various NT-isms of the filesystem do not cleanly map onto *nix metaphors. Access controls are a great example: NTFS is perfectly capable of representing *nix-ish permissions, but NT isn't quite sure what to do with those *nix-ish permissions when interpreting them as NT-ish access controls and vice-versa.

To sum it up: problems commonly attributed to NTFS often actually tend to be issues of NT/*nix compatibility. Non-NT NTFS drivers make valiant efforts to bridge the gap, but at the end of the day there will always be compromises. These compromises turn into gremlins as soon as you start sharing the same NTFS volume between NT & non-NT systems.

[-] Flaky@iusearchlinux.fyi 3 points 8 months ago

I had a pretty bad experience with the Paragon NTFS3 drivers a couple years ago. Basically the kernel hung, maybe from this, maybe not, but it ended up with filesystem corruption on my hard drives.

Thankfully, Windows was able to fix it but until recently I relied on NTFS-3G. Paragon's NTFS3 driver seems to be faring a lot better nowadays.

I'd be surprised to find out there was one filesystem that consistently did better than others in gaming performance. ext4 is a fine choice, though.

[-] noddy@beehaw.org 3 points 8 months ago

I remember reading somewhere that btrfs has good performance for gaming because of deduplication. I'm using btrfs, haven't benchmarked it or anything, but it seems to work fine.

[-] Auli@lemmy.ca 3 points 8 months ago

It's going to be that or XFS. There was a benchmark of the different filesystems I heard about, but I never found it. It was recent and included bcachefs.

[-] rutrum@lm.paradisus.day 5 points 8 months ago

Perhaps I'm guilty of good luck, but is the trade-off of performance for reliability worth it? How often is reliability actually a problem?

As a different use case altogether, suppose I was setting up a NAS over a couple drives. Does choosing something with COW have anything to do with redundancy?

Maybe my question is, are there applications where zfs/btrfs is more or less appropriate than ext4 or even FAT?

[-] aksdb@feddit.de 14 points 8 months ago

For file servers, ZFS (and by extension btrfs) has a clear advantage. The main thing is that you can relatively easily extend and section off storage pools. For ext4 you would need LVM to achieve something similar, but it's still not as mighty as what ZFS (and btrfs) offer out of the box.

ZFS also has a lot of caching strategies specifically optimized for storage boxes. Meaning: it will eat your RAM, but become pretty fast. That's not a trade-off you want on a desktop (or a multi-purpose server), since you typically also need RAM for running applications. But on a NAS, that is completely fine. AFAIK TrueNAS defaults to ZFS. Synology uses btrfs by default. Proxmox runs on ZFS.

[-] 4am@lemm.ee 14 points 8 months ago

ZFS cache will mark itself as such, so if the kernel needs more RAM for applications it can just dump some of the ZFS cache and use whatever it needs.

I see lots of threads on homelab where new users are like “HELP MY ZFS IS USING 100% MEMORY” and we have to talk them off that ledge: unused RAM is wasted RAM, ZFS is making sure you’re running fast AF.

[-] aksdb@feddit.de 7 points 8 months ago

ZFS cache will mark itself as such, so if the kernel needs more RAM for applications it can just dump some of the ZFS cache and use whatever it needs.

In theory. In practice, unless I limit the max ARC size, processes get OOM-killed quite frequently here.

[-] MonkderZweite@feddit.ch 4 points 8 months ago* (last edited 8 months ago)

unused RAM is wasted RAM

In theory. But with how it is implemented in current systems, reserved memory cannot be used by other processes, and those other processes cannot just ask the hog to give up some space. Eventually, the hog gets OOM-killed or the system freezes.

[-] PixxlMan@lemmy.world 1 points 8 months ago

Even when, as the comment says, the memory is marked as cache?

Windows doesn't have this problem

[-] ReversalHatchery@beehaw.org 3 points 8 months ago

are there applications where zfs/btrfs is more or less appropriate than ext4 or even FAT?

Neither of them likes to deal with very low amounts of free space, so don't use them in places where free space is often scarce. ZFS gets really slow when there's almost no free space left, and I don't know about BTRFS nowadays, but a few years ago filling the partition caused data corruption there.

[-] mcepl@lemmy.world 4 points 8 months ago

ZFS is not really hip. It's pretty old. But also pretty solid. Unfortunately it's licensed in a way that is maybe incompatible with the GPL, so no one wants to take the risk of trying to get it into Linux. So in the Linux world it is always a third-party add-on. In the BSD or Solaris world, though ...

Also, ZFS has a tendency to have HIGH (really HIGH) hardware/CPU/memory requirements.

[-] bamboo@lemm.ee 2 points 8 months ago

It was originally designed for massive storage servers (“zettabyte” file system) rather than personal laptops and desktops. It was before the current convergence trend too, so allocating all of the system resources to the file system was considered very beneficial if it could improve performance.

[-] mcepl@lemmy.world 1 points 7 months ago

I didn't mean it as criticism of ZFS. It is just how it is, and perhaps there were good reasons for it. Now (especially with the convergence trend) it hurts.
