Max_P (@Max_P@lemmy.max-p.me)

Bit more context behind that now that the coffee kicked in:

  • Back then everyone had HDDs, which strongly prefer sequential reads and writes. So if you can buffer those writes in RAM, the system can optimize throughput.
  • For the most part, IO happens on internal, non-removable drives, so it makes sense to let applications write to RAM and do the flushing to disk in the background. For example, Firefox can write to its cache without having to worry about slowing down the browser. This generally makes applications much snappier, especially single-threaded ones that vastly predate async runtimes.
  • If the program does IO on multiple drives, acknowledging the write on one immediately lets the program move on to the next file, which the kernel can then flush to both drives in parallel.
  • By acknowledging the write immediately, the modified file also becomes available to other programs right away, served from RAM, while it’s still being flushed to disk in the background.
  • The buffering allows a later write to cancel a pending one. If you’re updating multiple files, for example, the kernel can delay updating the filesystem state and do it just once with the final file list.
  • That’s largely why Linux works so much better with millions of tiny files compared to Windows.
  • You can still get speed benefits from this even with modern NVMe drives. Those are so fast the kernel can run out of stuff to write before it’s gotten around to waking up the application for more. Instead, let the application fill up the buffer fast, and only then block it.
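The "immediately available to other programs" point above is easy to observe. A minimal sketch (the file path is made up for illustration): `write()` returns as soon as the page cache has the data, and a second, independent open of the same file sees the new contents right away, before any `fsync` forces them onto the disk.

```python
import os
import tempfile

# Hypothetical demo file in a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
# write() returns once the page cache holds the data -- not when the
# disk does. The kernel flushes in the background.
os.write(fd, b"hello from the page cache")

# An independent reader sees the data immediately, served from RAM,
# even though nothing has been synced yet.
with open(path, "rb") as f:
    data = f.read()
print(data)  # b'hello from the page cache'

os.fsync(fd)  # only now is the data guaranteed to be on disk
os.close(fd)
```

The same behavior is what makes transfer-rate displays misleading: the writer's clock stops at the page cache, not at the platters.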

Pretty much the only time this matters and becomes confusing is when you’re copying a file and want an accurate transfer rate, and the target disk is much slower than the rest of the computer, e.g. USB sticks and SD cards.
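If you do want an honest number, the trick is to keep the clock running until the data has actually reached the device. A hedged sketch (function name and chunk size are my own choices, not from any particular tool): without the `fsync`, the measured time mostly reflects how fast RAM absorbs the data.

```python
import os
import time

def copy_with_fsync(src_path: str, dst_path: str) -> float:
    """Copy a file and return elapsed seconds, fsync included."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        start = time.monotonic()
        while chunk := src.read(1 << 20):  # 1 MiB chunks
            dst.write(chunk)               # lands in the page cache
        dst.flush()                        # push Python's userspace buffer
        os.fsync(dst.fileno())             # block until the device has it
        return time.monotonic() - start
```

On a slow USB stick, the gap between this timing and the cache-only timing is exactly the "copy finished instantly, then the eject took forever" effect.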

Example case: updating your system. The package manager writes a whole bunch of files everywhere, but also runs a bunch of commands to update other files, rebuild caches and indexes, and maybe do some computation and compiling. It calls sync at the end of the process, and by the time it gets there, most of the data has likely already been flushed to disk, so the sync runs much faster.
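That flow can be sketched in a few lines. This is an illustration only, not any real package manager's code; the file names are invented, and `os.sync()` is the Unix-only wrapper for the sync(2) call described above.

```python
import os
import tempfile

# Invented staging directory standing in for the real filesystem tree.
staging = tempfile.mkdtemp()

# Step 1: write a bunch of files. Each write lands in the page cache
# instantly; the kernel starts flushing them in the background.
with open(os.path.join(staging, "libfoo.so"), "w") as f:
    f.write("new library version\n")
with open(os.path.join(staging, "pkg.index"), "w") as f:
    f.write("rebuilt package index\n")

# Step 2: ... run hooks, rebuild caches, compile -- CPU-bound work
# during which the kernel keeps flushing the files above ...

# Step 3: one sync at the very end. Most data is likely already on
# disk by now, so this usually returns quickly.
os.sync()
```

The point is that the background flushing overlaps with the CPU-bound steps, so the final sync mostly just confirms work that already happened.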
