Benchmarking Linux FUSE Overhead


"Our experiments indicate that depending on the workload and hardware used, performance degradation caused by FUSE can be completely imperceptible or as high as –83% even when optimized; and relative CPU utilization can increase by 31%."
Ref: To FUSE or Not to FUSE: Performance of User-Space File Systems, USENIX FAST '17
Check Installed Storage Types
First, I wanted to check what kind of storage devices are installed, since that determines how the benchmarking workload should be designed. The primary goal of this exercise is to find the bottlenecks in FUSE, not in the disk, so the tests have to be configured according to the disk type.
$ lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0       7:0    0     4K  1 loop /snap/bare/5
loop1       7:1    0  73.9M  1 loop /snap/core22/2045
loop2       7:2    0   516M  1 loop /snap/gnome-42-2204/202
loop3       7:3    0  91.7M  1 loop /snap/gtk-common-themes/1535
loop4       7:4    0  49.3M  1 loop /snap/snapd/24792
loop5       7:5    0  50.8M  1 loop /snap/snapd/25202
loop6       7:6    0  73.9M  1 loop /snap/core22/2082
sda         8:0    0    40G  0 disk
├─sda1      8:1    0    39G  0 part /
├─sda14     8:14   0     4M  0 part
├─sda15     8:15   0   106M  0 part /boot/efi
└─sda16   259:0    0   913M  0 part /boot
Only sda is present in the output, so there is a single HDD and no NVMe SSDs installed.
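Since a SATA device could just as well be an SSD, the rotational flag is a quick way to confirm it is a spinning disk (1 means rotational, 0 means SSD):
$ cat /sys/block/sda/queue/rotational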
Check FUSE Configuration
Many of FUSE's options are fixed when the kernel is compiled. They can be checked by looking at the kernel's build configuration, which most distributions install under /boot.
$ cat /boot/config-$(uname -r) | grep FUSE
CONFIG_FUSE_FS=y
CONFIG_FUSE_DAX=y
CONFIG_FUSE_PASSTHROUGH=y
CONFIG_FUSE_IO_URING=y
CONFIG_FUSE_FS and CONFIG_FUSE_PASSTHROUGH, the flags we are interested in, are enabled. Another interesting flag is CONFIG_FUSE_IO_URING, which hooks FUSE up to io_uring. This might be more powerful than passthrough if we plan to integrate the backing filesystem in user space.
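As a quick runtime sanity check, the fuse filesystem type should also show up as registered with the running kernel (the exact lines vary between setups):
$ grep fuse /proc/filesystems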
Benchmarking and Tools
A basic Google search revealed a few tools that can be used to benchmark a filesystem. fio is the one used here, since its job files give precise control over block size, IO depth, and access pattern.
Designing the fio Workload
Small Random IO - Latency
A small block size produces many small IO operations, which increases the number of kernel context switches per byte transferred.
A small IO depth forces a true latency measurement, since each IO has to complete before the next one is issued, leaving no room for reordering optimizations.
A large IO depth combined with small block sizes mimics real-life concurrent workloads.
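Putting this together, a latency job could look like the sketch below. This is not the exact latency.fio used for the numbers; the engine, sizes, and runtime are assumptions, and the target file is passed on the command line via --filename.
; latency.fio (sketch) - small random IO at shallow and deep queue depths
[global]
ioengine=libaio
direct=1              ; bypass the page cache, assuming the FUSE fs accepts O_DIRECT
size=1G
runtime=60
time_based
group_reporting

[randrw-4k-qd1]
rw=randrw
bs=4k
iodepth=1

[randrw-4k-qd32]
stonewall             ; start only after the qd1 job finishes
rw=randrw
bs=4k
iodepth=32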
Large Sequential IO - Throughput
A large block size minimizes the number of IO operations and maximizes the data moved per operation, shifting the focus from latency to throughput.
A large IO depth lets the kernel keep multiple IO operations in flight, allowing the HDD to reorder them for minimal head movement.
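A throughput job along the same lines could look like this (again a sketch with assumed sizes, not the exact throughput.fio):
; throughput.fio (sketch) - large sequential IO at a deep queue depth
[global]
ioengine=libaio
direct=1
size=4G
runtime=60
time_based
group_reporting

[seq-read-1m-qd32]
rw=read
bs=1M
iodepth=32

[seq-write-1m-qd32]
stonewall             ; run the write pass after the read pass
rw=write
bs=1M
iodepth=32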
Metadata Operations - Overhead
Pure metadata operations like create and stat involve no data transfer, which helps isolate the overhead added by FUSE itself. A large number of files helps average out the cost of each operation.
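fio ships dedicated metadata engines (filecreate and filestat) that skip data IO entirely. Assuming those engines are available in the installed fio build, an overhead job could be sketched like this (not the exact overhead.fio):
; overhead.fio (sketch) - pure metadata operations over many small files
[global]
filesize=4k
nrfiles=1000
openfiles=1
fallocate=none

[create]
ioengine=filecreate
create_on_open=1      ; the open happens during the run, so latency reflects create()

[stat]
stonewall             ; run after the create job has finished
ioengine=filestat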
exouser@dimuthu-fuse-benchmark:~/sbtc$ ls -l *.fio
-rw-rw-r-- 1 exouser exouser 325 Aug 27 08:30 latency.fio
-rw-rw-r-- 1 exouser exouser 180 Aug 27 08:30 overhead.fio
-rw-rw-r-- 1 exouser exouser 193 Aug 27 08:30 throughput.fio
Running the Benchmarks
Each of the job files is run with fio against a test file inside the FUSE mount, with the results written out as JSON:
$ fio latency.fio --filename=$PWD/nexus-fs/build/fuse_mount/$PWD/fiotest.dat --output-format=json --output=fuse-latency.json
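Presumably the same jobs are also run against the backing filesystem directly, so that the FUSE numbers have a baseline; a hypothetical baseline invocation would only differ in the target path and output name:
$ fio latency.fio --filename=$PWD/fiotest.dat --output-format=json --output=native-latency.json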
Determining Bottlenecks
Kernel Context Switching: every FUSE request travels from the application into the kernel, out to the user-space daemon through /dev/fuse, and back again, so each operation pays for extra context switches.
Data Copying: unless passthrough or splicing is used, the payload of every read and write is copied between the kernel and the FUSE daemon, an extra copy that an in-kernel filesystem never pays.
Latency for Small Operations: for small synchronous IOs and metadata calls the per-request round trip dominates the total time, which is where the latency and overhead jobs should show the largest gap.
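One way to put rough numbers on the first two bottlenecks is to watch the FUSE daemon's CPU usage and context switches while a job is running, for example with pidstat from the sysstat package (the daemon's PID is whatever process serves the mount):
$ pidstat -u -w -p <fuse-daemon-pid> 1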