io_uring — The Modern Asynchronous I/O Revolution in Linux

Difficulty: Advanced

Reading Time: 10 min read

Last Updated: June 30, 2025


io_uring: A Modern Asynchronous I/O Interface

io_uring is a modern, high-performance asynchronous I/O interface introduced in Linux kernel 5.1 (March 2019). It was designed to significantly improve I/O efficiency for high-throughput, low-latency applications such as databases, file servers, filesystems, and web servers.

It provides a new Linux kernel system call interface that enables scalable, non-blocking I/O by minimizing system call overhead, context switches, and CPU usage. At its core, io_uring uses two shared ring buffers—the Submission Queue (SQ) and Completion Queue (CQ)—to manage I/O operations between user space and kernel space.

This model allows user-space programs to perform I/O with minimal syscalls, zero-copy capability, and high concurrency, addressing the limitations of older interfaces like read(), write(), aio_read(), and Linux AIO. It supports efficient asynchronous execution of file operations, socket communication, and more, making it ideal for performance-critical systems.


1. The Problem Before io_uring

Before io_uring, applications like web servers, databases, and file servers struggled to scale efficiently when handling:

  • Thousands of file reads and writes

  • Massive numbers of socket connections

  • High-speed log operations

Traditional Linux I/O methods—such as read(), write(), epoll, select(), aio_read()—suffered from several critical drawbacks:

  • Each I/O required a system call, which incurs costly context switches between user and kernel space.

  • Data had to be copied between user space and kernel space, adding further latency.

  • Achieving asynchronicity required complex mechanisms or user-space hacks.

  • These constraints led to high CPU usage, latency spikes, and scalability bottlenecks, especially under high concurrency and throughput.


2. io_uring Breaks the Traditional Boundary Between User Space and Kernel

io_uring is clever because it partially breaks the traditional boundary between user space and kernel space — but in a controlled and safe way.

Let’s first recap the traditional model, then see how io_uring changes it.

2.1 🧱 Traditional Model: Strict Boundary

User Space

  • Where your programs run

  • Safe and isolated from critical system resources

  • Must use system calls to request services from the kernel (e.g., I/O, memory access)

Kernel Space

  • Where the operating system and device drivers execute

  • Has full control over hardware and system resources

  • Handles and services system calls made by user space applications

💡 So every time your program does something like read(), write(), recv(), it calls into the kernel via a syscall. This is slow, because:

  • It switches context from user mode → kernel mode.

  • Kernel processes the request, returns the result.

  • This takes CPU time and adds latency.

2.2 🔄 io_uring's Model: Shared Ring Buffers

io_uring introduces a shared memory region between user space and kernel space using ring buffers:

  1. Submission Queue (SQ) Ring

    • User space writes I/O requests into this ring.

    • Kernel reads from it when ready.

    • No syscall needed for each I/O — just write to memory!

  2. Completion Queue (CQ) Ring

    • Kernel writes completed I/O results into this ring.

    • User space reads from it directly.

    • Again, no syscall — just read from memory!

These rings are set up once via a syscall (io_uring_setup()), and then the user and kernel share them directly in memory.
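The ring mechanics above can be sketched in a few lines of plain Python. This is a toy model (illustrative only, not the real io_uring API): a producer and a consumer advance independent head/tail indices over a shared array, so neither side needs a syscall to hand work to the other.

```python
class Ring:
    """Toy single-producer/single-consumer ring with head/tail indices,
    mimicking how io_uring's SQ and CQ are laid out in shared memory."""
    def __init__(self, entries):
        self.entries = entries            # capacity (a power of two in real io_uring)
        self.slots = [None] * entries
        self.head = 0                     # consumer position
        self.tail = 0                     # producer position

    def push(self, item):
        if self.tail - self.head == self.entries:
            return False                  # ring full
        self.slots[self.tail % self.entries] = item
        self.tail += 1                    # only the producer advances tail
        return True

    def pop(self):
        if self.head == self.tail:
            return None                   # ring empty
        item = self.slots[self.head % self.entries]
        self.head += 1                    # only the consumer advances head
        return item

# "User space" queues three requests: each is just a memory write, no syscall.
sq, cq = Ring(8), Ring(8)
for op in [("read", 3), ("read", 4), ("write", 5)]:
    sq.push(op)

# "Kernel" drains the SQ and posts a result for each request to the CQ.
while (req := sq.pop()) is not None:
    cq.push((req, 0))                     # res=0 standing in for success

# "User space" reaps completions directly from the shared CQ.
completions = []
while (c := cq.pop()) is not None:
    completions.append(c)
print(len(completions))                   # 3 completions, zero per-I/O syscalls
```

Because each side only writes its own index, the two parties can run concurrently without locking, which is the core trick behind io_uring's shared-memory design.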

🎯 How This Breaks the Traditional Model

| Aspect | Traditional I/O | io_uring |
| --- | --- | --- |
| Syscall for each I/O | Required | Optional or batched |
| User ↔ Kernel communication | Only through syscalls | Shared ring buffers |
| Kernel handles queues | Hidden from user space | Exposed to user space (partially) |
| CPU context switch on each I/O | Yes | Reduced |

So user space now directly manages the queues, bypassing many syscalls.

🔐 Is This a Security Risk?

No — the kernel still:

  • Validates each I/O request.

  • Maintains isolation.

  • Prevents unsafe memory access.

  • The shared buffers are controlled and mapped only through safe APIs.

🧠 Why is this powerful?

  • Eliminates per-request syscall overhead by using shared memory queues; thousands of I/Os can be submitted without a syscall per request.

  • Batches multiple I/O operations in a single kernel interaction.

  • Supports all types of file descriptors, including buffered files and sockets.

  • Enables polling-based models that avoid interrupts and reduce context switches.

  • Delivers very low latency with near-zero syscall overhead, making it ideal for high-performance, asynchronous applications.

This makes io_uring highly suitable for scalable, non-blocking, and performance-critical systems.


3. Use Cases

  • Web Servers:

    Efficiently handles 100,000+ concurrent connections with minimal added latency, making it ideal for high-performance HTTP servers.

  • Databases:

    Enables fast reads and writes with minimal CPU usage, preventing I/O bottlenecks in data-intensive workloads.

  • File Servers:

    Processes thousands of simultaneous I/O operations, ensuring smooth throughput under heavy load.

  • Networked Applications:

    Speeds up socket communication, improving responsiveness for real-time or distributed systems.

  • Real-Time Logging Systems:

    Supports efficient and high-speed log writes, crucial for applications that generate large volumes of logs per second.

🛠️ Real-World Examples:

1. Let’s say you're building a video streaming server.

  • With old Linux I/O: Reading each video chunk = syscall + wait.

  • With io_uring: You batch all reads, submit once, no wait, and get notified when done.

Result: Smoother streaming, lower server load, more users per server.

2. Another example is a web server (e.g., written in C or Rust) that uses io_uring to handle 10,000 simultaneous client requests:

  • Submits 10,000 socket read operations to the Submission Queue.

  • Kernel processes I/O in the background, posting results to the Completion Queue.

  • Server polls the Completion Queue, processes responses, and continues handling new requests without blocking, achieving low latency and high throughput.

3. Analogy

Imagine a restaurant kitchen (kernel space) and waiters (user space):

  • In the old model, a waiter walks into the kitchen for every order.

  • With io_uring, there’s a shared clipboard:

    • Waiters write orders (I/O requests) to the clipboard.

    • Chefs (kernel) check it regularly and fulfill orders.

    • Once done, they write the result back on the same clipboard.

No back-and-forth walking = much faster service.


4. Core Concepts of io_uring

  • Submission Queue (SQ):

    User applications submit I/O requests—called Submission Queue Entries (SQEs)—to the kernel via a shared buffer. This buffer is writable only by the application.

  • Completion Queue (CQ):

    The kernel posts completed I/O results—called Completion Queue Entries (CQEs)—to a shared buffer that is writable only by the kernel.

  • SQE (Submission Queue Entry):

    Each I/O operation is described using an SQE, specifying the operation type, target file descriptor, buffer, and flags.

  • CQE (Completion Queue Entry):

    Once the I/O operation is complete, the kernel writes completion information (e.g., return code or number of bytes read/written) into a CQE.

  • Ring Buffers:

    Both SQ and CQ are implemented as memory-mapped ring buffers (circular queues), allowing efficient communication between user space and the kernel without syscalls for every I/O.
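A simplified sketch of the two entry types may help. The field names below follow the kernel's struct io_uring_sqe and struct io_uring_cqe, though the real structs carry more fields and unions; the opcode value and addresses are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SQE:                 # written by the application
    opcode: int            # operation type, e.g. a read
    fd: int                # target file descriptor
    off: int               # file offset
    addr: int              # user buffer address
    len: int               # buffer length
    user_data: int         # opaque token echoed back in the CQE

@dataclass
class CQE:                 # written by the kernel
    user_data: int         # copied verbatim from the submitting SQE
    res: int               # result: bytes transferred, or -errno on failure
    flags: int             # extra completion info

# user_data is how completions are matched to requests: since the kernel
# echoes it back unchanged, the application can look up the originating
# request in O(1) even when completions arrive out of order.
sqe = SQE(opcode=22, fd=3, off=0, addr=0x1000, len=4096, user_data=42)
cqe = CQE(user_data=sqe.user_data, res=4096, flags=0)
print(cqe.user_data)       # 42
```

This decoupling via user_data is what lets io_uring complete operations in any order while the application still knows which request each result belongs to.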


5. I/O Operations Supported by io_uring

  • File I/O: read(), write(), fsync, openat, close

  • Network I/O: accept, recv, send

  • Timeout handling and delay

  • File prefetching (fadvise)

  • Advanced operations: splice, tee, provide_buffers

⚙️ How io_uring Works

  1. Setup: The application calls io_uring_setup() to create the ring buffers.

  2. Submit Requests: Fill a Submission Queue Entry (SQE) with I/O operation info (e.g., read, write, fsync), then submit using io_uring_enter().

  3. Get Completions: After the operation completes, the kernel places a Completion Queue Entry (CQE) in the completion ring. The application reads it to check status.

  4. No Context Switches (sometimes): With SQPOLL mode (submission polling by a kernel thread), I/O can be submitted with no syscalls at all.

Here's a Python example demonstrating how io_uring works using the python-liburing library (a Python binding for liburing):

from liburing import io_uring, io_uring_queue_init, io_uring_queue_exit, io_uring_submit
from liburing import io_uring_get_sqe, io_uring_prep_read, io_uring_wait_cqe, io_uring_cqe_seen
import os

def main():
    # 1. Setup: Create io_uring instance
    ring = io_uring()
    fd = -1
    try:
        io_uring_queue_init(16, ring, 0)  # Initialize with 16 entries

        # 2. Submit Requests: Prepare a read operation
        file_path = "example.txt"
        fd = os.open(file_path, os.O_RDONLY)  # Open file
        buffer = bytearray(1024)  # Buffer for reading
        sqe = io_uring_get_sqe(ring)  # Get Submission Queue Entry
        io_uring_prep_read(sqe, fd, buffer, 1024, 0)  # Read up to 1024 bytes at offset 0
        io_uring_submit(ring)  # Submit to kernel via io_uring_enter()

        # 3. Get Completions: Wait for and process completion
        cqe = io_uring_wait_cqe(ring)  # Wait for Completion Queue Entry
        if cqe.res >= 0:
            print(f"Read {cqe.res} bytes from {file_path}")
            print(buffer[:cqe.res].decode('utf-8'))
        else:
            print(f"Error: {cqe.res}")  # Negative errno on failure
        io_uring_cqe_seen(ring, cqe)  # Mark the CQE as consumed

        # 4. SQPOLL (optional): Not shown, requires kernel config
        # SQPOLL mode would be enabled via flags in io_uring_queue_init

    finally:
        if fd >= 0:
            os.close(fd)  # Close file
        io_uring_queue_exit(ring)  # Clean up ring

if __name__ == "__main__":
    main()

Explanation

  1. Setup: io_uring_queue_init(16, ring, 0) initializes io_uring with a queue depth of 16 entries using io_uring_setup().

  2. Submit Requests: io_uring_get_sqe retrieves an SQE, io_uring_prep_read fills it for a file read operation, and io_uring_submit calls io_uring_enter() to send it to the kernel.

  3. Get Completions: io_uring_wait_cqe retrieves a CQE from the completion queue, checking the result (bytes read or error).

  4. No Context Switches: SQPOLL mode (polling) isn't shown, as it requires kernel support and is enabled via flags in io_uring_queue_init.

Note: Requires python-liburing (pip install liburing). Exact function signatures can vary between versions of the binding, so check its documentation. Ensure example.txt exists. SQPOLL mode needs kernel support and specific flags. Run with appropriate permissions.
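For contrast, and for readers without liburing installed, here is the same read expressed with classic blocking syscalls. Each step is one user-to-kernel transition, which is exactly the per-request overhead io_uring amortizes by batching submissions through shared memory.

```python
import os
import tempfile

# Create a small file to read (stand-in for example.txt).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello io_uring")
    path = f.name

fd = os.open(path, os.O_RDONLY)   # syscall 1: open
data = os.pread(fd, 1024, 0)      # syscall 2: read (blocks until data is copied)
os.close(fd)                      # syscall 3: close
os.unlink(path)

print(data.decode())              # prints: hello io_uring
```

Three syscalls and one blocking wait for a single read; multiply that by thousands of concurrent requests and the cost of the traditional model becomes clear.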


6. Pros

  • Low Overhead: Batches multiple I/O requests in a single io_uring_enter() call, significantly reducing the number of system calls and their cost.

  • Polling Modes:

    • Submission Queue Polling (SQPOLL): A kernel thread continuously polls the Submission Queue to process I/Os without syscalls.

    • Completion Queue Polling: User space can poll the Completion Queue, reducing interrupts and latency.

  • Zero-Copy I/O: Enables direct buffer sharing between user and kernel space, minimizing memory copies and CPU usage.

  • Non-blocking I/O: Enables asynchronous execution, so applications (e.g., in C, Rust, or C# via native bindings) can continue processing while I/O completes in the background.

  • Batching: Allows submission and completion of multiple I/O operations at once, improving throughput and reducing overhead.

  • Asynchronous by Design: Built for non-blocking execution, ideal for highly concurrent systems.

  • Multishot Support: A single request (e.g., accept, recv) can yield multiple completions, perfect for repeated event handling without re-submission.

  • Supported Operations:

    • File I/O: Buffered and direct reads/writes

    • Network I/O: Sockets (e.g., accept, send, recv)

    • Advanced: fsync, openat, close, timeout, fadvise, splice, tee, provide_buffers

    • Multi-shot accept (since Linux 5.19) and multi-shot receive (since Linux 6.0)

  • Better Than Linux AIO:

    • Supports buffered I/O (unlike AIO’s O_DIRECT-only limitation)

    • Handles sockets and mixed I/O workloads efficiently

    • More deterministic, avoids blocking, and scales better under high concurrency

  • High Performance: Reduces CPU load via fewer syscalls and context switches, and lowers I/O latency, which is critical for applications handling thousands of concurrent connections (e.g., Nginx, Redis).

  • Scalability: Supports massive I/O workloads (e.g., millions of file reads or socket events) without overwhelming the CPU or kernel.

  • Library Support: liburing provides a user-space API to simplify usage of the io_uring interface.
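The batching benefit is easy to quantify with a toy cost model. This is illustrative arithmetic only, assuming a hypothetical batch size of 64 SQEs flushed per io_uring_enter() call; real batch sizes depend on queue depth and workload.

```python
def classic_io(n_requests):
    """One-syscall-per-I/O model: read()/write() each cross into the kernel."""
    syscalls = 0
    for _ in range(n_requests):
        syscalls += 1          # every request pays a full kernel transition
    return syscalls

def uring_io(n_requests, batch=64):
    """Batched model: SQEs accumulate in shared memory, one syscall per batch."""
    syscalls = 0
    pending = 0
    for _ in range(n_requests):
        pending += 1           # writing an SQE is just a memory write
        if pending == batch:
            syscalls += 1      # one io_uring_enter() flushes the whole batch
            pending = 0
    if pending:
        syscalls += 1          # flush the final partial batch
    return syscalls

print(classic_io(10_000))      # 10000 syscalls
print(uring_io(10_000))        # 157 syscalls
```

Roughly a 64x reduction in kernel transitions for the same amount of I/O, and with SQPOLL mode even the remaining io_uring_enter() calls can disappear.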


7. Cons

Despite the kernel's validation and isolation of io_uring requests, Google and others restrict io_uring for several reasons:

  • Complexity and Attack Surface: io_uring is a highly flexible interface supporting a wide range of operations (more than 60 opcode types). That flexibility enables its performance, but it also creates a large attack surface, where new features and their interactions can introduce unforeseen security bugs. In Google's 2022 bug bounty program, roughly 60% of the Linux kernel exploits submitted targeted io_uring vulnerabilities.

  • Vulnerability History: io_uring has a notable history of vulnerabilities, many of which lead to local privilege escalation (LPE).

  • Evasion of Traditional Security Tools: Because io_uring lets applications perform I/O without the explicit, per-operation system calls that security tools typically hook, it can create a blind spot for runtime security solutions. Malware can exploit this to operate stealthily: the "Curing" proof-of-concept rootkit by ARMO uses io_uring to bypass syscall-monitoring tools such as Falco and Tetragon.

  • Default Disablement in Some Environments: Due to these concerns, Android, ChromeOS, and even Google's production servers have either disabled io_uring by default or restricted it to trusted code, and Docker's default seccomp profile blocks the io_uring syscalls in containers.


8. Conclusion

io_uring redefines how asynchronous I/O is performed in Linux by breaking away from the syscall-per-request paradigm. Through shared ring buffers, batching, polling, and zero-copy operations, it empowers developers to build systems that are scalable, non-blocking, and high-throughput by design.

While it comes with a larger attack surface and some security trade-offs, its performance advantages make it a game-changer for I/O-intensive applications—especially in domains like real-time networking, high-speed logging, databases, and modern file servers.

As Linux continues to evolve, io_uring stands at the forefront of next-generation system-level I/O. For developers working with large-scale I/O or building low-latency infrastructure, understanding and embracing io_uring is no longer optional—it’s essential.


9. Key Takeaways

  1. Modern Asynchronous I/O:

    io_uring replaces traditional syscall-heavy interfaces with shared memory ring buffers, drastically reducing overhead.

  2. Built for Performance and Scalability:

    Supports batching, zero-copy, and multishot operations, enabling millions of concurrent I/O events with minimal CPU cost.

  3. Real-World Impact:

    Powers high-performance systems like web servers, databases, and loggers with unparalleled I/O efficiency.

  4. Security Trade-offs Exist:

    While powerful, its complexity and syscall-avoidance design make it harder to monitor, and thus more vulnerable to advanced exploits.

  5. Evolving Ecosystem:

    Supported by libraries like liburing and bindings for Rust, Python, and more, it's becoming more accessible to modern developers.

  6. Know When to Use It:

    Ideal for performance-critical systems—but for simple workloads, its complexity may not be justified.


10. References and Further Reading

  1. Unixism io_uring for Linux

  2. Rust io_uring

  3. kernel.dk

  4. Google Concerns


About the Author

Abdul-Hai Mohamed | Software Engineering Geek’s.

Writes in-depth articles about Software Engineering and architecture.

Follow on GitHub and LinkedIn.
