io_uring — The Modern Asynchronous I/O Revolution in Linux


Difficulty: Advanced
Reading Time: 10 min read
Last Updated: June 30, 2025
io_uring: A Modern Asynchronous I/O Interface
io_uring is a modern, high-performance asynchronous I/O interface introduced in Linux kernel 5.1 (March 2019). It was designed to significantly improve I/O efficiency for high-throughput, low-latency applications such as databases, file servers, filesystems, and web servers.
It provides a new Linux kernel system call interface that enables scalable, non-blocking I/O by minimizing system call overhead, context switches, and CPU usage. At its core, io_uring uses two shared ring buffers—the Submission Queue (SQ) and Completion Queue (CQ)—to manage I/O operations between user space and kernel space.
This model allows user-space programs to perform I/O with minimal syscalls, zero-copy capability, and high concurrency, addressing the limitations of older interfaces like read(), write(), aio_read(), and Linux AIO. It supports efficient asynchronous execution of file operations, socket communication, and more, making it ideal for performance-critical systems.
1. The Problem Before io_uring
Before io_uring, applications like web servers, databases, and file servers struggled to scale efficiently when handling:
Thousands of file reads and writes
Massive numbers of socket connections
High-speed log operations
Traditional Linux I/O methods—such as read(), write(), epoll, select(), aio_read()—suffered from several critical drawbacks:
Each I/O required a system call, which incurs costly context switches between user and kernel space.
Data had to be copied between user space and kernel space, adding further latency.
Achieving asynchronicity required complex mechanisms or user-space hacks.
These constraints led to high CPU usage, latency spikes, and scalability bottlenecks, especially under high concurrency and throughput.
2. io_uring Breaks the Traditional Boundary Between User Space and Kernel
io_uring is clever because it partially breaks the traditional boundary between user space and kernel space — but in a controlled and safe way.
Let’s first recap the traditional model, then see how io_uring changes it.
2.1 🧱 Traditional Model: Strict Boundary
User Space
Where your programs run
Safe and isolated from critical system resources
Must use system calls to request services from the kernel (e.g., I/O, memory access)
Kernel Space
Where the operating system and device drivers execute
Has full control over hardware and system resources
Handles and services system calls made by user space applications
💡 So every time your program does something like read(), write(), recv(), it calls into the kernel via a syscall. This is slow, because:
It switches context from user mode → kernel mode.
The kernel processes the request and returns the result.
This takes CPU time and adds latency.
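To make that cost concrete, here is a minimal C sketch of the traditional model: every chunk read from a file is a separate read() syscall, and each one pays the user-to-kernel round trip described above (the file name and chunk size are arbitrary placeholders).

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("example.txt", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n;
    /* One syscall, and one user->kernel->user transition, per chunk. */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* process buf[0..n) here */
    }

    close(fd);
    return 0;
}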
2.2 🔄 io_uring's Model: Shared Ring Buffers
io_uring introduces a shared memory region between user space and kernel space using ring buffers:
Submission Queue (SQ) Ring
User space writes I/O requests into this ring.
Kernel reads from it when ready.
No syscall needed for each I/O — just write to memory!
Completion Queue (CQ) Ring
Kernel writes completed I/O results into this ring.
User space reads from it directly.
Again, no syscall — just read from memory!
These rings are set up once via a syscall (io_uring_setup()), and then the user and kernel share them directly in memory.
🎯 How This Breaks the Traditional Model
| Aspect | Traditional I/O | io_uring |
| --- | --- | --- |
| Syscall for each I/O | Required | Optional or batched |
| User ↔ Kernel communication | Only through syscalls | Shared ring buffers |
| Kernel handles queues | Hidden from user space | Exposed to user space (partially) |
| CPU context switch on each I/O | Yes | Reduced |
So user space now directly manages the queues, bypassing many syscalls.
🔐 Is This a Security Risk?
No — the kernel still:
Validates each I/O request.
Maintains isolation.
Prevents unsafe memory access.
The shared buffers are controlled and mapped only through safe APIs.
🧠 Why is this powerful?
Eliminates the per-request syscall overhead by using shared memory queues
Batches multiple I/O operations into a single kernel interaction
Supports all types of file descriptors, including buffered files and sockets
Enables polling-based models to avoid interrupts and reduce context switches
You can submit thousands of I/Os without a syscall per request.
You get very low latency, near-zero syscall overhead.
Ideal for high-performance, asynchronous applications.
This makes io_uring highly suitable for scalable, non-blocking, and performance-critical systems.
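As an illustration of this batching, here is a minimal C sketch using liburing (the helper library discussed later in this article). It queues several reads by writing SQEs into the shared ring, then issues a single io_uring_enter() for the whole batch; the file name, chunk size, and chunk count are arbitrary assumptions:

/* Build with: gcc batch_read.c -luring */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNKS   4
#define CHUNK_SZ 4096

int main(void)
{
    struct io_uring ring;
    if (io_uring_queue_init(CHUNKS, &ring, 0) < 0)   /* one-time setup syscall */
        return 1;

    int fd = open("example.txt", O_RDONLY);          /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    static char bufs[CHUNKS][CHUNK_SZ];

    /* Queue several reads by writing SQEs into shared memory: no syscall yet. */
    for (int i = 0; i < CHUNKS; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i], CHUNK_SZ, (__u64)i * CHUNK_SZ);
        io_uring_sqe_set_data(sqe, (void *)(long)i);  /* tag to match the CQE later */
    }

    io_uring_submit(&ring);   /* a single io_uring_enter() covers all queued reads */

    /* Drain one CQE per submitted read. */
    for (int i = 0; i < CHUNKS; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("chunk %ld: %d bytes\n", (long)io_uring_cqe_get_data(cqe), cqe->res);
        io_uring_cqe_seen(&ring, cqe);   /* mark the CQE as consumed */
    }

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}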
3. Use Cases
Web Servers:
Efficiently handles 100,000+ concurrent connections with minimal added latency, making it ideal for high-performance HTTP servers.
Databases:
Enables fast reads and writes with minimal CPU usage, preventing I/O bottlenecks in data-intensive workloads.
File Servers:
Processes thousands of simultaneous I/O operations, ensuring smooth throughput under heavy load.
Networked Applications:
Speeds up socket communication, improving responsiveness for real-time or distributed systems.
Real-Time Logging Systems:
Supports efficient and high-speed log writes, crucial for applications that generate large volumes of logs per second.
🛠️ Real-World Examples:
1. Let’s say you're building a video streaming server.
With old Linux I/O: Reading each video chunk = syscall + wait.
With io_uring: You batch all reads, submit once, no wait, and get notified when done.
Result: Smoother streaming, lower server load, more users per server.
2. Another example is a web server (e.g., written in C or Rust) that uses io_uring to handle 10,000 simultaneous client requests:
Submits 10,000 socket read operations to the Submission Queue.
Kernel processes I/O in the background, posting results to the Completion Queue.
Server polls the Completion Queue, processes responses, and continues handling new requests without blocking, achieving low latency and high throughput (a minimal sketch of this pattern follows the analogy below).
3. Analogy
Imagine a restaurant kitchen (kernel space) and waiters (user space):
In the old model, a waiter walks into the kitchen for every order.
With io_uring, there’s a shared clipboard:
Waiters write orders (I/O requests) to the clipboard.
Chefs (kernel) check it regularly and fulfill orders.
Once done, they write the result back on the same clipboard.
No back-and-forth walking = much faster service.
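Here is a minimal, runnable C sketch of the web-server pattern from example 2, using liburing. A socketpair stands in for a real client connection so the sketch works anywhere; a real server would queue one recv per connection and re-arm it after every completion:

/* Build with: gcc recv_loop.c -luring */
#include <liburing.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* A socketpair stands in for a real client connection. */
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1;
    (void)write(sv[1], "hello", 5);       /* pretend a client sent a request */

    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Submit a recv for the "server" end; a real server queues one per connection. */
    char buf[256];
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, sv[0], buf, sizeof(buf), 0);
    io_uring_sqe_set_data(sqe, (void *)(long)sv[0]);   /* remember which fd this was */
    io_uring_submit(&ring);

    /* Completion loop: peek without blocking, fall back to waiting. */
    struct io_uring_cqe *cqe;
    if (io_uring_peek_cqe(&ring, &cqe) != 0)   /* nothing ready yet... */
        io_uring_wait_cqe(&ring, &cqe);        /* ...so block until one arrives */

    printf("fd %ld: received %d bytes: %.*s\n",
           (long)io_uring_cqe_get_data(cqe), cqe->res, cqe->res, buf);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    close(sv[0]); close(sv[1]);
    return 0;
}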
4. Core Concepts of io_uring
Submission Queue (SQ):
User applications submit I/O requests—called Submission Queue Entries (SQEs)—to the kernel via a shared buffer. This buffer is writable only by the application.
Completion Queue (CQ):
The kernel posts completed I/O results—called Completion Queue Entries (CQEs)—to a shared buffer that is writable only by the kernel.
SQE (Submission Queue Entry):
Each I/O operation is described using an SQE, specifying the operation type, target file descriptor, buffer, and flags.
CQE (Completion Queue Entry):
Once the I/O operation is complete, the kernel writes completion information (e.g., return code or number of bytes read/written) into a CQE.
Ring Buffers:
Both SQ and CQ are implemented as memory-mapped ring buffers (circular queues), allowing efficient communication between user space and the kernel without syscalls for every I/O.
5. I/O Operations Supported by io_uring
File I/O: read, write, fsync, openat, close
Network I/O: accept, recv, send
Timeout handling and delay
File prefetching (fadvise)
Advanced operations: splice, tee, provide_buffers
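Each of these operations has a corresponding io_uring_prep_*() helper in liburing. The sketch below queues a write, an fsync, and a timeout in one batch; the file name and timeout value are arbitrary, and a real program would link the write and fsync if strict ordering mattered.

/* Build with: gcc op_variety.c -luring */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct io_uring ring;
    if (io_uring_queue_init(4, &ring, 0) < 0) return 1;

    int fd = open("example.txt", O_WRONLY | O_CREAT, 0644);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_write(sqe, fd, "log line\n", 9, 0);   /* file write */

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_fsync(sqe, fd, 0);                    /* flush to stable storage */

    struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_timeout(sqe, &ts, 0, 0);              /* 100 ms timeout request */

    io_uring_submit(&ring);

    /* Three submissions -> three completions (the timeout reports -ETIME when it fires). */
    for (int i = 0; i < 3; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("completion %d: res=%d\n", i, cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}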
⚙️ How io_uring Works
Setup: Application calls io_uring_setup() to create ring buffers.
Submit Requests: Fill a Submission Queue Entry (SQE) with I/O operation info (e.g., read, write, fsync, etc.), then submit using io_uring_enter().
Get Completions: After the operation completes, the kernel places a Completion Queue Entry (CQE) in the completion ring. The application reads it to check status.
No Context Switches (sometimes): If kernel-side support is enabled, I/O can be performed with no syscalls using SQPOLL mode (submission polling by a kernel thread).
Here's a Python example demonstrating how io_uring works using the python-liburing library (a Python binding for liburing):
from liburing import io_uring, io_uring_queue_init, io_uring_queue_exit, io_uring_submit
from liburing import io_uring_get_sqe, io_uring_prep_read, io_uring_wait_cqe
import os


def main():
    # 1. Setup: create an io_uring instance with a queue depth of 16 entries
    ring = io_uring()
    io_uring_queue_init(16, ring, 0)

    # Open the file to read (example.txt must exist) and allocate a buffer
    file_path = "example.txt"
    fd = os.open(file_path, os.O_RDONLY)
    buffer = bytearray(1024)

    try:
        # 2. Submit Requests: prepare a read operation
        sqe = io_uring_get_sqe(ring)                  # get a Submission Queue Entry
        io_uring_prep_read(sqe, fd, buffer, 1024, 0)  # read up to 1024 bytes at offset 0
        io_uring_submit(ring)                         # submit to the kernel via io_uring_enter()

        # 3. Get Completions: wait for and process the completion
        cqe = io_uring_wait_cqe(ring)                 # wait for a Completion Queue Entry
        if cqe.res >= 0:
            print(f"Read {cqe.res} bytes from {file_path}")
            print(buffer[:cqe.res].decode('utf-8'))
        else:
            print(f"Error: {cqe.res}")

        # 4. SQPOLL (optional): not shown; it is enabled via flags in io_uring_queue_init
    finally:
        os.close(fd)                # close the file
        io_uring_queue_exit(ring)   # clean up the ring


if __name__ == "__main__":
    main()
Explanation
Setup: io_uring_queue_init(16, ring, 0) initializes io_uring with a queue depth of 16 entries using io_uring_setup().
Submit Requests: io_uring_get_sqe retrieves an SQE, io_uring_prep_read fills it for a file read operation, and io_uring_submit calls io_uring_enter() to send it to the kernel.
Get Completions: io_uring_wait_cqe retrieves a CQE from the completion queue, checking the result (bytes read or error).
No Context Switches: SQPOLL mode (polling) isn't shown, as it requires kernel support and is enabled via flags in io_uring_queue_init.
Note: Requires python-liburing (pip install liburing). Ensure example.txt exists. SQPOLL mode needs kernel support and specific flags. Run with appropriate permissions.
6. Pros
Low Overhead: Batches multiple I/O requests in a single io_uring_enter() call, significantly reducing the number of system calls and their cost.
Polling Modes:
Submission Queue Polling (SQPOLL): A kernel thread continuously polls the Submission Queue to process I/Os without syscalls (see the SQPOLL setup sketch after this list).
Completion Queue Polling: User space can poll the Completion Queue, reducing interrupts and latency.
Zero-Copy I/O: Enables direct buffer sharing between user and kernel space, minimizing memory copies and CPU usage.
Non-blocking I/O: Enables asynchronous execution, so applications (e.g., in C, Rust, or C# via native bindings) can continue processing while I/O completes in the background.
Batching: Allows submission and completion of multiple I/O operations at once, improving throughput and reducing overhead.
Asynchronous by Design: Built for non-blocking execution, ideal for highly concurrent systems.
Multishot Support: A single request (e.g., accept, recv) can yield multiple completions, perfect for repeated event handling without re-submission (see the multishot accept sketch after this list).
Supported Operations:
File I/O: Buffered and direct reads/writes
Network I/O: Sockets (e.g., accept, send, recv)
Advanced: fsync, openat, close, timeout, fadvise, splice, tee, provide_buffers
Multi-shot accept (since Linux 5.19) and multi-shot receive (since Linux 6.0)
Better Than Linux AIO:
Supports buffered I/O (unlike AIO’s O_DIRECT-only limitation)
Handles sockets and mixed I/O workloads efficiently
More deterministic, avoids blocking, and scales better under high concurrency
High Performance: Reduces CPU load via fewer syscalls and context switches, and lowers I/O latency, which is critical for applications handling thousands of concurrent connections (e.g., Nginx, Redis).
Scalability: Supports massive I/O workloads (e.g., millions of file reads or socket events) without overwhelming the CPU or kernel.
Library Support: liburing provides a user-space API to simplify usage of the io_uring interface.
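As referenced in the polling-modes item above, here is a minimal C sketch of enabling SQPOLL with liburing. The queue depth and idle timeout are arbitrary choices, and older kernels may require elevated privileges for this mode:

/* Build with: gcc sqpoll.c -luring */
#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    params.flags = IORING_SETUP_SQPOLL;   /* ask the kernel to poll the SQ for us */
    params.sq_thread_idle = 2000;         /* kernel thread sleeps after 2 s of idleness */

    struct io_uring ring;
    int ret = io_uring_queue_init_params(8, &ring, &params);
    if (ret < 0) {
        fprintf(stderr, "SQPOLL setup failed: %s\n", strerror(-ret));
        return 1;
    }

    /* From here on, io_uring_submit() only falls back to io_uring_enter() when the
     * kernel polling thread has gone idle; otherwise queued SQEs are picked up
     * directly from shared memory with no syscall at all. */
    printf("SQPOLL ring ready\n");

    io_uring_queue_exit(&ring);
    return 0;
}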
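And here is a sketch of the multishot accept mentioned above, assuming liburing 2.2+ and a Linux 5.19+ kernel. The loopback listener setup is just scaffolding; the point is that a single SQE keeps producing one CQE per incoming connection:

/* Build with: gcc ms_accept.c -luring */
#include <liburing.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Plain listening TCP socket on an ephemeral loopback port. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = 0 };
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* One multishot accept SQE keeps producing a CQE per incoming connection
     * until it is cancelled or fails; no re-submission needed. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_multishot_accept(sqe, lfd, NULL, NULL, 0);
    io_uring_submit(&ring);

    for (;;) {   /* each client that connects yields one CQE with the new fd in res */
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        if (cqe->res < 0)
            break;   /* the multishot request has terminated */
        printf("accepted fd %d, more=%d\n", cqe->res,
               !!(cqe->flags & IORING_CQE_F_MORE));
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(lfd);
    return 0;
}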
7. Cons
Even though the kernel still validates requests and maintains isolation, io_uring raises real security concerns, and Google limits its use for several reasons:
Complexity and Attack Surface:
io_uring is a highly flexible and powerful interface, supporting a wide range of operations (over 60 operation types). This complexity enables performance, but it also enlarges the attack surface: new features and their interactions can lead to unforeseen security bugs. In Google's 2022 bug bounty program, 60% of submitted Linux kernel exploits targeted io_uring vulnerabilities.
Vulnerability History:
io_uring has a notable history of vulnerabilities, many of which lead to local privilege escalation (LPE).
Evasion of Traditional Security Tools:
Because io_uring lets applications perform I/O and other actions without issuing the explicit, individual system calls that security tools typically hook, it can create a "blind spot" for some runtime security solutions. Malware or rootkits can use io_uring to operate more stealthily; the "Curing" proof-of-concept rootkit by ARMO, for example, bypasses syscall-monitoring tools such as Falco and Tetragon this way.
Default Disablement in Some Environments:
Due to these concerns, Android, ChromeOS, and even Google's production servers either disable io_uring by default or severely restrict its use to trusted code. Docker has also blocked io_uring syscalls by default in containers.
8. Conclusion
io_uring redefines how asynchronous I/O is performed in Linux by breaking away from the syscall-per-request paradigm. Through shared ring buffers, batching, polling, and zero-copy operations, it empowers developers to build systems that are scalable, non-blocking, and high-throughput by design.
While it comes with a larger attack surface and some security trade-offs, its performance advantages make it a game-changer for I/O-intensive applications, especially in domains like real-time networking, high-speed logging, databases, and modern file servers.
As Linux continues to evolve, io_uring stands at the forefront of next-generation system-level I/O. For developers working with large-scale I/O or building low-latency infrastructure, understanding and embracing io_uring is no longer optional; it is essential.
9. Key Takeaways
Modern Asynchronous I/O:
io_uring replaces traditional syscall-heavy interfaces with shared memory ring buffers, drastically reducing overhead.
Built for Performance and Scalability:
Supports batching, zero-copy, and multishot operations, enabling millions of concurrent I/O events with minimal CPU cost.
Real-World Impact:
Powers high-performance systems like web servers, databases, and loggers with unparalleled I/O efficiency.
Security Trade-offs Exist:
While powerful, its complexity and syscall-avoidance design make it harder to monitor, and thus more exposed to advanced exploits.
Evolving Ecosystem:
Supported by libraries like liburing and bindings for Rust, Python, and more, it is becoming more accessible to modern developers.
Know When to Use It:
Ideal for performance-critical systems, but for simple workloads its complexity may not be justified.
About the Author
Abdul-Hai Mohamed | Software Engineering Geek’s.
Writes in-depth articles about Software Engineering and architecture.