Against ROP Attacks

Cryptape

Security is fundamental to any blockchain. It ensures that all tokens are secure. When talking about a virtual machine and the smart contract platform it forms, security comes in two main aspects:

  • The code running on the virtual machine must be secure

  • The virtual machine itself should also be designed to facilitate safer code execution

The first aspect often gets sufficient attention. When it comes to CKB, we now encourage developers to write scripts in Rust for maximum security, reserving pure C code only for those who fully understand its risks. Additionally, higher-level languages have been introduced in CKB to strike a better balance between productivity and security.

Virtual machine security was a major focus when CKB-VM was originally designed. Many potential risks were addressed at the architectural level, though some, despite thorough research, were left open. One such issue is Return-Oriented Programming (ROP), a rather ingenious attack. It exploits executable code that has been legitimately loaded into memory, rendering otherwise effective protections (e.g., W^X) futile. It spans multiple architectures and is constantly evolving. Although we put a great deal of effort into studying ROP in the early days, we did not implement specific countermeasures against it. Now that new RISC-V extensions are available, it is the right time to introduce design-level protections against ROP.

Acknowledgments

Before diving deeper, we would like to acknowledge Todd Mortimer from the OpenBSD team. His work on ROP mitigations in the OpenBSD kernel in 2018-2019 significantly inspired our research and this article. We highly recommend his talk, slide decks from AsiaBSDCon 2019 and EuroBSDCon 2018, and this paper for a deeper understanding of ROP. Several examples of x64 ROP attacks in this post are also drawn from his research.

Typical Attack Workflow

While attacks come in many sophisticated forms, a common attack on a program typically follows this process:

  1. Prepare a shellcode: a piece of binary code that performs specific actions (e.g., running a shell or other programs on the target computer).

  2. Exploit a vulnerability in the target system, most commonly a buffer overflow. The attack could be initiated via a network protocol (such as HTTP) against a remote system, or via command-line input to a target program.

  3. As a result of the attack, the shellcode is inserted into a designated memory region of the target system and gets executed, allowing the attacker to achieve their goal. The consequences vary: gaining unauthorized access to sensitive data, destroying data or machines, planting malicious programs on the target for further actions, or manipulating control flow.

While traditional systems face a wide range of attacks, blockchains run in their own limited and unique runtime environment, rendering many conventional attacks irrelevant. Major blockchain security threats include:

  • Private key security: Blockchain wallets rely on private keys, which are prime targets for various attacks.

  • Smart contract vulnerability: Poorly written smart contracts may contain logic flaws that lead to security risks.

  • Virtual machine security: An attacker may send malicious inputs to a smart contract, causing it to terminate unexpectedly with a success status despite lacking proper credentials.

This post focuses specifically on attacks targeting the blockchain’s virtual machine, in our case the CKB Virtual Machine (CKB-VM).

CKB’s Early Approach

While it is impossible to predict every attack, disrupting the typical attack workflow is an effective defense strategy. From its inception, CKB-VM has implemented W^X protection: at any given time, any memory location in CKB-VM is either writable (allowing data modification) or executable (allowing data execution), but never both. Once a memory region is marked as executable, it cannot be reverted to writable for the lifetime of the current CKB-VM instance. Only a writable memory location can be frozen to executable.

This design significantly disrupts the typical attack workflow. For shellcode to execute on CKB-VM, it must reside in executable memory. However, an attacker can only provide shellcode as part of program inputs, which are loaded into writable memory. As long as a CKB script does not voluntarily mark input data as executable (a highly unlikely scenario), the shellcode remains inert. Additionally, attempting to overwrite existing executable code with shellcode is also futile, since an executable memory region is not writable and cannot be converted back to writable.
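The rules above can be sketched as a toy memory model. This is only an illustration, not CKB-VM's actual implementation: the class name, method names, and the 4096-byte page size are assumptions made for the example.

```python
class WxorXMemory:
    """Toy memory in which each page is either writable or executable, never both."""

    PAGE_SIZE = 4096  # assumed page size for this sketch

    def __init__(self, num_pages):
        self.data = bytearray(num_pages * self.PAGE_SIZE)
        # Every page starts writable; True means it has been frozen to executable.
        self.executable = [False] * num_pages

    def _pages(self, addr, length):
        first = addr // self.PAGE_SIZE
        last = (addr + max(length, 1) - 1) // self.PAGE_SIZE
        return range(first, last + 1)

    def write(self, addr, payload):
        if any(self.executable[p] for p in self._pages(addr, len(payload))):
            raise PermissionError("write to executable memory")
        self.data[addr:addr + len(payload)] = payload

    def freeze(self, page):
        # One-way transition: there is deliberately no method that undoes it.
        self.executable[page] = True

    def fetch(self, addr, length):
        if not all(self.executable[p] for p in self._pages(addr, length)):
            raise PermissionError("instruction fetch from writable memory")
        return bytes(self.data[addr:addr + length])
```

In this model, script code is written into a page and then frozen, after which it can be fetched but no longer modified. Attacker-supplied input stays in a writable page, so any attempt to fetch instructions from it fails.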

W^X is a well-established security technique widely used in modern hardware, operating systems, and virtual machines. Although it cannot prevent all possible attacks, W^X effectively shields against many by breaking the standard attack workflow. Even if an attacker successfully injects shellcode into a target machine, the attack is incomplete because the shellcode cannot be executed.

Understanding ROP

While W^X is effective, it does not solve all our problems. This leads to the topic of this post: Return-oriented Programming (ROP). Instead of explicitly injecting new code, ROP exploits executable code that already resides in the target machine’s memory. Essentially, ROP builds a shellcode by chaining existing code snippets together that were never intended to function as such. It may sound like a fantasy, but as we shall see from the following examples, ROP is a practical and effective attack technique.

To understand ROP, we must first examine modern CPU architecture and memory layout. While assembly instructions vary in representations and lengths, they are put together in memory one after another as a stream of bytes:

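[Figure: a stream of bytes in memory, beginning B8 22 11 00, decoded into assembly instructions starting from the B8 byte.]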

As seen in the above example, different assembly instructions come in different lengths. In the x86-64 ISA, an instruction can range from 1 to 15 bytes (RISC ISAs such as ARM or RISC-V have more uniform instruction lengths, which we will discuss later). But in memory, instructions are stored sequentially without gaps.

This means that from a stream of bytes alone, we cannot tell which instructions the stream consists of. In the above example, meaningful assembly instructions emerge only when we start decoding from the B8 byte. On a different occasion, if we knew from elsewhere that the B8 22 11 bytes at the front were certain magic flags, decoding would start from the 00 byte, yielding a totally different sequence of instructions.

[Figure: the same byte stream decoded starting from the 00 byte, yielding different instructions.]

It is the combination of a special program counter (PC) register in the CPU and the current memory stream that jointly determines which instructions the CPU executes. Depending on the ISA or hardware, the boot process initializes the CPU’s PC register to a pre-defined value, loads instructions from this pre-defined address, and initializes everything related to the operating system. When a user launches a new program, the program’s metadata contains an entrypoint address, to which the OS sets the CPU’s PC register in order to start executing the program. Suffice it to say that maintaining a proper PC value is critical to a computer’s proper function. An invalid PC value might lead to a CPU malfunction at best, or at worst, leak sensitive information or grant attackers unauthorized access.

Forming an ROP Attack Via ROP Gadgets

Let’s look at the following instruction byte stream on an x86-64 CPU:

  8a 5d c3         movb -61(%rbp), %bl

These 3 bytes represent a mov instruction: it takes the value of the rbp register, adds an offset of -61, uses the result as a memory address to load 1 byte of data, and finally stores the loaded data in the bl register. However, if we ignore 8a and only look at 5d c3, the bytes actually represent a different pair of instructions:

  5d               popq %rbp
  c3               retq

This byte sequence contains two instructions:

  • Pop an 8-byte value from the stack and store it in the rbp register

  • Pop an 8-byte value from the stack and store it in the PC register, so execution continues from the new location
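To make the two readings concrete, here is a deliberately tiny decoder written for this post. It is a sketch that knows only the three encodings used in the example (it is not a real x86-64 disassembler), and the function name `decode` is ours.

```python
def decode(code, offset):
    """Decode `code` starting at `offset`, knowing only three x86-64 encodings."""
    out = []
    pc = offset
    while pc < len(code):
        op = code[pc]
        if op == 0x5D:                      # 5d: pop %rbp
            out.append("popq %rbp")
            pc += 1
        elif op == 0xC3:                    # c3: ret
            out.append("retq")
            pc += 1
        elif op == 0x8A and pc + 2 < len(code) and code[pc + 1] == 0x5D:
            # 8a 5d <disp8>: mov disp8(%rbp), %bl, with a signed 8-bit offset
            disp = int.from_bytes(code[pc + 2:pc + 3], "little", signed=True)
            out.append(f"movb {disp}(%rbp), %bl")
            pc += 3
        else:
            out.append(f"(byte {op:#04x} not handled by this sketch)")
            pc += 1
    return out


stream = bytes.fromhex("8a5dc3")
from_start = decode(stream, 0)   # decoding begins at 8a
from_middle = decode(stream, 1)  # decoding begins one byte later, at 5d
```

The same three bytes yield one mov when decoding starts at offset 0, and a pop followed by a ret when decoding starts at offset 1. The only thing that changed is where the PC points.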

We’ve briefly discussed that shellcode fulfills whatever task the attacker requires. In fact, the most common type of shellcode simply starts a new shell, in which the attacker can execute further operations. Such shellcode can be represented by the following C pseudocode, which runs a new command via the execve syscall:

execve("/bin/sh", NULL, NULL);

To execute this on an x86-64 CPU, the following actions are needed for a syscall:

  • rax register: must contain the syscall number, which is 59 for execve

  • rdi, rsi, rdx registers: hold the first 3 arguments to the syscall. In this case, rdi holds a pointer to the C string /bin/sh; rsi and rdx must be zero.

  • The syscall instruction must then be executed (legacy 32-bit x86 code typically uses int 80h instead)

A typical shellcode would be a packed assembly sequence that does all of the above directly. In contrast, an ROP attack looks for the following sequences:

# Those can set the value of different registers from values on the stack
pop %rax; ret
pop %rdi; ret
pop %rsi; ret
pop %rdx; ret

# Finally, trigger the syscall
syscall

Each of these small code sequences is conventionally called an ROP gadget. An attacker searches for these gadgets in the target program or in system libraries (such as libc). Once the required gadgets are found, the attacker pieces together a sequence of data, much like the following:

Address   Length  Value
X         8       59
X + 8     8       Address of code sequence pop %rdi; ret
X + 16    8       X + 64
X + 24    8       Address of code sequence pop %rsi; ret
X + 32    8       0
X + 40    8       Address of code sequence pop %rdx; ret
X + 48    8       0
X + 56    8       Address of code sequence syscall
X + 64    8       “/bin/sh”

On x86-64, pop and ret read the value at the stack pointer and then move the stack pointer up by 8 bytes, so the data is laid out in the order it will be consumed, from the lowest address upward.

With the prepared data sequence, the attacker can exploit a vulnerability in the target computer or program, such as a typical buffer overflow. During this process, the attacker performs three key actions:

  • Writes the crafted data sequence onto the stack (by pushing it or by overwriting existing data)

  • Sets the stack pointer (top of the stack) to X

  • Sets the PC register to the address of a code sequence pop %rax; ret in the existing program or libc memory space

Now the attack proceeds step by step as follows:

  1. The CPU runs pop %rax; ret. With the stack pointer pointing to X, the CPU pops 59 from the stack and sets the rax register to 59. It then pops the address of the code sequence pop %rdi; ret from the stack, and sets PC to this value;

  2. The CPU runs pop %rdi; ret. With the stack pointer pointing to X + 16, the CPU pops the value X + 64, which points to the C string /bin/sh, from the stack, and sets the rdi register to X + 64. It then pops the address of the code sequence pop %rsi; ret from the stack, and sets PC to this value;

  3. The CPU runs pop %rsi; ret. With the stack pointer pointing to X + 32, the CPU pops 0 from the stack and sets the rsi register to 0. It then pops the address of the code sequence pop %rdx; ret from the stack, and sets PC to this value;

  4. The CPU runs pop %rdx; ret. With the stack pointer pointing to X + 48, the CPU pops 0 from the stack and sets the rdx register to 0. It then pops the address of the code sequence syscall from the stack, and sets PC to this value;

  5. The CPU runs syscall. At this point, rax holds 59, rdi points to /bin/sh, and both rsi and rdx are zero, so the CPU invokes execve("/bin/sh", NULL, NULL);, granting the attacker a shell for further manipulations.
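The walk-through can be replayed with a small simulation. This is a sketch under stated assumptions: the gadget addresses and the stack address X are hypothetical values chosen for the demo, the stack is modeled as a dictionary of 8-byte slots, and pop and ret behave as on x86-64 (read the slot at the stack pointer, then advance the pointer by 8 bytes toward higher addresses). The sketch lays out its own chain so that it is self-contained.

```python
# Hypothetical gadget addresses; a real attacker finds these in the target binary.
POP_RAX, POP_RDI, POP_RSI, POP_RDX, SYSCALL = 0x1000, 0x2000, 0x3000, 0x4000, 0x5000
GADGETS = {POP_RAX: "rax", POP_RDI: "rdi", POP_RSI: "rsi", POP_RDX: "rdx"}


def run_chain(stack, sp, pc):
    """Follow `pop reg; ret` gadgets until PC reaches the syscall gadget."""
    regs = {"rax": None, "rdi": None, "rsi": None, "rdx": None}
    trace = []
    while pc != SYSCALL:
        reg = GADGETS[pc]
        regs[reg] = stack[sp]   # pop %reg: read the slot at sp ...
        sp += 8                 # ... then move sp up by 8 bytes
        pc = stack[sp]          # ret: pop the next gadget address into PC
        sp += 8
        trace.append(reg)
    return regs, trace


X = 0x7000                      # arbitrary stack address for the demo
stack = {
    X:      59,                 # execve syscall number, destined for rax
    X + 8:  POP_RDI,
    X + 16: X + 64,             # pointer to the string, destined for rdi
    X + 24: POP_RSI,
    X + 32: 0,                  # NULL, destined for rsi
    X + 40: POP_RDX,
    X + 48: 0,                  # NULL, destined for rdx
    X + 56: SYSCALL,
    X + 64: b"/bin/sh\x00",
}
regs, trace = run_chain(stack, sp=X, pc=POP_RAX)
```

When the loop stops at the syscall gadget, the registers hold exactly the arguments of the execve call, even though the attacker supplied nothing but data.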

This sequence of ROP gadgets, referred to as an ROP chain, demonstrates how a complete ROP attack works. Two key takeaways are:

  • ROP does not inject new code. Instead, it injects data into the stack and leverages existing code that is already loaded in memory and marked as executable. W^X protections hence cannot prevent ROP attacks.

  • Attackers can mine ROP gadgets from the libc library. libc is mapped into almost every process and is large, so it offers a rich supply of gadgets, including the syscall instruction itself. Note that libc runs at the same privilege level as the program that uses it: modern computers employ protection rings to encapsulate privileges, and on x86-64 both user programs and libc run at ring level 3, while the kernel runs at ring level 0. Lower ring levels have higher privileges, so an ROP chain built from libc can do whatever the compromised process is allowed to do, whereas a chain built from gadgets inside the kernel runs at ring level 0 and can execute far more damaging operations than normal shellcode.

Note that the above examples show only the most basic ROP gadgets. In reality, ROP gadgets come in all kinds of forms. Since they come from compiler output, they can be combined in the least expected ways, and their forms change as new compiler optimizations come out. Numerous tools (e.g., ropper, ropr) and research papers (e.g., Experiments on ROP Attack with Various Instruction Set Architectures, ROPGMN, Detecting and Preventing ROP Attacks using Machine Learning on ARM, KROP) keep coming out, making it almost impossible to enumerate all possible ROP gadget combinations.
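At its simplest, gadget hunting is a byte scan. The sketch below looks for only one pattern, a single-byte pop (opcodes 58 through 5f) immediately followed by ret (c3), at every byte offset. It is an illustration written for this post: real tools such as ropper or ropr handle far more patterns, and the function name is our own.

```python
# Single-byte x86-64 `pop r64` opcodes and the registers they target.
POP_REGS = {0x58: "rax", 0x59: "rcx", 0x5A: "rdx", 0x5B: "rbx",
            0x5C: "rsp", 0x5D: "rbp", 0x5E: "rsi", 0x5F: "rdi"}
RET = 0xC3


def find_pop_ret_gadgets(code):
    """Return (offset, text) for every `pop reg; ret` byte pair in `code`."""
    found = []
    for i in range(len(code) - 1):
        if code[i] in POP_REGS and code[i + 1] == RET:
            found.append((i, f"pop %{POP_REGS[code[i]]}; ret"))
    return found


# The mov instruction from earlier hides a gadget one byte in.
hidden = find_pop_ret_gadgets(bytes.fromhex("8a5dc3"))
```

Because the scan ignores instruction boundaries, it finds gadgets the compiler never intended to emit, which is why the supply of gadgets is so hard to enumerate.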

ROP on ARM & RISC-V

ROP attacks are not limited to CISC architectures, where instructions vary in length. They also affect RISC designs, such as ARM and RISC-V. Take the following sequence for example:

13 4f 83 23 0b 00

Decoding from the start, the first four bytes represent xori t5,t1,568 following the RISC-V ISA. But if we skip the first two bytes, the latter four represent lw t2,0(s6). With the compressed instruction extension, an instruction may start on any 2-byte boundary, so both readings are reachable. This illustrates that interpreting a byte stream depends on the PC register in a RISC design such as RISC-V as well. As a result, one can find ROP gadgets in a RISC-V program too.
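Both readings can be checked by extracting the I-type instruction fields by hand. The following is a minimal sketch that handles only the two instructions in this example; the function name `decode_itype` is ours.

```python
# RISC-V integer register ABI names, indexed by register number x0..x31.
ABI = ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1",
       "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
       "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10", "s11",
       "t3", "t4", "t5", "t6"]


def decode_itype(code, offset):
    """Decode one 32-bit I-type instruction (xori or lw only) at `offset`."""
    word = int.from_bytes(code[offset:offset + 4], "little")
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    imm = word >> 20
    if imm >= 0x800:            # sign-extend the 12-bit immediate
        imm -= 0x1000
    if opcode == 0x13 and funct3 == 0b100:
        return f"xori {ABI[rd]},{ABI[rs1]},{imm}"
    if opcode == 0x03 and funct3 == 0b010:
        return f"lw {ABI[rd]},{imm}({ABI[rs1]})"
    return "(not handled by this sketch)"


stream = bytes.fromhex("134f83230b00")
at_start = decode_itype(stream, 0)   # bytes 13 4f 83 23
skipped = decode_itype(stream, 2)    # bytes 83 23 0b 00
```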

ROP on CKB-VM

CKB’s RISC-V machine operates in a more restricted environment: for programs running on CKB, there is no execve syscall with which to spawn a shell, and all runtime state is already publicly visible on a public blockchain like CKB. However, ROP attacks can still occur on CKB: one could construct an ROP chain that sets a0 to 0 and a7 to 93, then executes ecall. This causes CKB-VM to immediately return with a success code (0), potentially allowing a script to pass validation when it should have failed, such as a lock script succeeding without a valid signature.

Short Recap

Let’s briefly recap what we’ve learned so far:

  • ROP attacks utilize existing executable code for malicious purposes. W^X cannot prevent ROP.

  • ROP is possible across multiple architectures, including x86-64, ARM, and RISC-V, and therefore on CKB-VM.

  • The landscape of ROP is constantly evolving. With new tools, techniques, and research emerging regularly, it’s impossible to foresee all ROP gadgets.

ROP has been extensively studied over the years, leading to various mitigation strategies, which can be broadly categorized into two main approaches:

  • Software Solutions: Covering techniques like rewriting code sequences and implementing Retguard to prevent the creation of ROP gadgets

  • Hardware Solutions: Introducing additional CPU instructions with Control Flow Integrity (CFI) checks to safeguard control flow.

I’ll explore these strategies in greater detail in the following sections.

Software Solutions to Mitigate ROP

Rewriting Sequence

Certain instruction sequences are often targeted to form ROP gadgets. To prevent ROP, one approach is to alter the compiler, so that such sequences can never be generated. Take the following example:

  89 c3            mov    %eax,%ebx

In x86-64, the c3 byte on its own encodes the ret instruction, so the second byte of this mov can serve as the tail of an ROP gadget. We can rewrite it into the following equivalent sequence:

  48 87 d8         xchgq %rbx, %rax
  89 d8            movl %ebx, %eax
  48 87 d8         xchgq %rbx, %rax

The new sequence lacks the c3 byte, at the expense of more bytes and more executed instructions. Whether this causes noticeable overhead can only be settled by real benchmarks.
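Both claims, that the rewritten sequence has the same effect on the registers and that it contains no c3 byte, can be checked with a short model. This sketch tracks only rax and rbx as 64-bit integers and relies on the x86-64 rule that writing a 32-bit register zero-extends into the full 64-bit register; the function names are ours.

```python
ORIGINAL_BYTES = bytes.fromhex("89c3")               # mov %eax, %ebx
REWRITTEN_BYTES = bytes.fromhex("4887d889d84887d8")  # xchg; mov; xchg


def original(rax, rbx):
    # mov %eax, %ebx: rbx receives the low 32 bits of rax, zero-extended.
    return rax, rax & 0xFFFFFFFF


def rewritten(rax, rbx):
    rax, rbx = rbx, rax        # xchgq %rbx, %rax
    rax = rbx & 0xFFFFFFFF     # movl %ebx, %eax (zero-extends into rax)
    rax, rbx = rbx, rax        # xchgq %rbx, %rax
    return rax, rbx


samples = [(0, 0), (0x1122334455667788, 0xAABBCCDDEEFF0011), (2**64 - 1, 7)]
same_effect = all(original(a, b) == rewritten(a, b) for a, b in samples)
```

The check covers a few sample values rather than proving equivalence in general, but it shows why the three-instruction form is a drop-in replacement.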

Further analysis has revealed that the rbx register in x86-64 is often the source of ROP gadgets, due to the way Intel encodes x86-64 instructions. Hence, the OpenBSD team decided to avoid rbx register wherever possible, reducing the number of potential ROP gadgets.

Again, this approach comes at the cost of having bigger code fragments, more instructions to execute, and an additional patched compiler. While OpenBSD has integrated these changes into its distribution, other environments must weigh the benefits against the costs.

For a deeper dive, I would strongly recommend Todd Mortimer’s work.

Retguard’s Solution: Prologue and Epilogue

Todd Mortimer also introduced Retguard in this work on securing OpenBSD. ROP attacks typically occur when execution enters a function foo, but the stack is manipulated so that the CPU exits into a code fragment it was never meant to reach. What if we could verify, at each function exit, that the function being exited is the same one that was entered, with its return address intact?

Retguard introduces two components to perform this task:

  • Prologue: A prologue is inserted at each function’s entry, taking two inputs:

    • A cookie value: random data assigned to this particular function.

    • The return address: where to jump to when the current function exits.

The prologue computes the XOR of these two and stores the result in the current function’s stack frame, the memory region reserved for the current function’s own data.

  • Epilogue: An epilogue is inserted at each location where a function might exit. It takes two inputs:

    • The saved XOR value that the prologue stored in the frame

    • The return address it can now access (most likely popped from the stack on an x64 machine, or read from the special RA register in a RISC design)

The epilogue computes the XOR of these two. If the result matches the original cookie, execution proceeds. Otherwise, the epilogue halts the program, signaling an error.
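The check can be sketched in a few lines. This follows the description above rather than OpenBSD’s actual compiler output; the names `prologue`, `epilogue`, and `RetguardViolation` are illustrative, and a single module-level cookie stands in for the per-function cookie.

```python
import secrets


class RetguardViolation(Exception):
    """Raised when the return address no longer matches the saved value."""


COOKIE = secrets.randbits(64)   # random value assigned to this one function


def prologue(return_address):
    """Value the function stores in its frame on entry."""
    return COOKIE ^ return_address


def epilogue(saved, return_address):
    """Check performed just before the function returns."""
    if (saved ^ return_address) != COOKIE:
        raise RetguardViolation("return address changed during the call")
    return return_address


saved = prologue(0x401000)
intact = epilogue(saved, 0x401000)   # untouched return address passes
```

An attacker who overwrites the return address without knowing the cookie cannot produce a matching saved value, so the epilogue stops the program instead of returning into a gadget.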

This prologue-epilogue mechanism in Retguard guards the call stack from tampering. At a noticeable but acceptable cost (both in performance and code size), Retguard eliminates a significant number of ROP gadgets from the OpenBSD kernel. Like other software-based mitigations, it requires a patched compiler, and it is up to each environment to decide whether to employ the technique.

Hardware Advancements to Mitigate ROP

In addition to software solutions, hardware-based defenses have also been developed. For instance, Intel has introduced the Indirect Branch Tracking feature starting with its 12th generation Core processors, using a new instruction, endbr32 or endbr64, placed at every location the program might indirectly jump to or call into. When the CPU executes an indirect jump or call, it checks that the instruction at the target location is endbr32 / endbr64 before continuing execution from there. Otherwise, the CPU raises a fault and the program is terminated. This keeps indirect control flow on its intended paths, preventing attackers from redirecting execution to arbitrary locations such as the middle of an instruction.
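The check performed on an indirect branch can be modeled as follows. This is a toy model, not Intel’s specification: the exception and function names are ours, and only the endbr64 encoding f3 0f 1e fa is taken from real machine code.

```python
ENDBR64 = bytes.fromhex("f30f1efa")


class ControlProtectionFault(Exception):
    """Raised when an indirect branch lands somewhere without a landing pad."""


def indirect_branch(code, target):
    """Allow an indirect jump or call only if `target` starts with endbr64."""
    if code[target:target + 4] != ENDBR64:
        raise ControlProtectionFault(f"no landing pad at offset {target:#x}")
    return target


# A function that starts with endbr64, followed by an unintended pop/ret pair.
code = ENDBR64 + bytes.fromhex("4883ec08") + bytes.fromhex("5dc3")
entry = indirect_branch(code, 0)     # legitimate function entry
```

Branching to offset 0 succeeds because a landing pad is there, while an attempt to branch to the pop/ret bytes at offset 8 raises the fault.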

Modern OSes have already extensively leveraged endbr32 / endbr64 instructions. Ubuntu 24.04, for instance, has included these instructions in its packages:

$ objdump -d /bin/bash | head -n 50

/bin/bash:     file format elf64-x86-64

Disassembly of section .init:

0000000000030000 <.init>:
   30000:       f3 0f 1e fa             endbr64
   30004:       48 83 ec 08             sub    $0x8,%rsp
   30008:       48 8b 05 d9 7e 12 00    mov    0x127ed9(%rip),%rax        # 157ee8 <__gmon_start__@Base>
   3000f:       48 85 c0                test   %rax,%rax
   30012:       74 02                   je     30016 <unlink@plt-0xe1a>
   30014:       ff d0                   call   *%rax
   30016:       48 83 c4 08             add    $0x8,%rsp
   3001a:       c3                      ret

Disassembly of section .plt:

0000000000030020 <.plt>:
   30020:       ff 35 a2 76 12 00       push   0x1276a2(%rip)        # 1576c8 <o_options@@Base+0x1cc8>
   30026:       ff 25 a4 76 12 00       jmp    *0x1276a4(%rip)        # 1576d0 <o_options@@Base+0x1cd0>
   3002c:       0f 1f 40 00             nopl   0x0(%rax)
   30030:       f3 0f 1e fa             endbr64
   30034:       68 00 00 00 00          push   $0x0
   30039:       e9 e2 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3003e:       66 90                   xchg   %ax,%ax
   30040:       f3 0f 1e fa             endbr64
   30044:       68 01 00 00 00          push   $0x1
   30049:       e9 d2 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3004e:       66 90                   xchg   %ax,%ax
   30050:       f3 0f 1e fa             endbr64
   30054:       68 02 00 00 00          push   $0x2
   30059:       e9 c2 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3005e:       66 90                   xchg   %ax,%ax
   30060:       f3 0f 1e fa             endbr64
   30064:       68 03 00 00 00          push   $0x3
   30069:       e9 b2 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3006e:       66 90                   xchg   %ax,%ax
   30070:       f3 0f 1e fa             endbr64
   30074:       68 04 00 00 00          push   $0x4
   30079:       e9 a2 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3007e:       66 90                   xchg   %ax,%ax
   30080:       f3 0f 1e fa             endbr64
   30084:       68 05 00 00 00          push   $0x5
   30089:       e9 92 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3008e:       66 90                   xchg   %ax,%ax
   30090:       f3 0f 1e fa             endbr64
   30094:       68 06 00 00 00          push   $0x6
   30099:       e9 82 ff ff ff          jmp    30020 <unlink@plt-0xe10>
   3009e:       66 90                   xchg   %ax,%ax

The endbr32 / endbr64 instructions have been carefully designed so that they are nop instructions, meaning they do nothing at all, on CPUs that predate their introduction. Having them has no effect on older CPUs but enhances security on supported hardware.

RISC-V’s Latest Achievements: CFI Extension

The above mitigations against ROP fall into two categories:

  • Compiler modifications: Generate more secure binary code.

  • Additional CPU instructions: Come with Control Flow Integrity (CFI) checks to prevent exploitation.

Back when we first designed CKB-VM, we thoroughly studied ROP and recognized that a vulnerability in a CKB script could potentially open the door to ROP attacks. However, we eventually did not introduce any specific mitigation against ROP in CKB-VM. Our decision was to stay aligned with the RISC-V ecosystem and avoid shipping a custom RISC-V spec with additional instructions that would require a patched compiler. Nor did we want to maintain our own compiler toolchain, which would rule out the possibility that any RISC-V-compliant compiler can produce CKB scripts. As a result, we shipped the first version of CKB-VM without ROP mitigations, but that does not mean we’ve ignored this issue:

  • We’ve reached out to the RISC-V community about a possible extension similar to Intel’s solution, and have kept monitoring advancements in this field;

  • We’ve kept a close watch on progress in writing secure CKB scripts. Since ROP relies on existing vulnerabilities, secure CKB scripts can keep ROP purely theoretical.

We were thrilled when the RISC-V CFI (Control-Flow Integrity) Extension was officially ratified in July 2024. Designed by the brilliant minds from the RISC-V Foundation, this extension directly addresses ROP attacks with two key features:

  • Zicfilp extension introduces landing pad: Resembles Intel’s endbr32 / endbr64 to ensure that the CPU can only jump to valid, permitted targets.

  • Zicfiss extension introduces shadow stack with a series of instructions: Offers a hardware solution with the same goal as Retguard. The CPU keeps a protected copy of each return address on a separate shadow stack and checks it when a function returns, so tampering with the call stack is detected during execution.
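The shadow stack idea can be illustrated with a toy model. This is a simplification written for this post, not the Zicfiss instruction semantics: the class and method names are ours, and call and return are modeled as single operations that update both stacks.

```python
class ShadowStackFault(Exception):
    """Raised when a return address disagrees with its protected copy."""


class ToyCpu:
    def __init__(self):
        self.stack = []          # ordinary stack: a buggy program can overwrite it
        self.shadow_stack = []   # shadow stack: only call and ret touch it

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        address = self.stack.pop()
        if address != self.shadow_stack.pop():
            raise ShadowStackFault("return address does not match shadow stack")
        return address


cpu = ToyCpu()
cpu.call(0x100)
normal_return = cpu.ret()        # both copies agree, so the return proceeds
```

If a buffer overflow replaces the return address on the ordinary stack, the copy on the shadow stack no longer matches and the return is refused, which is exactly the step an ROP chain depends on.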

Together, these features offer state-of-the-art mitigations against ROP. More importantly, RISC-V CFI is now an official extension, so future RISC-V CPUs, compilers, and tools can support it without custom patches. In fact, LLVM 19 already supports it, and I believe other compilers and tools will follow soon.

Once the extension is fully adopted, CKB script developers can simply switch these protections on during compilation. Without modifying the code, they can enjoy the security provided by RISC-V CFI extensions. Even if a vulnerability exists in a CKB script, these built-in enforcements can prevent it from being exploited.

Final Words

Security is complex. While we strive for maximum security, certain design principles might get in the way of introducing specific mitigations. ROP is a prime example: while we learned much about it early on, implementing the best mitigations required the right timing. Now the time has come. We are happy to introduce RISC-V CFI in CKB’s next hardfork, bringing stronger security for everyone.


✍🏻 Written by Xuejie Xiao

Find more on his personal website, Less is More.
