Understanding Buffer Overflows: A Beginner's Guide - Part 1
This is the first article in a two-part series. In it, we’ll cover the basic prerequisites for understanding buffer overflows.
Every program we run executes within its own block(s) of memory. This memory is called the program’s address space.
As the image above shows, this memory is divided into several segments, such as the text (code), data, BSS, heap, and stack segments. For the purposes of this series, we only need to focus on the stack segment.
Stack
The stack is the region of memory that programs use for function calls: passing arguments to other functions, saving the previous stack frame’s registers, holding local variables, and storing return addresses.
The stack works like the stack data structure, with two basic operations: push and pop. The stack pointer (ESP) always points to the top of the stack. Unlike a textbook stack, however, values below the top can also be read and written directly, using offsets from ESP or EBP.
The stack grows and shrinks dynamically as the program runs. Each time a function is called, a new stack frame is pushed onto the stack, and when the function returns, its frame is popped off. For the uninitiated, a stack frame is simply the portion of the stack reserved for the function it was created for. The base and top of the current frame are marked by the base pointer (EBP) and the stack pointer (ESP), respectively.
On most computer systems, including x86, the stack grows downward in memory, i.e. from higher addresses toward lower addresses.
Registers
Registers are small, fast storage locations within a computer's central processing unit (CPU) that hold data temporarily during computation. They are essential for the CPU's operations, as they store values that the processor can access quickly, without needing to fetch them from the memory (RAM).
We will discuss x86 registers here, but the basic functionality is the same on other architectures. On x86, the general-purpose registers are 32 bits wide, meaning each can hold 32 bits of data at once.
An architecture is essentially a name for a family of CPUs that implement the same instruction set. You may have heard that your computer has a 64-bit (x86-64) or a 32-bit (x86) architecture.
Important x86 Registers:
General-Purpose Registers (GPRs): These are the core registers used for various data operations. In x86, the common general-purpose registers are:
EAX (Accumulator): Used for arithmetic and logic operations.
EBX (Base): Often used as a pointer to data in memory.
ECX (Counter): Primarily used for counting iterations in loops.
EDX (Data): Used for I/O operations and extended precision arithmetic.
Instruction Pointer (EIP): This 32-bit register holds the address of the next instruction to be executed.
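Stack Pointer (ESP): Keeps track of the top of the stack; used during function calls and for local variable storage.
Base Pointer (EBP): Marks the base of the current stack frame, giving a fixed reference point for locating arguments and local variables.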
Together, these registers hold the data the CPU computes on and control the flow of program execution.
If you come from a programming background, you can think of registers as variables that the CPU uses to store values and then compute with them.
Calling Convention
Now that we know which structures are used in memory, let’s see how this memory is used when we call a function. The set of rules governing this procedure is called a calling convention.
In reality, many calling conventions exist. We will look at the one traditionally used by C compilers on 32-bit x86, known as cdecl.
Since we are discussing x86, the stack is organized in 32-bit (4-byte) slots. Now let’s see what happens when our code calls a function.
First, the function’s arguments are pushed onto the stack in reverse order. Then the call instruction pushes the address of the instruction immediately following it. This is the return address: when the called function ends, execution continues from this address. This part is called the caller convention, as it is done by the function that makes the call (the caller), for the new function whose stack frame we are building.
Next, we enter the code of the new function. First, the current base pointer is pushed onto the stack, so that the caller’s stack frame can be restored properly once the function ends. Then the base pointer is set to the current value of the stack pointer, and the stack pointer is moved down to make room for local variables. The region between the base pointer and the stack pointer now forms the new function’s stack frame. This procedure is called the callee convention, as it is done inside the new function (the callee).
As you can see, the stack is just a bunch of values stacked on top of each other. The stack itself doesn’t know which value is the saved base pointer and which is the return address. That is purely a matter of convention, and that’s where the problem starts.
What if a local variable receives more data than the space reserved for it? Local buffers are filled from lower to higher addresses, i.e. toward the saved base pointer and return address, so the extra bytes would overwrite the saved base pointer, the return address, the parameters, and so on! This is what attackers exploit in a vulnerability called a buffer overflow.
Buffer Overflows
A stack-based buffer overflow is a vulnerability in which an attacker overwrites parts of the stack to take control of the program’s execution flow. It happens when user input is copied into a buffer without checking its length, for example with gets, or with fgets called with a size larger than the buffer. These functions then let a user write more bytes onto the stack than the buffer can hold, changing values such as local variables, the saved base pointer, the return address, and more.
As mentioned earlier, the stack doesn’t know which value is which. The program simply assumes the user will provide input that follows its rules. Vulnerable functions like gets do not check the length of the input at all, so a user can overwrite the return address and redirect execution wherever they want!
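The simplest form of this attack, where the return address is overwritten to jump to an existing function the program never meant to call, is known as ret2win.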
Here is some sample code:
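```c
#include <stdio.h>

// Use the following command for compiling this code:
// gcc -m32 -no-pie -fno-stack-protector -o vuln vuln.c
// -fno-stack-protector disables stack canaries, and -no-pie keeps the code at a fixed address.
// (-m32 needs 32-bit development libraries installed. gets() was removed from the
//  C standard in C11, so newer toolchains may warn about it or refuse to compile it.)

// Never called by the program's normal control flow.
void win()
{
    puts("How did you reach here!");
}

void vuln()
{
    char buffer[16];
    gets(buffer); // no length check: reads until a newline or end of input, however long
}

int main()
{
    puts("What are you doing here!");
    vuln();
}
```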
What do you think? Is there a way to execute the win function?
Under the program’s normal logic, no: nothing ever calls win. But if we look closely, we can use the vulnerability discussed above.
If you feed the following payload into the program using Python, you’ll be amazed to see “How did you reach here!” printed on the screen. (The exact bytes depend on how the binary was built, so they may differ on your machine.)
```shell
python2 -c 'print "a" * 28 + "\x86\x91\x04\x08"' | ./vuln
```
Now it’s your task to use what you’ve learned to explain why this works. You may need to search the internet a bit, but that’s the fun part!
The fastest correct responder will receive a shout-out in the COPS Community 🔥
Form to answer - https://forms.gle/wxd3YgRYPeAuLY1G6
We’ll be back with the second part of this blog, explaining the answer and how to craft such exploits for any given code. Until then, keep learning!
Written by UjjawalK