Buffer Overflows


In light of the recent WannaCry ransomware outbreak, I dug up an old article of mine from 2004. The principles and logic are still valid today. Since it has been a while, I will start right away with a code snippet written in C:

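#include <stdio.h>

void function(void) {
        char buffer[128];
        int *ret;
        /* point ret 140 bytes past the start of buffer, 12 bytes beyond its end */
        ret = (int *)(buffer + 140);
        printf("@ret: %#x\n", (unsigned int)*ret);
        /* add 7 to the saved return address */
        (*ret) = (*ret) + 7;
}

int main(void) {
        int x = 0;
        function();
        x = 5;
        printf("x: %d\n", x);
        return 0;
}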


Nice. Simple code snippet. Let's compile it, run it and check the output.

# compile
gcc -o buff buff.c

# run
./buff

# output
@ret: 0x804838b 
x: 0

Oops! Why is the value of x still 0 if we set it to 5?


The Mission

We will try to show that the execution of a computer program can sometimes be altered by overflowing a character array, also called a string buffer. An overflow is possible when the code does not perform bounds checking on user input. Two classic examples of functions that do no such checking are strcpy() and gets().
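The text names strcpy() and gets() as functions that do no bounds checking. For contrast, here is a minimal sketch of a bounds-checked string copy in C. copy_checked is a hypothetical helper written for illustration and is not a standard library function. When the source string and its terminating NUL do not fit, it refuses to write anything. strcpy() in the same situation would keep writing past the end of the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the NUL-terminated string src into dst, whose
 * capacity is dst_size bytes, but only if the whole string fits.
 * Returns 0 on success, -1 (leaving dst untouched) if it would overflow. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size)          /* need room for len chars plus '\0' */
        return -1;
    memcpy(dst, src, len + 1);    /* len + 1 also copies the '\0' */
    return 0;
}
```

Input needs the same care. The bounded standard replacement for gets() is fgets(buf, sizeof buf, stdin), which never writes more than sizeof buf bytes. gets() itself was removed from the language in C11.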

Ultimately, we are trying to alter the execution by overwriting the exact memory location of the function's return address on the stack. This gives us partial control over the program's execution. In a buffer overflow exploit, the attacker uses this technique to redirect execution to previously planted shellcode, potentially gaining higher privileges.

The code above overflows the buffer and increments the saved return address so that execution continues after the assignment x = 5. This is why x still equals 0.

The Procedure

The code was written and tested on x86 running Red Hat Linux 9 as a proof of concept. We create a pointer that starts at the character array and advance it by 140 bytes, which is 12 bytes past the end of the 128-byte array. It now points directly at the return address, the place where execution will continue after we leave function(). The statement (*ret) = (*ret) + 7 adds 7 to that return address. When function() returns, execution therefore skips 7 bytes of instructions in main() and continues there. The tricky part is knowing two things: where the return address resides in memory (this depends on the architecture, OS and compiler), and where execution can safely continue. If we chose these numbers at random, we would almost certainly crash the program and get nothing but a Segmentation Fault message. To obtain the correct values, we need to disassemble function(). Let us use good old gdb (GNU Debugger):

Dump of assembler code for function function:
0x08048328 <function+0>:        push   %ebp
0x08048329 <function+1>:        mov    %esp,%ebp
0x0804832b <function+3>:        sub    $0x98,%esp   
0x08048331 <function+9>:        lea    0xffffff78(%ebp),%eax
0x08048337 <function+15>:       add    $0x8c,%eax
0x0804833c <function+20>:       mov    %eax,0xffffff74(%ebp)
0x08048342 <function+26>:       sub    $0x8,%esp
0x08048345 <function+29>:       mov    0xffffff74(%ebp),%eax
0x0804834b <function+35>:       pushl  (%eax)
0x0804834d <function+37>:       push   $0x8048454
0x08048352 <function+42>:       call   0x8048268 <printf>
0x08048357 <function+47>:       add    $0x10,%esp
0x0804835a <function+50>:       mov    0xffffff74(%ebp),%edx
0x08048360 <function+56>:       mov    0xffffff74(%ebp),%eax
0x08048366 <function+62>:       mov    (%eax),%eax
0x08048368 <function+64>:       add    $0x7,%eax
0x0804836b <function+67>:       mov    %eax,(%edx)
0x0804836d <function+69>:       leave
0x0804836e <function+70>:       ret
End of assembler dump.


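When function() is entered, the CALL instruction in main() has just pushed the return address (the saved EIP) onto the stack. On x86 the stack grows "down", toward lower addresses. The prologue then pushes EBP (4 bytes) and copies ESP into EBP. From this point on, EBP points at the saved EBP and the return address sits just above it, at EBP + 4. The SUB at <function+3> reserves 0x98 (152 decimal) bytes for local variables. The LEA at <function+9> shows where buffer starts: 0xffffff78(%ebp). That displacement is the 32-bit two's-complement form of -0x88, so buffer starts 136 bytes below EBP. This gives us the following distance from the start of buffer to the return address:

  • buffer start to saved EBP: 0x88 = 136 bytes (the 128-byte array plus 8 bytes of compiler padding)
  • saved EBP: 4 bytes

136 bytes + 4 bytes = 140 bytes. This matches the 0x8c (140 decimal) that the compiler adds at <function+15> when it evaluates buffer + 140.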
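The 140-byte offset can be cross-checked in C, working directly from the displacement 0xffffff78 used by the LEA at <function+9>. The sketch assumes the classic 32-bit x86 frame with a frame pointer: saved EBP at EBP and return address at EBP + 4. The names signed_disp32, ret_offset and struct frame32 are hypothetical. The 8-byte pad in the struct is inferred from 136 - 128 and is not declared anywhere in the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* gdb prints frame displacements as raw 32-bit values (e.g. 0xffffff78).
 * Convert one to the signed two's-complement offset the CPU uses. */
int64_t signed_disp32(uint32_t raw)
{
    if (raw & 0x80000000u)
        return -(int64_t)(0x100000000ULL - raw);
    return (int64_t)raw;
}

/* Distance from the start of a local buffer at EBP + disp (disp < 0)
 * to the saved return address at EBP + 4, skipping the 4-byte saved EBP. */
int64_t ret_offset(uint32_t raw_buffer_disp)
{
    int64_t disp = signed_disp32(raw_buffer_disp);
    assert(disp < 0);              /* locals live below EBP */
    return -disp + 4;
}

/* Model of function()'s frame, lower to higher addresses, as reconstructed
 * from the disassembly. The pad size is inferred (136 - 128), not declared. */
struct frame32 {
    char     buffer[128];  /* EBP - 136 ... EBP - 9 */
    char     pad[8];       /* EBP - 8   ... EBP - 1 */
    uint32_t saved_ebp;    /* EBP + 0 */
    uint32_t ret;          /* EBP + 4: saved EIP */
};
```

Here ret_offset(0xffffff78) and offsetof(struct frame32, ret) both come out to 140. That is the same number the C code adds to buffer.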

OK, now we know where the return address is. To see how it can be changed without crashing the program, we need to check the disassembly of main():

Dump of assembler code for function main:
0x0804836f <main+0>:    push   %ebp
0x08048370 <main+1>:    mov    %esp,%ebp
0x08048372 <main+3>:    sub    $0x8,%esp
0x08048375 <main+6>:    and    $0xfffffff0,%esp
0x08048378 <main+9>:    mov    $0x0,%eax
0x0804837d <main+14>:   sub    %eax,%esp
0x0804837f <main+16>:   movl   $0x0,0xfffffffc(%ebp)
0x08048386 <main+23>:   call   0x8048328 <function>
0x0804838b <main+28>:   movl   $0x5,0xfffffffc(%ebp)
0x08048392 <main+35>:   sub    $0x8,%esp
0x08048395 <main+38>:   pushl  0xfffffffc(%ebp)
0x08048398 <main+41>:   push   $0x804845e
0x0804839d <main+46>:   call   0x8048268 <printf>
0x080483a2 <main+51>:   add    $0x10,%esp
0x080483a5 <main+54>:   leave
0x080483a6 <main+55>:   ret
0x080483a7 <main+56>:   nop
End of assembler dump.

Normally, after the function call we would return to <main+28>, address 0x0804838b. This is exactly the value the program printed as @ret. The instruction at that address assigns the value 5 to x. To jump over it, we need to continue directly at 0x08048392 (<main+35>). The difference between these two addresses is exactly 7 bytes, so we add exactly this value to the saved EIP. This way we effectively skip the assignment, and x keeps the value 0 it was initialized with.
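The 7-byte figure also follows from how the skipped instruction is encoded. The sketch below spells out movl $0x5,0xfffffffc(%ebp) as encoded on 32-bit x86:

- opcode C7 (MOV r/m32, imm32)
- ModRM byte 45, which selects [EBP + disp8]
- displacement FC (-4)
- the 4-byte immediate 5, in little-endian order

These bytes come from the standard x86 encoding rules and were not read from the author's binary. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses taken from the main() disassembly. */
#define RET_ADDR  0x0804838bu   /* <main+28>: where "call function" returns */
#define NEXT_INSN 0x08048392u   /* <main+35>: instruction after "x = 5"     */

/* movl $0x5,-0x4(%ebp) encoded for 32-bit x86:
 *   C7          MOV r/m32, imm32
 *   45          ModRM: [EBP + disp8]
 *   FC          disp8 = -4 (where x lives)
 *   05 00 00 00 imm32 = 5, little-endian */
const uint8_t skipped_insn[] = { 0xC7, 0x45, 0xFC, 0x05, 0x00, 0x00, 0x00 };

/* Compile-time check: the gap between the two addresses is one instruction. */
static_assert(NEXT_INSN - RET_ADDR == sizeof skipped_insn,
              "the skipped instruction fills the 7-byte gap");

/* Mirrors what (*ret) = (*ret) + 7 does to the saved return address. */
uint32_t patched_return(uint32_t saved_ret, uint32_t skip)
{
    return saved_ret + skip;
}
```

The static_assert (C11, from <assert.h>) makes the compiler itself confirm that the address gap equals the instruction length.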


Be careful when handling buffers. Always check the buffer size to avoid memory corruption and potential vulnerabilities.

Higher-level and scripting languages make this somewhat easier, as most of them perform bounds checking automatically. Modern compilers and operating systems also mitigate such attacks. Examples include stack canaries (such as GCC's -fstack-protector), non-executable stacks (NX/DEP) and address space layout randomization (ASLR).