How a program works
- CODE SEGMENT
- DATA SEGMENT
- STACK SEGMENT
- What is behind a function-call
- The Base pointer (EBP)
- ANALYSIS:
A very early step for understanding Buffer Overflow
This is not a buffer overflow exploit. It is the background you need in order to understand how the CPU and memory "collaborate" to execute a program. I have read many articles about 'buffer overflow'. Most of them start from a specific point and skip the basic knowledge one must have to deeply understand what is going on behind the scenes. I wrote this article to (I hope) fill this gap.
If at the end of this article you feel more comfortable with concepts like CALL, RETN and how a function is executed using memory (buffer, stack, etc.), then I will feel that I have succeeded… so, help me feel like a successful and nice person :))
First, I would like to point out that everything we say is about the x86 processor family in 32-bit mode. In addition, most memory addresses are expressed in decimal notation (for the sake of clarity for beginners) instead of the hexadecimal notation actually used in real-world systems.
Requirements in order to read this article:
- A basic understanding of assembly language.
- A basic understanding of C language.
- A basic understanding of a Personal Computer.
- A basic understanding of English (I hope…).
- None of the above,… just an open mind, imagination and… frame. Well,… ok,.. I believe items 4 and 5 are the most crucial, even though they contradict each other!
I hear you say…: "Come on lamer, you said too much!! Let’s start…"
Ok then…. Every process is laid out in memory in three basic segments:
- Code Segment
- Data Segment (initialized data, plus the well-known BSS for uninitialized data)
- Stack Segment
CODE SEGMENT
All the instructions of our program "live" in this memory segment. Nobody… (nobody? well ok, almost nobody) can write to this memory segment, i.e. it is a read-only segment.
For example, all the machine instructions generated from the following C code are located in the code segment:
/* Set the main diagonal items to 1, all others to 0 */
for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
        if (i != j)
            a[i][j] = 0;
        else
            a[i][j] = 1;
PS: The comments /* … */ are not included in the code segment: the compiler does not produce any code for comments.
DATA SEGMENT
All global variables, initialized or uninitialized, are stored in this writable (not read-only) segment. For example:
int i; int j = 0; int a[100][100];
STACK SEGMENT
All local (function) variables, function arguments and return addresses are stored in this writable memory. This segment is organized as a stack data structure (for those who have attended a basic information technology course). This means that we put values onto a stack in memory: the last value put (pushed) is on top of the stack, i.e. it is the first one available. This is the well-known LIFO (Last In, First Out) data structure.
The processor register ESP (Extended Stack Pointer) holds the address of the element currently on top of the stack, i.e. the last value pushed.
We can put (PUSH) values onto the stack and get (POP) values from it. There are two important "secrets" here: [1] PUSH and POP instructions work in 4-byte units because of the 32-bit architecture of the x86 processor family. [2] The stack grows downward: if ESP=256, just after a PUSH EAX instruction (with EAX = 34), ESP will become 252 and the value of EAX (34) will be placed at address 252.
For example:
STACK
adrs   memory
256  | xy |
252  |    |
248  |    |
244  |    |
…      …
(ESP=256)
Instruction > PUSH EAX ; remark: suppose EAX = 34
STACK
256  | xy |
252  | 34 |
248  |    |
244  |    |
…      …
(ESP=252)
Instruction > POP EAX ; remark: get the value from the top of the stack into the EAX register
STACK
256  | xy |
252  | 34 |   <- still in memory, but no longer on the stack
248  |    |
244  |    |
…      …
(ESP=256)
Instruction > PUSH EAX ; remark: suppose EAX = 15
Instruction > PUSH EBX ; remark: suppose EBX = 16
STACK
256  | xy |
252  | 15 |
248  | 16 |
244  |    |
…      …
(ESP=248)
What is behind a function-call
Before we explain what is behind a call, we must say a few words about EIP (Extended Instruction Pointer, or simply 'Instruction Pointer'). This register holds the address, in the code segment, of the next instruction to be executed by the CPU.
Every time the CPU executes an instruction, it stores into EIP the address of the instruction that follows the one currently executing. But how does the CPU find the address of the next instruction? Well… we have two cases here…
- The next instruction is located immediately after the one currently executing.
- There is a jump (a JMP, or a CALL to a function), so the instruction that needs to be executed next is at an address that does not follow the current one.
In case 1, the address is calculated by simply adding the length of the currently executing instruction to the current EIP value. Example: suppose we have the following two instructions at addresses 100 and 101:
100  push EDX
101  mov ESP, 0
Suppose that at the starting point of our little program we have EIP = 100.
- The CPU executes the instruction at address 100.
- The CPU checks the instruction: is it a jump? No, so it calculates its size. The push EDX instruction is 1 byte long.
- So the new value of EIP = EIP + size(push EDX) => EIP = 100 + 1 => EIP = 101.
- The CPU executes the instruction at address 101, and so forth…
In case 2, we have a jump, and things are a bit different. Just before we jump to another address (i.e. call a function), the address of the next instruction must be saved somewhere, and just before returning from the function that saved address must be written back into EIP. One could imagine a temporary register (say EDX) for this, but on x86 the return address is saved on the stack.
The CALL and RETN assembly instructions do exactly this. CALL does two things:
- It "remembers" the instruction that will be executed after the function returns, by pushing its address onto the stack, and
- It writes into EIP the address of the called function, i.e. it performs the function call.
The RETN instruction is executed at the end of the function: it pops (gets) the "return address" that CALL pushed onto the stack and loads it into EIP, so execution continues right after the call.
The Base pointer (EBP)
Each function in any program (even the main() function in C) has its own stack frame. A stack frame is a logical group of consecutive stack locations that holds the arguments, the return address and the local variables of a function that is currently executing. Data in a stack frame is addressed relative to a reference point, and this reference point is EBP, the Extended Base Pointer. On entry, a function PUSHes the caller's EBP onto the stack and then copies ESP into EBP. While ESP keeps moving as values are pushed and popped, EBP stays fixed, so the callee can reference its arguments and local variables at fixed offsets from EBP. I hope the use of the base pointer will become clearer in the following example.
A REAL EXAMPLE C PROGRAM:
Consider the following C program:
void function1(int, int, int);

void main()   /* non-standard; standard C uses int main(void) */
{
    function1(1, 2, 3);
}

void function1(int a, int b, int c)
{
    char z[4];
}
I compiled/linked the above program and used the OllyDbg debugger to check the generated assembly code. Skipping the operating system startup code (about 90% of the assembly code), the rest is the code that corresponds to our little program:
0040123C  /.  55           PUSH EBP
0040123D  |.  8BEC         MOV EBP,ESP
0040123F  |.  6A 03        PUSH 3               ; /Arg3 = 00000003
00401241  |.  6A 02        PUSH 2               ; |Arg2 = 00000002
00401243  |.  6A 01        PUSH 1               ; |Arg1 = 00000001
00401245  |.  E8 05000000  CALL bo1.0040124F    ; \bo1.0040124F
0040124A  |.  83C4 0C      ADD ESP,0C
0040124D  |.  5D           POP EBP
0040124E  \.  C3           RETN
0040124F  /$  55           PUSH EBP
00401250  |.  8BEC         MOV EBP,ESP
00401252  |.  51           PUSH ECX
00401253  |.  59           POP ECX
00401254  |.  5D           POP EBP
00401255  \.  C3           RETN
ANALYSIS:
The addresses from 0040123C to 0040124E belong to the main() function. The addresses from 0040124F to 00401255 belong to function1().
0040123C /. 55 PUSH EBP
Backs up the caller's base pointer (EBP) by pushing it onto the stack.
0040123D |. 8BEC MOV EBP,ESP
Copies the current stack pointer (ESP) into the EBP register. From then on, the function references its local variables and arguments relative to EBP. These two instructions are called the "Procedure Prologue".
The stack now holds the saved EBP value:
STACK
256 | [ebp] |
…     …
(ESP=256)
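0040123F |. 6A 03 PUSH 3 ; /Arg3 = 00000003
00401241 |. 6A 02 PUSH 2 ; |Arg2 = 00000002
00401243 |. 6A 01 PUSH 1 ; |Arg1 = 00000001
Here we push the arguments onto the stack, from the last to the first (right to left), as the C calling convention requires. This way the first argument ends up at the lowest address, closest to the top of the stack.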
The stack is:
STACK
256 | [ebp] |
252 | 3     |
248 | 2     |
244 | 1     |
…     …
(ESP=244)
00401245 |. E8 05000000 CALL bo1.0040124F ; \bo1.0040124F
Calls the function at address 0040124F (bo1 is the name of my executable). CALL pushes the address of the next instruction, 0040124A, onto the stack. The stack becomes:
STACK
256 | [ebp]    |
252 | 3        |
248 | 2        |
244 | 1        |
240 | 0040124A |  <- the return address when function1 ends
…     …
(ESP=240)
Let’s follow the execution and go to address 0040124F (function1):
0040124F /$ 55 PUSH EBP
00401250 |. 8BEC MOV EBP,ESP
Hmm… this is the "Procedure Prologue" again (remember, it is executed at the start of every function). It sets up function1's own stack frame. The EBP register currently points at a location in main's stack frame, and this value must be preserved. So EBP is pushed onto the stack, and then the contents of ESP are copied into EBP. This allows the arguments to be referenced as offsets from EBP and frees up the stack pointer ESP to do other things.
The stack now is:
STACK
256 | [ebp]        |
252 | 3            |
248 | 2            |
244 | 1            |
240 | 0040124A     |  <- the return address when function1 ends
236 | <main’s EBP> |  <- both ESP and EBP now point to this address
…     …
(ESP=236)
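00401252 |. 51 PUSH ECX
00401253 |. 59 POP ECX
00401254 |. 5D POP EBP
PUSH ECX is how this compiler reserves 4 bytes on the stack for the local array z[4] (ESP=232); the value of ECX itself does not matter. POP ECX releases those 4 bytes again (ESP=236), and POP EBP restores main's EBP (ESP=240). After these pops the stack is:
STACK
256 | [ebp]    |
252 | 3        |
248 | 2        |
244 | 1        |
240 | 0040124A |  <- ESP points here: the return address
…     …
(ESP=240)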
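00401255 \. C3 RETN
The function ends: RETN pops the return address 0040124A into EIP, so execution continues at 0040124A (remember our definition of the RETN instruction). ESP is now 244, with the three arguments still on the stack.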
0040124A |. 83C4 0C ADD ESP,0C
After the function has returned, we add 12 (0C in hex) to the stack pointer, because we pushed 3 arguments onto the stack, each taking 4 bytes (integers). By increasing ESP we actually shrink the stack (remember that the stack is filled downwards, from high to low memory addresses), i.e. ESP = 244 + 12 = 256. In this C calling convention, it is the caller that removes the arguments.
STACK
256 | [ebp] |
…     …
(ESP=256)
Thus, ESP is back to the value it had at the beginning of main(), right after its prologue and before the function call.
I hope that you now have a basic understanding of the use of the stack and the stack pointer. In another article I will describe how nasty things can happen here. Hint: how about overwriting a stack item (such as the return address at address 240 in our example above), or how about overwriting the value of the Instruction Pointer (EIP)…
Please be impolite and as rude as possible because this is not my 1st article. In addition I don’t give a shit about it…
;-) To be serious… I suggest you try my program, or better, create your own and test, check, review, test, check, review, test, check, review!!
Happy Programming Guys!!
References:
[1] BUFFER OVERFLOWS DEMYSTIFIED by murat@enderunix.org
[2] C Function Call Conventions and the Stack (UMBC CMSC 313, Computer Organization & Assembly Language, Spring 2002, Section 0101)
[3] The Assembly Language Book for IBM PC by Peter Norton (ISBN 960-209-028-6)
[4] Analysis of Buffer Overflow Attacks, http://www.windowsecurity.com/articles/Analysis_of_Buffer_Overflow_Attacks.html
[5] 8088 8086 Programming and Applications for IBM PC/XT & Compatibles by Nikos Nasoufis
ghost 18 years ago
Sounds, erm, interesting. You clearly put a lot of effort into this. It seems well constructed, written and packed full of content. Really good, shame I didn't read it, whhooosshhh right over my head.
Mr_Cheese 18 years ago
great article, has everything a great article needs and you included references which is a rare bonus. seems interesting as i was reading now and straightforward to understand. i'll give it a thorough read a bit later on.
ghost 18 years ago
looks great, gave it a quick read only though, 3:23 am o.O makes me want to learn ASM
ghost 18 years ago
Fantastic article had to read it a few times to understand it all but it was great! Please write more articles :)