One thing I have found over the couple of times I have dabbled in reversing is that a common learning strategy for newbies is to jump straight into crackmes without a basic understanding of what the hell they’re doing. Guided by poorly written “tuts” (tutorials), often sprinkled liberally with shocking spelling, the tendency is to try to glean information from seeing it done. From a random sampling of tutorials found on www.crackmes.de and elsewhere, I have found that a very large portion of them do not fully explain what is going on and why the reverser chose to put a breakpoint where he or she did. Comments like “I put a brake pnt ther becoz my spidy sense told me to lol, u will haf to figar out why 4 urself” turn up surprisingly often. Alternatively, the writer doesn’t write a tutorial at all and merely posts the answer without any guidance on how to arrive at it. This is fine if you have some experience, but for a newbie it can be frustrating: at best the newbie ends up going through the motions laid out in the tutorial without understanding what is being done. Don’t get me wrong, there are some excellent tutorials out there, written by people who care that others are reading and following along, but they are few and far between.
So, to avoid this, the strategy I started with this time around was to learn to program in assembly and go from there. I had hoped that a solid understanding of assembly language would help with reversing. To my surprise, this has also caused me great frustration. The thing is, code written by a human, regardless of the language, is translated by the compiler into whatever form it judges most efficient, depending on the compiler and the optimising options set, and there is often a trade-off between speed of execution, memory usage and the size of the final executable. Certain mathematical operations, for example, may be rearranged and handled in completely different ways than a human would logically expect, with comparisons and jumps changed accordingly, or code interleaved to make execution more efficient.
The end result is that the code the CPU executes may look entirely different from the code the human wrote. My conclusion, therefore, is that if your goal is to learn to reverse, teaching yourself to write programs in assembly language will only be useful up to a certain point.
The ultimate goal of any reversing session is to understand the program flow well enough that you could at least write pseudocode describing its functionality. This level of understanding may not be necessary in every case, depending on your reasons for reversing, but it should still be the goal you aim for from the start. And since you will never have the hand-written code to look at, it is more profitable to learn what certain higher-level logic looks like after it has been compiled, linked and then disassembled.
So what I am doing in this post, and possibly subsequent posts, is going through, at a very basic level, the breakdown of simple instructions as viewed in a disassembler.
You will need:
————–
A C compiler for Windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA, demo or freeware version (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure (you don’t have to know C as I’ll explain the relevant bits).
Basic knowledge of assembly language (just have a read through PCASM first and keep it as a reference).
Some familiarity with OllyDbg
Knowledge of hexadecimal
First we’ll start with the standard Hello World program.
I’m using the command line rather than the LCC GUI because I find it more flexible when just compiling small amounts of code like this.
Install LCC, then right-click on “My Computer” -> Properties -> Advanced tab -> Environment Variables -> edit the “Path” variable and put LCC’s bin directory at the beginning of the line, followed by a semicolon, e.g. “C:\lcc\bin;”.
Create a file called “compile.bat” in the directory that you will be working in and put the following in it:
lcc -o %1.obj %1.c
lcclnk -o %1.exe %1.obj
Type the following C program into your chosen text editor and save it as hello.c
#include <stdio.h>
int main(void)
{
printf("Hello World\n");
return 0;
}
Now you can just type the following at the command line:
compile hello
and it will create a file called hello.exe
This classic program obviously prints “Hello World” to the screen. But for this seemingly simple task to be accomplished there is far more going on under the hood. printf is a function from the C standard library (declared in stdio.h) that writes formatted output to the screen. We don’t write its code ourselves, so the linker has to make it available to our program, either by copying the library code into the executable or by arranging for it to be called from a runtime DLL.
So let’s open it up in OllyDbg.
As you can see, there’s quite a bit more in there than what we’ve written. Notice that you are placed at what OllyDbg thinks is the entry point of the program, which is startup code that runs before our main function. The purpose of this example is not to go through all of that, but to see what the compiler has done with our code.
Scroll down until you get to the following:
004012D4 /$ 68 A0A04000 PUSH hello.0040A0A0 ; /format = "Hello World"
004012D9 |. E8 DB5E0000 CALL hello._printf ; \_printf
004012DE |. 83C4 04 ADD ESP,4
004012E1 |. B8 00000000 MOV EAX,0
004012E6 \. C3 RETN
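This section of the code pushes the address 0040A0A0 (a pointer to the “Hello World” string, not the string itself) onto the stack and then calls printf. You can see what is stored at 0040A0A0 by right-clicking on the instruction in OllyDbg and selecting “Follow in dump -> Immediate Constant”. The string is part of the executable’s data and is already in memory when the program is loaded. After the call, ADD ESP,4 removes the 4-byte argument from the stack again (the caller cleans up after itself), and MOV EAX,0 puts our return value, 0, into EAX, which is where a function’s return value is passed back. You can see exactly what _printf does by stepping into it at runtime (set a breakpoint on the CALL line and hit F7 to step into the code).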
Next we’ll add a bit more complexity and demonstrate a few more things at once:
#include <stdio.h>
int main(void)
{
int num;
if (2==2)
{
printf("Hello World\n");
}
else
{
printf("No Greeting for you\n\n");
}
printf("enter a number\n");
scanf("%d",&num);
if (num==2)
{
printf("number = 2\n");
}
else
{
printf("number <> 2\n");
}
printf("The address of number is: %d and the value is %d",&num, num);
return 0;
}
So let’s compile this and open it up in Olly.
Again we’ve been placed at the entry point to the program. Scroll down until you see the following:
004012D4 $ 55 PUSH EBP
004012D5 . 89E5 MOV EBP,ESP
004012D7 . 51 PUSH ECX
004012D8 . 57 PUSH EDI
004012D9 . 68 FFB04000 PUSH hello2.0040B0FF ; /format = "Hello World"
004012DE . E8 76760000 CALL hello2._printf ; \_printf
004012E3 . 83C4 04 ADD ESP,4
004012E6 . EB 0D JMP SHORT hello2.004012F5
004012E8 . 68 E9B04000 PUSH hello2.0040B0E9 ; /format = "No Greeting for you"
004012ED . E8 67760000 CALL hello2._printf ; \_printf
004012F2 . 83C4 04 ADD ESP,4
You can see here that the compiler has decided our first if-test is unnecessary: instead of comparing 2 with 2, it always executes the call to printf with “Hello World” and then uses a JMP to always skip over the “No Greeting for you” section. That section is still in the binary; it just can never be reached. This is a very small, trivial example of the kinds of unexpected things you’ll find in disassembled code. Very likely no programmer would compare two constants like this, but you can see that the program has been optimised in a way that may not immediately make sense if we don’t have the source code handy.
004012F5 > 68 D9B04000 PUSH hello2.0040B0D9 ; /format = "enter a number"
004012FA . E8 5A760000 CALL hello2._printf ; \_printf
004012FF . 83C4 04 ADD ESP,4
00401302 . 8D7D FC LEA EDI,DWORD PTR SS:[EBP-4]
00401305 . 57 PUSH EDI
00401306 . 68 D6B04000 PUSH hello2.0040B0D6 ; /format = "%d"
0040130B . E8 70430000 CALL hello2._scanf ; \_scanf
00401310 . 83C4 08 ADD ESP,8
00401313 . 837D FC 02 CMP DWORD PTR SS:[EBP-4],2
This section takes a number entered by the user and compares it with 2. At this point it’s worth setting a breakpoint at 004012F5 and stepping through the program, paying close attention to the registers and the stack.
The LEA instruction does not read anything from memory: it calculates the address EBP-4 itself (the address of num, whose 4 bytes were reserved by the PUSH ECX near the start of the function) and puts it in EDI. The following PUSH then places that address on top of the stack, because scanf needs to know where to store the number, not what is currently stored there.
You’ll notice that the number you enter is stored on the stack at 0012FF6C (yours may be different). It will always be at the address EBP-4: in my session EBP held 0012FF70, and in hex 0012FF70 - 4 = 0012FF6C.
After stepping over the CALL hello2._scanf (and entering 2), the stack looks like this:
0012FF60 0040B0D6 Ö°@. ASCII "%d"
0012FF64 0012FF6C lÿ.
0012FF68 7C910208 ‘| ntdll.7C910208
0012FF6C 00000002 ...
0012FF70 /0012FFC0 Àÿ.
Olly scrolls the stack view to follow the ESP register, so once ADD ESP,8 has added 8 to ESP these entries move out of view. You can scroll up and right-click -> Lock stack to stop the view from moving while debugging.
The memory address 0012FF64 now holds the address of the location that contains the number we just entered. Something important to note at this point is the difference between the data stored in a register and the data stored at the memory address that the register holds. They are very different.
For example, having stepped through the code to the CMP instruction, we would have seen that ADD ESP,8 simply added 8 to the value held in the ESP register. The CMP instruction, however, is not referring to the value in EBP, nor to that value minus 4; it refers to the data stored in memory at the address equal to EBP minus 4. Confusing?
If the value held in EBP is “0012FF70”, then any reference to EBP without square brackets refers to the value 0012FF70.
If the data stored at memory address 0012FF70 is “0012FFC0”, then a reference to [EBP], with the square brackets, refers to the value “0012FFC0”.
A reference to [EBP-4] first subtracts 4 from the value held in EBP and then fetches the value stored at the resulting memory address. In this case EBP contains the hex value “0012FF70”, so EBP-4 = “0012FF6C”. If the data stored at stack address 0012FF6C is “2”, then [EBP-4] = “2”.
I hope this is clear, because it is a very important concept and one that may not be obvious to people who have only programmed in higher-level languages (like myself, I’m ashamed to admit). If you know C, EBP-4 behaves like a pointer (&num) and [EBP-4] like dereferencing it (*&num, which is just num). Once again, I recommend stepping through this in OllyDbg, paying close attention to the registers and the stack.
Moving right along then, the rest of the code is as follows:
00401317 . 75 0F JNZ SHORT hello2.00401328
00401319 . 68 CAB04000 PUSH hello2.0040B0CA ; /format = "number = 2"
0040131E . E8 36760000 CALL hello2._printf ; \_printf
00401323 . 83C4 04 ADD ESP,4
00401326 . EB 0D JMP SHORT hello2.00401335
00401328 > 68 BDB04000 PUSH hello2.0040B0BD ; /format = "number <> 2"
0040132D . E8 27760000 CALL hello2._printf ; \_printf
00401332 . 83C4 04 ADD ESP,4
Here you see the basics of an if-test at work. The previous instruction (CMP DWORD PTR SS:[EBP-4],2) effectively performs the subtraction [EBP-4]-2, but instead of storing the result it only sets the flags (ZF and CF among them) according to the outcome. All we care about here is whether the difference is zero (ZF set to 1). If it is, the program carries on with the next instruction; if it is not zero, JNZ (Jump if Not Zero) jumps by setting the next execution address (held in the EIP register) to 00401328, and execution continues from there.
If we enter 2, the difference will be zero and the program will tell us that “number = 2”. After that, the JMP at 00401326 sends it past the alternate branch to 00401335 (the PUSH DWORD PTR SS:[EBP-4] that starts the final section). If it takes the “number <> 2” path instead, then once it has finished it simply continues with the next instruction, which is that same 00401335.
If you are following closely, you will notice some unnecessary redundancy in this code. There is a duplicated “ADD ESP,4”; only one is ever executed because of the if-test, so why not remove one and place the other after the end of the if-test? You’ll also notice that the EDI register still contains the address EBP-4 from the earlier LEA (EDI is preserved across the printf calls), so the second LEA is unnecessary. There are certain strange people in this world who actually care about this sort of thing, and they even hold competitions to shrink executables as much as possible by removing redundancies like this and being as efficient as possible. For the moment, it’s just an interesting point to note: compilers are not perfect.
Next, we get to our final section of the code.
00401335 > FF75 FC PUSH DWORD PTR SS:[EBP-4] ; /<%d>
00401338 . 8D7D FC LEA EDI,DWORD PTR SS:[EBP-4] ; |
0040133B . 57 PUSH EDI ; |<%d>
0040133C . 68 A0B04000 PUSH hello2.0040B0A0 ; |format = "The address of number is: %d and the value is %d"
00401341 . E8 17760000 CALL hello2._printf ; \_printf
00401346 . 83C4 0C ADD ESP,0C
00401349 . B8 00000000 MOV EAX,0
0040134E . 5F POP EDI
0040134F . C9 LEAVE
00401350 . C3 RETN
I just added this section of the code to ram home the difference between the data stored in a register and the value stored at the memory location the register points to.
The first line here is fairly simple: it fetches the value we entered from [EBP-4] and pushes it onto the top of the stack, ready to be displayed. The second line calculates the address EBP-4 itself (the address of num, not its contents) and puts it in the EDI register. The following line pushes that address onto the stack, the format string’s address is pushed after it, and we’re ready to go.
If you step forward to the CALL, the stack looks like this:
0012FF5C 0040B0A0 |format = "The address of number is: %d and the value is %d"
0012FF60 0012FF6C |<%d> = 12FF6C (1245036.)
0012FF64 00000002 \<%d> = 2
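You’ll notice that the number 2 was placed on the stack first, followed by the memory address, despite the fact that the address is displayed first in the output string. That is because the arguments are pushed from right to left (last argument first), so the format string ends up on top of the stack at the lowest address, with the remaining arguments above it in the order they appear in the source. You can step into CALL hello2._printf (F7 in Olly) to see what happens with these values.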
When the program completes its execution (assuming you entered 2), you’ll notice it outputs “The address of number is: 1245036 and the value is 2”. If you convert 1245036 to hex, you get 12FF6C, which is the memory address where our entered number is stored.
So there’s only one more thing remaining here. We’ve seen what Olly does with the code; let’s have a quick peek at what IDA Pro has to offer:

As you can see, IDA puts together a nice graphical view of the program flow, which makes it very easy to see where the various jumps go. You’ll also notice that it appears to use a different method of disassembly, or at least displays the disassembled code in a different manner than Olly does.
These two lines in Olly:
00401335 > FF75 FC PUSH DWORD PTR SS:[EBP-4] ; /<%d>
00401338 . 8D7D FC LEA EDI,DWORD PTR SS:[EBP-4] ; |
are somewhat simplified in IDA as:
push [ebp+var_4]
lea edi, [ebp+var_4]
with var_4 declared at the start of the function as a constant (var_4 = dword ptr -4), so [ebp+var_4] is simply IDA’s name for [EBP-4].
I hope this has been helpful. Please feel free to leave a comment, and if I’ve made any mistakes in the above, please let me know. I’m always trying to learn more.
Next time we’ll do the same thing again looking at array structures.
Cheers!
Nick.