HowTo Examine & Modify Executables


Introduction

I have worked with C throughout my student and professional life. There have been times when I had to look a little deeper to understand what was going on in a buggy program. There are many tools and techniques for examining an executable; this post covers some of them, along with a little reverse engineering.

Test Code

As an example I wrote a pretty basic piece of code with a few intentional inclusions: two global variables, msga and msgb; two user-defined routines, allow and deny, of which only deny is called from main; and one conditional call to an external program using execvp. The idea is to examine the executable this program produces, and to find out where the code I wrote lands in it and what the compiler adds on top. Later I’ll showcase some basic reverse engineering that can be done by pretending we haven’t seen the code.

GitHub

#include <stdio.h>
#include <unistd.h>

char *msga = "Allow";
char *msgb = "Deny";

void allow() {
    printf("%s\n", msga);
}

void deny() {
    printf("%s\n", msgb);
}

int main(int argc, char **argv) {
    deny();
    int runExternal = 0;
    if (runExternal) {
        char* lsargs[] = {"ls", "-l", NULL};
        execvp("ls", lsargs);
    }
}

While dealing with executables we’ll encounter a lot of hexadecimal values, and I prefer using Python for quick arithmetic whenever the need arises. Also, some of the output is too big to paste into the post, so I link it at the end.

Goal

Let’s compile the program and get our a.out.

gcc examinebin.c

As the code shows, running the program simply calls deny and exits. My goal is to identify the relevant instructions inside the executable and change them so that allow is called and then ls is executed with the -a argument.

PS: The tools used to do analysis and their output are listed at the bottom of this post for reference.

Call Sequence

If we look at the objdump output, it is neatly divided into sections and clearly labeled with symbol names. The code we are interested in is the code we wrote, but it’s nice to know what everything else is.

The short version is that every C program needs a main routine, which marks the start and end of user-written code. The C runtime executes main within its framework and takes care of all static and runtime dependencies. The order of execution can be determined easily by loading the executable in gdb and adding a breakpoint on every symbol defined in the .text section, plus _init and _fini. Let’s see what happens.

breakpoint : _init, _start, deregister_tm_clones, register_tm_clones, __do_global_dtors_aux, frame_dummy, allow, deny, main, __libc_csu_init, __libc_csu_fini, _fini
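
Typing a dozen break commands by hand gets tedious, so here is a minimal sketch that generates them. It assumes the symbol names from the list above (with the double underscores written out) and relies only on gdb's standard break and run commands. The helper name gdb_commands is my own invention for illustration.

```python
# Symbols from the breakpoint list in this post.
SYMBOLS = [
    "_init", "_start", "deregister_tm_clones", "register_tm_clones",
    "__do_global_dtors_aux", "frame_dummy", "allow", "deny", "main",
    "__libc_csu_init", "__libc_csu_fini", "_fini",
]


def gdb_commands(symbols):
    """Return gdb command-file text: one breakpoint per symbol, then run."""
    lines = [f"break {name}" for name in symbols]
    lines.append("run")
    return "\n".join(lines) + "\n"


# Print the commands; they can be saved to a file and loaded with gdb -x.
print(gdb_commands(SYMBOLS), end="")
```

Saving the printed text to a file (say breakpoints.gdb, a hypothetical name) and starting gdb with -x breakpoints.gdb ./a.out should stop at each symbol in turn as you continue.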

This is the call sequence, labelled by me based on my understanding of the usual meaning of these symbols:

// Initialisation
_init (argc=1, argv=0x7fffffffdfd8, envp=0x7fffffffdfe8)
_start ()
__libc_csu_init ()
_init ()
frame_dummy ()
register_tm_clones ()

// User Code
main ()
deny () // we want to call allow and execvp here instead


// Deconstruction and finalisation
__do_global_dtors_aux ()
deregister_tm_clones ()
deregister_tm_clones ()
_fini ()

Identification

Now that we know what we don’t have to explore, we can focus on the task at hand: calling allow and running ls with -a. Stated precisely, we want to:

  • call allow instead of deny.
  • change runExternal flag value to non-zero.
  • change "-l" to "-a" in lsargs.

To do that we have to know where these values are in the binary, and then change them manually without disturbing anything else.

Replace deny

The hexadecimal code calling deny, from the objdump output:

0000000000001189 <allow>:

00000000000011a3 <deny>:

00000000000011bd <main>:
    11e4:    e8 ba ff ff ff           callq  11a3 <deny>
    11e9:    c7 45 dc 00 00 00 00     movl   $0x0,-0x24(%rbp)

From the callq encoding we know that opcode e8 takes the four-byte operand ba ff ff ff. Read as a little-endian value that is 0xffffffba, a signed 32-bit offset measured from the address of the next instruction, 0x11e9. So it should point to 0x11a3:

offset = 0xffffffba - 0x100000000  # reinterpret as a signed 32-bit value: -0x46
deny_addr = 0x11e9 + offset        # relative to the next instruction
print(hex(deny_addr))              # 0x11a3

To call allow instead, we have to change 0xffffffba to an offset that lands on 0x1189.

allow_addr = 0x1189
offset = (allow_addr - 0x11e9) & 0xffffffff  # wrap the negative offset to 32 bits
print(hex(offset))
# 0xffffffa0 -> a0 ff ff ff

So all we need to do is change ba to a0 in the binary.
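
The same arithmetic can be wrapped in a small helper that also takes care of the little-endian byte order. This is just a sketch: rel32_operand is a name I made up, and it assumes the 5-byte e8 form of call seen in the objdump output.

```python
import struct


def rel32_operand(call_addr, target_addr, insn_len=5):
    """Bytes that follow opcode e8 so a call at call_addr reaches target_addr."""
    next_insn = call_addr + insn_len        # the offset is relative to the next instruction
    displacement = target_addr - next_insn  # may be negative
    return struct.pack("<i", displacement)  # signed 32-bit, little-endian


print(rel32_operand(0x11e4, 0x11a3).hex(" "))  # ba ff ff ff (original call to deny)
print(rel32_operand(0x11e4, 0x1189).hex(" "))  # a0 ff ff ff (patched call to allow)
```

Packing with "<i" does the two's complement wrap and the byte swap in one step, so the result can be compared directly against the bytes in the dump.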

Change runExternal

This is quite simple: all we need to do is locate the mov instruction that stores the value in the flag.

    11e9:    c7 45 dc 00 00 00 00     movl   $0x0,-0x24(%rbp)

and change the immediate value to any non-zero one.

00 00 00 00 -> 01 00 00 00
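
To see why it is the last four bytes that matter, here is a quick sketch that splits the instruction into its fields. The field layout (opcode, ModRM byte, 8-bit displacement, 32-bit immediate) is my reading of this particular encoding, not a general decoder.

```python
import struct

insn = bytes.fromhex("c7 45 dc 00 00 00 00")  # movl $0x0,-0x24(%rbp)

opcode = insn[0]                           # 0xc7: store an immediate to memory
modrm = insn[1]                            # 0x45: rbp-based address with an 8-bit displacement
disp8 = struct.unpack("<b", insn[2:3])[0]  # signed stack offset of runExternal
imm32 = struct.unpack("<i", insn[3:7])[0]  # the value being stored

print(hex(disp8), imm32)  # -0x24 0

# Rebuild the instruction with the immediate set to 1.
patched = insn[:3] + struct.pack("<i", 1)
print(patched.hex(" "))   # c7 45 dc 01 00 00 00
```

Because the immediate is little-endian, setting its first byte to 01 makes the stored value exactly 1.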

Change "-l"

We want to change the arguments going into the execvp call. In the assembly we can see where the callq to execvp is made, and before it there are lea and mov instructions that load the addresses of the string literals and store them into the lsargs array on the stack. Since these strings are hardcoded in the binary, all we need to do is find the location of -l and change it to -a.

    11f6:    48 8d 05 12 0e 00 00     lea    0xe12(%rip),%rax        # 200f <_IO_stdin_used+0xf>
    11fd:    48 89 45 e0              mov    %rax,-0x20(%rbp)
    1201:    48 8d 05 0a 0e 00 00     lea    0xe0a(%rip),%rax        # 2012 <_IO_stdin_used+0x12>
    1208:    48 89 45 e8              mov    %rax,-0x18(%rbp)
    121b:    48 8d 3d ed 0d 00 00     lea    0xded(%rip),%rdi        # 200f <_IO_stdin_used+0xf>
    1222:    e8 69 fe ff ff           callq  1090 <execvp@plt>

The lea instruction computes an effective address, which in every case here is an offset from the address of the next instruction (RIP-relative addressing). So we have three addresses, which can be calculated by hand or read from the comments in the objdump output.

print(hex(0xe12 + 0x11fd)) # 0x200f
print(hex(0xe0a + 0x1208)) # 0x2012
print(hex(0xded + 0x1222)) # 0x200f

From the hexdump output we can clearly see that our strings really are there.

00002000: 0100 0200 416c 6c6f 7700 4465 6e79 006c  ....Allow.Deny.l
00002010: 7300 2d6c 0000 0000 011b 033b 5400 0000  s.-l.......;T...

Changing the fourth byte of the second row (offset 0x2013) from 6c to 61 turns l into a.
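
As a sanity check, the two dump rows can be pasted into Python to confirm which string lives at each address. This is a sketch over just those 32 bytes; c_string_at is a hypothetical helper, and BASE is the offset of the first row.

```python
# The two dump rows shown above, starting at offset 0x2000.
BASE = 0x2000
dump = bytes.fromhex(
    "0100 0200 416c 6c6f 7700 4465 6e79 006c"
    "7300 2d6c 0000 0000 011b 033b 5400 0000"
)


def c_string_at(data, base, addr):
    """Read the NUL-terminated string that starts at addr."""
    start = addr - base
    end = data.index(b"\x00", start)
    return data[start:end].decode("ascii")


print(c_string_at(dump, BASE, 0x200f))  # ls
print(c_string_at(dump, BASE, 0x2012))  # -l

# The 'l' of "-l" sits one byte past the start of that string.
print(hex(BASE + dump.index(b"-l\x00") + 1))  # 0x2013
```

So 0x200f is the "ls" string used twice, 0x2012 is "-l", and the byte to edit is at 0x2013.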

Changes

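Let’s summarize and make all the necessary changes to the text output produced by the xxd utility. In each pair below, the first row is the original and the second is the modified one. The parentheses only mark the changed byte and are not part of the file; all three edits go into the same dump.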

Changes for allow

000011e0: 0000 0000 e8(ba) ffff ffc7 45dc 0000 0000
000011e0: 0000 0000 e8(a0) ffff ffc7 45dc 0000 0000

Changes for runExternal flag

000011e0: 0000 0000 e8ba ffff ffc7 45dc (00)00 0000
000011e0: 0000 0000 e8ba ffff ffc7 45dc (01)00 0000

Changes for l -> a

00002010: 7300 2d(6c) 0000 0000 011b 033b 5400 0000
00002010: 7300 2d(61) 0000 0000 011b 033b 5400 0000
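
If you would rather not edit the text dump by hand, the same three edits can be applied to the raw bytes. This is a minimal sketch, not what I did for this post: patch_byte and apply_patches are made-up names, and it assumes that file offsets equal the addresses objdump prints, which holds for this binary since the xxd rows line up with them.

```python
def patch_byte(image, offset, expected, new):
    """Replace one byte, refusing if the old value is not the one we expect."""
    if image[offset] != expected:
        raise ValueError(
            f"offset {offset:#x}: found {image[offset]:#04x}, expected {expected:#04x}"
        )
    image[offset] = new


# (file offset, old byte, new byte) for the three edits listed above.
PATCHES = [
    (0x11e5, 0xba, 0xa0),  # call deny -> call allow
    (0x11ec, 0x00, 0x01),  # runExternal = 0 -> 1
    (0x2013, 0x6c, 0x61),  # "-l" -> "-a"
]


def apply_patches(image, patches=PATCHES):
    """Apply every patch in order to a bytearray and return it."""
    for offset, expected, new in patches:
        patch_byte(image, offset, expected, new)
    return image


# Demo on a zero-filled stand-in, not the real a.out.
demo = bytearray(0x2020)
demo[0x11e5], demo[0x2013] = 0xba, 0x6c
apply_patches(demo)
print(hex(demo[0x11e5]), hex(demo[0x11ec]), hex(demo[0x2013]))  # 0xa0 0x1 0x61
```

Checking the expected old byte first means a build with a different layout fails loudly instead of being silently corrupted. For the real thing you would read a.out into a bytearray, call apply_patches, and write the result out as a2.out.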

Create new executable

Convert the edited dump back into a binary with the xxd utility:

xxd -r modified-xxd.txt > a2.out

Then make the new file executable:

chmod +x a2.out

Run

Well, I can tell you that it actually works, but it’s better to try it yourself. The output is:

Before

Deny

After

Allow
.  ..  a.out  a2.out

Tools

Let’s review some tools that tell us about the file from the outside.

file

A utility that prints the file name, file type and other format-related information.

output
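
To get a feel for what file looks at, here is a rough sketch that reads a few fields from the start of an ELF header. It covers only the magic number, word size, byte order and object type; the real utility checks far more. describe_elf and the sample header are made up for illustration.

```python
import struct


def describe_elf(header):
    """Summarise the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        return "not an ELF file"
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    order = {1: "LSB", 2: "MSB"}.get(header[5], "unknown byte order")
    fmt = "<H" if header[5] == 1 else ">H"
    e_type = struct.unpack(fmt, header[16:18])[0]
    kind = {1: "relocatable", 2: "executable", 3: "shared object or PIE"}.get(
        e_type, "other"
    )
    return f"ELF {bits} {order} {kind}"


# Hand-built header for illustration, not bytes copied from the a.out in this post.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", 3)
print(describe_elf(sample))  # ELF 64-bit LSB shared object or PIE
```

Passing the first 18 bytes of a real binary to describe_elf should give a shortened version of what file reports.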

sum

Gets the checksum and the number of blocks in the file. Once we have done some reverse engineering, a change in this output will tell us that the new executable is not the original.

output
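
For the curious, here is a sketch of the 16-bit rotate-and-add checksum that I understand sum to use in its default BSD mode (treat that default as an assumption about your system). The function names are mine, and the two byte strings are a toy stand-in for the original and patched files.

```python
def bsd_sum(data):
    """16-bit rotate-and-add checksum, as in the BSD variant of sum."""
    checksum = 0
    for byte in data:
        checksum = (checksum >> 1) | ((checksum & 1) << 15)  # rotate right by one bit
        checksum = (checksum + byte) & 0xFFFF                # add the byte, keep 16 bits
    return checksum


def blocks_1k(data):
    """Number of 1024-byte blocks, rounded up."""
    return (len(data) + 1023) // 1024


original = b"s\x00-l\x00"
modified = b"s\x00-a\x00"
print(bsd_sum(original), bsd_sum(modified))  # the two values differ
```

A 16-bit checksum like this catches accidental or naive edits, but it is easy to collide on purpose, so a cryptographic hash is the better choice when tampering is a real concern.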

ldd

Gives the list of shared objects required by the executable.

output

There are also some utilities that give a quick peek into the executable if an in-depth examination is not what you need.

strings

Displays all printable strings in the file. In fact it works on any file, not just executables.

output
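
The idea behind strings is simple enough to sketch in a few lines: scan for runs of printable ASCII of some minimum length (4 here, which I believe matches the tool's default). This is a simplification; printable_strings is a made-up name, and the input is the two dump rows from earlier in the post.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


dump = bytes.fromhex(
    "0100 0200 416c 6c6f 7700 4465 6e79 006c"
    "7300 2d6c 0000 0000 011b 033b 5400 0000"
)
print(printable_strings(dump))     # ['Allow', 'Deny']
print(printable_strings(dump, 2))  # ['Allow', 'Deny', 'ls', '-l', ';T']
```

With the default length the short "ls" and "-l" literals are skipped, which is worth remembering when a string you expect does not show up in the tool's output.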

nm

Lists all the symbols present in the executable, along with their addresses.

output

Now comes the in-depth analysis of the executable. This includes interpreting the machine code in human-readable form and figuring out a way to edit the file.

objdump

Using the -d option you get a disassembly of each executable section, with the raw bytes shown next to the interpreted assembly instructions.

output

xxd or hexdump

These are plain tools for dealing with binary files in general, not just executables. Reading creates a text dump showing the hexadecimal value of each byte, with a printable version side by side where possible. With xxd, any changes to this text dump can be fed back to the tool (xxd -r), which then creates a binary file.

I am using xxd for both reading and writing the executable here.

output