Prepare everything

I had a lot of problem finding the needed tools for XP SP1, because it is just so old. First of all, I couldn’t find a ” Visual 2008 Express C++ Edition ” executable compatible with XP SP1. I also couldn’t find symbols for XP SP1.

What I ended up doing is installing vc++ on an XP SP3, build what I needed here (while configuring vc++ so that the executable is compatible with SP 1) and transfer it on the XP SP1 machine.

Also, I only managed to find symbols for XP SP3. It’s still fine though, while the symbols is a always a nice plus, you can debug without symbols. If you do need some offsets to specific data, you can check what’s the offset on the XP SP3 machine, and they’re the same in XP SP1 (at least from what I could see).

Let’s start

As already mentionned, I’ll follow mr_me tutorial (replicated on b33f’s blog), and try to expand on anything that I had a hard time grasping.
This code is affected by a heap overflow at line 39. The reason why is quite obvious : we allocated 260 bytes of memory on the heap, and we then put an user controlled string (the first argument given to our application) inside it. Since there’s no size check, our string ends up overflowing the allocated space.

#include 
#include 
#include 

DWORD MyExceptionHandler(void);
int foo(char * buf);

int main(int argc, char * argv[]) {
  HMODULE l;
  l = LoadLibrary("msvcrt.dll");
  l = LoadLibrary("netapi32.dll");
  printf("\n\nHeapoverflow program.\n");
  if (argc != 2)
    return printf("ARGS!");
  printf("String given as arg length : %d\n", strlen(argv[1]));
  foo(argv[1]);
  return 0;
}

DWORD MyExceptionHandler(void) {
  printf("In exception handler....");
  ExitProcess(1);
  return 0;
}

int foo(char * buf) {
  HLOCAL h1 = 0, h2 = 0;
  HANDLE hp;

  __try {
    hp = HeapCreate(0, 0x1000, 0x10000);
    if (!hp) {
      return printf("Failed to create heap.\n");
    }
    printf("HEAP created at %.8X\n", hp);
    h1 = HeapAlloc(hp, HEAP_ZERO_MEMORY, 260);
    printf("CHUNK 1: %.8X %.8X\n", h1, & h1);
    // Heap Overflow occurs here:
    strcpy(h1, buf);
    printf("data copied to first buffer\n");
    // This second call to HeapAlloc() is when we gain control
    h2 = HeapAlloc(hp, HEAP_ZERO_MEMORY, 260);
    printf("second heapalloc done\n");
    printf("CHUNK 2: %.8X %.8X\n", h2, & h2);
  }
  __except(MyExceptionHandler()) {
    printf("oops...");
  }
  return 0;
}

Here’s something I had a hard time getting from mr_me article :

Now of course when we run this in the debugger, we gain control of the second allocation (because freelist[0] is being updated with our attack string from the first allocation).

Why is freelist[0] relevant, when we’re only allocating 260 bytes ? Why is it updated ? Shouldn’t freelist[0] be in play only for allocations that are superior to 1016 bytes ?

The answer is actually fairly simple : those 260 bytes actually are NOT what’s freelist[0] going to be updated with. In fact, while creating our 0x1000 bytes heap, it already came with a 0x980 free block.

Let’s take it right from the beginning, and observe our heap just after it was created (the //parts are comments I added and not from WinDbg output. I also colored relevant pointers/data).

0:000> !heap -h 370000


FreeList[ 00 ] at 00370178: 00370688 . 00370688 (1 block )

Heap entries for Segment00 in Heap 00370000
00370000: 00000 . 00640 [01] - busy (640)
00370640: 00640 . 00040 [01] - busy (40)
00370680: 00040 . 00980 [14] free fill // this free block is the size of 0x1000 - 0x680 = 0x980 , 0x680 being the size of the metadata of our heap

We can see that FreeList[0] (0x00370178) indeed points at a free chunk (0x00370688 – which is 00370680 + 8 bytes of metadata), that was created with the heap.

FreeList[ 00 ] at 00370178: 00370688 . 00370688 (1 block )

Let’s look at this free chunk in more details:

0:000> dd 00370680
00370680 00080130 00001400 00370178 00370178 // 16 bytes of metadata
//flink and blink both point back to freelist[0] (0x00370178) indicating it is the last free block
00370690 feeefeee feeefeee feeefeee feeefeee // feeefeee is a magic number used by heapalloc() indicating free memory
003706a0 feeefeee feeefeee feeefeee feeefeee
003706b0 feeefeee feeefeee feeefeee feeefeee
003706c0 feeefeee feeefeee feeefeee feeefeee
003706d0 feeefeee feeefeee feeefeee feeefeee
003706e0 feeefeee feeefeee feeefeee feeefeee
003706f0 feeefeee feeefeee feeefeee feeefeee

00370ff0 feeefeee feeefeee feeefeee feeefeee //end of free chunk

To summarize, our memory looks like this right after our heap was created, that’s to say after HeapAlloc() was called:

Memory used by heap right after its creation

After creating the heap, we allocate 260 bytes from it with HeapAlloc(). What does WinDbg says after this allocation ?

0:000> !heap -h 370000
//curated output
...
00370640: 00640 . 00040 [01] - busy (40)
00370680: 00040 . 00120 [07] - busy (104), tail fill - unable to read heap entry extra at 00370798
003707a0: 00120 . 00860 [14] free fill

We do find our newly allocated block. As expected,  it can hold 0x104 bytes (260 bytes), and it looks like we have quite a bit of metadata as its total size is 0x120 (288 bytes).

Nothing too surprising when we take a peak at the block, although, I might not understand what are the trailing metadata (if it is indeed metadata ?).

0:001> dd 370680 370680+120
00370680  00080024 001c075e 00000000 00000000
00370690  00000000 00000000 00000000 00000000
... //curated output
00370780  00000000 00000000 00000000 abababab
00370790  abababab feeefeee 00000000 00000000 //abababab is a magic number used by Windows as a "no man's land", like: if you go any further some shit happened bro

And of course, our free block size was reduced by the size of the new busy block. This new size is 0x980-0x120=0x860.

Let’s visualize how the memory now looks:

Memory right after first allocation

Now, what happens when we copy our string (the one we pass as argument to our application) to the block we just allocated ? Well, let’s first look at it with a small string (that doen’t induce any overflow), like “AAAA”.

dd 00370680
00370680  00080024 001c0788 41414141 00000000
00370690  00000000 00000000 00000000 00000000
003706a0  00000000 00000000 00000000 00000000
003706b0  00000000 00000000 00000000 00000000
003706c0  00000000 00000000 00000000 00000000

Everything’s looking all fine and dandy here : our “AAAA” string was copied to the first bytes of our chunk. 

Heap without overflow

Then, we allocate a second block on our heap at line 42 :

h2 = HeapAlloc(hp, HEAP_ZERO_MEMORY, 260);

What does our heap looks like now ?

!heap -h 370000
...
    Heap entries for Segment00 in Heap 00370000
        00370640: 00640 . 00040 [01] - busy (40)
        00370680: 00040 . 00120 [07] - busy (104), tail fill - unable to read heap entry extra at 00370798
        003707a0: 00120 . 00120 [07] - busy (104), tail fill - unable to read heap entry extra at 003708b8
        003708c0: 00120 . 00740 [14] free fill
        00371000:      0000f000      - uncommitted bytes

In very much the same way as to our first allocation, a new busy block appeared, and the free block shrinked.

 

However, what would have happened if we used a string that overflowed, for exemple, 296*A instead of “AAAA” ? Let’s do that and compare the heap with and without overflow (and for the lazy ones, just do $str = “A” * 296 in a PowerShell prompt to produce the string).

Heap without overflow
Overflowed heap (296*A used)

What happened ? Well, our string filled the busy chunk, and then overflowed onto the free chunk, overwriting its metadata. Most importantly, it overflowed the blink and flink pointers of the free chunk. Of course, this is gonna cause problems… Indeed, the application crashes at the following instruction:

0:000> u eip
ntdll!RtlImpersonateSelf+0x2b0:
77f6256f 8901            mov     dword ptr [ecx],eax
77f62571 894804          mov     dword ptr [eax+4],ecx

 

To understand what happened, let’s dive a little deeper on what happens when the second allocation is made. 

The free block of 0x860 has to be divided into one busy chunk of 260 bytes, and one new, shrinked free chunk. Also, and most importantly for our exploitation, the current free chunk has to be unlinked from freelist[0]. In a double linked list such as freelist[0], unlinking, in its simples form, looks like this, given that we want to unlock ” Block ” from the list.

Double linked list
Instead of pointing to our block, flink of the previous block will point to the next block
Instead of pointing to our block, blink of the next block will point to the previous block
The block is unlinked from the list

Now, we’re a bit in a special case, as we only have one free block, “PREVBLOCK” and “NEXTBLOCK” are actually the same, and the unlinking process actually looks like the following (I’ll get to what eax and exc refer to later):

FreeList[0] before unlinking of the free block happens
Step 1: Instead of pointing to our block, flink of the previous block will point to the next block. In this special case, the “next block” and ” previous block” are both freelist[0].flink as there’s no more blocks.
Step 2: Instead of pointing to our block, blink of the next block will point to the previous block. In this special case, the “next block” and ” previous block” are both freelist[0].flink as there’s no more blocks.
The free block has been unlinked

And this is why the crash happens : flink and blink are garbage due to the overflow. More precisely, the two instructions right after out crash correspond respectively to the first and second step we just described.

0:000> u eip
ntdll!RtlImpersonateSelf+0x2b0:
77f6256f 8901            mov     dword ptr [ecx],eax <----- first step of unlinking. eax actually contains our garbage flink, and ecx our garbage blink
77f62571 894804          mov     dword ptr [eax+4],ecx <----- second step of unlinking

 

And yeah, that’s it. We got our 4 bytes write primitive. Writing arbirary data (the flink contained in eax) to an arbitrary place (the blink contained in exc) is now just a matter of finding the offsets in our overflow. This part, and how to turn that into arbitrary code execution, is fairly straightforward and well explained on mr_me tutorial, which the reader can do if he wishes so :-).