Codegate CTF Preliminary 2019: PyProt3ct (Reversing 27.8)

rev  /  PyProt3ct
bpsec
36 solves  /  27.8 points

d34dPY27

Writeup

What We Got

The provided zip archive contains three files:

  • chal.py: a wrapper that reads the file code, a flag from stdin and calls a function from play.py with those as arguments
  • play.py: an obfuscated script with multiple functions, all names are a combination of 0s and Os (like O0O0OOO00OO00O000)
  • code: a 600 kB file with mixed binary and ASCII characters, a quick peek shows that there are also strings like O0O0OOO00OO00O000

Tidying Up 🧹

As a first step I started to deobfuscate play.py manually by renaming variables to meaningful names. It turned out that it is a virtual machine implementation that executes the code in the code file. The last function loops over the instructions and executes them. Each of the other functions is an implementation of an instruction. They all receive a memory object which is a python dict. There are special memory slots, for example the program counter sits at key 1000 and the user input sits at flag. There is also a memory slot that holds the return value when our code returns. The overall goal will most likely be to make the code return a truthy value. We then know that the user input we gave it is the correct flag.

The VM Architecture

To get an idea of what the actual program does, we first have to understand how the VM works, what instructions there are, etc.

Instruction Format

Each instruction consists of a 1 byte opcode and 1 byte arguments count. There can be up to 4 arguments which are then read one after another by first reading 1 byte that describes the argument’s length and then actual argument itself. Arguments are stored in the memory slots 2001 to 2004.

Instructions

Opcode My name for it Args Meaning (pseudo code)
0x00 const 2 mov <mem_slot_dst>, <literal_value>
0x02 set_item 3 mov <mem_slot_dst>[<offset>], <mem_slot_src>
0x08 jge 3 cmp <mem_slot_a>, <mem_slot_b>; jge <address>
0x0b jmp 1 jmp <address>
0x22 get_item 3 mov <mem_slot_dst>, <mem_slot_src>[<offset>]
0x29 eval_call_w_arg 2 evals the first argument, calls the result with the second argument and saves the return value in the result slot
0x2c eval_call 1 same as above, but calling without an argument
0x31 jl 3 cmp <mem_slot_a>, <mem_slot_b>; jl <address>
0x48 get_result 1 mov <mem_slot_dst>, result_slot
0x4a return 1 return <mem_slot_src>
0x51 je 3 cmp <mem_slot_a>, <mem_slot_b>; je <address>
0x52 mov 2 mov <mem_slot_dst>, <mem_slot_src>
0x5b jne 3 cmp <mem_slot_a>, <mem_slot_b>; jne <address>
0x63 subrange 4 mov <mem_slot_dst>, <mem_slot_src>[<offset_start> : <offset_end>]
0x66 add 3 add <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>
0x6f or 3 or <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>
0x83 and 3 and <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>
0x97 shl 3 shl <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>
0xab shr 3 shr <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>
0xba xor 3 xor <mem_slot_dst>, <mem_slot_a>, <mem_slot_b>

A good thing to note here is that the VM apparently is a Harvard architecture (data and code are stored separately) and there is no direct way of modifying the code, so we won’t have to deal with self modifying code. The only exceptions are the eval/call instructions which could also be dangerous because they can execute arbitrary python code, so I decided to only run the whole thing in a docker container.

The Actual Program

To get some insight into the code, I wrote a simple disassembler based on the code of the VM. The first lines of its output looked like this:

1
2
3
0x00000000: OOOOO0OOO0O0OOOO00O0O00OOO0OO0O0000O0OOO000000OOO0OO00OOOO00O0OO0000OOOO0OOO000 = int(0x348c0)
0x00000059: OOO0OOOO000OOO000O000OO00O0O0OO00OOOOO0O0000O000O0000OO00O0O00OO0O0OO0OOO000000OOOOOOOOOOO000 = int(0x1d83a6)
0x000000c1: OOOO00O00O0OO0O0OOO0OOOOOOO000O0O00OO0O0000OOOOOOO000O0OOO0000OOO00O00000OOO000OOOOO0 = int(0x3b9e48)

Taking Out the Trash 🗑️

Here we go again… I made the disassembler name the memory slots var_* instead of that 💩. I then used some bash voodoo to find slots that are in fact constants because they are only assigned once and renamed them to const_*. I can now finally look at the code without my eyes starting to bleed 🙂.

1
2
3
4
5
0x00000000: const_0 = 0x348c0
0x00000059: const_1 = 0x1d83a6
0x000000c1: const_2 = 0x3b9e48
0x00000121: const_3 = const_1 ^ const_2
0x0000024c: const_4 = const_0 + const_3

Looking at the disassembly (4072 lines), I noticed that there are many constant assignments and conditional jumps based on constant values (which makes them unconditional jumps). Since that is basically all dead code, I filtered out those instructions. That leaves us with the core code (I already renamed some mem slots here):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
0x0007e160: total = const_3066 + const_3068
0x0007f9f3: call eval(len) flag
0x0007fa41: flag_length = result
0x0008069d: goto 0x00081608 if flag_length == const_2747
0x00080c50: return const_3130
0x0008142e: goto 0x00081608
0x000818cc: curr_char = flag[const_2020]
0x00083295: total = total + curr_char
0x00083fb4: total = total << const_2747
0x00084af5: curr_char = flag[const_2478]
0x00084d0d: total = total + curr_char
0x00086181: total = total << const_2747
0x00086595: curr_char = flag[const_2513]
0x000883b3: total = total + curr_char
0x00088922: total = total << const_2747
0x00088ada: curr_char = flag[const_2560]
0x00088c05: total = total + curr_char
0x00088dbd: total = total << const_2747
0x0008917c: curr_char = flag[const_2641]
0x000892eb: total = total + curr_char
0x00089581: total = total << const_2747
0x00089739: curr_char = flag[const_2671]
0x0008a21a: total = total + curr_char
0x0008a40c: total = total << const_2747
0x0008a607: curr_char = flag[const_2706]
0x0008a836: total = total + curr_char
0x0008a9ee: total = total << const_2747
0x0008b379: curr_char = flag[const_2733]
0x0008b4a4: total = total + curr_char
0x0008b9c8: i = 0x0
0x0008c066: goto 0x00091b2d if i >= const_2870
0x0008c422: tmp_3 = total >> const_2782
0x0008c6b5: tmp_3 = tmp_3 ^ const_1896
0x0008cd6c: tmp_3 = tmp_3 + const_1896
0x0008d230: tmp_3 = tmp_3 & const_2900
0x0008d6d6: tmp = total & const_2900
0x0008d88e: tmp = tmp ^ const_1896
0x0008db4f: tmp = tmp + const_1896
0x0008ddde: tmp = tmp & const_2900
0x0008e2ec: total = tmp
0x0008f038: total = total << 32
0x0008f381: total = total | tmp_3
0x0008f9fa: tmp_2 = total & const_3440
0x0008fbb2: total = total >> const_2733
0x00090e20: tmp_2 = tmp_2 << const_2852
0x0009114a: total = total | tmp_2
0x0009162a: total = total & const_2939
0x00091878: i = i + const_2478
0x00091a97: goto 0x0008bea8 (back jump)
0x00091b2d: goto 0x0009230f if total != const_3107
0x00092304: goto 0x000939ba
0x000939ba: return const_3130

I also put log statements into the VM code to see the memory values during runtime. The value in parentheses behind a memory slot is its current value.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
0x0007e160: total = const_3066 (0x0) + const_3068 (0x0)
0x0007f9f3: result = eval('len')(bytearray(b'AAAAAAAA'))
0x0007fa41: flag_length = result (8)
0x0008069d: if flag_length (0x8) == const_2747 (0x8) goto 0x00081608 (taken)
0x000818cc: curr_char = flag[0] (bytearray(b'AAAAAAAA')[0], 65)
0x00083295: total = total (0x0) + curr_char (0x41)
0x00083fb4: total = total (0x41) << const_2747 (0x8)
0x00084af5: curr_char = flag[1] (bytearray(b'AAAAAAAA')[1], 65)
0x00084d0d: total = total (0x4100) + curr_char (0x41)
0x00086181: total = total (0x4141) << const_2747 (0x8)
0x00086595: curr_char = flag[2] (bytearray(b'AAAAAAAA')[2], 65)
0x000883b3: total = total (0x414100) + curr_char (0x41)
0x00088922: total = total (0x414141) << const_2747 (0x8)
0x00088ada: curr_char = flag[3] (bytearray(b'AAAAAAAA')[3], 65)
0x00088c05: total = total (0x41414100) + curr_char (0x41)
0x00088dbd: total = total (0x41414141) << const_2747 (0x8)
0x0008917c: curr_char = flag[4] (bytearray(b'AAAAAAAA')[4], 65)
0x000892eb: total = total (0x4141414100) + curr_char (0x41)
0x00089581: total = total (0x4141414141) << const_2747 (0x8)
0x00089739: curr_char = flag[5] (bytearray(b'AAAAAAAA')[5], 65)
0x0008a21a: total = total (0x414141414100) + curr_char (0x41)
0x0008a40c: total = total (0x414141414141) << const_2747 (0x8)
0x0008a607: curr_char = flag[6] (bytearray(b'AAAAAAAA')[6], 65)
0x0008a836: total = total (0x41414141414100) + curr_char (0x41)
0x0008a9ee: total = total (0x41414141414141) << const_2747 (0x8)
0x0008b379: curr_char = flag[7] (bytearray(b'AAAAAAAA')[7], 65)
0x0008b4a4: total = total (0x4141414141414100) + curr_char (0x41)
0x0008b9c8: i = 0x0
0x0008c066: if i (0x0) >= const_2870 (0x7f) goto 0x00091b2d (not taken)

The Logic 💡

The core logic of the program looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
if len(flag) != 8:
return False

total = unpack_qword(flag)

for i in range(127):
tmp_3 = total >> 32
tmp_3 ^= 0xffc2bdec
tmp_3 += 0xffc2bdec
tmp_3 &= 0xffffffff
tmp = total & 0xffffffff
tmp ^= 0xffc2bdec
tmp += 0xffc2bdec
tmp &= 0xffffffff
total = tmp << 32
total |= tmp_3
tmp_2 = total & 0x7f
total >>= 7
tmp_2 <<= 57
total = total | tmp_2
total = total & 0xffffffffffffffff

return total == 0xd274a5ce60ef2dca

The Solution

At this point I probably stared too many hours into the PyProt3ct abyss, so I decided that in order to solve the challenge I had to write a Z3 script. So I did that. And it said unsatisfiable. Which was quite demotivating.

Fortunately, @tihmstar came by right in that moment and quickly saw that all the operations can be reversed and we could simply use that to compute the flag from the expected value at the end:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#!/usr/bin/env python

from struct import pack

# start with the expected value
total = 0xd274a5ce60ef2dca

# do the loop in reverse
for i in range(127):
tmp_2 = total >> 57
total <<= 7
total |= tmp_2

tmp_3 = total & 0xffffffff
tmp_3 = (tmp_3 + (1 << 32) - 0xffc2bdec) % (1 << 32)
tmp_3 ^= 0xffc2bdec
tmp_3 &= 0xffffffff

tmp = total >> 32
tmp &= 0xffffffff
tmp = (tmp + (1 << 32) - 0xffc2bdec) % (1 << 32)
tmp ^= 0xffc2bdec

total = (tmp_3 << 32) | tmp

# show the flag!
flag = pack('>Q', total).decode('utf-8')
print('Flag: {}'.format(flag))

That gave us the flag d34dPY27. Using it as the user input for the original VM script gave us a :), so our calculation was right!

Resources