All Hail The Cloud

Hello everyone.  It’s been a while.

To recap some of my previous posts about the HMAPL Project: at one point I was trying to decompile the game’s code from its main executable file.

Due to some complications in the code, I was never able to find a machine with enough physical RAM to get through RetDec’s optimization pass, which left me with large, confusing, inefficient code.  However, like any program, RetDec can also fall back on “virtual memory”, which includes a “page file” stored on the hard drive.  The downside of a page file is that it’s much slower than RAM, and heavy paging can cause other running programs to slow down or crash.

So to solve my issues, I was able to create a Windows Server 2016 virtual machine on Google Cloud that included:

  • 1 vCPU
  • 52 GB Memory
  • 2 TB Standard persistent disk
  • Windows Server 2016 Datacenter

Once configured, I installed the necessary RetDec software, manually set the page file to a size of 1.8TB and let the program do its thing.  I made sure not to even log in while the app was running, as even that could affect memory usage and cause a system crash.  Instead, I tracked progress through the CPU and disk activity graphs on the Google Cloud dashboard.

It took roughly 12 days for the program to run its full course with optimizations, but it did complete successfully.

I decided to go with the Python output, as that should make the code simpler to read overall.  The code can be found on my [GitHub].

It’s worth noting that the decompiled code is not the same code that the developers would have written to create the game.  It is just a file generated automatically by the decompiler as its interpretation of the game’s PowerPC executable.

I’ll be going over the new code dump over the next few weeks to see what I can find, and will post back with my results.

Decompiling the Game

Good news!

Avast has released the source code for their Retargetable Decompiler (RetDec for short).

Why am I excited about this?

Because it can decompile PowerPC binaries (i.e. ELF files) into a higher-level programming language (C or Python).

The main executable program in any GameCube/Wii game is a Start.dol file, which is a variant of the ELF format.  Also, the GC/Wii are both based on the PowerPC architecture.

Most tools of this kind will only break a game down into what’s called assembly.  This is a human-readable form of machine code that is difficult at best to reverse-engineer.  And those that support decompiling to higher-level programming languages typically only support the x86 architecture.

Previously, one could try RetDec online, but it had a very small limit on decompilation time (which was not enough for a file as large as the game’s main program code).  But now, I can run it on my own machine for as long as needed to run the decompilation process.

I’ve been able to successfully install RetDec and its dependencies.  I was then able to convert AWL’s Start.dol file into a Start.elf file using DolTool 0.3 and begin the decompilation process.
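For anyone curious what a tool like DolTool has to read before it can build the ELF, here is a minimal Python sketch that parses a DOL header.  It assumes the commonly documented layout (a 0x100-byte big-endian header with 7 text and 11 data section slots), which is not something I have confirmed against DolTool itself.  The names `parse_dol_header` and `DolSection` are made up for this example, and it only inspects the header rather than doing the actual conversion.

```python
import struct
from typing import List, NamedTuple, Tuple

# Assumed DOL layout (commonly documented, not verified against DolTool):
# 0x00 file offsets, 0x48 load addresses, 0x90 sizes (18 x u32 each),
# 0xD8 BSS address, 0xDC BSS size, 0xE0 entry point. All big-endian.
DOL_HEADER_SIZE = 0x100
NUM_TEXT_SECTIONS = 7
NUM_DATA_SECTIONS = 11
NUM_SECTIONS = NUM_TEXT_SECTIONS + NUM_DATA_SECTIONS


class DolSection(NamedTuple):
    kind: str          # "text" or "data"
    file_offset: int   # where the section's bytes start in the .dol file
    load_address: int  # where the console loads it in memory
    size: int          # section length in bytes


def parse_dol_header(header: bytes) -> Tuple[List[DolSection], int, int, int]:
    """Return (used sections, bss_address, bss_size, entry_point)."""
    if len(header) < DOL_HEADER_SIZE:
        raise ValueError("a DOL header needs at least 0x100 bytes")

    offsets = struct.unpack_from(">%dI" % NUM_SECTIONS, header, 0x00)
    addresses = struct.unpack_from(">%dI" % NUM_SECTIONS, header, 0x48)
    sizes = struct.unpack_from(">%dI" % NUM_SECTIONS, header, 0x90)
    bss_address, bss_size, entry_point = struct.unpack_from(">3I", header, 0xD8)

    sections = []
    for i in range(NUM_SECTIONS):
        if sizes[i] == 0:
            continue  # unused slot
        kind = "text" if i < NUM_TEXT_SECTIONS else "data"
        sections.append(DolSection(kind, offsets[i], addresses[i], sizes[i]))
    return sections, bss_address, bss_size, entry_point
```

To try it on a real file, read the first 0x100 bytes of Start.dol in binary mode and pass them to `parse_dol_header`.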

Unfortunately the decompilation failed part-way through due to not having enough memory (my laptop only has 4GB RAM).  My roommate offered to let me try on his laptop (which has 8GB), so once I’m able to try on that, I’ll post an update.

From the partial decompilation, I can tell that I’m on the right track though.  I’ve been able to find a few lines in the LLVM code that reference CLZ files and could be the key to their decompression algorithm.

@global_var_80298180.828 = constant [22 x i8] c"mainchapter%d.arc.clz\00"

%v4_800131f8 = call i32 @function_80238268(i32 %v2_800131f0, i32 ptrtoint ([22 x i8]* @global_var_80298180.828 to i32), i32 %v0_800131e8)

It appears that the CLZ file name is stored in a global variable, which is then passed into some sort of function (possibly to be decompressed).
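Rather than scrolling through the LLVM output by hand, a small script can list every call that takes a “.clz” string as an argument.  The Python sketch below is only a rough helper: the function name `find_clz_references` is made up for this example, and the regular expressions assume the IR looks like the two lines quoted above, so other formatting may be missed.

```python
import re
from typing import List, Tuple

# Matches global string constants such as:
#   @global_var_80298180.828 = constant [22 x i8] c"mainchapter%d.arc.clz\00"
GLOBAL_STRING = re.compile(
    r'^(@[\w.]+)\s*=\s*(?:\w+\s+)*?constant\s+\[\d+ x i8\]\s+c"([^"]*)"',
    re.MULTILINE,
)
# Matches the callee of a call instruction, e.g. "call i32 @function_80238268("
CALL = re.compile(r'call\s+\S+\s+(@[\w.]+)\(')


def find_clz_references(ir_text: str) -> List[Tuple[str, str, str]]:
    """Return (called function, global name, string) for calls using a .clz string."""
    clz_globals = {}
    for name, text in GLOBAL_STRING.findall(ir_text):
        if ".clz" in text:
            # Drop the trailing NUL escape that LLVM prints as \00.
            clz_globals[name] = text.replace("\\00", "")

    hits = []
    for line in ir_text.splitlines():
        call = CALL.search(line)
        if call is None:
            continue
        for name, text in clz_globals.items():
            # Require a full-name match so "@x.82" does not match "@x.828".
            if re.search(re.escape(name) + r"(?![\w.])", line):
                hits.append((call.group(1), name, text))
    return hits
```

Running this over the whole partial LLVM dump should give a short list of functions worth reading first.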

I’ll be analyzing this code as much as I can to try and figure out how the files work.

On a separate note, decompiling the game’s main executable could create a better understanding of the game overall and could open the door for additional future mods from other authors.


I tried running the decompilation on my roommate’s 8GB RAM machine, but it still failed after running out of memory (though at a later point).

But on the bright side, I was eventually able to get the decompilation to work by turning off code optimizations with the --backend-no-opts flag in RetDec.

The main downside to this is that the code is much larger and more complicated than it would be if I had been able to run the optimizations.

function_80238268(v105, (int32_t)"mainchapter%d.arc.clz", v35);

At least it’s somewhat easier to follow now than the partially-decompiled LLVM code obtained previously.
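One way to read the call above is that function_80238268 works like sprintf, filling in the %d with a chapter number to build the archive’s file name.  That is only my guess (nothing in the decompiled code confirms it yet), and the helper name `chapter_archive_name` below is made up for illustration.  In Python, the same expansion would look like this:

```python
def chapter_archive_name(chapter: int, template: str = "mainchapter%d.arc.clz") -> str:
    """Expand the printf-style template the way a sprintf-like routine would.

    Assumption: the third argument in the decompiled call is the chapter number.
    """
    return template % chapter
```

If the guess is right, the game’s files should include one archive per chapter with names following this pattern.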

If anyone has a machine with more memory (e.g. 32GB), you could try running RetDec to decompile the binary yourself.  I’ve made a downloadable [folder] with all of the needed files, including an installation guide to get everything set up.

[Please PM me if you’d be able to help out with this]

For anyone else that feels like helping out, I’ve uploaded the decompiled source code [link].

The code is unfortunately fairly large and unoptimized.

I have versions written in both C and Python.

Please feel free to take a look and see if you can find any useful information.