Ghidra, a potentially useful reverse-engineering tool

In the past, I’ve tried to decompile the game’s code using RetDec.

While I was successful in translating the game’s binary into some pseudo-code, I couldn’t really do anything with it.

Recently, the NSA declassified and released a powerful reverse-engineering tool called Ghidra.

In addition to that, someone created a language definition for Ghidra, containing specific instructions for the GameCube’s Gekko processor.

How to Import AWL code into Ghidra

Extracting the Executable From the Game

To analyze AWL (or AnWL) code with Ghidra, you’ll need to extract the game’s main executable.

This can be named Start.dol, boot.dol, or main.dol, depending on your tool of choice.

The executable can be extracted using Dolphin by right-clicking on your ISO and selecting “Properties.”

Then, in the Filesystem tab, right-click the Disc and select “Extract System Data.”

You’ll find the extracted executable named main.dol in your chosen folder.

Converting the Executable to a Better Format

GameCube executables are typically in the DOL format. While this can, in theory, be analyzed directly, it’s much easier for Ghidra (and other tools) to analyze an ELF-format file.

To do this, we can use a tool aptly called DolTool.

Simply copy your main.dol file into the DolTool directory and open up a command prompt in the folder. Then run the following.

DolTool.exe -e main.dol
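Before converting, it can help to confirm that the extracted file really looks like a DOL. The sketch below is not part of DolTool. It assumes the commonly documented DOL layout: a 0x100-byte header of big-endian 32-bit fields holding 7 text and 11 data section slots (file offsets, load addresses, sizes), followed by the BSS address, BSS size, and entry point. The function name parse_dol_header is my own.

```python
import struct

# Assumed DOL header layout (all fields are big-endian unsigned 32-bit):
#   0x00 file offsets   (7 text + 11 data slots)
#   0x48 load addresses (7 text + 11 data slots)
#   0x90 sizes          (7 text + 11 data slots)
#   0xD8 BSS address, 0xDC BSS size, 0xE0 entry point
TEXT_SECTIONS = 7
DATA_SECTIONS = 11
HEADER_SIZE = 0x100


def parse_dol_header(header: bytes) -> dict:
    """Decode the section table and entry point from a DOL header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a DOL header is 0x100 bytes long")

    total = TEXT_SECTIONS + DATA_SECTIONS
    offsets = struct.unpack_from(f">{total}I", header, 0x00)
    addresses = struct.unpack_from(f">{total}I", header, 0x48)
    sizes = struct.unpack_from(f">{total}I", header, 0x90)
    bss_address, bss_size, entry_point = struct.unpack_from(">3I", header, 0xD8)

    sections = []
    for i in range(total):
        if sizes[i] == 0:
            continue  # a size of zero marks an unused slot
        sections.append({
            "kind": "text" if i < TEXT_SECTIONS else "data",
            "offset": offsets[i],
            "address": addresses[i],
            "size": sizes[i],
        })

    return {
        "sections": sections,
        "bss_address": bss_address,
        "bss_size": bss_size,
        "entry_point": entry_point,
    }
```

To use it, read the first 0x100 bytes of main.dol in binary mode and pass them to parse_dol_header. On a GameCube executable, the load addresses and entry point would be expected to fall in the 0x80000000 range; if they don't, the file is probably not the right one.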

Importing your ELF into Ghidra

Create a new Ghidra project, then select File -> Import, and select your new main.elf file.

In the following screen, select the “Executable and Linking Format” format and the “PowerPC:BE:32:Gekko_Broadway:default” language.

Analyzing main.elf

Finally, right-click your newly imported main.elf file and select “Open in Default Tool.”

The file should then open up in CodeBrowser and prompt you to analyze it.

From here, you’re free to use the program to analyze the code and its functions.

Playing around with Ghidra, I located a bit of code that seems to load the mainchapter CLZ files.

I don’t have much experience with reverse engineering, so you’re on your own from this point forward.

Decompiling the Game

Good news!

Avast has released the source code for their Retargetable Decompiler (RetDec for short).

Why am I excited about this?

Because it can decompile PowerPC binaries (e.g. ELF files) into a higher-level programming language (C or Python).

The main executable program in any GameCube/Wii game is a DOL file (Start.dol in this case), a simpler format that can be converted to ELF.  Also, the GC/Wii are both based on the PowerPC architecture.

Most decompilers will only break a game down into what’s called assembly.  This is a human-readable form of machine code that is difficult at best to reverse-engineer.  And those that support decompiling to other programming languages typically only support the x86 architecture.

Previously, one could try RetDec online, but it had a very small limit on decompilation time (which was not enough for a file as large as the game’s main program code).  But now, I can run it on my own machine for as long as needed to run the decompilation process.

I’ve been able to successfully install RetDec and its dependencies.  I was then able to convert AWL’s Start.dol file into a Start.elf file using DolTool 0.3 and begin the decompilation process.

Unfortunately the decompilation failed part-way through due to not having enough memory (my laptop only has 4GB RAM).  My roommate offered to let me try on his laptop (which has 8GB), so once I’m able to try on that, I’ll post an update.

From the partial decompilation, I can tell that I’m on the right track though.  I’ve been able to find a few lines in the LLVM code that reference CLZ files and could be the key to their decompression algorithm.

@global_var_80298180.828 = constant [22 x i8] c"mainchapter%d.arc.clz\00"

%v4_800131f8 = call i32 @function_80238268(i32 %v2_800131f0, i32 ptrtoint ([22 x i8]* @global_var_80298180.828 to i32), i32 %v0_800131e8)

It appears that the CLZ file name is stored in a variable, which is then passed to some sort of function (possibly to be decompressed).
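One way to cross-check references like this outside the decompiler is to search the raw executable for printable strings that mention .clz. The following is a minimal sketch, not anything RetDec provides; the helper name find_ascii_strings and the minimum run length are my own choices. It reports file offsets only. Mapping an offset to a load address such as 0x80298180 would additionally require the executable's section table.

```python
import re


def find_ascii_strings(data: bytes, needle: bytes, min_len: int = 4):
    """Yield (file_offset, text) for each printable ASCII run containing needle."""
    # A "string" here is any run of at least min_len printable ASCII bytes.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        run = match.group()
        if needle in run:
            yield match.start(), run.decode("ascii")


def report(data: bytes, needle: bytes = b".clz") -> None:
    """Print every matching string with its file offset in hex."""
    for offset, text in find_ascii_strings(data, needle):
        print(f"0x{offset:08X}  {text}")
```

To try it, read the executable in binary mode and pass the bytes to report. Each line of output is a candidate file name or format string that the game code may reference.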

I’ll be analyzing this code as much as I can to try and figure out how the files work.

On a separate note, decompiling the game’s main executable could create a better understanding of the game overall and could open the door for additional future mods from other authors.


Update

I tried running the decompilation on my roommate’s 8GB RAM machine, but it still failed after running out of memory (though at a later point).

But on the bright side, I was eventually able to get the decompilation to work by turning off code optimizations with the --backend-no-opts flag in RetDec.

The main downside to this is that the code is much larger and more complicated than it would be if I had been able to run the optimizations.

function_80238268(v105, (int32_t)"mainchapter%d.arc.clz", v35);

At least it’s somewhat easier to follow now than the partially-decompiled LLVM code obtained previously.


If anyone has a machine with more memory (e.g. 32GB), you could try running RetDec to decompile the binary yourself.  I’ve made a downloadable [folder] with all of the needed files, including an installation guide to get everything set up.

[Please PM me if you’d be able to help out with this]


For anyone else that feels like helping out, I’ve uploaded the decompiled source code [link].

The code is unfortunately fairly large and unoptimized.

I have versions written in both C and Python.

Please feel free to take a look and see if you can find any useful information.

Initial Examinations of CLZ and SB Files

I’ve begun taking a closer look at the .clz files that make up a lot of both AWL and AnWL’s filesystem.

Upon examination with a hex editor, my best guess is that they’re some sort of archive.

The first three bytes of each file, which appear to be the CLZ file-type header, are consistently 0x434C5A (ASCII “CLZ”).

I suspect they’re archives because, after looking at commonall.arc.clz, I could find what looks like a U8 archive (noted by the U8 header 0x55AA382D). This is likely the commonall.arc within the commonall.arc.clz file. However, I’ve found 11 occurrences of this U8 header tag within the CLZ file, so there may well be 11 different U8 archives (as well as other files with different file headers) within the CLZ archive.

My plan at this point is to basically isolate the U8 archive code, and then save it to a separate ARC file for further analysis.
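That plan could be sketched roughly as follows. This is only an illustration under assumptions: it treats every occurrence of the U8 magic as the start of a chunk that runs to the next occurrence (or the end of the file), and it reads the three big-endian 32-bit fields that community documentation lists after the U8 magic (root node offset, header size, data offset) as a sanity check. If the CLZ container is compressed rather than a plain concatenation, the carved chunks will not be valid archives, and the header fields should make that obvious. All function names here are hypothetical.

```python
import struct

CLZ_MAGIC = bytes.fromhex("434C5A")    # ASCII "CLZ"
U8_MAGIC = bytes.fromhex("55AA382D")   # U8 archive tag


def find_all(data: bytes, magic: bytes) -> list:
    """Return every file offset at which magic occurs."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets


def read_u8_header(data: bytes, offset: int):
    """Decode the three u32 fields after the U8 magic, or None if truncated."""
    if offset + 16 > len(data):
        return None
    root_offset, header_size, data_offset = struct.unpack_from(">3I", data, offset + 4)
    return {
        "root_offset": root_offset,   # 0x20 in a typical U8 archive
        "header_size": header_size,
        "data_offset": data_offset,
    }


def carve(data: bytes, offsets: list) -> list:
    """Cut data into chunks, each starting at one offset and ending at the next."""
    bounds = list(offsets) + [len(data)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(offsets))]


def split_clz(data: bytes) -> list:
    """Check the CLZ tag, then carve out one chunk per U8 magic found."""
    if not data.startswith(CLZ_MAGIC):
        raise ValueError("file does not start with the CLZ tag")
    return carve(data, find_all(data, U8_MAGIC))
```

Each chunk returned by split_clz could then be written to its own .arc file and opened with a U8-aware tool. A root offset other than 0x20, or sizes larger than the chunk itself, would suggest the match was a coincidence or that the data is still compressed.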


I’ve also noticed a bunch of SB files in the root/test/Scripts directory of the games.

From what I’ve found online, these seem to be compiled SPC binaries (similar to those in Super Mario Sunshine).

I’m not 100% sure of what these scripts/binaries may do at this time, or if they’re even used at all (as they’re in the test folder).

It seems that an SB Decompiler exists, but now I need to figure out how to compile the decompiler. It seems to require a compiled version of the arookas library to function, but I’m not yet sure how to compile that on a Windows machine.

One step forwards, two steps back.


At this point, I’m almost considering emailing Natsume to see if I could get in touch with any of the staff who worked on the original games, in case they’d be willing to provide any insights into development using these file formats.

But that could also result in them becoming aware of the mod and sending a cease and desist if they deem appropriate. So there is an element of risk in getting them involved.

What are your thoughts?