CLZ Decompression / Recompression

Sukharah’s CLZ Compression Tool

Recently, a wonderful soul who goes by Sukharah managed to reverse-engineer the CLZ compression format used by A Wonderful Life (AWL) and Another Wonderful Life (AnWL).

They’ve used this knowledge to create a CLZ Compression Tool, which can both decompress CLZ files and recompress them.

This is beyond amazing and opens up the project to begin work on models, animations, textures, and so much more.

Compiling / Downloading the Tool

The tool is meant to be compiled using a C++ compiler (e.g. g++). You can follow the instructions on Sukharah’s GitHub to compile it yourself.

Precompiled Version

For those not experienced in compiling programs, I’ve compiled a version that can be downloaded from [here]. I’ll be posting any newly compiled versions on the HMAPL Discord as Sukharah continues to develop the tool.

Note that this precompiled build will only work on Windows 10 and up, since that’s what I used to compile it.

Using the Tool

The utility is fairly straightforward.

I found the easiest method was to put your desired CLZ file in the same directory as CLZ.exe.

Then open a command prompt in that directory (e.g. type “cmd” in the Windows Explorer address bar) and run the command below. The first line shows the general form, the second a concrete example:

clz unpack input.file.clz output.file
clz unpack mainchapter0.arc.clz mainchapter0.arc

In the above example, I decompressed mainchapter0.arc.clz into its native arc format.
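If you have a whole folder of CLZ files, the same command can be generated for each one. Here is a minimal Python sketch, assuming the executable can be run as "clz" from the current directory and that the output name is just the input name with ".clz" dropped. The helper names (unpack_command, unpack_all) are my own and are not part of Sukharah’s tool.

```python
import subprocess
from pathlib import Path


def unpack_command(clz_path, exe="clz"):
    """Build the 'clz unpack <input> <output>' command for one file."""
    clz_path = Path(clz_path)
    if clz_path.suffix.lower() != ".clz":
        raise ValueError(f"not a .clz file: {clz_path}")
    # mainchapter0.arc.clz -> mainchapter0.arc
    output = clz_path.with_suffix("")
    return [exe, "unpack", str(clz_path), str(output)]


def unpack_all(folder=".", exe="clz", dry_run=True):
    """Unpack every .clz file in a folder.

    With dry_run=True (the default) the commands are only printed,
    so you can check them before anything is executed.
    """
    for clz_path in sorted(Path(folder).glob("*.clz")):
        cmd = unpack_command(clz_path, exe)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling unpack_all(".") prints the commands it would run; pass dry_run=False once they look right.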

Quirks of Decompressed U8 Archives

The decompressed U8 archives (.arc files) seem to display improperly when viewed in BrawlBox (my former go-to arc tool).

Data sequence mismatching, resulting in seemingly duplicate data

As we can see from the above screenshot, the contents of boy_0.arc (embedded in mainchapter0.arc) are displayed before the actual boy_0.arc descriptor. This can cause mismatch/corruption issues when attempting to edit the files using BrawlBox.

After a bit of trial and error, I’ve found the best tool to work with these decompressed U8 archives is Wexos Toolbox.

No more weird data sequence mismatches

Now that we’re able to decompress CLZ into ARC and modify these ARC files using the appropriate tools, we can play around with models, textures, and so much more.

Recompressing CLZ Files

Decompressing is only half the job: the files need to be recompressed back into CLZ archives for any hope of the game reading them.

Fortunately, Sukharah built this functionality into their tool as well.

Unfortunately, I had some issues using the “pack” function (the output files would crash when loaded into AWL), although I believe this issue has been resolved in a recent revision of the tool.

So I ended up using the memory-optimized pack method, pack2, instead.

clz pack2 input.file output.file.clz
clz pack2 mainchapter0.arc mainchapter0.arc.clz

Files compressed using the pack2 function worked perfectly, albeit with slightly longer load times than the original AWL files.
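One way to sanity-check a repacked file before loading it into the game is to unpack it again and compare the result with the archive you packed. A rough Python sketch of that idea follows. It assumes the tool is run as "clz", and the helper names (pack2_command, round_trip_ok) are hypothetical, not something the tool provides.

```python
import filecmp
from pathlib import Path


def pack2_command(arc_path, exe="clz"):
    """Build the 'clz pack2 <input> <output>' command for one archive."""
    arc_path = Path(arc_path)
    # mainchapter0.arc -> mainchapter0.arc.clz
    return [exe, "pack2", str(arc_path), str(arc_path) + ".clz"]


def round_trip_ok(original_arc, reunpacked_arc):
    """Return True if the two files are byte-for-byte identical.

    original_arc is the archive that was packed; reunpacked_arc is the
    result of unpacking the freshly packed .clz into a separate file.
    """
    return filecmp.cmp(str(original_arc), str(reunpacked_arc), shallow=False)
```

If round_trip_ok returns False, the pack or unpack step changed the data, which would be worth checking before blaming the game.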

A Brief Rundown of CLZ Files Thus Far

CLZ-compressed files remain one of the biggest hurdles for the Proud Life mod.

For this reason, I’ve decided to document everything currently known about these files.

Archives vs. Compression

The easiest way to understand how the game handles compression is to understand the difference between “archiving” and “compression”.

Most compression tools on Windows do both of these in the same step (e.g. .zip files), but they are separate operations.

Archiving is the act of taking multiple files and putting them into one file.

Compression is when this “archive” is then shrunk to save space.

Both A Wonderful Life and Another Wonderful Life separate these two steps by creating archives (in the form of U8 archives, or .arc files), then compressing them (using CLZ compression).

Note how the U8 Archive is the same size as its parts (5+2+7 = 14), and it’s the CLZ file that’s actually smaller in size

Essentially, we need to figure out how to undo the compression step, do whatever we want to the uncompressed data, then redo the compression step.
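The two steps can be illustrated with a few lines of Python. This is only an analogy: zlib stands in for the compression step and is not the CLZ algorithm, plain concatenation stands in for the archive (a real U8 archive also stores a file table), and the file names and contents are made up.

```python
import zlib

# Three made-up "files" of 5000, 2000 and 7000 bytes,
# echoing the 5 + 2 + 7 = 14 figure above.
files = {
    "wheat.bin": b"wheat" * 1000,
    "ox.bin": b"ox" * 1000,
    "chicken.bin": b"chicken" * 1000,
}

# Step 1, archiving: bundle the files into one blob. Nothing gets smaller.
archive = b"".join(files.values())

# Step 2, compression: shrink the single archive blob.
compressed = zlib.compress(archive)

# Undoing step 2 gives back the archive exactly, ready to be edited.
restored = zlib.decompress(compressed)

print(len(archive))                    # 14000, the sum of the parts
print(len(compressed) < len(archive))  # True: only this step saves space
print(restored == archive)             # True: compression is lossless
```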

File Header

All CLZ files will follow a similar format.

  • File header (“CLZ”)
  • The size of the uncompressed data in bytes, stored as a hex value (twice). This is useful, since we can use it to confirm whether an attempt at decompressing a CLZ file was successful.
  • Compressed data

In this example, an uncompressed mainchapter0.arc file should be 6.4MB
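The size check can be automated. The sketch below assumes the two size fields sit at offsets 0x04 and 0x0C and are stored big endian, as described in the older notes further down this page. The function names are my own.

```python
import struct
from pathlib import Path


def header_sizes(clz_bytes):
    """Return the two size fields read from offsets 0x04 and 0x0C."""
    if clz_bytes[:3] != b"CLZ":
        raise ValueError("missing CLZ identifier")
    (first,) = struct.unpack_from(">I", clz_bytes, 0x04)   # big endian
    (second,) = struct.unpack_from(">I", clz_bytes, 0x0C)  # repeated value
    return first, second


def unpack_size_matches(clz_path, unpacked_path):
    """Check that an unpacked file has the size its CLZ header promised."""
    with open(clz_path, "rb") as f:
        first, second = header_sizes(f.read(0x10))
    actual = Path(unpacked_path).stat().st_size
    return first == second == actual
```

For example, unpack_size_matches("mainchapter0.arc.clz", "mainchapter0.arc") should return True after a successful unpack.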

Compression Algorithm

While I haven’t been able to find a match for the compression used, the file extension suggests it could be a Custom Lempel-Ziv (CLZ) algorithm.

The files use big-endian byte order.

They seem to use some sort of dynamic dictionary generation, which is most visible when viewing the chapter arc.clz files. The compressed data shows the first occurrence of a pattern, but not subsequent occurrences. In this case _0.arc is listed for “boy0_0.arc” but not for the files afterwards.

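Note _0.arc only appears for the first entry

This is what you would expect if the dictionary is built up dynamically as the data is processed, with later occurrences replaced by references back to the first one.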
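To show why a repeated pattern would only be visible once, here is a toy Lempel-Ziv style decoder in Python. It is a generic illustration of back-references and is not the actual CLZ bitstream, which I haven’t worked out. The token layout and the second file name (dog_0.arc) are invented for the example.

```python
def lz_decode(tokens):
    """Decode a list of toy tokens.

    A token is either a bytes object (literal data, stored as-is) or a
    (distance, length) tuple meaning "copy `length` bytes starting
    `distance` bytes back in the output produced so far".
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, bytes):
            out += token
        else:
            distance, length = token
            start = len(out) - distance
            # Copy byte by byte so overlapping references also work.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


# "_0.arc" is only stored literally the first time. The second name
# stores "dog" and then points 10 bytes back to reuse "_0.arc\0".
tokens = [b"boy_0.arc\x00", b"dog", (10, 7)]
print(lz_decode(tokens))  # b'boy_0.arc\x00dog_0.arc\x00'
```

Looking at the token list in a hex editor, you would see "_0.arc" once, followed by a short reference where the second copy should be.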

Comparing Compressed / Decompressed Files

Certain data on the disc can be used to see both compressed and decompressed versions of some files.

The disc:\test\Scripts folder contains numerous script (.sb) files.

Of note, this folder also contains a U8 archive, test.arc.

While this U8 archive is formatted in such a way that you can’t directly see the filenames, they can be viewed in a hex editor.

In this case, we can see that the first file in the U8 archive is a compressed CLZ file that, when decompressed, would yield an sb script 0x2a79 bytes in size.

By comparing the file size in the CLZ header, we can match up these CLZ files with the uncompressed sb files.

In this case, we can find that the first compressed “@” CLZ file in test.arc actually matches up with the uncompressed 0000_Animal_Navigation.sb (0x2a79 bytes).
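The matching can be done with a short script rather than by hand. The sketch below assumes the CLZ entries have already been pulled out of test.arc as raw bytes (for example with a hex editor) and that the size sits at offset 0x04 in big-endian order. The function names and the dictionary layout are my own choices.

```python
import struct
from pathlib import Path


def clz_declared_size(blob):
    """Return the uncompressed size stored in a CLZ header, or None."""
    if blob[:3] != b"CLZ" or len(blob) < 8:
        return None
    return struct.unpack_from(">I", blob, 0x04)[0]


def match_by_size(clz_blobs, script_dir):
    """Map each CLZ entry name to the .sb files with the same size.

    clz_blobs is a dict of {entry name: raw CLZ bytes}. A list is returned
    per entry because two scripts could share the same size.
    """
    by_size = {}
    for sb in Path(script_dir).glob("*.sb"):
        by_size.setdefault(sb.stat().st_size, []).append(sb.name)

    matches = {}
    for name, blob in clz_blobs.items():
        size = clz_declared_size(blob)
        matches[name] = sorted(by_size.get(size, []))
    return matches
```

An entry with exactly one candidate is a confident match. Entries with several candidates need a closer look, since size alone can’t tell them apart.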

I’ve managed to match up the compressed/uncompressed files (and have renamed them accordingly). They can be found [here] for comparison/research purposes.

The Search for Help

I’ve tried searching on various forums (e.g. ZenHAX, XeNTaX, and Romhacking.net), but haven’t really come up with much.

If anyone has any ideas on how to tackle this compression algorithm, it would be greatly appreciated.

CLZ Compression Help

I have some (potential) good news.

While doing some work on the mod, I recently came across a forum of users that specialize in game research, including file identification and extraction.

It also seems like I’m not the only one who’s come across this, as there was already a thread with someone looking into A Wonderful Life’s CLZ files.

I’ve since posted on that thread and will hopefully be able to get some further help with compressing/decompressing CLZ files.

I’ll update as I get more info.

Getting Closer to Cracking CLZ

So far, this is what I’ve been able to deduce about the CLZ files used in A(n)WL.

  • The game separates archiving and compression, as many systems did at the time and some still do today (e.g. tar vs. tar.gz). This means a compressed CLZ file only ever contains one file.
  • Multiple files can be grouped into an ARC file (U8 Archive) and then compressed. This is why some files end in filename.arc.clz.
  • The file format consists of:
    1. A long (4 bytes) at 0x00000000 which is the CLZ identifier (i.e. 43 4C 5A 00)
    2. A long at 0x00000004 holding the size (in bytes) of the decompressed data, in hex (e.g. 00 53 54 90 for AWL’s commonall.arc.clz)
       Currently this is only speculation. I am unable to confirm that this is what this value actually is until I successfully decompress a CLZ file.
    3. A long at 0x00000008 with blank space (i.e. 00 00 00 00)
    4. A repeat long at 0x0000000C of the size in bytes, in hex (e.g. 00 53 54 90 for the above file)
    5. One null byte at 0x00000010 (i.e. 00)
    6. The compressed file data starting at 0x00000011 (e.g. 55 AA 38 2D, as this file contains a U8 Archive)
  • The file uses some form of Lempel–Ziv compression, possibly one of the following:
    • LZ77
    • LZ77-Huffman
    • Huf8
    • LZH8 (either original or nonstrict encoding)
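The header layout listed above can be written down as a small Python parser. This is a sketch of my current understanding only: the field names are my own labels, the meaning of the size fields is still unconfirmed, and the byte at 0x10 may turn out to belong to the compressed stream rather than the header.

```python
import struct
from typing import NamedTuple


class ClzHeader(NamedTuple):
    magic: bytes       # 0x00: 43 4C 5A 00 ("CLZ" plus a null byte)
    size: int          # 0x04: presumed decompressed size in bytes
    reserved: int      # 0x08: blank, 00 00 00 00 in the files I've seen
    size_repeat: int   # 0x0C: the same size again
    flag: int          # 0x10: single null byte
    data_offset: int   # 0x11: where the compressed data starts


def parse_header(blob):
    """Split the first 0x11 bytes of a CLZ file into named fields."""
    if len(blob) < 0x11:
        raise ValueError("too short to be a CLZ file")
    # ">" means big endian with no padding: 4 + 4 + 4 + 4 + 1 = 17 bytes.
    magic, size, reserved, size_repeat, flag = struct.unpack_from(">4sIIIB", blob, 0)
    if magic != b"CLZ\x00":
        raise ValueError("missing CLZ identifier")
    return ClzHeader(magic, size, reserved, size_repeat, flag, 0x11)


# The example bytes from the list above (commonall.arc.clz).
sample = bytes.fromhex("434C5A00 00535490 00000000 00535490 00 55AA382D")
header = parse_header(sample)
print(hex(header.size), header.size == header.size_repeat)
print(sample[header.data_offset:header.data_offset + 4].hex())  # 55aa382d
```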

I’ve found a few possible tools [1] [2] [3] for decompressing these algorithms and will report back once I have done some further testing.

I feel like I’m getting closer to finally figuring out the format.