A Brief Rundown of CLZ Files Thus Far

CLZ-Compressed files remain one of the biggest hurdles for the Proud Life mod.

For this reason, I’ve decided to document everything currently known about these files.

Archives vs. Compression

The easiest way to understand how the game handles compression is to understand the difference between “archiving” and “compression”.

Most compression tools on Windows will do both of these in the same step (e.g. .zip files).

However, archiving is the act of taking multiple files, and putting them into one file.

Compression is when this “archive” is compressed to save space.

Both A Wonderful Life and Another Wonderful Life separate these two steps by creating archives (in the form of U8 archive or .arc files), then compressing them (using clz compression).

Note how the U8 Archive is the same size as it’s parts (5+2+7 = 14) , and it’s the CLZ file that’s actually smaller in size

Essentially, we need to figure out how to undo the compression step, do whatever we want to the uncompressed data, then redo the compression step.

File Header

All CLZ files will follow a similar format.

  • File header (“CLZ”)
  • The size of uncompressed data in hex (twice). This is useful since we can use it to confirm if an attempt at uncompressing a CLZ file was successful or not.
  • Compressed data
In this example, an uncompressed mainchapter0.arc file should be 6.4MB

Compression Algorithm

While I haven’t been able to find a match for the compression used, the file extension alludes to it being a Custom Lempel-Ziv (CLZ) algorithm.

The files utilize a big endian byte encoding.

They seem to use some sort of dynamic dictionary generation. Which can be most seen when viewing the chapter arc.clz files. It’ll display the first occurrence of a pattern, but not subsequent occurrences. In this case _0.arc is listed for “boy0_0.arc” but not the files afterwards.

Note _o.arc only appears for the first entry

It seems to be generating some sort of dynamic dictionary.

Comparing Compressed / Decompressed Files

Certain data on the disc can be used to see both compressed and decompressed versions of some files.

The disc:\test\Scripts folder contains numerous script (.sb) files.

Of note, this folder also contains a U8 archive test.arc.
While this U8 archive is formatted in such a way that you can’t directly see the filenames, they can be viewed in a hex editor.

In this case, we can see that the first file in the U8 archive is a compressed CLZ file that, when decompressed, would yeild a sb script of 2a79 bytes in size.

By comparing the file size in the CLZ header, we can match up these CLZ files with the uncompressed sb files.

In this case, we can find that the first compressed “@” clz file in test.arc actually matches up with the uncompressed 0000_Animal_Navigation.sb (2a79 bytes)

I’ve managed to match up the compressed/uncompressed files (and have renamed them accordingly). They can be found [here] for comparison/research purposes.

The Search for Help

I’ve tried searching on various forums (e.g. ZenHAX, XeNTaX, and Romhacking.net), but haven’t really come up with much.

If anyone has any ideas on how to tackle this compression algorithm, it would be greatly appreciated.