Getting Closer to Cracking CLZ

So far, this is what I’ve been able to deduce about the CLZ files used in A(n)WL.

  • The game separates archiving from compression, as many systems did at the time and some still do today (e.g. tar vs. tar.gz), so each compressed CLZ file contains only one file.
  • Multiple files can be grouped into an ARC file (a U8 archive) and then compressed, which is why some files end in filename.arc.clz.
  • The file format consists of:
  1. A long (4 bytes) at 0x00000000 holding the CLZ identifier (43 4C 5A 00, i.e. "CLZ" followed by a null byte)
  2. A long at 0x00000004 holding the size (in bytes) of the decompressed data, stored big-endian (e.g. 00 53 54 90, or 5,461,136 bytes, for AWL’s commonall.arc.clz)
    This is currently only speculation. I can’t confirm what this field actually holds until I successfully decompress a CLZ file.
  3. A long at 0x00000008 that is zeroed (i.e. 00 00 00 00)
  4. A long at 0x0000000C that repeats the size from 0x00000004 (e.g. 00 53 54 90 for the file above)
  5. A single null byte at 0x00000010 (i.e. 00)
  6. The compressed file data, starting at 0x00000011 (e.g. 55 AA 38 2D, the U8 archive signature, since this file contains a U8 archive)
  • The file appears to use some form of Lempel–Ziv compression, possibly one of the following:
    • LZ77
    • LZ77-Huffman
    • Huf8
    • LZH8 (either original or nonstrict encoding)
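To make poking at these files less tedious, the header layout above can be turned into a small parser. This is a minimal Python sketch, assuming the fields are big-endian (which is how the size bytes read naturally, and matches the GameCube's big-endian PowerPC CPU) and that the payload really starts at 0x11. The names `parse_clz`, `ClzHeader`, and the field names are my own labels, not anything taken from the game.

```python
import struct
from dataclasses import dataclass

CLZ_MAGIC = b"CLZ\x00"          # 43 4C 5A 00
U8_MAGIC = b"\x55\xAA\x38\x2D"  # signature at the start of a U8 archive
HEADER_SIZE = 0x11              # assumption: payload begins right after the null byte at 0x10


@dataclass
class ClzHeader:
    size: int         # long at 0x04, assumed to be the decompressed size
    reserved: int     # long at 0x08, observed as zero
    size_repeat: int  # long at 0x0C, observed to repeat the value at 0x04
    pad: int          # byte at 0x10, observed as zero


def parse_clz(data: bytes) -> tuple[ClzHeader, bytes]:
    """Split a .clz file into its header fields and the remaining payload."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a CLZ header")
    if data[:4] != CLZ_MAGIC:
        raise ValueError(f"bad magic: {data[:4].hex(' ')}")
    # '>' = big-endian, no padding: three 32-bit longs and one byte, 0x04..0x10.
    size, reserved, size_repeat, pad = struct.unpack_from(">IIIB", data, 4)
    return ClzHeader(size, reserved, size_repeat, pad), data[HEADER_SIZE:]
```

Run on commonall.arc.clz, this should report 5,461,136 (0x00535490) in both size fields and a payload starting with `U8_MAGIC`. Checking that every .clz file in the game parses cleanly is a quick way to see whether the layout holds beyond a single sample.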

I’ve found a few possible tools [1] [2] [3] for decompressing data in these formats and will report back once I have done some further testing.
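One candidate is cheap enough to test by hand. A hypothesis worth checking (unconfirmed): the 00 byte at 0x10 may not be padding but the first flag byte of an LZSS-style stream. In Nintendo's LZ10 format (the "LZ77" used by the GBA and DS BIOS decompression routines), each flag byte describes the next eight tokens, and a flag of 00 means all eight are plain literals, which would explain why the U8 signature 55 AA 38 2D appears uncompressed right after it. The sketch below decodes a stream using LZ10's token layout; the function name and the assumption that CLZ uses this layout at all are mine.

```python
def lz10_style_decode(stream: bytes, out_size: int) -> bytes:
    """Decode an LZSS stream using Nintendo LZ10's token layout (assumed, not confirmed for CLZ).

    Each flag byte covers the next 8 tokens, most significant bit first:
      bit = 0 -> copy one literal byte from the stream
      bit = 1 -> read two bytes: high nibble = length - 3, low 12 bits = distance - 1
    Raises IndexError if the stream runs out before out_size bytes are produced.
    """
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        flags = stream[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) >= out_size:
                break
            if flags & (1 << bit):
                b0, b1 = stream[pos], stream[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3
                distance = (((b0 & 0x0F) << 8) | b1) + 1
                start = len(out) - distance
                if start < 0:
                    raise ValueError(f"back-reference before start of output at {pos - 2:#x}")
                # Copy one byte at a time so overlapping references (distance < length) work.
                for i in range(length):
                    out.append(out[start + i])
            else:
                out.append(stream[pos])
                pos += 1
    # A final back-reference may overshoot the expected size; trim to it.
    return bytes(out[:out_size])
```

To try it on a real file, pass everything from offset 0x10 onward plus the size field, e.g. `lz10_style_decode(data[0x10:], int.from_bytes(data[4:8], "big"))`. If the output is the right length and starts with a sane U8 header, this variant is worth pursuing; if it raises or produces garbage, it can be crossed off the list.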

I feel like I’m getting closer to finally figuring out the format.

More CLZ Research

After someone mentioned to me that there were a few CLZ files embedded in the PS2 version of AWL, I decided to take a look. Upon extracting the game data, I found they were right.

Not only that, but one of the files (mainchapter0) appears to exist in both .arc and .arc.clz form, possibly left over from porting the GameCube assets. This is useful because it gives a known input and output for the compression: the same data before and after it was compressed.
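Having the same data in both forms means the size theory can be tested without decompressing anything. The sketch below assumes the two PS2 files really hold the same data and that the header layout described above holds (big-endian sizes, payload at 0x11); `check_known_pair` is a hypothetical helper name.

```python
import struct


def check_known_pair(clz: bytes, arc: bytes) -> dict:
    """Compare a .clz file against the uncompressed .arc it is expected to expand to."""
    if len(clz) < 0x11:
        raise ValueError("file too short for a CLZ header")
    size_field = struct.unpack_from(">I", clz, 0x04)[0]
    size_repeat = struct.unpack_from(">I", clz, 0x0C)[0]
    payload = clz[0x11:]
    # Count how many leading payload bytes match the plain archive verbatim.
    limit = min(len(payload), len(arc))
    prefix = 0
    while prefix < limit and payload[prefix] == arc[prefix]:
        prefix += 1
    return {
        "size_field_matches": size_field == len(arc),
        "repeat_matches": size_repeat == size_field,
        "verbatim_prefix": prefix,
        "ratio": len(clz) / len(arc) if arc else float("nan"),
    }
```

Called as `check_known_pair(open("mainchapter0.arc.clz", "rb").read(), open("mainchapter0.arc", "rb").read())` (file names assumed from the extracted PS2 data), a `True` for `size_field_matches` would all but confirm that the long at 0x04 is the decompressed size. The verbatim prefix is also telling: if the payload copies the archive for a short, fixed stretch and then diverges, that point is likely where the compressor's first control byte or back-reference begins.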

Right now my working theory is that CLZ files are non-standard single-file archives using some form of Lempel–Ziv compression (e.g. LZW, LZ77, or LZH).

It’ll take more testing to see whether my theory holds and whether I can find a way to decompress and recompress the files.
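Recompressing will also mean writing a valid header in front of whatever the compressor produces. Assuming the header layout described earlier is complete (magic, size, zeroed long, repeated size, one null byte, all big-endian), rebuilding it is straightforward; `build_clz_header` and `wrap_clz` are hypothetical names, and the compressor itself remains the unknown part.

```python
import struct

CLZ_MAGIC = b"CLZ\x00"


def build_clz_header(decompressed_size: int) -> bytes:
    """Rebuild the 0x11-byte header layout observed in existing .clz files.

    Assumes big-endian fields, a zeroed long at 0x08 and a single zero byte at 0x10.
    """
    if not 0 <= decompressed_size <= 0xFFFFFFFF:
        raise ValueError("size must fit in 32 bits")
    return CLZ_MAGIC + struct.pack(">IIIB", decompressed_size, 0, decompressed_size, 0)


def wrap_clz(decompressed_size: int, compressed_payload: bytes) -> bytes:
    """Prepend a header to a payload produced by whichever compressor turns out to be right."""
    return build_clz_header(decompressed_size) + compressed_payload
```

A useful sanity check is that `build_clz_header(0x00535490)` reproduces the first 0x11 bytes of commonall.arc.clz exactly; any mismatch would point to a header field that isn't understood yet.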