Getting Closer to Cracking CLZ

So far, this is what I’ve been able to deduce about the CLZ files used in A(n)WL.

  • The game separates archiving and compression, as many other systems did at the time and some still do today (e.g. tar vs. tar.gz). This means each compressed .clz file contains only one file.
  • Multiple files can be grouped into an ARC file (a U8 archive) and then compressed, which is why some files end in filename.arc.clz.
  • The file format consists of:
  1. A long (4 bytes) at 0x00000000: the CLZ identifier (43 4C 5A 00, i.e. "CLZ" followed by a null byte)
  2. A long at 0x00000004 holding the size (in bytes) of the decompressed data, apparently stored as a big-endian integer (e.g. 00 53 54 90, i.e. 5,461,136 bytes, for AWL’s commonall.arc.clz)
    This is only speculation for now. I can’t confirm what this field actually holds until I successfully decompress a CLZ file.
  3. A long at 0x00000008 that is blank (00 00 00 00)
  4. A long at 0x0000000C that repeats the size from 0x00000004 (e.g. 00 53 54 90 for the file above)
  5. A single null byte at 0x00000010 (00)
  6. The compressed file data, starting at 0x00000011 (e.g. 55 AA 38 2D, the U8 archive signature, since this file contains a U8 archive)
  • The file appears to use some form of Lempel–Ziv compression, possibly combined with Huffman coding. Candidates include:
    • LZ77
    • LZ77-Huffman
    • Huf8 (Huffman only, with no Lempel–Ziv stage)
    • LZH8 (either original or nonstrict encoding)
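The header layout above is easy to check across many files with a short script. This is a minimal sketch in Python, assuming the longs are big-endian and using the offsets from the list; the names parse_clz, header_warnings and ClzHeader are hypothetical, and the field meanings are the same unconfirmed guesses. header_warnings flags any file that breaks one of the observed patterns, which could help confirm (or rule out) those guesses before any decompression works.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

CLZ_MAGIC = b"CLZ\x00"          # 43 4C 5A 00 at 0x00
U8_MAGIC = b"\x55\xAA\x38\x2D"  # U8 archive signature observed at 0x11
DATA_OFFSET = 0x11              # where the data appears to start (assumption)


@dataclass
class ClzHeader:
    size: int         # long at 0x04; presumed decompressed size (unconfirmed)
    blank: int        # long at 0x08; observed as 0
    size_repeat: int  # long at 0x0C; observed to repeat the size at 0x04
    null_byte: int    # byte at 0x10; observed as 0x00


def parse_clz(data: bytes) -> tuple[ClzHeader, bytes]:
    """Split a .clz file into its header fields and the data from 0x11 on.

    Assumes big-endian longs; the field meanings are guesses, not a spec.
    """
    if len(data) < DATA_OFFSET:
        raise ValueError("file is shorter than the 0x11-byte header")
    if data[:4] != CLZ_MAGIC:
        raise ValueError(f"not a CLZ file, magic is {data[:4].hex(' ')}")
    # Three big-endian unsigned 32-bit longs at 0x04, 0x08 and 0x0C.
    size, blank, size_repeat = struct.unpack_from(">III", data, 4)
    header = ClzHeader(size, blank, size_repeat, data[0x10])
    return header, data[DATA_OFFSET:]


def header_warnings(header: ClzHeader, payload: bytes) -> list[str]:
    """List every way a file deviates from the layout observed so far."""
    warnings = []
    if header.blank != 0:
        warnings.append("long at 0x08 is not blank")
    if header.size != header.size_repeat:
        warnings.append("size at 0x04 and 0x0C differ")
    if header.null_byte != 0:
        warnings.append("byte at 0x10 is not null")
    if not payload.startswith(U8_MAGIC):
        warnings.append("data at 0x11 does not start with the U8 signature")
    return warnings


# The first 0x15 bytes of commonall.arc.clz as described above.
sample = bytes.fromhex("434C5A00 00535490 00000000 00535490 00 55AA382D")
header, payload = parse_clz(sample)
print(header.size)                       # 5461136
print(header_warnings(header, payload))  # []
```

Running this over every .clz file in the game and collecting the warnings would show whether the repeated size at 0x0C and the blank long at 0x08 hold everywhere, or only in commonall.arc.clz.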

I’ve found a few tools [1] [2] [3] that may be able to decompress these formats, and I’ll report back once I’ve done more testing.
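One way to test the LZ77 candidate alongside those tools is a small decoder. The sketch below assumes the Nintendo GBA/DS-style LZ77 layout (often called LZ10), which is an assumption and not something the CLZ header confirms. It also assumes, purely as a hypothesis, that the null byte at 0x10 is the first flag byte of the stream: in that layout a flag byte of 00 means the next eight bytes are stored as literals, which would explain why the U8 signature 55 AA 38 2D is readable right after it. The function names lz10_decompress and decompress_clz_guess are hypothetical.

```python
def lz10_decompress(stream: bytes, out_size: int) -> bytes:
    """Decode a GBA/DS-style LZ77 ("LZ10") token stream into out_size bytes.

    Each flag byte governs the next eight tokens, most significant bit first:
      bit 0 -> copy one literal byte
      bit 1 -> two-byte back-reference b0, b1 with
               length   = (b0 >> 4) + 3
               distance = ((b0 & 0x0F) << 8 | b1) + 1
    """
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        if pos >= len(stream):
            raise ValueError("stream ended before out_size bytes were produced")
        flags = stream[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) >= out_size:
                break
            if flags & (1 << bit):
                if pos + 1 >= len(stream):
                    raise ValueError("truncated back-reference")
                b0, b1 = stream[pos], stream[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3
                distance = ((b0 & 0x0F) << 8 | b1) + 1
                if distance > len(out):
                    raise ValueError("back-reference before start of output")
                # Copy byte by byte so overlapping references repeat correctly.
                for _ in range(length):
                    out.append(out[-distance])
            else:
                if pos >= len(stream):
                    raise ValueError("truncated literal")
                out.append(stream[pos])
                pos += 1
    # A final back-reference may overshoot; trim to the expected size.
    return bytes(out[:out_size])


def decompress_clz_guess(data: bytes) -> bytes:
    """Hypothetical: treat 0x10 onward as an LZ10 stream of the size at 0x04."""
    if data[:4] != b"CLZ\x00":
        raise ValueError("missing CLZ magic")
    out_size = int.from_bytes(data[4:8], "big")  # presumed decompressed size
    return lz10_decompress(data[0x10:], out_size)


# Flag 0x10: three literals "ABC", then copy 6 bytes from 3 back.
print(lz10_decompress(bytes([0x10, 0x41, 0x42, 0x43, 0x30, 0x02]), 9))  # b'ABCABCABC'
```

If the guess is right, passing the bytes of commonall.arc.clz to decompress_clz_guess should yield exactly 5,461,136 bytes that parse as a U8 archive. A ValueError, or output that turns to garbage after the first few bytes, would rule this particular layout out and point toward the Huffman-based candidates instead.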

I feel like I’m getting closer to finally figuring out the format.
