Ode to Lempel-Ziv

I’m just reading an excellent MSc dissertation on digital forensics, and it mentions Lempel-Ziv, and I remember I wrote a good deal of things around the method a while ago. It was in 1977, that Abraham Lempel and Jacob Ziv developed the Lempel-Ziv class of adaptive dictionary data compression techniques (also known as LZ-77 coding), and are now some of the most popular compression techniques.

The LZ coding scheme is especially suited to data which has a high degree of repetition, and makes back references to these repeated parts. Typically a flag is normally used to identify coded and unencoded parts, where the flag creates back references to the repeated sequence. An example piece of text could be:

lzThis text has several repeated sequences, such as ‘ is ‘, ‘it’, ‘en’, ‘re’ and ‘ receiv’. For example the repetitive sequence ‘ recei’ (as shown by the underlined highlight), and the encoded sequence could be modified with the flag sequence #m#n where m represents the number of characters to trace back to find the character sequence and n the number of replaced characters. Thus the encoded message could become:

‘The receiver#10#3quires a#20#5pt for it. This is automatically sent wh#6#2 it #30#2#47#5ved.’

Normally a long sequence of text has many repeated words and phases, such as ‘and’, ‘there’, and so on. Note that in some cases this could lead to longer files if short sequences were replaced with codes that were longer than the actual sequence itself.

With a previous example of sport results:

Mulchester 3 Madchester 2
Smellmore Wanderers 60 Drinksome Wanderers 23

we could compress this with:

Mulchester 3 Mad#13#7 2
Smellmore Wanderers 60 Drinksome#23#1123

— I’ll add more to this later …

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s