RLE

From GPWiki

Files:GUITutorial_warn.gif The Game Programming Wiki has moved! Files:GUITutorial_warn.gif

The wiki is now hosted by GameDev.NET at wiki.gamedev.net. All gpwiki.org content has been moved to the new server.

However, the GPWiki forums are still active! Come say hello.

Run Length Encoding, or RLE, is a lossless compression method that reduces sequences of repeating values to a single value and the number of times to repeat it.

Example:

AAAAABBBCCCCCCCC

Can be reduced to:

A5B3C8

Obviously, this works best in datasets where there are long chains of identical data such as flat shaded images.

In an actual implementation you'd need some way to distinguish between actual values and the amount of times they have to be repeated. In the above example every letter is followed by a number - this means the format always alternates pieces of data and repeat counts. But what if you want to encode:

ABCDEBC
--->
A1B1C1D1E1B1C1

Well, it didn't really get compressed, it grew in size by a factor 2. Even images that do contain big areas of the same color often also contain more messy parts, where segments of length 1 are very common. To make this compress you'll need some way to allow single values to be inserted without a count. This creates the problem that you have to know whether the next piece of data is a count or the next value.

To do this you can create an 'escape' character. Much like the backslash in C-strings, this character is never normal data. Say we use x, an x means the following value will be followed by a count.

ABCDEBC
--->
ABCDEBC

AAAAABBBCCCCCCCC
--->
xA5xB3xC8

ABCCCCCCCDDDGGE
--->
ABxC7xD3GGE

This causes the first example to be compressed better, and second example to be compressed somewhat worse. Also note that if there are only identical values following each other (GG in the third example), it is more efficient to just put "GG" instead of "xG2" (but this also depends on the exact sizes of 'data' and 'count' values).

Some people might have noticed a problem with this x escape value... what if the actual data contains an 'x'? You can't put it in literally, because the program will think it is the start of a 'counted value' and mess up. The simplest answer is that you'll just have to always escape occurences of 'x' in the data.

ABCxDxxx
--->
ABCxx1Dxx3

External links

RLE at Wikipedia