Run-Length Encoding (RLE) Explained

Visualization of Run-Length Encoding compressing data sequences

Run-Length Encoding: Condensing repeating sequences.

Run-Length Encoding (RLE) is one of the simplest and oldest forms of lossless data compression. Its core idea is to reduce the size of repeating sequences of data, often called "runs." Instead of storing each individual data item in a run, RLE stores the data item and the count of its repetitions.

How Does RLE Work?

Imagine you have a sequence of data like: AAAAABBBCCDAA.

RLE would encode this as:

A repeats 5 times: Encode as 5A (or A5, depending on implementation)
B repeats 3 times: Encode as 3B
C repeats 2 times: Encode as 2C
D repeats 1 time: Encode as 1D (or just D if single occurrences aren't encoded with a count)
A repeats 2 times: Encode as 2A

So, the original sequence of 13 characters could be represented more compactly, for example, as 5A3B2C1D2A (10 characters, in this specific representation).

The actual implementation can vary. For example:

Some RLE schemes only encode runs of three or more identical characters to avoid expanding data where there are few runs (e.g., ABCDE would become 1A1B1C1D1E, which is longer).
Special markers might be used to differentiate between literal data and run-count/data pairs.

Example

Consider a small black and white bitmap image line represented by WWWWBBWWWBBBBW (W for white pixel, B for black pixel).

Using RLE, this could be encoded as:

4 White pixels: 4W
2 Black pixels: 2B
3 White pixels: 3W
4 Black pixels: 4B
1 White pixel: 1W

The encoded string: 4W2B3W4B1W. The original string had 14 characters, while the RLE version (in this format) has 10 characters.

Advantages of RLE

Simplicity: It's very easy to implement and understand.
Speed: Compression and decompression are generally very fast due to the straightforward logic.
Effective for specific data types: Works very well on data with many long runs of identical values, such as simple icons, line drawings, and some types of bitmap images (especially those with large areas of uniform color), or intermediate data in scientific computations.

Disadvantages of RLE

Inefficiency with non-repetitive data: If the data has few or no runs, RLE can actually increase the file size. For example, compressing ABCDEFG might result in 1A1B1C1D1E1F1G.
Limited compression ratios: For complex data like photographs or text documents, RLE typically achieves poor compression compared to more sophisticated algorithms like LZW or Huffman coding.

Common Use Cases

Image Compression: Early image formats like PCX and BMP (in some modes) use RLE. It's also a component in some fax machine standards (ITU-T T.4 standard).
Video Compression: Can be used for compressing sequences of identical frames or regions within frames.
Intermediate Compression Step: Sometimes RLE is used as a preliminary step before applying another, more complex compression algorithm.
Computer Graphics: Used in areas like encoding masks or simple textures.

Data Compression Explained