Delving into Arithmetic Coding

Arithmetic coding is a sophisticated form of entropy encoding used in lossless data compression. Unlike Huffman coding, which assigns each symbol a codeword with a whole number of bits, arithmetic coding effectively spends a fractional number of bits per symbol. This often yields better compression, especially for sources with a small alphabet or highly skewed probabilities, where rounding each symbol's cost up to a whole bit wastes the most space.
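To make the fractional-bit claim concrete, the short Python sketch below compares the Shannon entropy (the average number of bits per symbol an ideal coder can approach) with the one-bit minimum of symbol-by-symbol Huffman coding. The source is hypothetical: a two-symbol alphabet where one symbol has probability 0.9. The function name `entropy_bits` is illustrative only.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the sum of -p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical skewed two-symbol source.
probs = [0.9, 0.1]

# With only two symbols, each Huffman codeword needs at least 1 bit,
# so symbol-by-symbol Huffman coding averages exactly 1 bit/symbol.
huffman_avg = 1.0

ideal = entropy_bits(probs)
print(f"entropy:         {ideal:.3f} bits/symbol")        # 0.469
print(f"Huffman average: {huffman_avg:.3f} bits/symbol")  # 1.000
```

On long messages arithmetic coding can approach the roughly 0.469 bits/symbol bound, while symbol-by-symbol Huffman coding cannot go below 1 bit/symbol for this source (unless symbols are grouped into blocks).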

Figure: Visualizing the probability range division in arithmetic coding.

How Arithmetic Coding Works

The core idea of arithmetic coding is to represent an entire message (or sequence of symbols) as a single fraction, a number in the interval [0, 1). As the message becomes longer, the interval representing it becomes smaller, and the number of bits needed to specify that interval grows.
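The link between message length, interval width, and bit count can be written out as a standard information-theoretic approximation. Under a fixed model, each symbol scales the interval width by that symbol's probability. The notation ($s_i$ for the $i$-th symbol, $p(\cdot)$ for the model probability, $w_n$ for the width after $n$ symbols) and the example probabilities are introduced here for illustration.

```latex
% s_1 ... s_n: the message; p(s_i): model probability of symbol s_i
w_n = \prod_{i=1}^{n} p(s_i),
\qquad
\text{bits} \approx -\log_2 w_n = \sum_{i=1}^{n} -\log_2 p(s_i)

% Hypothetical example: p(A) = 0.8, p(B) = 0.2, message "AAB"
w_3 = 0.8 \times 0.8 \times 0.2 = 0.128,
\qquad
-\log_2 0.128 \approx 2.97 \text{ bits}
```

Each frequent A contributes only about 0.32 bits, while the rarer B contributes about 2.32 bits. This per-symbol cost of $-\log_2 p$ is what "fractional bits" refers to.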

Here's a simplified view of the process:

  1. Model Probabilities: First, you need a model that provides the probabilities of the symbols in the input data. This could be a fixed model or an adaptive one that updates probabilities as it processes the data.
  2. Interval Division: Start with the range [0, 1). For each symbol in the input sequence, this range is narrowed based on the symbol's probability. The current range is subdivided into sub-intervals, with the size of each sub-interval proportional to the probability of the corresponding symbol.
  3. Select Sub-interval: The sub-interval corresponding to the next symbol in the input sequence becomes the new current range.
  4. Repeat: This process is repeated for all symbols in the message.
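  5. Final Code: Any number inside the final, very small interval identifies the whole message. In practice, the encoder transmits only enough bits to specify one such number, roughly -log2 of the interval's width. Because the decoder also needs to know where the message ends, the message length or a dedicated end-of-message symbol is normally conveyed as well.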
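The steps above can be sketched in Python as follows. This is a minimal illustration, not a production coder. It assumes a hypothetical fixed three-symbol model (`MODEL`) and uses exact `fractions.Fraction` arithmetic so the shrinking interval is never rounded. The helper names (`cumulative_ranges`, `encode`, `shortest_bits`, `decode`) are illustrative, and the decoder is simply told the message length. Practical coders typically use fixed-precision integer arithmetic with incremental bit output instead.

```python
import math
from fractions import Fraction

# Hypothetical fixed model: symbol -> probability (probabilities sum to 1).
MODEL = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}


def cumulative_ranges(model):
    """Give each symbol its own [low, high) slice of [0, 1), in dict order."""
    ranges, low = {}, Fraction(0)
    for sym, p in model.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode(message, model):
    """Narrow [0, 1) once per symbol and return the final [low, high)."""
    ranges = cumulative_ranges(model)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = ranges[sym]
        # The symbol's slice, scaled into the current range, becomes the new range.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high


def shortest_bits(low, high):
    """Fewest bits b1..bk such that the binary fraction 0.b1..bk lies in [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2**k)  # smallest multiple of 2**-k that is >= low
        if Fraction(m, 2**k) < high:
            return format(m, f"0{k}b")
        k += 1


def decode(value, length, model):
    """Recover `length` symbols from a number inside the final interval."""
    ranges = cumulative_ranges(model)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        for sym, (sym_low, sym_high) in ranges.items():
            lo, hi = low + width * sym_low, low + width * sym_high
            if lo <= value < hi:  # value falls in this symbol's slice
                out.append(sym)
                low, high = lo, hi
                break
    return "".join(out)


low, high = encode("ABAC", MODEL)
bits = shortest_bits(low, high)
print(low, high)  # 19/64 5/16
print(bits)       # 010011
print(decode(Fraction(int(bits, 2), 2**len(bits)), 4, MODEL))  # ABAC
```

The final interval for "ABAC" has width 1/64, the product of the four symbol probabilities, so six bits suffice here, matching -log2(1/64) = 6.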

Advantages of Arithmetic Coding

Key Differences from Huffman Coding

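While both are entropy coders, Huffman coding assigns each symbol a fixed, prefix-free codeword whose length is a whole number of bits (e.g., 0, 10, 110). Arithmetic coding, on the other hand, effectively allocates a fractional number of bits to each symbol according to its probability, and represents the entire input sequence as a single arbitrary-precision fraction rather than as a concatenation of per-symbol codewords. Practical implementations compute this fraction with fixed-precision integer arithmetic rather than floating point.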
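To illustrate the integer-versus-fractional distinction, the sketch below builds Huffman code lengths with Python's standard `heapq` module for a hypothetical three-symbol source. It prints them next to each symbol's ideal cost of -log2(p) bits, which arithmetic coding can approach on long messages. The function name `huffman_code_lengths` and the probabilities are assumptions made for this example.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return each symbol's Huffman codeword length in whole bits."""
    # Heap entries: (subtree probability, unique tiebreak id, symbols in subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging the two least probable subtrees adds one bit to every codeword in them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


probs = {"A": 0.7, "B": 0.2, "C": 0.1}  # hypothetical source
lengths = huffman_code_lengths(probs)
for sym, p in probs.items():
    print(f"{sym}: Huffman {lengths[sym]} bits, ideal {-math.log2(p):.3f} bits")

huffman_avg = sum(p * lengths[s] for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"average: Huffman {huffman_avg:.3f}, entropy {entropy:.3f} bits/symbol")
```

For this source Huffman averages 1.3 bits/symbol against an entropy of about 1.157 bits/symbol. The frequent symbol A costs a whole bit instead of about 0.515 bits, because every codeword length is rounded to an integer. Arithmetic coding charges each symbol its -log2(p) cost directly, so on long messages it can close most of that gap.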

Disadvantages of Arithmetic Coding

Prominent Use Cases

Despite its complexities, arithmetic coding is used in several standards due to its efficiency:

For a more in-depth mathematical treatment, you can refer to resources like the Wikipedia page on Arithmetic Coding.