Arithmetic compression

Contributor: KAI ROHRBACHER           

{
Hello Thomas,

On 26.06.94 you wrote in area PASCAL to subject "Arithmetic compression":
TW> But where can we get a discription of this compression method ??
  Michael  Barnsley, Lyman Hurd, "Fractal Image Compression", AK Peters,
  1993
  Mark Nelson, "The Data Compression Book", M&T Books, 1991
  Ian  Witten,  Radford  Neal,  John Cleary, "Arithmetic Coding for Data
  Compression", CACM, Vol. 30, No.6, 1987

  Below  is a small source from the 1st book, translated into Pascal and
  adopted  to  work  on  the uppercase alphabet to demonstrate the basic
  principles.
  For  a  simple  explanation, the program uses the letters of the input
  string  to "drive" the starting point through the real interval 0.0 ..
  1.0
  By  this process, every possible input string stops at a unique point,
  that  is:  a  point  (better: a small interval section) represents the
  whole  string.  To  _decode_  it, you have to reverse the process: you
  start  at  the  given  end point and apply the reverse transformation,
  noting  which intervals you are touching at your voyage throughout the
  computation.
  Due  to the restricted arithmetic resolution of any computer language,
  the  max.  length of a string will be restricted, too (try it out with
  TYPE   REAL=EXTENDED,  for  example);  this  happens  when  the  value
  "underflows" the computers precision. }

{$A+,B-,D+,E+,F-,G-,I+,L+,N+,O-,P+,Q-,R+,S+,T-,V+,X+,Y+}
{$M 16384,0,655360}
PROGRAM arithmeticCompression;
USES CRT;
CONST charSet:STRING='ABCDEFGHIJKLMNOPQRSTUVWXYZ ';
      size=27; {=Length(charSet)}
      p:ARRAY[1..size] OF REAL=  (* found empirically *)
       (
        6.1858296469E-02,
        1.1055412402E-02,
        2.6991022453E-02,
        2.6030374520E-02,
        9.2418577127E-02,
        2.1864028512E-02,
        1.4977615842E-02,
        2.8410764564E-02,
        5.5247871050E-02,
        1.3985123226E-03,
        3.8001321554E-03,
        3.2593032914E-02,
        2.1919756707E-02,
        5.2434924064E-02,
        5.7837905257E-02,
        2.0364674693E-02,
        1.0031075103E-03,
        4.9730779744E-02,
        4.8056280170E-02,
        7.2072478498E-02,
        2.0948493879E-02,
        8.2477728625E-03,
        1.0299101184E-02,
        4.7873173243E-03,
        1.3613601926E-02,
        2.7067980437E-03,
        2.3933136781E-01
       );
VAR   psum:ARRAY[1..size] OF REAL;

 FUNCTION Encode(CONST s:STRING):REAL;
 VAR i,po:INTEGER;
     offset,len:REAL;
 BEGIN
  offset:=0.0;
  len:=1.0;
  FOR i:=1 TO Length(s) DO
   BEGIN
    po:=POS(s[i],charSet);
    IF po<>0
     THEN BEGIN
           offset:=offset+len*psum[po];
           len:=len*p[po]
          END
     ELSE BEGIN
           WRITELN('only input chars ',charSet,' allowed!');
           Halt(1)
          END;
   END;
  Encode:=offset+len/2;
 END;

 FUNCTION Decode(x:REAL; n:BYTE):STRING;
 VAR i,j:INTEGER;
     s:STRING;
 BEGIN
  IF (x<0.0) OR (x>1.0)
   THEN BEGIN
         WRITELN('must lie in the range [0..1]');
         Halt(1)
        END;
  FOR i:=1 TO n DO
   BEGIN
    j:=size;
    WHILE x