For some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time.

On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary-length arrays of bits (bitarray).

Some time ago I was asked to modify my LZSS implementation so that it could be used on a SEGA Genesis, which has no file system. To support that, I developed a bitwise array reading and writing library (arraystream). Arraystream is very similar to bitfile; the major difference is that it operates on arrays instead of files.

I originally wrote the bitfile library in ANSI C, because I used C for my compression algorithms. However, this library is one of the few things I've written (and I've written a lot) where I thought I could do a better job with C++, so I developed C++ implementations of my bitfile library as well. For the time being, the arraystream library is available only in C.

I've been using Python when I've needed quick hacks to do something on a PC (vs. an embedded system). After searching PyPI, the Python Package Index, I noticed there were a number of Python packages similar to my bitarray library, but nothing that provided all the functionality of my bitfile library. I had to fix that problem, so I wrote a pure Python version of my bitfile library too.

I am publishing all of these libraries under the GNU LGPL in hopes that they will be of use to other people.

The rest of this page discusses each of my libraries.

Michael Dipperstein
mdipperstein@gmail.com


Bitfile Library Overview

Implementation

Each version of the bitfile library provides a wrapper around the language's native file I/O. The ANSI C version uses the standard library's stdio file functions, and every bitfile is referenced by a structure that includes a FILE pointer.

The arraystream library uses a similar structure, replacing the FILE pointer with a pointer to an array of unsigned characters and an array index. Arraystream operations are analogous to bitfile operations in almost all respects and will not be discussed further.
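To make the array-backed idea concrete, here is a minimal Python sketch (the real arraystream library is C only). It assumes a bytes object plays the role of the array of unsigned characters and an integer plays the role of the array index; the class name ArrayBitReader and its get_bit method are hypothetical, not part of arraystream. Bits are returned most significant first, the order in which the bit-writing procedure described on this page packs them.

```python
class ArrayBitReader:
    """Hypothetical array-backed bit reader (not the arraystream API).

    A bytes object and a byte index replace the file; an 8-bit buffer and
    a bit count work the same way they do in a bitfile.
    """

    def __init__(self, data: bytes):
        self.data = data    # stands in for the array of unsigned chars
        self.index = 0      # index of the next byte to load
        self.buffer = 0     # 8-bit buffer
        self.count = 0      # unread bits remaining in the buffer

    def get_bit(self):
        """Return the next bit (0 or 1), or None at the end of the array."""
        if self.count == 0:
            if self.index >= len(self.data):
                return None
            # Load the next byte from the array instead of from a file.
            self.buffer = self.data[self.index]
            self.index += 1
            self.count = 8
        # Report the msb, then shift it out (masked to 8 bits).
        bit = (self.buffer >> 7) & 1
        self.buffer = (self.buffer << 1) & 0xFF
        self.count -= 1
        return bit


reader = ArrayBitReader(bytes([0xA5]))
print([reader.get_bit() for _ in range(9)])  # [1, 0, 1, 0, 0, 1, 0, 1, None]
```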

The C++ version of the bitfile library makes use of (but does not inherit from) the ifstream and ofstream classes. Every bitfile object contains an ifstream pointer and an ofstream pointer.

The Python version implements a class containing a Python file object.

In addition to a reference to a native file, each library includes an 8-bit buffer and a counter that tracks the number of bits in that buffer. The C and C++ versions of the bitfile library use an unsigned char for the 8-bit buffer.

Reading Bits

Reading bits from a bitfile works as follows:
Step 1. Read a byte from the underlying file and store it in the 8-bit buffer.
Step 2. Set the count of bits in the buffer to 8.
Step 3. Report the most significant bit (msb) in the buffer as the bit read.
Step 4. Shift the buffer left by one bit.
Step 5. Decrement the count of bits in the buffer.

To read an additional bit, repeat the process from Step 3. Once all bits have been read from the 8-bit buffer (the count equals 0), the process starts over from Step 1.
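The reading procedure can be sketched in Python as follows. This is a minimal illustration, not the bitfile-py API: the class name BitReader and the method get_bit are hypothetical, and any binary file object (here an io.BytesIO) stands in for the underlying file. Bits come out most significant first, which is the order the writing procedure stores them in, since each written bit enters at the lsb and is then pushed toward the msb. Because Python integers are not limited to 8 bits, the shifted buffer is masked with 0xFF to mimic an unsigned char.

```python
import io


class BitReader:
    """Hypothetical sketch of reading bits through an 8-bit buffer."""

    def __init__(self, fp):
        self.fp = fp        # underlying binary file object
        self.buffer = 0     # 8-bit buffer
        self.count = 0      # bits remaining in the buffer

    def get_bit(self):
        """Return the next bit (0 or 1), or None at end of file."""
        if self.count == 0:
            # Refill: read one byte from the file and set the count to 8.
            byte = self.fp.read(1)
            if not byte:
                return None
            self.buffer = byte[0]
            self.count = 8
        # The most significant bit is the bit read.
        bit = (self.buffer >> 7) & 1
        # Shift left; mask because Python ints are not 8 bits wide.
        self.buffer = (self.buffer << 1) & 0xFF
        # One fewer bit remains in the buffer.
        self.count -= 1
        return bit


reader = BitReader(io.BytesIO(bytes([0xA5])))
print([reader.get_bit() for _ in range(8)])  # [1, 0, 1, 0, 0, 1, 0, 1]
```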

Writing Bits

Writing bits to a bitfile works as follows:
Step 1. Left shift the 8-bit buffer by one bit.
Step 2. Set the least significant bit (lsb) of the 8-bit buffer to the value of the bit being written.
Step 3. Increment the count of bits in the 8-bit buffer.

Repeat the process from Step 1 for each additional bit. Once 8 bits have been written to the 8-bit buffer, the buffer is written to the underlying file and the bit count is set to 0.
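A matching sketch of the writing procedure, under the same kind of assumptions: the BitWriter class and its put_bit and flush methods are hypothetical, and any binary file object serves as the underlying file. The flush method is also an assumption: the steps above do not say what happens to a partially filled buffer, so this sketch left-aligns the leftover bits, pads the low bits with zeros, and writes the resulting byte.

```python
import io


class BitWriter:
    """Hypothetical sketch of writing bits through an 8-bit buffer."""

    def __init__(self, fp):
        self.fp = fp        # underlying binary file object
        self.buffer = 0     # 8-bit buffer
        self.count = 0      # bits currently in the buffer

    def put_bit(self, bit):
        # Shift the buffer left by one bit (masked to 8 bits).
        self.buffer = (self.buffer << 1) & 0xFF
        # Set the lsb to the value of the bit being written.
        if bit:
            self.buffer |= 1
        self.count += 1
        # A full buffer is written to the file and the count reset.
        if self.count == 8:
            self.fp.write(bytes([self.buffer]))
            self.buffer = 0
            self.count = 0

    def flush(self):
        """Assumption: pad a partial byte with zeros in its low bits."""
        if self.count > 0:
            self.buffer = (self.buffer << (8 - self.count)) & 0xFF
            self.fp.write(bytes([self.buffer]))
            self.buffer = 0
            self.count = 0


out = io.BytesIO()
writer = BitWriter(out)
for b in [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]:
    writer.put_bit(b)
writer.flush()
print(out.getvalue().hex())  # a5c0
```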

I have incorporated some shortcuts that bypass the 8-bit buffer in the functions that read and write characters or bytes.
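One plausible form of such a shortcut is sketched below for reading a byte; the bitfile library's actual shortcuts may differ. The block is self-contained, and the class and method names (BitReader, get_bit, get_byte) are illustrative, not the library's API. When the buffer is empty, the byte read from the file is returned directly. When bits are still buffered, a whole byte is still read from the file at once and merged with the buffered bits instead of being assembled one bit at a time, and the bit count is left unchanged.

```python
import io


class BitReader:
    """Hypothetical msb-first reader; unread buffered bits stay left-aligned."""

    def __init__(self, fp):
        self.fp = fp
        self.buffer = 0
        self.count = 0

    def get_bit(self):
        if self.count == 0:
            byte = self.fp.read(1)
            if not byte:
                return None
            self.buffer, self.count = byte[0], 8
        bit = (self.buffer >> 7) & 1
        self.buffer = (self.buffer << 1) & 0xFF
        self.count -= 1
        return bit

    def get_byte(self):
        """Return the next 8 bits as an int, or None at end of file."""
        byte = self.fp.read(1)
        if not byte:
            return None
        new = byte[0]
        if self.count == 0:
            # Shortcut: nothing is buffered, so the file byte is the answer.
            return new
        # Buffered bits form the high part of the result; the top
        # (8 - count) bits of the new byte fill in the low part.
        result = self.buffer | (new >> self.count)
        # The new byte's low `count` bits become the buffer, left-aligned;
        # the count itself does not change.
        self.buffer = (new << (8 - self.count)) & 0xFF
        return result


reader = BitReader(io.BytesIO(bytes([0xA5, 0x3C])))
print([reader.get_bit() for _ in range(3)])  # [1, 0, 1]
print(hex(reader.get_byte()))                # 0x29
```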

Documentation

Doxygen-generated documentation of the ANSI C version of my bitfile library may be found in my repository's docs directory.

The Python version of the bitfile library includes comments in docstring format.

For the time being, there is no formal documentation for the C++ version of the bitfile library or the arraystream library. Feel free to generate some and send it to me.

Usage

I have included a file named sample.[c|cpp|py] which demonstrates the usage of each function in the bitfile library and serves as a test to verify the correctness of the code.

Download

A repository containing the source for each library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL, and the repositories are hosted on GitHub. I recommend that you check out the latest revision of the master branch, unless you're looking for something specific.

C Version https://github.com/michaeldipperstein/bitfile
C Documentation https://michaeldipperstein.github.io/bitfile/
C++ Version https://github.com/michaeldipperstein/bitfile-cpp
Python Version https://github.com/michaeldipperstein/bitfile-py
Array Stream https://github.com/michaeldipperstein/arraystream

My latest implementations of Huffman codes, LZSS, LZW, and arithmetic encoding all provide additional examples of how to use the C version of the bitfile library. If you still have any questions or comments, feel free to e-mail me at mdipperstein@gmail.com.


Last updated on December 23, 2018