Bit Array Libraries

For some reason I keep finding myself dabbling in the worlds of compression and encryption. I'm not an expert in either of these areas, nor do I aspire to become one. It's just something that catches my interest from time to time.

On computers, both compression and encryption usually take bit patterns with a given meaning and translate them to other patterns intended to have the same meaning. This typically means having to read, write, and manipulate arbitrary groups of bits. To save myself from reinventing the wheel every time I played with another compression or encryption algorithm, I developed two libraries: one for bitwise file reading and writing (bitfile), and the other for manipulating arbitrary length arrays of bits (bitarray).

I originally wrote the bitarray library in ANSI C, because I used C for my compression algorithms. However, this library was one of just a few things that I ever wrote (I've written a lot) where I thought that I could do a better job with C++. So I developed a C++ implementation. Of course, somewhere along the way the Standard Template Library (STL) was added to C++, and you can now do much of what the bit array library does with a vector of bool.

I am publishing both of these libraries under the GNU LGPL in hopes that they will be of use to other people.

The rest of this page discusses each of my libraries.

Michael Dipperstein
mdipperstein@gmail.com

Bitarray Library Overview

Implementation

The ANSI C bitarray library provides a collection of functions that create and operate on arrays of bits. The ISO C++ bitarray library provides a class with methods that perform similar functions. Modern versions of the C++ STL provide vector<bool> and bitset for similar functionality. My C++ implementation doesn't use the C++ STL.
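For comparison, here is a minimal sketch of the two standard containers mentioned above. It is not part of the bitarray library, and the function names (SetBit, CountSetBits, LowFourBits) are made up for illustration. Note that std::bitset numbers bit 0 as the least significant bit, which is the opposite of the convention the bitarray library uses.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Set one bit in a run-time sized std::vector<bool>.
void SetBit(std::vector<bool> &bits, std::size_t bit)
{
    assert(bit < bits.size());
    bits[bit] = true;   // operator[] returns a proxy object that accepts assignment
}

// Count the bits that are set in a std::vector<bool>.
std::size_t CountSetBits(const std::vector<bool> &bits)
{
    std::size_t count = 0;

    for (bool b : bits)
    {
        if (b)
        {
            ++count;
        }
    }

    return count;
}

// std::bitset needs its size at compile time; here it is fixed at 20 bits.
std::bitset<20> LowFourBits()
{
    std::bitset<20> bits;   // all 20 bits start out cleared
    bits.set(0).set(1).set(2).set(3);
    return bits;
}
```

std::vector<bool> is the closer match when the number of bits is only known at run time; std::bitset fits when the size is a compile-time constant.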

Bitarrays may be of any size and are implemented as arrays of unsigned char. Bit 0 of the most significant unsigned char (char 0) is the most significant bit (msb) of the bit array. The last (non-spare) bit of the last unsigned char is the least significant bit (lsb).

Example:
An array of 20 bits (0 through 19) with 8 bit unsigned chars requires 3 unsigned chars (0 through 2) to store all the bits.

char        |        0        |        1        |        2        |
bit (tens)  |                 |     1 1 1 1 1 1 | 1 1 1 1 X X X X |
bit (ones)  | 0 1 2 3 4 5 6 7 | 8 9 0 1 2 3 4 5 | 6 7 8 9 X X X X |

X marks a spare bit.
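The layout in this example comes down to a little index arithmetic. The sketch below shows one way to write it, assuming the msb-first ordering described above. The helper names (CharsNeeded, CharIndex, BitMask, TestBit) are hypothetical and are not the library's functions.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helpers for illustration; not the bitarray library's API.

// Number of unsigned chars needed to store numBits bits (rounds up).
std::size_t CharsNeeded(std::size_t numBits)
{
    return (numBits + CHAR_BIT - 1) / CHAR_BIT;
}

// Index of the unsigned char that holds the given bit.
std::size_t CharIndex(std::size_t bit)
{
    return bit / CHAR_BIT;
}

// Mask selecting the given bit inside its unsigned char.
// The first bit of each char is that char's most significant bit.
unsigned char BitMask(std::size_t bit)
{
    return static_cast<unsigned char>(1u << (CHAR_BIT - 1 - (bit % CHAR_BIT)));
}

// Return true if the given bit is set in an array holding numBits bits.
bool TestBit(const unsigned char *array, std::size_t numBits, std::size_t bit)
{
    assert(bit < numBits);
    return (array[CharIndex(bit)] & BitMask(bit)) != 0;
}
```

With 8 bit unsigned chars, CharsNeeded(20) is 3, and bit 19 lives in char 2 under the mask 0x10, which matches the table: the low four bits of char 2 are the spare bits.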

The array data is contained inside a structure/class which includes a count of the number of bits in the array, and a pointer to the memory storing the array. Since arrays may be of arbitrary size, the memory storing the array is dynamically allocated on the heap.
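A rough sketch of such a container follows. It only assumes what is stated above (a bit count plus a pointer to heap memory); the class name BitStore and its members are invented for this example and do not come from the library source.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical class for illustration; not the library's own definition.
class BitStore
{
public:
    // Allocate enough unsigned chars on the heap for numBits bits.
    explicit BitStore(std::size_t numBits) :
        m_NumBits(numBits),
        m_Array(new unsigned char[CharCount()]())   // trailing () zero-fills
    {
    }

    // Release the heap storage.
    ~BitStore()
    {
        delete[] m_Array;
    }

    // Copying is disabled to keep the sketch short and avoid double frees.
    BitStore(const BitStore &) = delete;
    BitStore &operator=(const BitStore &) = delete;

    // Number of bits in the array.
    std::size_t Size() const
    {
        return m_NumBits;
    }

    // Number of unsigned chars backing the array (rounds up).
    std::size_t CharCount() const
    {
        return (m_NumBits + CHAR_BIT - 1) / CHAR_BIT;
    }

    // Read-only access to the underlying storage.
    const unsigned char *Data() const
    {
        assert(m_Array != nullptr);
        return m_Array;
    }

private:
    std::size_t m_NumBits;      // count of bits in the array
    unsigned char *m_Array;     // heap memory storing the bits
};
```

m_NumBits is declared before m_Array so that it is already initialized when CharCount() is called in the constructor's initializer list.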

The C++ bitarray class overloads bitwise operators (&, |, ^, ...), providing the expected results on bitarray objects. The C bitarray library provides functions (BitArrayAnd, BitArrayOr, BitArrayXor, ...) for similar functionality.

I have written the bitarray library so that functions and methods requiring multiple bit arrays (such as BitArrayAnd or &) will not do anything if they are given arrays of differing sizes to operate on.
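The size check can be sketched as follows. The Bits struct and AndBits function are hypothetical stand-ins, not BitArrayAnd itself, and the boolean return value is an addition for this example so a caller can tell whether anything happened.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bit array: a bit count plus its storage.
struct Bits
{
    std::size_t numBits;
    std::vector<unsigned char> data;
};

// Store src1 AND src2 in dest.
// If the three arrays differ in size, dest is left untouched and false is
// returned (the return value is this sketch's own addition).
bool AndBits(Bits &dest, const Bits &src1, const Bits &src2)
{
    if ((dest.numBits != src1.numBits) || (dest.numBits != src2.numBits))
    {
        return false;   // differing sizes: do nothing
    }

    // Equal bit counts are expected to mean equal storage sizes.
    assert(dest.data.size() == src1.data.size());
    assert(dest.data.size() == src2.data.size());

    for (std::size_t i = 0; i < dest.data.size(); ++i)
    {
        dest.data[i] = static_cast<unsigned char>(src1.data[i] & src2.data[i]);
    }

    return true;
}
```

Because the operation works a whole unsigned char at a time, the spare bits in the last char are combined along with the real ones; that is harmless as long as nothing reads them.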

With native arrays, square brackets ([]) may be used either to obtain the value of an array element (case 1), or to refer to an array location so that it can be assigned to (case 2).

case 1:
if (array[index] == value) ...

case 2:
array[index] = value;

Unfortunately, I have not found a way to do anything close to this with bitarrays in C.

In C++, an overloaded square bracket operator cannot return a reference to a single bit, so I did not overload square brackets ([]) to behave both ways. Consequently, square brackets ([]) return a bit value and parentheses (()) return a class that behaves as a pointer to a bit in the array. The class returned by parentheses (()) may only be used for assigning bit values.
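The split between the two operators can be sketched like this. The class names (TinyBitArray, BitRef) and all member names are hypothetical, chosen only to illustrate the idea; the library's real class is larger and may be organized differently.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

class TinyBitArray;     // forward declaration for the proxy below

// Hypothetical proxy acting as "a pointer to a bit"; it only supports assignment.
class BitRef
{
public:
    BitRef(TinyBitArray &array, std::size_t bit) : m_Array(array), m_Bit(bit) {}
    void operator=(bool value);     // write value to the bit referred to

private:
    TinyBitArray &m_Array;          // array containing the bit
    std::size_t m_Bit;              // index of the bit within the array
};

// Hypothetical, stripped-down bit array.
class TinyBitArray
{
public:
    explicit TinyBitArray(std::size_t numBits) :
        m_NumBits(numBits),
        m_Data((numBits + CHAR_BIT - 1) / CHAR_BIT, 0)
    {
    }

    // array[bit] reads the value of a bit.
    bool operator[](std::size_t bit) const
    {
        assert(bit < m_NumBits);
        return (m_Data[bit / CHAR_BIT] & Mask(bit)) != 0;
    }

    // array(bit) returns a proxy that can be assigned to.
    BitRef operator()(std::size_t bit)
    {
        assert(bit < m_NumBits);
        return BitRef(*this, bit);
    }

private:
    friend class BitRef;

    // Mask for a bit within its char; the first bit of each char is the msb.
    static unsigned char Mask(std::size_t bit)
    {
        return static_cast<unsigned char>(1u << (CHAR_BIT - 1 - (bit % CHAR_BIT)));
    }

    std::size_t m_NumBits;
    std::vector<unsigned char> m_Data;
};

// Set or clear the bit that the proxy refers to.
void BitRef::operator=(bool value)
{
    unsigned char &c = m_Array.m_Data[m_Bit / CHAR_BIT];
    const unsigned char mask = TinyBitArray::Mask(m_Bit);

    c = value ? static_cast<unsigned char>(c | mask)
              : static_cast<unsigned char>(c & ~mask);
}
```

With these definitions, `if (array[5]) ...` reads a bit and `array(5) = true;` writes one. Since BitRef has no conversion to bool, it cannot be used to read a value, which mirrors the assignment-only restriction described above.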

Usage

A description of each of the functions in my C bitarray library may be found in the C documentation linked under Download below. Unfortunately, I haven't written a similar description for the C++ bitarray library. Both the C and C++ bitarray library source archives also include detailed headers preceding each function, and I have included a file named sample.[c|cpp] which demonstrates the usage of each function in the bitarray library.

Portability

All the source code that I have provided is written in strict ANSI C or ISO C++. I would expect it to build correctly on any machine with ANSI C/ISO C++ compilers. I have tested the code compiled with gcc on Linux on an Intel x86 and mingw on Windows XP.

The library includes routines intended for debugging that dump the array contents to a display. These routines assume that unsigned chars are 8 bits. They can easily be rewritten to support any specific size of unsigned char. Writing the dump routines to handle arbitrary size unsigned chars seems more difficult than it is worth to me, especially since I only have access to machines with 8 bit unsigned chars.
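To show where the 8 bit assumption creeps in, here is a minimal sketch of a dump helper. DumpHex is a hypothetical function, and its output format (two hex digits per unsigned char, separated by spaces) is an assumption made for this example rather than the format the library's own routines use. It returns a string instead of printing so that it is easy to check.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Two hex digits per char only covers 8 bits, so refuse to build otherwise.
static_assert(CHAR_BIT == 8, "this sketch assumes 8 bit unsigned chars");

// Hypothetical helper: format numChars unsigned chars as hex, e.g. "80 0F A0".
std::string DumpHex(const unsigned char *array, std::size_t numChars)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;

    assert((array != nullptr) || (numChars == 0));

    for (std::size_t i = 0; i < numChars; ++i)
    {
        if (i != 0)
        {
            out += ' ';                     // separator between chars
        }

        out += digits[array[i] >> 4];       // high 4 bits
        out += digits[array[i] & 0x0F];     // low 4 bits
    }

    return out;
}
```

Supporting a different char width would mean changing the number of digits emitted per char, which is the kind of rewrite for a specific size mentioned above.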

Download

A repository containing the source for each bitarray library may be downloaded by clicking on the links below. My source has been released under the GNU LGPL, and the repositories are hosted on GitHub. I recommend that you check out the latest revision of the master branch, unless you're looking for something specific.

C Version	https://github.com/michaeldipperstein/bitarray
C Documentation	https://michaeldipperstein.github.io/bitarray/
C++ Version	https://github.com/michaeldipperstein/bitarray-cpp

My latest implementations of Huffman codes provide an additional example of how to use the C version of these libraries. If you still have any questions or comments, feel free to e-mail me at mdipperstein@gmail.com.

Last updated on December 23, 2018