It happened again. I came across another algorithm that I had to investigate, implement, and publish my implementation of. This time the catalyst was my paying job. One of the systems I'm working with is quite noisy, and the customer is using a Hamming code to limit the effects of the noise.
I was given a spec with a table that showed how the 7 bit codes that my device received should be converted into 4 bit data. The document stated that the table decoded a Hamming (7, 4) code. That was it. There was no more explanation of where the table came from or why it made things any better? I spent some of my "spare" time learning the answers to those questions.
As with my other libraries, my intent is to publish an easy to follow ANSI C implementation. Anyone familiar with ANSI C and the Hamming encoding/decoding algorithms should be able to follow and learn from my implementation.
Information on downloading the source code for all of my Hamming encoding and decoding implementations may be found here.
The rest of this page discusses the basics of Hamming codes and my implementation.
Around 1947 Richard W. Hamming developed technique for detecting and correcting single bit errors in transmitted data. His technique requires that three parity bits (or check bits) be transmitted with every four data bits. The algorithm is called a (7, 4) code, because it requires seven bits to encoded four bits of data.
In order to understand my implementation of Hamming codes, it helps to be comfortable with the concepts of matrix multiplication, modulo 2 arithmetic, and parity. If you're already comfortable with these concepts, feel free to skip ahead to the section on encoding.
This section isn't intended to be a complete course on linear algebra. It's just a simple reminder on how matrices are multiplied. In all truth, I'm including it because I keep forgetting how it's done.
To start off with, you can't multiply two matrices together unless the first matrix has a number of columns equal to the number of rows in the second matrix. If [A] and [B] are matrices [A] × [B] is only meaningful if [A] is an i×j matrix (i rows and j columns) and [B] is a j×k matrix (j rows and k columns).
Suppose [A] is the matrix:
a_{11} | a_{12} | … | a_{1j} |
a_{21} | a_{22} | … | a_{2j} |
⋮ | ⋮ | … | ⋮ |
a_{i1} | a_{i2} | … | a_{ij} |
and [B] is the matrix:
b_{11} | b_{12} | … | b_{1k} |
b_{21} | b_{22} | … | b_{2k} |
⋮ | ⋮ | … | ⋮ |
b_{j1} | b_{j2} | … | b_{jk} |
the result of [A] × [B] is an i×k matrix
c_{11} | c_{12} | … | c_{1k} |
c_{21} | c_{22} | … | c_{2k} |
⋮ | ⋮ | … | ⋮ |
c_{i1} | c_{i2} | … | c_{ik} |
where:
c_{mn} = (a_{m1} × b_{1n}) +
(a_{m2} × b_{2n}) + … +
(a_{mj} × b_{jn})
for m ≤ i and n ≤ k.
If you can handle conventional arithmetic, modulo 2 arithmetic should be an easy concept to handle. Modulo 2 arithmetic is done in base 2 (binary), so the only numerals are 0 and 1. Since everything is modulo, there is no carrying either.
Addition and multiplication are the only operators modulo 2 operators that we really care about. For maximum browser compatibility, I will represent these operations with an ordinary plus (+) and multiplication symbol (×). I will try to note when + and × are not being used as modulo 2 operators.
Modulo 2 addition works as follows:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0
A + B = B + A
A + B + C = (A + B) + C = A + (B + C)
Modulo 2 multiplication works as follows:
0 × 0 = 0
0 × 1 = 0
1 × 0 = 0
1 × 1 = 1
A × B = B × A
A × B × C = (A × B) × C = A × (B × C)
Notice that modulo 2 addition functions just like a logical Exclusive OR (XOR) and modulo 2 multiplication functions just like a logical AND.
In computer science terms, parity comes in two varieties even and odd. A string of bits (1's and 0's) has even parity if it contains an even number of 1's (the modulo 2 sum of the bits is 0), otherwise it has odd parity (the modulo 2 sum of the bits is 1). Many error detection schemes utilize parity bits. A parity bit is an extra bit that forces a binary string to have a specific parity. The parity bit for an even parity block may be computed by summing all the bits modulo 2. The parity bit for an odd parity block may be computed by summing all the bits modulo 2 and adding 1. The table below lists all possible three bit values and value of a parity bit required to create a four bit sequence with even parity.
3 Bit String | Parity Bit | Verification |
---|---|---|
000 | 0 | 0 + 0 + 0 + 0 = 0 |
001 | 1 | 0 + 0 + 1 + 1 = 0 |
010 | 1 | 0 + 1 + 0 + 1 = 0 |
011 | 0 | 0 + 1 + 1 + 0 = 0 |
100 | 1 | 1 + 0 + 0 + 1 = 0 |
101 | 0 | 1 + 0 + 1 + 0 = 0 |
110 | 0 | 1 + 0 + 1 + 0 = 0 |
111 | 1 | 1 + 1 + 1 + 1 = 0 |
It is very common for communication protocols to specify that a block of bits will be transmitted with a specific parity. If a block of data arrives at its intended destination with a parity other than the specified parity, it must be the case that at least one of the bits has been corrupted. A single parity bit is not sufficient to identify an even number of errored bits, nor is it sufficient to allow the receiver to correct an error. Hamming codes use multiple parity bits to allow for error correction.
Traditional Hamming codes are (7, 4) codes, encoding four bits of data into seven bit blocks (a Hamming code word). The extra three bits are parity bits. Each of the three parity bits are parity for three of the four data bits, and no two parity bits are for the same three data bits. All of the parity bits are even parity.
Example:
Given: data bits d_{1}, d_{2}, d_{3}, and
d_{4}
A (7, 4) Hamming code may define parity bits p_{1}, p_{2},
and p_{3} as
p_{1} = d_{2} + d_{3} + d_{4}
p_{2} = d_{1} + d_{3} + d_{4}
p_{3} = d_{1} + d_{2} + d_{4}
There's a fourth equation for a parity bit that may be used in Hamming
codes:
p_{4} = d_{1} + d_{2} + d_{3}
Valid Hamming codes may use any three of the above four parity bit definitions. Valid Hamming codes may also place the parity bits in any location within the block of 7 data and parity bits. Two Hamming codes with different parity bits or parity bits in a different bit position are considered equivalent. They will produce different results, but they are still Hamming codes.
One method for transforming four bits of data into a seven bit Hamming code word is to use a 4×7 generator matrix [G].
Define d to be the 1×4 vector [d_{1} d_{2} d_{3} d_{4}]
It's possible to create a 4×7 generator matrix [G] such that the product modulo 2 of d and [G] (d[G]) is the desired 1×7 Hamming code word. Here's how it's done:
Step 1. Represent each data bit with a column vector as
follows:
1 | |
d_{1} = | 0 |
0 | |
0 |
0 | |
d_{2} = | 1 |
0 | |
0 |
0 | |
d_{3} = | 0 |
1 | |
0 |
0 | |
d_{4} = | 0 |
0 | |
1 |
Step 2. Represent each parity bit with a column vector containing a 1 in the row corresponding to each data bit included in the computation and a zero in all other rows. Using the parity bit definitions from the example above:
0 | |
p_{1} = | 1 |
1 | |
1 |
1 | |
p_{2} = | 0 |
1 | |
1 |
1 | |
p_{3} = | 1 |
0 | |
1 |
Step 3. Create a generator matrix, [G], by arranging the column vectors from the previous steps into a 4×7 matrix such that the columns are ordered to match their corresponding bits in a code word.
To create a generator that produces code words with the bits ordered p_{1}, p_{2}, p_{3}, d_{1}, d_{2}, d_{3}, d_{4} (3 parity bits followed by 4 data bits) use the vectors from the previous steps and arrange them into the following columns [p_{1} p_{2} p_{3} d_{1} d_{2} d_{3} d_{4}]
The results are following 4×7 generator matrix:
p_{1} | p_{2} | p_{3} | d_{1} | d_{2} | d_{3} | d_{4} | |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 0 | 0 | 0 | |
G = | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 0 | 1 | 0 | |
1 | 1 | 1 | 0 | 0 | 0 | 1 |
Arranging the columns in any other order will just change the positions of bits in the code word.
Example:
Encode the data value 1010 using the Hamming code defined by the matrix G
(above).
0 | 1 | 1 | 1 | 0 | 0 | 0 | ||||||
1 0 1 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | = | 1 0 1 1 0 1 0 | |||
1 | 1 | 0 | 0 | 0 | 1 | 0 | | | | | | | | | |||||
1 | 1 | 1 | 0 | 0 | 0 | 1 | | | | | | | +-->(1 × 0) + (0 × 0) + (1 × 0) + (0 × 1) | |||||
| | | | | +---->(1 × 0) + (0 × 0) + (1 × 1) + (0 × 0) | ||||||||||||
| | | | +------>(1 × 0) + (0 × 1) + (1 × 0) + (0 × 0) | ||||||||||||
| | | +-------->(1 × 1) + (0 × 0) + (1 × 0) + (0 × 0) | ||||||||||||
| | +---------->(1 × 1) + (0 × 1) + (1 × 0) + (0 × 1) | ||||||||||||
| +------------>(1 × 1) + (0 × 0) + (1 × 1) + (0 × 1) | ||||||||||||
+-------------->(1 × 0) + (0 × 1) + (1 × 1) + (0 × 1) |
So 1010 encodes to 1011010. Equivalent Hamming codes represented by different generator matrices will produce different results.
In a world without errors decoding a Hamming code word would be very easy. Just throw out the parity bits. The encoding example produced a 7 bit code word. Its parity bits are 101 and its data bits are 1010. If you receive a 1011010, just decode it as 1010. But what happens if you receive a code word with an error and one or more of the parity bits are wrong?
Suppose the Hamming code defined by the matrix G in the example above is being used and the code word 1011011 is received. How is that word decoded? The first step is to check the parity bits to determine if there is an error.
Arithmetically, parity may be checked as follows:
p_{1} = d_{2} + d_{3} + d_{4} = 0 + 1 + 1 = 0
p_{2} = d_{1} + d_{3} + d_{4} = 1 + 1 + 1 = 1
p_{3} = d_{1} + d_{2} + d_{4} = 1 + 0 + 1 = 0
In this case every parity bit is wrong. p_{1}, p_{2}, and p_{3} should have been 010, but we received 101.
Parity may also be validated using matrix operations. A 3×7 parity check matrix [H] may be constructed such that row 1 contains 1s in the position of the first parity bit and all of the data bits that are included in its parity calculation. Row 2 contains 1s in the position of the second parity bit and all of the data bits that are included in its parity calculation. Row 3 contains 1s in the position of the third parity bit and all of the data bits that are included in its parity calculation.
Example:
Using the code from example above, the matrix H may
be defined as follows:
1 | 0 | 0 | 0 | 1 | 1 | 1 | |
H = | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
0 | 0 | 1 | 1 | 1 | 0 | 1 |
Multiplying the 3×7 matrix [H] by a 7×1 matrix representing the encoded data produces a 3×1 matrix called the "syndrome". There are two useful proprieties of the syndrome. If the syndrome is all zeros, the encoded data is error free. If the syndrome has a non-zero value, flipping the encoded bit that is in the position of the column in [H] that matches the syndrome will result in a valid code word.
Example:
Using the parity check matrix from the example above
we can correct and verify the code word 1011011.
1 | |||||||||||||
0 | |||||||||||||
1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | (1 × 1) + (0 × 0) + (0 × 1) + (0 × 1) + (1 × 0) + (1 × 1) + (1 × 1) | 1 | ||||
0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | = | (0 × 0) + (1 × 0) + (0 × 1) + (1 × 1) + (0 × 0) + (1 × 1) + (1 × 1) | = | 1 | ||
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | (0 × 1) + (0 × 0) + (1 × 1) + (1 × 1) + (1 × 0) + (0 × 1) + (1 × 1) | 1 | ||||
1 | |||||||||||||
1 |
A column of all 1s is not the column of all 0s, so there is a parity error. Looking back at the matrix [H], you will see that the seventh column is all 1s, so the seventh bit is the errored bit. Changing the seventh bit produces the code word 1011010.
1 | |||||||||||||
0 | |||||||||||||
1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | (1 × 1) + (0 × 0) + (0 × 1) + (0 × 1) + (1 × 0) + (1 × 1) + (1 × 0) | 0 | ||||
0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | = | (0 × 0) + (1 × 0) + (0 × 1) + (1 × 1) + (0 × 0) + (1 × 1) + (1 × 0) | = | 0 | ||
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | (0 × 1) + (0 × 0) + (1 × 1) + (1 × 1) + (1 × 0) + (0 × 1) + (1 × 0) | 0 | ||||
1 | |||||||||||||
0 |
Sure enough 1011010 is a valid code word. As I stated at the top of this section remove the parity bits to get the encoded value. In this case 1011011 was likely transmitted as 1011010, which encodes 1010.
Hopefully you have a reasonable grasp of the actual encoding and decoding algorithm. Now comes the fun part where I discuss how I implemented it in my source code.
I actually provide two different encoding functions (matrix multiplication and table look up) and three different decoding functions (matrix multiplication, table look up, and packed table look up).
The trick to encoding with matrices is realizing that all the values that
you need to deal with can be packed into right justified bytes
(unsigned char
s). Pack the data to be encoded into a byte and
pack the columns of the generator matrix into another array
of unsigned char
s.
The i^{th} bit of a code word equals the data word times the i^{th} column of the generator matrix, which is the i^{th} entry in the array. Because of the way modulo 2 arithmetic works, this is just a bitwise ANDing of the two bytes followed by an XOR of each of the bits in the results.
Download a copy of my source to see how this is done.
I'm actually ashamed to write something here. Since a Hamming (7,4) code
only encodes 4 bits of data at a time, there are only 16 possible data
values to encode. All I did here was create a 16 byte array called
hammingCodes
and defined it such that
hammingCodes
[data value] = code word.
I imagine that every machine out there that will produce smaller and faster look up table code than matrix multiplication code. The only downside to using a lookup table is that it requires you to have computed every possible code word in advance, which isn't a big deal, because there are only 16 of them.
Decoding with matrices, is very much like encoding with matrices. The big difference being that you multiply a packed array where each index contains the bits in a row of the parity check array with a byte containing the received code word. This can be done with ANDing and XORing just like it can for encoding.
The result of the matrix multiplication is the "syndrome". If the syndrome is 0, the least significant 4 bits of the code word are the encoded data. If the "syndrome" is non-zero, toggle the bit in the postition matching the column of the parity check matrix that matches the "syndrome". (If the "syndrome" matches the 4^{th} column, toggle the 4^{th} bit.) I use an array with the "syndrome" as it's index to provide a mask to use for toggling the errored bit. After the error has been corrected, the least significant 4 bits of the code word are the encoded data.
Download a copy of my source to see how this is done.
Here is yet another item that I'm ashamed to write something about.
Since a Hamming (7,4) code only encodes to a 7 bit code word, there are only
128 possible received values to decode. All I did here was create a 128
byte array called hammingDecodeValues
and defined it such that
hammingDecodeValues
[code word] = data value.
Since Hamming (7,4) codes decode to 4 bit code words, you can save some
data space and pack two data values into a single byte. I also provide a
look up table called hammingPackedDecodeValues which packs two data values
into a single unsigned char
. The value of
hammingPackedDecodeValues
[code word / 2] is
a byte packed such that the 4 most significant bits are the data value of an
odd code word and the 4 least significant bits are the data values of an
even code word.
I imagine that every machine out there that will produce smaller and faster look up table code than matrix multiplication code. On most machines I would expect the code for dealing with packed look up tables to be smaller but slower than the code for dealing with unpacked look up tables. The only downside to using a lookup table is that it requires you to have computed the correction to apply to every possible received code word in advance.
The source I provide happens to be for the code in the encoding example. It's pretty trivial to edit the generator and parity check matrices for a different Hamming (7,4) code, just put all of the 1s and 0s where they belong for your code and you're in business. Once you have the generator and parity check matrices, the test application that I provide can write the look up tables. Instructions for generating the look up tables are included in the README file.
I am releasing my implementation of Hamming encoding and decoding under the GNU LGPL. The source code repository is available on GitHub. I recommend that you checkout the latest revision of the master branch, unless you're looking for something specific.
Repository Location | https://github.com/michaeldipperstein/hamming |
---|
If you have any further questions or comments, you may contact me by e-mail. My e-mail address is: mdipperstein@gmail.com
If you're looking for more information on, Hamming codes, your favorite web search engine should provide you with several options. I found that Wikipedia contained some good historical background and provides an alternate algorithm. The algorithm that I ended up implementing was strongly influenced by discussions in Error Correction with Hamming Codes.
HomeLast updated on December 27, 2018