checksum

Checksum processes are used to verify data integrity and, when combined with a secret key or a digital signature, authenticity. A checksum value is computed from the binary contents of a block of data and is stored or transmitted along with that data. When the data block is later received or read back from storage, the same calculation is repeated. If the new result matches the stored checksum, the data is assumed not to have lost integrity.
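The store-then-recompute cycle described above can be sketched in a few lines. This is a minimal illustration, assuming CRC-32 (from Python's standard zlib module) as the checksum and assuming the 4-byte value is simply appended to the end of the data block; the function names attach_checksum and verify_block are hypothetical.

```python
import zlib

def attach_checksum(data: bytes) -> bytes:
    """Store a 4-byte CRC-32 checksum (big-endian) alongside the data block."""
    return data + zlib.crc32(data).to_bytes(4, "big")

def verify_block(block: bytes) -> bool:
    """Recompute the CRC-32 of the data and compare it with the stored value."""
    data, stored = block[:-4], block[-4:]
    return zlib.crc32(data).to_bytes(4, "big") == stored

stored_block = attach_checksum(b"example data block")
print(verify_block(stored_block))       # True: data unchanged

damaged = bytearray(stored_block)
damaged[0] ^= 0x01                      # flip a single bit in the data
print(verify_block(bytes(damaged)))     # False: mismatch detected
```

Because anyone can recompute a plain CRC-32, a match shows only that the data was not accidentally altered; it does not prove who produced the data.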

In its simplest form, a "check digit" is appended to a single number to verify its integrity. For example, Universal Product Code (UPC) numbers carry a 12th digit that serves as a checksum for the other 11 digits, which identify the manufacturer and the item itself. A comprehensive checksum process extends to every segment (block) of a data file and to every data file in the system.
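As a concrete check-digit example, the standard UPC-A rule weights the digits in odd positions (1st, 3rd, ..., 11th) by 3 and those in even positions by 1, then picks the 12th digit that brings the total up to a multiple of 10. The sketch below assumes the 12-digit UPC-A format; the function names are hypothetical.

```python
def upc_check_digit(first_11: str) -> int:
    """Compute the UPC-A check digit for an 11-digit string."""
    if len(first_11) != 11 or not first_11.isdigit():
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first_11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    total = 3 * odd_sum + even_sum
    return (10 - total % 10) % 10

def upc_is_valid(upc: str) -> bool:
    """Return True if the 12th digit matches the check digit of the first 11."""
    return len(upc) == 12 and upc.isdigit() and upc_check_digit(upc[:11]) == int(upc[11])

print(upc_check_digit("03600029145"))  # 2
print(upc_is_valid("036000291452"))    # True
print(upc_is_valid("036000291453"))    # False
```

With these weights, any single mistyped digit changes the total by an amount that is not a multiple of 10, so it is always caught.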

Practical checksum methods are designed to detect a change to any single bit (a single-bit error), since such a change alters the computed result. Some methods can also detect changes to multiple bits, but not all can: with some algorithms, multiple bit errors can offset each other, leaving the checksum unchanged and producing a false verdict that the data is intact.
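The offsetting effect is easy to reproduce with a simple additive checksum. This sketch assumes a hypothetical checksum that sums all byte values modulo 256, and compares it with CRC-32 from Python's zlib module, which is guaranteed to catch errors confined to a short run of bits such as the one below.

```python
import zlib

def additive_checksum(data: bytes) -> int:
    """Hypothetical simple checksum: sum of byte values, kept to 8 bits."""
    return sum(data) % 256

original = b"AC"                  # 0x41 + 0x43 = 132
single_bit = b"@C"                # first byte 0x41 -> 0x40 (one bit flipped)
offsetting = b"BB"                # one byte +1, the other -1: sum still 132

# A single-bit error changes the sum, so it is detected.
print(additive_checksum(original) == additive_checksum(single_bit))  # False
# Two errors that cancel out are missed by the additive checksum...
print(additive_checksum(original) == additive_checksum(offsetting))  # True
# ...but CRC-32 still detects them.
print(zlib.crc32(original) == zlib.crc32(offsetting))                # False
```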

Checksum processes are sometimes referred to as "hash functions" because they take an input of any length, up to the contents of an entire file or message, and reduce it through the selected computational method to an output of fixed length. That output can be called a checksum, but it is also termed a message digest or a digital fingerprint.
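The fixed-length property can be seen with SHA-256, a cryptographic hash function available in Python's standard hashlib module. This is only an illustration of the idea, assuming SHA-256 as the hash; the helper name fingerprint is hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 message digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = fingerprint(b"a")
long_digest = fingerprint(b"a" * 1_000_000)   # one million bytes of input

# Both digests are 256 bits, shown as 64 hexadecimal characters.
print(len(short_digest), len(long_digest))     # 64 64
```

Unlike a simple additive checksum or CRC, a cryptographic hash such as SHA-256 is designed so that finding two inputs with the same digest is computationally infeasible.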

Last modified: 11-May-2005 [RC]


   © 2002-2006 Contributing authors and University of Miami School of Medicine