Hash functions
Hash functions use compression algorithms to create fixed-length digests for any arbitrarily long strings. The digests are also called ‘fingerprints’ or ‘signatures’ [@wagnerWhatSHA2562020]. The outputs are unique (1-2-1 relationship) for every different input text (collision resistance) and are mathematically irreversible, meaning that one should never be able to find an input that hashes to the same digest of another known input (second pre-image resistance). In addition, it is impossible to guess the original text from a given digest (pre-image resistance). If you change a single letter in the entire ‘Lord of the Rings series, the digest for the books will be drastically different. This is a property of good hash functions - called the avalanche effect.
Hash functions are often used for digital signatures, MACs and data integrity services, such as P2P file sharing and verification, virus fingerprinting, Merkle trees and distributed hash tables [@bashirMasteringBlockchainInner2023]. They are essential in constructing other cryptographic primitives. Another well-known use case for hash functions is to store passwords (one-way hash). However, the requirements are very different for the two use case types. The former requires the algorithm to be fast so large amounts of text can be processed instantaneously. The latter needs the algorithms to be slow enough to deter brutal password-guessing attacks.
SHA-2
As of 2023, SHA-2 is the internet’s most commonly used hash function, providing four different versions that produce 224 (SHA-224), 256 (SHA-256), 384 (SHA-384) and 512 (SHA-512) byte outputs. Older generations of hash functions, such as MD (MD4, MD5) and SHA1, have been badly broken and should not be used anymore.
Other popular hash algorithms include RIPEMD and Whirlpool.
How Does SHA-2 Work?
First, the input message is divided into equal-sized blocks. The blocks are then compressed in multiple rounds on a block-by-block basis to produce a compressed output. This mechanism is known as the Merkle-Damgard Construction. For example, SHA-256 has an input message size limit of . with a block size of 512 bits, a word limit of 32 bits and an output of 256 bits.
SHA-2 uses the Davies-Meyer construction as its compression function.
The algorithm works as follows:
- Padding the message to multiples of 512 bits if it is shorter than the required block size or cannot be divided equally
- Divide the message into blocks of 512 bits
- Setting up the initial hash value. It consists of eight 32-bit words obtained by taking the first 32 bits of the fractional parts of the square roots of the first eight prime numbers.
- Iterately apply the compression function to the message blocks, using the output from the previous compression function call as the second argument to the compression function for the next block
You can use openssl
command line tool to see SHA-2 in action:
SHA-3
SHA-3 (a revised version of Keccak) is the latest standard in the SHA family. SHA 3 uses a fundamentally different structure (Sponge construction) from SHA-2. Unlike SHA-2, it is not susceptible to length-extension attacks and is suitable for hashing secrets. It is the current recommended Hash Function from NIST.
References
- Bashir, I. (2023) Mastering Blockchain: Inner workings of blockchain, from cryptography and decentralized identities, to DeFi, NFTs and Web3. Packt Publishing Ltd.
- Wagner, L. (2020) What is SHA-256? https://blog.boot.dev/cryptography/how-sha-2-works-step-by-step-sha-256/.