MD5 Definition
The MD5 hashing algorithm is a one-way cryptographic function that accepts a message of any length as input and returns as output a fixed-length digest value to be used for authenticating the original message.
The MD5 hash function was originally designed for use as a secure cryptographic hash algorithm for authenticating digital signatures. MD5 has been deprecated for uses other than as a non-cryptographic checksum to verify data integrity and detect unintentional data corruption.
Although originally designed as a cryptographic message authentication code algorithm for use on the internet, MD5 hashing is no longer considered reliable for use as a cryptographic checksum because researchers have demonstrated techniques capable of easily generating MD5 collisions on commercial off-the-shelf computers.
Ronald Rivest, founder of RSA Data Security and institute professor at MIT, designed MD5 as an improvement to a prior message digest algorithm, MD4. Describing it in Internet Engineering Task Force RFC 1321, “The MD5 Message-Digest Algorithm,” he wrote:
The algorithm takes as input a message of arbitrary length and produces as output a 128-bit ‘fingerprint’ or ‘message digest’ of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be ‘compressed’ in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.
The IETF suggests MD5 hashing can still be used for integrity protection, noting “Where the MD5 checksum is used inline with the protocol solely to protect against errors, an MD5 checksum is still an acceptable use.” However, it added that “any application and protocol that employs MD5 for any purpose needs to clearly state the expected security services from their use of MD5.”
Message digest algorithm characteristics
Message digests, also known as hash functions, are one-way functions; they accept a message of any size as input, and produce as output a fixed-length message digest.
MD5 is the third message digest algorithm created by Rivest. All three (the others are MD2and MD4) have similar structures, but MD2 was optimized for 8-bit machines, in comparison with the two later formulas, which are optimized for 32-bit machines. The MD5 algorithm is an extension of MD4, which the critical review found to be fast, but possibly not absolutely secure. In comparison, MD5 is not quite as fast as the MD4 algorithm, but offered much more assurance of data security.
How MD5 works
The MD5 message digest hashing algorithm processes data in 512-bit blocks, broken down into 16 words composed of 32 bits each. The output from MD5 is a 128-bit message digest value.
Computation of the MD5 digest value is performed in separate stages that process each 512-bit block of data along with the value computed in the preceding stage. The first stage begins with the message digest values initialized using consecutive hexadecimal numerical values. Each stage includes four message digest passes which manipulate values in the current data block and values processed from the previous block. The final value computed from the last block becomes the MD5 digest for that block.
MD5 security
The goal of any message digest function is to produce digests that appear to be random. To be considered cryptographically secure, the hash function should meet two requirements: first, that it is impossible for an attacker to generate a message matching a specific hash value; and second, that it is impossible for an attacker to create two messages that produce the same hash value.
MD5 hashes are no longer considered cryptographically secure, and they should not be used for cryptographic authentication.
In 2011, the IETF published RFC 6151, “Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms,” which cited a number of recent attacks against MD5 hashes, especially one that generated hash collisions in a minute or less on a standard notebook and another that could generate a collision in as little as 10 seconds on a 2.66 GHz Pentium 4 system. As a result, the IETF suggested that new protocol designs should not use MD5 at all, and that the recent research attacks against the algorithm “have provided sufficient reason to eliminate MD5 usage in applications where collision resistance is required such as digital signatures.”