Breaking the Code
Breaking the Code
Before I begin, I would like to mention that HBH does not allow formatting for articles. I recommend that you read the PDF version instead, which is available at: http://xxx/articles/md5_collisions/article.pdf
The Message Digest 5 algorithm, more widely known as MD5, was first developed by RSA Laboratories around 1991. The purpose was to ‘Fingerprint', or give a signature to documents and other extended texts, this would aid in the verification of those documents and could be used in court to prove a document was authentic (or fake) because the fingerprint matched (or didn't) when the document was encrypted using the MD5 algorithm.
However, this encryption method ultimately became most popular in the encryption of users' passwords on various websites. Since the encryption was intended for lengthy documents and to serve only as a fingerprint, it was developed so that encrypting certain text would always return the same result. This was sufficient for long documents, but when the encryption was used for user passwords that were usually between 4 and 8 characters long, it began to pose a problem and became a threat to many websites that contained sensitive data.
Currently, the only known way to ‘reverse', or break an MD5 is to form a collision. This is when a number of possible passwords are encrypted and checked against the MD5 hash being attacked eventually resulting in some plaintext that when encrypted matches the MD5 hash. The two most popular ways of doing this are by brute-force, which is trying every possible combination of a string with a given length and charset (creating permutations of the string), and by a dictionary attack, where a list of words, usually stored in a text file, is checked against the MD5 hash at hand.
For the purpose of this article, the hashes ‘098f6bcd4621d373cade4e832627b4f6' and ‘5f4dcc3b5aa765d61d8327deb882cf99' will be used. A MD5 hash will always be 32 characters in length and only consists of the characters (charset) a-f and 0-9 (hex).
The first method mentioned, brute-force, could easily be used on the first hash. The first thing we need to do would be to determine a charset and possible lengths which we believe the plaintext could be. The most common charset is alphanumeric, or a-z and 0-9, while the most common lengths of the plaintext are 1-8 (we will use 1-4 since the plaintext is known to me). Next, we will need to develop a program (using pre-made programs without understanding them is just cheating yourself.). The easiest language [for me] to write the program in would be PHP, it is somewhat slower than C++ but you don't need to compile any libraries for the MD5 Algorithm since it already comes as a pre-installed function. Here is an application that I have already coded, feel free to use it and modify it, as long as you don't remove the copyright and use it for personal use only:
Original Version:
http://xxx.com/accounts/hbh/articles/md5_collisions/permutate.php
Original, but Commented; more for beginners:
http://xxx.com/accounts/hbh/articles/md5_collisions/permutate_explained.php
I recommend creating a batch file to run the script since it will display the text in real-time, however using a browser will work fine. This attack should reveal that the first hash resolves to the plaintext ‘test', however the second one is still unknown so we will try a dictionary attack.
The number of possible combinations during a brute-force attack can be shown with the equation:
sum = 0; for(i = m; i <= n; i++) sum += pow(c, i);
(actual equation shown in PDF file)
Where ‘c' is the length of the charset (ex.: a-z 0-9 would be 36), ‘m' is the minimum length to brute-force, and ‘n' is the maximum length to brute-force. For example, a charset with the length of 36 and brute forcing from 1 character to 4 characters would be written as: (36^1)+(36^2)+(36^3)+(36^4) = 1,727,604 possible combinations
A brute-force attack creates permutations of a string with a given charset and length, and will look something like this: ID Plaintext MD5 Checksum (Hash) 1 aaaa 74b87337454200d4d33f80c4663dc5e5 2 aaab 4c189b020ceb022e0ecc42482802e2b8 3 aaac 3963a2ba65ac8eb1c6e2140460031925 4 aaad aa836f154f3bf01eed8df286a1fbb388 5 aaae 5f83cfb6ca6b50d3323a6377e1241b1f 6 aaaf 14acbed198e2456a400321cd9b065ce3 7 aaag 3e748489ace6f479f60ef30370220312 8 aaah d5096c66e84d7126564012fb76dbb47c 9 aaai 7cf9ef6f440fb8c7a377a60784d56156 10 aaaj bde5d54113c8ea8c6914f5aaee97fdf4
For a dictionary attack, you will need a wordlist containing a list of plaintext strings, this is usually a list common passwords. You will also need a method to read the wordlist and check the strings in the wordlist against the MD5 hash(es) you're trying to find a collision for.
Here is an example wordlist:
http://xxx.com/accounts/hbh/articles/md5_collisions/permutate.php
And here is some example code written in PHP to read the wordlist:
http://xxx.com/accounts/hbh/articles/md5_collisions/dictionary.php
Using a wordlist is much faster than brute force since there is a set number of hashes possible to try, therefore I would recommend attempting a dictionary attack before a brute-force attack. The larger the wordlist is, the higher the chance of finding the plaintext is, but that also means the attack will take longer.
If you like, you may try adding alterations to the words in the dictionary, this will result in a higher success rate for finding collisions. This would be doing things such as adding suffixes to the string such as 0 through 100, turning the word into ‘l33t speak', capitalizing the word (MD5 encrypts case-sensitive), and reversing the string.
Using the script and wordlist above will reveal that the second hash that couldn't be brute forced resolves to ‘password'.
A dictionary attack reads a prepared list of strings and checks the MD5 hash against a given list, it will look something like this: ID Plaintext MD5 Checksum (Hash) 1 abbonerei 418d9dce69dbdcec3e2a09d38e3e9709 2 abboneremo bee24629ee0ef1930021bb0eea510cb1 3 abbonerete a7228dad3682ea9a75b7525b01ba6e3b 4 abbonero 594bd7832c0e86658249188fa1cd9717 5 abboni 867a268217a878a50bc06f19e2782394
In summation, finding collisions for MD5 hashes has become quite easy and perhaps it is time for a new form of encryption for user passwords. I hope that this article has enlightened you on how the plaintext of MD5 hashes can be found and that you will think twice before signing up for websites with passwords that can easily be found in a dictionary or brute-forced.
- P.S.: I have created a form of a dictionary attack (actually it's in the form of an n-ary tree, but very similar to a dictionary attack) that you may use if you wish, simply instant message ‘MD5 Library' on AIM with a hash.
ghost 18 years ago
I thought about it, but decided not to. If you have any questions about the bot, just ask me directly.