Welcome to HBH! If you have tried to register and didn't get a verification email, please using the following link to resend the verification email.

Cryptanalysis Case Study: The One Time Pad


Cryptanalysis Case Study: The One Time Pad

By ghostghost | 11998 Reads |
0     0

Introduction: Hello, and welcome to the second of my articles on cryptanalysis. In this article, we will be analyzing the 'one time pad' encryption scheme from several attack vectors.

Expectations: If you are not yet familiar with the basic concepts and terminology of cryptanalysis, you can check out my introductory article. The code in this article will be written in python, but people skilled in any programming language should be able to follow along. The specific OTP we'll be analyzing is in the code bank, under DC's cryptanalysis challenge 2.

The One Time Pad: First, lets take a look at the basic idea of a one time pad (or OTP). The One Time Pad encryption algorithm combines plaintext and a key, of equal length, to produce ciphertext of the same length. The combination is most commonly done through modular addition, which I'll explain later. When using a OTP securely, the key must be completely random, and used ONLY once. IF these conditions are met, the pad can be PROVEN completely UNBREAKABLE.

How it works: In almost every form of encryption, the first step is turning the data into a number that we can work with. In python (and thus this example) we do so using the ord() function. ord() takes a single character, and returns it's ascii code. ord('a') returns 97, ord('f') returns 102. We can use the chr() function to reverse this. chr(97) returns 'a', chr(98) returns 'b' and so on. In OTP encryption we do this to each character in the plaintext and key, add them together, and then turn the resulting number back into a character. But wait, the ascii character system only goes up to 256. What if the number we get is bigger than that? That's where modular addition comes into play.

Modular Addition: It might sound kind of weird, but modular addition is something we all do every day. All it is is regular addition where numbers "wrap around" after they reach a certain value, called the modulus. When you add or subtract hours, you're actually doing it 'with a modulus of 12' since the numbers wrap around after passing 12. So, in our system we add the numbers mod 256. that way if our result is outside the ascii character range, it will simply wrap around.

The Theory: Here's where we trade in concepts for symbols. Lets say we have a message A and a key K. The nth digit of A is represented by A[n], and I will use character numbers (97) interchangeably with characters 'a' . The ciphertext X is defined: X[n]=(A[n]+K[n]) mod 256. This is OTP encryption. Decryption is the reverse. Plaintext A is defined: A[n]=(X[n]-K[n]) mod 256

In Practice: In the python script you can get from the code bank, I define two functions that do exactly this. The encrypt() function takes two strings, a plaintext and a key, and returns a ciphertext encoded in base64. The decrypt() function takes a base 64 encoded string as the ciphertext and a string as a key, and returns the plaintext as a string. You can test them out by running the script.

The security: Earlier, I said that if certain conditions are met, we can prove that the OTP is uncrackable. Those conditions are that the key is completely random, and is never repeated. Somebody trying to attack this system would find it impossible to brute force. This is because any number of possible plaintexts can be generated using random keys to decrypt the same ciphertext. For example the message 'nothing to fear' encrypted using one key can be decrypted using a different key to read 'be very afraid!' In a perfectly implemented OTP, there's no way to know which is correct.

The cryptanalysis: Now as long as the key is random, and only used once, we have no chance at all. But, lets assume that we know that the same key (K) has been used for two different messages, A and B, to produce ciphertexts X and Y. How does that help us? Recall that X[n]=(A[n]+K[n]) mod 256 and lets say Y[n]=(B)[n]+K[n]) mod 256 Since we're doing the same thing to both A and B we know that the ciphertexts (X and Y) share the same relationship as A and B. That is, A is to B as X is to Y. Observe: X[n] = (A[n]+K[n]) Y[n] = (B)[n]+K[n]) X[n]-Y[n] = (A[n]+K[n]) - (B[n]+K[n]) X[n]-Y[n] = A[n] + K[n] - B[n] - K[n] the two K[n]'s cancel eachother out, so: X[n]-Y[n] = A[n] - B[n] That is, the difference between X and Y is the same as the difference between A and B. We'll call this, the difference, D[n]

In practice: OK, now it's time to take a look at the two ciphertexts we'll be analyzing in this article. Both have been encrypted using the same repeating enlgish phrase as a key.

The first is:yNDK4dTmlM7U3YXt1dPK4suP2dvTlNnX0dmI3eKJ1+OI2dPO7KKIgZSEj4WTjpSFjqi3 and the second is:vM3b2Inq2YjG3cqZ4tfYoISPhZPT4tnT1ujJ0uGJ6OeWhYuFmZSIgb/Z4dmTsePHz83i they're both encoded in base64, so decode them and put them and put them in a variable, like so: import base64 strCiphertextONE= base64.b64.decode("yNDK4dTmlM7U3YXt1dPK4suP2dvTlNnX0dmI3eKJ1+OI2dPO7KKIgZSEj4WTjpSFjqi3") strCiphertextTWO= base64.b64.decode("vM3b2Inq2YjG3cqZ4tfYoISPhZPT4tnT1ujJ0uGJ6OeWhYuFmZSIgb/Z4dmTsePHz83i") Good. Now that we have the ciphertexts in plain ascii, we can find the differences between them. lstDiff=[] for n in range(len(strCiphertextONE): [tab]lstDiff.append((ord(strCiphertextONE[n]-ord(strCiphertextTWO[n]))) Good, now lstDiff is a list of integers representing the differences between the two ciphertexts.

And that gets us…? Nowhere, by itself. But remember what we were saying about brute force attacks before? They're impossible on a one time pad, because we have no way of recognizing the correct plaintext when we see it. But, now, by knowing the difference between the two original plaintexts (because of the key re-use), we can start making guesses.

Known plaintext attack. A known plaintext attack is a scenario where an attacker, knows (or guesses) that a certain word was part of the original plaintext. What does this have to do with our attack? Simple. For any A[n] that we guess, we know that B[n] must equal A[n]+D[n]. So if we think that A='Nobody suspects the inquisition', and the corresponding B='(^&VRWG(@#&hj&(^65409283(&?><"""{}}}}}}', we can be reasonably sure our guess about A is wrong. The bit of plaintext we guess or know is called a crib.

In practice: In our example, we're going to guess that the the original plaintext B is all spaces. Probably not correct, but spaces are common, so we should find some hits. strCribONE='' for i in range(len(lstDiff)): [tab]print i [tab]strCribONE=strCribONE+ chr((ord(' ')+lstDiff[i])%256) strCribONE is mostly giberish, because most of our guess (all spaces) was wrong. However, if you print out strCribONE, you'll see little bits of enlgish words in there. This is where we guessed correctly, and we guess that the corresponding plaintext is correct. We were lucky here, there's lots of spaces. If there weren't we might have to go through a wordlist as possible cribs.

Interchangeable: Remember earlier we said that X[n]=A[n]+K[n]. That is, that the ciphertext equals the key plus the plaintext. Well the process works both ways, so K[n]=X[n]-A[n]. Since we have X to begin with, and we just started making guesses about A (strCribONE), lets see what we can learn about K.

The practice: In practice, we can actually use decrypt() to look for the key. Since encrypt('bar','foo')==encrypt('foo','bar') we can actually decrypt the ciphertext with our plaintext crib, and get some of the key. Remember to encode the ciphertext first. decrypt(base64.b64encode(strCiphertextONE),strCribONE) results in mostly junk. But wait, look there, in the same spot as our crib. We have what looks like a bit of english. Remember that the key was an english phrase? This must be part of it, so copy and paste it into strKeyGuess without the giberish arround it. Ok, so we know what part of the key is, and it's contained in strKeyGuess. But we don't know how long the key is, or where it fits with the ciphertext. We can solve this problem by trying every possible length. for i in range(len(strCiphertextONE)): [tab]strKeyGuess=' '+strKeyGuess [tab]print [tab]print i [tab]print strKeyGuess [tab]print decrypt(base64.b64encode(strCiphertextONE),strKeyGuess) [tab]print decrypt(base64.b64encode(strCiphertextTWO),strKeyGuess)

The output from this should give you some idea where the key fits, and if you look at different numbers that fit, you should be able to figure out the key length. This will also give you some plaintext, which you can use as a crib to continue the attack.

Finishing up: The process simply repeats itself from here, and since it can branch off depending on the cribs you recognize, I won't take it any farther. I have faith in your ability to finish it though. Use cribs to determine more key, use the key to determine more crib. You'll probably want to start making the cribs by hand, filling them with plaintext you where know, and spaces where you don't; be sure to keep the spacing exactly correct. If you hit a brick wall, try reconsidering the assumptions you've made so far. Guessing the wrong crib can send you down a wild goose chase. If you need any help, please don't hesitate to ask.

In Conclusion: I hope this will be a helpful introduction to the black art of cryptanalysis. As you can see, it requires both logic and intuition, both computer processing and human pattern recognition, and a little bit of luck to top it off. My next article will probably be another case study. I'm leaning towards WEP, since it's far more modern, but it may not be released for a while.

Digital Chameleon

Comments
ghost's avatar
ghost 17 years ago

Gets an awesome from me :)

ghost's avatar
ghost 16 years ago

Wow can't believe this hasn't had more recognition…This is a great article. I have been instructed in many encryption types ranging from sub cyphers to Enigma…however I never truly appreciated the OTP. Thanks for the enlightenment!

ghost's avatar
ghost 16 years ago

Nice work.