Keeping a Secret … from the past (Part 1)

Introduction

Bob, Alice and Eve
Figure 1: Bob, Alice and Eve

The future of the Internet, especially in expanding the range of applications, involves a much deeper degree of privacy, and authentication. Without these the Internet cannot be properly used to replace existing applications such as in voting, finance, and so on. The future is thus towards data encryption which is the science of cryptographics, and provides a mechanism for two entities to communicate without any other entity being able to read their messages. In a secret communications system, Bob and Alice should be able to communicate securely, without Eve finding out the contents of their messages, or in keeping other details secure, such as their location, or the date that their messages are sent (see Figure 1).

The two main methods used are to either use a unique algorithm which both Bob and Alice know, and do not tell Eve, or they use a well-known algorithm, which Eve also knows, and use some special electronic key to uniquely define how the message is converted into cipertext, and back again. A particular problem in any type of encryption is the passing of the secret algorithm or the key in a secure way, as Bob or Alice does not know if Eve is listening to their communications. If Eve finds-out the algorithm or the key, neither Bob nor Alice is able to detect this. This chapter looks at some of the basic principles of encryption, including the usage of private-key and public-key methods. As we will find public and private key methods work together in perfect harmony, with, typically, private key methods providing in the actual core encryption, and public key methods providing ways to authenticate, and pass keys.

Simple cipher methods

Caesar code
Figure 2: Caesar code

One method of converting a message into cipher text is for Bob and Alice to agree on some sort of algorithm which Bob will use to scramble his message, and then Alice will do the opposite to unscramble the scrambled message. An example of this is the Caesar code, where it is agreed by Bob and Alice that the letters of the alphabet will be moved by a certain number of positions to the left or the right. It is named as the Caesar code as it was first documented by Julius Caesar who used a 3-letter shift.

In the example in Figure 2 the letters for the code have been moved forwards by two positions, thus a ‘c’ becomes an ‘A’, thus a coded message of ‘RFC’ is de-coded as ‘the’. There are several problems with this type of coding, though. The main one is that it is not very secure as there are only 25 unique codings, thus it would be easy for someone to find out the mapping. An improvement is to scramble up the mapping, such as in a code mapping (Figure 3), where a random mapping is used to deter the conversion. Why not try some examples:

http://www.asecuritysite.com/Challenges/shifting

Code mapping
Figure 3: Code mapping

As there are more mappings, it improves the security of the code (4.03×10^26 mappings), but it is still seen as being insecure as the probability of the letter in the mapped code is typically a pointer to the mapping. For the code in Figure 3, an ‘A’ appears most often, thus it is likely to be an ‘e’, which is the most probably letter in written English. Next ‘Q’ appears four times, thus this could be a ‘t’, which is the next most probable. A more formal analysis of the probabilities is given in Table 1, where the letter ‘e’ is the most probable, followed by ‘t’, and then ‘o’, and so on. It is also possible to look at two-letter occurrences (digrams), or at three-letter occurrences (trigrams), or even with words, where ‘the’ is the most common word.

Probability of occurrences
Probability of occurrences

A code mapping encryption scheme is easy to implement, but, unfortunately, once it has been ‘cracked’, it is easy to decrypt the encrypted data. Normally this type of cipher is implemented with an extra parameter which changes its mapping, such as changing the code mapping over time depending on the time-of-day and/or date. Thus parties which are allowed to decrypt the message know the mappings of the code for a given time and/or date. For example, each day of the week could have a different code mapping.

Vigenère cipher

An improved code was developed by Vigenère, where a different row is used for each character cipher, and is polyalphabetic cipher as it uses a number of cipher alphabets. Then the way that the user moves between the rows must be agreed before encryption. This can be achieved with a code word, which defines the sequence of the rows. For example the codeword GREEN could be used which defines that the rows used are: Row 6 (G), Row 17 (R), Row 4 (E), Row 4 (E), Row 13 (N), Row 6 (G), Row 17 (R), and so on (see Table 2). Thus the message is converted as:

Keyword            GREENGREENGREE
Plaintext          hellohowareyou
Ciphertext         NVPPBNFAEEKPSY

The great advantage of this type of code is that the same plaintext character will be coded with different values, depending on the position of the keyword. For ex-ample, for a keyword is GREEN, ‘e’ can be encrypted as ‘K’ (for G), ‘V’ (for R), ‘I’ (for E) and ‘R’ (for N). To improve security, the greater the size of the code word, the more the rows that can be included in the encryption process. Also, it is not possible to decipher the code by simple frequency analysis, as letters will change their coding depending on the current position of the keyword. It is also safe from analysis of common two- and three-letter occurrences, if the keysize is relatively long. For example ‘ee’ could be encrypted with ‘KV’ (for GR), ‘VI’ (for RE), ‘II’ (for EE), ‘IR’ (for EN) and ‘RK’ (for NG).

Why not try a few examples at:

http://www.asecuritysite.com/Challenges/vig

Table 2:

Plain a b c d e f g h i j k l m n o p q r s t u v w x y z 
1     b c d e f g h i j k l m n o p q r s t u v w x y z a
2     c d e f g h i j k l m n o p q r s t u v w x y z a b
3     d e f g h i j k l m n o p q r s t u v w x y z a b c
4     e f g h i j k l m n o p q r s t u v w x y z a b c d 
5     f g h i j k l m n o p q r s t u v w x y z a b c d e
6     g h i j k l m n o p q r s t u v w x y z a b c d e f
7     h i j k l m n o p q r s t u v w x y z a b c d e f g
8     i j k l m n o p q r s t u v w x y z a b c d e f g h
9     j k l m n o p q r s t u v w x y z a b c d e f g h i
10    k l m n o p q r s t u v w x y z a b c d e f g h i j
11    l m n o p q r s t u v w x y z a b c d e f g h i j k
12    m n o p q r s t u v w x y z a b c d e f g h i j k l
13    n o p q r s t u v w x y z a b c d e f g h i j k l m
14    o p q r s t u v w x y z a b c d e f g h i j k l m n
15    p q r s t u v w x y z a b c d e f g h i j k l m n o
16    q r s t u v w x y z a b c d e f g h i j k l m n o p
17    r s t u v w x y z a b c d e f g h i j k l m n o p q
18    s t u v w x y z a b c d e f g h i j k l m n o p q r
19    t u v w x y z a b c d e f g h i j k l m n o p q r s
20    u v w x y z a b c d e f g h i j k l m n o p q r s t
21    v w x y z a b c d e f g h i j k l m n o p q r s t u
22    w x y z a b c d e f g h i j k l m n o p q r s t u v
23    x y z a b c d e f g h i j k l m n o p q r s t u v w
24    y z a b c d e f g h i j k l m n o p q r s t u v w x
25    z a b c d e f g h i j k l m n o p q r s t u v w x y

Homophonic substitution code

A homophonic substitution code overcomes the problems of frequency analysis of code, as it assigns a number of codes to a character which relates to the probability of the characters. For example the character ‘e’ might have 12 codes assigned to it, but ‘z’ would only have one. An example code is given in Table 3. With this, each of the codes is assigned at random for each of the letters, with the number of codes assigned relating to the probability of their occurrence. Thus, using the code table in Table 3, the code mapping would be:

Plaintext    h e  l  l  o  e  v  e  r  y  o  n  e
Ciphertext: 19 25 42 81 16 26 22 28 04 55 30 00 32
Example homophonic substitution
Table 3: Example homophonic substitution

In this case there are four occurrences of the letter ‘e’, and each one has a different code. As the number of codes depends on the number of occurrences of the letter, each code will roughly have the same probability, thus it is not possible to determine the code mapping from the probabilities of codes. Unfortunately the code is not perfect as the English language still contains certain relationships which can be traced. For example the letter ‘q’ normally is represented by a single code, and there are three codes representing a ‘u’. Thus, if the ciphertext contains a code followed by one of three codes, then it is likely that the plaintext is a ‘q’ and a ‘u’.

A homophonic cipher is a monoalphabetic code, as it only uses one translation for the code mappings (even though several codes can be used for a single plaintext letter). This type of alphabet remains constant, whereas a polyalphabet can change its mapping depending on a variable keyword.

Try and example at:

http://www.asecuritysite.com/Challenges/ho

One thought on “Keeping a Secret … from the past (Part 1)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s