Introduction to Cryptography & TLS

 

What is cryptography? 

Cryptography is the science of encrypting and decrypting data. It is a method used to provide information security and protects the confidentiality of information. Through the use cryptography, data is stored in such a form that only the intended recipient can read and understand it. It is also used for user authentication.

Cryptography is indispensable in every aspect of our daily life. Some of the innumerable uses of cryptography are listed below:  

  • Cash withdrawal from ATMs 
  • Messaging apps like Whatsapp, Telegram, Instagram
  • HTTPS
  • Emailing
  • Online banking

 

Goals of Modern Cryptography

Modern cryptography has 4 primary goals:

  1. Confidentiality
  2. Integrity
  3. Authentication
  4. Non-repudiation 


Confidentiality

It means that no one other than the intended recipients should be able to read the message/information.

Integrity

It verifies that the information received has not been tampered with or modified in transit between the sender and the receiver.

Authentication

It is the process of verifying the identity of a person or device.

Non-repudiation 

It ensures that the sender cannot deny being the sender of the message.

 

Cryptography terminology 

  • Plaintext: unencrypted, readable data that is inputted to an encryption algorithm.

  • Ciphertext: unreadable encrypted data which is the result of an encryption algorithm. 

  • Encryption: the process whereby plaintext is converted to ciphertext. 

  • Decryption: the process whereby ciphertext is converted back to plaintext.  

  • Cryptographic algorithm/cipher: the series of operations that is used to encrypt or decrypt data. 
  • Key: a value applied to data using a cryptographic algorithm to encrypt or decrypt it.
  • Key space: the total number of possible keys in a cipher. 
  • Cryptanalysis: the process of decoding encrypted messages without knowing the key.



Classic encryption methods

Caesar cipher 

The Caesar cipher is an algorithm that creates ciphertext by shifting each letter of the alphabet by n letters to the right. Letters at the end just wrap around to the beginning. n is the key.

         

Caesar cipher with shift 2

 

 

The ciphertext can be decrypted by applying the same number of shifts (n) in the opposite direction. 

The key space is N-1 where N is the number of letters in alphabet.  For instance, if the English alphabet (26 letters) is used, the key space is 25 since there is only 25 ways of shifting a letter.

The Caesar cipher is insecure as the key space is limited by the number of letters in the alphabet making it relatively easy to guess the key used.

 

Substitution cipher

Substitution cipher works by replacing each letter of the alphabet with another letter. The rule which maps each letter to another one is called the conversion rule. The key is the conversion rule. 

Note that the Caesar cipher is a type of substitution cipher.

 

The key space is N! (N factorial) where N is the number of letters in the alphabet. 

'A' can be replaced with any of the 26 letters, 

'B' with the 25 remaining ones, 

'C' with the 24 remaining ones and so on. 

Key space= 26 x 25 x 24 x ... x 1 = 26!

 

Polyalphabetic cipher 

The plaintext is divided into blocks of fixed size of n letters and the number of letters by which each letter shifts depends on its position in the block (the conversion rule). The key is the conversion rule.

Vigenere algorithm is an example of polyalphabetic cipher.

Let's encrypt the word VIGENERE using the vigenere algorithm with the key CAT.

Each letter is represented by a value from 0 to 25 as shown below.


 

With the key CAT, the conversion rule is as follows: 

The first letter of the block moves by 2 letters (C), 

the second letter moves by 0 letter (A) 

and the third letter moves by 19 letters (T).

 

A quick way of finding the ciphertext is through modular addition. For instance the letter V has position 21 and should be shifted by 2. 

(21+2 )(mod 26) = 23 (mod 26) = 23 = X 

 

The key space is 26n  where n is the number of letters in a block. 

Each letter in the block can be replaced with any of the 26 letters. If a block has size of 3 letters, the key space would be 26 × 26 × 26 = 263

 

Transposition cipher 

The plaintext is divided into fixed size blocks of n letters and then the sequence of letters in each block is changed according to a regular system (substitution rule). The ciphertext is basically a permutation of the plaintext. The key is the substitution rule. 

 

An example of row transposition is given below. 

The word TRANSPOSITION is encrypted using the key 43512. Two extra padding characters 'X' is added at the end so that it meets the required block size.


The ciphertext is NASTRISTPOXNXIO. 

 The key space is n! where n is the number of letters in a group.

For example if n=3,

The first letter of the block can be replaced with any of the 3 letters of the block, 

the second letter with any of the 2 remaining ones. 

and the third letter with the last remaining one.  

Key space= 3 × 2 × 1 = 3!

 

 

Note: Classical encryption methods are no longer used because they are not secure.

Usually a classical encryption can be broken when these 3 conditions are met: 

  1. When you understand the algorithm.
  2. When there is data about statistical properties of the encrypted plaintext. For example the frequency of words and letters.
  3. When you have a large number of encrypted example sentences.

 

Secure ciphers

1.  Fail-safe (unconditionally secure) ciphers

Fail-safe ciphers, also known as unconditionally secure ciphers, are theoretically unbreakable even if the adversary has infinite computational resources and time.

For example vernam cipher with one time pad. 

2.  Computationally secure ciphers

Computationally secure ciphers can be broken but they are computationally infeasible because it's time and labor intensive. Moreover it is not worth trying since the time required exceeds the period over which the information is useful and the cost of doing so exceeds the value of information. These are what we use today.


Vernam cipher

The key is a string of randomly generated numbers that is as long as the plaintext and is only used once. It cannot be decrypted without knowledge of the key making it perfectly secure. 

However, key distribution to all communicating parties ahead of time is an issue. Moreover, it is unsuitable for long messages as it would require a key which is as long as the message. Encrypting and decrypting such long texts consumes a large amount of computational resources and time.





1.  Each letter is converted to its numerical value.

 


 

2.  It is then added to the key.



 

 

3.  The modulo 26 of each result is computed. This is done to have only values from 0 to 25.


 



4.  Finally each value is encoded back to its respective letter.


 

 

 




 

Symmetric-key algorithm 

It is a type of algorithm that uses the same key for encryption and decryption. It is also called the shared key or the secret key algorithm since the both the sender and recipient share the same key which is unknown to anyone else.

Benefits:

  •  It is fast and efficient for large amounts of data.

Disadvantages

  • Distribution of the shared key to all communicating parties ahead of time is an issue.

  • It is unsuitable for communication in large groups as a large number of keys would be needed ( nC2 where n is the number of people involved in the communication). A key is needed for each possible pair of people. 

 

There are two main types of symmetric-key algorithm: stream cipher and block cipher.

Stream cipher

In stream cipher, the key is a long pseudorandom sequence of numbers. Encryption or decryption is performed by XORing the key and the text. The text is encoded/decoded a bit at a time.

Examples of stream ciphers: ChaCha, RC4, SEAL 

Note that if P ⊕ K = C, then C ⊕ K = P. ( P ⊕ C = K is equally true). The ciphertext is obtained by XORing the plaintext and the key. To get back the plaintext, the ciphertext is XORed with key.

 

Block cipher

Block cipher splits text into fixed-size blocks and encodes the plaintext one block at a time. 

Examples of block ciphers: Data Encryption Standard (DES), Advanced Encryption Standard (AES)

Two well-known cipher modes of operation are Electronic Code Book (ECB) and Cipher Block Chaining (CBC).

Electronic Code Book (ECB) 

Each block is treated, encrypted and decrypted individually. Consequently, identical plaintext blocks have identical ciphertext block, providing a clue about how to crack them making them insecure.

CBC (Cipher Block Chaining)

After the plaintext is divided into blocks, the first block is XORed with an initialization vector which is known to everyone and then the result is encrypted using a block cipher with a key. 

After that, each plaintext block is XORed with the cipherblock preceding it and the result is encrypted using the same block cipher and key. The general equation is given below:

Xi = Ci-1 ⊕ Pi, where Xi is the block to be fed to the block cipher algorithm (for example DES), Ci-1 is the previously encrypted block (ciphertext block), and Pi is the current plaintext block.    

The outputs are different ciphertext even if plaintext blocks are identical. This conceals the plaintext pattern, therefore making it more secure.

 

Other block cipher modes of operation are Cipher Feedback (CFB) and Output Feedback (OFB). 

 

DES (Data Encryption Standard)

DES was based on the Lucifer cipher which was developed by Horst Feistel at IBM in the early 1970s.  

 

Basic configuration of Feistel cipher

  1. The plaintext is converted to its 8-bit ASCII format. It is further divided into blocks of 64 bits. If the size of a block is less than 64 bits, padding is added to make it 64-bit.
  2. An initial permutation is applied to each block of 64 bits so that the bits are rearranged. The permuted block is then divided into two halves namely the left half L0 and the right half R0.
  3. Each half goes through 16 rounds of the encryption process using 48-bit subkeys such that
    Ln=Rn-1 and Rn=Ln-1⊕ f(Rn-1,Kn) for n from 1 to 16, where n is the round number.
    For instance, L1= R0 and R1=L0 ⊕ f(R0,K1).
    f is the round function and Kn is the subkey for a particular round(round key). A different subkey is used for each round of the encryption.
  4. After the final round, the halves are swapped such that L'16 = R16 and R'16 = L16
  5. A final permutation, which is the inverse of the initial permutation, is applied to obtain ciphetext.



 
DES subkeys generation
  1. Permuted choice 1 (PC-1) is performed on the 56 non-parity bits of the initial key. Note that the initial key has 64 bits but the key itself is actually 56-bit long as each 8th bit is used for a parity. Parity is used for error detection and checks if the key was correctly retrieved.
  2. The 56 bits are divided into a 28-bit left block (C0) and a 28-bit right block(D0)
  3. Depending on the round number, a number of left circular shifts is applied to each half to create 16 pairs of Cn and Dn (C1D1 to C16D16). Each pair is formed from the previous pair.
  4. Compression permutation PC-2 is applied to each of the 16 CnDn calculated in the previous step to form the 48-bit subkey Kn for each round. 


Round function

  1. Expansion permutation to convert Rn-1 from 32 bits to 48 bits. DES round function only works with 48 bits blocks.
  2. The expanded data E(Rn-1) is XORed with the subkey Kn for that round.
  3. The result is divided into 8 blocks of 6 bits and each block is passed to a different S-box. An S-box takes a 6 bit input and outputs a 4-bit result. 8 blocks of 6 bits will be transformed to 8 blocks of 4 bits, amounting to 32 bits in total.
  4. Lastly a permutation P is performed on the result of the s-box. This is the output of the round function. 

 

Drawbacks of DES

DES is no longer considered secure due to its short key size (56 bits) and outdated design. With the advancement in technology and computers becoming more powerful, DES can be broken using brute force attack. Brute force involves trying out all the possible key combinations until the correct key is discovered. 

 

Successors of DES 

Triple DES (3DES) was created to provide a more secure alternative to DES.  As the name suggests, it involves performing the DES algorithm 3 times. It performs encryption by using three 56-bit shared keys. Firstly, the plaintext is encrypted using the first key, then the result is decrypted using the second key and finally it is encrypted again with the third one. 

However 3DES is still insecure and vulnerable to brute force. Its usage is disallowed after 2023. 

Advanced Encryption Standard (AES) was developed as an alternative to DES. It is based on Substitution-Permutation Network (SPN) instead of Horst Feistel Network. AES has never been cracked and is still used to this day.

Methods of cracking ciphers 

Exhaustive search: searching for the key by trying every possible keys until the correct one is identified. 

Differential cryptanalysis: finding vulnerabilities in the cipher by comparing how differences in input is related to differences in output.

Linear cryptanalysis: finding vulnerabilities in the cipher by looking for mathematical weaknesses in the cipher structure.

 

Public key encryption

It is also known as asymmetric cryptography. It uses a pair of keys known as a public key and a private key. The public key is made available to everyone and is used for encryption. The corresponding private key is a secret key used for decryption. It was invented in 1976 by two Stanford mathematicians, Diffie and Hellman.

If Alice wishes to send a message to Bob, Alice encrypts the plaintext using Bob's public key. Only Bob is able to decrypt the ciphertext using his private key as he is the only one who has it.

 

Benefits:

  • Since encryption is done using a public key which is open to everyone, key distribution is not an issue. 
  • It requires a smaller number of keys for communication in large groups as compared to symmetric key cryptography. (only  2n keys where n is the number of people involved in the communication). Each person has 2 keys: a public and a private key. 

 

Disadvantages:

  • Encryption and decryption is less efficient and slower than symmetric encryption.

 

Public key algorithm relies on two types of mathematical problems: 

  • the integer factorization problem (RSA encryption, Rabin encryption) 
  • the discrete logarithm problem (Elgamal, DSA authentication). 


One way function 

It is a function for which it is easy to compute the output for a particular input, but difficult to find the input given the output. 

 

Integer Factorization 

In the integer factorization problem, multiplying two large prime numbers and computing the result is fairly easy. However, finding the two prime numbers given the composite number is difficult.  


Discrete logarithm

Consider the equation below: 

ax ≡ y mod p 

In the discrete logarithm problem,  given a and x, it is easy to compute y. However, finding x if you know a and y is difficult. x is the logarithm of y.

Note that if a= b, then x is the logarithm of b to base a. 

 

Fermat's little theorem

It states that if n is a prime number and a is an integer coprime to n, then 

an-1 ≡ 1 mod n.

 

Euler's theorem

It states that given a natural number n and  a coprime a, then
akφ(n) ≡ 1 mod n, where φ(n) is Euler's totient function and k is an integer.  

The totient function gives the number of integers between 1 to n which are coprime to n. For example, 

φ(4) = 2 . The numbers that are coprime to 4 between 1 and 4 are {1, 3}

When we multiply the equation by a, we get 

aφ(n)+1  ≡ a mod n 

Note: Fermat is a special case of Euler's theorem as for prime numbers
φ(n) = n - 1. 

 

Example of Euler's theorem: 

3 and 5 are coprime. Let's say n=5 and a=3. 

φ(5) = 4 
3kφ(n) ≡ 1 mod 5, where k is an integer
34k ≡ 1 mod 5

34 ≡ 1 mod 5, 38 ≡ 1 mod 5, 312 ≡ 1 mod 5

 

Euler's totient function for the product of two prime numbers  

N is the product of two prime numbers p and q (N=pq). Since p and q are prime numbers, the multiples of p and the multiples of q make up all the numbers that are not coprime to N.

Multiples of p between 1 and N are p, 2p... up to qp (excluding qp which is N). 

Number of multiples of p = q-1

  

Multiples of q between 1 and N are q, 2q,... up to pq (excluding pq which is N). 

Number of multiples of q = p-1

 

φ(N)  =  Total number of integers from 1 to N-1 - (number of multiples of p + number of multiples of q)
=  (N - 1) - ((q-1)+(p-1)) 
=  (pq - 1) - (p + q - 2)

=  pq - p - q + 1

=  (p-1)(q-1)

This means φ(pq) = φ(p) x φ(q) 

 

RSA (Rivest–Shamir–Adleman)

To encrypt plaintext (m) , we raise it to the power of e (a public key) and its remainder when divided by N(another public key) yields the ciphertext (C).

C= me mod N

To decrypt ciphertext , we raise it to the power of d (the private key) and its remainder when divided by N yields the plaintext (m). 

m= Cd mod N

RSA comprises 4 steps: key generation, key distribution, encryption and decryption.

Generating RSA keys

  1. Select two large prime numbers p and q at random and calculate N=p × q. p and q are kept secret.
  2. Find the totient of N
    φ(N)=φ(p)×φ(q)
    φ(N)=(p-1)×(q-1)
  3. Select a random integer e such that is e is coprime with φ(N) and 1<e<φ(N).   me should be greater than N otherwise the message would not get scrambled. It's best to use a reasonable large e.
  4. Using Euclidean algorithm to find d. d is the modular multiplicative inverse of e mod φ(N) . 


Example:

Two large prime numbers are chosen. For the sake of simplicity, we will use small numbers. 17 and 11 are chosen. 

N= 17 x 11= 187 

φ(187)= φ(17) x φ(11) = 16 x 10 = 160.

 

A large enough integer e is chosen such that me > 187, e is coprime with 160 and 1<e< 160. Let e = 67.

Using Euclidean algorithm, the multiplicative inverse of 67 mod 160 is calculated to be 43. Therefore d = 43.

Since N = 187, the data is divided in 7-bit units (27 = 128 which is less than 187). 

Suppose the letter h were to be encrypted. It is converted to its equivalent binary ASCII code (01101000) and divided into groups of 7 bits. If there is an insufficient number of bits to make a complete 7-bit, padding is added to the right (in red). 

0110100     0000000 

(0110100)2 = (52)10

(0000000)2 = (0)10

Each group is then converted into decimal and undergoes the following calculation:

C= me mod N

52 67 mod 187 =  35 

67 mod 187 =  0 

For the sake of simplicity, the steps for calculating 52 67 mod 187 have been omitted.

The denary numbers are converted to their corresponding ASCII characters and sent.

Decryption

The recipient converts the data back to their 7-bit decimal notation and the following calculation is performed to decrypt the data: 

m= Cd mod N 

35 67 mod 187 =  52 

67 mod 187 =  0

They are converted to 7-bit binary numbers. They are further divided into 8-bit groups and padding is discarded. The groups are converted to ASCII to obtain the message.  

0110100     0000000 

Message = 01101000 = h 


Hash function

A hash function is a one-way mathematical function which converts a numerical input into another unique numerical value known as hash. The hash is always of fixed length. Hash functions are used for data integrity. Examples of hash functions include Message Digest (MD), Secure Hash Function (SHA) and Whirpool.

When a message is sent, the sender transmits its hash value along with it.  Once the message is received, the recipient recalculates the hash and compares it to the hash attached to the message. If the values match, this means that the message has not been tampered with. This process is known as performing a checksum.

 

Message Authentication Code (MAC)

Message Authentication Code or tag is a value which verifies the authenticity of a message. It is calculated using a pre-shared key which only the sender and the recipient have. The MAC is sent along with the message to ensure that the message comes from the stated sender and has not been falsified in transit.  Once the message is received, the receiver calculates the MAC value and compares it to the original MAC from the sender. If they match, the integrity and authenticity of the message are assured. However MAC does not provide non-repudiation. A sender can deny being the sender of the message.

Message Authentication Codes are used for electronic funds transfer and online shopping. 

  

Digital signature

The sender signs a message using their own private key. This signature is sent along with the message. The recipient decrypts the signature using the public key of the sender. If the decrypted signature matches the message, the message and the sender are assumed to be authentic. Moreover, a digital signature provides non-repudiation since the message was signed with the sender's private key which only they have access to. Examples of digital signature schemes include RSA and ElGamal.

 

Note: Message Authentication Codes provide integrity in symmetric encryption whereas digital signatures provide integrity in public key encryption.


Man In The Middle (MITM) Attack

The attacker positions themself between the sender and the receiver and secretly eavesdrops and alters the messages while the latter believe they are talking to each other directly over a private connection.


Digital Certificate 

It is a certificate issued by a trusted entity called the Certificate Authority (CA).  A digital certificate includes the public key, information about its owner and the digital signature of that public key by the Certificate Authority's private key.

An individual who wants to publish their public key must first make a request to the CA. After following strict guidelines to confirm the identity of the person, the CA signs the public key of the latter and attaches the digital signature to the public key. The CA then stores the certificate in a repository from which other people may download it and verify the downloaded public key using the CA's public key. 

 

SSL/TLS

Secure Sockets Layer (SSL) is an encryption-based security protocol for web browsers and servers for encryption, authentication and integrity of data over the internet.

Transport Layer Security (TLS) is simply an improved and more secure version of SSL (the latest version being TLS 1.3). The name change was to avoid legal issues due to a change of ownership in 1999. However TLS is still referred to as SSL or "SSL/TLS" nowadays. We will be calling it "TLS" in this article. 

A website that uses TLS has HTTPS instead of HTTP.

TLS can only be used by websites or applications that have a TLS certificate. A TLS certificate is simply a digital certificate which is issued by CA and it resides on the server of the website or application. A user's device uses the public key on the web certificate during the process of establishing a secure connection to the web server while the latter uses its private key to decrypt the data sent by user. 

When a user goes to a website, the browser will look for the website's TLS certificate. Then a TLS connection is initiated using handshaking to check the validity of the certificate and for authentication. If valid, session keys are generated thus an encrypted link is established between client and server.  

 

TLS Handshake

A TLS handshake is a process during which the client and the server use asymmetric encryption to establish the details of their connection such as the version of TLS and the cipher suites they will use. During that process, the server is authenticated and the session keys to be used (for symmetric encryption) after the handshake are generated as well. Note that different session keys are used in each new session.

In brief, a TLS handshake has three main aims:  

  1. Exchanging encryption capabilities
  2. Authenticating the TLS certificate
  3. Generating and exchanging session keys

 

The steps of a TLS 1.2 handshake are listed below:

  1.  The clientHello message : The client sends a clientHello to the server. The message includes the highest version of TLS and a list of cipher suites that they support. It also contains a random large prime number called the client random.
  2.  The serverHello message: The server sends a serverHello to the client. The serverHello includes the chosen version of TLS and cipher suite. If client and server do not share any common suite, the connection terminates unsuccessfully. The message also includes a randomly selected large prime number called the server random.
  3.  Server certificate: The server sends its digital certificate to prove its identity to the client.  
  4.  (Optional) Certificate request: Sometimes the server may also request a client's digital certificate. If so, the server sends a digital certificate request to the client. This rarely happens. 
  5.  A server key exchange message: is sent by the server if the public key information from the server certificate is not sufficient and additional information is needed for the client to exchange the pre-master key. For example, in cipher suites based on Diffie-Hellman (DH), this message contains the server's DH public key.  An additional step is required for authentication. The client random, server random and the DH parameter are signed with the server's private key and sent. The client will use the server's public key to verify the signature later on and respond with its own DH parameter.
  6.  A server hello done message: The server tells the client it has sent all its messages (the serverHello message is finished). 
  7.  (Optional) Client certificate: The client sends their digital certificate if the server made a request.
  8.  Client key exchange: The client generates a random string of bytes called the pre-master key. The pre-master key is encrypted using the server's public key (extracted from the TLS certificate) and sent to the server. Upon receipt, the server decrypts the pre-master key using its private key (This acts as authentication as only the server can decrypt the pre-master key with its private key). Now, using the client random, server random and pre-master key, both the client and the server compute the master key from which the session key is derived. Note that if DH is used, there is no need for the client to send a pre-master secret to the server as the DH parameters exchanged in step 5 are used to calculate the session key instead. The client sends its own DH parameter to the server.
  9.  Client Change Cipher Spec: The client sends a "change cipher spec" message to the server to let it know that they have generated the session key and is switching to an encrypted environment. 
  10.  Client Finished: The client sends a "finished" message to indicate that the handshake is finished on the client side. The message is encrypted using the session key and contains a MAC to ensure integrity of handshake.
  11.  Server Change Cipher Spec: The server sends a "change cipher spec" message to the client to let them know that it has generated the session key and is switching to an encrypted environment.  
  12.  Server Finished: The server sends a "finished" message to indicate that the handshake is finished on the server side. The message is encrypted using the session key and contains a MAC to ensure integrity of handshake.

 

Cipher suite

A cipher suite is a set of algorithms and protocols that enable secure a network connection through TLS. The set of algorithms usually includes a key exchange algorithm (an asymmetric algorithm for exchanging symmetric keys), a bulk encryption algorithm (a symmetric encryption algorithm to encrypt and decrypt large amounts of data) and a MAC algorithm (for authentication and integrity of message).  

 

 

 

 

 

 

 

Comments