Cryptopals - Set 1: Basics

crypto

The Cryptopals challenges have been on my radar for some time but I have never taken the time to check them out. Recently I had a short conversation about wanting to better understand some topics like assembly, binary exploitation, and crypto, and the person I was talking to recommended checking out Cryptopals, given that I’ve had it bookmarked forever and it has just been on my “things to eventually get to” list I decided that with that little push maybe now is the time to give them a go.

I have decided that instead of researching blog posts or opinions about the challenges, something I do often before trying out a new resource, I was just going to dive into this one relatively blind.

If you are planning on doing the Cryptopals challenges, the below content contains answers and spoilers for these challenges.

Table Of Contents


Set 1: Basics

I will probably tackle these challenges in Python as this is the language I am most familiar with.

1. Convert hex to base64

For the first challenge, we are asked to convert a hex string to base64, there is no answer validation for Cryptopals challenges so they give you the expected output to compare your code output to.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 1 (Convert hex to base64)
#
# Expected Output: 'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'

import base64

hex_input = '49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d'

# Convert our string to bytes
hex_bytes = bytes.fromhex(hex_input)

# Encode
base64_string = base64.b64encode(hex_bytes).decode('utf-8')

print(base64_string)

Output:

$ python ./cryptopals/set1/set1chal1.py
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

2. Fixed XOR

For this challenges we are asked to write a function that takes two buffers of equal-length and produce their XOR’d combination.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 2 (Fixed XOR)
#
# Expected Output: '746865206b696420646f6e277420706c6179'

hex_input = '1c0111001f010100061a024b53535009181c'
xor_input = '686974207468652062756c6c277320657965'

def fixed_xor(input_hex_string, input_xor_string):
    hex_string_bytes = bytes.fromhex(input_hex_string)
    xor_string_bytes = bytes.fromhex(input_xor_string)
    result_bytes = bytearray()
    for i in range(len(hex_string_bytes)):
        result_bytes.append(hex_string_bytes[i] ^ xor_string_bytes[i])

    return result_bytes.hex()

result = fixed_xor(hex_input, xor_input)
print(result)

Output:

$ python ./cryptopals/set1/set1chal2.py
746865206b696420646f6e277420706c6179

3. Single-byte XOR cipher

For this challenge, we are given a hex string to decode and are told it has been XOR’d with a single byte. From this, I made an assumption it was a single byte from a printable character so I iterated over printable characters with Python to find the decoded string.

Afterward, I added some regex just to provide a clean output.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 3 (Single-byte XOR cipher)
#
# Expected Output: Decrypted version of...
# "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"

import re
from string import printable

hex_input = '1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736'
hex_string_bytes = bytes.fromhex(hex_input)

# Regex to filter output, built after finding correct response.
regex_pattern = r"^[a-zA-Z0-9 \']*$"

for char in printable:
    char_byte = char.encode('utf_8')
    result_bytes = bytearray()
    for i in range(len(hex_string_bytes)):
        result_bytes.append(hex_string_bytes[i] ^ char_byte[0])

    result_string = result_bytes.decode("utf-8")

    # Some regex for clean output
    if re.match(regex_pattern, result_string):
        print('XOR Char: {}\t-\tOutput: {}'.format(char, result_string))

Output:

$ python ./cryptopals/set1/set1chal3.py
XOR Char: X     -  Output: Cooking MC's like a pound of bacon

4. Detect single-character XOR

For this challenge, we are given a text file containing 300+ lines of hex strings and we are tasked with finding the one string that is XOR’d with a single character.

To do this we can modify our code from Challenge 3 to iterate over every line in the file.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 4 (Detect single-character XOR)
#
# Expected Output: Detect which string in the list of strings from '4.txt' is 
# xor'd with a single byte. 


import re
from string import printable

regex_pattern = r"^[a-zA-Z0-9 \'\"]*$"

with open("./cryptopals/set1/4.txt") as file:
    for line in file:
        hex_input = line
        hex_string_bytes = bytes.fromhex(hex_input)
        for char in printable:
            char_byte = char.encode('utf_8')
            result_bytes = bytearray()
            for i in range(len(hex_string_bytes)):
                result_bytes.append(hex_string_bytes[i] ^ char_byte[0])

            try:
                result_string = result_bytes.decode("utf-8")
                if re.match(regex_pattern, result_string):
                    print('XOR Char: {}\t-\tOutput: {}'.format(char, result_string))
            except:
                pass

Output:

$ python ./cryptopals/set1/set1chal4.py
XOR Char: 5     -  Output: Now that the party is jumping

5. Implement repeating-key XOR

This one is pretty simple, we XOR the provided string with the value “ICE” repeating, I have chosen to use a modulo as an index to indicate which letter of “ICE” is needed on each round.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 5 (Implement repeating-key XOR)
#
# Expected Output: XOR the provided phrase with a repeating key of 'ICE' output 
# should read as follows...
# 0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272
# a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

input_string = """Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal"""
xor_string = 'ICE'

result_bytes = bytearray()
for i in range(len(input_string)):
    string_byte = input_string[i].encode('utf-8')[0]
    xor_byte = xor_string[i % len(xor_string)].encode('utf-8')[0]
    result_bytes.append(string_byte ^ xor_byte)

print(result_bytes.hex())

Output:

$ python ./cryptopals/set1/set1chal5.py
0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

6. Break repeating-key XOR

This is the first challenge that feels like it has some real substance. We are provided with a file (6.txt) which is a base64 representation of data that has been XOR’d with an unknown repeating key.

To complete this challenge we need to first determine the Hamming distance to try and determine the likely size of our repeating key.

Once we know the likely size we can break the content into byte chunks of that key size and solve each index as if it were a single character XOR. So if we line up all of our chunks we should be solving one character XOR encryption in each column. Then we can either stitch it back together or combine each of our single-byte XOR values into the full XOR key and decrypt it as a whole.

One of the bits that tripped me up for a while was programmatically picking the best-decoded output candidate. The approach that I figured out and worked best for me was to use a regex findall query and pick the candidate that had the highest match for alphanumeric and space characters [a-z0-9\s]{1}. I then appended that candidate to our xor key for later use.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 6 (Break repeating-key XOR)
#
# Expected Output: Break the repeating-key XOR'd encryption on the provided file
# '6.txt'. Provided file is encoded with base64 after having been XOR'd

import re
import base64

def get_hamming_distance(bytes_1, bytes_2):
    distance = 0
    for i in range(len(bytes_1)):
        distance += format(bytes_1[i] ^ bytes_2[i], '08b').count('1')
    return distance


def get_key_length(byte_string, max_key_size=50):
    key_dict = dict()
    for test_length in range(1, min(max_key_size, len(byte_string) // 4 + 1)):
        prev_block = None
        distance = 0
        dev_by = 0
        for i in range(0, len(byte_string), test_length):
            block = byte_string[i:i+test_length]
            if prev_block:
                distance += get_hamming_distance(block, prev_block) / test_length
                dev_by += 1
            prev_block = block
        key_dict[test_length] = distance / dev_by
    return min(key_dict, key=lambda k: key_dict[k])


def brute_force_xor(byte_string, key_size):
    blocks = [byte_string[i:i+key_size] for i in range(0, len(byte_string), key_size)]
    regex_pattern = r"[a-z0-9\s]{1}"
    xor_key = b''
    for i in range(key_size):
        block_bytes = b''
        for block in blocks:
            if i < len(block):
                block_bytes += bytes([block[i]])
        current_candidate = b''
        highest_match = 0
        for j in range(255):
            dec_block_bytes = b''
            for byte in block_bytes:
                dec_block_bytes += bytes([byte ^ j])
            try:
                decoded = dec_block_bytes.decode('utf-8')
                alphanum_count = len(re.findall(regex_pattern, decoded))
                if  alphanum_count > highest_match:
                    highest_match = alphanum_count
                    current_candidate = bytes([j])
            except:
                pass
        xor_key += current_candidate
    return xor_key


def xor_with_key(byte_string, xor_key):
    result_bytes = bytearray()
    for i in range(len(byte_string)):
        byte = byte_string[i]
        xor_byte = xor_key[i % len(xor_key)]
        result_bytes.append(byte ^ xor_byte)
    return result_bytes.decode('utf-8')


with open('./cryptopals/set1/6.txt', 'rb') as file:
    base64_data = file.read()
    ciphertext_bytes = base64.b64decode(base64_data)


key_length = get_key_length(ciphertext_bytes, 50)
key = brute_force_xor(ciphertext_bytes, key_length)
plaintext = xor_with_key(ciphertext_bytes, key)

print("Key: {}\n\n{}".format(key.decode('utf-8'), plaintext))

Output:

$ python ./cryptopals/set1/set1chal6.py
Key: Terminator X: Bring the noise

I'm back and I'm ringin' the bell
A rockin' on the mike while the fly girls yell
In ecstasy in the back of me
Well that's my DJ Deshay cuttin' all them Z's
Hittin' hard and the girlies goin' crazy
...cut...

7. AES in ECB mode

This one is fairly straightforward, if you are using python you may need to install the pycryptodome package.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 7 (AES in ECB mode)
#
# Expected Output: Using AES in ECB mode, decrypt the provided '7.txt' file
# using the provided key 'YELLOW SUBMARINE'.

from Crypto.Cipher import AES
import base64


def decrypt_aes_ecb(ciphertext_bytes, key):
    cipher = AES.new(key, AES.MODE_ECB)
    decrypted = cipher.decrypt(ciphertext_bytes)
    return decrypted


with open('./cryptopals/set1/7.txt', 'rb') as file:
    base64_data = file.read()
    ciphertext_bytes = base64.b64decode(base64_data)

key = b'YELLOW SUBMARINE'
plaintext = decrypt_aes_ecb(ciphertext_bytes, key)
print(plaintext.decode('utf-8'))

Output:

$ python ./cryptopals/set1/set1chal7.py
I'm back and I'm ringin' the bell 
A rockin' on the mike while the fly girls yell
In ecstasy in the back of me
Well that's my DJ Deshay cuttin' all them Z's
Hittin' hard and the girlies goin' crazy
...cut...

8. Detect AES in ECB mode

For this challenge we are trying to detect which line of the ‘8.txt’ file has been encrypted with AES in ECB mode, to do this we can break each line into 16-byte blocks and check for duplicates.

Code:

#!/usr/bin/env python 
#-*- coding: utf-8 -*-
# Cryptopals: Set 1 - Challenge 8 (Detect AES in ECB mode)
#
# Expected Output: You are provided with the file '8.txt' containing ~200 hex strings, 
# and are asked to detect the string that has been encrypted with AES in ECB mode.

with open('./cryptopals/set1/8.txt', 'r') as file:
    for line in file:
        line_bytes = bytes.fromhex(line.strip('\n'))
        blocks = [line_bytes[i:i+16] for i in range(0, len(line_bytes), 16)]
        for block in blocks:
            if blocks.count(block) > 1:
                print(line_bytes)
                break

Output:

$ python ./cryptopals/set1/set1chal8.py
b'\xd8\x80a\x97@\xa8\xa1\x9bx@\xa8\xa3\x1c\x81\n=\x08d\x9a\xf7\r\xc0oO\xd5...
...cut...

Thoughts on Set 1

I’m sure there are a myriad of better ways to do a lot of what I coded, but I’ve never considered myself much of a programmer.

So that’s it for Set 1, all are well-defined and interesting challenges that are not too involved to start with as I managed to knock these out in an evening. I will say they were quite enjoyable in that “scratching the problem-solving itch” way. I plan on tackling the other sets eventually and when I do they will be posted here as well.