Python Basics for Hackers, Part 6: Decoding an Encrypted Message with Frequency Analysis (Cryptanalysis)

Cryptography Basics

Welcome back, my aspiring cyberwarriors!

 

Cryptography is a fundamental skill in cybersecurity. It underpins many of the protocols and technologies that keep our data safe, protecting passwords, Internet traffic, database contents, messaging, and much more.

 

Cryptanalysis is the flip side of cryptography. It is the science of decrypting or unmasking encrypted data. Imagine the mathematicians and cryptographers (including one of the greatest minds of the 20th century, Alan Turing) working around the clock at Bletchley Park in the UK during World War II. Their task was to decrypt Nazi Germany's messages encrypted with the Enigma machine (a state-of-the-art encryption machine before digital computers). In that case, lives and nations were at stake.

 

Cryptanalysis in our modern world might involve decrypting an adversary's messages during wartime, decrypting a hard drive encrypted by ransomware, decrypting a criminal’s encrypted data on their hard drive, and many other applications.

 

In a previous post, I outlined the major methods of cryptanalysis with an assessment of their likely success in decrypting data. You can read it here. In this series, I will be addressing some techniques useful in decrypting data in all of the scenarios cited above. No data is free from decryption; it is only a matter of how difficult it is to decrypt. In many cases, the decryption process can be very slow and expensive. With the advent of quantum computing, the time frames and cost of decryption will likely become much shorter and cheaper, respectively.

 

As with the study of any discipline, it is best to start with the simplest technique first. In this case, we will examine frequency analysis. Every language has characteristic patterns in how it uses individual letters. Some letters are used more frequently than others. In English, the letter “e” is used far more commonly than “x”, for instance. These patterns can be used to decrypt some messages by simply looking at the frequency of each letter in the encrypted message and matching it to the English letter with the same frequency. Simple, right?

 

Of course, the limitation of this method is that it works best for long messages and simple encryption algorithms such as the Caesar Cipher. In addition, if the algorithm encrypts multiple words or phrases at a time, this method will not work.

 

Short messages may not exhibit the letter frequencies we would expect. The longer the message, the greater the probability that the letter frequencies in the encrypted message will match those of the language in general.
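To make the point about short messages concrete, here is a minimal sketch. The helper name most_common_letter and the sample message are my own illustrations, not part of the script we build later.

```python
from collections import Counter


def most_common_letter(text):
    """Return the most frequent letter in text, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    return Counter(letters).most_common(1)[0][0]


# A short message can easily break the "e is most common" rule.
short_message = "attack at dawn"
print(most_common_letter(short_message))  # prints: a
```

Here “a” appears four times and “e” does not appear at all, so guessing that the most common letter stands for “e” would lead us astray.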

 

 

As you can see above, this chart shows the relative frequency of each letter of the Latin alphabet in the English language. As you would expect, “e” occurs most often; almost 13% of all letters are “e”. “X” and “Z” are used very infrequently in English. This knowledge can help us break many simple encryption algorithms and some more advanced algorithms.

 

In an earlier post here, I showed you how to create a simple Caesar Cipher encryption algorithm in Python (look for my new book, Python Basics for Hackers, coming late 2025). Let’s develop a Python script here that can decrypt those messages using frequency analysis.

 

Step 1: Open Spyder or Any Text Editor or IDE

 

The first step, of course, is to open a text editor or an Integrated Development Environment (IDE). There are many, but Spyder is very good and free. You can install it from the Kali repository.

 

kali> sudo apt install spyder

 

When the download and install are complete, simply enter:

 

kali > spyder

 

This will open the IDE in Kali.

 

Step 2: Start a New Project

 

Next, open a new file in Spyder.

 
 

Step 3: Imports

 

As we learned earlier, Python has almost innumerable modules and packages that you can employ in your code. In this case, we will need the re (regular expressions) module and Counter, a class from Python’s collections module. The collections module provides specialized containers for storing different types of data, along with convenient ways to access and iterate over their contents. Let’s import the re module and the Counter class, which we will use for counting letters.

 

import re

from collections import Counter

 
 

Next, we need to define a class that we will call TextAnalyzer (class TextAnalyzer) and then define several functions within it. This class analyzes the text to calculate the letter frequencies and percentages, ignoring letter case and any non-alphabetic characters.

 

We begin by defining __init__, which creates a dictionary to store the counts of the letters and a variable that tracks the total number of letters processed. The total is necessary to calculate the percentages.

 

Next, it defines a function analyze_text. This function:

 
  1. Normalizes the text by converting it all to lowercase

  2. Uses regex to extract only alphabetic characters

  3. Generates letter frequencies

  4. Updates the letter count

  5. Divides each letter’s count by total_letters and multiplies by 100 to get a percentage

  6. Orders the letters by frequency with the highest first
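The class described above can be sketched roughly as follows. This is my reconstruction from the description, not necessarily the exact code in the screenshot: the attribute name letter_counts and the shape of the returned list are assumptions, and I use a Counter (a dictionary subclass) to hold the counts.

```python
import re
from collections import Counter


class TextAnalyzer:
    """Counts letters in text, ignoring case and non-alphabetic characters."""

    def __init__(self):
        # letter_counts is an assumed name; Counter is a dict subclass.
        self.letter_counts = Counter()
        self.total_letters = 0

    def analyze_text(self, text):
        # 1-2. Lowercase the text and keep only the letters a-z.
        letters = re.findall(r"[a-z]", text.lower())
        # 3-4. Count the letters and update the running totals.
        self.letter_counts.update(letters)
        self.total_letters += len(letters)
        # 5-6. Convert counts to percentages, most frequent letter first.
        return [
            (letter, count, count * 100 / self.total_letters)
            for letter, count in self.letter_counts.most_common()
        ]


analyzer = TextAnalyzer()
for letter, count, percentage in analyzer.analyze_text("Hello, World!"):
    print(f"{letter}: {count} ({percentage:.1f}%)")
```

Because the counts live on the object, calling analyze_text again on more text keeps adding to the same totals. If the input contains no letters, the method simply returns an empty list.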

 
 

Step 4: Create a Main Function

 

Finally, we will create a main function that creates a TextAnalyzer object, prompts the user for the sample text to analyze, and then prints out the frequency of each letter in that message.
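A main function along those lines might look like the sketch below. The prompt wording and the format_report helper are hypothetical, and the TextAnalyzer shown here is a condensed stand-in for the class from Step 3 so that the example runs on its own.

```python
import re
from collections import Counter


class TextAnalyzer:
    """Condensed stand-in for the analyzer class described in Step 3."""

    def __init__(self):
        self.letter_counts = Counter()
        self.total_letters = 0

    def analyze_text(self, text):
        letters = re.findall(r"[a-z]", text.lower())
        self.letter_counts.update(letters)
        self.total_letters += len(letters)
        return [
            (letter, count * 100 / self.total_letters)
            for letter, count in self.letter_counts.most_common()
        ]


def format_report(results):
    """Turn (letter, percentage) pairs into printable lines (hypothetical helper)."""
    return [f"{letter}: {percentage:5.2f}%" for letter, percentage in results]


def main():
    analyzer = TextAnalyzer()
    text = input("Enter the text to analyze: ")
    results = analyzer.analyze_text(text)
    if not results:
        print("No letters found in the input.")
        return
    print("Letter frequencies (most common first):")
    for line in format_report(results):
        print(line)
```

Adding the usual `if __name__ == "__main__": main()` guard at the bottom of the file lets it run as a script from the command line.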

 
 

Step 5: Run the Script

 

Now that we have the script complete, let’s try running it and see how well it works. The first step is to create a ciphertext. I will be using the introduction to my new Linux Basics for Hackers 2nd edition. That book begins:

 

Welcome to the new, updated second edition of Linux Basics for Hackers!

I want to begin by thanking all my readers for making the first edition

such a resounding success.

 

Now let’s enter it into our Caesar cipher script with a shift of 4. It produces output like that below.

 

kali > sudo python3 ./caesar_cipher.py
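If you do not have the script from the earlier post at hand, the encryption step can be approximated with the sketch below. This is not necessarily how caesar_cipher.py is written; the function name caesar_encrypt is my own, and it simply shifts each English letter forward by the given amount while leaving everything else untouched.

```python
def caesar_encrypt(plaintext, shift):
    """Shift each English letter forward by `shift` places, wrapping at z."""
    result = []
    for ch in plaintext:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            # Spaces, digits and punctuation pass through unchanged.
            result.append(ch)
            continue
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)


print(caesar_encrypt("Welcome", 4))  # prints: Aipgsqi
```

With a shift of 4, every “e” in the plaintext becomes an “i” in the ciphertext, which is exactly the pattern frequency analysis looks for.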

 
 

Now that we have created our cipher using a shift of 4, let’s enter the cipher into our frequency cryptanalysis script. You can simply copy and paste.

 

kali > sudo python3 ./frequency_cryptanalysis.py

 
 

As you can see below, our frequency cryptanalysis tool displays the frequencies of each letter.

 
 

Since the most common letter in this analysis is “i”, we can probably conclude that it represents an “e” in the original message.
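Once we suspect that “i” stands for “e”, the shift follows directly: “i” is four letters after “e” in the alphabet, so the shift is 4. The sketch below automates that guess and reverses the shift. The function names caesar_shift and guess_shift are hypothetical, and the approach assumes the most common ciphertext letter really is “e”, which, as noted earlier, is only likely for longer messages.

```python
from collections import Counter


def caesar_shift(text, shift):
    """Shift English letters by `shift` places (negative values shift back)."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


def guess_shift(ciphertext, assumed_plain="e"):
    """Guess the shift by assuming the most common letter encrypts `assumed_plain`."""
    letters = [ch for ch in ciphertext.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    top_letter = Counter(letters).most_common(1)[0][0]
    return (ord(top_letter) - ord(assumed_plain)) % 26


sample = (
    "Welcome to the new, updated second edition of Linux Basics for Hackers! "
    "I want to begin by thanking all my readers for making the first edition "
    "such a resounding success."
)
ciphertext = caesar_shift(sample, 4)

shift = guess_shift(ciphertext)
print("Guessed shift:", shift)
print(caesar_shift(ciphertext, -shift))
```

If the decrypted text comes out as gibberish, the guess was wrong, and the next step is to try the second or third most common ciphertext letter as the stand-in for “e”.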

 
 

As you can see here, we leave the user with a message on how to use this tool to decrypt messages.

 

Summary

 

Cryptanalysis is the study of decoding encrypted messages. We can use some simple Python to decode a message encoded with the ancient Caesar cipher. In spirit, this is similar to what Alan Turing and his team did during World War II to decrypt the messages from the Nazis to their remote submarines, although the Enigma cipher was far more complex.

 

Cryptography and Python are fundamental skills in cybersecurity. In this exercise, we used some basic Python to create a simple decryption tool, both to demonstrate the capabilities of Python and to introduce some fundamental cryptanalysis.

 

If you want to learn more Python, look for my upcoming book, “Python Basics for Hackers”, available only in our online store.