Data Analytics for Cybersecurity, Part 1: Getting Started with the R Programming Language for Cybersecurity Analytics


Welcome back, my aspiring cyberwarriors!

 

 

As data analytics and artificial intelligence become key components of any cybersecurity strategy, it is increasingly critical that you understand these techniques to remain relevant and employed in the most exciting and well-paying field in IT. Toward that end, we at Hackers-Arise are offering a new class in data science analytics in cybersecurity. This will be the first of several classes on this subject to take you to the leading edge of cybersecurity.

 

 

Within data science, one of the dominant programming languages goes by the very simple moniker R. When IT recruiters are asked which programming languages they want their prospects to be proficient in, Python is first and R is second.

 

 

In this tutorial, we get you started with R. For a more complete understanding of R and Data Science Analytics in Cybersecurity, sign up for these classes in our Subscriber Pro program.

 

What is R?

 

R is a programming language designed specifically for statistical computing and graphics. You will find it used extensively in statistical inference, data analysis, and machine learning (ML).

 

Why R in Cybersecurity?

 

One of the key strengths of R in cybersecurity is its ability to handle VERY large datasets. In cybersecurity, we are often working with massive datasets that can include:

 

  1. Network Traffic

  2. Malware

  3. Web Apps

  4. Software

  5. Email

  6. Binaries

  7. Passwords

  8. Botnets

  9. Malicious URLs

  10. SCADA/ICS Attack Datasets

  11. Hashes

  12. YARA Rules

 

R is designed to handle these large data sets effectively and efficiently.

 

In addition, R can be used in cybersecurity for such things as network analysis, intrusion detection, spam detection, log analysis, machine event analysis, and much, much more. It can also be used for hacking/penetration testing as it has the capability of detecting vulnerabilities and, in some cases, attempting an exploit.

 

Let’s get started with R and its application to cybersecurity!

 

Step # 1: Download and Install R

 

You can download R from the R Project's CRAN site at http://cran.r-project.org or, if you are using Kali Linux, you can simply install the package from the repository by entering:

 

kali > sudo apt install r-base

 

At the time of writing, the latest version is R 4.4.2.
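
Once the install finishes, one way to confirm which version you actually have is to ask R itself from an R session (typing R in the terminal starts one). This is a small sketch using the built-in R.version.string and R.version values; the variable name installed is chosen only for illustration, and the output depends on your system, so none is shown here.

```r
# Print the full version string of the running R interpreter
print(R.version.string)

# The major and minor parts are also available separately
installed <- paste(R.version$major, R.version$minor, sep = ".")
print(installed)
```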

 

As with most programming languages, you will likely want to use an Integrated Development Environment (IDE) to aid your writing of code. Such IDEs can vastly improve your efficiency in writing effective code. The IDE of choice for R is RStudio.

 

You can then download RStudio at the link below.

 

Now we are ready to begin using R!

To start RStudio, simply enter:

kali > ./rstudio

 

 

When you do so, you should see a console like the screenshot above.

 

Now that you have RStudio up and running, we can begin to enter some simple commands and become familiar with the R syntax.

 

Step # 2: Some Simple Functions and Variables in R

 

Inside the RStudio console, you simply enter commands at the > prompt and hit return to have R process your command. This is not much different from Linux or Python, for that matter.

 

To print some text, we enter the print command followed by (" then the text we want to print, and close with ") such as:

 

> print("R is essential to data science in cybersecurity")

 

As you can see above, the console printed the text between the (" and ") in the print function.

 

In some cases, we may want to store this text in a variable. We can do so by creating a variable and directing the text into it. (In Linux, we use both < and > to redirect data; R similarly uses <- and -> to assign values.) Let’s create a variable named “essential” and direct our statement into that variable. We can use <- to direct our text into the variable.

 

> essential <- "R is essential to data science in cybersecurity"

> print(essential)

[1] "R is essential to data science in cybersecurity"

 

 

Now every time you tell the console to print the essential variable, it will print this statement.
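
A short sketch of the assignment forms R accepts may help here. The variable names tool and attempts are hypothetical and chosen only for illustration; only essential comes from the walkthrough above.

```r
# Leftward assignment, the form used throughout this tutorial
essential <- "R is essential to data science in cybersecurity"

# Rightward assignment works too, though it is seen less often
"RStudio" -> tool

# A plain equals sign also assigns at the top level
attempts = 42

# class() reports what kind of value a variable holds
print(class(essential))  # "character"
print(class(attempts))   # "numeric"

# nchar() counts the characters in a string
print(nchar(tool))       # 7
```

The [1] that precedes each printed value is the index of the first element shown on that line. In R, even a single string or number is a vector of length one.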

 

Step # 3: Simple Math in R

 

Now, let’s look at some simple math operations in R. Let’s say we wanted to multiply two numbers. We can enter:

> 3 * 3

If we want to use an exponent (raise a number to a power), we use the ^ symbol, such as:

> 3 ^ 3

If we want to add two numbers:

> 3 + 3

If we want to subtract one number from another:

> 3 - 3

If we want to store the result of a math operation in a variable called “threecube”:

> threecube <- 3 ^ 3

We can then print that variable by using the print function with the variable name enclosed in ( ):

> print(threecube)

 

 

Then, if you look at the upper right of RStudio, you will see the Environment pane listing the global variables we created.
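
The same list can be retrieved from the console. The sketch below recreates threecube so that it is self-contained, adds a few more arithmetic operators for comparison, and then calls ls(), which returns the names of the objects in the current environment. The variable remainder is a hypothetical name added for illustration.

```r
# Recreate the variable from the step above
threecube <- 3 ^ 3
print(threecube)          # 27

# Division, integer division, and modulo (remainder)
print(27 / 4)             # 6.75
print(27 %/% 4)           # 6
remainder <- 27 %% 4
print(remainder)          # 3

# Arithmetic is vectorized: the operation applies to every element
print(c(1, 2, 3) * 3)     # 3 6 9

# List the names of the variables defined in the current environment
print(ls())
```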

 

 

Step # 4: Visualization

 

One of the key elements of data science is the visualization of data. A picture is worth a thousand words or, for that matter, a thousand data points. Visualization helps the reader of a report understand the results more quickly and more completely than any table or raw data. Visualization is one of the strengths of R.

 

Let’s imagine that we are tracking attacks against some of the hosts on our internal network. Our first step is to create a variable x and direct the IP addresses of these systems into it. Although IP addresses appear to be numbers, we need to treat them as strings of text, the same as in Python.

 

Next, we need to create a variable y and direct into it the number of attacks against each of these hosts.

 

Finally, we need to use the barplot function to create a bar graph that shows each IP address on the bottom or x axis and the number of attacks on the vertical or y axis.
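
The three steps just described can be sketched as follows. The IP addresses and attack counts are made-up values for illustration only, and the main, xlab, and ylab labels are likewise assumptions rather than the values used in the original walkthrough.

```r
# Hosts under attack, stored as character strings (hypothetical addresses)
x <- c("192.168.1.10", "192.168.1.11", "192.168.1.12",
       "192.168.1.13", "192.168.1.14")

# Number of attacks observed against each host (made-up counts)
y <- c(12, 45, 7, 30, 18)

# Draw the bar graph: names.arg labels each bar with its IP address
barplot(y,
        names.arg = x,
        main = "Attacks per Host",
        xlab = "IP address",
        ylab = "Number of attacks")
```

The order of x and y matters: the first count belongs to the first address, the second count to the second address, and so on, so both vectors need the same length.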

 

 

As you can see below, RStudio has generated a nice graphic showing us the number of attacks on each system!

 

 

Note that on the bottom or x axis, not all of the data has been printed due to a lack of space. We can remedy this by simply clicking on the zoom button on the upper left of the graph plot. When we do so, the bar graph is displayed in a larger format and all of the IP addresses are displayed.
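
Zooming is the quickest fix, but the labels can also be adjusted in the call itself. As a sketch with made-up data, the las and cex.names arguments of barplot rotate and shrink the axis labels so that long IP addresses are less likely to be dropped. The names old_par and bar_mids are hypothetical and chosen for illustration.

```r
# Hypothetical hosts and attack counts (made-up values)
x <- c("192.168.1.10", "192.168.1.11", "192.168.1.12",
       "192.168.1.13", "192.168.1.14")
y <- c(12, 45, 7, 30, 18)

# Leave extra room in the bottom margin for the rotated labels
old_par <- par(mar = c(8, 4, 4, 2))

# las = 2 turns the labels perpendicular to the axis;
# cex.names = 0.8 draws them at 80% of the default size
bar_mids <- barplot(y, names.arg = x, las = 2, cex.names = 0.8,
                    main = "Attacks per Host",
                    ylab = "Number of attacks")

# Restore the previous graphics settings
par(old_par)
```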

 

 

Step # 5: Getting Help in RStudio

 

As with any programming language, you will likely need some help at times in R. RStudio has a powerful help engine that can answer most of your questions.

 

For instance, if you wanted help understanding how the sqrt (square-root) function works, you can simply enter ?sqrt at the prompt, as seen below.

 

 

RStudio will respond by offering you a help screen in the lower right window, as seen below.

 

 

In addition, you can use the help function, followed by the function name in parentheses, as seen below.

 

> help(sqrt)

 

This will also provide you the same help screen as seen above.
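
Beyond ? and help(), a few other built-in helpers are worth knowing. This sketch uses only base R functions (args and apropos); the variable name matches is chosen for illustration.

```r
# Show the arguments sqrt expects without opening the full help page
print(args(sqrt))

# Find objects whose names contain "sqrt"
matches <- apropos("sqrt")
print(matches)

# Try the function itself
print(sqrt(16))   # 4
```

apropos() is handy when you remember only part of a function name, since it searches by pattern rather than requiring the exact name.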

 

Summary

 

Cybersecurity is rapidly advancing to become a quantifiable science. Data science, machine learning, and AI will become critical to this field in the coming months and years. The R programming language is one of the top choices of data scientists and employment recruiters for those seeking employment in this rapidly growing field.

 

To learn more about data science applied to cybersecurity, look for our upcoming Data Science Analytics in Cybersecurity and Artificial Intelligence in Cybersecurity classes.