Web App Hacking: Spidering a Web Site with WebScarab

Web applications (apps) provide so much opportunity for mischief by hackers. They can be defaced and compromised and, probably most importantly, they can provide an entry point to the corporation’s internal network and resources (above all, the database). This series is designed to show you the many ways to hack these Internet-facing applications.

Now that we have begun this trip down web app hacking lane, we need to first address target reconnaissance. Like any hack, reconnaissance is critical. (Are you tired of me saying that yet?)

There is no better telltale sign of a script-kiddie than a hacker who rushes willy-nilly into a hack/exploit without doing proper recon. They say they don’t “have time” to do proper recon; then when they are invariably unsuccessful, they scratch their heads and ask, “Why didn’t this hack work?”

For a professional hacker, reconnaissance is often 70% or more of the time we spend on a hack. Since each hack/exploit is specific to a vulnerability, and the vulnerability is specific to the OS, the ports, the apps, the technologies used, and even the language, hacking without recon is simply an exercise in futility.

Web App Reconnaissance

I think it’s important to note here that web app reconnaissance is a process and not a tool. Before attacking a web app, you need to gather as much information as possible. In some cases, you will get conflicting information, and when that happens, you may need to run another reconnaissance tool or technique.

Here are some of the tools and tutorials that I have already covered that you can use for reconnaissance.

  • Operating System – The underlying operating system of the target can often be determined by using Nmap, Xprobe2, P0f, or Netcraft.

  • Web Server – The underlying web server can often be determined by Netcraft, banner grabbing with Netcat, Httprint, or Shodan.

  • Web Technologies – The underlying technologies can be determined with Netcraft.

  • DNS – You may need to perform DNS recon to find hidden servers.

  • Wikto – This is an excellent tool for gathering information on the website, including finding hidden directories and Google hacking.

  • DirBuster – OWASP’s tool maps nearly every directory in a website and often finds hidden or unknown directories in a website.

  • Maltego – This tool is great for many of the above tasks, as well as social networking relationships.

  • Httrack – This tool enables us to make a copy of the website for offline reconnaissance and analysis before exploitation.
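To make the banner-grabbing idea from the web server bullet concrete, here is a minimal Python sketch. It assumes the target answers a plain HTTP HEAD request on port 80 and reports itself in a Server header (many servers hide or fake this). The function names parse_server_header and grab_banner are my own for illustration and do not come from Netcat or any of the tools above.

```python
import socket


def parse_server_header(raw_response: bytes):
    """Return the Server header value from a raw HTTP response, or None."""
    # Only the header block (before the first blank line) matters here.
    head = raw_response.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"server":
            return value.strip().decode("latin-1")
    return None


def grab_banner(host: str, port: int = 80, timeout: float = 5.0):
    """Send a HEAD request over a raw socket and return the Server banner."""
    request = (
        f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return parse_server_header(b"".join(chunks))
```

Calling grab_banner with a host name you are authorized to test returns whatever the server chooses to advertise, so treat the result as one data point to cross-check against the other tools.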

Website Spidering

Before a website attack or penetration test, we need to spider the site. Many of the tools we use to attack a site need a map of the website in order to do their work. We could manually spider the site by simply navigating to each page and saving it, but fortunately, we have tools that can save us time and automate this process. The tool we will use here is called WebScarab by OWASP. It’s built into Kali, so no need to download or install anything.
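To illustrate what a spider does under the hood, here is a minimal Python sketch. It is not how WebScarab is implemented; it simply follows every link on a page, stays on the starting host, and records each address it finds. The names LinkParser, extract_links, and spider are hypothetical, and the fetch argument is an assumed callable that takes a URL and returns the page HTML as a string.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs linked from the page, fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    absolute = []
    for href in parser.links:
        url, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(url).scheme in ("http", "https"):
            absolute.append(url)
    return absolute


def spider(start_url, fetch, max_pages=50):
    """Breadth-first crawl limited to the host of start_url."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        for link in extract_links(url, html):
            # Skip off-site links and pages we have already queued.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Passing fetch in as a parameter keeps the crawl logic separate from the network code, so the same function can be exercised against a dictionary of saved pages or wired to a real HTTP client.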

Step 1: Fire Up Kali & Open WebScarab

Let’s begin by firing up Kali and then navigating to Applications -> Kali Linux -> Web Applications -> Web Crawlers -> webscarab.

Step 2: WebScarab

When we click on the webscarab option, it opens with its GUI. As you can see, WebScarab has many web reconnaissance features, but here we will be focusing on its ability to spider a website. In later tutorials, we will explore some of its other capabilities.

Step 3: Configure Your Browser

Before we begin spidering a website, we need to make certain that our browser is configured properly. By default, WebScarab runs a proxy on 127.0.0.1, port 8008. You can change it by clicking on the “Proxy” tab, but for now, let’s keep the default setting and make certain that our browser is using the same setting.
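The browser is not the only client that can be sent through the proxy. As a rough sketch, assuming WebScarab is listening on its default 127.0.0.1:8008, a Python script can route its requests the same way using the standard library. The helper names proxy_settings, build_proxied_opener, and fetch_via_proxy are hypothetical.

```python
import urllib.request


def proxy_settings(host="127.0.0.1", port=8008):
    """Build the proxy mapping for a local intercepting proxy."""
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxied_opener(host="127.0.0.1", port=8008):
    """Return an opener that sends every request through the proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(host, port))
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, timeout=10):
    """Fetch a URL through the proxy and return the raw response body."""
    opener = build_proxied_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

If the proxy is not running, fetch_via_proxy fails with a connection error, which is a quick way to confirm that traffic really is being routed through it.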

For more information on configuring the proxy setting in IceWeasel, see my tutorial on THC-Hydra and Burp Suite.

Step 4: Point WebScarab at a Website

Now, to see how WebScarab can spider a site, let’s point it at www.wonderhowto.com. In the “Allowed Domains” window, simply type in www.wonderhowto.com.

Next, go to your browser, in this case IceWeasel, and navigate to www.wonderhowto.com. When we do so, WebScarab will begin to populate the main window with every web address linked on that page. Note that the webpages are arranged in alphabetical order.

Let’s navigate down a bit to the N’s and find http://null-byte.wonderhowto.com:80. We can click on it and expand all the links within Null Byte.
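Entries such as http://null-byte.wonderhowto.com:80 suggest that the discovered links are grouped by scheme, host, and port, with the individual pages nested underneath. The following Python sketch mimics that layout for any list of URLs. It is my own approximation of the display, not WebScarab’s code, and the names site_key and group_by_site are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Ports assumed when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def site_key(url):
    """Return scheme://host:port for a URL, filling in the default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return f"{parts.scheme}://{parts.hostname}:{port}"


def group_by_site(urls):
    """Group URL paths under their site key, both sorted alphabetically."""
    tree = defaultdict(list)
    for url in urls:
        tree[site_key(url)].append(urlsplit(url).path or "/")
    return {site: sorted(set(paths)) for site, paths in sorted(tree.items())}
```

Feeding it the addresses collected while browsing gives a dictionary whose keys can be expanded one site at a time, much like clicking an entry in the main window.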

In this way, we can now see every webpage and link on the target website. In a future tutorial, when we begin the attack phase, we will see how we can actually use this information.