Unleash the Power of Shodan Dorking: A Comprehensive Introduction

Introduction to Shodan

Shodan is a search engine for Internet-connected devices. It was created by John C. Matherly (@achillean) in 2009. Web search engines, such as Google and Bing, are great for finding websites, but Shodan helps you find information about desktops, servers, IoT devices, and more. This information includes metadata such as the software running on each device. What if you're interested in finding computers running a certain piece of software (such as Apache)? Or you want to know which version of Microsoft IIS is the most popular? Or how many anonymous FTP servers there are? Or, when a new vulnerability comes out, how many hosts it could affect? Traditional web search engines don't let you answer those questions.

Common uses of Shodan include network security, market research, cyber risk, IoT device scanning, and ransomware tracking. This guide focuses on covering these applications comprehensively in a pentesting context.

Shodan interfaces

This section shows the various ways you can connect to Shodan. You can interact with Shodan via the well-known website, the official Python command-line tool and library, a variety of community-driven libraries for many languages, and the official REST API.

Shodan GUI mode:

In the Shodan GUI (the website), you can do the same things that can be done in CLI mode, in a more user-friendly way.

All about data

Before starting the practical part, I would like to explain the basics. To understand Shodan's responses, it helps to first understand the basic structure of an HTTP response.

The first line of every HTTP response consists of three items, separated by spaces:

• The HTTP version being used.

• A numeric status code indicating the result of the request. 200 is the most common status code; it means that the request was successful and that the requested resource is being returned.

• A textual “reason phrase” further describing the status of the response. This can have any value and is not used for any purpose by current browsers.

Here are some other points of interest in a typical HTTP response:

• The Server header contains a banner indicating the web server software being used, and sometimes other details such as installed modules and the server operating system. The information contained may or may not be accurate.

• The Set-Cookie header issues the browser a cookie, which is submitted back in the Cookie header of subsequent requests to this server.

• A Pragma: no-cache header instructs the browser not to store the response in its cache, and an Expires header set to a date in the past indicates that the content has already expired and therefore should not be cached. These instructions are frequently issued when dynamic content is being returned, to ensure that browsers obtain a fresh version of this content on subsequent occasions.

• Almost all HTTP responses contain a message body following the blank line after the headers. The Content-Type header indicates the media type of that body; for example, text/html means the body contains an HTML document.

• The Content-Length header indicates the length of the message body in bytes.
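
The structure described above can be sketched in Python. The function below (a hypothetical helper, not part of any Shodan tool) splits a raw response into its status line items, a header dictionary, and the body. It is a minimal illustration that assumes CRLF line endings and keeps only the last value of a repeated header such as Set-Cookie.

```python
def parse_http_response(raw: str):
    """Split a raw HTTP response into (version, status, reason, headers, body).

    Minimal sketch: assumes CRLF line endings and a status line containing all
    three items; repeated headers keep only their last value.
    """
    head, _, body = raw.partition("\r\n\r\n")  # the blank line separates headers from body
    lines = head.split("\r\n")
    # Status line: "<version> <status code> <reason phrase>"; the reason may contain spaces
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx/1.1.19\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)
version, status, reason, headers, body = parse_http_response(raw)
print(version, status, reason)                      # HTTP/1.1 200 OK
print(headers["server"])                            # nginx/1.1.19
print(len(body) == int(headers["content-length"]))  # True
```

Lower-casing the header names mirrors the fact that HTTP header names are case-insensitive, so lookups do not depend on how a particular server capitalizes them.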

Banner

The basic unit of data that Shodan gathers is the banner. The banner is textual information that describes a service on a device. For web servers, this would be the headers that are returned; for Telnet, it would be the login screen.

The content of the banner varies greatly depending on the type of service. For example, here is a typical HTTP banner:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Sat, 03 Oct 2015 06:09:24 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 6466
Connection: keep-alive

The above banner shows that the device is running the nginx web server software, version 1.1.19. To show how different banners can look, here is a banner for the Siemens S7 industrial control system protocol:
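
Reading the software and version out of a banner like the one above can be sketched with a regular expression. This is an illustrative helper (the name server_software is hypothetical) and it assumes the common "product/version" form of the Server header, which not every server uses.

```python
import re

# Matches "Server: <product>/<version>" at the start of a line. Assumption: the
# common product/version form; some servers send only a product name.
SERVER_RE = re.compile(r"^Server:\s*([^/\s]+)(?:/(\S+))?", re.IGNORECASE | re.MULTILINE)


def server_software(banner: str):
    """Return (product, version) from an HTTP banner's Server header, or None."""
    match = SERVER_RE.search(banner)
    if match is None:
        return None
    return match.group(1), match.group(2)  # version is None when absent


banner = """HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Sat, 03 Oct 2015 06:09:24 GMT
Content-Type: text/html; charset=utf-8"""

print(server_software(banner))  # ('nginx', '1.1.19')
```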

Copyright: Original Siemens Equipment
PLC name: S7_Turbine
Module type: CPU 313C
Unknown (129): Boot Loader A
Module: 6ES7 313-5BG04-0AB0 v.0.3
Basic Firmware: v.3.3.8
Module name: CPU 313C
Serial number of module: S Q-D9U083642013
Plant identification:
Basic Hardware: 6ES7 313-5BG04-0AB0 v.0.3

The Siemens S7 protocol returns a completely different banner, this time providing information about the firmware, the module's serial number and a lot of other detail describing the device.
Because banners vary so greatly, you have to decide what type of service you're interested in when searching Shodan.

Note: Shodan lets you search for banners, not hosts. This means that if a single IP exposes many services, each service is represented as a separate result.
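
Because each result is a banner, getting a per-host view means grouping results yourself. The sketch below assumes each banner is a dictionary with ip_str and port keys, as in Shodan's JSON banners; the helper name and the sample addresses (from documentation ranges) are made up for illustration.

```python
from collections import defaultdict


def group_by_host(banners):
    """Group banner dicts by IP address; returns {ip: sorted list of ports}."""
    hosts = defaultdict(list)
    for banner in banners:
        hosts[banner["ip_str"]].append(banner["port"])
    return {ip: sorted(ports) for ip, ports in hosts.items()}


# Hypothetical search results: three banners, but only two distinct hosts.
banners = [
    {"ip_str": "198.51.100.7", "port": 80},
    {"ip_str": "198.51.100.7", "port": 22},
    {"ip_str": "203.0.113.5", "port": 443},
]
hosts = group_by_host(banners)
print(len(banners), "banners,", len(hosts), "hosts")  # 3 banners, 2 hosts
print(hosts["198.51.100.7"])                          # [22, 80]
```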

Device Metadata

In addition to the banner, Shodan also grabs metadata about the device, such as its geographic location, hostname, operating system and more. Most of the metadata is searchable via the main Shodan website; however, a few fields are only available to users of the developer API.

IPv6

As of October 2015, Shodan gathers millions of banners per month for devices accessible over IPv6. Those numbers still pale in comparison to the hundreds of millions of banners gathered for IPv4, but the IPv6 numbers are expected to grow over the coming years.

Data Collection

Frequency

The Shodan crawlers work 24/7 and update the database in real time. Whenever you query the Shodan website, you get the latest picture of the Internet.

Presence of Shodan Crawlers

Crawlers are present in countries around the world, including:

• USA (East and West Coast)
• China
• Iceland
• France
• Taiwan
• Vietnam
• Romania
• Czech Republic

Data is collected from around the world to prevent geographic bias. For example, many system administrators in the USA block entire Chinese IP ranges. Distributing Shodan crawlers around the world ensures that any sort of country-wide blocking won’t affect data gathering.

Randomized

The basic algorithm for the crawlers is:

1. Generate a random IPv4 address
2. Generate a random port to test from the list of ports that Shodan understands
3. Check the random IPv4 address on the random port and grab a banner
4. Go to 1

This means that the crawlers don't scan incremental network ranges. The crawling is performed completely at random to ensure uniform coverage of the Internet and to prevent bias in the data at any given time.
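
The four-step algorithm above can be sketched as a loop. This is not Shodan's crawler: the port list is a hypothetical subset, grab_banner is a caller-supplied function standing in for the real network probe, and the loop is bounded by an iteration count instead of running forever. A real crawler would also skip reserved and private address ranges, which this sketch does not.

```python
import ipaddress
import random

# Hypothetical subset of ports; the list of ports Shodan understands is much longer.
PORTS = [21, 22, 23, 80, 443, 502, 6881]


def crawl(grab_banner, iterations, rng=random):
    """Bounded sketch of the randomized crawl loop.

    grab_banner(ip, port) is a caller-supplied (hypothetical) function that
    returns the service's banner text, or None if nothing answered.
    """
    results = []
    for _ in range(iterations):
        ip = str(ipaddress.IPv4Address(rng.getrandbits(32)))  # 1. random IPv4 address
        port = rng.choice(PORTS)                               # 2. random known port
        banner = grab_banner(ip, port)                         # 3. try to grab a banner
        if banner is not None:
            results.append((ip, port, banner))
        # 4. go back to step 1 (the next loop iteration)
    return results


# Offline usage: a stub grabber that "answers" every probe, with a seeded RNG.
rng = random.Random(42)
found = crawl(lambda ip, port: f"banner from {ip}:{port}", iterations=5, rng=rng)
for ip, port, _ in found:
    print(ip, port)
```

Injecting the grabber and the random generator keeps the sampling logic testable without sending any traffic.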

SSL In Depth

SSL is becoming an ever more important aspect of serving and consuming content on the Internet, so it’s only fitting that Shodan extends the information that it gathers for every SSL-capable service. The banners for SSL services, such as HTTPS, include not just the SSL certificate but much more. Apart from the vulnerability results stored in opts, the SSL information discussed below is stored in the ssl property of the banner.

Vulnerability Testing

Heartbleed

If the service is vulnerable to Heartbleed, the banner contains two additional properties. opts.heartbleed contains the raw response from running the Heartbleed test against the service. Note that the crawlers only grab a small overflow to confirm that the service is affected by Heartbleed; they don't grab enough data to leak private keys. The crawlers also add CVE-2014-0160 to the opts.vulns list if the device is vulnerable. If the device is not vulnerable, they add “!CVE-2014-0160” instead. In general, if an entry in opts.vulns is prefixed with ! or -, the service is not vulnerable to the given CVE.

{
    "opts": {
        "heartbleed": "... 174.142.92.126:8443 - VULNERABLE\n",
        "vulns": ["CVE-2014-0160"]
    }
}
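
The prefix convention for opts.vulns can be decoded with a small helper. This sketch assumes opts.vulns is a list of strings exactly as in the example above; the function name is hypothetical.

```python
def vuln_status(vulns):
    """Map each CVE in an opts.vulns list to True (vulnerable) or False (tested, not vulnerable).

    Entries prefixed with "!" or "-" mean the service was tested and found NOT vulnerable.
    """
    status = {}
    for entry in vulns:
        if entry[:1] in ("!", "-"):
            status[entry[1:]] = False
        else:
            status[entry] = True
    return status


print(vuln_status(["CVE-2014-0160"]))   # {'CVE-2014-0160': True}
print(vuln_status(["!CVE-2014-0160"]))  # {'CVE-2014-0160': False}
```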

Shodan also supports searching by vulnerability information. For example, to search Shodan for devices in the USA that are affected by Heartbleed, use:

country:US vuln:CVE-2014-0160

FREAK

If the service supports EXPORT ciphers then the crawlers add the “CVE-2015-0204” item to the opts.vulns property: 

"opts": {
    "vulns": ["CVE-2015-0204"]
}

Logjam

The crawlers try to connect to the SSL service using ephemeral Diffie-Hellman ciphers, and if the connection succeeds, the following information is stored in the ssl.dhparams property:

"dhparams": {
    "prime": "bbbc2dcad84674907c43fcf580e9...",
    "public_key": "49858e1f32aefe4af39b28f51c...",
    "bits": 1024,
    "generator": 2,
    "fingerprint": "nginx/Hardcoded 1024-bit prime"
}
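
One way to act on this data is to flag parameters that look exposed to Logjam-style attacks. The sketch below rests on assumptions that are not Shodan rules: groups under 2048 bits (a commonly recommended minimum) are treated as weak, and a non-empty fingerprint is taken to mean a well-known, widely shared prime. The helper name is hypothetical.

```python
def weak_dh(dhparams, min_bits=2048):
    """Return reasons why DH parameters may be exposed to Logjam-style attacks.

    Assumptions: dhparams is a dict shaped like the example above; groups
    smaller than min_bits are weak; a non-empty fingerprint means a
    well-known, widely shared prime.
    """
    reasons = []
    bits = dhparams.get("bits")
    if bits is not None and bits < min_bits:
        reasons.append(f"{bits}-bit group is smaller than {min_bits} bits")
    fingerprint = dhparams.get("fingerprint")
    if fingerprint:
        reasons.append(f"well-known prime: {fingerprint}")
    return reasons


dhparams = {
    "bits": 1024,
    "generator": 2,
    "fingerprint": "nginx/Hardcoded 1024-bit prime",
}
for reason in weak_dh(dhparams):
    print(reason)
# 1024-bit group is smaller than 2048 bits
# well-known prime: nginx/Hardcoded 1024-bit prime
```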

Version

Normally, when a browser connects to an SSL service it will negotiate the SSL version and cipher that should be used with the server. They will then agree on a certain SSL version, such as TLSv1.2, and then use that for the communication.

Shodan crawlers start the SSL testing with a normal request, as outlined above, in which they negotiate with the server. Afterwards, however, they also explicitly try connecting to the server using each specific SSL version. In other words, the crawlers attempt to connect to the server using SSLv2, SSLv3, TLSv1.0, TLSv1.1 and TLSv1.2 explicitly to determine all the versions that the SSL service supports. The gathered information is made available in the ssl.versions field:

{
    "ssl": {
        "versions": ["TLSv1", "SSLv3", "-SSLv2", "-TLSv1.1", "-TLSv1.2"]
    }
}

If a version has a - (dash) in front of it, the device does not support that SSL version. If the version doesn't begin with a -, the service supports it. For example, the above server supports:

• TLSv1
• SSLv3

And it denies versions:

• SSLv2
• TLSv1.1
• TLSv1.2
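
The dash convention can be applied mechanically. The hypothetical helper below splits an ssl.versions list, shaped like the example above, into supported and denied versions.

```python
def split_versions(versions):
    """Split an ssl.versions list into (supported, denied) lists."""
    supported = [v for v in versions if not v.startswith("-")]
    denied = [v[1:] for v in versions if v.startswith("-")]  # strip the leading dash
    return supported, denied


supported, denied = split_versions(["TLSv1", "SSLv3", "-SSLv2", "-TLSv1.1", "-TLSv1.2"])
print("supports:", supported)  # supports: ['TLSv1', 'SSLv3']
print("denies:", denied)       # denies: ['SSLv2', 'TLSv1.1', 'TLSv1.2']
```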

The version information can also be searched via the website and the API. For example, the following search query would return all SSL services (HTTPS, POP3 with SSL, etc.) that allow connections using SSLv2:

ssl.version:sslv2

Follow the Chain

The certificate chain is the list of SSL certificates linking the end-user certificate to the root. The banner for SSL services includes an ssl.chain property containing every certificate in the chain, serialized in PEM format.
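
To inspect the chain with other tools, the PEM certificates usually need to be converted to DER. The sketch below uses the standard library's ssl.PEM_cert_to_DER_cert. It assumes ssl.chain is either a list of PEM strings or one string of concatenated PEM blocks and accepts both, since the exact shape is an assumption here. The helper name is hypothetical.

```python
import re
import ssl

# One PEM certificate block, including its BEGIN/END lines.
PEM_BLOCK = re.compile(r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL)


def chain_to_der(chain):
    """Convert an ssl.chain value into a list of DER-encoded certificates.

    Assumption: chain is a list of PEM strings or a single string with
    several concatenated PEM blocks.
    """
    if isinstance(chain, str):
        chain = [chain]
    pems = [block for item in chain for block in PEM_BLOCK.findall(item)]
    return [ssl.PEM_cert_to_DER_cert(pem) for pem in pems]
```

The resulting DER bytes can then be handed to an X.509 parser of your choice to read subjects, issuers and expiry dates.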

Beyond the Basics

For most services, the crawlers attempt to analyze the main banner text and parse out any useful information. A few examples are grabbing collection names for MongoDB, taking screenshots from remote desktop services (RDP) and storing a list of peers for Bitcoin. There are two advanced data analysis techniques Shodan uses that I’d like to highlight:


Web Components

The crawlers try to determine the web technologies that were used to create a website. For the http and https modules, the headers and HTML are analyzed to break down the components of the website. The resulting information is stored in the http.components property. The property is a dictionary of technologies, where the key is the name of the technology (e.g. jQuery) and the value is another dictionary with a categories property. The categories property is a list of categories that are associated with the technology. For example:

"http": {
    ...
    "components": {
        "jQuery": {
            "categories": ["javascript-frameworks"]
        },
        "Drupal": {
            "categories": ["cms"]
        },
        "PHP": {
            "categories": ["programming-languages"]
        }
    },
    ...
},

The http.components property indicates that the website is running the Drupal content management system, which itself uses jQuery and PHP. The Shodan REST API makes this information searchable via the http.component filter and two facets (http.component and http.component_category). To get a full list of all possible component or category values, use these facets. For example, to get a full list of all possible categories, use the following shodan command:

$ shodan stats --facets http.component_category:1000 http
Top 47 Results for Facet: http.component_category
javascript-frameworks 8,982,996
web-frameworks 1,708,503
programming-languages 1,409,763
font-scripts 1,280,397
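
On the client side, it is often handy to turn the http.components dictionary around, from technology → categories into category → technologies. The helper below is a hypothetical sketch that assumes the dictionary shape shown in the example above.

```python
from collections import defaultdict


def technologies_by_category(components):
    """Invert http.components ({tech: {"categories": [...]}}) into {category: [techs]}."""
    by_category = defaultdict(list)
    for tech, info in components.items():
        for category in info.get("categories", []):
            by_category[category].append(tech)
    return dict(by_category)


components = {
    "jQuery": {"categories": ["javascript-frameworks"]},
    "Drupal": {"categories": ["cms"]},
    "PHP": {"categories": ["programming-languages"]},
}
print(technologies_by_category(components))
# {'javascript-frameworks': ['jQuery'], 'cms': ['Drupal'], 'programming-languages': ['PHP']}
```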

Cascading

If a banner returns information about peers or otherwise has information about another IP address that runs a service, the crawlers try to perform a banner grab on that IP/service. For example, the default port for the mainline DHT (used by BitTorrent) is 6881. The banner for such a DHT node looks as follows:

DHT Nodes
97.94.250.250      58431
150.77.37.22       34149
113.181.97.227     63579
252.246.184.180    36408
83.145.107.53      52158
77.232.167.126     52716
25.89.240.146      27179
147.23.120.228     50074
85.58.200.213      27422
180.214.174.82     36937
241.241.187.233    60339
166.219.60.135     3297
149.56.67.21       13735
107.55.196.179     8748

Previously, a crawler would grab the above banner and then move on. With cascading enabled for the DHT banner grabber, the crawler now launches new banner-grabbing requests for all of the peers. In the above example, the crawler would launch a scan for IP 97.94.250.250 on port 58431 using the dht banner grabber, then for IP 150.77.37.22 on port 34149, and so on. In other words, a single scan for an IP can cause a cascade of scans if the initial scan data contains information about other potential hosts.
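
Turning a DHT banner into cascading scan targets can be sketched as a simple parser. This is not Shodan's implementation. It assumes the banner layout shown above (a "DHT Nodes" header line, then one IP and port per line separated by whitespace), and the helper name is hypothetical.

```python
def dht_peers(banner):
    """Extract (ip, port) pairs from a "DHT Nodes" banner.

    Assumes one "<ip> <port>" pair per line; the "DHT Nodes" header line
    fails the numeric-port check and is skipped.
    """
    peers = []
    for line in banner.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            peers.append((parts[0], int(parts[1])))
    return peers


banner = """DHT Nodes
97.94.250.250      58431
150.77.37.22       34149
113.181.97.227     63579"""

# Each peer would become a new (cascading) scan with the dht banner grabber.
for ip, port in dht_peers(banner):
    print(f"would scan {ip}:{port} (dht)")
```
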
To keep track of the relationship between the initial scan request and any child (cascading) requests, Shodan introduced two new properties:

_shodan.id: A unique ID for the banner. This property is guaranteed to exist if a cascading request could get launched from the service, though it doesn’t necessarily mean that any cascading requests succeeded.

_shodan.options.referrer: Provides the unique ID of the banner that triggered the creation of the current banner. In other words, the referrer is the parent of the current banner.
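
Given a set of banners, these two properties are enough to rebuild the cascade as a tree. The sketch below assumes banners are dictionaries with the nested keys described above (banner["_shodan"]["id"] and, for cascaded banners, banner["_shodan"]["options"]["referrer"]). The helper names and the short IDs are hypothetical.

```python
from collections import defaultdict


def cascade_children(banners):
    """Map each parent banner id to the ids of the banners it triggered."""
    children = defaultdict(list)
    for banner in banners:
        meta = banner.get("_shodan", {})
        parent = meta.get("options", {}).get("referrer")
        if parent is not None:
            children[parent].append(meta.get("id"))
    return dict(children)


def print_tree(node_id, children, depth=0):
    """Print the cascade rooted at node_id, indenting one level per generation."""
    print("  " * depth + str(node_id))
    for child in children.get(node_id, []):
        print_tree(child, children, depth + 1)


# Hypothetical banners: "a" is the initial scan, "b" and "c" were cascaded
# from it, and "d" was cascaded from "b".
banners = [
    {"_shodan": {"id": "a"}},
    {"_shodan": {"id": "b", "options": {"referrer": "a"}}},
    {"_shodan": {"id": "c", "options": {"referrer": "a"}}},
    {"_shodan": {"id": "d", "options": {"referrer": "b"}}},
]
print_tree("a", cascade_children(banners))
```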
