Open-source intelligence (OSINT) doesn’t just refer to the accessibility of information. It is the practice of collecting information from publicly available sources, and it is widely used by analysts in nearly every type of organization, from government and law enforcement to financial crime, fraud and brand misuse investigations, and particularly cybersecurity.
Cybersecurity teams frequently use OSINT for OPSEC (operational security) to understand what information about their company is publicly exposed (like which assets the company controls and where they operate) and how sensitive that information is. They also look for accidental leaks through social media posts and third-party partner sites, which might reveal data that was never meant to be shared outside the organization.
OSINT would seem to pertain only to research on the surface web, that is, the internet that we use for everyday activities — like browsing, searching, online shopping and social media. But it can also be a useful tool to find out what’s being shared on the deep and dark web. Just because a site requires a login or a special browser to access doesn’t mean it’s a closed source.
To build OSINT, researchers need some foundational understanding of how to access certain resources, the risks associated with them and tradecraft to mitigate those risks.
OSINT on the dark web
Access to the dark web requires specific software, like Tor (The Onion Router). Once inside, there’s lots of information that can be beneficial to threat intelligence gathering and other investigations.
For an analyst using the dark web for OSINT, there are a few things that are important to remember:
- Paying for hacked or stolen items can qualify as OSINT, but there are lots of practical, ethical and legal considerations, which depend on the company policy, industry regulations and other factors (the DOJ CCIPS has good guidance here)
- Any website could introduce malicious code to the researcher’s computer, and this is especially true on the dark web, where site owners often set booby traps to track potential adversaries
- While there’s some anonymity to using the dark web, site owners can still see lots of details about an investigator’s identity — so it’s still important for analysts to control their digital fingerprints and avoid any attribution to their organization
Using OSINT for threat intelligence gathering
Beyond OPSEC research, OSINT is often used to gather threat intelligence to proactively reduce cyber risks. OSINT can help analyze, monitor and track cyberthreats from targeted or indiscriminate attacks against an organization. The following events will typically trigger an investigation where OSINT can play an important role:
- A flag or item of interest identified from a threat intelligence platform (TIP) or subscription service
- A new threat, vulnerability or data breach reported by an OSINT news source
- A potential advanced persistent threat (APT) identified within the network by a threat hunter
When an issue is reported by a TIP, it often requires enrichment to understand how significant and urgent it is. For example, a TIP may flag that some of the company’s email addresses and passwords appeared in a breach package or on a dark web forum. An analyst will want to see the full breach package and gather as much information as possible to understand how the breach occurred, who it targets, and who the potential perpetrators might be.
When a threat hunter identifies an anomaly on the internal network, they first try to understand if it’s malicious. This often requires research into current attacker tactics, techniques and procedures (TTPs), and collecting information in spaces where attackers gather, like forums.
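As a first enrichment step, an analyst might check which addresses in a breach package actually belong to the company. The sketch below is only an illustration under stated assumptions: the `email:password` line format, the `company_hits` helper and the example.com domain are all hypothetical, and real breach packages come in many other layouts.

```python
def company_hits(dump_lines, company_domain):
    """Return the set of email addresses in a dump that belong to company_domain.

    Assumes each line looks like 'email:password' (a hypothetical format).
    Lines that do not fit are skipped, and passwords are never stored.
    """
    suffix = "@" + company_domain.lower()
    hits = set()
    for line in dump_lines:
        email, sep, _password = line.strip().partition(":")
        email = email.lower()
        if sep and email.endswith(suffix):
            hits.add(email)
    return hits


# Made-up sample data for illustration only.
sample = [
    "alice@example.com:hunter2",
    "bob@other.org:letmein",
    "Alice@Example.com:passw0rd",
    "not a credential line",
]
print(sorted(company_hits(sample, "example.com")))
```

Matching is case-insensitive and deduplicated, so the same mailbox appearing twice in a dump is counted once. Addresses on subdomains would need a looser match than the exact suffix used here.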
If a new threat or exposed vulnerability is reported by a news organization or a cybersecurity research group, the analyst’s job is to confirm the report using all available sources on the surface, deep and dark web.
Important OSINT tools and techniques
When searching for information on the surface web, the websites themselves hold several clues about who might be behind the content. (On the dark web, you won’t be so lucky, as site operators and owners are anonymous.) Lookup services such as those described below provide user-friendly ways to retrieve identifying information from the databases that house domain data. Below are some resources that OSINT researchers have been successfully using to gather information on people, organizations and websites.
Identifying site owners through WHOIS
WHOIS records provide registration information for a domain name (the name registered under a top-level domain such as .com or .org). This includes addresses, names and phone numbers used to register the domain, the date of registration and details about where it is hosted. By combining WHOIS query and response protocols with additional search tools, investigators can uncover more information.
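The WHOIS query and response protocol is simple enough to sketch in a few lines: the client sends the domain name followed by CRLF over TCP port 43 and reads until the server closes the connection. The example below is a minimal sketch, not a full client. The default server (whois.verisign-grs.com, the registry server for .com and .net), the `whois_query` and `parse_whois` helper names, and the sample record are assumptions chosen for illustration, and real responses vary in layout from registry to registry.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a raw WHOIS query: the domain plus CRLF over TCP port 43."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: value' lines into a dict of lists (keys can repeat)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip() and not key.startswith(("%", "#", ">>>")):
            record.setdefault(key.strip().lower(), []).append(value.strip())
    return record


# Illustrative response text; a live lookup would use whois_query("example.com").
sample = """\
   Domain Name: EXAMPLE.COM
   Registrar: RESERVED-Internet Assigned Numbers Authority
   Creation Date: 1995-08-14T04:00:00Z
   Name Server: A.IANA-SERVERS.NET
   Name Server: B.IANA-SERVERS.NET
"""
record = parse_whois(sample)
print(record["creation date"], record["name server"])
```

Keeping repeated keys as lists matters for fields like name servers, which often point to the hosting provider and can link otherwise unrelated domains.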
IP address analysis with URLscan.io and DomainIQ
URLscan.io is a service that provides the end user with analysis of the IP address information and HTTP connections made during the site’s retrieval. This analysis can reveal how many subdomains the site contains, what sites it’s linked to, and what country it’s hosted in. This can help investigators find servers that host multiple sites or share webmasters, as well as uncover valuable owner information. DomainIQ operates similarly to URLscan.io and can provide identifying details about the site owner, host and what other pages they may be operating.
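One way to look for servers that host multiple sites is to group scan results by IP address. The sketch below assumes the urlscan.io search endpoint at /api/v1/search/ and a response shape in which each result carries a `page` object with `ip` and `domain` fields; treat those details, the helper names and the sample data as assumptions to verify against the current API documentation.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://urlscan.io/api/v1/search/"


def urlscan_search_url(query, size=100):
    """Build a search URL, e.g. for the query 'domain:example.com'."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "size": size})


def fetch_results(query):
    """Run the search over the network and return the parsed JSON."""
    with urlopen(urlscan_search_url(query), timeout=30) as resp:
        return json.load(resp)


def domains_by_ip(response):
    """Group scanned domains by the IP address that served them."""
    grouped = defaultdict(set)
    for result in response.get("results", []):
        page = result.get("page", {})
        if page.get("ip") and page.get("domain"):
            grouped[page["ip"]].add(page["domain"])
    return {ip: sorted(domains) for ip, domains in grouped.items()}


# Made-up response fragment using documentation-range IP addresses.
sample = {"results": [
    {"page": {"domain": "shop.example.com", "ip": "192.0.2.10", "country": "US"}},
    {"page": {"domain": "example.net", "ip": "192.0.2.10", "country": "US"}},
    {"page": {"domain": "example.org", "ip": "198.51.100.7", "country": "DE"}},
]}
print(domains_by_ip(sample))
```

An IP address that maps to several domains is a lead worth following, though shared hosting and content delivery networks mean it is not proof of common ownership.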
Utilizing advanced search engine techniques
By using advanced search engine techniques, analysts can narrow down search results to find relevant information more quickly and with greater precision.
Carbon Date uses the advanced search engine technique of “carbon dating” that analyzes a website and gives its earliest known creation date. Researchers can also view previous versions of the page, including the first known scrape through archive.org.
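The archive.org part of this can be approximated directly. The sketch below is not how Carbon Date itself works; it only asks the Wayback Machine's CDX server for the first capture of a site, which gives a lower bound on the site's age. The endpoint and the header-row-first JSON layout reflect the public CDX API as far as I know it, and the helper names and sample row are illustrative assumptions.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def earliest_capture_url(site):
    """Build a CDX query that returns only the oldest capture as JSON."""
    return CDX_ENDPOINT + "?" + urlencode({"url": site, "output": "json", "limit": 1})


def parse_earliest(rows):
    """rows is the CDX JSON: a header row followed by capture rows."""
    if len(rows) < 2:
        return None  # no captures known
    capture = dict(zip(rows[0], rows[1]))
    return datetime.strptime(capture["timestamp"], "%Y%m%d%H%M%S")


def earliest_capture(site):
    """Fetch and parse the first capture over the network."""
    with urlopen(earliest_capture_url(site), timeout=30) as resp:
        return parse_earliest(json.load(resp))


# Illustrative rows in the CDX layout; the values are made up.
sample_rows = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20020120142510", "http://example.com:80/", "text/html",
     "200", "HT2DYGA5UKZCPBSFVCV3JOBXGW2G5UUA", "1792"],
]
print(parse_earliest(sample_rows))
```

A first capture date only shows when the archive first saw the page, so a site may be older than this date suggests.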
“Google Dorking” is the process of using advanced search parameters on Google. There are several techniques that can be used, ranging from simple to more advanced. Two of the most common operators are quotation marks, which search for exact phrasing, and the dash symbol (-), which excludes specific words.
Analysts can also use Google to search specific file types or recent caches of a specific site.
These techniques can help find identifying information about moderators or search a site for identifying details. They can also be used to string together sites sharing specific information.
Common Google Dorking techniques include:
- Intitle: identifies any mention of search text in the web page title
- Allintitle: only identifies pages with all of the search text in the web page title
- Inurl: identifies any mention of search text in the web page URL
- Intext: identifies any mention of search text in the body text of the web page
- Filetype: limits results to only the specified file type
- Cache: shows the most recent cache of a site specified
- Around (X): searches for two different words within X words of one another
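The operators above can be combined into a single query string. The helper below is a small illustrative sketch; `build_dork` is a hypothetical name, and it only assembles the text that would be typed into the search box. It does not contact Google.

```python
def build_dork(*terms, exact=None, exclude=(), **operators):
    """Assemble a search query from plain terms and operator:value pairs.

    exact     -> wrapped in quotes for exact phrasing
    exclude   -> each word prefixed with a dash
    operators -> keyword arguments such as intitle="report", filetype="pdf"
    """
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')
    parts += [f"-{word}" for word in exclude]
    for name, value in operators.items():
        # Quote multi-word values so the operator applies to the whole phrase.
        value = f'"{value}"' if " " in value else value
        parts.append(f"{name}:{value}")
    return " ".join(parts)


query = build_dork(
    "budget",
    exact="internal use only",
    exclude=["template"],
    intitle="report",
    filetype="pdf",
)
print(query)
# budget "internal use only" -template intitle:report filetype:pdf
```

Building queries this way makes it easy to keep a repeatable log of exactly which searches were run during an investigation.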
Read more: Advanced Google search tools and techniques
Popular OSINT research tools
Of all the OSINT tools, the following are among the top, go-to solutions for threat intelligence gathering:
Find free OSINT resources with OSINT Framework
OSINT Framework indexes multiple connections to different URLs, recommending where to look next when conducting an investigation. It also provides suggestions on what services can help analysts find specific data that might aid in their research. When an analyst plugs a piece of data (such as an email address, phone number, name, etc.) into the framework, it returns all known online sources that contain information relevant to that data.
Mine, merge and map information using Maltego Transform Hub
Maltego Transform Hub helps integrate data from public sources, commercial vendors and internal sources. All data comes pre-packaged as Transforms, ready to be used in investigations. Maltego takes one artifact and finds more. A user feeds Maltego domain names, IP addresses, domain records, URLs or emails. The service finds connections and relationships within the data and allows users to create graphs with intuitive point-and-click logic.
Shodan: the search engine for the IoT
Websites are just one part of the internet. Shodan allows analysts to discover which of their devices are connected to the internet, where they are located and who is using them. Shodan helps researchers monitor all devices within their network that are directly accessible from the internet and therefore vulnerable to attacks.
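To see which of an organization's addresses expose services, a search can be summarized by IP and port. This sketch assumes Shodan's REST search endpoint (/shodan/host/search with `key` and `query` parameters) and a response containing a `matches` list whose entries have `ip_str` and `port` fields. An API key is required for live use, and the helper names, the `net:` query and the sample data are illustrative assumptions.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen

SHODAN_SEARCH = "https://api.shodan.io/shodan/host/search"


def shodan_search_url(api_key, query):
    """Build a search URL, e.g. for the query 'net:198.51.100.0/24'."""
    return SHODAN_SEARCH + "?" + urlencode({"key": api_key, "query": query})


def fetch_matches(api_key, query):
    """Run the search over the network and return the parsed JSON."""
    with urlopen(shodan_search_url(api_key, query), timeout=30) as resp:
        return json.load(resp)


def exposed_services(response):
    """Map each IP address to the sorted list of ports seen open on it."""
    ports = defaultdict(set)
    for match in response.get("matches", []):
        if "ip_str" in match and "port" in match:
            ports[match["ip_str"]].add(match["port"])
    return {ip: sorted(found) for ip, found in ports.items()}


# Made-up response fragment using documentation-range IP addresses.
sample = {"total": 3, "matches": [
    {"ip_str": "198.51.100.7", "port": 443, "org": "Example Corp"},
    {"ip_str": "198.51.100.7", "port": 22, "org": "Example Corp"},
    {"ip_str": "198.51.100.9", "port": 3389, "org": "Example Corp"},
]}
print(exposed_services(sample))
```

Comparing this summary against an internal asset inventory is one way to spot devices that were never meant to be reachable from the internet.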
ThreatMiner: IOC lookup and contextualization
ThreatMiner is a threat intelligence portal designed to enable analysts to research indicators of compromise (IOCs) under a single interface. That interface allows for not only looking up IOCs but also providing the analyst with contextual information. With this context, the IOC is not just a data point but a useful piece of information and potentially intelligence.
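Before an IOC can be looked up in any portal, it helps to know what kind of indicator it is, since IP addresses, domains, hashes and email addresses are usually searched differently. The sketch below is not ThreatMiner's API. It is a hypothetical pre-lookup step, and the `classify_ioc` helper and its deliberately simple rules are assumptions for illustration.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def classify_ioc(value):
    """Return a rough indicator type: ip, md5, sha1, sha256, email, domain or unknown."""
    v = value.strip().lower()
    try:
        ipaddress.ip_address(v)  # accepts both IPv4 and IPv6
        return "ip"
    except ValueError:
        pass
    if re.fullmatch(r"[0-9a-f]+", v) and len(v) in HASH_LENGTHS:
        return HASH_LENGTHS[len(v)]
    if "@" in v and DOMAIN_RE.match(v.rsplit("@", 1)[1]):
        return "email"
    if DOMAIN_RE.match(v):
        return "domain"
    return "unknown"


for indicator in ["198.51.100.7", "Example.com",
                  "d41d8cd98f00b204e9800998ecf8427e", "user@example.com"]:
    print(indicator, "->", classify_ioc(indicator))
```

Hash types are guessed from length alone, so a 32-character hex string is labeled md5 even though other digests share that length. For routing a lookup, that approximation is usually enough.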
Torch search engine: explore the darknet
Torch, or TorSearch, is a search engine designed to explore the hidden parts of the internet. Torch claims to have over a billion darknet pages indexed and allows users to browse the dark web uncensored and untracked. Torch promises peace of mind to researchers who venture into the dark web to explore .onion sites. It also doesn’t censor results — so investigators can find all types of information and join discussion forums to find out more about current malware, stolen data for sale or groups who might be planning a cyberattack.
Go deeper into the dark web with Dark.fail
Dark.fail has been crowned the new hidden wiki. It indexes every major darknet site and keeps track of all domains linked to a particular hidden service. Tor admins rely on Dark.fail to disseminate links in the wake of takedowns of sites like DeepDotWeb. Researchers can use Dark.fail when exploring sites that correlate with the hidden service.
Safely access, analyze and store information
Whether researching on the surface, deep or dark web, it’s important to protect yourself and your organization from threats and attacks while streamlining your workflow. Having a purpose-built research platform is key to decreasing your time-to-insight, protecting against targeted or incidental attacks and preventing web activity from being attributed to your organization.
Cloud-based web isolation and managed attribution platforms, like Silo for Research, can get analysts up and running in a secure, anonymous browsing environment in one click, with built-in access to collection and analysis tools, SaaS apps and workflows in a single interface. Silo for Research also gives analysts access to a global managed research network to access sites as an in-region visitor, avoiding blocking and further protecting attribution details. Silo Storage also provides an off-network location to store and collaborate on collected data.
This article was written by Jeff Phillips, Director of Product Marketing at Authentic8. To learn more about our secure online research solution, visit our website. Or tune into NeedleStack, the podcast for online researchers. Practical OSINT tips from the field features OSINT practitioners from both government and private sectors sharing hands-on tips from the field.