Google Dorks For OSINT
There are entire books dedicated to Google searching and Google hacking. Most of these focus on penetration testing and securing computer networks. These are full of great information, but are often overkill for the investigator looking for quick personal information. A few simple rules can help locate more accurate data.
Search Operators
Most search engines allow the use of commands within the search field. These commands are not actually part of the search terms and are referred to as operators. Most operator searches have two parts, separated by a colon. To the left of the colon is the type of operator, such as "site" (website) or "ext" (file extension). To the right is the rule for the operator, such as the target domain or file type. This post explains each operator and its most appropriate uses.
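The two-part structure described above can be illustrated with a few lines of Python. This is only a sketch for reasoning about queries locally; the function name split_operator is hypothetical and has nothing to do with how any search engine parses its input.

```python
def split_operator(term: str) -> tuple[str, str]:
    """Split an operator term into (operator, rule) at the first colon."""
    operator, colon, rule = term.partition(":")
    # A valid operator term needs text on both sides of the colon.
    if not colon or not operator or not rule:
        raise ValueError(f"not an operator term: {term!r}")
    return operator, rule


print(split_operator("site:eforensicsmag.com"))  # ('site', 'eforensicsmag.com')
print(split_operator("ext:pdf"))                 # ('ext', 'pdf')
```

Splitting at the first colon only means a rule that itself contains a colon stays intact on the right-hand side.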
The Site Operator
The site operator asks Google to search within one website or domain. This operator provides two benefits to the search results. First, it will only provide results of pages located on a specific domain. Second, it will provide all of the results containing the search terms on that domain. If you want to view every page on a specific domain that includes your target of interest, the site operator is required.
To see how many pages Google has indexed for a site, enter the following query:
site:eforensicsmag.com
But how many of these are blog posts? Let us find out:
site:eforensicsmag.com/blog
Note: Google only gives a rough approximation when using this operator. For the full picture, check Google Search Console.
Next, I conducted the following exact search.
site:eforensicsmag.com "Joseph Moronwi"
The result was all eleven pages on eforensicsmag.com that include my name within the content. This technique can be applied to any domain. This includes social networks, blogs, and any other website that is indexed by search engines.
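If you run the same site search against many names, it can help to generate the search links rather than type them. The sketch below assumes the familiar https://www.google.com/search?q=... address layout of the public search page; the function name google_search_url is hypothetical, and the code only builds a URL for you to open in a browser.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Return a search URL with the query percent-encoded in the q parameter."""
    # Assumption: the public search page accepts the query in the "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url('site:eforensicsmag.com "Joseph Moronwi"'))
```

urlencode takes care of the colon, the quotation marks, and the spaces, so the operator syntax survives being pasted into an address bar.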
To view the subdomains of the target website, enter the following query:
site:*.google.com -www
To find pages on a target domain that are not served over HTTPS, enter the following query:
site:google.com -inurl:https
Using the same operator, you can restrict your search to one type of domain. An example usage is given below.
computer forensics site:gov
This searches for the term computer forensics in all websites under the .gov top-level domain. Jung Kim showed a nice dork to find people within GitHub:
site:github.com/orgs/*/people
Simply replace the asterisk (*) with the organisation's name to target people from a specific organisation.
The Filetype Operator
Another operator that works with both Google and Bing is the filetype filter. It allows you to filter any search results by a single file type extension. While Google allows this operator to be shortened to "ext", Bing does not. When using the filetype operator with your search terms, Google will restrict the results to files that end with this extension.
Consider the following search attempting to locate PDF files associated with the terror group ISIS.
"ISIS" filetype:pdf
There are many uses for this technique. A search of filetype:doc "resume" "target name" often provides resumes created by the target which can include cellular telephone numbers, personal addresses, work history, education information, references, and other personal information that would never be intentionally posted to the internet. The "filetype" operator can identify any file by the file type within any website. This can be combined with the "site" operator to find all files of any type on a single domain. By conducting the following searches, I was able to find several documents stored on the website cnn.com:
site:cnn.com filetype:pdf
site:cnn.com filetype:pptx
site:cnn.com filetype:doc
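Running one search per file type quickly becomes repetitive, so the queries can be generated from a list of extensions. This is a minimal sketch; the helper name filetype_queries is hypothetical, and it only produces query strings to paste into a search engine.

```python
def filetype_queries(domain: str, extensions: list[str]) -> list[str]:
    """Build one site + filetype query per extension for a single domain."""
    queries = []
    for ext in extensions:
        # Normalise input such as ".PDF" or "Pdf" to the bare lowercase extension.
        ext = ext.strip().lstrip(".").lower()
        queries.append(f"site:{domain} filetype:{ext}")
    return queries


for query in filetype_queries("cnn.com", ["pdf", "pptx", "doc"]):
    print(query)
```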
Previously, Google and Bing indexed media files by type, such as MP3, MP4, AVI, and others. Due to abuse of pirated content, this no longer works well. The following extensions have been found to be indexed and provide valuable results.
7Z: Compressed File
BMP: Bitmap Image
DOC: Microsoft Word
DOCX: Microsoft Word
DWF: Autodesk
GIF: Animated Image
HTM: Web Page
HTML: Web Page
JPG: Image
JPEG: Image
KML: Google Earth
KMZ: Google Earth
ODP: OpenOffice Presentation
ODS: OpenOffice Spreadsheet
ODT: OpenOffice Text
PDF: Adobe Acrobat
PNG: Image
PPT: Microsoft PowerPoint
PPTX: Microsoft PowerPoint
RAR: Compressed File
RTF: Rich Text Format
TXT: Text File
XLS: Microsoft Excel
XLSX: Microsoft Excel
ZIP: Compressed File
The Exclusion operator (-)
You may want to exclude some content from appearing within results. The hyphen (-) tells most search engines and social networks to exclude the text immediately following from any results. It is important to never include a space between the hyphen and filtered text. The following query shows all links to my blog excluding the internal links.
site:* digitalinvestigator.blogspot.com -site:digitalinvestigator.blogspot.com
My goal in search filters is to dwindle the total results to a manageable amount. When you are overwhelmed with search results, slowly add exclusions to make an impact on the amount of data to analyze.
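The habit of adding exclusions one at a time can be sketched as a small helper that appends each excluded term with the hyphen attached directly to it. The function name exclude and the sample terms are hypothetical; the only rule taken from the text is that no space may separate the hyphen from the filtered term.

```python
def exclude(query: str, *terms: str) -> str:
    """Append -term for each excluded term, with no space after the hyphen."""
    parts = [query.strip()]
    for term in terms:
        term = term.strip()
        if not term:
            continue  # ignore empty exclusions
        if " " in term:
            term = f'"{term}"'  # keep a multi-word exclusion together as a phrase
        parts.append("-" + term)
    return " ".join(parts)


query = "OSINT training"
query = exclude(query, "jobs")
query = exclude(query, "site:pinterest.com")
print(query)  # OSINT training -jobs -site:pinterest.com
```

Because each call returns a new string, you can rerun the search after every added exclusion and stop as soon as the result count is manageable.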
The InURL Operator
Previously, the operators discussed applied to the content within the web page. This search operator, however, will search for a specific word or phrase inside the URL of a web page. Because a well-chosen keyword in the URL filters out a lot of irrelevant data, the inurl operator is very useful. My favourite search using this technique is to find File Transfer Protocol (FTP) servers that allow anonymous connections.
The following search would identify any FTP servers that possess PDF files that contain the term terror within the file:
inurl:ftp -inurl:(http|https) filetype:pdf "terror"
Obviously, this operator could also be used to locate standard web pages, documents, and files.
In an investigation, you might want to check if your target left a resume online. Simply enter the query below:
inurl:curriculum vitae "Julian Assange"
You can add the prefix "all" to this operator to require that every listed word appears in the URL, in any order. For example, enter
allinurl:OSINT intelligence
and Google will return pages with the terms OSINT and intelligence in their URLs.
The InTitle Operator
This operator will filter web pages by details other than the actual content of the page. This filter will only present web pages that have specific content within the title of the page. Practically every web page on the internet has an official title for the page. This is often included within the source code of the page and may not appear anywhere within the content. Most webmasters carefully create a title that will be best indexed by search engines.
If you conduct a search for "business email compromise" on Google, you will receive 552,000 results. However, the following search will filter those to 6,150. These only include web pages that had the search terms within the limited space of a page title.
intitle:"business email compromise"
You can add the prefix "all" to this operator to require that every listed word appears in the title, in any order. The following would find any sites that have the words business, email, and compromise within the title, regardless of the order:
allintitle:business email compromise
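The difference between an exact phrase in the title and all words in any order is easy to check on a page title you already have. The sketch below is a local approximation only, under the assumption that titles are compared word by word and case-insensitively; it is not a description of how Google matches titles, and both function names are hypothetical.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def title_has_phrase(title: str, phrase: str) -> bool:
    """True if the words of phrase appear in the title consecutively and in order."""
    t, p = _words(title), _words(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def title_has_all_words(title: str, words: str) -> bool:
    """True if every word appears somewhere in the title, in any order."""
    return set(_words(words)) <= set(_words(title))


title = "Email compromise hits small business"
print(title_has_phrase(title, "business email compromise"))     # False
print(title_has_all_words(title, "business email compromise"))  # True
```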
An interesting way to use this search technique is while searching for online folders. We often focus on finding websites or files of interest, but we tend to ignore the presence of online folders full of content related to our search. As an example, I conducted the following search on Google.
intitle:index.of OSINT
The results contain online folders that usually do not have typical website files within the folders. Each possesses dozens of documents and other files related to our search term of OSINT. Some provide a folder structure that allows access to an entire web server of content. Notice that none of these results points to a specific page, but all open a folder view of the data present.
The intext operator
The query intext:term restricts results to documents containing term in the text. This is a very helpful Google dorks search operator. By using intext, you can search the body of web pages and get a glimpse of their material without having to open them. Normally, we would open a page and press CTRL + F to find the term we are looking for. With intext, the results only include pages whose text contains the term used in the search.
For example, suppose we want to gather more information about the web series Tom Clancy's Jack Ryan, its characters, and so on. The appropriate query is given below:
intext:"jack ryan"
This will display all the results that have Jack Ryan in the content of the web page. In an investigation, you might want to check if your target left a resume online. Simply enter the query below:
intext:curriculum vitae "Julian Assange"
You can add the prefix "all" to this operator to require that every listed word appears in the text, in any order. For example, enter
allintext:TOR Dark markets
and Google will only return pages that have all three terms, TOR, Dark, and markets, within their text.
The OR operator
You may have search terms that are not definitive. You may have a target that has a unique last name that is often misspelled. The OR operator, which must be written in capital letters and can also be written as a vertical bar (|), returns pages that have just A, just B, or both A and B. For example, entering
DFIR OR OSINT
or entering
DFIR|OSINT
will retrieve pages that contain either the term DFIR or the term OSINT.
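For a surname with several likely misspellings, the OR group can be assembled from a list of variants. This is a sketch with a hypothetical helper name, any_of, and made-up spelling variants; it only joins strings and wraps them in parentheses so the group can sit next to other search terms.

```python
def any_of(variants: list[str]) -> str:
    """Join the variants into a single parenthesised OR group."""
    cleaned = [v.strip() for v in variants if v.strip()]
    if not cleaned:
        raise ValueError("at least one variant is required")
    # Quote multi-word variants so each one is searched as a phrase.
    quoted = [f'"{v}"' if " " in v else v for v in cleaned]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


print(any_of(["DFIR", "OSINT"]))                  # (DFIR OR OSINT)
print(any_of(["Moronwi", "Moronwy", "Morowni"]))  # hypothetical misspellings
```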
The Wildcard operator
The asterisk (*) represents one or more words to Google and is considered a wild card. Google treats the * as a placeholder for a word or words within a search string. For example, "DFIR * training" tells Google to find pages containing a phrase that starts with "DFIR" followed by one or more words, followed by "training".
Let’s say that you are looking into a person-of-interest and the only information you have about this individual is a username: JoseffMoro. While a search for JoseffMoro might return other places that the username shows up online, by using the Wildcard Operator and searching for
JoseffMoro*com
we can instead see if any email addresses or other personal details appear publicly online that use the username as the unique identifier.
While this will not always return significantly different results to searching the username itself, it can be used as a quick way to identify an email address that can later be tied to other accounts.
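Once the wildcard search has returned some snippets, picking out the email addresses built on the username can be automated. The sketch below assumes you have copied the result text into a string yourself; the function name emails_for_username is hypothetical, the example.com addresses are invented for illustration, and the pattern is a deliberately simple one that will not cover every valid email address.

```python
import re


def emails_for_username(text: str, username: str) -> list[str]:
    """Return email addresses in text whose local part is the username."""
    pattern = rf"\b{re.escape(username)}@[A-Za-z0-9.-]+\.[A-Za-z]{{2,}}\b"
    # Usernames are compared case-insensitively, as they often are online.
    return re.findall(pattern, text, flags=re.IGNORECASE)


snippets = (
    "Contact joseffmoro@example.com or JoseffMoro@mail.example.org. "
    "Unrelated: other@example.com"
)
print(emails_for_username(snippets, "JoseffMoro"))
```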
The Range Operator
The "Range Operator" tells Google to search between two identifiers. These could be sequential numbers or years. The two values are joined by two periods with no spaces. As an example,
OSINT Training 2015..2018
would result in pages that include the terms OSINT and training, and also include any number between 2015 and 2018. I have used this to filter results for online news articles that include a commenting system where readers can express their views. The following search identifies websites that contain information about Deborah Samuel, a Nigerian female Christian student lynched by her male Muslim colleagues on an accusation of blasphemy against Islam in May 2022, and between 1 and 999 comments within the page.
"Deborah Samuel" "1..999 comments"
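The range syntax is small enough to generate with a one-line helper, which also guards against swapping the two bounds. The function name number_range is hypothetical; the two-period form with no spaces is the only detail taken from the operator itself.

```python
def number_range(low: int, high: int) -> str:
    """Format a numeric range as low..high, with no spaces around the periods."""
    if low > high:
        raise ValueError("low must not be greater than high")
    return f"{low}..{high}"


print(f"OSINT Training {number_range(2015, 2018)}")  # OSINT Training 2015..2018
```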
The Related Operator
To search for similar web pages, use the related operator. It takes a domain and attempts to provide online content related to that address. As an example, I conducted a search on Google with the following syntax.
related:google.com
The results included no references to that domain but did associate it with other search engines.
The Cache operator
The cache operator enables users to return the most recently cached version of a webpage when the web page has been indexed. Investigators can use the cache operator to locate previous versions of edited or deleted web pages to locate removed intelligence.
cache:eforensicsmag.com
The Map operator
The map operator enables users to force Google to show map results for a locational search. The results show only location-specific data and do not include recent news stories. Investigators can use the map operator to focus on geospatially relevant intelligence.
map:Johannesburg
It is important to note here that these operators can be combined in whatever ways the OSINT investigator deems fit. Some example usages follow.
To find guest blogging opportunities, you can combine operators as follows:
digital forensics intitle:"write for us" inurl:"write-for-us"
This uncovers so-called "write for us" pages in the digital forensics niche.
If you know of a serial guest blogger in your niche, try this:
Kronos Banking Trojan intext:"Marcus Hutchins" inurl:"author" -site:evilsite.com
Got someone in mind that you want to reach out to on social media? Try this trick to find their contact details:
Brett Shavers dfir training (site:twitter.com | site:facebook.com | site:linkedin.com)
To find Q&A threads related to your target term, enter the following query:
Satoshi Nakamoto site:quora.com intitle:(TOR | "Bitcoin" | "cryptocurrency market")
By focusing on the LinkedIn site, you can look for people with a certain job title and a certain location. A useful trick is that you can search for icons or Unicode characters:
site:linkedin.com/in "<job title>" (☎ OR ☏ OR ✆ OR 📱) +"<location>"
It is also possible to search for a specific target name:
"<name>" (☎ OR ☏ OR ✆ OR 📱)
You can search for copies of databases via Google too. To find some of them, simply search for:
ext:sql intext:"-- phpMyAdmin SQL Dump"
To search for Excel files in a target organisation that have the word contact in their URL, you can enter the following query (replace google.com with your target domain).
filetype:xls site:google.com inurl:contact
This yields web pages that hold contact lists from the target organisation.
The power of Google Dorks or search operators in investigations is in combining them. The reader is encouraged to explore various possible combinations of search operators to achieve the desired results.
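One way to experiment with combinations is to keep the keywords and the operator pairs separate and join them only at the end. This is a minimal sketch; the function name combine is hypothetical, and it simply builds the query string in the order given.

```python
def combine(keywords: str, operators: list[tuple[str, str]]) -> str:
    """Join free-text keywords and (operator, rule) pairs into one query."""
    parts = [keywords.strip()] if keywords.strip() else []
    for name, rule in operators:
        rule = rule.strip()
        if " " in rule:
            rule = f'"{rule}"'  # quote multi-word rules so they stay attached
        parts.append(f"{name}:{rule}")
    return " ".join(parts)


print(combine("", [("filetype", "xls"), ("site", "google.com"), ("inurl", "contact")]))
# filetype:xls site:google.com inurl:contact
```

Keeping the pairs in a list makes it easy to drop or swap a single operator and compare how the results change.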