DARPA’s Memex Project Shines Light on the Dark Web

Tuesday, March 03, 2015

Robert Vamosi


To better combat the increasing use of the Dark Web for illegal purposes, DARPA, the U.S. military’s Defense Advanced Research Projects Agency, is building a search engine known as Memex for law enforcement use.

Google and Yahoo index only five percent of the Internet, the “Surface Web.” The remaining “Deep Web” consists of unstructured data from sensors and other devices, temporary pages, and content hidden behind password protection, all of which is hard for conventional search engines to index.

A smaller portion of the missing 95 percent is the “Dark Web,” sites that are accessible only through specialized browsers and networks such as The Onion Router (Tor) and that are increasingly used for sex, drugs, and other illegal activities.

According to Scientific American, Memex currently includes eight open-source, browser-based search, analysis and data-visualization programs as well as back-end server software that performs complex computations and data analysis.

“We’re envisioning a new paradigm for search that would tailor indexed content, search results and interface tools to individual users and specific subject areas, and not the other way around,” said Chris White, DARPA program manager, in a press release.

“By inventing better methods for interacting with and sharing information, we want to improve search for everybody and individualize access to information. Ease of use for non-programmers is essential.”

The resulting Dark Web crawler has been mapping the Tor-accessible and peer-to-peer-only sections of the larger Internet. The size of what it has found has surprised many.

The Dark Web had been assumed to be small, only about a thousand pages, yet Memex has already found between 30,000 and 40,000 pages, and estimates put the total at around 70,000 pages.

“Just finding these pages and seeing what’s on them is a new aspect of search technology,” White told Scientific American.

The goal is to one day connect Memex to regular browser-based software such as Firefox or Chrome that law enforcement agencies and the general public would typically use. This next step would allow law enforcement to access the software from any Internet-connected device, including mobile devices.

As reported recently on 60 Minutes, Memex is currently being beta tested by law enforcement to identify sex rings.

Why focus on sex crimes? According to the United Nations Office on Drugs and Crime, there are roughly 2.5 million human trafficking victims worldwide. Tracking and prosecuting the traffickers is therefore a top law enforcement priority.

Additionally, profits from such activities have been used to fund actions against our national security, White told 60 Minutes.

A typical sex ring investigation begins with only a few pieces of information, such as a single e-mail address. In a demonstration, White plugged an example address into Google and received a page of links from the Surface Web, the part of the Internet that Google crawls.

By clicking through each of the search results, an investigator might find an additional piece of information, say, a phone number associated with the single e-mail address.

The idea behind Memex comes from a 1945 article by Vannevar Bush, director of the U.S. Office of Scientific Research and Development (OSRD) during World War II, according to The Atlantic Monthly.

The original Memex was proposed as an analog computer designed to supplement human memory and automatically cross-reference all of the user’s books, records and other information.

The modern Memex analyzes and graphically represents all known sites (including those within the Dark Web) related to the initial e-mail search query, saving investigators valuable time and effort.

Scientific American cited one detective in Modesto, CA, who used a companion piece of software from Carnegie Mellon University called Traffic Jam to follow up on a tip about one particular victim from Nebraska.

That investigator was able to identify a sex trafficker traveling with prostitutes across the Midwest and West.

Researchers at Carnegie Mellon are also studying ways to apply computer vision to Memex searches. This will allow law enforcement to identify images with similar elements—such as furniture from the same hotel room—even if the images themselves are not strictly identical.

This was cross-posted from the Dark Matters blog. 
