Over the past year, investors and startups have asked me what to do about the dark web. Crime is rampant. The increasing variety of dark web technologies is confusing. And there is much hyperbole about the size, secrecy, and strength of the technologies. I’m advising businesses, some small and some large, on how to tackle the various dark webs.
The same basic questions are asked again and again. Answers appear elusive so far, but there are some teams making progress. The basic business model is a crawler and alerting system for targeted content appearing on various dark web sites or locations. The core technology is indexing the dark web. Once you have an index, you sell access to it for keyword searches defined by the customers. When a keyword has results, you send an alert to the customer. An analogy would be “Google Alerts for the dark web”. Pretty straightforward so far. The challenge is in the technology, the vastness of the address spaces, and what’s happening in these dark spaces today.
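As a minimal sketch of the index-then-alert loop described above, the following keeps an inverted index of crawled pages and checks each new page against customer watchlists. The class name `AlertIndex`, its methods, and the example addresses are hypothetical, invented here for illustration. The sketch also assumes single-word keywords and plain-text pages, which a real system could not.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase ASCII words and digits only.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


class AlertIndex:
    """Toy inverted index with per-customer keyword watchlists."""

    def __init__(self):
        self.postings = defaultdict(set)    # token -> addresses containing it
        self.watchlists = defaultdict(set)  # keyword -> customers watching it

    def watch(self, customer, keyword):
        # Assumption: keywords are single tokens; phrases need more work.
        self.watchlists[keyword.lower()].add(customer)

    def add_page(self, address, text):
        """Index a crawled page; return (customer, keyword, address) alerts."""
        alerts = []
        for token in set(tokenize(text)):
            self.postings[token].add(address)
            for customer in self.watchlists.get(token, ()):
                alerts.append((customer, token, address))
        return sorted(alerts)

    def search(self, keyword):
        """Return every indexed address whose text contained the keyword."""
        return sorted(self.postings.get(keyword.lower(), ()))


index = AlertIndex()
index.watch("acme", "AcmeCorp")
for alert in index.add_page("exampleforum.onion/thread/1",
                            "Selling AcmeCorp credentials"):
    print(alert)
```

Alerting at indexing time, rather than re-running every customer query on a schedule, is one possible design choice. It suits content that may disappear shortly after it is crawled.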
The first questions asked are:
- How do you define the dark net or dark web? Do you mean unlit fiber? Unassigned IPv4/IPv6 addresses? Overlay networks like I2P or Tor? Known blackhat hacker hangouts? Distributed networks like ZeroNet or Tribler?
- How many darknet sites have you crawled? What share of the darknet do you think you have in a content index? 25%? 50%? 77.4%? How do you know?
- Have you crawled behind authentication walls? How do you get into “pay to play” forums and sites, meaning those which require you to contribute new content to get access?
- Do you have archives of sites and content for searching in the past? The darknet is fluid, so when data dumps occur, they tend not to stay online for long. Do you have legal authority to handle certain content? To store it?
- How many languages do you crawl? Do you translate them or just serve them up raw as indexed content? Right to left? Left to right? How do you handle accompanying images?
- Are you able to comply with evidence requirements? Snapshots of the sites? Has your data or content been through a court case? Has your methodology been through a court case?
- Do you analyze the data for connections? Do you correlate edges where the surface web crosses into the dark web and vice versa?
- How do you discover the sites for crawling? Are you dependent on exploits in current code for discovery? How fast have you adapted to bug fixes or technology changes?
The more successful teams are able to answer these questions, or at least have answers for their planned approach. This is vastly more than just “Google search for the dark web”. As mentioned in prior posts, these technologies do not operate like IP addressing (whether v4 or v6), and the address spaces are huge and algorithmically generated. Learning what information you can glean from the interconnections between addresses, and where they map to other address spaces, can make the difference between success and failure.
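One way to picture those interconnections is to record an edge whenever a crawled page links into a different address space. The sketch below is illustrative only. The function names are hypothetical, it classifies hosts purely by the `.onion` and `.i2p` suffixes, and it assumes links appear as full `http(s)` URLs in the page text, which misses bare addresses and other networks.

```python
import re
from urllib.parse import urlsplit

# Assumption: only absolute http(s) URLs are considered links.
URL = re.compile(r"https?://[^\s\"'<>]+")


def address_space(host):
    """Classify a hostname by suffix; anything else counts as surface web."""
    host = host.lower()
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    return "surface"


def cross_space_edges(source_url, page_text):
    """Return sorted (src_host, src_space, dst_host, dst_space) tuples
    for links that leave the source page's address space."""
    src_host = urlsplit(source_url).hostname
    if not src_host:
        raise ValueError("source_url needs a scheme and a host")
    src_space = address_space(src_host)
    edges = set()
    for match in URL.findall(page_text):
        try:
            # Drop trailing sentence punctuation picked up by the regex.
            dst_host = urlsplit(match.rstrip(".,;)")).hostname
        except ValueError:
            continue  # malformed URL, e.g. an unbalanced IPv6 bracket
        if not dst_host:
            continue
        dst_space = address_space(dst_host)
        if dst_space != src_space:
            edges.add((src_host, src_space, dst_host, dst_space))
    return sorted(edges)


page = ("Mirror at http://exampleexampleexample.onion/forum, "
        "see also https://other.example.com/x")
for edge in cross_space_edges("https://blog.example.com/post", page):
    print(edge)
```

Collected over many pages, edges like these form a graph in which surface sites that link to the same hidden service, or hidden services that share a surface contact page, become visible as candidates for correlation.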
The other side of this coin is the businesses looking to host or fully embrace the same technologies for good. How they learn, utilize, and leverage the technology to give them an edge over others fascinates me. I’ll put some of these thoughts into another post in the future.
I hope these questions help you as you build your business plan or pivot your current operations. I look forward to helping more dark-web-focused startups succeed and grow the market for their services. It’s an exciting time for all.
This post was originally written for LinkedIn Pulse.