We had a Deep Web specialist on the show today. Bill Wardell talked about the “Deep Web” and how it offers a virtually untapped bounty of information and services most people don’t know exist.
What is the Deep Web?
It’s basically the 90% of the web that the big two search engines haven’t spidered or accessed.
Google has only about 10% of the web spidered. Given that they reported indexing around 26 billion pages, the other 90%, which experts call the Deep Web, is a pretty big chunk of data.
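Taken at face value, those two figures imply a rough size for the part of the web that isn't indexed. Here is a back-of-the-envelope sketch in Python. It assumes the 26 billion and 10% numbers quoted above are accurate, and the function name is made up for illustration:

```python
def estimate_deep_web_pages(indexed_pages: int, indexed_percent: int) -> tuple[int, int]:
    """Return (total_pages, unindexed_pages) implied by an index size and its coverage share."""
    if not 0 < indexed_percent <= 100:
        raise ValueError("indexed_percent must be between 1 and 100")
    # If indexed_pages is indexed_percent% of the web, scale it up to 100%.
    total_pages = indexed_pages * 100 // indexed_percent
    unindexed_pages = total_pages - indexed_pages
    return total_pages, unindexed_pages


# Figures quoted in the post: about 26 billion pages indexed, about 10% coverage.
total, deep = estimate_deep_web_pages(26_000_000_000, 10)
print(f"Implied total: {total:,} pages; not indexed: {deep:,} pages")
```

With those inputs, the implied total is about 260 billion pages, of which roughly 234 billion would be outside the index. Both inputs are estimates, so treat the result as an order-of-magnitude figure only.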
How is the Deep Web different from the Surface Web?
“To put it very simply, the web is defined as a collection of hyperlinks that are indexed by search engines. In other words, the pages/content that appears when we do a Google search, is the Internet as we know it – called the surface web.
The Dark Web, also known as the deep web, invisible web, and dark net, consists of web pages and data that are beyond the reach of search engines. Some of what makes up the Deep Web consists of abandoned, inactive web pages, but the majority of data that lies within have been crafted to deliberately avoid detection in order to remain anonymous.
Michael K. Bergman – who first coined the phrase “deep web” describes how searching on the Internet today can be compared to dragging a net across the surface of the ocean; a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.” – Wikipedia
Learn how the Deep Web can help researchers, bloggers, marketers, and other folks looking to tap little-known information and resources. There are lots of cool Deep Web research links to share with you under the show player today as well.
Deep Web Research Tools and Resources
- Deep Web Research Starting Point
- Deep Web Resources (PDF) – This is the crucial source of resources and tools for making the most of the Deep Web.
- Marcus Zillman’s Blog
- Resource Shelf
- Research Buzz
- Top 100 Lists – from Alt Search Engines
- Exalead – example of specialty search engine
- Specialty Video Search – Blinkx.com
- Check out Bill Wardell at The Cyberhood Watch





21 comments
Any page that doesn’t want to get indexed shouldn’t really be considered deep web; it’s not useful anyway. Though I agree that there are a LOT of pages that search engines haven’t been able to index because the site owners didn’t use the correct navigation schemes on their sites.
We shouldn’t really be bothered about abandoned sites, parked pages, etc., should we?
I never even heard about the ‘Deep Web’. Very interesting piece. Good work.
Joseph Condron@Yellow Magpie’s last blog: Quotes On Challenging Conventions
Very interesting article… Never knew there was so much information in the deep web. http://www.turbo.com is a great deep web search engine.
The “deep web” is a pretty interesting concept to think about. The thing that sticks out to me the most, though, is the fact that Google has only spidered 10% of the internet. I really didn’t realize before just how much content there is out there that hasn’t been discovered by Google. It makes you realize just how huge the internet is.
Alec@Trash Bag Holder’s last blog: Keeping Things Green With Trash Bags
Web services, APIs and mashups seem to be the way this content is being pulled and rendered. This will deepen considerably after the Internet of Things gets going. Good post, thanks.
dave tribbett’s last blog: Casting an Information Shadow
Great post. Here is a good article that adds some additional detail to the topic and a good set of links to the deep web search engines and other helpful sites.
dave tribbett’s last blog: The Deep, Dark Invisible Web
This is great stuff Jack, and something most people don’t even start to think about…I didn’t until reading this.
Thanks for the resources too.
Joseph Ratliff’s last blog: How Do You Expect To Do Any One Thing Well, When You’re Trying To Do Many Things At The Same Time?
Yeah, so much useful information lies buried beyond the reach of search engines like google.com… not surprisingly, people anxiously await a way to access this information.
Yes, very interesting. Don’t you think that the main reason some websites are not included in Google’s index comes down to web programming? For example, sites designed with AJAX cannot really be crawled properly.
Yeah, you have to make sure not to count on any of that content being read, and include it somewhere in a format that can be. Lots of small-time designers miss this point entirely. But the sites are pretty.
Never knew that web pages that haven’t been spidered by the search engines are called the Deep Web. I would leave its exploration to the curious folks out there. The owners of these web pages abandoned and hid the information for a reason.
Cool topic, Jack. Love this clandestine Jack Bauer/NCIS stuff ….
I had never heard that Google has only spidered 10% of the web. Freakin’ amazing if you think about it. Imagine the SERP competition if it was the other way around!
Time to go exploring … thanks.
Scott@GBG’s last blog: GBG Business Success – A Proven Path
Good post, Jack! I’ve never heard of the Deep Web before either. But if those pages are mainly inactive or hard to index anyway, isn’t that a good indication of the so-called “quality” of those pages? Perhaps Google just considers them dead for the visitors, you know. Perhaps they haven’t been updated in years, or something.
Anyway, interesting subject. Have you got more of these coming up?
Be aware that finding that oh-so-precious knowledge is just as hard as looking for a needle in a haystack; you need to know the proper search terms to get results you want.
Jessica@internet parental filter’s last blog: Porn Blocker – Internet Parental Filter Software
Yeah, you have to be a real sleuth and go through a lot of junk sometimes to find the worthwhile stuff.
In some cases, yes. In others it’s more a matter of “eye of the beholder,” and what you’re after sets the real value of pages not in Google. Some info is evergreen too, and many people have put up great things in the past that they abandoned, but the info is still someone else’s gold mine if they need it for a book or product. The real reason a lot more stuff isn’t in Google is that the stuff has no links. Google is a popularity contest and most of the pages on the web are not “popular.” But the info on some of them can be a big resource depending on what you’re looking for.
Might do another on this topic if Bill digs up some new info. But nothing planned in the near future.
As the major search engines start to experiment with incorporating Deep Web content into their search results, they must figure out how to present different kinds of data without overcomplicating their pages. This poses a particular quandary for Google, which has long resisted the temptation to make significant changes to its tried-and-true search results format.
I never thought that such a thing as the “Deep Web” existed. Does anyone know where I can find a good article that explains the “Deep Web” well? I’m just curious about this.
I don’t really see how the deep web can help marketeers. If these pages are off Google’s radar then they have no significance, as they won’t have a PageRank and are therefore worthless to link to. Just my opinion.
Mike@web design surrey’s last blog: Personal Trainer Website
ARTICLE: Exploring the Deep Web, the 90% of the web the big search engines haven’t spidered or accessed .. http://digg.com/u1Mp63 .. RT THIS
This comment was originally posted on Twitter
Exploring the “Deep Web”…. interesting…hmmm…. http://bit.ly/cMLxNH http://url4.eu/1P8ks
This comment was originally posted on Twitter