Exploring the Deep Web

by Jack Humphrey on Feb 5

We had a Deep Web specialist on the show today.  Bill Wardell talked about the “Deep Web” and how it offers a virtually untapped bounty of information and services most people don’t know exist.

What is the Deep Web?

It’s basically the 90% of the web that the big two search engines haven’t spidered or accessed.

Google only has about 10% of the web spidered.  Given that they reported indexing around 26 billion pages on the web, the other 90%, which experts call the Deep Web, is a pretty big chunk of data.
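
Here is a quick back-of-the-envelope sketch of that math. It takes the 10% and 26 billion figures above at face value (both are rough estimates), and the function name is made up purely for illustration:

```python
def estimate_unindexed_pages(indexed_pages: int, indexed_percent: int) -> int:
    """Estimate how many pages sit outside the index.

    Assumes the indexed pages are exactly `indexed_percent` percent of the
    whole web, which is a simplification of the rough figures in the post.
    """
    if not 0 < indexed_percent <= 100:
        raise ValueError("indexed_percent must be between 1 and 100")
    # Scale the indexed count up to an estimated total, then subtract
    # the part that is already indexed.
    total_pages = indexed_pages * 100 // indexed_percent
    return total_pages - indexed_pages


# 26 billion indexed pages, assumed to be 10% of the web.
deep_web_pages = estimate_unindexed_pages(26_000_000_000, 10)
print(f"{deep_web_pages:,}")  # prints 234,000,000,000
```

In other words, if both numbers hold, the unindexed part would be on the order of 234 billion pages.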

How is the Deep Web different from the Surface Web?

“To put it very simply, the web is defined as a collection of hyperlinks that are indexed by search engines. In other words, the pages/content that appears when we do a Google search, is the Internet as we know it – called the surface web.

The Dark Web, also known as the deep web, invisible web, and dark net, consists of web pages and data that are beyond the reach of search engines. Some of what makes up the Deep Web consists of abandoned, inactive web pages, but the majority of data that lies within have been crafted to deliberately avoid detection in order to remain anonymous.

Michael K. Bergman – who first coined the phrase “deep web” describes how searching on the Internet today can be compared to dragging a net across the surface of the ocean; a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.” – Wikipedia

Learn how the Deep Web can help researchers, bloggers, marketers, and other folks looking to tap little-known information and resources.  There are lots of cool Deep Web research links to share with you under the show player today as well.

Deep Web Research Tools and Resources

{ 21 comments… read them below or add one }

African Sunrise Feb 9 at 8:08 pm

Any page that doesn’t want to get indexed shouldn’t really be considered deep web; it’s not useful anyway. Though I agree that there are a LOT of pages that search engines haven’t been able to index because the site owners didn’t use the correct navigation schemes on their sites.

We shouldn’t really be bothered about abandoned sites, parked pages etc. should we?

Joseph Condron from Yellow Magpie Feb 17 at 6:52 am

I never even heard about the ‘Deep Web’. Very interesting piece. Good work.

athandz Mar 9 at 11:31 pm

Very interesting article… Never knew there was so much information in the deep web. http://www.turbo.com is a great deep web search engine.

Alec from Trash Bag Holder Mar 10 at 2:25 pm

The “deep web” is a pretty interesting concept to think about. The thing that sticks out to me the most, though, is the fact that Google has only spidered 10% of the internet. I really didn’t realize before just how much content there is out there that hasn’t been discovered by Google. It makes you realize just how huge the internet is.

dave tribbett Mar 17 at 2:05 pm

Web services, APIs and mashups seem to be the way this content is being pulled and rendered. This will deepen considerably after the Internet of Things gets going. Good post, thanks.

dave tribbett Mar 18 at 10:25 pm

Great post. Here is a good article that adds some additional detail to the topic and a good set of links to the deep web search engines and other helpful sites.
dave tribbett´s last blog ..The Deep, Dark Invisible Web My ComLuv Profile

Joseph Ratliff Mar 19 at 11:03 am

This is great stuff Jack, and something most people don’t even start to think about…I didn’t until reading this.

Thanks for the resources too.

amr to mp3 Apr 19 at 5:28 am

Yeah, so much useful information lies buried beyond the reach of search engines like google.com… Not surprisingly, people anxiously await a way to access this information.

Lisa from Specialty Roofing Apr 26 at 11:24 am

Yes, very interesting. Don’t you think that the main reason some websites aren’t included in Google’s index comes down to web programming? For example, sites designed with AJAX cannot really be crawled properly.

Jack Humphrey Apr 29 at 9:39 am

Yeah, you have to make sure not to count on any of that content being read, and include it somewhere in a format that can be. Lots of small-time designers miss this point entirely. But the sites are pretty. :)

Handbags May 7 at 1:41 am

Never knew that web pages that haven’t been spidered by the search engines were called the Deep Web. I would leave its exploration to the curious folks out there. The owners of these web pages abandoned and hid the information for a reason.

Scott from GBG Jun 8 at 9:47 am

Cool topic, Jack. Love this clandestine Jack Bauer/NCIS stuff ….

I had never heard that Google has only spidered 10% of the web. Freakin’ amazing if you think about it. Imagine the serp competition if it was the other way around!

Time to go exploring … thanks.

Angela from Freelance webdesign Jun 28 at 11:16 pm

Good post, Jack! I’ve never heard of the Deep Web before either. But if those pages are mainly inactive or hard to index anyway, isn’t that a good indication of the so-called “quality” of those pages? Perhaps Google just considers them dead for the visitors, you know. Perhaps they haven’t been updated in years, or something.

Anyway, interesting subject. Have you got more of these coming up?

Jessica from internet parental filter Jun 28 at 11:47 pm

Be aware that finding that oh-so-precious knowledge is just as hard as looking for a needle in a haystack; you need to know the proper search terms to get results you want.

Jack Humphrey Jun 29 at 3:56 pm

Yeah you have to be a real sleuth and go through a lot of junk sometimes to find the worthwhile stuff.

Jack Humphrey Jun 29 at 4:00 pm

In some cases, yes. In others it’s more a matter of “eye of the beholder” and what you’re after that sets the real value of pages not in Google. Some info is evergreen too, and many people have put up great things in the past that they abandoned, but the info is still someone else’s gold mine if they need it for a book or product. The real reason a lot more stuff isn’t in Google is that the stuff has no links. Google is a popularity contest and most of the pages on the web are not “popular.” But the info on some of them can be a big resource depending on what you’re looking for.

Might do another on this topic if Bill digs up some new info on the topic. But nothing planned in the near future.

Live TV stream Jul 18 at 8:06 am

As the major search engines start to experiment with incorporating Deep Web content into their search results, they must figure out how to present different kinds of data without overcomplicating their pages. This poses a particular quandary for Google, which has long resisted the temptation to make significant changes to its tried-and-true search results format.

danna from web development Jul 29 at 5:43 am

I never thought that such a thing as the “Deep Web” existed. Does anyone know where I can find a good article that explains the “Deep Web” well? I’m just curious about this.

Mike from web design surrey Aug 10 at 6:43 am

I don’t really see how the deep web can help marketeers. If these pages are off Google’s radar, then they have no significance, as they won’t have a PageRank and are therefore worthless to link to. Just my opinion.
