HOUSTON, WE HAVE A PROBLEM 2006

  

 

Effective Searching, Part 2:

Metasearch, Directories and the Deep Web

 

Copy

Contrary to popular belief, Google is not the only Internet search engine. In fact, there are dozens of them, and some are even better than Google at finding certain things. But with so many to choose from, where do you start?

 

Searching the Internet is an imprecise science -- some say a black art -- based on a number of constantly evolving technologies and techniques. What most web users may not realise is that traditional search engines only trawl the upper layers of the Surface Web, which represents just a tiny fraction of the information stored on the Internet.

 

Most of the Internet is invisible to the mainstream search engines. The Deep Web, as it has become known, exists in archives, databases, catalogues, private and secure websites and as non-standard or ‘dynamic’ pages that are created in response to specific enquiries. Search engines are getting better at penetrating the Deep Web, but experts estimate that the unseen portion could be between 400 and 600 times larger than the Surface Web.

 

To understand why such a large proportion of the Internet remains beyond the reach of search engines, it helps to know a little about how most of them work. Search engines are essentially databases, and the information they contain is obtained by automated software programs called crawlers or spiders. They patrol the web, visiting sites, indexing the information they contain and following links to other websites. Google and some other search engines use the number of links pointing to a website, together with complex algorithms, to gauge its importance and hence how high up it appears in the list of results.
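The crawl, index and rank cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-page TOY_WEB, its .example addresses and the simple "count the inbound links" ranking are all made up for the example, and real search engines are vastly more elaborate.

```python
from collections import defaultdict, deque

# A hypothetical four-page "web": each address maps to (page text, outgoing links).
TOY_WEB = {
    "a.example": ("deep web search tips", ["b.example", "c.example"]),
    "b.example": ("metasearch engines compared", ["c.example"]),
    "c.example": ("search engine directory", ["a.example"]),
    "d.example": ("unlinked archive of search records", []),  # nothing links here
}

def crawl(start):
    """Visit pages breadth-first from `start`, indexing words and counting links."""
    index = defaultdict(set)    # word -> set of addresses containing it
    inbound = defaultdict(int)  # address -> number of links pointing at it
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB[url]
        for word in text.split():
            index[word].add(url)
        for link in links:
            inbound[link] += 1
            if link not in seen:  # only follow links we have not queued yet
                seen.add(link)
                queue.append(link)
    return index, inbound

def search(index, inbound, word):
    """Return pages containing `word`, most linked-to first."""
    return sorted(index.get(word, ()), key=lambda u: (-inbound[u], u))

index, inbound = crawl("a.example")
print(search(index, inbound, "search"))  # ['c.example', 'a.example']
```

Note that d.example contains the word "search" but never appears in the results. No other page links to it, so the crawler never finds it, which is a small-scale picture of how unlinked content can stay out of a search engine's reach.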

 

Crawler-based search engines constantly scour the web, revisiting sites to check for updated content and to look for new sites. It can take several weeks for a new website to be ‘crawled’, though website owners can sometimes speed up the process by submitting the site address or URL to the search engine. This is free for Google and MSN Search, though some search engines charge a fee.

 

For most routine enquiries, a traditional search engine like Ask, Google, MSN Search or Yahoo, given two or three carefully crafted keywords and maybe a simple operator or two (see Part 1), will eventually find what you are looking for. However, it’s all too easy to get bogged down or sidetracked by huge numbers of irrelevant returns.

 

Each search engine has its own strengths, weaknesses and foibles and one way to reduce the number of spurious or skewed returns is to use a Metasearch engine. 

 

Unlike a normal search engine, these websites do not build or maintain databases, nor do they actively look for information. Instead they submit search keywords to several different search engines at the same time. Then, depending on their degree of sophistication, they collate the results as they are returned and present them in order of relevance, speed or other user-set criteria.
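The collating step can be illustrated with a short Python sketch. It assumes three hypothetical engines (engine_one, engine_two and engine_three) that return fixed, made-up result lists, and it merges them with a simple reciprocal-rank score. That scoring rule is just one plausible choice for the example, not a description of how any particular Metasearch engine works.

```python
from collections import defaultdict

# Hypothetical stand-ins for real search engines: each returns a ranked list.
def engine_one(query):
    return ["site-a.example", "site-b.example", "site-c.example"]

def engine_two(query):
    return ["site-b.example", "site-d.example", "site-a.example"]

def engine_three(query):
    return ["site-b.example", "site-a.example", "site-e.example"]

def metasearch(query, engines):
    """Send `query` to every engine and merge the ranked lists into one."""
    scores = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / position  # higher placement earns more credit
    # Best combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

results = metasearch("deep web", [engine_one, engine_two, engine_three])
print(results)
```

Here site-b.example comes out on top because all three engines rank it first or second, while sites found by only one engine sink towards the bottom of the merged list.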

 

Metasearch engines tend to look quite similar, but the way in which they work and the quality of the results they return vary enormously. Unfortunately, there is no simple way to evaluate their performance, as it depends on the nature of the search and the searching techniques used. In general, however, because they spread the net wider they are better at finding the offbeat and obscure. The downside of a Metasearch is a reduction in precision and in the purity of the results, which may contain advertising or returns from sites that have paid to be listed.

 

Search engines are by their nature general-purpose tools and often quite unsuitable for highly focused enquiries within a particular discipline. Moreover, as we have seen, most of the Internet -- the unseen Deep Web -- is beyond their reach.

 

A more productive way to search for specialist information is to use a web ‘Directory’. Directories are indexes or catalogues of web pages but, unlike a search engine, they are compiled by intelligent and knowledgeable humans and list only websites and search tools within a particular subject area.

 

Some large, complex or wide-ranging topics, medicine for example, may support dozens of Directories, in which case the trick is to find the ones that work for you. The simplest way to find a directory for your subject area is to use a normal search engine, adding the word ‘directory’ or ‘database’ to your search keyword.
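As a small illustration of that tip, the Python sketch below builds the two suggested queries for a subject and turns them into search addresses. The base address search.example is a placeholder rather than a real search engine, and the "q" parameter name is an assumption, so substitute the details of whichever search engine you actually use.

```python
from urllib.parse import urlencode

def directory_queries(subject):
    """Append 'directory' and 'database' to the subject keyword."""
    return [f"{subject} {extra}" for extra in ("directory", "database")]

def search_url(query, base="https://search.example/search"):
    """Build a search address; `base` and the 'q' parameter are placeholders."""
    return f"{base}?{urlencode({'q': query})}"

for query in directory_queries("medicine"):
    print(search_url(query))
```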

 

Web Directories are better at dipping into the Deep Web, but the only way to gain access to this vast repository of information is to use specialist tools and resources, such as Web Directory Metasearch engines, which create searchable databases of searchable Directories, as it were…

 

Searching the Deep Web is a complex business, often requiring custom software and the services of experts to build and maintain databases, so at this point we have to leave the cosy world of free-to-use search engines and directories. In short, searching the Deep Web can be expensive and time-consuming, and if the information you seek is difficult to obtain and has value then you can expect to pay for it.

 

 

Box Out

 

Meta Search Engines

 

www.copernic.com/index.html

www.dogpile.com/

www.ixquick.com/

www.mamma.com/

www.metacrawler.com/

http://metasearch.com/

www.search.com/

www.surfwax.com/

 

 

Academic & Specialist Web Directories

 

Academic Info

www.academicinfo.net/

 

BUBL (Bulletin Board for Libraries)

http://bubl.ac.uk/

 

Humbul Humanities Hub

http://www.humbul.ac.uk/

 

Medic8 (Medicine)

www.medic8.com/

 

Medscape (Medicine)

www.medscape.com/home

 

Resource Discovery Network

http://www.rdn.ac.uk/

 

WWW Virtual Library

http://vlib.org/

 

 

Searching the Deep Web

 

www.brightplanet.com/

http://turbo10.com/

www.completeplanet.com


 

© R. Maybury 2006 2104

 


 Copyright 2006-2008 PCTOPTIPS UK.

All information on this web site is provided as-is without warranty of any kind. Neither PCTOPTIPS nor its employees nor contributors are responsible for any loss, injury, or damage, direct or consequential, resulting from your choosing to use any of the information contained herein.