Searching, part 2:
Metasearch, Directories and the Deep Web
Contrary to popular belief, Google is not the only Internet search
engine. In fact there are dozens of them, and some are even better at finding
certain things than Google, but with so many to choose from, where do you start?
Searching the Internet is an imprecise science -- some say a black art
-- based on a number of constantly evolving technologies and techniques, but
what most web users may not realise is that traditional search engines only
trawl the upper layers of the Surface Web, which represents just a tiny
fraction of the information stored on the Internet.
Most of the Internet is invisible to the mainstream search engines; the
Deep Web, as it has become known, exists in archives, databases,
catalogues, private and secure websites and as non-standard or ‘dynamic’ pages
that are created in response to specific enquiries. Search engines are getting
better at penetrating the Deep Web but experts estimate that the unseen portion
could be between 400 and 600 times larger than the Surface Web.
To understand why such a large proportion of the Internet remains
beyond the reach of search engines it helps to know a little about how most of
them work. Search engines are essentially databases and the information they
contain is obtained by automated software programs called crawlers or spiders.
They patrol the web, visiting sites, indexing the information they contain and
following links to other websites. Google and some other search engines use the
number of links and complex algorithms to gauge a website’s importance and
hence how high up it appears on the list of results.
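The crawl-and-rank process just described can be sketched in miniature. This is purely illustrative -- the fetch function, the simple inbound-link 'importance' score and every name below are invented for the example, not how Google or any real engine actually works:

```python
from collections import defaultdict, deque

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    `fetch` is a stand-in returning (page_text, outgoing_links) for a URL;
    a real spider would download and parse the live page instead.
    """
    index = defaultdict(set)    # word -> set of URLs containing it
    inbound = defaultdict(int)  # URL -> number of links pointing at it
    seen, queue = {start_url}, deque([start_url])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        for word in text.lower().split():
            index[word].add(url)          # build the searchable database
        for link in links:
            inbound[link] += 1            # crude link-count "importance"
            if link not in seen:
                seen.add(link)
                queue.append(link)        # follow links to other sites
    return index, inbound

def search(index, inbound, keyword):
    # Rank matching pages by inbound-link count, most-linked-to first
    return sorted(index.get(keyword.lower(), ()), key=lambda u: -inbound[u])
```

Real engines replace the inbound-link count with far more complex algorithms, but the principle -- rank by how the rest of the web links to a page -- is the same.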
Crawler-based search engines constantly scour the web, revisiting sites
to check for updated content and to look for new sites. It can take several
weeks for a new website to be ‘crawled’, though website owners can sometimes
speed up the process by submitting the site address or URL to the search
engine. This is free for Google and MSN Search, though some search engines
charge a fee.
For most routine enquiries on a traditional search engine like Ask,
Google, MSN Search or Yahoo, two or three carefully crafted keywords and maybe
a simple operator or two (see Part 1) will eventually find what you are
looking for, but it’s all too easy to get bogged down or sidetracked by huge
numbers of irrelevant returns.
Each search engine has its own strengths, weaknesses and foibles, and
one way to reduce the number of spurious or skewed returns is to use a
Metasearch engine.
Unlike a normal search engine these websites do not build or maintain
databases, nor do they actively look for information. Instead they submit your
search keywords to several different search engines at the same time. Then,
depending on their degree of sophistication, they collate the results as they
are returned and present them in order of relevance, speed or other user-set
criteria.
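That collation step can be sketched as follows. Each engine here is a stand-in function returning a ranked list of URLs (a real metasearch site queries the live engines), and the scoring rule -- reward a result for appearing in more engines and nearer the top -- is just one plausible choice, not any particular site's method:

```python
from concurrent.futures import ThreadPoolExecutor

def metasearch(query, engines):
    """Send the same query to several engines at once and merge the results.

    `engines` maps an engine name to a function that takes the query and
    returns a ranked list of URLs (stand-ins for real search-engine calls).
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: f.result() for name, f in futures.items()}
    # Collate: a URL found by more engines, and ranked higher by each,
    # accumulates a bigger score and so rises up the merged list.
    scores = {}
    for ranked in results.values():
        for position, url in enumerate(ranked):
            scores[url] = scores.get(url, 0) + 1.0 / (position + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Querying the engines in parallel rather than one after another is what keeps a metasearch acceptably fast even when one engine is slow to respond.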
Metasearch engines tend to look quite similar, but the way in which they
work, and the quality of the results they return, varies enormously.
Unfortunately there is no simple way to evaluate their performance as it
depends on the nature of the search and the searching techniques used. However,
in general, because they spread the net wider, they are better at finding the
offbeat and obscure. The downside of a Metasearch is a reduction in precision
and the purity of the results, which may contain advertising or returns from
sites that have paid to be listed.
Search engines are by their nature general-purpose tools and often
quite unsuitable for highly focused enquiries within a particular discipline.
Moreover, as we have seen, most of the Internet -- the unseen Deep Web -- is
beyond their reach.
A more productive way to search for specialist information is to use a
web ‘Directory’. Directories are indexes or catalogues of web pages, but unlike
a search engine they are compiled by intelligent and knowledgeable humans and
list only websites and search tools within a particular subject area.
Some large, complex or wide-ranging topics, medicine for example, may
support dozens of Directories, in which case the trick is to find the ones that
work for you. The simplest way to find a directory for your subject area is to
use a normal search engine, adding the word ‘directory’ or ‘database’ to your
search keywords.
Web Directories are better at dipping into the Deep Web, but the only
way to gain access to this vast repository of information is to use
specialist tools and resources, such as Web Directory Metasearch engines, which
create searchable databases of searchable Directories, as it were…
Searching the Deep Web is a complex business, often requiring custom
software and the services of experts to build and maintain databases, so at this
point we have to leave the cosy world of free-to-use search engines and
directories. In short, searching the Deep Web can be an expensive and
time-consuming business, and if the information you seek is difficult to obtain
and has value then you can expect to pay for it.
[Links sidebar: Meta Search Engines; Academic & Specialist Web (Board for Libraries); WWW Virtual Library; Searching the Deep Web]
© R. Maybury 2006 2104