How Do Search Engines Work?
Most search engines use a
computer program called a "spider" to collect information and
index web resources. Sometimes called "webcrawlers" or "robots",
these computer programs crawl through web sites on the Internet,
gathering information from all the pages of a web site. The spider
returns the information to a central database and then indexes the
information it has gathered. When you use a search engine, you
are searching the database compiled and indexed by the
spider.
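As a rough illustration of what "indexing" means here, the sketch below builds a tiny index in Python that maps each word to the pages containing it. The page texts and URLs are invented for the example, and real spiders fetch pages over the network and index them in far more elaborate ways; this only shows why a search consults the index rather than re-reading every page.

```python
from collections import defaultdict

# Hypothetical pages a spider might have gathered (URL -> page text).
pages = {
    "http://example.com/a": "Dolphins are marine mammals",
    "http://example.com/b": "The Miami Dolphins play football",
}

def build_index(pages):
    """Map each lowercased word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(pages)

def search(index, word):
    """Look a single word up in the index; no page is re-read at search time."""
    return sorted(index.get(word.lower(), set()))
```

Calling search(index, "football") consults only the index and returns the one page that contains that word.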
Spiders are part of a larger
category of Internet computer programs called "agents". Agents are
computer programs that perform specific functions for their users,
usually gathering, comparing and organizing information. There are
agents that will gather stock prices for you, and shopping agents
that will locate merchandise on the Internet. As agents become
more sophisticated, they will be able to provide up-to-the-minute
research results, gathering the latest information as soon as it
is posted on the Internet.
While all search engines rely on
spiders to collect and index information, each performs its tasks
in a slightly different way. Each search engine has its own search
interface and uses different criteria for matching searches with
documents. Each may also differ in terms of search speed and how
it ranks results in order of relevance.
Searching would be easier
if the search engines used a common standard. However, since each
search engine operates a little differently, it is a good idea to
search more than one to be sure you have retrieved most of the
relevant information available on your topic.
Location and Frequency
Search engines look at both
the location of search terms and the frequency with which
they occur to help determine relevancy. The higher
up on a web page a search term appears, the higher that
page is ranked. A page that contains a search term
in the title or in the first few paragraphs of text will be
judged more relevant than one in which the search term
appears toward the end of the document.
Search engines also look at
the number of times search terms appear in the text of the web
site. Sites with a higher frequency of a search term are
determined to be more relevant.
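The location and frequency ideas can be sketched as a toy scoring function. The bonus values (5 for a title match, 2 for appearing in the first 50 words) are assumptions chosen for illustration only; actual search engines do not publish their weights.

```python
def relevance(title, body, term):
    """Toy score: frequency of the term, with bonuses for early placement."""
    term = term.lower()
    words = body.lower().split()
    score = float(words.count(term))   # frequency in the body text
    if term in title.lower().split():
        score += 5.0                   # assumed bonus for a title match
    if term in words[:50]:
        score += 2.0                   # assumed bonus for appearing early
    return score
```

Under these assumptions, a page with the term in its title and opening words scores higher than a page that mentions the term once near the end.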
Ranking and Popularity
In addition to
text-matching techniques, an increasing number of search engines
are using popularity and link analysis as a means of ranking
search results.
Direct Hit uses
technology that measures what people are selecting from the
results of a number of search engines. In addition to measuring
the number of hits a site receives, Direct Hit also measures how
much time people spend at a site. The longer a user stays at a
site the higher that site is ranked. Through a combination of
these two measurements Direct Hit can show what it considers to be
the most relevant sites for a search topic. Direct Hit stands
alone as a search engine but it is also integrated into the search
results of other search engines such as HotBot and Lycos.
Google uses link
analysis to rank the usefulness of a web site. Google interprets a
link from web site A to web site B as a vote by site A for site B.
The more votes or links a site receives, the more relevant that
site is considered to be. In addition to looking at the number of
links a site receives, Google also analyzes the sites casting the
votes. Votes cast by sites which are themselves major sites (i.e.,
receiving many votes themselves) are weighted more heavily than
votes from other less popular sites.
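The voting idea can be sketched with a simplified link-scoring loop. This is not Google's actual algorithm; the damping value of 0.85, the number of rounds, and the three-site link graph are all assumptions made for the example.

```python
def link_scores(links, rounds=50, damping=0.85):
    """links maps each site to the list of sites it links to (its 'votes')."""
    sites = set(links) | {t for targets in links.values() for t in targets}
    score = {s: 1.0 / len(sites) for s in sites}
    for _ in range(rounds):
        new = {s: (1.0 - damping) / len(sites) for s in sites}
        for source, targets in links.items():
            for target in targets:
                # A vote is worth more when the voting site scores highly,
                # and is split among all the sites that the voter links to.
                new[target] += damping * score[source] / len(targets)
        score = new
    return score

# Hypothetical link graph: A and C both link to B; B links to C.
scores = link_scores({"A": ["B"], "B": ["C"], "C": ["B"]})
```

In this graph site B receives two votes and ends up with the highest score, while C, which is voted for by the well-regarded B, outranks A, which receives no votes at all.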
Refining Your Search Statements
The next step in developing your search
statement is to refine your keyword search string. This may help
to narrow or direct your search so that you retrieve the most
relevant results. Complex search statements may need to be refined
by adding words and characters such as Boolean operators,
quotation marks to indicate exact phrases, proximity operators,
truncation symbols, or field search limitations.
General Search Features
Most of the major search
engines support the following search techniques, although each
search engine operates a little differently. Be sure to read the
specific instructions provided for each search engine in the HELP
files. There is usually a link to a HELP screen near the search
box or near the top of the search engine's home page. These HELP
screens should be consulted on a regular basis as the searching
features of the search engine may change.
Boolean Searching
Boolean searching is based on a system of
symbolic logic developed by George Boole, a 19th-century English
mathematician. Most computer databases and Internet search engines
support Boolean searches. Boolean search techniques may be used to
perform accurate searches without producing many irrelevant
documents.
The power of Boolean searching is based on
combinations of keywords with connecting terms called operators.
The three basic operators are the terms AND, OR, and NOT.
Many Internet search engines replace Boolean
operators with symbols, for example + for AND, -
for NOT.



The examples above illustrate general topics expressed with
just two keywords. Actual search strings, which express complex
topic ideas, may consist of several keywords and combinations of
Boolean operators. (See the Advanced Web
Research page).
- Most search engines
support Boolean searching, allowing AND, OR, and
NOT searches. Some engines only allow AND. In some
search engines, the exclusionary NOT operator is expressed as
AND NOT.
- If a list of terms is
entered and no Boolean operator is specified, many search
engines use the OR operator as the default, while others
use the AND operator.
- Some search engines
require that the Boolean operator be capitalized; others do not,
though those not requiring capitalization accept it. Therefore,
it is a good idea to capitalize any Boolean operator.
- Many search engines use
a simplified form of Boolean operator, replacing the operator
with a symbol:
- the +
sign for an AND search
- Example:
+drinking +driving searches for the words drinking
AND driving, in no specific order in the text of the web
page.
- the - sign for
a NOT search
- Example:
+dolphins -football will search for documents which
contain the word dolphins but NOT the word football.
- Search statements
combining more than one type of Boolean operator must also nest
synonymous terms inside parentheses. The parentheses
tell the search engine to perform that search first.
- Example: +suicide
+(teen youth adolescent) will search for documents
containing any or all of the terms within the parentheses
before combining that result with the word suicide. This
assumes that the default operator for the search engine is OR.
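The + and - symbols described above can be sketched as a small matching function. It treats unsigned terms as an OR group, which is an assumption about the default operator, and it does not handle parentheses; each search engine's real syntax is described in its HELP files.

```python
def matches(query, text):
    """Evaluate a simple +term / -term query against a page's text.

    Terms with no sign are treated as optional (an assumed OR default):
    if any are given, at least one must be present.
    """
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    return not optional or any(term in words for term in optional)
```

With this sketch, the query +dolphins -football accepts a page about marine mammals and rejects a page that mentions the football team.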
Phrase Searching
- Most search engines
support the use of double quotation marks around words, terms or
names you want searched as a phrase, i.e., appearing in exactly
the order you enter them:
- Example:
"ozone layer depletion" searches for the phrase, with
the words in the order given
- Example: "Martin
Luther King" searches for the name as a phrase
- Example: "Society
for Creative Anachronism" searches for the organization
- In some search engines,
if a phrase is not specified in the search statement, the
default search is an OR Boolean search in which just one of the
terms in the search need be present to retrieve a document. This
can lead to thousands of irrelevant hits.
- Some search engines use
pull-down menus to allow the searcher to select "exact phrase"
as the search option.
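A phrase search differs from a keyword search in that the words must appear next to each other and in the given order. A minimal sketch, assuming words are separated by spaces and punctuation is ignored:

```python
def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively, in order, in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )
```

A page reading "depletion of the ozone layer" contains all three words of "ozone layer depletion" but would not match the phrase.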
Proximity Searching
- Some search engines,
most notably AltaVista, support proximity searching. The
NEAR operator will allow you to look for words within 10
words of each other.
- Example: "college
students" NEAR "binge drinking" would look for those two phrases
within 10 words of each other in any order.
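The NEAR operator can be sketched as follows. Measuring the distance between the starting positions of the two phrases is an assumption made for this example; a real search engine may count the gap differently.

```python
def near(text, first, second, distance=10):
    """True if the two phrases start within `distance` words of each other."""
    words = text.lower().split()

    def positions(phrase):
        # Every index at which the phrase begins in the text.
        target = phrase.lower().split()
        n = len(target)
        return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

    return any(
        abs(i - j) <= distance
        for i in positions(first)
        for j in positions(second)
    )
```

Because the absolute difference is used, the two phrases may appear in either order.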
Field Searching
- Some search engines
allow you to limit your search to specified fields, such as the
title of the document, a word from the URL, the domain name, and
the availability of such features as images, sound, and video.
An easy way to limit your search is to use the Advanced Search link in
Google.
- Example:
title:"affirmative action" searches for the phrase
within titles of documents. Limiting a search to the title
field can be one of the most effective ways to narrow search
results to only the most relevant sites.
- Example: +domain:gov
+title:"health care reform" searches for the phrase within
titles of documents produced by a government agency.
- Example: url:ccla
searches for documents with ccla (College Center for Library
Automation) as part of the Internet address.
- Example: link:http://www.ccla.lib.fl.us
searches for web sites which have linked to College Center for
Library Automation.
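Field searching can be pictured as matching a value against one named part of a document instead of its whole text. In the sketch below the document is a plain dictionary with hypothetical title, url, and domain fields, and a simple substring test stands in for whatever matching a real search engine performs.

```python
def field_match(doc, query):
    """Check one field:value query against a document stored as a dict.

    Hypothetical fields: 'title', 'url', 'domain'. The value may be quoted.
    """
    field, _, value = query.partition(":")
    value = value.strip('"').lower()
    return value in doc.get(field, "").lower()

# Invented document used only for illustration.
doc = {
    "title": "Affirmative Action in Higher Education",
    "url": "http://www.example.gov/reports/aa.html",
    "domain": "gov",
}
```

Here field_match(doc, 'title:"affirmative action"') succeeds because the phrase occurs in the title field, even though a query on another field might fail for the same document.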
Truncation
- Some search engines
automatically look for singular and plural forms of terms as
well as ing or ed endings. Others use the asterisk (*) to
specify that all endings of the root term be searched.
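Truncation with the asterisk amounts to matching every word that begins with the root. A minimal sketch, assuming the asterisk may appear only at the end of the term:

```python
def truncation_match(text, pattern):
    """Match a pattern such as 'librar*' against each word of the text."""
    words = text.lower().split()
    pattern = pattern.lower()
    if pattern.endswith("*"):
        root = pattern[:-1]
        return [w for w in words if w.startswith(root)]
    return [w for w in words if w == pattern]
```

Searching for librar* in this sketch finds libraries, librarians, and library, while searching for library finds only the exact word.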
Case Sensitivity
- Some search engines are
case sensitive, requiring that proper names and place names be
capitalized.
- In general, when a
search statement is entered in all lower case, both lower case
and upper case will be retrieved. The reverse is not true. When
upper case is used the search engine will only retrieve the
exact match. For example, AIDS will not retrieve the common
word, aids.
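The case rule described above can be expressed in a few lines. This is a sketch of the general behavior only; individual search engines may treat case differently.

```python
def case_match(query, word):
    """Lower-case queries match any capitalization; others must match exactly."""
    if query == query.lower():
        return query == word.lower()
    return query == word
```

Under this rule the query aids matches both aids and AIDS, while the query AIDS matches only AIDS.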
Keyword vs. Concept Searching
- Most search engines use
keyword searching. They look for documents containing the
exact words entered. This necessitates a careful
selection of keywords to describe a topic. For example, a search
for the word cancer would not retrieve documents
containing the word neoplasm or carcinoma unless
the word cancer was also present in the document, although all
three words express the same concept.
- A search engine which
utilizes concept cluster searching looks for documents related to
the idea of the search as well as those documents containing the
exact word(s) of the search. Concept searching takes into
account that a topic can be described in a wide variety of ways
with different words and expressions (for example, cancer,
neoplasms, carcinoma).
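The difference between the two approaches can be sketched with a small lookup table of related terms. The table and the sample documents are invented for the example; how a real concept-based engine groups related words is not described here.

```python
# Hypothetical concept table; a real engine would derive this from its own data.
CONCEPTS = {
    "cancer": {"cancer", "neoplasm", "neoplasms", "carcinoma"},
}

# Invented documents used only for illustration.
docs = ["Carcinoma treatment options", "Cancer statistics", "Gardening tips"]

def keyword_search(docs, term):
    """Return documents containing the exact term."""
    return [d for d in docs if term.lower() in d.lower().split()]

def concept_search(docs, term):
    """Return documents containing the term or any word in its concept cluster."""
    related = CONCEPTS.get(term.lower(), {term.lower()})
    return [d for d in docs if related & set(d.lower().split())]
```

A keyword search for cancer finds only the second document, while the concept search also retrieves the document about carcinoma.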
Related Sites
- Some search engines
(AltaVista, Go, Google and Raging Search) provide links to
related or similar sites along with the sites retrieved. In this
way, if you like the content of a particular site you may be
able to find similar or comparable sites which were not
retrieved in your initial search.
Miscellaneous Hints
- Searching can be
confusing! Remember, each search engine works a little
differently. To make it easier, be sure to read the HELP
files for each search engine on a regular basis!
- Make sure you try your
search in several search engines. Each search engine's database
includes unique documents that will not be included in other
databases.
- For the latest
developments in search engines bookmark the following site.
Links to Major Search Engines
Below are links to some of the
largest and most popular search engines along with links to their
basic help files.
Meta-Search Engines
A special kind of search
engine called a meta-search engine (or parallel
search engine) allows you to query several search engines at once.
Instead of doing a search itself, a meta-search engine sends your
request to other search engines, compiles the results, and
displays them for you. This process is much faster than querying
several search engines separately.
Meta-search engines do not
own any database of web pages--they use and deliver results from
the databases and search programs of each of the individual search
engines they query. Meta-search engines act as an intelligent
middle-man to pass your search through, gather the responses and
then give you a report from several engines at once. As well as
saving time, this kind of search engine can give you an overview
of the kind of document you may find using your search terms and
may even result in giving you exactly what you need if you are
searching for a unique term or phrase.
There are some
disadvantages in relying exclusively on meta-search engines. None
of the meta-search engines query all of the largest search
engines. At this writing, none queries Northern Light; several do
not query HotBot. If a connection or search takes too long, one or
more of the search engines may time out and produce no results. If
you submit a complicated search to a meta-search engine that one
of the queried tools does not "understand" you may get no hits at
all from that engine. However, you will usually get results from
another tool that supports your search strategy.
Meta-search engines
retrieve only the first 10-50 hits from each search engine; the
total number of hits may be less than you would retrieve with a
direct search on a single search engine. Thus, meta-search engines
do not eliminate the need to learn how to search intelligently
with at least one general web search engine (such as AltaVista,
Fast, Google, HotBot, or Northern Light).
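The behavior described above, including the cap on hits per engine and the effect of a time-out, can be sketched as follows. The engine functions are stand-ins supplied by the caller; no real search engine interface is assumed, and listing first the pages returned by the most engines is just one possible way to merge results.

```python
def meta_search(query, engines, per_engine=10):
    """Send one query to several engines and merge the results.

    `engines` maps an engine name to a function returning a ranked list of
    URLs. The functions are stand-ins; no real engine API is assumed.
    """
    merged = {}
    for name, engine in engines.items():
        try:
            hits = engine(query)[:per_engine]   # keep only the top hits
        except TimeoutError:
            continue                            # a timed-out engine adds nothing
        for url in hits:
            merged.setdefault(url, []).append(name)
    # URLs returned by more engines are listed first, then alphabetically.
    return sorted(merged, key=lambda url: (-len(merged[url]), url))
```

An engine that times out simply contributes no hits, which is why results from a meta-search engine can be less complete than a direct search.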
Each meta-search engine has its
own interface and method for letting you choose engines to search.
Below are links to four popular meta-search engines along with
links to their basic help screens.