The Library at Gulf Coast Community College


Search Engines



How Do Search Engines Work?

Most search engines use a computer program called a "spider" to collect information and index web resources. Sometimes called "webcrawlers" or "robots", these computer programs crawl through web sites on the Internet, gathering information from all the pages of a web site. The spider returns the information to a central database and then indexes the information it has gathered. When you perform a search engine search, you are searching the database compiled and indexed by the spider.
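
The crawl-then-index process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PAGES dictionary is a hypothetical stand-in for the web (a real spider fetches pages over the network), and the function name crawl_and_index is invented for this example.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
PAGES = {
    "a.example/home": ("library research guide", ["a.example/search", "a.example/help"]),
    "a.example/search": ("search engine guide", ["a.example/home"]),
    "a.example/help": ("library help desk", []),
    "b.example/orphan": ("unlinked page", []),
}

def crawl_and_index(seed):
    """Follow links from the seed page and build a word -> set-of-URLs index."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        # Index every word on the page.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Queue pages we have not visited yet.
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("a.example/home")
print(sorted(index["library"]))  # pages containing the word "library"
print("unlinked" in index)       # False: no link leads to the orphan page
```

A search then becomes a lookup in the index rather than a scan of the web itself, which is why a page the spider never reached cannot appear in the results.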

Spiders are part of a larger category of Internet computer programs called "agents". Agents are computer programs that perform specific functions for their users, usually gathering, comparing and organizing information. There are agents that will gather stock prices for you, and shopping agents that will locate merchandise on the Internet. As agents become more sophisticated, they will be able to provide up-to-the-minute research results, gathering the latest information as soon as it is posted on the Internet.

While search engines generally rely on spiders to collect and index information, each performs its tasks in a slightly different way. Each search engine has its own search interface and uses different criteria for matching searches with documents. Each may also differ in terms of search speed and how it ranks results in order of relevance.

Searching would be easier if the search engines used a common standard. However, since each search engine operates a little differently, it is a good idea to search more than one to be sure you have retrieved most of the relevant information available on your topic.

Location and Frequency

Search engines look at both the location and the frequency of search terms to help determine relevancy. The higher up on a web page a search term appears, the higher that page is ranked. A page that contains a search term in the title or in the first few paragraphs of text will be judged more relevant than one in which the search term appears toward the end of the document.

Search engines also look at the number of times search terms appear in the text of the web site. Sites with a higher frequency of a search term are determined to be more relevant.
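
As a rough illustration of how location and frequency might be combined, the sketch below scores a page for one search term. The function name relevance and the bonus values (5.0 for a title match, up to 3.0 for an early first occurrence) are assumptions chosen for the example; real search engines do not publish their exact formulas.

```python
def relevance(term, title, body):
    """Toy score: frequency in the body, plus bonuses for title and early position."""
    term = term.lower()
    words = body.lower().split()
    score = float(words.count(term))  # frequency of the term in the body
    if term in title.lower().split():
        score += 5.0  # assumed bonus for a title match
    if term in words:
        first = words.index(term)
        # Assumed bonus: the earlier the first occurrence, the closer to 3.0.
        score += 3.0 * (1 - first / len(words))
    return score

early = relevance("ozone", "Ozone facts", "ozone protects the earth from radiation")
late = relevance("ozone", "Weather notes", "rain and wind are common but ozone")
print(early > late)  # True: title match and early position outrank a late mention
```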

Ranking and Popularity

In addition to text-matching techniques, an increasing number of search engines are using popularity and link analysis as a means of ranking search results.

Direct Hit uses technology that measures what people are selecting from the results of a number of search engines. In addition to measuring the number of hits a site receives, Direct Hit also measures how much time people spend at a site. The longer a user stays at a site the higher that site is ranked. Through a combination of these two measurements Direct Hit can show what it considers to be the most relevant sites for a search topic. Direct Hit stands alone as a search engine but it is also integrated into the search results of other search engines such as HotBot and Lycos.

Google uses link analysis to rank the usefulness of a web site. Google interprets a link from web site A to web site B as a vote by site A for site B. The more votes or links a site receives, the more relevant that site is considered to be. In addition to looking at the number of links a site receives, Google also analyzes the sites casting the votes. Votes cast by sites which are themselves major sites (i.e., sites receiving many votes themselves) are weighted more heavily than votes from less popular sites.
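
The voting idea can be sketched as a simple iterative calculation. This is a simplified illustration, not Google's actual algorithm: the function name link_scores, the damping value of 0.85, and the four-site link graph are all assumptions made for the example, and every site in the graph is assumed to link to at least one other site.

```python
def link_scores(links, rounds=50, damping=0.85):
    """Repeatedly share each site's score among the sites it links to."""
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        # Every site starts each round with a small base score.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for target in targets:
                # A vote is worth more when the voting site itself scores highly.
                new[target] += damping * score[page] / len(targets)
        score = new
    return score

# Hypothetical link graph: site -> sites it links to.
links = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B", "C"]}
scores = link_scores(links)
print(max(scores, key=scores.get))  # B, which receives the most votes
```

Site C receives fewer links than B, but one of them comes from the highly ranked B, so C still scores far above A and D, which receive no links at all.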


Refining your Search Statements

The next step in developing your search statement is to refine your keyword search string. This may help to narrow or direct your search so that you retrieve the most relevant results. Complex search statements may need to be refined by adding words and characters such as Boolean operators, quotation marks to indicate exact phrases, proximity operators, truncation symbols, or field search limitations.


General Search Features

Most of the major search engines support the following search techniques, although each search engine operates a little differently. Be sure to read the specific instructions provided for each search engine in the HELP files. There is usually a link to a HELP screen near the search box or near the top of the search engine's home page. These HELP screens should be consulted on a regular basis as the searching features of the search engine may change.

Boolean Searching

Boolean searching is based on a system of symbolic logic developed by George Boole, a 19th century English mathematician. Most computer databases and Internet search engines support Boolean searches. Boolean search techniques may be used to perform accurate searches without producing many irrelevant documents.

The power of Boolean searching is based on combinations of keywords with connecting terms called operators. The three basic operators are AND (every term must be present), OR (at least one term must be present), and NOT (the term must be absent). Many Internet search engines replace Boolean operators with symbols, for example + for AND and - for NOT.

Simple topics can often be expressed with just two keywords and one operator. Actual search strings, which express complex topics, may consist of several keywords and combinations of Boolean operators (see the Advanced Web Research page).

  • Most search engines support Boolean searching, allowing AND, OR, and NOT searches. Some engines only allow AND. In some search engines, the exclusionary NOT operator is expressed as AND NOT.
  • If a list of terms is entered and no Boolean operator is specified, many search engines use the OR operator as the default, while others use the AND operator.
  • Some search engines require that Boolean operators be capitalized; others do not, but still accept capitalized operators. It is therefore a good idea to capitalize every Boolean operator.
  • Many search engines use a simplified form of Boolean operator, replacing the operator with a symbol:
    • the + sign for an AND search
      Example: +drinking +driving searches for the words drinking AND driving, in no specific order in the text of the web page.
    • the - sign for a NOT search
      Example: +dolphins -football will search for documents which contain the word dolphins but NOT the word football.
  • Search statements combining more than one type of Boolean operator must also use nesting, that is, parentheses around synonymous terms. The parentheses tell the search engine to perform that part of the search first.
    Example: +suicide +(teen youth adolescent) will search for documents containing any or all of the terms within the parentheses before combining that result with the word suicide. This assumes that the default operator for the search engine is OR.
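
Behind the scenes, Boolean operators correspond to set operations on lists of matching documents. The sketch below works through two of the examples above, using a hypothetical mini-index (the document numbers are invented for illustration): AND is a set intersection, OR is a union, and NOT is a difference.

```python
# Hypothetical mini-index: word -> set of document IDs containing that word.
INDEX = {
    "dolphins": {1, 2, 3},
    "football": {2},
    "suicide": {4, 5, 6},
    "teen": {4},
    "youth": {5, 7},
    "adolescent": {8},
}

def docs(word):
    """Return the set of documents containing the word (empty if unknown)."""
    return INDEX.get(word, set())

# +dolphins -football  ->  dolphins NOT football
not_football = docs("dolphins") - docs("football")
print(sorted(not_football))  # [1, 3]

# +suicide +(teen youth adolescent): the parenthesized OR group is evaluated first
group = docs("teen") | docs("youth") | docs("adolescent")
nested = docs("suicide") & group
print(sorted(nested))  # [4, 5]
```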

Phrase Searching

  • Most search engines support the use of double quotation marks around words, terms or names you want searched as a phrase, i.e., appearing in exactly the order you enter them:
    • Example: "ozone layer depletion" searches for the phrase, with the words in the order given
    • Example: "Martin Luther King" searches for the name as a phrase
    • Example: "Society for Creative Anachronism" searches for the organization
  • In some search engines, if a phrase is not specified in the search statement, the default search is an OR Boolean search in which just one of the terms in the search need be present to retrieve a document. This can lead to thousands of irrelevant hits.
  • Some search engines use pull-down menus to allow the searcher to select "exact phrase" as the search option.
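
The difference between a phrase search and a default OR search can be sketched as follows. Both function names are invented for this example, and the matching is deliberately simple (whole words separated by spaces, no punctuation handling).

```python
def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively, in order, in the text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def contains_any(text, phrase):
    """OR-style default: True if any single word of the phrase appears in the text."""
    words = set(text.lower().split())
    return any(w in words for w in phrase.lower().split())

doc = "depletion of the ozone layer continues"
print(contains_phrase(doc, "ozone layer depletion"))  # False: the word order differs
print(contains_any(doc, "ozone layer depletion"))     # True: each word occurs somewhere
```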

Proximity Searching

  • Some search engines, most notably AltaVista, support proximity searching. The NEAR operator will allow you to look for words within 10 words of each other.
  • Example: "college students" NEAR "binge drinking" would look for those two phrases within 10 words of each other in any order.
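
A NEAR search can be sketched as a comparison of word positions. The function name near is invented for this example, and measuring the distance between the starting positions of the two phrases is an assumption; actual search engines may count the gap differently.

```python
def near(text, left, right, distance=10):
    """True if the two phrases start within `distance` words of each other, in any order."""
    words = text.lower().split()
    a, b = left.lower().split(), right.lower().split()
    # Positions where each phrase begins.
    starts_a = [i for i in range(len(words) - len(a) + 1) if words[i:i + len(a)] == a]
    starts_b = [i for i in range(len(words) - len(b) + 1) if words[i:i + len(b)] == b]
    return any(abs(i - j) <= distance for i in starts_a for j in starts_b)

close = "a survey of college students found that binge drinking is common"
far = "college students " + " ".join(["filler"] * 20) + " binge drinking"
print(near(close, "college students", "binge drinking"))  # True
print(near(far, "college students", "binge drinking"))    # False
```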

Field Searching

  • Some search engines allow you to limit your search to specified fields, such as the title of the document, a word from the URL, the domain name, and the availability of such features as images, sound, and video. An easy way to limit your search is to use the Advanced Search link in Google.
    • Example: title:"affirmative action" searches for the phrase within titles of documents. Limiting a search to the title field can be one of the most effective ways to narrow search results to only the most relevant sites.
    • Example: +domain:gov +title:"health care reform" searches for the phrase within titles of documents produced by a government agency.
    • Example: url:ccla searches for documents with ccla (College Center for Library Automation) as part of the Internet address.
    • Example: link:http://www.ccla.lib.fl.us searches for web sites which have linked to College Center for Library Automation.
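
Field searching amounts to matching the search term against one part of each document record instead of the whole document. In the sketch below, the two document records and the function name field_search are hypothetical, and the match is a simple case-insensitive substring test.

```python
# Hypothetical document records, each with separate title, url, and body fields.
DOCS = [
    {"title": "Affirmative action in admissions",
     "url": "http://www.example.edu/policy",
     "body": "admissions policy overview"},
    {"title": "Library automation news",
     "url": "http://www.ccla.lib.fl.us",
     "body": "a note on affirmative action"},
]

def field_search(docs, field, phrase):
    """Return the documents whose given field contains the phrase."""
    return [d for d in docs if phrase.lower() in d[field].lower()]

title_hits = field_search(DOCS, "title", "affirmative action")
url_hits = field_search(DOCS, "url", "ccla")
print([d["title"] for d in title_hits])  # only the page with the phrase in its title
print([d["title"] for d in url_hits])    # only the page with ccla in its address
```

The second record mentions the phrase in its body, but a title search skips it, which is why limiting a search to the title field narrows results so effectively.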

Truncation

  • Some search engines automatically look for singular and plural forms of terms as well as ing or ed endings. Others use the asterisk (*) to specify that all endings of the root term be searched.
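
Truncation with an asterisk can be sketched as a prefix match against the words in the index. The function name expand_truncation and the small vocabulary are assumptions made for this example.

```python
def expand_truncation(pattern, vocabulary):
    """Expand a pattern ending in * to every vocabulary word that starts with its root."""
    if pattern.endswith("*"):
        root = pattern[:-1]
        return sorted(w for w in vocabulary if w.startswith(root))
    # Without an asterisk, only the exact word matches.
    return sorted(w for w in vocabulary if w == pattern)

vocab = {"educate", "educated", "educating", "education", "editor"}
print(expand_truncation("educat*", vocab))
# ['educate', 'educated', 'educating', 'education']
```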

Case Sensitivity

  • Some search engines are case sensitive, requiring that proper names and place names be capitalized.
  • In general, when a search statement is entered in all lower case, both lower case and upper case will be retrieved. The reverse is not true. When upper case is used the search engine will only retrieve the exact match. For example, AIDS will not retrieve the common word, aids.
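
The general rule in the second point can be expressed as a short function. The name case_match is invented for this example, and individual search engines may apply different rules.

```python
def case_match(query, word):
    """All-lowercase queries match any capitalization; otherwise require an exact match."""
    if query == query.lower():
        return word.lower() == query
    return word == query

print(case_match("aids", "AIDS"))  # True: lower case retrieves both forms
print(case_match("AIDS", "aids"))  # False: upper case retrieves only the exact match
```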

Keyword vs. Concept Searching

  • Most search engines use keyword searching. They look for documents containing the exact words entered. This necessitates a careful selection of keywords to describe a topic. For example, a search for the word cancer would not retrieve documents containing the word neoplasm or carcinoma unless the word cancer was also present in the document, although all three words express the same concept.
  • A search engine which utilizes concept cluster searching looks for documents related to the idea of the search as well as those documents containing the exact word(s) of the search. Concept searching takes into account that a topic can be described in a wide variety of ways with different words and expressions (for example, cancer, neoplasms, carcinoma).
     
    Examples of Concept Cluster Search tools:
    Clusty
    AlltheWeb
    KartOO
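
The contrast between keyword and concept searching can be sketched as follows. The CONCEPTS table is a hypothetical, hand-built stand-in: actual concept search tools derive related terms automatically, and the function names are invented for this example.

```python
# Hypothetical concept table: term -> words expressing the same idea.
CONCEPTS = {"cancer": {"cancer", "neoplasm", "carcinoma"}}

def keyword_search(term, docs):
    """Return documents containing the exact word."""
    return [d for d in docs if term in d.lower().split()]

def concept_search(term, docs):
    """Return documents containing the word or any related word."""
    related = CONCEPTS.get(term, {term})
    return [d for d in docs if related & set(d.lower().split())]

documents = [
    "new carcinoma treatment options",
    "cancer screening guidelines",
    "gardening tips",
]
print(keyword_search("cancer", documents))  # only the page using the word cancer
print(concept_search("cancer", documents))  # also finds the carcinoma page
```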

Related Sites

  • Some search engines (AltaVista, Go, Google and Raging Search) provide links to related or similar sites along with the sites retrieved. In this way, if you like the content of a particular site you may be able to find similar or comparable sites which were not retrieved in your initial search.

Miscellaneous Hints

  • Searching can be confusing! Remember, each search engine works a little differently. To make it easier, be sure to read the HELP files for each search engine on a regular basis!
  • Make sure you try your search in several search engines. Each search engine's database includes unique documents that will not be included in other databases.
  • For the latest developments in search engines bookmark the following site.


Links to Major Search Engines

Below are links to some of the largest and most popular search engines along with links to their basic help files.

General Search Engines
AltaVista AltaVista Help
Teoma Teoma Hit Help
WiseNut WiseNut help
AllTheWeb AllTheWeb help
Google Google Help
HotBot HotBot Help



Meta-Search Engines

A special kind of search engine called a meta-search engine (or parallel search engine) allows you to query several search engines at once. Instead of doing a search itself, a meta-search engine sends your request to other search engines, compiles the results, and displays them for you. This process is much faster than querying several search engines separately.

Meta-search engines do not own any database of web pages--they use and deliver results from the databases and search programs of each of the individual search engines they query. Meta-search engines act as an intelligent middle-man to pass your search through, gather the responses and then give you a report from several engines at once. As well as saving time, this kind of search engine can give you an overview of the kind of document you may find using your search terms and may even result in giving you exactly what you need if you are searching for a unique term or phrase.

There are some disadvantages in relying exclusively on meta-search engines. None of the meta-search engines queries all of the largest search engines. At this writing, none queries Northern Light; several do not query HotBot. If a connection or search takes too long, one or more of the search engines may time out and produce no results. If you submit a complicated search to a meta-search engine that one of the queried tools does not "understand", you may get no hits at all from that engine. However, you will usually get results from another tool that supports your search strategy.

Meta-search engines retrieve only the first 10-50 hits from each search engine, so the total number of hits may be fewer than you would retrieve with a direct search on a single search engine. Thus, meta-search engines do not eliminate the need to learn how to search at least one general web search engine intelligently (such as AltaVista, Fast, Google, HotBot, or Northern Light).
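
The behavior described above, including time-outs and the limit on hits per engine, can be sketched as follows. The three engine functions are hypothetical stand-ins that return fixed lists; a real meta-search engine would send the query to each search engine over the network.

```python
def meta_search(query, engines, per_engine=10):
    """Query each engine, keep its first hits, and merge them without duplicates."""
    merged = []
    seen = set()
    for name, search in engines.items():
        try:
            hits = search(query)[:per_engine]  # only the first few hits are kept
        except TimeoutError:
            continue  # an engine that times out contributes no results
        for url in hits:
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged

# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["x.example", "y.example"]

def engine_b(query):
    return ["y.example", "z.example"]

def engine_slow(query):
    raise TimeoutError("no response")

results = meta_search("ozone", {"A": engine_a, "Slow": engine_slow, "B": engine_b})
print(results)
# [('x.example', 'A'), ('y.example', 'A'), ('z.example', 'B')]
```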

Each meta-search engine has its own interface and method for letting you choose engines to search. Below are links to five popular meta-search engines along with links to their basic help screens.

Meta-Search Engines
Ask Jeeves Ask Jeeves Help
Dogpile Dogpile Help
Mamma.com Mamma.com Help
MetaCrawler MetaCrawler Help
SavvySearch SavvySearch Help

Subject Directories

Subject directories are usually compiled and maintained by people; when a computer program is used instead, it applies some type of automated selection criteria. Because they are selective and usually maintained by human beings, subject directory databases, like those of specialized search engines, are smaller than those of the general-purpose search engines. Directories also tend to produce more relevant results than general search engines, both because of their smaller size and because they usually index only a web site's first page.

Although web subject directories catalog a small segment of the Web's millions of documents, they provide a quick and easy search by subject, and often by keyword. Directories may be extremely useful if you have no idea where to start searching. They are more useful for searching general subjects rather than for more specific information. Beginning an information search in a subject directory can give you some idea of the types of information files available on the Internet for that particular subject.

When beginning a search, you will notice that the top-level subject headings usually consist of very broad subjects, such as "Arts and Humanities", "Education", and "Health". After choosing a subject at the top level, you can move through lists of submenus to narrow your search. Under "Health" you might find "Diseases", "Drugs" and "Fitness". Continue following the subheadings and eventually you will reach a page that lists web documents. Click on the links that look interesting and use your browser's Back button to return to the subject directory.

Some of the subject directories provide an alternative to moving down their hierarchical lists of menus by providing a search engine for their database. You can use keywords to search these directories, but you will be limited to resources in the directory's database.

Examples of general subject guides include:

Remember that some of the sites above also offer a web search engine in addition to their subject directory of reviewed sites. The search engine may search a database which includes non-reviewed sites compiled by an automated spider.

There are several specialized directories of subject guides compiled by subject specialists who are experts in their subject fields. These directories are called distributed subject trees. These directories distribute the responsibility of maintaining lists of the best, most relevant Internet documents in various subject areas to volunteers. Each volunteer is responsible for maintaining a list of documents in his or her area of subject expertise. These guides are likely to produce highly relevant information sources.

Examples of distributed subject trees include:

Above all else, when performing research on the web you must evaluate the information.

Evaluate your source

The best thing about the WWW - volumes of information on almost any imaginable topic - is also the worst thing about it. All this information means that it can be very difficult to sort the good information from the bad. The WWW has no editor to ensure that information published there is accurate and no librarian to make sure that information is organized into subjects so that it's easy to find.
 
Therefore when using information found on the WWW it is very important to evaluate that information to make sure it is reliable. Use the table below to start evaluating your WWW resource:
 
authority
  • does the author's name appear on the web page?
  • what is the author's expertise on this particular subject?
  • what is the author's organizational affiliation?
  • is contact information available so that the author can be reached for questions?
bias
  • is the web site objective?
  • does the author's organizational affiliation make him/her biased?
currency
  • is there a date of creation or revision?
  • if the topic is timely, is the date recent?
  • are the links up to date? are there any dead links?
content
  • does the information provided seem logical?
  • is the information intended as an advertisement?
  • does the text follow the basic rules of grammar and spelling?
  • are citations given when facts and statistics are used?
domain
  • .edu = educational institution
  • .gov = government 
  • .mil = military
  • .com = commercial
  • .org = organization
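
The domain check can be sketched with Python's standard urllib.parse module. The function name domain_type is invented for this example, and the table covers only the five endings listed above; any other ending is reported as unknown.

```python
from urllib.parse import urlparse

# The five domain endings from the checklist above.
DOMAIN_TYPES = {
    "edu": "educational institution",
    "gov": "government",
    "mil": "military",
    "com": "commercial",
    "org": "organization",
}

def domain_type(url):
    """Classify a URL by the last part of its host name."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return DOMAIN_TYPES.get(suffix, "unknown")

print(domain_type("https://www.nasa.gov/missions"))  # government
print(domain_type("http://www.ccla.lib.fl.us"))      # unknown (not in the table)
```

The domain ending is only one clue about a source; it does not replace the other questions in the checklist.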
 

 

 


© MMVI  Gulf Coast Community College