Search engines are one of the primary ways that
Internet users find web sites. A web site with good search engine
listings may see a dramatic increase in traffic, which is why
everyone wants those good listings. Unfortunately, many web sites
rank poorly in search engines or may not be listed at all because
they fail to consider how search engines work. This page first
explores the two major ways search engines get their listings, and
then shows how search engine optimization can especially help with
crawler-based search engines.
Information on this page has been drawn from the help pages of each
search engine, along with knowledge gained from articles, reviews,
books, independent research, tips from others and additional
information received directly from the various search engines.
The term "search engine" is often used generically to describe both
crawler-based search engines and human-powered directories. These
two types of search engines gather their listings in radically
different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings
automatically. They "crawl" or "spider" the web, then people search
through what they have found. If you change your web pages,
crawler-based search engines eventually find these changes, and that
can affect how you are listed. Page titles, body copy and other
elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends
on humans for its listings. You submit a short description to the
directory for your entire site, or editors write one for sites they
review. A search looks for matches only in the descriptions
submitted. Changing your web pages has no effect on your listing.
Techniques that improve a listing with a crawler-based search engine
have nothing to do with improving a listing in a directory. The only
exception is that a good site, with good content, might be more
likely to get reviewed for free than a poor site.
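To make the directory behavior concrete, here is a minimal Python sketch of
a search that looks only at submitted descriptions. The DirectoryListing
class, the search_directory function and the example sites are all
hypothetical, and a real directory may match words quite differently.

```python
from dataclasses import dataclass


@dataclass
class DirectoryListing:
    url: str
    description: str  # short, human-written summary of the entire site


def search_directory(listings, query):
    """Return listings whose description contains every word of the query."""
    words = query.lower().split()
    return [
        entry
        for entry in listings
        if all(word in entry.description.lower().split() for word in words)
    ]


# Hypothetical listings; the page content of each site is never consulted.
listings = [
    DirectoryListing("http://example.com/", "Handmade oak furniture and repairs"),
    DirectoryListing("http://example.org/", "Guide to growing tomatoes at home"),
]

matches = search_directory(listings, "oak furniture")
print([entry.url for entry in matches])
```

Because only the description field is searched, editing the pages behind
either URL would leave these results unchanged.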
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either
crawler-based results or human-powered listings. Today, it is
extremely common for both types of results to be presented. Usually,
a hybrid search engine will favor one type of listing over another.
For example, MSN Search is more likely to
present human-powered listings from LookSmart. However, it does also
present crawler-based results (as provided by Inktomi), especially
for more obscure queries.
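One way to picture a hybrid engine is as a blend that prefers one source
and falls back on the other. The Python sketch below is only an
illustration under assumed rules: the hybrid_results function, the
"wanted" cutoff and the sample data are invented here, and no real search
engine is claimed to work this way.

```python
def hybrid_results(query, human_listings, crawler_listings, wanted=3):
    """Favor human-powered listings; fill any gap with crawler-based results."""
    human = list(human_listings.get(query, []))
    if len(human) >= wanted:
        return human[:wanted]
    # Obscure queries tend to have few human-powered matches, so
    # crawler-based results make up the difference.
    extra = [url for url in crawler_listings.get(query, []) if url not in human]
    return (human + extra)[:wanted]


# Hypothetical data standing in for a directory and a crawler index.
human_listings = {
    "furniture": ["http://a.example/", "http://b.example/", "http://c.example/"],
}
crawler_listings = {
    "furniture": ["http://d.example/"],
    "antique oak dovetail repair": ["http://e.example/joints.html"],
}

print(hybrid_results("furniture", human_listings, crawler_listings))
print(hybrid_results("antique oak dovetail repair", human_listings, crawler_listings))
```

With this data, the common query is answered entirely from the
human-powered listings, while the obscure one is answered from the
crawler-based results.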
The Parts of a Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the
spider, also called the crawler. The spider visits a web page, reads
it, and then follows links to other pages within the site. This is
what it means when someone refers to a site being "spidered" or
"crawled." The spider returns to the site on a regular basis, such
as every month or two, to look for changes.
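The visit, read and follow cycle described above can be sketched in a few
lines of Python. This is a simplified illustration, not how any particular
search engine's spider is built: the spider function, the LinkCollector
class and the FAKE_WEB dictionary (which stands in for real network
requests) are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start_url, fetch):
    """Visit a page, read it, then follow links to other pages on the same site."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target = urljoin(url, href)
            # Stay within the site and never queue the same page twice.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical three-page site used in place of live HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/about.html">About</a> '
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.com/about.html": '<a href="/">Home</a> '
                                     '<a href="/contact.html">Contact</a>',
    "http://example.com/contact.html": "<p>Write to us.</p>",
}

crawled = spider("http://example.com/", FAKE_WEB.get)
print(sorted(crawled))
```

The contact page is reached only through a link on the about page, which
shows why pages that nothing links to may never be found. Returning to the
site later would simply mean calling spider again and comparing the pages
it brings back.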
Everything the spider finds goes into the second part of the search
engine, the index. The index, sometimes called the catalog, is like
a giant book containing a copy of every web page that the spider
finds. If a web page changes, then this book is updated with new
information.
Sometimes it can take a while for new pages or changes that the
spider finds to be added to the index. Thus, a web page may have
been "spidered" but not yet "indexed." Until it is indexed -- added
to the index -- it is not available to those searching with the
search engine.
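The gap between being "spidered" and being "indexed" can be modeled with
two stores. The following Python sketch assumes a hypothetical SearchIndex
class in which pages wait in a pending area until a separate step copies
them into the catalog; real search engines are far more elaborate, and the
names here are invented for illustration.

```python
class SearchIndex:
    """Toy index: spidered pages wait in 'pending' until they are indexed."""

    def __init__(self):
        self.pending = {}  # found by the spider, not yet searchable
        self.catalog = {}  # indexed copy of each page, keyed by URL

    def add_spidered(self, url, text):
        """Record what the spider found; a newer copy replaces an older one."""
        self.pending[url] = text

    def process_pending(self):
        """Move spidered pages into the catalog, updating changed pages."""
        self.catalog.update(self.pending)
        self.pending.clear()

    def search(self, word):
        """Return URLs of indexed pages containing the word."""
        word = word.lower()
        return sorted(
            url
            for url, text in self.catalog.items()
            if word in text.lower().split()
        )


index = SearchIndex()
index.add_spidered("http://example.com/", "handmade oak furniture")

before = index.search("oak")  # spidered but not yet indexed
index.process_pending()
after = index.search("oak")   # now indexed and searchable

print(before, after)
```

Here the first search finds nothing even though the spider has already
visited the page, and the same search succeeds once the pending pages have
been added to the catalog.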