WWW Resources

How Search Engines Work
General Web Search Engines
Invisible Web Search Engines
Meta-Search Engines
Specialty Search Engines ("Vortals")
Resources to Consult About Search Engines & Web Sites

How Search Engines Work

Search engines for the general web do not really search the World Wide Web directly. Each searches a database of the full text of web pages selected from the billions of pages out there residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results list, you retrieve from the server the current version of the page.

Search engine databases are selected and built by computer robot programs called spiders. Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it.

If a web page is never linked to from any other page, search engine spiders cannot find it. The only way a brand-new page (one that no other page has ever linked to) can get into a search engine is for a person to submit its URL to the search engine companies with a request that the new page be included. All search engine companies offer ways to do this.
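The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LINKS table and the example.com URLs are made up, the "web" is an in-memory dictionary rather than real servers, and crawl is a hypothetical helper, not any search engine's actual spider.

```python
from collections import deque

# Hypothetical link graph (an assumption for illustration):
# each key is a page URL, each value lists the URLs that page links to.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": [],
    "http://example.com/c": ["http://example.com/"],
    "http://example.com/orphan": ["http://example.com/a"],  # no page links to this one
}


def crawl(seed_urls, links):
    """Return every page reachable by following links from the seed pages,
    in the order the pages were discovered."""
    known = list(seed_urls)   # pages the spider already "knows about"
    seen = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                known.append(target)
                queue.append(target)
    return known


found = crawl(["http://example.com/"], LINKS)
print(found)
print("http://example.com/orphan" in found)
```

Starting from the home page, the sketch never reaches the orphan page, because nothing links to it. Adding the orphan URL to the seed list plays the role of a person submitting the URL to the search engine.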

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and by whatever more advanced approaches are offered. The page will then be found if your search matches its content.
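As a rough sketch of what indexing and keyword searching involve, the Python below builds a small inverted index (a table from each word to the pages that contain it). The page texts, the example.com URLs, and the function names build_index and search are all assumptions made for illustration. Real search engines store far more than words and apply ranking, which this sketch leaves out.

```python
import re
from collections import defaultdict

# Hypothetical pages handed over by a spider (made-up URLs and text).
PAGES = {
    "http://example.com/cats": "Cats sleep most of the day.",
    "http://example.com/dogs": "Dogs sleep less than cats.",
    "http://example.com/fish": "Fish never close their eyes.",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(PAGES)
print(search(index, "cats sleep"))
print(search(index, "birds"))
```

The search runs against the stored index, not against the live pages, which is why results can lag behind the current version of a page.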

Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the "Invisible Web": what you don't see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web. Examples of invisible web content include specialized searchable databases (e.g., EBSCOhost, ProQuest) and web pages excluded due to search engine company policies.

General Web Search Engines

http://www.google.com
The largest of the general web search engines, Google claims over 3 billion pages. It is a general-purpose web database with a useful system for ranking search results by popularity. It is not as comprehensive as some of the other search engines, but Google finds "the best" pages. A subject directory allows browsing by subject. Google now provides the ability to search the full text of many PDF files by converting these files to text and encasing the text in HTML so that it can work like an ordinary web page in the Google database. Use the image databases to locate

http://www.alltheweb.com
The second-largest search engine, Alltheweb claims over 2 billion pages. It has an excellent ranking system and an advanced search function worth mastering.

http://www.altavista.com
A general purpose search engine that allows advanced search with Boolean operators. AltaVista, which means "a view from above," was inspired by the creation of big ideas from a team of experts with a fascination for keeping track of information. During the spring of 1995, scientists at Digital Equipment Corporation's Research lab in Palo Alto, CA, devised a way to store every word of every HTML page on the Internet in a fast, searchable index. This led to AltaVista's development of the first searchable, full-text database on the World Wide Web. Other notable AltaVista inventions include the first-ever multi-lingual search capability on the Internet and the first search technology to support Chinese, Japanese and Korean languages. Babel Fish, the Web's first Internet machine translation service, translates words, phrases or entire Web sites to and from English, Spanish, French, German, Portuguese, Italian and Russian.

http://search.yahoo.com
In October 2002, Yahoo! replaced its human editors with spiders and the Google search engine. You can search the directory of topics at http://www.yahoo.com/ or use the new search engine, with search buttons for directory, maps, news, yellow pages and images.

http://www.teoma.com
Teoma means "expert" in Gaelic. Teoma was founded at Rutgers University in April 2000. Since the search engine was launched in April 2001, Teoma's unduplicated reach has grown 175 percent, making it the third most widely used search technology in the United States, with a reach of over 25% of the Web (according to Nielsen//NetRatings). Like social networks in the real world, the Web is clustered into local communities. Communities are groups of Web pages that are about, or are closely related to, the same subject. Teoma is the only search technology that can view these communities as they naturally occur on the Web (displayed under the heading "Refine" on Teoma.com). This unique method allows Teoma to generate more finely tuned search results, exposing dimensions of the Web that have previously gone unseen by other search engines. In other words, Teoma's community-based approach reveals a 3-D image of the Web, providing it with more information about a particular Web page than other search engines, which have only a one-dimensional view of the Web. This wealth of information allows Teoma to add a new level of relevance to search results, known as authority. Authority represents the level of expertise or knowledge attributed to a Web page, as validated by other Web pages about the same subject. In early 2003, Teoma launched an advanced version of its technology, Teoma 2.0, with improved relevance, new search tools and advanced functionalities.

Invisible Web Search Engines

  • What is the "Invisible Web"?
    The Invisible Web is made up of thousands of databases and searchable sources that (1) contain highly targeted and valuable information, and (2) are not seen (indexed) by traditional search engines. These sources include databases, specialized search engines, archived material, and interactive tools (such as calculators and dictionaries).

  • Why should I use InvisibleWeb.com?
    The InvisibleWeb.com is a directory of over 10,000 databases, archives, and search engines that contain information that traditional search engines have been unable to access. InvisibleWeb.com takes you to these invisible sources.

  • Why would I want this?
    Classic search engines such as Yahoo or AltaVista are just too large. They work just like the index in the back of a book; you give the engine a word to look for and it returns every page it has ever seen that word on. You don't want to wade through futile, repetitive information; you want targeted, precise information and that's exactly what InvisibleWeb.com delivers!

  • An Invisible Web example:
    For example, say you need cash and are looking for the nearest Automated Teller Machine (ATM). Since there are nearly one billion web pages on the Internet, does it make sense to find every single page on the web that ever mentioned the word "ATM?" Unfortunately, large search engines will do just that.

    There are websites that are connected to databases which contain the locations of ATMs. Type in your address in a form and these sites will find the nearest ATM for you. The data that is available to you at this site (in this case the ATM location database) is invisible to normal search engines. We consider that site to be an InvisibleWeb source. We take you directly to the interactive website that is connected to the information you need.

  • When to use the Invisible Web?
    Use InvisibleWeb.com whenever you want to find targeted information!

Complete Planet: http://www.completeplanet.com
A large database of searchable databases (103,000), that is, web pages with search boxes. It also ranks the most popular sites on the Web and the largest deep-web sites.

Invisible Web Companion Site: http://www.invisible-web.net/
This site is a companion to The Invisible Web: Finding Hidden Internet Resources Search Engines Can't See by Chris Sherman and Gary Price. It includes a directory of some of the best resources the Invisible Web has to offer. The directory includes resources that are informative, of high quality, and contain worthy information from reliable information providers that are not visible to general-purpose search engines.

Meta-Search Engines

Meta-search engines are easy-to-use web sites that search several major search engines simultaneously. You submit keywords in a search box, and your search is run in several individual search engines and their databases of web pages. Meta-search engines generally do not build and maintain their own web indexes. Instead, they use the indexes built by others, aggregating and often post-processing the results in unique ways. There are over 100 of these meta-search engines available on the web (see Yahoo! for a listing). Meta-searching is an excellent approach if the purpose of your search is to get an overview of a topic.
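The aggregating step can be sketched as follows. Everything here is an assumption made for illustration: engine_a and engine_b are stand-ins that return fixed, made-up result lists instead of contacting a real search engine, and the reciprocal-rank scoring is just one simple way to merge rankings, not the method of any particular meta-search engine.

```python
def engine_a(query):
    """Hypothetical engine: returns a fixed ranked list of made-up URLs."""
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]


def engine_b(query):
    """Hypothetical engine with a different index and ranking."""
    return ["http://example.com/2", "http://example.com/4"]


def meta_search(query, engines):
    """Send the query to every engine and merge the ranked lists.

    Each URL earns 1/rank points from every engine that returns it, so a
    page ranked highly by several engines rises to the top. Duplicate URLs
    are collapsed into a single entry."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = meta_search("search engines", [engine_a, engine_b])
print(merged)
```

In this sketch the page returned by both engines comes first, even though only one engine ranked it at the top.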

Dogpile: http://www.dogpile.com/

Ixquick: http://www.ixquick.com/

Metacrawler: http://www.metacrawler.com/

Search.com: http://www.search.com/

Specialty Search Engines ("Vortals")

Specialized search engines are limited to a particular subject or function. They cover areas ranging from business topics and auctions to medical information, library sources, and web sites. See http://searchenginewatch.com/links/article.php/2156351

AllExperts: http://www.allexperts.com
This is the oldest and largest free Q&A service on the Internet. Users can pick a category and click on the volunteer's name to ask a question!

Business.com: http://www.business.com/

eBay: http://www.ebay.com
eBay is the world's largest trading community where millions of people buy and sell millions of items every day.

FindLaw: http://www.findlaw.com
FindLaw is broken down into six main areas or channels: legal professionals, students, business, the public, legal news, and MYFindLaw. You can access these channels by clicking on the graphics located at the top of FindLaw's homepage, or by using the links situated directly under the homepage's search box. Special content and services tailored to the needs of each of the four main user groups have been placed within the first four channels, while the Legal News section contains news and commentary on a variety of topics.

Hardin Meta Directory of Internet Health Sources: http://www.lib.uiowa.edu/hardin/md/
The Hardin Meta Directory of Internet Health Sources provides easy access to comprehensive resource lists in health-related subjects. It includes subject listings in large "one-stop-shopping" sites, such as MedWeb and Yahoo, and also independent discipline-specific lists. Hardin MD subject pages indicate the length of lists in each subject, making it easy to see at a glance which lists are most comprehensive.

Internet Public Library: http://www.ipl.org
The Internet Public Library (IPL) is hosted as a learning/teaching environment at the University of Michigan School of Information and Library Studies. It consists of a reference center, reading room, special collections, searching tools, subject collections and youth resources.

Librarians' Index to the Internet: http://lii.org
The Librarians' Index to the Internet is a searchable, annotated subject directory of more than 11,000 Internet resources selected and evaluated by librarians for their usefulness to users of public libraries. lii.org is used by both librarians and the general public as a reliable and efficient guide to Internet resources.

LibrarySpot: http://www.libraryspot.com
This site provides links to major libraries, a reference desk, a reading room, research paper writing tips, and exhibits.

MedlinePlus Health Information: http://www.nlm.nih.gov/medlineplus/
MEDLINEplus has extensive information from the National Institutes of Health and other trusted sources on over 600 diseases and conditions for both consumers and healthcare professionals. There are also lists of hospitals and physicians, a medical encyclopedia and a medical dictionary, health information in Spanish, extensive information on prescription and nonprescription drugs, health information from the media, and links to thousands of clinical trials.

Open Directory Project: http://dmoz.org
The Open Directory Project is the largest human-edited directory of the web. Over 3.8 million sites, 57,041 editors and 460,000+ categories are found on this web site.

Resources to Consult About Search Engines & Web Sites

Ackerman, Ernest & Hartman, Karen. The Information specialist's guide to search and researching on the Internet and World Wide Web. 2nd ed. Fitzroy Dearborn, 2000. (DeVry University Call Number -- ZA 4201 .A25 2000b)

Hock, Randolph. The Extreme searcher's guide to web search engines. 2nd ed. CyberAge Books, 2001. (DeVry University Call Number -- ZA 4226 .H63 2001)

ResearchBuzz Web Site. http://www.researchbuzz.com/

Schlein, Alan M. Find it online. Facts on Demand Press, 2002. (DeVry University Library South Florida Call Number -- TK 5105.875 .I57 S35 2002)

Scout Report. http://wwwscout.cs.wisc.edu/report/
The Scout Report is one of the Internet's longest-running weekly publications, offering a selection of new and newly discovered online resources of interest to researchers, educators, and anyone else with an interest in high-quality online material.

Search Engine Tutorial. University of California, Berkeley Library. http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html

SearchEngineShowDown Web Site. http://www.searchengineshowdown.com/

Search Engine Watch Web Site. http://www.searchenginewatch.com/

Sherman, Chris & Price, Gary. The invisible Web: finding hidden Internet resources search engines can't see. CyberAge Books, 2001. (DeVry University Library South Florida Call Number -- ZA 4201.S54 2001)

