The listed resources are sites that have been tried and found useful or of interest. However, there is no guarantee, endorsement, affiliation or association with these sites and resources, and no confirmation can be made that these sites will remain active and available in the future. These links are available as of the date of inclusion, but no personal view, opinion or guarantee is expressed by the author.


Definition of a Search Engine

Search Engine Components

A Search Engine has 3 Basic Parts

1. Spider (crawler, link finder): a computer program that harvests web links from page to page

2. Index: an organized, searchable database of the Spider's harvested results

3. Search and retrieval mechanism: software that lets users search the Index and returns results in a predetermined order.
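The three parts listed above can be sketched in a few lines of Python. This is only an illustration: the PAGES dictionary is a hypothetical stand-in for what a Spider would harvest, and the function names are invented for this example rather than taken from any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a Spider has already harvested (URL -> text).
PAGES = {
    "http://example.com/a": "Usenet news archives and search tips",
    "http://example.com/b": "Search engine basics: spider, index, retrieval",
    "http://example.com/c": "Gardening tips for spring",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Retrieval: return the URLs containing every query word, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "search tips"))  # prints ['http://example.com/a']
```

Sorting by URL stands in for the "predetermined order"; a real engine would apply a ranking algorithm at that step.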

But the term Search Engine is also commonly used to refer to any software that searches an Index of words or material types:


A. "Small" page-related Search Engine - To search this page for the word "Usenet", click EDIT in your browser menu, then click Find (on this page). Enter your term and search.

B. Database or Index Specific - Searches only for content within an enclosed site.

C. Directory Search Engines - Search for content or web pages submitted by hand. In other words, the materials are found and maintained by a human being.
    Example - Yahoo used to search only a directory of submitted pages.

D. Large Search Engines - These search engines use the "3 Basic Parts" listed above. They try to find everything on the Internet and fail for a number of reasons.

Spiders or Robots

1. Robot software (spiders, crawlers) uses HTTP to request documents associated with a certain URL. 

2. Robots use either a depth-first or breadth-first search strategy for following URLs.
- A depth-first robot follows the first link on the initial page, then the first link on the second page, and so on. This is used more commonly for subject-specific search engines.

- A breadth-first robot follows the first link on the initial page, then returns to the initial page and follows the second link, and so on. This is used most commonly for broad search engines.

3. URLs are organized in a database. 

4. The URLs from the database are "reharvested" and text from the sites is put in an index. How much text is harvested varies among the search engines.

5. Harvesters generate text summaries.  Most copy the <title> and a fixed amount of the initial text.

6. The search engine uses search software to search the index created by the robot searches.

7. Algorithms are used to set each individual search engine's search parameters: boolean, wildcards, etc.

8. Algorithms are used by search engines to rank the results of the search. Factors that may be considered in ranking: which fields the search terms are found in (<title>, URL field); the number of times the word appears in a single document; where the search term appears in the document; and payment by companies to have their pages ranked high or first.

9. Netiquette for Robots. The root directory of a Web server can contain a file named robots.txt, which lists the parts of the site that robots are asked not to visit. The robot should leave these web files alone for privacy reasons. In our SMC web account, Web files may be located in a folder named "private" to prevent a "local search engine" from viewing them.
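The difference between the depth-first and breadth-first strategies in item 2 can be seen on a small example. The sketch below runs on a hypothetical in-memory link graph (LINKS) instead of fetching real pages over HTTP, and the function name crawl_order is invented for this illustration.

```python
from collections import deque

# Hypothetical link graph: page -> links in the order they appear on the page.
LINKS = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def crawl_order(links, start, strategy="breadth"):
    """Return the pages in the order a breadth-first or depth-first robot visits them."""
    frontier = deque([start])
    seen = set()
    order = []
    while frontier:
        # Breadth-first takes the oldest URL (a queue); depth-first the newest (a stack).
        page = frontier.popleft() if strategy == "breadth" else frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        outgoing = links.get(page, [])
        if strategy == "depth":
            # Push in reverse so the first link on the page is followed first.
            outgoing = list(reversed(outgoing))
        for url in outgoing:
            if url not in seen:
                frontier.append(url)
    return order


print(crawl_order(LINKS, "home", "breadth"))  # prints ['home', 'a', 'b', 'a1', 'a2', 'b1']
print(crawl_order(LINKS, "home", "depth"))    # prints ['home', 'a', 'a1', 'a2', 'b', 'b1']
```

The only difference between the two strategies is which end of the frontier the next URL is taken from.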

Invisible Web - What it is and what it is composed of

According to "Invisible Planet" - The Invisible Web is the content that resides in searchable databases, the results from which can only be discovered by a direct query. Without the directed query, the database does not publish the result. When queried, deep Web sites post their results as dynamic Web pages in real-time. Though these dynamic pages have a unique URL address that allows them to be retrieved again later, they are not persistent.

On the Visible or Surface Web, search engines are the primary means for finding information. Authors may submit their own Web pages for listing [Directories like Yahoo]. Or, search engines "crawl" or "spider" documents by following one hypertext link to another.

Major Point - what can be found via a search engine like Google is much less than what exists in total on the Internet

But Search Engines might not search the "deep web" because:

1. Dynamically (database) driven websites. Search Engines may have difficulty harvesting websites that are not marked up in HTML.

2. Search Engines cannot search password-protected sites like EbscoHost journal databases or online catalogs.

3. Search Engines may have "difficulty" searching within Adobe, Word, PowerPoint, etc. files on a web page.

The Invisible Web is estimated to be 500 to 1,000 times the size of the Surface Web.

The pie chart below displays the distribution of deep Web sites by type of content.

Distribution of Deep Web Sites by Content

Comparing and ranking different search engines: Ranking the different search engines depends on the emphasis one gives to the following evaluation criteria:

1. Size of the database 
- everything included, dual numbers
- selected and reviewed content

2. File Types
    - Web Pages, Usenet News, gopher, FTP, PDF (Adobe), Word
    - Other [software, sound, images, video]
    - Material type: Location (country), language, newspapers, journals, blogs, wikis

3. Interface
    - modes: simple or complex; check the details of boolean searching, etc.

4. Ranking of results - what search engines consider in giving search results
    - whether the word was found in the URL address
    - frequency of the search words on web pages
    - location: words found in meta-tags, first paragraph
    - reviewed sites
    - fee paid to rank sites higher in results list
    - proximity of words to each other
    - Link Popularity (Google, Inktomi), also known as Peer Ranking
    - bundling of results into concepts, domains, and sites

5. Limitations
    - Language, Geography

6. Timeliness
    - Frequency of Discovery
    - Timelag
    - Weeding

7. Description of sources (annotations) found in hit list

8. Speed
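Several of the ranking factors listed under item 4 above (word in the URL, word frequency, word location) can be combined into a single score. The sketch below is a toy illustration only: the weights, the 10-word "near the start" window and the function names are assumptions made for this example, and no real search engine publishes its formula in this form.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc, term, w_title=3.0, w_url=2.0, w_early=1.0):
    """Toy relevance score for one term; the weights are arbitrary assumptions."""
    term = term.lower()
    body = tokenize(doc["body"])
    s = float(body.count(term))          # frequency of the word on the page
    if term in tokenize(doc["title"]):   # word found in the <title>
        s += w_title
    if term in doc["url"].lower():       # word found in the URL address
        s += w_url
    if term in body[:10]:                # word appears near the start of the text
        s += w_early
    return s


def rank(docs, term):
    """Return the URLs with a non-zero score, best first (ties broken by URL)."""
    scored = [(score(d, term), d["url"]) for d in docs]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for s, url in ordered if s > 0]


# Hypothetical documents, invented for this example.
DOCS = [
    {"url": "http://example.com/usenet", "title": "Usenet guide",
     "body": "Usenet is a distributed discussion system."},
    {"url": "http://example.com/history", "title": "Internet history",
     "body": "Early networks included Usenet and gopher. Usenet grew quickly."},
    {"url": "http://example.com/garden", "title": "Gardening",
     "body": "Plant tomatoes in spring."},
]

print(rank(DOCS, "usenet"))
# prints ['http://example.com/usenet', 'http://example.com/history']
```

Changing the weights changes the order of the results, which is one reason the same query returns different hit lists on different search engines.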

What Search Engines Often Don't Search
- the following listing is often referred to as the "Invisible Web"

    1. Contents of Adobe PDF and formatted files
    2. The content of sites requiring a log-in
    3. CGI-Bin output such as data requested by a form
    4. Intranets
    5. Commercial or proprietary indexes like ERIC, UMI, Lexis-Nexis. [But Google is making an agreement with WorldCat to allow library catalogs to be available]
    6. Sites that use a robots.txt file to keep robots (search engines) away
    7. Non-html resources: Telnet, ftp, gopher, etc.
    8. Web sites that are "Database Driven"

Note that the URL of a database-driven page typically does not end with .htm or .html. Also note the ? in the URL.

Search Engine Spiders will generally not retrieve or harvest these URLs.
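Two of the reasons above, robots.txt exclusion and database-driven URLs, can be checked programmatically. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, the example.com URLs, the "ExampleBot" agent name and the looks_dynamic heuristic are all assumptions made for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all robots to stay out of the "private" folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt, url, agent="ExampleBot"):
    """Return True if the robots.txt rules permit this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def looks_dynamic(url):
    """Rough heuristic: the URL has a "?" query and does not end in .htm or .html."""
    parts = urlsplit(url)
    static_ending = parts.path.lower().endswith((".htm", ".html"))
    return bool(parts.query) and not static_ending


def would_harvest(url):
    """A cautious spider skips pages that are excluded or that look database driven."""
    return allowed_by_robots(ROBOTS_TXT, url) and not looks_dynamic(url)


for url in [
    "http://example.com/about.html",
    "http://example.com/private/notes.html",
    "http://example.com/catalog.php?id=42",
]:
    print(url, would_harvest(url))
```

Only the first URL would be harvested: the second is excluded by robots.txt and the third looks database driven. Many modern spiders do follow URLs containing a ?, so the heuristic reflects the cautious behaviour described here, not a fixed rule.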

Search Engine Comparison Chart

Search Engines

Ask Both -Dual Search


Boolify - easy boolean searching

Cross Language - search Arabic sites using English words


Google Search Engine




Search3 - Search Google, Yahoo and Live simultaneously

Slikk Search Engine  



Cluster Search Engines

Carrot clusters from differing search engines

Touch Graph- Top choice

Yippy advanced search with cloud display

Webclust Clustering engine

Country Specific Search Engines

Colossus  country specific search engines

Haystacks  country specific Google sites

Meta Mega and Multi Search Engines


All the web

Ask Both Meta search tool that displays multiple results


Dogpile


Info grid


Ixquick





Namecheck username search



Query server

Real-time meta search and analysis site.

Soovle Multiple search engine


Search online info

Slikk multiple match

SurfWax


WebCrawler

Real time Social Media search








Reverse search and Domain Links


Semantic and Specific Field Search

Anonymising meta search engine

Arabic and English search engine

Blog and posts search

Glearch Country and Language Specific

Haika Semantic search engine

Highly customisable Search engine

Incy wincy

Kngine Semantic Search

Knowem Semantic search and image engine


Mash pedia

Multiple engine comparison site

Net search, in depth, date ranged

Realtime commercial search engine

Real time social search engine

Search cloud

Search engines from other countries

Sensebot semantic search engine

Social media keyword current search

Stealth  search engine - does not collect IP details or save search data

Split screen search engine

Wink Social network search engine

Worldwide search engines

Translation Search Engines


UK Search Directories

Google UK

AltaVista UK

Excite UK

Lycos UK

UK Plus

Yahoo! UK & Ireland

Mirago the UK search engine

Visual Search Directories



Locate Metadata within documents



Oskope - excellent visual search return  

Search cube - graphical search engine that presents results in a compact, visual format in three dimensions.

Search me    

Search to find keywords within a document

Social pinup board

TOP visual search engine




Visual meta search engine

Visual search engine

Search Engines - Brief Overview

A search engine consists of the interface that you use to type in a query, an index of web sites that is matched against the query, and a software program called a spider or bot, which trawls the web at set periods and gets new sites for the index. When you use a search engine, you are searching its index for matches with your search terms.


This type of engine, such as Google, Yahoo and similar, reads pages from all over the world in many languages, and may index more than a billion pages.


Search engines that are limited geographically such as UK sites only


Search engines that are limited to a single subject or topic area


Search engines that index only specific reference works


Human edited smaller databases with more specific targeted matches to sites or directories.

When searching, think of the following:

1 -- Use more than one search engine -- Try your search terms in two or three single search engines to get an overview of results. Search engines cover different parts of the net and may return differing results.

2 -- Use AND to increase relevance -- Use the AND operator to significantly reduce the number of returned items and at the same time increase relevance. Check the search engine's operator terms to confirm it will accept AND, and in which format. (Type AND in capitals in most search engines.)

3 -- Use OR to include synonyms -- Use the OR operator to include synonyms and increase the number of relevant items returned, at the expense of precision. (Type OR in capitals in most search engines.)

4 -- Use Semantics -- When looking for keywords to search with, use different spellings, abbreviations, translations, synonyms, plural, singular, truncation etc. Use professional terms when looking for 'professional' information and consider use of variants of the word

5 -- Use NOT to exclude unwanted terms -- Use the NOT operator to exclude unwanted terms from the results. Again, verify the search engine's operator requirements, but most require a dash "-" added directly in front of the term.

6 -- Consider using web directories -- Consider using some of the larger web directories if you are unsure about search terms, want tips about experts, or an introduction to a certain subject.

7 -- Consider using meta search engines -- Use meta search engines as a last resort if none of the major search engines produces the results you want. They cover a larger field but in most cases return more general data.

8 -- Use field names to restrict your search -- Use field names to significantly increase relevance and lower recall. Use intitle: to search for title words only, use site: to search in a particular domain, use filetype: to search for particular document formats.

9 -- Consider using Serial search engines -- Use one of the Serial Search Engines to quickly search more than one single search engine in succession without having to retype the query.

10 -- Use restrictive phrase searching -- When searching for specifics, try putting inverted commas around your search terms or words, as in “Search”, to restrict the search to the exact phrase.
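The operators in the tips above can be combined into one query string. The helper below is a hypothetical sketch (build_query and its parameter names are invented for this example) that follows the conventions mentioned in the tips: AND and OR in capitals, a dash for NOT, inverted commas for phrases, and the intitle:, site: and filetype: field names. Exact syntax varies between search engines, so check the help pages of the engine you use.

```python
def build_query(all_of=(), any_of=(), none_of=(), phrase=None,
                intitle=None, site=None, filetype=None):
    """Assemble a search string from the operators described in the tips."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                       # exact phrase
    if all_of:
        parts.append(" AND ".join(all_of))                # every term required
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")     # synonyms
    parts.extend(f"-{term}" for term in none_of)          # excluded terms
    if intitle:
        parts.append(f"intitle:{intitle}")                # word in the page title
    if site:
        parts.append(f"site:{site}")                      # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")              # restrict document format
    return " ".join(parts)


query = build_query(
    phrase="search engine",
    all_of=["deep", "web"],
    any_of=["invisible", "hidden"],
    none_of=["dark"],
    site="example.com",
    filetype="pdf",
)
print(query)
# prints "search engine" deep AND web (invisible OR hidden) -dark site:example.com filetype:pdf
```

The resulting string can be pasted into a search box; each added operator narrows the results, apart from OR, which widens them.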

Use the Google advanced search page to apply a number of the above operators.

Google Advanced Search - CLICK HERE

For advanced search operators and procedures - CLICK HERE



The inclusion of any link should not be taken to be an endorsement of the information, views or opinions expressed.