From the web page http://www.ouc.bc.ca/libr/connect96/search.htm Sink or Swim: Internet Search Tools & Techniques (Version 3.0 - 1998) By Ross Tyner, M.L.S., Okanagan University College WORKSHOP OUTLINE * Introduction * Search Engines & Subject Guides * Search Engines * Multi-Threaded Search Engines * Subject Guides * Specialized Subject Guides * Search Strategy * Boolean Logic * Search Tips * Search Engine Comparisons * Alta Vista * Excite * HotBot * Infoseek * Northern Light * Exercises * Appendix: Searching for People * References INTRODUCTION According to the results of a study published in the April 3, 1998 issue of Science, the World Wide Web is estimated to contain over 320 million pages of information.(1) As if the Web's immense size weren't enough to strike fear in the heart of all but the most intrepid surfers, consider that the Web continues to grow at an exponential rate: doubling in size every four months, according to some estimates.(2) Add to this, the fact that the Web lacks the bibliographic control standards we take for granted in the print world: There is no equivalent to the ISBN to uniquely identify a document; no standard system, analogous to those developed by the Library of Congress, of cataloguing or classification; no central catalogue including the Web's holdings. In fact, many, if not most, Web documents lack even the name of the author and the date of publication. Imagine you are searching for information in the world's largest library, where the books and journals (stripped of their covers and title pages) are shelved in no particular order, and without reference to a central catalogue. A researcher's nightmare? Without question. The World Wide Web defined? Not exactly. Instead of a central catalogue, the Web offers the choice of dozens of different search tools, each with its own database, command language, search capabilities, and method of displaying results. 
Given the above, the need is clear to familiarize yourself with a variety of search tools and to develop effective search techniques, if you hope to take advantage of the resources offered by the Web without spending many fruitless hours flailing about, and eventually drowning, in a sea of irrelevant information. SEARCH ENGINES AND SUBJECT GUIDES The two basic approaches to searching the Web are search engines and subject guides. Search engines allow the user to enter keywords that are run against a database (most often created automatically, by "spiders" or "robots"). Based on a combination of criteria (established by the user and/or the search engine), the search engine retrieves WWW documents from its database that match the keywords entered by the searcher. It is important to note that when you are using a search engine you are not searching the Internet "live", as it exists at this very moment. Rather, you are searching a fixed database that has been compiled some time previous to your search. While all search engines are intended to perform the same task, each goes about this task in a different way, which leads to sometimes amazingly different results. Factors that influence results include the size of the database, the frequency of updating, and the search capabilities. Search engines also differ in their search speed, the design of the search interface, the way in which they display results, and the amount of help they offer. In most cases, search engines are best used to locate a specific piece of information, such as a known document, an image, or a computer program, rather than a general subject. 
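To make the point about searching a fixed database rather than the "live" Web more concrete, here is a minimal Python sketch of a search engine answering queries from an index compiled ahead of time. The page URLs, the page text, and the function names build_index and search are all invented for illustration; real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages that a spider collected at some earlier time.
CRAWLED_PAGES = {
    "http://example.com/a": "Okanagan valley wine tours",
    "http://example.com/b": "Wine making at home",
    "http://example.com/c": "Okanagan University College Library",
}


def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return, in sorted order, the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# The index is compiled once, some time before anyone searches it.
INDEX = build_index(CRAWLED_PAGES)

# A page published after the crawl stays invisible until the index is rebuilt.
CRAWLED_PAGES["http://example.com/d"] = "New Okanagan wine festival"

print(search(INDEX, "okanagan wine"))
```

Searching the old index finds only the page that existed at crawl time. Rebuilding the index with build_index(CRAWLED_PAGES) would also pick up the newer page, which is why the frequency of updating matters when comparing search engines.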
Examples of search engines include: * Alta Vista (http://altavista.digital.com) * Excite (http://www.excite.com) * HotBot (http://www.hotbot.com) * Infoseek (http://www.infoseek.com) * Northern Light (http://www.northernlight.com) The growth in the number of search engines has led to the creation of "meta" search tools, often referred to as multi-threaded search engines. These search engines allow the user to search multiple databases simultaneously, via a single interface. While they do not offer the same level of control over the search interface and search logic as do individual search engines, most of the multi-threaded engines are very fast. Recently, the capabilities of meta-tools have been improved to include such useful features as the ability to sort results by site, by type of resource, or by domain, the ability to select which search engines to include, and the ability to modify results. These modifications have greatly increased the effectiveness and utility of the meta-tools. Popular multi-threaded search engines include: * Dogpile (http://www.dogpile.com) * Inference Find (http://www.infind.com) * Metacrawler (http://www.metacrawler.com) * ProFusion (http://profusion.ittc.ukans.edu/) Subject guides are hierarchically organized indexes of subject categories that allow the Web searcher to browse through lists of Web sites by subject in search of relevant information. They are compiled and maintained by humans and many include a search engine for searching their own database. Subject guide databases tend to be smaller than those of the search engines, which means that result lists tend to be smaller as well. However, there are other differences between search engines and subject guides that can lead to the latter producing more relevant results. For example, while a search engine typically indexes every page of a given Web site, a subject guide is more likely to provide a link only to the site's home page. 
Furthermore, because their maintenance includes human intervention, subject guides greatly reduce the probability of retrieving results out of context. Because subject guides are arranged by category and because they usually return links to the top level of a web site rather than to individual pages, they lend themselves best to searching for information about a general subject, rather than for a specific piece of information. Examples of subject guides include: * Galaxy (http://www.einet.net) * LookSmart (http://www.looksmart.com) * Magellan (http://www.mckinley.com) * Yahoo (http://www.yahoo.com) Specialized subject guides Due to the Web's immense size and constant transformation, keeping up with important sites in all subject areas is humanly impossible. Therefore, a guide compiled by a subject specialist to important resources in his or her area of expertise is more likely than a general subject guide to produce relevant information and is usually more comprehensive than a general guide. Such guides exist for virtually every topic. For example, Voice of the Shuttle (http://humanitas.ucsb.edu) provides an excellent starting point for humanities research. Film buffs should consider starting their search with the Internet Movie Database (http://us.imdb.com). Just as multi-threaded search engines attempt to provide simultaneous access to a number of different search engines, some web sites act as collections or clearinghouses of specialized subject guides. Many of these sites offer reviews and annotations of the subject guides included and most work on the principle of allowing subject experts to maintain the individual subject guides. Some clearinghouses maintain the specialized guides on their own web site while others link to guides located at various remote sites. 
Examples of clearinghouses include: * Argus Clearinghouse (http://www.clearinghouse.net) * The Mining Company (http://www.miningco.com) * WWW Virtual Library (http://www.vlib.org) SEARCH STRATEGY Regardless of the search tool being used, the development of an effective search strategy is essential if you hope to obtain satisfactory results. A simplified, generic search strategy might consist of the following steps: * Formulate the research question and its scope * Identify the important concepts within the question * Identify search terms to describe those concepts * Consider synonyms and variations of those terms * Prepare your search logic This strategy should be applied to a search of any electronic information tool, including library catalogues and CD-ROM databases. However, a well-planned search strategy is of especially great importance when the database under consideration is one as large, amorphous and evolving as the World Wide Web. Along with the characteristics already mentioned in the Introduction, another factor that underscores the need for effective Web search strategy is the fact that most search engines index every word of a document. This method of indexing tends to greatly increase the number of results retrieved, while decreasing the relevance of those results, because of the increased likelihood of words being found in an inappropriate context. When selecting a search engine, one factor to consider is whether it allows the searcher to specify which part(s) of the document to search (eg. URL, title, first heading) or whether it simply defaults to search the entire document. Boolean logic is the term used to describe certain logical operations that are used to combine search terms in many databases. The basic Boolean operators are represented by the words AND, OR and NOT. Variations on these operators, sometimes called proximity operators, that are supported by some search engines include ADJACENT, NEAR and FOLLOWED BY. 
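The Boolean operators described above behave like set operations on the documents that match each term. The following Python sketch is an illustration only: the document numbers are invented, and the near function is a hypothetical stand-in for a proximity operator, with a ten-word window chosen as an assumption rather than taken from any particular search engine.

```python
# Hypothetical result sets: the document numbers that match each single term.
cats = {1, 2, 3, 5}
dogs = {2, 3, 4, 6}

both = cats & dogs        # cats AND dogs: documents containing both terms
either = cats | dogs      # cats OR dogs: documents containing at least one term
cats_only = cats - dogs   # cats NOT dogs: documents with cats but without dogs


def near(text, a, b, max_gap=10):
    """Return True if words a and b occur within max_gap words of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


print(sorted(both))       # [2, 3]
print(sorted(either))     # [1, 2, 3, 4, 5, 6]
print(sorted(cats_only))  # [1, 5]
print(near("the cat sat beside the dog", "cat", "dog", max_gap=4))  # True
```

AND narrows a search, OR broadens it, and NOT excludes documents; a proximity operator narrows further than AND because the terms must also appear close together.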
Whether or not a search engine supports Boolean logic, and the way in which it implements it, is another important consideration when selecting a search tool. The following diagrams illustrate the basic Boolean operations. [Diagrams: AND, OR, NOT] SEARCH TIPS In most cases, an effective search strategy, the correct use of Boolean logic, and familiarity with the features of each of the search engines will lead to satisfactory results. However, there are additional techniques that may further improve your results in particular circumstances. The following search tips apply to one or more of the search engines discussed in this workshop. Ctrl-F: After following a link to a document retrieved with a search engine, it is sometimes not immediately apparent why the document has been retrieved. This may be because the words for which you searched appear near the bottom of the document. A quick method of finding the relevant words is to type Ctrl-F to search for the text in the current document. Bookmark your results: If you are likely to want to repeat a search at a later date, add a bookmark to your current search results. Right truncation of URLs: Often, a search will retrieve links to many documents at one site. For example, searching for "Okanagan University College Library" will retrieve not only the OUC Library home page (http://www.ouc.bc.ca/libr), but also any pages that contain the phrase "Okanagan University College Library", whether or not they are linked to the home page (eg. this page - http://www.ouc.bc.ca/libr/connect96/search.htm). Rather than clicking on each URL in succession to find the desired document, truncate the URL at the point at which it appears most likely to represent the document you are seeking and type this URL in the Location box of your web browser. Guessing URLs: Basic knowledge of the way in which URLs are constructed will help you to guess the correct URL for a given web site. 
For example, most large American companies will have registered a domain name in the format www.company_name.com (eg. Microsoft - www.microsoft.com); American universities are almost always in the .edu domain (eg. Cornell - www.cornell.edu or UCLA - www.ucla.edu); and Canadian universities follow the format www.university_name.ca (eg. Simon Fraser University - www.sfu.ca or the University of Toronto - www.utoronto.ca). Wildcards: Some search engines allow the use of "wildcard" characters in search statements. Wildcards are useful for retrieving variant spellings (eg. color, colour) and words with a common root (eg. psychology, psychological, psychologist, psychologists, etc.). Wildcard characters vary from one search engine to another, the most common ones being *, #, and ?. Some search engines permit only right truncation (eg. psycholog*), while others also support middle truncation (eg. colo*r). Relevancy ranking: All of the search engines covered in this workshop use an algorithm to rank retrieved documents in order of decreasing relevance. (3) Consequently, it is often not necessary to browse through more than the first few pages of results, even when the total results number in the thousands. Furthermore, some search engines (eg. Alta Vista) allow the searcher to determine which terms are the most "important", while others (Excite, Infoseek) have a "more like this" feature that permits the searcher to generate new queries based on relevant documents retrieved by the initial search. These features are discussed in more detail in the following section of this document. 
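As a rough illustration of relevancy ranking, the Python sketch below scores documents using two of the criteria mentioned in this workshop: how often the search terms occur and whether they occur in the title. The sample documents, the function names score and rank, and the title weight of 3 are assumptions made for this example; none of the search engines discussed here publishes its ranking in this form.

```python
# Hypothetical documents, invented for illustration.
DOCS = [
    {"url": "http://example.com/1", "title": "Mad Cow Disease FAQ",
     "body": "questions about mad cow disease in cattle"},
    {"url": "http://example.com/2", "title": "Farm news",
     "body": "a cow escaped"},
    {"url": "http://example.com/3", "title": "Movie reviews",
     "body": "the sweet hereafter"},
]


def score(doc, terms, title_weight=3):
    """Count term occurrences, weighting title hits more than body hits."""
    title_words = doc["title"].lower().split()
    body_words = doc["body"].lower().split()
    total = 0
    for term in terms:
        term = term.lower()
        total += title_weight * title_words.count(term)
        total += body_words.count(term)
    return total


def rank(docs, query):
    """Return URLs of matching documents in order of decreasing score."""
    terms = query.split()
    scored = [(score(doc, terms), doc["url"]) for doc in docs]
    matches = [(s, url) for s, url in scored if s > 0]
    # Sort by score, highest first; break ties by URL for a stable order.
    return [url for s, url in sorted(matches, key=lambda p: (-p[0], p[1]))]


print(rank(DOCS, "mad cow"))
```

The first document scores 8 (both terms appear in its title and in its body), the second scores 1, and the third does not match at all, so the first document heads the result list. This is why the most useful results tend to appear on the first few pages.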
SEARCH ENGINE COMPARISONS This section compares some of the major Web search engines, based on the following features: * Size of the database (4) * Currency of the database (5) * Number of documents retrieved * Search interface * Usefulness of online help screens to help with formulating a query * Search features * Results list display features * Other features of note Search engines were tested by searching for documents about the Indian political party Rashtriya Janata Dal on February 22, 1998. Alta Vista URL: http://altavista.digital.com Size: Over 100 million pages Retrieved: 528 documents Currency: 2 to 3 weeks (February 5, 1998) Interface: Includes simple and advanced interfaces. They are less intuitive than most other search engine interfaces but they are well documented and allow some of the most powerful searching on the Web if you are willing to learn how to use them. Both interfaces allow the use of Boolean logic, though different syntax is used in the two interfaces. The advanced interface includes options to limit a search by date and to rank results according to keywords of your choice. Alta Vista allows you to select the interface you prefer and save it with a Preferences option. Help: Clear, detailed instructions for both simple and advanced searches. Features: Boolean AND, OR and NOT, plus the proximity operator NEAR, and the ability to search for phrases by enclosing words within quotation marks; right and middle truncation with the '*' character; ability to restrict a search to certain portions of a document or type of document, eg. title, image, URL, Java applets, and links. Relevancy ranking is based on where in a document the search terms are located, the proximity of search terms to one another in the document, and the number of occurrences of a search term in the document. In July 1997, Alta Vista became the first search engine to support limiting searches by the language of documents. Twenty-five different languages are supported. 
Results: The default display shows the title, URL, first two lines of the document, language, date and size (in bytes) of each document. A more compact display is available in Preferences. Results are displayed in order of decreasing relevance. After you have completed a search, clicking the Refine button allows you to modify the results of a search by suggesting words that may be included or excluded from the search. Other: There are options to search for Usenet News articles, people, and businesses. Alta Vista has recently added access to LookSmart's subject guide. An automated translation service will translate Web documents from and into the major European languages. Excite URL: http://www.excite.com Size: More than 50 million Web sites Retrieved: 197 documents Currency: Unknown (6) Interface: Excite offers two interfaces: simple and Power Search. The simple interface consists of a single search box with no options for modifying or limiting a search, although all the basic Boolean operators and the + and - signs may be used. Power Search presents the searcher with a series of search boxes that allow you to perform either word or phrase searching and to instruct Excite which words and/or phrases the document CAN contain, MUST contain, and MUST NOT contain. Both interfaces are easy to use, even for a novice searcher. Help: Help is sufficiently detailed, clearly written, and well organized. As well as basic search help, Excite provides a fair amount of detail about the way its technology works, which is unusual among search engines. Features: Boolean AND, OR and NOT, + and - to include and exclude words. Power Search also permits phrase searching and more advanced control over words and phrases to include and exclude. After results have been displayed, a "More Like This" link permits the searcher to search for documents that are "like" the one displayed. 
(7) Excite uses a form of relevancy ranking to sort results but its documentation does not enumerate the criteria used to determine relevance. Results: For each document, Excite displays title, URL, brief summary and "relevance" level, as a percentage (how this number is calculated is not explained). By default, results display in order of decreasing relevance but the searcher may choose instead to display the forty most "relevant" results grouped by Web site. This feature is extremely useful when a search retrieves a large number of results from one or more sites and would be even more useful if it displayed more than the top forty sites in this manner. Other: Excite has entered the field of "push" technology - whereby a server automatically sends web pages to your desktop, based on criteria you have provided - with what it refers to as "Channels" (using a television analogy). In some cases, the channels are simply renamed versions of existing Excite features. For example City.Net - a guide to the web by geographical region - has been renamed the "Travel Channel" and NewsTracker - current news stories from over 300 sources - is now the "News Channel". Excite also includes a hierarchical subject guide, web site reviews, and the ability to search for Usenet News articles. HotBot URL: http://www.hotbot.com Size: More than 50 million documents Retrieved: 599 documents Currency: Less than one hour (February 22, 1998) Interface: HotBot offers two interfaces: a default (not to say simple, as it offers more options than most search engines' advanced interfaces) and a SuperSearch. Both interfaces feature pull-down menus for modifying search criteria (for example, to switch between word and phrase searching), and for restricting searches by date, geographical location, and domain name. These pull-down menus make available advanced search features to users who otherwise might be intimidated by the complex Boolean logic needed to perform a similar search in AltaVista. 
Check boxes are used to limit searches to particular types of media. Help: Clicking on Help takes you to a page with links to several different files. Of these, the two most useful are "Getting Started", which answers basic questions about searching, and a FAQ (Frequently Asked Questions) file that provides the most comprehensive documentation of any of the major search engines. Features: Search options include "all of the words", "any of the words", "the exact phrase", "the person" (HotBot automatically rotates search terms, so a search for "Bill Gates" will look for "Bill Gates" and "Gates, Bill"), "links to this URL", and "the Boolean phrase". Other options include restricting searches by date, by Internet domain (eg. .edu or www.okanagan.bc.ca), and by media type (eg. Java, Audio, Image, VRML). Relevancy ranking is based on a combination of search term frequency, location of search terms in the document, and other criteria. Results: HotBot offers three options: full descriptions, brief descriptions, and URLs only. The full display includes the document title, the first few lines of text, URL, size (in bytes), and date. The brief display includes the title and the first ten words. Results are displayed in order of decreasing relevance. Other: There are options to search for Usenet News articles, online news services, businesses, people, e-mail addresses, classified ads, and shareware. Like AltaVista, HotBot provides access to LookSmart's subject guide. Infoseek URL: http://www.infoseek.com Size: More than 50 million URLs Retrieved: 25 documents Currency: 2 to 3 weeks (February 6, 1998) Interface: Simple interface only, with the ability to search for certain types of Internet documents, eg. Usenet News, online news services, and company information. Help: Documentation is sufficiently detailed and clearly written. One minor inconvenience is the division of the help documentation into several separate files. Features: Boolean OR is the default search logic. 
Infoseek also allows variations of the AND and NOT operators, using the + and - characters. Search for phrases by enclosing words within quotation marks. Searches may be restricted to certain portions of a document or certain types of document (eg. titles, links, URLs, documents found at a particular site). Relevancy ranking is based on the location of search terms in the document, the number of occurrences of search terms in the document, and the frequency with which terms appear in the Infoseek database (words that are less common in the database are given a higher weight). Results: Results are displayed in order of decreasing relevance. The default display includes the document title, URL, date, size (in bytes), and first three lines of text. There is an option to show only the URL and size. After results have been displayed, there is an option to modify the current result list by selecting "Search only these results" and adding additional search terms and operators. Infoseek is unique among the major search engines in allowing you to search within a previous set of results, although AltaVista has a similar feature, the difference being that AltaVista forces you to choose from a list of words it suggests, rather than allowing you to choose your own search terms. Other: Infoseek includes a number of additional features, including a hierarchical subject guide, which Infoseek refers to as "Channels", searching for current news articles, searching Usenet News articles, company profiles from Hoover's Online, street maps of the United States, and "Big Yellow", a WWW yellow pages directory. Northern Light URL: http://www.northernlight.com Size: Not available. However, in its documentation, Northern Light claims to index "every page of every Web site." This is rather unlikely, although tests have shown that Northern Light has one of the three largest databases, along with HotBot and Alta Vista. 
(8) Retrieved: 556 documents Currency: 5 to 6 weeks (January 16, 1998) Interface: Simple interface with a single search box and no options to modify the search logic from the initial screen. Help: The documentation is adequate but could be more detailed and better organized. One of the reasons for the relatively brief documentation may be the fact that Northern Light does not offer many sophisticated search options compared to some of the other search engines and, as a result, is more straightforward to use. Features: The default search logic is Boolean AND. Northern Light does not support full Boolean expressions but does allow the use of OR and NOT and + and - (to include and exclude terms), as well as phrase searching with quotation marks. Results: By default, results are displayed in order of decreasing relevance. The display includes the document title, type of document (e.g. article, review, encyclopedia entry), first 30 words of the document, date, type of site (e.g. Educational, Non-profit, Commercial, presumably based on the URL), and URL. Northern Light also offers the option, unique among search engines, of sorting results based on Custom Search Folders, which are unique to each result list. These folders organize your results into folders that may be of four types: Subject, Type of document, Source, and Language. Clicking on one of these folders displays only those results that have been included in that category. Results are then further sub-divided within each category. This very useful feature provides a method of focusing your results based on the context of the documents retrieved. Other: Along with its database of WWW documents, Northern Light offers fee-based access to a "Special Collection" of over 2 million documents from 2,900 information sources including books, magazines, academic journals, and online news services. 
While some of these sources are available elsewhere on the Web for free, most are not, and as Northern Light states in its documentation, nowhere else are they available from a single source. Prices are set by each information provider and range from US $1.00 to $4.00 per article. Free abstracts are available for all Special Collection documents. You may choose to search either or both of Northern Light's collections in a single search. EXERCISES Exercise 1 - Search Engines: Select one topic from the list below. Use at least two different search engines to search for information about your topic. Compare the results you retrieve from each. When comparing your results, consider the following points, among others: * How easy or difficult was it to figure out how to search? * Was there adequate documentation to help you formulate your search? * How many results did you retrieve? * What proportion of the results were relevant to your perceived information requirements? * How current were the results? * Was the amount of detail displayed with the results adequate? * Was the order in which the results were displayed evident and/or logical? * What other features contribute to (or detract from) the search engine's utility? Suggested topics: Internet 2 | Mad Cow Disease | The Beat Generation | Tupac Amaru | Microbreweries Links to search engines: Alta Vista | Excite | HotBot | Infoseek | Northern Light Exercise 2 - Multi-threaded search engines: Use a multi-threaded search engine to search for information about the topic you researched in Exercise 1. Compare your results with those you retrieved with an individual search engine. Also consider the same points as you did above. Links to multi-threaded search engines: Dogpile | Inference Find | Metacrawler | ProFusion Exercise 3 - Subject guides: (a) Browse the Yahoo subject categories (do not use the search function) to find information about the topic you researched in Exercise 1. 
Assuming you were able to find some relevant information (it is possible that you will not), consider the following points: * Which of the two methods - search engines or subject guides - do you consider was more successful or more appropriate? Why? * What are the advantages and disadvantages of each method? (b) Use Yahoo's search form to search for information on the same topic as you did in (a). * Which method of searching Yahoo was more successful? Why? * How does searching Yahoo differ from searching a large search engine database? (c) Browse and/or search a clearinghouse of specialized subject guides to see if there is a subject guide relevant to your topic. If so, peruse it to see if it would be useful for finding information on your topic. Links to clearinghouses: Argus Clearinghouse | The Mining Company | WWW Virtual Library Exercise 4 - Test your skills: Choose a topic that interests you, or one from the list below. Use one or more tools of each type discussed above to search for information about your topic. Try to incorporate some or all of the following into your searches: * Experiment with both simple and advanced interfaces * Read the search tool's documentation for instructions on how to search * Use Boolean logic, proximity operators, wildcard characters and phrase searching * Use one or more of the search tips discussed above * Where applicable, use a search engine's "more like this" feature to generate new searches * Try different methods of displaying results, including sorting where applicable Suggested topics: * What is the relationship between Mad Cow Disease and Creutzfeldt-Jakob Syndrome? * Is it possible to infect your computer's hard drive with a virus by running programs over the Internet that use the Java programming language? * Is there any information on the Internet on Canadian microbreweries, brew pubs, or brew on premises shops? * I am hoping to see the movie The Sweet Hereafter. 
I would like to find reviews of the movie, any information about the novel on which it was based, and biographical information about the movie's director, Atom Egoyan. Links to search tools: SEARCH ENGINES | Alta Vista | Excite | HotBot | Infoseek | Northern Light MULTI-THREADED SEARCH ENGINES | Dogpile | Inference Find | Metacrawler | ProFusion SUBJECT GUIDES | Galaxy | LookSmart | Magellan | Yahoo CLEARINGHOUSES | Argus Clearinghouse | The Mining Company | WWW Virtual Library SEARCHING FOR PEOPLE The scope of this workshop does not include searching for people on the Internet. However, because it is a common question, the following section is included as an appendix. While there is no master Internet directory of e-mail addresses, there is an increasing number of specialized tools on the Web that are designed to search for people's e-mail addresses, telephone numbers, and postal addresses. The resources listed below are all useful but none is comprehensive. It may be necessary to search several databases, including some that are not listed here. E-Mail Addresses: * Four11 (http://www.Four11.com) * WhoWhere? (http://www.whowhere.com) Postal Addresses & Telephone Numbers: * Canada 411 (http://canada411.sympatico.ca) (includes listings for all provinces, with the exception of Alberta and Saskatchewan) * Switchboard.Com (http://www.switchboard.com) (USA) REFERENCES This list is selective, rather than comprehensive. It contains references to documents, both online and in print, that were either helpful in preparing this workshop or that may be useful for those who want to know more about searching the World Wide Web. Barlow, Linda. The Spider's Apprentice - - Tips on Searching the Web. August 9, 1998. http://www.monash.com/spidap.html Basch, Reva. Find Anything Online. October 14, 1997. http://www1.zdnet.com/complife/fea/9708/findny10.html Cohen, Laura. Searching the Internet: Recommended Sites and Search Techniques. August 17, 1998. 
http://www.albany.edu/library/internet/search.html Grossan, Bruce. Search Engines: What they Are, How They Work, and Practical Suggestions for Getting the Most Out of Them. February 21, 1997. http://webreference.com/content/search/ Kriesel, Ronald W. Suggested Internet Research Strategies. June 6, 1998. http://www.concentric.net/~Rkriesel/Search/Strategies.shtml Lake, Matthew. Search Engine Shoot-out. August 1, 1997. http://www4.zdnet.com/pccomp/features/excl0997/sear/sear.html Lidsky, David and Kwon, Regina. "Your Complete Guide to Searching the Net." PC Magazine 16.21 (December 2, 1997): 227-?. Also available: http://www.zdnet.com/pcmag/features/websearch/_open.htm Lynch, Clifford. "Searching the Internet." Scientific American 276.3 (March 1997): 52-56. Also available: http://www.sciam.com/0397issue/0397lynch.html Notess, Greg R. "Comparing Net Directories." Database 20.1 (February 1997): 61-64. Also available: http://www.onlineinc.com/database/FebDB97/nets2.html ________. "Internet Search Techniques and Strategies." Online 21.4 (July 1997): 63-66. Also available: http://www.onlineinc.com/onlinemag/JulOL97/net7.html ________. "Measuring the Size of Internet Databases." Database 20.5 (October 1997). Also available: http://www.onlineinc.com/database/OctDB97/net10.html ________. Search Engines Showdown. March 13, 1998. http://imt.net/~notess/search/index.html ________. "Toward More Comprehensive Web Searching: Single Searching Versus Megasearching." Online 22.2 (March 1998): 73-76. Also available: http://www.onlineinc.com/onlinemag/OL1998/net3.html Searching the Internet. June 17, 1998. http://wwwscout.cs.wisc.edu/scout/toolkit/searching/index.html Stanley, Tracey. "Meta-Searching on the Web." Ariadne 14 (March 1998). http://www.ariadne.ac.uk/issue14/search-engines/ Sullivan, Danny. Search Engine Watch. http://searchenginewatch.com/ (Not just a document, but an entire Web site devoted to Internet search engines; updated constantly.) Tillman, Hope N. 
Evaluating Quality on the Net. November 13, 1997. http://www.tiac.net/users/hope/findqual.html

Tillman, Hope N. and Howe, Walt. Internet Tips and Tricks. November 16, 1997. http://www.tiac.net/users/hope/il97/tips97.htm

Westera, Gillian. Using the Best Search Engine for Your World Wide Web Research. July 4, 1997. http://www.curtin.edu.au/curtin/library/staffpages/gwpersonal/senginestudy/zindex.htm

Wighton, D. Searching FAQs. March 12, 1998. http://www.cln.org/searching_faqs.html

Zorn, Peggy et al. "Advanced Web Searching: Tricks of the Trade." Online 20.3 (May 1996): 14-28. Also available: http://www.onlineinc.com/onlinemag/MayOL/zorn5.html

FOOTNOTES

1. Steve Lawrence and C. Lee Giles, "Searching the World Wide Web." Science 280 (April 3, 1998), 100.

2. Gus Venditto, "Search Engine Showdown." Internet World 7.5 (May 1996), 79.

3. This algorithm is usually based on one or more of the following criteria: the number of times the search terms appear in the document; the location of the search terms in the document (e.g., the appearance of a given word in a document's title produces a higher ranking than the appearance of the same word in the body of another document); the proximity of search terms to one another within a document; the number of times a term appears in the search engine's database.

4. As stated by the documentation for each search engine, as of February 21, 1998.

5. Currency was tested by searching for documents about the Indian political party Rashtriya Janata Dal on February 22, 1998. All of the search engines tested, with the exception of Excite, include the date that the document was added to the database or the date it was last confirmed, whichever is later, as part of the results list display. Where the date was not present in the results list display, it was necessary to look at a sample of the documents retrieved, in order to estimate the date of the most current documents.
The date listed under this heading is that of the most recent document that could be positively identified.

6. Excite's results included several documents dated during the month of February 1998. However, because Excite does not provide information about the date an item was added to its database, there is no way of knowing when these items were added to the database or if they were modified since addition to the Excite database. For this reason, no currency information is included for Excite.

7. The "More Like This" feature is sometimes, but not always, an effective method of retrieving relevant documents. During testing for this workshop, clicking on the "More Like This" link in some cases retrieved documents on similar subjects, while in other cases it returned lists of documents located at the same web site as the original document. Excite's documentation about this feature states that it will "find more sites similar to the result you selected," but does not define "similar".

8. Greg R. Notess, Search Engines Statistics: Database Size, May 31, 1998. http://imt.net/~notess/search/statsize.html

(c) 1996, 1997, 1998 Ross Tyner. This document may be linked to, downloaded, printed or copied for non-commercial use without further permission of the author, provided the content is not modified and this statement appears at the bottom of the page. Any use not stated above requires the written consent of the author. The distribution of a copy of this document via the Internet or other electronic medium without the written permission of the author is expressly prohibited.

On the Web since May 1996. Last updated September 28, 1998.

Okanagan University College Library Home Page

Document URL: http://www.ouc.bc.ca/libr/connect96/search.htm