Human Development
Searching the Web*
Index
Introduction
Searching the Web
Productivity in a Ubiquitous Environment
Active Searching
Basic Searching Aids
Bookmarks
Search Strategies
Starting Points
Searching Educational Databases
Building Your Own Search Page
Robots, Spiders, Worms, WebAnts, and Agents
Heuristics, Metaheuristics, and Knowledge Management Tools
Conclusion

Introduction

Many teachers and trainers are beginning to use the web on a regular basis and are looking for guidance and organizational strategies for what appears to be a chaotic and certainly ubiquitous environment. They find that simply accessing personal web pages does not provide a useful scheme for finding the information needed for higher-level tasks such as research. They are also discovering that traditional methods of searching for information do not apply in the distributed resource environment of the World Wide Web. Educators and trainers need specialized searching strategies for efficiently and effectively finding information in this dynamic environment.

Searching the Web

There are three main factors in improving searches on the web: 1) efficiency (speed, to increase the number of hits per minute), 2) effectiveness (accuracy, to get what you wanted to get), and 3) extraction (strategies for extracting information from a text as efficiently as possible). This page concentrates on efficiency (speed) and effectiveness (accuracy), or what I term active searching.

Active searching is an important concept. When you read, for example, it is often useful to highlight, underline, and annotate the text. On the web it is likewise necessary to use strategies that improve your efficiency and effectiveness, including the appropriate use of searching techniques and tools. Active searching also helps keep you focused on the task at hand and keeps you from wandering...or in web terms...surfing.

When you begin to examine this topic, you will soon find that using just "a search engine" may or may not be good enough to meet the goals of efficiency and effectiveness. Full-text keyword searches are easily implemented and commonly offered by search engines. But full-text searching simply may not be the right tool for the job you have to do...in the way you need to do it.

Productivity in a Ubiquitous Environment

If you have browsed the web, you have surely found that it contains vast amounts of information, some useful and some not, depending on your interests. As the information on the web continues to grow exponentially, filtering out what I term non-useful information (e.g., advertisements, personal resumes) has forced me to develop strategies that make my job as an educator easier rather than more complicated. To give you an idea of this growth (although no one knows exactly how many documents are on the web now), the Association of Public Data Users and the International Association for Social Science Information Service and Technology (1996) state:

"Conservative estimates made in the middle of October 1995 focus on about 10 million documents, with a phenomenal growth rate of up to one million per month. In February, 1996, the Alta Vista site (the largest web index) claimed it indexed 21 million documents containing more than 10 billion words." (Home Page)

There are two main elements to developing active search strategies for the web. The first is "knowing what you want to know," that is, knowing the goal of your search. What is it you want to know after searching the web? Once you know this, you can judge whether a given search is moving you towards your goal. The second is "knowing how to find what you want to know."

Where you need only the shallowest knowledge of a topic, you can use simple locating and sampling strategies (skimming). If you need a moderate level of information, you can use collecting (scanning) strategies. As your needs become more detailed (e.g., research), you may find it necessary to use conceptual strategies (studying). Although any of these strategies may be used at every level of information gathering, using the right tool for the job at hand may make your job easier.

Active Searching

There are many tips and techniques for efficiently and effectively searching the web. To increase your web searching productivity, try some or all of the active searching tips below.

Learn the capabilities of various search engines. Many search engines provide search options such as boolean operators, zone or region specifications, domain specification, and searching specific parts of documents (titles, content, or link text).
Most search engines display found documents in ranked order (e.g., by the number of occurrences of the keyword(s)), with the highest ranking shown at the top of the list. The highest-ranked hyperlinks are the most likely to be related to your query, but anyone who has used a search engine knows that many "treasures" may be found throughout the list. Briefly scan the list of hits to see how your keyword is used in the context of each listing (the more you use search engines, the better your scanning skills will become). When you find a document that looks interesting, click on the hyperlink. At the site, briefly browse the headings and content to see if it is helpful. While you are examining the site, also check for any hyperlinks to your topic that the page may provide. If time permits, follow them. If not, set a bookmark to the page and visit it when you have more time.
Did you get too many hits (20 or more documents) from your search? This is the most common problem. Remember, the more focused or narrow your search, the more focused and narrow the results. Narrow your search by selecting different keywords (such as synonyms), using boolean operators such as "and", using phrases to be more precise, or specifying domains or regions to localize the search.
There are a few good search planning aids available on the web. One such aid is the "Scoping the Search Worksheet" developed by the Maricopa Center for Learning and Instruction (1996). Take a look at it and print off a copy if you believe it will help you be more efficient in your searching of the web.
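The ranking tip above can be illustrated with a small sketch. Assuming, for illustration only, that an engine scores a page simply by counting occurrences of the query words (real engines combined many undisclosed factors), the Python below ranks a handful of made-up pages. The function name rank_by_occurrences and the sample documents are hypothetical.

```python
import re
from collections import Counter

def rank_by_occurrences(query, documents):
    """Rank document titles by how often the query words occur in their text.
    A hypothetical scoring rule used only to illustrate ranked hit lists."""
    terms = [t.lower() for t in query.split()]
    scored = []
    for title, text in documents.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score > 0:  # pages without any query word are not hits
            scored.append((score, title))
    # Highest score first; ties are broken alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

docs = {
    "Learning theories": "Learning theory and learning styles in adult learning.",
    "Course catalog": "A catalog of courses, including one on learning.",
    "Cognition primer": "Cognition and memory; learning is mentioned once.",
    "Job ads": "Teaching positions available now.",
}
print(rank_by_occurrences("learning", docs))
# ['Learning theories', 'Cognition primer', 'Course catalog']
```

A page that mentions the keyword only once still appears, just further down the list, which is why it pays to scan past the first few hits.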

Basic Searching Aids

Boolean Operators

Most search engines now offer boolean capabilities. Boolean operators express different and specific relationships between words and phrases used in the search.

AND limits a search by requiring that each term be present. For example, a search on learning AND cognition specifies that you want information on BOTH learning and cognition. An article that contains only the term learning will not be matched. Using AND will usually produce fewer hits.

OR expands the search by combining discrete terms into a conditional set. Searching for learning OR cognition specifies that you want information on either learning or cognition (or both). Using OR usually produces the most hits.

NOT limits the search by specifying that a term not be present. Searching for learning NOT training will find matches with the term learning but not training.
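To make the three operators concrete, here is a minimal Python sketch that evaluates them over a tiny, made-up collection using an inverted index (a map from each word to the documents containing it). It assumes single-word terms and whole-word matching; real engines differed in syntax and in how they split text into words. The document texts and function names are hypothetical.

```python
import re

# A tiny, made-up document collection (id -> text).
documents = {
    1: "Learning and cognition in adult education",
    2: "Learning styles for corporate training",
    3: "Cognition and memory research",
    4: "Distance learning on the web",
}

# Inverted index: each lowercase word maps to the set of document ids containing it.
index = {}
for doc_id, text in documents.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index.setdefault(word, set()).add(doc_id)

def docs_with(term):
    """Ids of documents containing the term (empty set if none)."""
    return index.get(term.lower(), set())

def boolean_and(a, b):
    # Both terms must be present: set intersection, usually fewer hits.
    return docs_with(a) & docs_with(b)

def boolean_or(a, b):
    # Either term (or both): set union, usually the most hits.
    return docs_with(a) | docs_with(b)

def boolean_not(a, b):
    # First term present, second term absent: set difference.
    return docs_with(a) - docs_with(b)

print(sorted(boolean_and("learning", "cognition")))  # [1]
print(sorted(boolean_or("learning", "cognition")))   # [1, 2, 3, 4]
print(sorted(boolean_not("learning", "training")))   # [1, 4]
```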

Proximity Operators

Some search engines support proximity operators, such as OpenText's NEAR operator, Webcrawler's ADJ (adjacent) operator, or the FOLLOWED BY operator. With these operators, word order can be important. For example, placing terms in square brackets, as in [learning theory], produces a hit if the words are found within 100 words of each other (Gray, 1996).
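The sketch below shows one way a proximity test could work. It assumes a NEAR-style check that ignores word order, plus a FOLLOWED BY-style check (ordered=True) in which the first word must come before the second; an ADJ-style check would be roughly ordered=True with max_distance=1. The 100-word default mirrors the example above, but each engine defined its own operators, so the function near and its parameters are hypothetical.

```python
import re

def near(text, first, second, max_distance=100, ordered=False):
    """True if `first` and `second` occur within `max_distance` words of each other.
    With ordered=True, `first` must appear before `second` (FOLLOWED BY-style)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == first.lower()]
    positions_b = [i for i, w in enumerate(words) if w == second.lower()]
    for i in positions_a:
        for j in positions_b:
            gap = j - i
            if ordered and gap <= 0:
                continue  # wrong order for an ordered operator
            if abs(gap) <= max_distance:
                return True
    return False

sample = "Theory building is central to learning research."
print(near(sample, "learning", "theory"))                # True: any order, within 100 words
print(near(sample, "learning", "theory", ordered=True))  # False: "learning" does not come first
print(near(sample, "learning", "research", ordered=True, max_distance=1))  # True: adjacent
```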

Truncation (*)

You can use truncation on most search engines. That is, you can append the asterisk (*) to a root word to match any ending. For example, searching for teach* will find teacher, teaching, and teachers. Note: the asterisk cannot be the first or second letter of a root word.
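Truncation can be pictured as a prefix match. This minimal Python sketch expands a truncated term against a small, made-up word list using a regular expression; the two-letter minimum follows the note above, and the function name expand_truncation is hypothetical rather than any engine's actual interface.

```python
import re

def expand_truncation(term, vocabulary):
    """Return the words in `vocabulary` matched by a term such as 'teach*'."""
    if not term.endswith("*"):
        # No truncation: plain case-insensitive whole-word match.
        return [w for w in vocabulary if w.lower() == term.lower()]
    root = term[:-1]
    # '*' cannot be the first or second letter, so the root needs at least two letters.
    if len(root) < 2:
        raise ValueError("truncated root must be at least two letters long")
    pattern = re.compile(re.escape(root) + r"\w*", re.IGNORECASE)
    return [w for w in vocabulary if pattern.fullmatch(w)]

vocabulary = ["teach", "teacher", "teaching", "teachers", "tea", "reteach"]
print(expand_truncation("teach*", vocabulary))
# ['teach', 'teacher', 'teaching', 'teachers']
```

Because fullmatch anchors the pattern at both ends, words that merely contain the root elsewhere (such as reteach) are not matched.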

Wildcard (?)

You can find words that share some but not all characters by using the question mark (?) operator, which stands for a single character. For example, Johns?n will find Johnson and Johnsen. Note: the ? cannot be the first character in the search.

You may also use combinations of truncation (*) and single character wildcard (?) in your searches.
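Both operators can be emulated with regular expressions. The sketch below assumes '?' stands for exactly one word character and '*' for any run of them, and it rejects a leading '?' as described above. The function names and sample names are hypothetical, and a real engine would apply such patterns to its index rather than to a short list.

```python
import re

def wildcard_to_regex(term):
    """Translate '?' (exactly one character) and '*' (any run of characters) into a regex."""
    if term.startswith("?"):
        raise ValueError("'?' cannot be the first character")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(r"\w")   # single-character wildcard
        elif ch == "*":
            parts.append(r"\w*")  # truncation
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_search(term, vocabulary):
    pattern = wildcard_to_regex(term)
    return [w for w in vocabulary if pattern.fullmatch(w)]

names = ["Johnson", "Johnsen", "Johnston", "Jonson", "Johnsons"]
print(wildcard_search("Johns?n", names))   # ['Johnson', 'Johnsen']
print(wildcard_search("Johns?n*", names))  # ['Johnson', 'Johnsen', 'Johnsons']
```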

Bookmarks

Throughout the process of searching for information you will find many useful sites. If you do not have time to examine these sites in detail, you may either print them for off-line review or simply set a bookmark to easily return to them later. Although bookmarks are simple to set and will certainly help your overall searching, organizing your bookmarks dramatically increases your efficiency. Netscape, for example, allows you to organize bookmarks into folders. I suggest that you make your bookmark folders thematic, according to your search topic(s). Then, when you add a bookmark, you can "drag" and "drop" it into the thematic folder of your choice (Win95 and Mac). This will make returning to these sites for further information or citations a much easier task. Additionally, you can export your bookmark files to share with colleagues in collaborative working arrangements.

Search Strategies

Search tools are certainly proliferating on the web. They have grown from early naive indexing tools to tools that now use a form of artificial intelligence algorithm termed heuristics. Heuristic searching tools are designed to aid the user in learning, discovering, or problem solving through self-educating techniques (i.e., feedback) that improve their performance.

It is important to note that there is no single best tool for searching the web; different search engines have different attributes. As stated earlier, the key to deciding which search tool to use is knowing what you want to know, and then knowing which tool(s) will best help you find that information efficiently and effectively.

To determine which search engine(s) you should use to aid you in your task, you need to know a little about various strategies they use and features they provide.

Starting Points

I have grouped search strategies into five categories (rating, sampling, locating, collecting, and concept searching). A search engine may fall into one or more of these categories. As search engines continue to develop, many are integrating multiple strategies into their capabilities, blurring these categorical "lines." For now, however, I have defined each of these categories to help you understand the uses of various search engines.

The decision regarding which search engine to use depends on your knowledge of how an engine searches and indexes web pages. To better understand this, let's look at a few examples. The Lycos search engine indexes only specific parts of a web page, such as the title, headings, and the 100 most significant words, whereas Webcrawler examines every word on a web page (Webster & Paul, 1996). But these are not the only criteria to consider when selecting a search engine; the size of the database (i.e., the number of listings) is also a major factor.
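The difference between the two indexing approaches can be sketched in Python. The sketch assumes a page has already been split into a title, headings, and body text, and it approximates Lycos's "most significant" words by the most frequent body words; that scoring rule, the page structure, and the function names are all hypothetical. The point is only that a word missing from a partial index cannot be found, however relevant the page is.

```python
import re
from collections import Counter

def words(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def full_text_index(page):
    # Index every word on the page: title, headings, and the entire body.
    return (set(words(page["title"])) | set(words(" ".join(page["headings"])))
            | set(words(page["body"])))

def partial_index(page, limit=100):
    # Index only the title, headings, and the `limit` most frequent body words
    # (a hypothetical stand-in for the "most significant" words).
    significant = {w for w, _ in Counter(words(page["body"])).most_common(limit)}
    return (set(words(page["title"])) | set(words(" ".join(page["headings"])))
            | significant)

page = {
    "title": "Adult Learning",
    "headings": ["Overview"],
    "body": "learning learning learning theory theory constructivism",
}
print("constructivism" in full_text_index(page))         # True
print("constructivism" in partial_index(page, limit=2))  # False: not among the top 2 words
```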

I have provided hyperlinks to examples for each of the five categories so that you can examine them and better understand their uses. After you have reviewed these search engines, I recommend that you bookmark those that are most useful to you for this course and for your professional work. That way you will not have to keep returning to this web page to access your preferred search tools.

Rating Strategy (rating and reviews) - Finding rated and reviewed sites.
Use: When you want to find out how others have rated topical sites.

Sampling Strategy (subject trees) - Finding a few high quality sources based on topics.
Use: When you are looking for broad "trailblazer" or topical pages.

Locating Strategies (indexes) - Finding a list of items (sites).
Use: When you need to find a list of sites in specific databases (e.g., newsgroups, e-mail lists, addresses, and software archives) or to search for graphics.

Collecting Strategy - Metasearch (meta indexes/multi-threaded) - Finding and cataloging a large number of available web documents on a subject.
Use: When a comprehensive simultaneous search of multiple databases is necessary.

Concept Searching (heuristics, fuzzy matching and relevancy matching) - Finding information on topics using feedback.
Use: When unsure about the target.

If you are interested in examining additional rating, sampling, locating, collecting, and concept searching sites, you can visit the Sherlock Internet Consulting Detective, which lists a large number of search strategy sites by name and description and provides hyperlinks to each.

Searching Educational Databases

Some of the most useful research tools introduced in the last ten years are CD database search systems. These tools allow you to search databases such as ERIC, Dissertation Abstracts, etc. Today, these tools are migrating to the web. If you would like to perform database searches on the web, you can access the Reference Services Subject Guide on the OSU web site, which provides links to database search tools for numerous topics. Among these links is the new FirstSearch Web Service, which allows you to search education databases such as ERIC, Education Abstracts, and many others. Give them a try; you will find these links useful whenever you need to research a topic.

Building Your Own Search Page

SEARCH.COM provides a unique feature for the regular search engine user: the ability to build a personalized search engine page. This is a useful tool because the personalizer guides you through the process of gathering the search engines that are most helpful to your work.

You can put all your favorite search engines onto your own personal page. All you need is a few minutes and a cookie-capable browser (Netscape Navigator or Microsoft Internet Explorer). The three steps are: 1) select the categories you're interested in; 2) on the next page, select the search engines you're interested in (you can also add a few of your favorite Web links); 3) on the following page, specify the order in which you'd like your choices listed. That's it; you should find it a very useful tool. You can go to the SEARCH.COM page now if you would like to create your personalized grouping of search engines.

Robots, Spiders, Worms, WebAnts, and Agents

Most search engines create indexes that are compiled by computer programs known as robots, spiders, WebAnts, or worms. Robots and spiders are the same thing, but worms are technically different in that they are replicating programs, and WebAnts are distributed, cooperating robots. These programs traverse the web by retrieving a document, indexing it or entering it into a database, and then recursively retrieving all documents that it references (Koster, 1996). In other words, these robots follow the hyperlinks to other documents and index those as well. Agents have numerous meanings in the computing arena; in general, agents are programs that act autonomously on a task. The most common agents found on the web are autonomous agents, which are programs that travel between sites and decide, based on algorithms, when to move and what to do; and intelligent agents, which are programs that help users with things such as forms and heuristics. Intelligent agents may choose a product, guide a user through forms, or help users find information.
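The recursive retrieval Koster describes can be sketched as a breadth-first traversal. To keep the example self-contained and harmless, it crawls a small in-memory "web" instead of fetching real pages; the example.org URLs, the WEB dictionary, and the crawl function are hypothetical. A real robot would also fetch pages over the network, honor /robots.txt, and pause between requests.

```python
from collections import deque

# A toy "web": each hypothetical URL maps to (page text, list of URLs it links to).
WEB = {
    "http://example.org/": ("Welcome to the school of education",
                            ["http://example.org/courses", "http://example.org/staff"]),
    "http://example.org/courses": ("Courses on learning and cognition",
                                   ["http://example.org/"]),
    "http://example.org/staff": ("Staff directory",
                                 ["http://example.org/courses", "http://example.org/missing"]),
}

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first robot: retrieve a page, index its words, then queue every page it references."""
    index = {}
    visited = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # the same link may have been queued more than once
        visited.add(url)
        page = fetch(url)
        if page is None:
            continue  # broken link: nothing to index
        text, links = page
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        queue.extend(link for link in links if link not in visited)
    return index, visited

# WEB.get returns None for unknown URLs, standing in for a failed retrieval.
index, visited = crawl("http://example.org/", WEB.get)
print(sorted(index["learning"]))  # ['http://example.org/courses']
```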

There are those who believe that robots are bad for the web. In fact, some poorly constructed robots can overload networks and web servers, so they may need attention if you have set up a web server. If you are concerned that robots are visiting your site and causing problems for your system, check your server logs. According to Koster (1996), robots may retrieve large numbers of documents from your site in a very short period of time. One way to check for this is to examine your user-agent logging: when you notice a site repeatedly requesting the file /robots.txt, it is likely a robot. A medium- or high-performance server should be able to handle a high load of several requests per second.
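As a rough illustration of that log check, the Python sketch below scans lines in the widely used Common/Combined Log Format and flags hosts that request /robots.txt more than once. The two-request threshold, the sample log lines, and the function name likely_robots are assumptions made for the example; adjust the pattern to whatever format your server actually writes.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line: host ident user [time] "METHOD path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')

def likely_robots(log_lines, min_robots_requests=2):
    """Return {host: total requests} for hosts that fetched /robots.txt repeatedly."""
    robots_requests = Counter()
    total_requests = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines in an unexpected format
        host, path = match.groups()
        total_requests[host] += 1
        if path == "/robots.txt":
            robots_requests[host] += 1
    return {host: total_requests[host]
            for host, count in robots_requests.items()
            if count >= min_robots_requests}

sample_log = [
    'bot.example.net - - [10/Oct/1997:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 120',
    'bot.example.net - - [10/Oct/1997:13:55:37 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'reader.example.com - - [10/Oct/1997:13:56:02 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'bot.example.net - - [10/Oct/1997:14:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 120',
    'bot.example.net - - [10/Oct/1997:14:55:37 -0700] "GET /courses.html HTTP/1.0" 200 1780',
]
print(likely_robots(sample_log))  # {'bot.example.net': 4}
```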

If you would like to examine a database of web robots, you can visit the Webcrawler Robots Page.

Heuristics, Metaheuristics, and Knowledge Management Tools

A number of new search tools are starting to crop up that take search strategies to new levels. These new tools are based on heuristics, metaheuristics, and the management of knowledge. These differ from index search engines and even meta-index search engines. These are "smart" search tools that learn from your input and provide feedback to assist you with more efficiently and effectively finding and managing information. A few of these tools are: Saqqara's Step Search, the Inso Search Wizard, and QUARTERDECK's Web Compass.

Inso Search Wizard is a concept based search technology. The technology helps users figure out what they really mean and want to find. The Inso Search Wizard uses what is termed "computational linguistics" to conceptually search for specific information.

QUARTERDECK's Web Compass searches all search engines with a single command, purges duplicates, publishes results, and keeps watch for updates. QUARTERDECK terms this tool "your personal research assistant on the World Wide Web, Internet, and Intranet." This is certainly the direction new searching toolsets are taking. It is no longer enough for search engines just to perform searches; the new tools will manage your data in an intelligent way.
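The "single command, purge duplicates" idea behind metasearch tools can be sketched as merging several ranked result lists. This Python example interleaves the lists rank by rank and keeps only the first copy of each page. The rule for deciding that two URLs are the same page (ignoring scheme, host capitalization, and a trailing slash) is an assumption, the example URLs are placeholders, and this is not a description of how Web Compass or any particular product worked.

```python
from urllib.parse import urlsplit

def page_key(url):
    """Duplicate-detection key: ignores the scheme, host capitalization,
    and a trailing slash on the path (an assumption for this sketch)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.netloc.lower(), path, parts.query)

def merge_results(result_lists):
    """Interleave ranked lists from several engines, keeping the first copy of each page."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = page_key(url)
                if key not in seen:  # purge duplicates
                    seen.add(key)
                    merged.append(url)
    return merged

engine_a = ["http://example.org/eric/", "http://example.org/learning"]
engine_b = ["http://EXAMPLE.org/eric", "http://example.com/cognition",
            "http://example.org/learning/"]
print(merge_results([engine_a, engine_b]))
# ['http://example.org/eric/', 'http://example.org/learning', 'http://example.com/cognition']
```

Interleaving by rank gives each engine's top hits equal weight; a different tool might instead merge by a combined relevance score.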

Conclusion

These are just a few of the strategies and new tools teachers and trainers can use to make working on the web more productive. As teachers and trainers continue to use the web, they will soon see the next generation of web "knowledge tools" begin to emerge. These will include multidimensional tools created to manage data on the web using factors such as "virtual neighborhoods of information," "organic structuring," and "mental model based searching and flying mechanisms" (Eichmann, 1994). These tools are intended to make the World Wide Web more manageable for the user.

Let us now go back to my original statement...that the goals of search strategies and engines should be to increase your efficiency and effectiveness when looking for information on the web. Only you can decide which search/knowledge management strategies and tools actually improve your productivity. It is my hope that this article helps you with making these decisions.



Page provided by:
Dr. Mark L. Merickel
V3.0
© Copyright 1998
All rights reserved
School of Education
Oregon State University
Corvallis, OR 97331-3502