Searching & Researching on the Internet & the World Wide Web
Karen Hartman, Simpson Library, Mary Washington College
Ernest Ackermann, Department of Computer Science, Mary Washington College
Critical thinking skills
have always been important to the process of searching
for and using information from media such as books, journal articles, radio
broadcasts, television reports, and so forth. With the advent of the Internet
and World Wide Web, these skills have become even more crucial. Traditional
books and journal articles need to pass some kind of editorial scrutiny
before being published. Web pages, however, can appear without a single
person ever reading them through to check for accuracy. Libraries have
collection development policies that govern what material they will and
will not buy; the Internet and Web, having no such policies, collect everything.
This isn't to say that there isn't quality on the Internet. There are thousands
of high caliber Web pages and well-regarded databases. In order to find
these quality resources, we must make it our responsibility to
-
evaluate our information needs
-
choose an appropriate search
tool
-
formulate a search expression
that will select the most relevant resources
-
decide, using well-established
guidelines, whether the Web page or Internet resource is worth using in
our research paper or project
The First Step: Evaluate Your Information
Needs
Before you get online and
start your search for information, think about what types of material you're
looking for. Are you interested in finding facts to support an argument,
authoritative opinions, statistics, evaluative reports, descriptions of
events, images, or movie reviews? Do you need current information or facts
about an event that occurred 20 years ago? Are you sure the Web is
a smart place to start? A reference book in your library may have the information
you need, and you may find it there more quickly. It may seem that the Web would
contain all the information that you require, but this is not always the
case.
Types of Information Most Likely
Found on the Internet and World Wide Web
-
Current information. Many newspapers
and popular magazines provide Web versions of their publications and news
updates throughout the day. Current financial and weather information is
also easily accessible. For an example, see TotalNEWS,
http://www.totalnews.com for links to
dozens of news sources.
-
U.S. government information.
Most federal, state, and local government agencies provide statistics and
other information freely and in a timely manner. The University
of Michigan's Document Center, http://henry.ugl.lib.umich.edu/libhome/Documents.center,
is an excellent starting point for government documents.
-
Popular culture. It's easy to
find information on the latest movie or best-selling book. Try Amazon.com,
at http://www.amazon.com
for current book reviews.
-
Full-text versions of books and other materials
that are not under copyright restriction. For example, there
are Shakespeare's plays, the Bible, the Canterbury Tales, and hundreds
of other full text literary resources available. The Internet
Public Library's Online Text Collection
has collected most of them at http://www.ipl.org/reading/books
-
Business and company information.
Many companies provide Web pages and annual reports, and
several databases also provide in-depth financial and other information
about companies. Companies
Online, at http://www.companiesonline.com,
provides a database of public and private company information.
-
Consumer information. The Internet
is a virtual gold mine of information for people who are interested in
buying a particular item and want opinions from people about the item.
Try searching Deja
News, http://www.dejanews.com, the next
time you want to get opinions about that new automobile or vacuum cleaner
you are thinking of buying.
-
Medical information. In addition
to several excellent sources of medical information provided by hospitals,
pharmaceutical companies, and non-profit organizations, The National Library
of Medicine has provided the MEDLINE database to the public for free since
late 1997. Check out
PubMed's MEDLINE, http://www.ncbi.nlm.nih.gov/PubMed,
for your medical research questions.
-
Unique archival sites. For example,
the Library
of Congress' American Memory Collection, http://lcweb2.loc.gov/ammem/mdbquery.html,
provides full-text documents, musical recordings, photographs, maps, and
more about certain periods of American history.
Some Reasons Why the World Wide Web Won't Have
Everything You Are Looking For
-
Publishing companies and authors
who make money by creating and providing information will choose to use
the traditional publishing marketplace and not make the information free
via the Internet.
-
Scholars most often choose to
publish their research in reputable scholarly journals and university presses
rather than use the Web to distribute their research. More academic
journals are becoming Web-based, but subscribing to these journals costs as much
as subscribing to the paper form.
-
Several organizations and institutions
would like to publish valuable information on the Web but don't because
of a lack of staff or funding to allow it.
-
The Web tends to include information
that is in demand by a large portion of the public. The Web can't be relied
upon consistently for historical information. For example, if you need
today's weather data for Minneapolis, Minnesota, the Web will certainly
have it. But if you want Minneapolis climatic data for November of 1976,
you might not find it on the Web.
Information Sources Available
on the Web
Directories
or Subject Catalogs |
|
Virtual
Libraries |
|
Specialized
Databases |
-
Specialized databases can be
comprehensive collections of hyperlinks in a particular subject area or
self-contained indexes that are searchable and available on the Web.
-
The Internet
Sleuth, http://www.isleuth.com, accesses
more than three thousand specialized databases and directories.
|
Proprietary
or Commercial Databases |
-
Proprietary or commercial databases
charge a subscription fee to use.
-
Proprietary databases have certain
value-added features that databases in the public domain do not have. For
example, databases on FirstSearch,
http://www.ref.oclc.org, have links to
library holdings information. This way you can find out which libraries
own the materials that are indexed.
-
Proprietary databases also allow
you to download information easily. For instance,
Dow Jones Interactive, http://www.djinteractive.com,
includes financial information that is commonly free to the public,
but it charges for the use of its database because it has made it much
easier for the user to download the information to a spreadsheet program.
-
Proprietary databases often index material that others do
not. The information is distinguished by its uniqueness, its historical
value, or its competitive value. For example, Dialog,
http://www.dialogweb.com, includes difficult-to-find
private company financial information, and Infotrac's
Searchbank, http://library.iacnet.com, and
Lexis-Nexis Academic Universe,
http://www.lexisnexis.com, contain the
full text of hundreds of journal articles.
-
Proprietary database systems
are more responsible to their users. Because they cost money, they are
more apt to provide training and other user support, such as distributing
newsletters that update their services.
-
There are also databases on
the Web that are free to the public but charge if you want the full text
of the articles indexed. The
Electric Library, http://www.elibrary.com
and Northern Light, http://www.northernlight.com,
are examples of this type of database.
|
Search
Engines |
|
Meta-search
Tools |
|
Library
Catalogs on the Web |
|
Email
Discussion Groups |
-
Email discussion groups are
sometimes called interest groups, listservs, or mailing
lists. Internet users join a group, then send messages to and read messages from the
entire group through email. Several thousand different groups exist.
-
Several services let you search
for discussion groups. One is Liszt,
http://www.liszt.com
|
Usenet
Newsgroups |
-
Usenet newsgroups are collections
of group discussions, questions, answers, and other information shared
through the Internet. The messages are called articles and are grouped
into categories called newsgroups. The newsgroups number in the thousands,
with tens of thousands of articles posted daily.
-
Many search engines include
the option of searching archives of Usenet articles, and some services—such
as Deja News, http://www.dejanews.com—keep
large archives of Usenet articles.
|
Learn the Features and Capabilities
of a Search Tool or Service
-
Get to know the features and
capabilities of the search tool you'll use.
-
Click on Help or Tips.
(Read it!)
-
See if there is a FAQ (Frequently
Asked Questions) -- Browse through it.
Features:
-
What types of Boolean expressions
does it support? (AND, OR, NOT, +, -)
-
What about 'wild cards'? (What's
matched by comput* ?)
-
Does it support phrase searching?
-
Proximity? (Terms 'near' each
other.)
-
Field Searching? (title, URL,
domain, etc…)
-
Can you limit results by date
or domain?
-
Can you make choices about the
way results are reported?
-
How are results reported?
-
Is it possible to narrow or
revise a search?
-
Is help provided for forming
search expressions?
-
What's the coverage?
Common Search Features:
Boolean operators |
-
use AND to require that two terms be present; for example, global AND warming means that both global and warming must be present
-
use OR to require that one or both of two terms be present; for example, global OR warming means that either global or warming, or both, must be present
-
use NOT to require that a term not be present, for example, global NOT warming means that we will get results that include global but don't include the term warming.
|
Implied
Boolean operators |
-
use + to require a term
be present, +term means term must be present
-
use - to exclude a term,
-term means term must not be present
|
Phrases |
-
use two quotation marks to enclose
a phrase; the terms must appear in the order given, for example "gibson acoustic
guitar"
|
Truncation
or Wild Cards |
-
use * to represent different
endings for a word; for example, comput* matches the terms computer,
computing, computers, and computation
|
Field Searching
|
-
Web pages can be broken down into many parts. These parts, or fields, include titles, URLs, text, summaries or annotations (if present), and so forth. Field searching is the ability to limit your search to certain fields. This ability to search by field can increase the relevance of the retrieved records. For example, to search the Web for an image of a comet, limit your search results to Web pages that contain images that have the word comet in their filenames.
|
Limiting by Date
|
-
Some search engines allow you to search the Web for pages that were added to the database between certain dates. By limiting by date, you can find only the pages that were entered in the past month, in the past year, or in a particular year.
|
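The search features in the table above can be illustrated with a small sketch. It is not how any particular search engine works; the page texts, the dictionary name pages, and the helper functions has_term, has_phrase, and search are all made up for illustration, and real engines differ in their syntax and matching rules.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_term(text, term):
    """True if term occurs as a word; a trailing * matches any word ending."""
    words = tokenize(text)
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        return any(w.startswith(stem) for w in words)
    return term in words

def has_phrase(text, phrase):
    """True if the words of phrase appear next to each other in the given order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical page texts standing in for a search engine's database.
pages = {
    "a": "Global warming and climate policy",
    "b": "Global trade statistics",
    "c": "Warming up before a run",
    "d": "Gibson acoustic guitar reviews and computing hardware",
}

def search(predicate):
    """Return the names of all pages whose text satisfies predicate."""
    return sorted(name for name, text in pages.items() if predicate(text))

# global AND warming
both = search(lambda t: has_term(t, "global") and has_term(t, "warming"))
# global OR warming
either = search(lambda t: has_term(t, "global") or has_term(t, "warming"))
# global NOT warming (same as +global -warming)
without = search(lambda t: has_term(t, "global") and not has_term(t, "warming"))
# "gibson acoustic guitar"
phrase = search(lambda t: has_phrase(t, "gibson acoustic guitar"))
# comput*
wild = search(lambda t: has_term(t, "comput*"))
```

In this sketch AND narrows the result to page a, OR widens it to pages a, b, and c, and NOT leaves only page b.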
Relevancy Ranking:
Most search engines measure each Web page's relevance to your search query and arrange the search results from the most relevant to the least relevant. This is called relevancy ranking. Each search engine has its own algorithm for determining relevance, but it usually involves counting how many times the words in your query appear in the Web pages. In some search engines, a document is considered more relevant if the words appear in certain fields, for example, the title or summary field. In other search engines, relevance is determined by the number of times the keyword appears in a Web page divided by the total number of words in the page. This gives a percentage, and the page with the largest percentage appears first on the list.
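As a rough illustration of the percentage method just described, the sketch below divides the number of keyword occurrences by the total number of words in each page and sorts the pages by that figure. The function names relevance and rank and the sample pages are hypothetical, and actual search engines use their own, more elaborate algorithms.

```python
import re

def relevance(query, page_text):
    """Percentage of the page's words that are query keywords."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    if not words:
        return 0.0
    keywords = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = sum(1 for w in words if w in keywords)
    return 100.0 * hits / len(words)

def rank(query, pages):
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(query, pages[name]), reverse=True)

# Hypothetical pages: the keyword appears once in each of the first two,
# but it makes up a larger share of the shorter page.
pages = {
    "short": "comet images",
    "long": "a page about many things including one comet",
    "none": "weather data for Minneapolis",
}
ranking = rank("comet", pages)
```

Here the shorter page ranks first because the keyword is one of its two words (50 percent), while in the longer page it is one of eight (12.5 percent).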
Basic Search Strategy: The
Ten Steps
The following list provides a guideline for you to follow
in formulating search requests, viewing search results, and modifying search
results. These procedures can be followed for virtually any search request,
from the simplest to the most complicated. For some search requests, you
may not want or need to go through a formal search strategy. If you want
to save time in the long run, however, it's a good idea to follow a strategy,
especially when you're new to a particular search engine.
A basic search strategy can help you get used to each
search engine's features and how they are expressed in the search query.
Following the 10 steps will also ensure good results if your search is
multifaceted and you want to get the most relevant results.
-
Identify the important concepts of your search.
-
Choose the keywords that describe these concepts.
-
Determine whether there are synonyms, related
terms, or other variations of the keywords that should be included.
-
Determine which search features may apply,
for example, truncation, proximity operators, or Boolean operators.
-
Choose a search engine.
-
Read the search instructions on the search
engine’s home page. Look for sections entitled help, advanced search, frequently
asked questions, etc.
-
Create a search expression, using syntax
that is appropriate for the search engine.
-
Evaluate the results. Were the
results relevant to your query?
-
Modify your search if needed. Go back to steps
2-4 and revise your query accordingly.
-
Try the same search in a different search
engine, following steps 5-9 above.
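Steps 1 through 7 can be sketched as a small routine that turns concepts and their synonyms into a search expression. This is only an illustration: the function name build_expression and the sample topic are hypothetical, and the output uses the AND, OR, quotation mark, and * conventions described earlier, which not every search engine accepts in this form.

```python
def build_expression(concepts):
    """Join synonyms with OR inside parentheses, then join concepts with AND.

    Each concept is a list of keywords or phrases. Multi-word terms are
    wrapped in quotation marks so they are searched as phrases.
    """
    groups = []
    for synonyms in concepts:
        terms = ['"%s"' % t if " " in t else t for t in synonyms]
        group = " OR ".join(terms)
        if len(terms) > 1:
            group = "(" + group + ")"
        groups.append(group)
    return " AND ".join(groups)

# Hypothetical topic with two concepts (steps 1-3); the second
# keyword is truncated with * (step 4).
concepts = [
    ["global warming", "climate change"],
    ["polic*"],
]
expression = build_expression(concepts)
```

For this topic the expression comes out as ("global warming" OR "climate change") AND polic*. Adding a synonym to a concept widens the search, and adding another concept narrows it.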
Search Tips
For multi-faceted searches
a full-text database is best. For a search involving one facet like a person’s
name or a phrase without stop words, search engines that provide keyword
indexing will be sufficient.
If your search has yielded
too few Web pages (low recall), there
are several things to consider:
-
Perhaps the search expression
was too specific; go back and remove some terms that are connected by ANDs.
-
Perhaps there are more possible
terms to use. Think of more synonyms to OR together. Try truncating more
words if possible.
-
Check spelling and syntax (a
forgotten quotation mark or a missing parenthesis).
-
Read the instructions on the
help pages again.
If your search has given you
too many results, many of them not relevant to your topic
(high recall, low precision), consider the following:
-
Narrow your search to specific
fields, if possible.
-
Use more specific terms; for example,
instead of sorting, use the name of a specific sorting algorithm.
-
Add additional terms with AND
or NOT.
-
Remove some synonyms if possible.
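The terms recall and precision used above can be made concrete with a short calculation. The sketch assumes the usual information-retrieval definitions, which the text does not spell out: recall is the share of all relevant pages that the search found, and precision is the share of the found pages that are relevant. The page names and the function precision_recall are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved pages."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Suppose four pages on the Web are truly relevant to the topic.
relevant = {"p1", "p2", "p3", "p4"}

# A broad search finds all four, plus twelve irrelevant pages.
broad = precision_recall(["p1", "p2", "p3", "p4"] + ["x%d" % i for i in range(12)], relevant)

# A narrow search finds only one page, and it is relevant.
narrow = precision_recall(["p1"], relevant)
```

The broad search has high recall (1.0) but low precision (0.25), and the narrow search has the reverse. In practice you rarely know the full set of relevant pages, so these figures are a way of thinking about a search rather than something you can measure exactly.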
A search example using the 10 steps
Information on the World Wide
Web About Evaluating Resources
Evaluating and Verifying Resources
When we access or retrieve something on the Internet
we need to be able to decide whether the information is useful, reliable,
or appropriate for our purposes.
Guidelines
Who
is the author or institution? |
-
If the author is a person, does
the resource give biographical information?
-
If the author is an institution,
is there information provided about it?
-
Have you seen the author’s or
institution’s name cited in other sources or bibliographies?
-
The URL can give clues to the
authority of a source. A tilde ~ in the URL usually indicates that it is
a personal page rather than part of an institutional Web site.
|
How
current is the information? |
-
Is there a date on the Web page
that indicates when the page was placed on the Web?
-
Is it clear when the page was
last updated?
-
Is some of the information obviously
out-of-date?
-
Does the page creator mention
how frequently the material is updated?
|
Who
is the audience? |
-
Is the Web page intended for
the general public, scholars, practitioners, children, etc.? Is this clearly
stated?
-
Does the Web page meet the needs
of its stated audience?
|
Is
the content accurate and objective? |
-
Are there political, ideological,
cultural, religious, or institutional biases?
-
Is the content intended to be
a brief overview of the information or an in-depth analysis?
-
If the information is opinion,
is this clearly stated?
-
If there is information copied
from other sources, is this acknowledged? Are there footnotes if necessary?
|
What
is the purpose of the information? |
-
Is the purpose of the information
to inform, explain, persuade, market a product, or advocate a cause?
-
Is the purpose clearly stated?
-
Does the resource fulfill the
stated purpose?
|
Tips
-
Look for the name of the author
or institution at the top or bottom of a Web page.
-
Go to the home page for the
site that hosts the information to find out about the organization.
-
To find further information
about the institution or author use a search engine to see what related
information is available on the Web.
-
Use Deja News, http://www.dejanews.com,
or another tool to search archives of Usenet articles to find other information
about the author or institution, and in the case of an individual to see
what sorts of articles they’ve posted on Usenet.
-
Check the top and bottom of
a Web page for the date the information was last modified or updated. If
no date is present look at the Document Info if you’re using Netscape or
the Properties if you’re using Microsoft Internet Explorer.
Some
techniques you can apply to help with evaluation:
Who is the author or institution?
-
If the author is a person, does
the resource give biographical information?
Look for the name of the author or institution at the top or bottom of
a Web page.
-
If the author is an institution,
is there information provided about it?
Go to the home page for the site that hosts the information to find out
about the organization. You do this by extracting the first part of the
URL - the part starting with http:// up to the first slash (/).
-
The URL can give clues to the
authority of a source. A tilde ~ in the URL sometimes indicates that it
is a personal page rather than part of an institutional Web site.
-
Make note of the domain section
of the URL, as follows:
Domain | Description |
.edu | educational (anything from serious research to zany student pages) |
.gov | governmental (usually dependable) |
.com | commercial (may be trying to sell a product) |
.net | network (may provide services to commercial or individual customers) |
.org | organization (non-profit institutions; may be biased) |
-
Use search tools for Web pages
and Usenet postings (www.dejanews.com)
to learn more about the author/institution.
-
Use the WHOIS service at rs.internic.net
to determine the registrant of the Web site. Use the domain name, not
the URL. For example, to check the "Teen Violence" page at http://www.worldahead.org/wam/9807/w9807f1.html,
use worldahead.org for the WHOIS search.
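The URL techniques above (finding the home page, spotting a tilde, noting the domain, and picking the name to give to WHOIS) can be sketched with Python's standard urllib.parse module. The helper names are hypothetical, and dropping a leading "www." to get the WHOIS name is a simplifying assumption that will not be right for every site.

```python
from urllib.parse import urlsplit

def site_home(url):
    """The first part of the URL: scheme and host, up to the first slash."""
    parts = urlsplit(url)
    return "%s://%s/" % (parts.scheme, parts.netloc)

def whois_name(url):
    """Domain name to use for a WHOIS search (assumes only 'www.' needs removing)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def top_level_domain(url):
    """The domain section at the end of the host name, such as .edu or .org."""
    host = urlsplit(url).hostname or ""
    return "." + host.rsplit(".", 1)[-1] if "." in host else ""

def looks_personal(url):
    """True if the path contains a tilde, which may indicate a personal page."""
    return "~" in urlsplit(url).path

url = "http://www.worldahead.org/wam/9807/w9807f1.html"
home = site_home(url)
name = whois_name(url)
domain = top_level_domain(url)

# A made-up URL, used only to show the tilde check.
personal = looks_personal("http://www.example.edu/~student/page.html")
```

For the example page, the home page is http://www.worldahead.org/, the WHOIS name is worldahead.org, and the domain is .org. None of these checks settles whether a source is reliable; they only give clues to follow up.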
How current is the information?
-
Is there a date on the Web page
that indicates when the page was placed on the Web?
-
Is it clear when the page was
last updated?
-
If it's not clear from the Web
page, then click on View in the Netscape menu bar and select Page Info to see
if that tells when the page was last updated.
-
Is some of the information obviously
out-of-date?
-
Does the page creator mention
how frequently the material is updated?
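When a Web server reports a modification date, it usually does so in the HTTP Last-Modified header, and that is most likely what a browser's page information display draws on. As a minimal sketch under that assumption, the code below parses such a header value with the standard email.utils module and works out how old the page is. The header value, the comparison date, and the function name page_age_days are made-up examples.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def page_age_days(last_modified_header, now):
    """Whole days between the Last-Modified date and the moment 'now'."""
    modified = parsedate_to_datetime(last_modified_header)
    return (now - modified).days

# Hypothetical header value and hypothetical "today".
header = "Tue, 13 Oct 1998 12:00:00 GMT"
now = datetime(1998, 11, 12, 12, 0, 0, tzinfo=timezone.utc)
age = page_age_days(header, now)
```

Many servers do not send this header, and a recent date only shows that the file changed, not that its information was reviewed, so treat the result as one clue among several.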
Last Modified Tuesday, October
13, 1998.