The Internet is resplendent with technologies well developed from a computing perspective but incompletely applied from an information/library perspective. Consider field searching. The use of field searching in quality assessment and traversing the Internet is poorly understood. Consider URL interpretation. So much more can be found buried in the URL than just country and organizational type. Consider another technology: Forms.
A form is the simple box for entering information on a Web page. Forms are found everywhere and used widely as doorways to databases. Delightfully, forms are portable and adaptable by nature. With minimal fuss, we can move, limit, restrict, and generally personalize the forms to publicly accessible databases - making it possible to improve and speed research.
Forms allow us to enter information on a Web page. We type our search terms into a form every time we visit Yahoo! or Google. Forms are an integral part of the Hypertext Markup Language (HTML), so all browsers support them. The underlying HTML tags are simple to understand. There are just five. Forms themselves are simple. It is only in connecting forms to something else that it quickly becomes complicated.
HIDDEN INFORMATION REVEALED
The first opportunity involves breaking the barriers separating us from non-standard Web material. As the fine cataloguing work of Gary Price demonstrates [www.resourceshelf.com], there is a vast section of the Internet beyond the reach of the search engines. Colloquially known as the hidden Web, the invisible Web, or the deep Web, this region of the Internet generally is of higher quality and has tighter organization. A perhaps typical example would be the contents of the MEDLINE database - a collection of articles and abstracts about medical research. The information is there for all to see, but only after we negotiate the search function to find it.
Time has shown that large, free public databases can be enormously effective in distributing information. MEDLINE, ERIC, CRIS, LOCOC, EDGAR, SEDAR, US Patents, MOCAT: the list of large, commercial-quality databases goes on, and when funded externally they are very successful at reaching large audiences. These large publicly accessible databases are one of the few formats perfectly in tune with the Internet medium.
Unfortunately, their very existence as databases - accessed through a form - limits the way we can approach this information. To reach into and retrieve information from a database, we are restricted to the tools made available on the Web site hosting the database. As information professionals, we can do little more than point a patron to the database and exclaim: "Look! The information is in there somewhere."
Ah, but that is not quite accurate. If we can manipulate the forms to these databases, we can present patrons with a selection of just the material we think will interest them.
CONVERTING FORMS INTO LINKS
A simple approach to achieving this involves converting forms into static URLs. This is possible with forms that use Method=GET. Most notably, this includes the global search engines and MEDLINE.
For example, a search on Google.com for "form manipulation" will generate a Web page with the Web address of:
This page lists the top ten matches from the Google database with the words: form AND manipulation. If we were to visit this address, we would directly repeat this search. Make the above address the destination of a link, and we have a link to the results of this search. A search on a form thus converts into a static URL.
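As a rough sketch of such a link, the address below is assembled from the field names in the Google form quoted later in this article (hl, ie, oe, and q). The www.google.com host and the parameter order are assumptions, and Google may generate a different address.

```html
<!-- Illustrative only: query string built from the field names hl, ie, oe, and q.
     The host www.google.com is assumed. Spaces in the search become "+",
     and "&" is written as "&amp;" inside an HTML attribute. -->
<a href="http://www.google.com/search?hl=en&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=form+manipulation">
  Google results for: form manipulation
</a>
```

Placed on any Web page, a link of this kind reruns the search each time it is clicked, so the results stay current even though the link itself never changes.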
My exploration of forms received a major boost from an article by Sunny Worel ("Integrating Medical Information into Web pages." EContent, August/September 1999) describing how MEDLINE information could be integrated more closely into Web pages. When the MEDLINE Database became available through PubMed, it became possible to craft a Web address that would point directly at a specific MEDLINE record. MEDLINE article abstracts can be treated just like Web pages. Link to them, frame them, and reference them, even though they are buried in a database.
URLs can also be created for specific PubMed searches. Like the Google search above, an important search can be cemented into a link. Anyone who clicks the link would repeat the search. As an elegant demonstration, a form of Current Awareness Service can be crafted: "Show me all the articles on corneal transplants in the last year." Such a search can be converted into a URL then used again a month or a year later.
MEDLINE allows for this kind of manipulation. The PubMed interface was designed with this in mind. Interestingly, MEDLINE is not alone. This interconnectedness is more a feature of the Internet than of MEDLINE. Almost all Method=GET databases can be manipulated in this manner.
We can do better. Many of our actions on the Internet are repetitive. If these actions are remotely related to a form, then moving forms offers a way to shorten and speed a search. We can often shave two or three steps from the research process. We can embed numerous related forms into one single page for our convenience. We can surround a form with the information needed to use the tool effectively.
Consider a search of Google. If we could move the Google search form, we could place it on our own page, residing on our own computer. We would not need to wait the half-second for the page to return from Google. More significantly, on our page we can add information we need to effectively search Google; perhaps a reminder of Google's hidden field search terms.
A form starts with an HTML <form> tag. It ends with a </form> tag. The <form> tag has two primary attributes: a method (either GET or POST) and an action (a pointer to a Web address).
A basic form looks like this:
<form method=GET action=???>
There are a few simple rules. 1) Forms can't overlap. We must close one form before starting another. 2) GET and POST are different methods of sending information: GET appends the form data to the Web address, while POST sends it in the body of the request. The difference matters because only a GET form leaves its search visible in the URL. 3) The action=address is the location of the program that will interpret the information we are sending.
What information is sent? Another tag, called an <input> tag, comes in a variety of different flavors depending on how it is displayed on the Web page. We have the <input type=text> for a textbox. We also have the <input type=radio> for radio buttons, <input type=checkbox> for checkboxes, and then of course the <input type=submit> for a button that triggers the sending of the information to wherever it is going. One further input box is special: the <input type=hidden>. It holds information hidden from view, meaning not displayed on the Web page.
Each input tag has a name. Think of this as the variable name. Each input tag may have further attributes: perhaps a size (how long a textbox do we want), a checked flag (which radio button do we want selected at the start), or a value (is there a word already in our textbox?). Don't concern yourself with the numerous additional attributes. Most are cosmetic or self-explanatory. Greater help with form tags can be found at "HTML 2.0: Forms and Obscurities" [www.cwru.edu/help/interHTML/toc.html].
Two other tags have a slightly different construction. The multiple line textbox looks like this: <textarea name= rows= cols=> </textarea>. The select box, where we select from a list of existing values, looks like this: <select name=> <option> <option> </select>.
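To see the tags described above working together, here is a minimal sketch of a form that uses each of them once. It is not taken from any real site: the action address and every field name (lang, query, scope, recent, sort, notes) are placeholders chosen for illustration.

```html
<!-- Hypothetical form: the action address and all field names are placeholders. -->
<form method="get" action="http://www.example.com/cgi-bin/search">
  <!-- Hidden field: sent with the form but never displayed -->
  <input type="hidden" name="lang" value="en">

  <!-- Textbox, 30 characters wide, starting empty -->
  <input type="text" name="query" size="30" value="">

  <!-- Radio buttons share one name; "checked" marks the starting choice -->
  <input type="radio" name="scope" value="titles" checked> Titles only
  <input type="radio" name="scope" value="all"> Full text

  <!-- Checkbox: its name and value are sent only when it is ticked -->
  <input type="checkbox" name="recent" value="yes"> Last 12 months only

  <!-- Select box: choose one value from an existing list -->
  <select name="sort">
    <option value="date">Newest first</option>
    <option value="rank">Best match</option>
  </select>

  <!-- Multiple-line textbox -->
  <textarea name="notes" rows="3" cols="40"></textarea>

  <!-- Submit button: sends every named field to the action address -->
  <input type="submit" value="Search">
</form>
```

Typing corneal transplant into the textbox and pressing Search, with everything else left as it starts, would request http://www.example.com/cgi-bin/search?lang=en&query=corneal+transplant&scope=titles&sort=date&notes= from the placeholder server. The unticked checkbox and the unnamed submit button contribute nothing to the address.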
Here is the form for Yahoo!, as retrieved in January 2003.
<form name=sf1 style="margin-bottom:0;margin-top:0"
<input size=30 name=p>
<input type=submit value=Search>
</form>
Don't worry about the style= on line 1. It's cosmetic. We may also notice there is no <form method=???. When not defined, it defaults to GET. Similarly when <input type=??? is not defined, as on line 2, it is assumed to be a textbox. Thus, we have a simple form starting with a <form action=???>, including a textbox (named 'p' and 30 characters long), a submit button (titled: Search) and an end to the form.
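The two defaults just mentioned can be seen by writing the same kind of form both ways. In this sketch the action address is a placeholder, not Yahoo!'s real one. Only the field name p, the size, and the button label are borrowed from the form above.

```html
<!-- Terse: no method and no input type stated, so the browser
     assumes method=GET and type=text -->
<form action="http://www.example.com/search">
  <input size=30 name=p>
  <input type=submit value=Search>
</form>

<!-- Explicit: the same form with both defaults spelled out -->
<form method="get" action="http://www.example.com/search">
  <input type="text" size="30" name="p">
  <input type="submit" value="Search">
</form>
```

A browser should treat the two forms identically. Spelling out the defaults simply makes a copied form easier to read later.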
Keep in mind these tags are squeezed between other tags defining a table, some images, and perhaps a few words. I had to remove this unrelated information. To view the HTML of only the form, open the Web page with the form in Internet Explorer. Select the View drop-down menu, then select Source. The HTML for the page opens in Notepad. Delete everything above the <form> tag and everything below the </form> tag, then manually delete anything in between that is not part of the form. The HTML of the form remains.
The form for the Google search, as found on Google.com, looks like this:
<form action="/search" name=f>
<input type=hidden name=hl value=en>
<input type=hidden name=ie value="UTF-8">
<input type=hidden name=oe value="UTF-8">
<input maxLength=256 size=55 name=q value="">
<input type=submit value="Google Search" name=btnG>
<input type=submit value="I'm Feeling Lucky" name=btnI>
</form>
Again there is some less-than-critical information. Basically it starts with a <form action="???"> (line 1), proceeds to three hidden variables (lines 2, 3, and 4), a textbox (line 5), and then two submit buttons (lines 6 and 7) aptly called "Google Search" and "I'm Feeling Lucky". Then the form ends.
The form is a simple technique for communicating information from a Web page to a computer program. To move a form, all we must do is tease out the elements of the form from the HTML page then - very importantly - add in the destination domain, which is often left out. Thus, if it reads <form action=/search>, we replace it with <form action=http://google.com/search>. When the action=address is relative, we must make it absolute.
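Putting this together, a moved copy of the Google form quoted above might sit on a page of our own like the sketch below. The page title and the reminder text are illustrative additions, and the three field-search prefixes shown are only a sample. Google may since have changed its form, so treat this as a pattern rather than a guaranteed working copy.

```html
<!-- A local page holding a copy of the Google form quoted above.
     Only the action has changed: /search became an absolute address. -->
<html>
<head><title>My search page</title></head>
<body>
  <!-- Our own reminder text; replace it with whatever notes help -->
  <p>Field searches: intitle:word, inurl:word, site:example.com</p>

  <form action="http://google.com/search" name=f>
    <input type=hidden name=hl value=en>
    <input type=hidden name=ie value="UTF-8">
    <input type=hidden name=oe value="UTF-8">
    <input maxLength=256 size=55 name=q value="">
    <input type=submit value="Google Search" name=btnG>
    <input type=submit value="I'm Feeling Lucky" name=btnI>
  </form>
</body>
</html>
```

Saved as a file on our own computer and opened in a browser, the page displays instantly, and pressing either button sends the search to Google exactly as the original form would.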
For a time, I set about embedding numerous forms as part of an effort to explore and present guidance on Internet research from the perspective of information research. The Spire Project once included 25 articles, each teasing out the better tools required to accomplish particular searches. Over 125 embedded forms brought similar tools together, each sunk into discussion about when to use specific tools and how to search most effectively. Further details are at "The Spire Project: innovative gateway on the process of finding information", The New Review of Information Networking, Volume 6, 2000.