Enhancing the User Experience by Capitalizing on Link Traversal

Presented at SIGIR 98.

Andrea Michalek and Douglas Grundman

July 1998

Introduction

Hypertext information retrieval systems interact with users through two principal mechanisms: a search mechanism, wherein users give the system information describing document attributes and the system responds with documents, and a traversal mechanism, wherein users follow links that bind documents together in the hypertext corpus. These two mechanisms were developed separately, and historically have been treated independently.

More recently, with the advent of the World Wide Web, work has been done to try to merge aspects of these two functions in order to improve users’ experiences. Most of this work involves analyzing links and the directed graph of documents that forms a hypertext corpus in order to extract more information about the documents; this is typically done with an eye toward improving searching.

In this paper, we take the opposite approach: we use information gleaned from an analysis of documents and queries to improve the experience of hypertext traversal. The simplest example is the highlighting of query words in documents that result from a search, but there are many others.

Search Information vs. Link Information

A user interacting with a hypertext information retrieval system accesses documents through two main methods – by searching for them and by following links to them. Accordingly, we view such a system as having two distinct kinds of navigational information that people can make use of. These are search information and link information.

The searching method of accessing documents consists of submitting a query and receiving a list of documents that satisfy it. Search information is the information that is provided by a user in order to perform a search over a corpus. People use search information by submitting it as a query with a goal of accessing documents identified by that information. The hypertext information retrieval system sees search information as a parameter of a "search" function that operates over some set of documents. The output of that search function is a collection of links to documents that meet criteria defined by the query.

The link traversal method of accessing documents consists of a user’s clicking on a link and receiving the document to which that link refers. Link information is defined as references within documents to the same or other documents – in other words, pointers within a hypertext data structure. Link information is the input parameter to a "graph traversal" function that operates over the directed graph defined by the body of hypertext. The output of the traversal function is a document.

There is a duality between these two functions that underlie the usage of hypertext systems: the search function takes nodes and generates links; the traverse function takes links and generates nodes. It follows that if link information can be used to improve a user’s search experience, then search information should be exploitable to improve a user’s traversal experience.
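
The duality can be made concrete with a small sketch. The toy corpus, the function names search and traverse, and the exact-word matching rule are all assumptions made for illustration; they do not describe any particular system.

```python
from typing import Dict, List

# A toy hypertext corpus: document id -> text and outgoing links.
# The contents are invented purely for illustration.
CORPUS: Dict[str, dict] = {
    "a": {"text": "link traversal in hypertext", "links": ["b"]},
    "b": {"text": "search engines rank documents", "links": ["a", "c"]},
    "c": {"text": "hypertext search and traversal", "links": []},
}

def search(query: str) -> List[str]:
    """Search function: examines documents (nodes) and yields links."""
    terms = query.lower().split()
    return [doc_id for doc_id, doc in CORPUS.items()
            if any(t in doc["text"].split() for t in terms)]

def traverse(link: str) -> dict:
    """Traversal function: takes a link and yields a document (node)."""
    return CORPUS[link]

# Searching produces links; following each link produces a document.
links = search("hypertext")
documents = [traverse(link) for link in links]
```

Here search consumes documents and emits links, while traverse consumes a link and emits a document, which is the symmetry the argument relies on.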

Improving the User Experience

Search information can be used to assist a user when traversing links. Under the assumption that the user submitting a query is interested in documents that are returned because of their relevance to the query, the following improvements to the user experience can be made:

  1. Exposing the reasons any particular document is returned from the search engine. Documents in a result list are selected by a search engine for various reasons. Displaying those reasons to the user can be helpful in permitting a quick evaluation of the value of the document. When displaying a result list, a single relevancy score per document is commonly used to convey this information. This score can easily be expanded to include per-query-term relevancy information. The concept of passing search information to the link traversal function can be exploited to achieve the next level of functionality along this path. When a user selects a document from the result list, the original query terms can be called out to expose the same hit information at an additional level of granularity – not only which terms hit and how many times, but also where they hit.
  2. Showing the relevant part of a long document without user intervention. Documents can be very long. If a system can locate the paragraphs within a document that best match the given query and allow the user to navigate to those positions quickly and easily, the user may be able to extract what he or she finds to be the most relevant information from the document very quickly.
  3. Illuminating related metadata. It can be very useful to display metadata about a document that is relevant to the query that led to the document. For example, it is possible to build a system that knows about person names and about associations among them. If a document is located because of a query that contains the name of a person, it might be very interesting to highlight the names of other people related to the one named in the query. The same concept can be extended to other types of metadata.
  4. Propagating search information along subsidiary links. If search information is useful in illuminating a given document, it should also be useful for illuminating documents to which the first one links. The value provided by the above three improvements can be extended through the entire hypertext network that is rooted at the original document.
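
As an illustration of the first improvement, the following minimal sketch records, for each query term, where it occurs in a document, so that both a per-term count and the hit locations are available. The function name term_hits and the use of exact, case-insensitive word matching are assumptions; a real system would presumably also handle linguistic variants.

```python
import re
from typing import Dict, List

def term_hits(text: str, query_terms: List[str]) -> Dict[str, List[int]]:
    """Map each query term to the word offsets at which it occurs in text."""
    words = re.findall(r"\w+", text.lower())
    wanted = {t.lower() for t in query_terms}
    hits: Dict[str, List[int]] = {t: [] for t in wanted}
    for offset, word in enumerate(words):
        if word in wanted:
            hits[word].append(offset)
    return hits

hits = term_hits("Links bind documents. Documents contain links.",
                 ["documents", "links"])
# The number of hits per term is len(hits[term]); the offsets say where.
```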

As the user traverses the hypertext network originating at a particular document, the search information that led to the first document becomes less important and can be de-emphasized. Additional information can be gleaned from the text of the link(s) being traversed, and after several hops the original search information can be phased out.
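
One way to model this de-emphasis is to attach a weight to each term and attenuate it on every hop. The sketch below is only an assumption about how that might be done: the decay factor, the cutoff, and the function name propagate are hypothetical, and no specific scheme is prescribed here.

```python
from typing import Dict, List

DECAY = 0.5       # assumed per-hop attenuation of inherited terms
THRESHOLD = 0.2   # assumed cutoff below which a term is phased out

def propagate(weights: Dict[str, float],
              anchor_terms: List[str]) -> Dict[str, float]:
    """Return the term weights to carry across one link traversal.

    Inherited terms are attenuated and dropped once they fall below the
    cutoff; words from the text of the traversed link enter at full weight.
    """
    carried = {t: w * DECAY for t, w in weights.items()
               if w * DECAY >= THRESHOLD}
    for term in anchor_terms:
        carried[term.lower()] = 1.0
    return carried

w = {"hypertext": 1.0}            # weights from the original query
w = propagate(w, ["traversal"])   # hop 1: link text adds "traversal"
w = propagate(w, [])              # hop 2
w = propagate(w, [])              # hop 3: "hypertext" has been phased out
```

With these assumed constants, a query term survives two hops and disappears on the third, while terms picked up from link text start their own decay from the hop on which they were added.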

Technologies that Enable Implementation

Straightforward techniques can be used to exploit search information to improve the user’s traversal experience. To accomplish this, the traversal function must be extended to handle any search information that will be exposed. Typically, the traversal function is implemented by simple hyperlinks to documents, but the dynamic nature of the exposure of search information requires an active mechanism. This is most easily provided on the Web by Common Gateway Interface (CGI) programs.

To accomplish the four enhancements described in the previous section, static hyperlinks are wrapped by a CGI that implements the desired functionality. Thus on the Web, instead of having a simple URL, each hyperlink reference to a document is converted to a reference to a CGI that accepts the original URL as well as the desired search information as arguments. This CGI would fetch the referenced document and output HTML marked up to display the additional search information. This amounts to passing the search information to the traversal function (which is now implemented via the CGI) by annotating the link. The traversal function (CGI) uses the annotation to generate a marked-up representation of the document that displays the enhancements to the user. This same approach can be used if the original URL is a reference to a CGI as well.
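
The link annotation can be sketched as follows. The CGI path /cgi-bin/view and the parameter names url and q are hypothetical; only the idea of carrying the original URL and the search information as arguments comes from the description above.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

VIEWER = "/cgi-bin/view"  # hypothetical path of the wrapping CGI

def annotate_link(url: str, query_terms: List[str]) -> str:
    """Wrap a document URL in a CGI URL that also carries the query terms."""
    return VIEWER + "?" + urlencode({"url": url, "q": " ".join(query_terms)})

def read_annotation(wrapped: str) -> Tuple[str, List[str]]:
    """Inside the CGI: recover the original URL and the query terms."""
    params = parse_qs(urlsplit(wrapped).query)
    return params["url"][0], params["q"][0].split()

wrapped = annotate_link("http://example.com/doc.html", ["link", "traversal"])
original_url, terms = read_annotation(wrapped)
```

The result list would emit wrapped instead of the plain document URL, and the CGI would apply the same rewriting to links inside the document it serves if search information is to be propagated further.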

Note that enhancing the traversal function in this way is dual to the way that link information can be used to enhance the search function. In both cases, the function is extended to accept additional arguments necessary for the extended functionality.

The starting place for implementation must be a dynamically generated HTML page. This requirement fits well with a traditional information retrieval system. The first result list in response to a given query is already dynamically generated by design.

Example Implementation – Calculating the Best Part of a Document

The Electric Library research product (http://www.elibrary.com) takes advantage of this type of information passing with its "Best Part" feature. The question the user asked to generate a result list is passed via a link annotation to each document the user views. The HTML document that the user will ultimately view is augmented with data about how the user arrived at the document. That augmentation is performed by the Electric Library’s "Best Part" module.

The Best Part module is handed a document and two sets of words. One set of words contains the original query terms. The second set of words contains other related words that may be of interest to the user. The module’s primary function is to locate all places in the document where any of those words (or permissible linguistic variants thereof) occur. We term all such occurrences matches. Matches are termed hits if the words they match are part of the query. The Best Part module’s second function is to distinguish one hit as special. That hit is called the Best Part of the document; it is the hit closest to the region of highest hit density in the document.
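
The following is a minimal sketch of one way such a computation could work, not the Electric Library implementation. It assumes exact word matching (no linguistic variants), measures density with a fixed-width window of words, and takes the mean position of the hits in the densest window as the centre of the region; the window width and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

WINDOW = 50  # assumed width, in words, of the region used to measure density

def find_matches(text: str, query_terms: List[str],
                 related_terms: List[str]) -> Tuple[List[int], List[int]]:
    """Return (hits, other_matches) as sorted word offsets."""
    words = re.findall(r"\w+", text.lower())
    query = {t.lower() for t in query_terms}
    related = {t.lower() for t in related_terms} - query
    hits = [i for i, w in enumerate(words) if w in query]
    others = [i for i, w in enumerate(words) if w in related]
    return hits, others

def best_part(hits: List[int]) -> Optional[int]:
    """Pick the hit nearest the centre of the densest WINDOW-word region."""
    if not hits:
        return None
    best_group: List[int] = []
    for start in hits:  # candidate regions begin at each hit
        group = [h for h in hits if start <= h < start + WINDOW]
        if len(group) > len(best_group):
            best_group = group
    centre = sum(best_group) / len(best_group)
    return min(best_group, key=lambda h: abs(h - centre))

hits, others = find_matches("The cat saw a dog. The dog ran.", ["dog"], ["cat"])
chosen = best_part([3, 200, 210, 220, 900])
```

Only hits feed best_part; matches on related words are located but kept separate, so they can be displayed without influencing which location is chosen.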

A separate module takes a document and the information provided by the Best Part module, and post-processes the document’s HTML to expose the Best Part information to the user. The hits are highlighted in the document and the Best Part is further emphasized to make it stand out from the other hits. A button is provided that contains an intra-document link to the Best Part; this allows the user to jump directly to the section of the document that is most relevant to the submitted query. In addition, intra-document links are added that lead from each hit to the next (and to the previous); this traversal mechanism allows the user to navigate from one hit to the next easily.
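
A simplified version of this post-processing step is sketched below. It assumes plain-text input rather than existing HTML, exact word matching, and hypothetical anchor names of the form hit0, hit1, and so on; the choice of tags for highlighting and emphasis is likewise an assumption.

```python
import html
import re
from typing import List

def mark_up(text: str, query_terms: List[str], best_index: int) -> str:
    """Return HTML with hits highlighted and chained by prev/next links.

    best_index is the 0-based ordinal of the hit to emphasise as the
    Best Part; a jump link to it is placed at the top of the output.
    """
    query = {t.lower() for t in query_terms}
    tokens = re.findall(r"\w+|\W+", text)  # words and the gaps between them
    total = sum(1 for t in tokens if t.lower() in query)
    out = [f'<a href="#hit{best_index}">Best Part</a> '] if total else []
    n = 0
    for tok in tokens:
        if tok.lower() not in query:
            out.append(html.escape(tok))
            continue
        tag = "strong" if n == best_index else "b"
        prev_link = f'<a href="#hit{n - 1}">&lt;</a>' if n > 0 else ""
        next_link = f'<a href="#hit{n + 1}">&gt;</a>' if n < total - 1 else ""
        out.append(f'{prev_link}<{tag} id="hit{n}">'
                   f'{html.escape(tok)}</{tag}>{next_link}')
        n += 1
    return "".join(out)

page = mark_up("dog cat dog", ["dog"], best_index=1)
```

Each hit carries its own anchor, so the jump link and the previous/next links are ordinary intra-document references that need no script support.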

In addition to query terms, matches resulting from linguistic metadata analysis of documents may be used to further augment document viewing. These words can result from analysis of the document in light of the query and/or our preconceived notions of what might be important (for example, names of people or places), or from other analyses. One type of such data that we’ve found to be interesting is derived from a "what’s common" analysis of the entire result set stemming from a search. This type of metadata is called Recurring Themes™ on the Electric Library products because it captures the prominent concepts that tie a result set together. Recurring Themes™ such as people, places, and subjects are detected by another Electric Library module and are used to enhance the user experience in various ways; typically, users are given the option of limiting the result set based on the inclusion of a user-selectable Recurring Theme™. The selected Recurring Theme™ can then be passed over document links to the Best Part and HTML markup modules for appropriate treatment. Each match of a Recurring Theme™ word is highlighted in the document and included in the traversal mechanism provided for hits. However, since matches from Recurring Themes™ are of lesser importance than hits from the query terms, these matches are not included in the Best Part computation. This allows the document to be presented to the user in a way that relates both to the initial question that was asked and to how the user limited the result set.

Conclusion

In a dynamic hypertext-based information retrieval system, significant value can be added by capturing relevant search-related data and using it to enhance the hypertext traversal process by passing it along the links that a user traverses. The documents themselves can behave as more than static pieces of content; each can be dynamically enhanced based upon what information is carried over links leading to it. This model can be used any place that a user is traversing a set of web pages that result (possibly indirectly, through subsidiary hyperlinks) from a search.

