CHWP B.13 Tompa, "Experiences with the OED"

1. Background

The Oxford English Dictionary (OED, 1989) is the largest dictionary of written English: it contains over 290,000 entries and occupies 20 printed volumes or 570 Mbytes of computer storage. Unlike most other dictionaries, the OED is based "on historical principles". It therefore includes entries for all obsolete words, extensive etymological information, all historical written forms, the chronological development of all word senses, and extensive citations giving historical evidence for each sense of each word (Murray, 1979; Berg, 1991) (Figure 1).

In May 1984, the Oxford University Press announced The New OED Project. In Phase 1, the Press was to capture the text of the original twelve-volume OED and its four-volume Supplement in machine-readable form, integrate the two parts into one unified work, and publish the resulting dictionary. For this phase, the Press tendered a contract to the International Computaprint Corporation to capture the text; the University of Waterloo assisted the Press in converting the text from its data-capture form to a form more suitable for subsequent processing; and IBM United Kingdom Limited donated computing equipment, software, and personnel to aid the integration effort. The Press's lexicographers added new material to the text and made some revisions, with assistance from the University of Oxford Phonetics Laboratory. Filmtype Services Ltd. was contracted to set the type for the book, which was manufactured by Rand McNally & Company. The publication of the second edition in March 1989 marked the completion of this phase.

Subsequent phases of the project include plans for the Press's lexicographers to update, revise, and enhance the data that constitute the OED. The agreement struck between the Press and the University of Waterloo in 1984 promises continuing cooperation in developing the OED database, with Waterloo designing and implementing a suitable database system for managing the text. As a result, the Waterloo Centre for the New OED and Text Research has taken responsibility for pursuing research in text database management and its applications to a wide spectrum of text databases, the OED in particular.

In this report, we present an overview of the database design and software developed at the University of Waterloo for providing access to the OED. Readers interested in the conversion of the Dictionary from the original twelve volumes to the current form or in an overview of the Press's and Waterloo's expectations for the electronic Dictionary should refer to previous publications addressing these topics (Weiner, 1985; Stubbs & Tompa, 1988; Benbow et al., 1990).

