CHWP A.3 Winder, "Reading the text's mind"

3. Consultation and the meaning of lemmatisation

3.1 The structure of lemmatisation

In terms of these three worlds, lemmatisation might be described as a mapping, conditioned by tones, from a domain of tokens onto a range of types. That mapping can be viewed from several points of view.

3.1.1 Tag insertion

From a practical and computational point of view, a lemmatisation is realised by inserting a tag into a text. Thus, given the textual segment "...a...", a lemmatisation is effected when the lemma is inserted into the segment and marked appropriately as a tag; "...A{a}..." might be the resulting sequence for this example, where the tag "A{}" is inserted into the original segment around its replica "a".
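As a minimal sketch of tag insertion, the following Python function wraps each whole-word occurrence of a replica in a tag of the form LEMMA{replica}. The function name insert_tag and the sample sentence are illustrative assumptions, not part of the original model; only the "A{a}" notation comes from the text.

```python
import re

def insert_tag(segment: str, replica: str, lemma: str) -> str:
    """Wrap each whole-word occurrence of `replica` in LEMMA{replica}.

    Hypothetical helper: the tag syntax follows the "A{a}" notation
    used in the text; everything else is an illustrative assumption.
    """
    pattern = re.compile(r"\b" + re.escape(replica) + r"\b")
    # Doubled braces in the f-string produce literal "{" and "}".
    return pattern.sub(lambda m: f"{lemma}{{{m.group(0)}}}", segment)

tagged = insert_tag("she has gone home", "gone", "GO")
print(tagged)  # she has GO{gone} home
```

The source words are left in place; the tagged string is a new text generated from the old one.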

The kind of information the tagging conveys can be grammatical (as in "GO{gone}"), semantic ("SPHERE{ball}"), connotative ("DEATH{raven}") or other. The linguistic unit that is tagged is equally varied. It could be a morpheme, a word, a sentence or any other textual unit; titles are in fact lemmata of whole texts. Furthermore, tagging can form hierarchies: "SPHERE{NOUN{ball}}" would be an example of a complex lemmatisation.
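A complex lemmatisation such as "SPHERE{NOUN{ball}}" can be built and taken apart mechanically. The sketch below assumes that tags nest strictly around a single replica and that lemmata contain no braces or spaces; the names nest and unnest are hypothetical.

```python
import re

def nest(replica: str, lemmata: list[str]) -> str:
    """Wrap `replica` in tags, innermost lemma first.

    nest("ball", ["NOUN", "SPHERE"]) gives "SPHERE{NOUN{ball}}".
    """
    tagged = replica
    for lemma in lemmata:
        tagged = f"{lemma}{{{tagged}}}"
    return tagged

# One outer tag: a lemma without braces or spaces, then {...} to the end.
TAG = re.compile(r"^([^{}\s]+)\{(.*)\}$")

def unnest(tagged: str) -> tuple[list[str], str]:
    """Peel tags from the outside in; return (lemmata, replica)."""
    lemmata = []
    match = TAG.match(tagged)
    while match:
        lemmata.append(match.group(1))
        tagged = match.group(2)
        match = TAG.match(tagged)
    return lemmata, tagged

complex_tag = nest("ball", ["NOUN", "SPHERE"])
print(complex_tag)          # SPHERE{NOUN{ball}}
print(unnest(complex_tag))  # (['SPHERE', 'NOUN'], 'ball')
```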

In the model we are developing, tag insertion is the minimal interpretative act. It is the basis for generating from the source text a new text, a new attestation.

3.1.2 Reference of tags

That is the operational side of lemmatisation, but what are the consequences of that tagging? When we consult the dictionary we expand the source text; at the position of a given replica, we virtually insert the text of the definition. Our reading passes from the source text to the dictionary and back again. The two texts have been woven together, much as the passages of a hypertext are woven together by jumps. The reference from the lemma to the replica serves thus as the basis of a more general reference between two texts.

When we consult the dictionary, we are virtually mixing two texts to generate a third, hybrid text. The third text is in some sense more readable, i.e. it has more meaning than the first. That surplus of meaning is not simply poured from the dictionary into the source text, but rather develops from the reaction of the two.
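The weaving of source text and dictionary into a hybrid text can be sketched as follows. The example assumes single-level tags in the LEMMA{replica} notation of section 3.1.1; the function consult, the bracketed layout of the expansion, and the one-entry toy dictionary are all hypothetical illustrations.

```python
import re

# Single-level tags only: LEMMA{replica}.
TAG = re.compile(r"(\w+)\{(\w+)\}")

def consult(tagged_text: str, dictionary: dict[str, str]) -> str:
    """Insert the definition of each lemma at the position of its replica."""
    def expand(match: re.Match) -> str:
        lemma, replica = match.group(1), match.group(2)
        definition = dictionary.get(lemma)
        if definition is None:
            return replica  # no entry: the replica stands alone
        return f"{replica} [{definition}]"
    return TAG.sub(expand, tagged_text)

# Toy dictionary, invented for the example.
toy_dictionary = {"GO": "to move from one place to another"}

hybrid = consult("she has GO{gone} home", toy_dictionary)
print(hybrid)  # she has gone [to move from one place to another] home
```

The lemma itself disappears from the output: it serves only as the link along which the two texts are joined.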

The derivative text may be informationally richer than the source text, since it is often an expansion of the original text, but that is not a necessary condition for an increase in meaningfulness. It is more accurate to say that we recognise the tones in the hybrid text whereas we did not recognise them in the original text. This is particularly evident in the case where the dictionary expansion is taken from a bilingual lexicon, since the target word is not intrinsically richer than the source word. It simply belongs to a different language, or value system.

3.1.3 Naming

Finally, lemmatisation is a kind of naming. Though a tag is both paradigmatically and syntagmatically related to the replica (it sits "beside" and "above" the replica), lemmata represent their replicas metalinguistically.

What is the effect of naming? The simplest effect is to mark a position in the text, much as a library call number marks a position in a library. Named items can be manipulated on the level of their names, and not as complete units, which gives the name its peculiar efficiency.
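The call-number effect of naming can be illustrated with a small index that maps each lemma to the positions of its replicas, so that items can be retrieved by name alone. This is only a sketch: it assumes whitespace-separated words and single-level LEMMA{replica} tags, and the name lemma_index and the sample string are hypothetical.

```python
import re
from collections import defaultdict

TAG = re.compile(r"(\w+)\{(\w+)\}")

def lemma_index(tagged_text: str) -> dict[str, list[int]]:
    """Map each lemma to the word positions (from 0) of its replicas."""
    index = defaultdict(list)
    for position, word in enumerate(tagged_text.split()):
        match = TAG.fullmatch(word)
        if match:
            index[match.group(1)].append(position)
    return dict(index)

sample = "the DEATH{raven} saw the SPHERE{ball} and the DEATH{raven}"
print(lemma_index(sample))  # {'DEATH': [1, 7], 'SPHERE': [4]}
```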

But lemmata are not empty deictics; they have meaning in their own right. A lemma is more like a title in that it interacts with what it names. A title is an interpretative instruction, a sign-post for the reader's interpretation of the text. It thematises and systematises what it represents. A lemma represents an interpretative position, saying "this is what the interpreter calls an X". Thus there is a reciprocal relation between the lemma and what it names: the lemma derives its value from what it names and what it names is defined by the lemma.

3.2 Attestation generation

Lemmatisation is thus at the heart of several fundamental procedures: hypertextual linking of texts, generation of new texts, and creation of a surplus of meaning through naming.

It is also at the root of the dilemma that Lusignan pointed out. Whether it is done manually, interactively, or automatically, lemmatisation fixes the countable, objective, textual token between two worlds of essentially uncountable, subjective (i.e. non-token-like), co-textual tones and types. Laws are not discrete units, nor are qualities. One law or quality blends seamlessly into its neighbour, and because there is no limit to the shades of qualities and laws, early research was understandably lost on a sea of possible distinctions.

Though it exists of course in other fields, the problem of non-discrete tones and types is particularly severe for computational criticism, since computational critics necessarily take an extremely materialist view of the text, one that places the token at the centre of their methodology. Computational critics also have at their disposal tools that are powerful enough to reveal in detail a vast spectrum of textual tones and types.

These aptitudes and constraints lead computational critics to adopt a particular approach to meaning in which quotations play the central role. Their methodology must ultimately be based on procedures of lemmatisation similar to those that are used when consulting a dictionary.

Interpretation in computational criticism is implemented by generating a derivative text that has more meaning than the source text. The derivative text can be called a quotation, since it is a reorganised segment of the source text. A quotation in this extended sense is novel, however, in that it may contain words that are not in the source text (though editorial emendations are permitted even in orthodox quotations), and may not have the form of ordinary text at all; it remains quotation-like because it is purposefully derived from the source text. Thus a frequency list, a concordance, and a distribution graph are all "quotations" in this extended sense.
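Two of the derivative texts just mentioned, the frequency list and the concordance, can be generated with a few lines of Python. The sketch below is one possible implementation under simple assumptions: lower-cased alphabetic tokens, a keyword-in-context window of two words, and a three-sentence sample invented for the example. The function names are hypothetical.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lower-cased word tokens; a deliberately crude tokeniser."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text: str) -> list[tuple[str, int]]:
    """Word types with their token counts, most frequent first."""
    return Counter(tokens(text)).most_common()

def concordance(text: str, keyword: str, width: int = 2) -> list[str]:
    """Each occurrence of `keyword` with `width` words of context per side."""
    toks = tokens(text)
    lines = []
    for i, tok in enumerate(toks):
        if tok == keyword:
            left = " ".join(toks[max(0, i - width):i])
            right = " ".join(toks[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Sample source text, invented for the example.
source = "The raven sat. The raven spoke. Nothing more."

print(frequency_list(source)[:2])    # [('the', 2), ('raven', 2)]
print(concordance(source, "raven"))
# ['the [raven] sat the', 'sat the [raven] spoke nothing']
```

Neither output has the form of ordinary text, yet both are derived from the source by a fixed, repeatable procedure.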

A quotation has a particular interpretative force because it can be pointed at like a token (it is an instance), yet at the same time has the value of a type (it is generated in a "legal" fashion) and displays the qualities of tones (it has its own feel, its own "air de famille"). We will call the special quotations of computational criticism attestations.

The generation of attestations is analogous to the experimentation of chemists. Chemists too must deal with the continuity of qualities and laws found in nature. Meaningful units are generated through experiments in which samples are combined with known substances through standard procedures. Peirce gives a good example of how the terminology, meaning, and methods of a science are joined through experimentation:

If you look into a textbook of chemistry for a definition of lithium, you may be told that it is that element whose atomic weight is 7 very nearly. But if the author has a more logical mind he will tell you that if you search among minerals that are vitreous, translucent, gray or white, very hard, brittle, and insoluble, for one which imparts a crimson tinge to an unluminous flame, this mineral being triturated with lime or witherite rats-bane, and then fused, can be partly dissolved in muriatic acid; and if this solution be evaporated, and the residue be extracted with sulphuric acid, and duly purified, it can be converted by ordinary methods into a chloride, which being obtained in the solid state, fused, and electrolyzed with half a dozen powerful cells, will yield a globule of a pinkish silvery metal that will float on gasolene; and the material of that is a specimen of lithium. (¶2.330, quoted in Eco 1980: 86)

Peirce's definition of lithium is unique in that there is no attempt to establish an exhaustive inventory of the qualities that lithium may possess. There is no assumption that the definition can be substituted for the defined. Rather, the definition is a recipe for "baking up" some lithium, which the chemist may follow to come into direct existential contact with a replica of lithium.

I would like to suggest that chemists and computational critics share essentially the same methodology on this point: the extracted token of lithium has the same role as an attestation, and on the other hand, textual attestations are produced experimentally in the computer through the reactions of a source text in standard algorithms. In other words, textual attestations are kinds of precipitates and chemical precipitates are a kind of attestation.[6]

What is truly distinctive about computational interpretation is that its procedures, like those of chemistry, are designed to lead the reader to a replica of a textual feature, the attestation. In this way, the problems associated with the non-discrete type and tone co-texts are resolved: there is no need to have an inventory of tones or types because they are packaged and displayed in attestations, and their terminological value is captured in the algorithms of attestation generation. In short, large-scale, systematic generation of attestations is ultimately the distinctive feature of computational interpretation. Electronic texts are simply the indispensable medium of this truly novel approach to textual meaning.


[6] The methodology of such radically different fields can be the same because ultimately any science is metalinguistically (meta-interpretatively) about exactly the same thing: the administration of meaning. Every method of enquiry must use its terminology to accumulate, order, and store meaning.