The following briefly describes how TACT 2.1 differs from earlier versions (especially 1.2). Apart from changes to the interface (e.g., the menu system) and improved memory management, there is a new database structure. As a result, all TACT textual databases created with versions prior to 2.0 must be recompiled.
The following filenames have changed:
old name         new name
-----------------------------
MAKBAS.EXE  -->  MAKEBASE.EXE
TACT.EXE    -->  USEBASE.EXE
There are the following 10 new programs:
To determine how to use these programs, consult the online help in each. TACT.BAT is the main shell that allows access to all the TACT programs; you must set TACTPATH for it to work correctly.
There are also a number of new files used by TACT.BAT; they are necessary, and should not be deleted. The new files are:
For the January 1994 gamma release, all programs except Usebase and Collgen utilize DOS extended memory. This will speed up some operations for computers with any extended memory (memory above 1MB on most computers) and, we hope, eliminate a number of bugs, memory limitations, and speed bottlenecks.
There are two new features.
A number of bugs in the processing of large files have been fixed.
Anagrams is a new program that produces a list of partial or complete anagrams for a given database.
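To make the idea of a complete anagram concrete, here is a minimal Python sketch. It is not the Anagrams program itself: the function name `complete_anagrams` is hypothetical, the input is a plain list of words rather than a TACT database, and partial anagrams are left out because the text above does not define them.

```python
from collections import defaultdict

def complete_anagrams(words):
    """Group distinct words that are spelled with exactly the same letters."""
    groups = defaultdict(set)
    for word in words:
        lowered = word.lower()
        # Words with the same sorted letters are anagrams of one another.
        key = "".join(sorted(lowered))
        groups[key].add(lowered)
    # Keep only groups that contain at least two distinct words.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

if __name__ == "__main__":
    for group in complete_anagrams(["stone", "notes", "onset", "tact", "text"]):
        print(" ".join(group))
```

Sorting the letters of each word gives a key that all of its anagrams share, so one pass over the word list is enough.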
TACTfreq is a new program that produces a list of all words that occur in a given database, with their frequency of occurrence in one of three different orders (alphabetical, reverse alphabetical, and descending frequency).
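The three orderings can be illustrated with a short Python sketch. This is only an approximation of the kind of list described above, not TACTfreq's actual behaviour: the function name `word_frequencies` and the `order` values are hypothetical, and "reverse alphabetical" is assumed here to mean Z-to-A order (it could instead mean sorting by word endings).

```python
from collections import Counter

def word_frequencies(words, order="alphabetical"):
    """Return (word, count) pairs in one of three orders."""
    counts = Counter(word.lower() for word in words)
    if order == "alphabetical":
        return sorted(counts.items())
    if order == "reverse":
        # Assumption: "reverse alphabetical" is taken to mean Z-to-A.
        return sorted(counts.items(), reverse=True)
    if order == "frequency":
        # Most frequent first; ties are broken alphabetically.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    raise ValueError("unknown order: " + order)

if __name__ == "__main__":
    sample = "the fox and the dog and the cat".split()
    for word, count in word_frequencies(sample, "frequency"):
        print(word, count)
```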
TACTstat is a new program that produces type-token statistics for a given database.
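For readers unfamiliar with the terminology, tokens are the running words of a text and types are the distinct word-forms among them. The Python sketch below computes the basic figures under that definition. The function name `type_token_stats` is hypothetical, and the statistics that TACTstat actually reports may be more extensive.

```python
def type_token_stats(words):
    """Count tokens (running words) and types (distinct words)."""
    tokens = [word.lower() for word in words]
    types = set(tokens)
    # The type-token ratio is undefined for an empty text; report 0.0 there.
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ratio": ratio}

if __name__ == "__main__":
    print(type_token_stats("the fox and the dog".split()))
```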
Preproc is a new program that produces a set of output files relating to an input source text. The first output file is a list of distinct words. The second file is a copy of the input file with all tags, non-retained diacritics, and continuation characters removed. The third output file is a listing of all lines with tags and continuation characters in them. At the end of this file is an alphabetical listing of all reference tags found in the text.
Makedct is a new program that is used to build dictionaries. The input file is a list of distinct words produced by Preproc. This list is compared to two optional existing dictionaries, to produce a dictionary for the given input file. This dictionary contains surface, lemmatized, and two other forms for each word as well as part-of-speech information.
Tagtext is a new program that supplements or replaces the word-forms in a text with the fields from that text's dictionary or ".DCT" file, which Makedct previously generated for you. It can add up to two tags for each word in the text.
Satdct is a new program that will generate a satellite dictionary for a given tagged text. The dictionary will consist of an alphabetical list of distinct forms, in user-selectable order, along with the number of occurrences of each distinct form.
Fcompare is a new program that compares two ASCII files, separating similar and dissimilar lines or fields. The user may choose to compare whole lines, or a particular tab delimited field within the input lines.
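The difference between comparing whole lines and comparing a single tab-delimited field can be sketched in Python as follows. This is an illustration only: the function name `compare_lines` is hypothetical, fields are numbered from 0, and "similar" is assumed to mean that the compared text matches exactly, which may not be how Fcompare decides.

```python
def compare_lines(lines_a, lines_b, field=None):
    """Split lines_a into those matched and those unmatched in lines_b.

    With field=None whole lines are compared; otherwise only the
    tab-delimited field with that index (counting from 0) is compared.
    """
    def key(line):
        if field is None:
            return line
        parts = line.split("\t")
        # A line with too few fields is treated as having an empty field.
        return parts[field] if field < len(parts) else ""

    keys_b = {key(line) for line in lines_b}
    similar = [line for line in lines_a if key(line) in keys_b]
    dissimilar = [line for line in lines_a if key(line) not in keys_b]
    return similar, dissimilar

if __name__ == "__main__":
    a = ["fox\tnoun", "run\tverb"]
    b = ["fox\tverb", "sit\tverb"]
    print(compare_lines(a, b))           # whole lines
    print(compare_lines(a, b, field=0))  # first field only
```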
Buildbat now uses the familiar panel interface. You may specify whether to use DEFAULT.MKS or any other .MKS file. The name of the batch file created by Buildbat is the same as the input .LST file but with a .BAT suffix; thus VOLPONE.LST causes Buildbat to generate VOLPONE.BAT. You can include paths in the file names.
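The naming rule can be expressed in a few lines of Python. This sketch only mirrors the rule stated above; the function name `batch_name` and the example directory are hypothetical, and `PureWindowsPath` is used so that DOS-style paths are handled the same way on any system.

```python
from pathlib import PureWindowsPath

def batch_name(lst_file):
    """Return the .BAT name that corresponds to an input .LST file."""
    # Only the suffix changes; any drive and directory are preserved.
    return str(PureWindowsPath(lst_file).with_suffix(".BAT"))

if __name__ == "__main__":
    print(batch_name("VOLPONE.LST"))
    print(batch_name(r"C:\TEXTS\VOLPONE.LST"))
```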
Collgen has been substantially altered. It now lets the user choose between a list of all repeating phrases and subphrases and a list of only the maximally occurring phrases. In the latter case, if a subphrase occurs the same number of times as a larger phrase that contains it, the subphrase is not included in the list of repeating phrases. This eliminates much redundancy in the output.
Collgen can also produce a list of pairs of words that co-occur within a user-specified word span, in any order. A numerical value associated with each pair signifies the statistical likelihood of the words co-occurring in this way. The size of the co-occurrence output file can be reduced by supplying one of two optional input files: an .INC (include) or .XCL (exclude) file, consisting of a list of words to be included in or excluded from the co-occurrence output. For example, given an .XCL file consisting of prepositions and other function words, the output would exclude any pair in which one or both words are function words. Output can be space-delimited or tab-delimited.
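The co-occurrence listing and the effect of an exclude list can be approximated with the Python sketch below. It is not Collgen's algorithm: the function name `cooccurring_pairs` is hypothetical, a span of n is assumed to mean "within the next n words", the exclude list is passed as a plain sequence rather than read from an .XCL file, and the statistical value Collgen attaches to each pair is omitted because the text above does not say how it is computed.

```python
from collections import Counter

def cooccurring_pairs(words, span, exclude=()):
    """Count unordered word pairs that occur within `span` words of each other."""
    excluded = {word.lower() for word in exclude}
    tokens = [word.lower() for word in words]
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Assumption: the span covers the `span` words that follow position i.
        for right in tokens[i + 1 : i + 1 + span]:
            # Drop a pair if either member is on the exclude list.
            if left == right or left in excluded or right in excluded:
                continue
            # Sort the pair so that word order in the text does not matter.
            pairs[tuple(sorted((left, right)))] += 1
    return pairs

if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    for pair, count in sorted(cooccurring_pairs(text, 3, ["the", "on"]).items()):
        print("\t".join(pair), count, sep="\t")
```

Filtering on the exclude list before counting keeps function words out of the output entirely, which is what shrinks the co-occurrence file.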
The program formerly called TACT has been renamed Usebase. The interface has been simplified in a number of areas, and new features have been added. The menu bar now has the following items: File, Select, Displays, Group, and Help. Current and New have now been merged into Displays. Exit is now within the File menu. Category is now called "Group." Some items within the various menus have also moved to other menus.