prevariants

Descriptions:

The prevariants file tells whether a lower-cased form is, or could be, an acronym or abbreviation.

Fields:

Field  Name  Notes
1      FORM  Inflected variant, lower cased, unique
2      FLAG  • 0: FORM is never an abbreviation or acronym (dog)
             • 1: FORM is sometimes an abbreviation or acronym (aids, yes, bras)
             • 2: FORM is always an abbreviation or acronym (aca)
3      SCA   Syntactic category (combined value)

Notes:

Many of the ambiguities appear to result from lower-casing (AIDS/aids). For example:

dog|0|1152
aids|1|1152
nih|2|128
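
A line in this format can be read with a few lines of Java. The following is only a minimal sketch: the class name PreVariantLine and its methods are hypothetical and are not part of the tool described here. It also assumes that the combined category value is a bitwise OR of single-bit category values, which is suggested by the logical OR step in the algorithm but is not stated explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one line of the prevariants file (FORM|FLAG|SCA).
class PreVariantLine {
    final String form;    // field 1: lower-cased inflected form
    final int flag;       // field 2: 0 = never, 1 = sometimes, 2 = always
    final long category;  // field 3: combined syntactic category value

    PreVariantLine(String form, int flag, long category) {
        this.form = form;
        this.flag = flag;
        this.category = category;
    }

    // Splits a line on '|' and converts the two numeric fields.
    static PreVariantLine parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new PreVariantLine(
            fields[0], Integer.parseInt(fields[1]), Long.parseLong(fields[2]));
    }

    // Assumption: each category is one bit, so the combined value can be
    // split back into the individual bit values that are set.
    static List<Long> categoryBits(long category) {
        List<Long> bits = new ArrayList<>();
        for (int i = 0; i < 63; i++) {
            long bit = 1L << i;
            if ((category & bit) != 0) {
                bits.add(bit);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        PreVariantLine rec = parse("aids|1|1152");
        // prints: aids flag=1 bits=[128, 1024]
        System.out.println(rec.form + " flag=" + rec.flag
            + " bits=" + categoryBits(rec.category));
    }
}
```

Under that assumption, the value 1152 in the examples above is the combination of two category bits, 128 and 1024.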

Algorithm:

  • Go through the input file, the Lexicon, and put all lexical records into a Vector of LexRecord Java objects.
  • Go through all lexical record objects:
    • Get the lower-cased inflectional variants and put them into forms. Go through all forms:
      • If the form does not exist in the hash table, preVars:
        => Put the form as the key in the hash table
        => Instantiate a preVar that includes form, cat, abb, and nonAbb
        => Put the preVar into the hash table

      • If the form exists in the hash table:
        => Get the preVar out of the hash table
        => Update the preVar by applying logical OR (|) to cat, abb, and nonAbb
        => Put the preVar back into the hash table
  • Go through the hash table, preVars, and print:
    field 1: form (inflected, lower cased, unique)
    field 2:
      Value  Condition                      Notes
      0      abb = false                    Never an abbreviation
      1      abb = true and nonAbb = true   Sometimes an abbreviation
      2      abb = true and nonAbb = false  Always an abbreviation

    field 3: category (combined number)
  • Sort the result (done by the Unix sort command).
  • Generate an ASCII-only file.
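
The merge and flag steps above can be sketched in Java as follows. This is an illustration under stated assumptions, not the actual implementation: the class PreVariantsSketch and its Variant type are hypothetical stand-ins for the LexRecord data, the isAbbreviation input is assumed to be known for each variant (the description does not say how it is derived), and the per-record category values 128 and 1024 are illustrative. A TreeMap is used in place of the separate Unix sort step, so ordering may differ from what sort produces under a given locale. The ASCII-only step is left out because the description does not say how it is performed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PreVariantsSketch {
    // Hypothetical stand-in for one inflectional variant of a lexical record.
    static final class Variant {
        final String form;
        final long cat;
        final boolean isAbbreviation;  // assumed to be given by the record

        Variant(String form, long cat, boolean isAbbreviation) {
            this.form = form;
            this.cat = cat;
            this.isAbbreviation = isAbbreviation;
        }
    }

    // Accumulated state for one lower-cased form.
    static final class PreVar {
        long cat;        // OR of all categories seen for this form
        boolean abb;     // seen at least once as an abbreviation or acronym
        boolean nonAbb;  // seen at least once as a regular word
    }

    // Builds the table keyed by lower-cased form, merging with logical OR.
    static Map<String, PreVar> collect(List<Variant> variants) {
        Map<String, PreVar> preVars = new TreeMap<>();  // sorted by key
        for (Variant v : variants) {
            String form = v.form.toLowerCase(Locale.ROOT);
            PreVar p = preVars.computeIfAbsent(form, k -> new PreVar());
            p.cat |= v.cat;
            p.abb |= v.isAbbreviation;
            p.nonAbb |= !v.isAbbreviation;
        }
        return preVars;
    }

    // Maps (abb, nonAbb) to the FLAG value: 0 never, 1 sometimes, 2 always.
    static int flag(PreVar p) {
        if (!p.abb) {
            return 0;
        }
        return p.nonAbb ? 1 : 2;
    }

    // Formats each entry as form|flag|category.
    static List<String> format(Map<String, PreVar> preVars) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, PreVar> e : preVars.entrySet()) {
            lines.add(e.getKey() + "|" + flag(e.getValue()) + "|" + e.getValue().cat);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Variant> variants = Arrays.asList(
            new Variant("AIDS", 128L, true),
            new Variant("aids", 128L, false),
            new Variant("aids", 1024L, false),
            new Variant("dog", 128L, false),
            new Variant("dog", 1024L, false),
            new Variant("NIH", 128L, true));
        format(collect(variants)).forEach(System.out::println);
    }
}
```

With this illustrative input the sketch prints aids|1|1152, dog|0|1152, and nih|2|128, one per line, matching the shape of the examples in the Notes section.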