Synonyms ReTag From Previous Releases

Under the latest requirements, all LexSynonyms should be cognitive synonyms. However, some LexSynonyms that are near-synonyms were tagged incorrectly in the 2017 release (because the requirements changed) and need to be re-tagged. A new algorithm should be developed for this effort, for example one that identifies broader/narrower concepts where one term contains the other as a subterm (e.g., "white horse" and "horse" are not an sPair). This task should be done at the sClass level (not the sPair level) so that all non-cognitive sPairs, along with their bi-directional pairs, spVars, and nominalizations, can be excluded together. Due to limited resources, it was done at the sPair level (as suggested by Francois Lang) for the 2018 release. The process of this fix is briefly described below:
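The subterm idea mentioned above can be illustrated with a simple token check: flag a pair when the words of one term appear as a contiguous run inside the other. This is a hypothetical heuristic for illustration only, not the algorithm used for the 2018 release (which worked at the sPair level from the provided candidate lists). The class name `SubtermCheck` and the lowercase/whitespace tokenization are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical heuristic (not the released algorithm): an sPair is suspected
 * to be broader/narrower rather than cognitive when the tokens of one term
 * occur as a contiguous run inside the other, e.g. "horse" in "white horse".
 */
public class SubtermCheck {

    // Assumed normalization: lowercase, split on whitespace.
    static List<String> tokens(String term) {
        return Arrays.asList(term.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // True if 'inner' is a strictly shorter contiguous token run of 'outer'.
    static boolean containsSubterm(List<String> outer, List<String> inner) {
        return inner.size() < outer.size()
            && Collections.indexOfSubList(outer, inner) >= 0;
    }

    // Checks both directions, since either term of the pair may be the subterm.
    public static boolean isSubtermPair(String term1, String term2) {
        List<String> t1 = tokens(term1);
        List<String> t2 = tokens(term2);
        return containsSubterm(t1, t2) || containsSubterm(t2, t1);
    }

    public static void main(String[] args) {
        System.out.println(isSubtermPair("white horse", "horse")); // true
        System.out.println(isSubtermPair("horse", "steed"));       // false
    }
}
```

A word-level check is used rather than a raw substring test so that, for instance, "horse" is not matched inside "horseradish".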

    I. Inputs (candidates for re-tagging, provided by Francois)

    • sm.db.xaxb.txt
    • sm.db.xxy.txt
    • sm.db.xyx.txt
    • sm.db.xxy.new.txt
    • sm.db.xyx.new.txt
    • sm.db.xaxb.rev.txt
    • sm.db.xxy.rev.txt
    • sm.db.xyx.rev.txt

    II. Process

    • Directory: ${LEXICON_SYNONYMS}/bin
    • Program: GetSynonyms ${year}

      Option 23: Retag synonyms from the Metathesaurus
        Inputs:
          • Synonym.GenNonCognitiveSpairs.java
          • ./ReTags/correctTagsOn2017/retagged.txt
          • ./Results/synonymFromMeta.data
        Outputs:
          • ./Results/synonymFromMeta.data.fixed
          • Removes non-cognitive sPairs from the input file
          • In: 126,424; Removed: 1,916 (1.5155%); Final: 124,508 (98.4845%)

      Option 24: Retag synonyms from Nominalization
        Inputs:
          • Synonym.GenNonCognitiveSpairs.java
          • ./ReTags/correctTagsOn2017/retagged.txt
          • ./Results/synonymFromNom.data
        Outputs:
          • ./Results/synonymFromNom.data.fixed
          • Removes non-cognitive sPairs from the input file
          • In: 67,862; Removed: 32 (0.0472%); Final: 67,830 (99.9528%)

      Option 25: Retag synonyms from LVG
        Inputs:
          • Synonym.GenNonCognitiveSpairs.java
          • ./ReTags/correctTagsOn2017/retagged.txt
          • ./Results/synonymFromLvg.data
        Outputs:
          • ./Results/synonymFromLvg.data.fixed
          • Removes non-cognitive sPairs from the input file
          • In: 4,780; Removed: 2 (0.0418%); Final: 4,778 (99.9582%)

      Option 26: Combine the fixed synonyms from options 23~25
        Inputs:
          • ./Results/synonymFromMeta.data.fixed
          • ./Results/synonymFromNom.data.fixed
          • ./Results/synonymFromLvg.data.fixed
        Outputs:
          • ./Results/synonym.data.${YEAR}.fixed
          • The final release file
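Options 23~26 amount to a set-difference filter applied to each source file, followed by a concatenation of the fixed files. The sketch below illustrates that flow; it is not the actual `Synonym.GenNonCognitiveSpairs` code. The class `RetagSketch`, the pipe-delimited layouts of `retagged.txt` and the `*.data` files (the two terms of an sPair in the first two fields), and the removal of both directions of a pair are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of options 23~26. ASSUMED formats: retagged.txt lines
 * are "term1|term2" for sPairs judged not cognitive; data lines start with
 * "term1|term2|...".
 */
public class RetagSketch {

    // Order-independent key, so (a, b) and (b, a) are treated as one sPair.
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // Build the set of non-cognitive sPair keys from the retag candidate lines.
    static Set<String> loadNonCognitive(List<String> retagLines) {
        Set<String> keys = new HashSet<>();
        for (String line : retagLines) {
            String[] f = line.split("\\|");
            if (f.length >= 2) keys.add(key(f[0], f[1]));
        }
        return keys;
    }

    // Options 23~25: keep only the data lines whose sPair is not in the retag set.
    static List<String> removeNonCognitive(List<String> dataLines, Set<String> nonCog) {
        List<String> kept = new ArrayList<>();
        for (String line : dataLines) {
            String[] f = line.split("\\|");
            if (f.length < 2 || !nonCog.contains(key(f[0], f[1]))) kept.add(line);
        }
        return kept;
    }

    // Option 26: concatenate the *.fixed files into a single release file.
    static void combine(List<Path> fixedFiles, Path out) throws IOException {
        List<String> all = new ArrayList<>();
        for (Path p : fixedFiles) all.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
        Files.write(out, all, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: RetagSketch <year>");
            return;
        }
        Set<String> nonCog = loadNonCognitive(Files.readAllLines(
            Paths.get("./ReTags/correctTagsOn2017/retagged.txt"), StandardCharsets.UTF_8));
        List<Path> fixed = new ArrayList<>();
        for (String src : new String[] {"Meta", "Nom", "Lvg"}) {
            Path in = Paths.get("./Results/synonymFrom" + src + ".data");
            Path out = Paths.get("./Results/synonymFrom" + src + ".data.fixed");
            List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
            List<String> kept = removeNonCognitive(lines, nonCog);
            Files.write(out, kept, StandardCharsets.UTF_8);
            System.out.printf("%s In: %d; Removed: %d; Final: %d%n",
                src, lines.size(), lines.size() - kept.size(), kept.size());
            fixed.add(out);
        }
        combine(fixed, Paths.get("./Results/synonym.data." + args[0] + ".fixed"));
    }
}
```

Keeping the filter as a pure function over lists makes the per-source counts (In, Removed, Final) easy to report and test without touching the file system.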

    III. Results

    • Total retag sPair candidates:

        Total   Yes                No
        3,405   1,601 (47.0191%)   1,804 (52.9809%)

    • Removed sPairs:

        Source  Original  Removed          Remaining
        CUI     126,424   1,916 (1.5155%)  124,508 (98.4845%)
        EUI     67,862    32 (0.0472%)     67,830 (99.9528%)
        LVG     4,780     2 (0.0418%)      4,778 (99.9582%)
        Total   199,066   1,950 (0.9796%)  197,116 (99.0204%)
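The percentages in the Results tables follow directly from the raw counts: Removed% = 100 × removed / original and Remaining% = 100 × (original − removed) / original, rounded to four decimals. The small helper below recomputes them as a consistency check; the class name `RetagStats` is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the percentage columns from raw counts. */
public class RetagStats {

    // Percentage of part over whole, rounded to 4 decimals as in the tables.
    static String pct(long part, long whole) {
        return String.format(Locale.US, "%.4f%%", 100.0 * part / whole);
    }

    // One "Removed sPairs" row: Original, Removed (pct), Remaining (pct).
    static String row(String source, long original, long removed) {
        long remaining = original - removed;
        return String.format(Locale.US, "%-6s %,9d %,7d (%s) %,9d (%s)",
            source, original, removed, pct(removed, original),
            remaining, pct(remaining, original));
    }

    public static void main(String[] args) {
        String[] sources = {"CUI", "EUI", "LVG"};
        long[][] counts = {{126_424, 1_916}, {67_862, 32}, {4_780, 2}};
        long totalIn = 0;
        long totalRemoved = 0;
        for (int i = 0; i < sources.length; i++) {
            System.out.println(row(sources[i], counts[i][0], counts[i][1]));
            totalIn += counts[i][0];
            totalRemoved += counts[i][1];
        }
        // The Total row is computed from the summed counts, not by averaging percentages.
        System.out.println(row("Total", totalIn, totalRemoved));
        // Yes/No split of the 3,405 retag candidates.
        System.out.println("Yes " + pct(1_601, 3_405) + ", No " + pct(1_804, 3_405));
    }
}
```

Computing the Total row from summed counts is why its removal rate (0.9796%) is weighted toward the CUI source, which contributes most of the input.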