Generating Synonyms from Meta-Thesaurus

This page describes how LexSynonyms are generated from the Meta-Thesaurus. Most LexSynonyms come from this source. The candidate list was generated in 2017 through this model and is scheduled to be completed in 2021. The following steps are used during this period for LexSynonym generation.

I. Pre-Process

  • Directory: ${LEXICON_SYNONYMS}
  • Program: ./Synonym/GenSynonymFromMeta.java
  • Inputs:
    • ./Results/sClass.data.tag
    • ${IN_DIR}LRSPL
    • ${IN_DIR}LRNOM
  • Outputs:
    • ./Results/synonymFromMeta.data

II. Process

  • Directory: ${LEXICON_SYNONYMS}/bin
  • Program: GetSynonyms ${year}

    Option | Descriptions | Inputs | Outputs
    10 | Validate and analyze synonym tag file(s)
    • Synonym.CheckTagSClassFile.java
    ./Tags/SynonymCan_Tagged.txt
    • Combine all tagged synonym candidates into ./Tags/SynonymCan_Tagged_${YEAR}.txt.org
    • Fix issues and save to ./Tags/SynonymCan_Tagged_2020.txt.fixed
    • Update and copy SynonymCan_Tagged_2020.txt.fixed to SynonymCan_Tagged_2020.txt
    • Link SynonymCan_Tagged_${YEAR}.txt to SynonymCan_Tagged.txt
    • ./Tags/SynonymCan_Tagged.txt.tbd:
      • TBD tags must be added manually to synonyms that are missing tags
      • The TBD count should be 0 for tagging to be complete (if not, send to linguists for tagging)
    • ./Tags/SynonymCan_Tagged.txt.err:
      • The error count must be 0; if not, send to linguists for re-tagging
    • ./Tags/SynonymCan_Tagged.txt.out
    11 | Tag synonym candidate file from tagged files
    • Synonym.TagSynonymClass
    • ./Candidates/synonymCan.data
      => link to ./Candidates/synonymCan.data.out.${YEAR}
    • ./Tags/SynonymCan_Tagged_${YEAR}.txt
    • ./Results/sClass.out.tag
    • ./Results/sClass.out.notTag
      • => Used for the next year's candidates (until this candidate list is completed and a new one is generated in the annual updates, estimated in 2021+)
    • ./Results/sClass.out.tag.tbd
      • Should be 0. If not, check each entry; the most likely causes are typos or non-ASCII characters. Fix them in ./Tags/SynonymCan_Tagged_${YEAR}.txt, then re-run until the count is 0

    The tagged sClass No. could be different than the final sTagClass No. that is generated.
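
The non-ASCII check described above for ./Results/sClass.out.tag.tbd can be sketched as follows. This is only an illustration, not one of the project's programs: the class name NonAsciiCheck, its methods, and the sample entries are hypothetical, and the sketch assumes the tag file has already been read into a list of text lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the documented tool set.
class NonAsciiCheck {
    // Returns true if any character in the line falls outside 7-bit ASCII.
    static boolean hasNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 127) {
                return true;
            }
        }
        return false;
    }

    // Returns the 1-based numbers of the lines that contain non-ASCII characters.
    static List<Integer> findNonAsciiLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (hasNonAscii(lines.get(i))) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample entries; a real run would read the tagged file instead.
        List<String> sample = Arrays.asList(
            "plain ascii entry", "na\u00efve entry", "another entry");
        for (int n : findNonAsciiLines(sample)) {
            System.out.println("non-ASCII characters at line " + n);
        }
    }
}
```

Lines reported this way are candidates to fix in the tagged file before re-running option 11.
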
    12 | Generate current-year synonyms from the same CUI in the Meta-thesaurus
    • Go through all tagged sClass (Same CUI)
      • Collect all synonyms of [Pos|EUI|Base] with [Y] tag
      • Find all spVars and nominalizations of above [Y] tagged synonyms
      • Generate sPairs from all permutations of above synonyms, their spVars and noms
      • Use the CUI of the sClass for extra information
    • Print out sPairSet in alphabetical order
    • ./Results/sClass.out.tag
    • ./inData/LRSPL
    • ./inData/LRNOM
    • Print out No. of tags [Y|N|S] for [Yes|No|Skip]
    • ./Results/synonymFromMeta.data.${YEAR}
      NpLc Synonym-1 | Synonym-1 | Pos-1 | Synonym-2 | Pos-2 | CUI
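
The sPair generation in option 12 can be sketched as below, assuming the [Y]-tagged synonyms of one sClass, their spVars, and their nominalizations have already been collected into a single list of strings. The class SPairSketch, the method generatePairs, the sample terms, the placeholder CUI, and the "|" separator are all assumptions for illustration. The documented output also carries POS fields, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of pair generation, not the project's implementation.
class SPairSketch {
    // Builds every ordered pair (a, b) with a != b from the given terms.
    // The TreeSet removes duplicate terms and sorts them, so the result is
    // ordered by first term, then by second term.
    static List<String> generatePairs(Collection<String> terms, String cui) {
        TreeSet<String> unique = new TreeSet<>(terms);
        List<String> pairs = new ArrayList<>();
        for (String a : unique) {
            for (String b : unique) {
                if (!a.equals(b)) {
                    // The "|" separator is an assumption made for this sketch.
                    pairs.add(a + "|" + b + "|" + cui);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up terms and a placeholder CUI, for illustration only.
        List<String> terms = Arrays.asList("term-b", "term-a", "term-c", "term-a");
        for (String pair : generatePairs(terms, "C0000000")) {
            System.out.println(pair);
        }
    }
}
```

With n distinct terms in an sClass, this yields n × (n − 1) ordered pairs, which is why duplicate terms are removed before pairing.
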
    13 | Combine previous-year and current-year synonyms from the Meta-thesaurus
    • ./inData/synonymFromMeta.data.${PREV_YEAR}
      => link to ../../${PREVIOUS_YEAR}/outData/Results/synonymFromMeta.data.fixed
      These are the accumulated synonyms from Meta (check WC with dGrowth)
      This is the file released in LVG and the Lexicon
    • ./outData/Results/synonymFromMeta.data.${YEAR}
    • ./outData/Results/synonymFromMeta.data (accumulated)
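
The combine step in option 13 can be sketched as a merge of the previous accumulated records with the current year's records. This is a minimal sketch under stated assumptions: the class CombineSketch and the method combine are hypothetical, each record is treated as one opaque text line, and dropping duplicates and sorting the result are assumptions rather than documented behavior of the actual program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the combine step, not the project's implementation.
class CombineSketch {
    // Merges previous-year and current-year records into one sorted list
    // without duplicates. Blank lines are skipped.
    static List<String> combine(List<String> previous, List<String> current) {
        TreeSet<String> merged = new TreeSet<>();
        for (String line : previous) {
            if (!line.trim().isEmpty()) {
                merged.add(line);
            }
        }
        for (String line : current) {
            if (!line.trim().isEmpty()) {
                merged.add(line);
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Made-up records; a real run would read the two data files instead.
        List<String> previous = Arrays.asList("record-b", "record-a");
        List<String> current = Arrays.asList("record-c", "record-a", "");
        List<String> all = combine(previous, current);
        System.out.println("accumulated records: " + all.size());
        for (String record : all) {
            System.out.println(record);
        }
    }
}
```

Comparing the size of the merged list with the previous year's size gives a quick growth check on the accumulated file.
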