Generating Synonym Candidates from Metathesaurus

I. Pre-Process

  • Directory: ${LEXICON_SYNONYMS}
  • Program: ./Meta/GetSynonymCandidates.java
  • Inputs:
    • MRCONSO.RRF
    • MRSTY.RRF
    • cuiPreferredTerm.data
    • SemGroups.filter.txt
    • inflVars.data
    • LRABR -> LRABR.f1.uSort
    • LRNOM
  • Outputs:
    • synonymCan.data.*
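Two of these inputs, MRSTY.RRF and SemGroups.filter.txt, drive the semantic-type filter used later in candidate selection (option 2 under Process). The Java sketch below shows one way that lookup could work. It assumes SemGroups.filter.txt follows the layout of NLM's SemGroups.txt (GroupAbbrev|GroupName|TUI|SemanticTypeName) and lists only disallowed types, and that a CUI is excluded if any one of its semantic types is disallowed. MRSTY.RRF rows follow the UMLS layout CUI|TUI|STN|STY|ATUI|CVF|. The class name, method names and sample lines are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DisallowedCuis {

    // ASSUMPTION: filter lines look like GroupAbbrev|GroupName|TUI|SemanticTypeName;
    // blank lines and lines starting with '#' are skipped.
    static Set<String> loadDisallowedTuis(List<String> filterLines) {
        Set<String> tuis = new HashSet<>();
        for (String line : filterLines) {
            if (line.isBlank() || line.startsWith("#")) continue;
            String[] f = line.split("\\|", -1);
            if (f.length > 2) tuis.add(f[2].trim());
        }
        return tuis;
    }

    // MRSTY.RRF rows: CUI|TUI|STN|STY|ATUI|CVF|
    // A CUI is flagged if ANY of its semantic types is disallowed (assumption).
    static Set<String> findDisallowedCuis(List<String> mrstyLines, Set<String> disallowedTuis) {
        Set<String> cuis = new HashSet<>();
        for (String line : mrstyLines) {
            String[] f = line.split("\\|", -1);
            if (f.length > 1 && disallowedTuis.contains(f[1])) cuis.add(f[0]);
        }
        return cuis;
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> filter = List.of("CHEM|Chemicals & Drugs|T121|Pharmacologic Substance");
        List<String> mrsty = List.of(
            "C0000001|T121|A1.4.1.1.1|Pharmacologic Substance|AT0000001||",
            "C0000002|T047|B2.2.1.2.1|Disease or Syndrome|AT0000002||");
        System.out.println(findDisallowedCuis(mrsty, loadDisallowedTuis(filter))); // prints [C0000001]
    }
}
```

For a full MRSTY.RRF, streaming the file (for example with java.nio.file.Files.lines) instead of holding every line in a list keeps memory use modest.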

II. Process

  • Directory: ${LEXICON_SYNONYMS}/bin
  • Program: GetSynonyms ${year}

    Option 1: Get CUI|Preferred Terms from MRCONSO.RRF
      • Programs:
        • Meta.GetCuiPtFromMrConso.java
        • Meta.MrConsoUtil.java
      • Selection criteria (1-based MRCONSO.RRF fields):
        • field-2: LAT = ENG (language of term)
        • field-3: TS = P (term status)
        • field-5: STT = PF (string type)
        • field-7: ISPREF = Y (atom status, preferred: Y)
      • Input: ./inData/MRCONSO.RRF
      • Output: ./outData/Candidates/cuiPreferredTerm.data
        CUI|Preferred Term
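Option 1 amounts to a field filter over pipe-delimited MRCONSO.RRF rows. The Java sketch below is not Meta.GetCuiPtFromMrConso itself; it assumes the standard UMLS MRCONSO column order (CUI, LAT, TS, LUI, STT, SUI, ISPREF, AUI, SAUI, SCUI, SDUI, SAB, TTY, CODE, STR, SRL, SUPPRESS, CVF), so the 1-based fields 2, 3, 5 and 7 above become 0-based indices 1, 2, 4 and 6, and the term string (STR) sits at index 14. Keeping only the first qualifying row per CUI is an assumption, as are the class name CuiPreferredTerms and the made-up sample rows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class CuiPreferredTerms {

    // Returns CUI -> preferred term, in input order.
    static Map<String, String> select(BufferedReader in) throws IOException {
        Map<String, String> cuiToPt = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] f = line.split("\\|", -1);
            if (f.length < 15) continue;            // skip malformed rows
            boolean keep = "ENG".equals(f[1])       // field-2: LAT
                && "P".equals(f[2])                 // field-3: TS
                && "PF".equals(f[4])                // field-5: STT
                && "Y".equals(f[6]);                // field-7: ISPREF
            if (keep) cuiToPt.putIfAbsent(f[0], f[14]); // first match per CUI wins (assumption)
        }
        return cuiToPt;
    }

    public static void main(String[] args) throws IOException {
        // Made-up rows in MRCONSO.RRF layout (18 fields plus a trailing '|').
        String sample = String.join("\n",
            "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001|||D000001|MSH|MH|D000001|Example term|0|N||",
            "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002|||D000001|MSH|EN|D000001|Example synonym|0|N||");
        Map<String, String> pts = select(new BufferedReader(new StringReader(sample)));
        // Write in the cuiPreferredTerm.data format: CUI|Preferred Term
        pts.forEach((cui, pt) -> System.out.println(cui + "|" + pt)); // prints C0000001|Example term
    }
}
```

In practice the reader would wrap ./inData/MRCONSO.RRF (for example via java.nio.file.Files.newBufferedReader) rather than a string.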
    Option 2: Get synonym candidates from MRCONSO.RRF
      • Program: Meta.GetSynonymCandidates.java
      • Selection criteria:
        • Same CUI (the definition of a synonym: same concept)
        • English terms only: field-2, LAT = ENG
        • Semantic type (STI) not disallowed: types in excluded groups such as Chemicals & Drugs are defined in SemGroups.filter.txt, and MRSTY.RRF maps each CUI to its STIs
        • Known to the Lexicon (design spec.)
        • POS is adj, noun, or verb (design spec.)
        • Inflection is base (design spec.)
      • Removals:
        • Acronyms (they lower precision)
        • Spelling variants (spVars); these are added back in Post-process
        • Nominalizations; these are added back in Post-process
        • Classes with only a single candidate (this removes classes made up purely of spVars and nominalizations)
      • Normalization: terms are normalized into lowercased core terms (initial and final punctuation stripped, then lowercased), which serve as keys for looking up Lexical records. Base forms are used in the output.
      • Output: ./outData/Candidates/SynonymCan.data
        #SYNONYM_CLASS|CUI|Preferred Term
        POS-1|EUI-1|Base-1
        POS-2|EUI-2|Base-2
        ...
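The option 2 rules can be sketched per CUI as follows. This is a hypothetical Java model, not Meta.GetSynonymCandidates: the lexicon is a map from core term to base-form records only (so terms that are not base forms find no match), and the acronym, spelling-variant and nominalization decisions are assumed to be precomputed as boolean flags on each record (presumably derived from inputs such as LRABR and LRNOM). "Core term" is approximated as stripping leading and trailing characters that are not letters or digits, then lowercasing. Records are deduplicated by EUI, and #SYNONYM_CLASS is treated as a literal header marker; both are assumptions. The record type requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SynonymClassBuilder {

    // Hypothetical lexical record: POS, EUI, base form, plus precomputed filter flags.
    record LexRecord(String pos, String eui, String base,
                     boolean acronym, boolean spVar, boolean nom) {}

    static final Set<String> ALLOWED_POS = Set.of("adj", "noun", "verb");

    // Core term (assumed definition): trim leading/trailing characters that are
    // not letters or digits, then lowercase.
    static String coreTerm(String term) {
        int start = 0;
        int end = term.length();
        while (start < end && !Character.isLetterOrDigit(term.charAt(start))) start++;
        while (end > start && !Character.isLetterOrDigit(term.charAt(end - 1))) end--;
        return term.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Returns the output lines for one CUI, or an empty list if fewer than 2 candidates remain.
    static List<String> buildClass(String cui, String preferredTerm, List<String> terms,
                                   Map<String, List<LexRecord>> lexicon) {
        Map<String, LexRecord> byEui = new LinkedHashMap<>(); // keeps first-seen order
        for (String term : terms) {
            // Terms with no base-form entry in the lexicon are skipped (unknown to Lexicon).
            for (LexRecord r : lexicon.getOrDefault(coreTerm(term), List.of())) {
                if (!ALLOWED_POS.contains(r.pos())) continue;      // adj, noun, verb only
                if (r.acronym() || r.spVar() || r.nom()) continue; // removed candidates
                byEui.putIfAbsent(r.eui(), r);                     // one line per EUI
            }
        }
        if (byEui.size() < 2) return List.of(); // single-candidate classes are dropped
        List<String> out = new ArrayList<>();
        out.add("#SYNONYM_CLASS|" + cui + "|" + preferredTerm);
        for (LexRecord r : byEui.values()) out.add(r.pos() + "|" + r.eui() + "|" + r.base());
        return out;
    }

    public static void main(String[] args) {
        // Made-up lexicon entries and identifiers for illustration only.
        Map<String, List<LexRecord>> lexicon = Map.of(
            "heart attack", List.of(new LexRecord("noun", "E0000001", "heart attack", false, false, false)),
            "myocardial infarction", List.of(new LexRecord("noun", "E0000002", "myocardial infarction", false, false, false)),
            "mi", List.of(new LexRecord("noun", "E0000003", "MI", true, false, false)));
        List<String> lines = buildClass("C0000001", "Myocardial infarction",
            List.of("Myocardial Infarction", "heart attack", "MI", "Heart attack."), lexicon);
        lines.forEach(System.out::println);
    }
}
```

With these made-up entries, MI is dropped as an acronym and the two spellings of heart attack collapse to one EUI, so the class keeps two candidates: myocardial infarction and heart attack.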

  • Send the result (SynonymCan.data) to linguists, who tag each candidate [y|n] as a valid or invalid synonym.
  • This step is used only when all synonym candidates have been completely tagged. Accordingly, it is skipped for the 2018 and 2019 releases, since we are still working on the first synonym candidate list, generated for 2017; that work is expected to be completed in the next couple of years.