Synonym Enhancement - Motivation and Plan

I. Objective

Mapping a term to UMLS-Metathesaurus concept(s)/CUI(s) is an important step in NLP. Normalization and query expansion are two major techniques used in concept mapping to increase recall. A better normalization and synonym list could increase recall without reducing the precision of concept mapping.

  • Normalization converts different representations of a term into a single normalized representation, so that the same concept can be mapped from any representation of that term. It requires preprocessing: all collected terms are normalized, and the normalized forms are used as keys in the database. It operates at the lexical level, using syntactic, morphological, and orthographic information.
  • Query expansion substitutes synonyms for a subterm of a term to find mapped concepts when no concept can be found by normalizing the original term. Spelling variants, inflectional variants, synonyms, acronyms, abbreviations, expansions, derivational variants, and combinations of the above (fruitful variants) can be used for subterm substitution. It involves both lexical and semantic information.
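
The normalization idea above can be illustrated with a minimal sketch. It is a deliberately simplified stand-in, not the normalization performed by the Lexical Tools: it only lowercases, strips punctuation, and sorts words, and it ignores inflection and spelling variants. The names `normalize`, `add_term`, `lookup`, and `index` are hypothetical.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def normalize(term):
    """Simplified normalization (assumption): lowercase, drop punctuation, sort words."""
    words = term.lower().translate(_PUNCT_TO_SPACE).split()
    return " ".join(sorted(words))

# Hypothetical in-memory "database": normalized form -> set of CUIs.
index = {}

def add_term(term, cui):
    """Preprocessing step: store the CUI under the normalized form of the term."""
    index.setdefault(normalize(term), set()).add(cui)

def lookup(term):
    """Normalize the query the same way and use it as the key."""
    return index.get(normalize(term), set())

add_term("Kidney Diseases", "C0022658")
print(lookup("diseases, kidney"))  # {'C0022658'}
```

Because both the stored term and the query pass through the same function, differences in case, punctuation, and word order no longer prevent a match. A term such as "renal disease" still finds nothing here, which is the gap that query expansion is meant to close.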

    The synonyms in the Lexical Tools were developed in the early 1990s. The list is static, and only a few updates have been made over the past decade. A systematic methodology is developed to establish a new system that generates synonyms from the annual releases of the SPECIALIST Lexicon and UMLS Metathesaurus. Better recall is expected once the newly developed synonym list is in use. The objectives of this task are:

    • find related synonymous terms in the SPECIALIST Lexicon based on concepts in UMLS-Metathesaurus
    • include the synonym list in the SPECIALIST Lexicon release
    • implement the above results in the Lexical Tools
    • apply the results in STMT/SMT to improve the recall of CUI mapping

II. Phases

The phases of this task are described as follows:

  • II-1. Find all English terms with the same CUIs in the Lexicon
    Terms with the same CUI (concept) in the same Metathesaurus are synonyms. For example, headache|E0030922 and cephalgia|E0015916 are synonyms that share the same concept (C0018681). Synonyms may have the same or different parts of speech (POS), as shown in the following two tables.
    • Examples - synonyms with same POS:

      POS pair   Synonym-1   POS-1  Synonym-2     POS-2  CUI|PT
      adj-adj    farsighted  adj    long sighted  adj    C0020490|Hyperopia
      verb-verb  happen      verb   occur         verb   C1709305|Occur (action)
      verb-verb  acquire     verb   obtain        verb   C1706701|Acquisition (action)

    • Examples - synonyms with different POS:

      POS pair   Synonym-1       POS-1  Synonym-2       POS-2  CUI|PT
      adj-verb   choice          adj    select          verb   C1707391|Choose (action)
      noun-verb  autograft       noun   autotransplant  verb   C0559189|Autograft Material
      noun-verb  abdominal pain  noun   bellyache       verb   C0000737|Abdominal Pain

    • Application example:
      In query expansion, "inner" can be replaced by "internal", so that "inner ear" is expanded to "internal ear" and mapped to C0022889|Labyrinth. More examples are listed in the following table:
      Synonym-1                Synonym-2                   QE Example                                 CUI|PT
      inner|C0205102|Internal  internal|C0205102|Internal  inner ear|internal ear                     C0022889|Labyrinth
      heart|C0018787|Heart     cardiac|C0018787|Heart      heart abnormalities|cardiac abnormalities  C0018798|Congenital Heart Defects
      kidney|C0022646|Kidney   renal|C0022646|Kidney       kidney disease|renal disease               C0022658|Kidney Diseases
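
The substitution step in the application example can be sketched as follows. This is only an illustration under stated assumptions, not the STMT/SMT implementation: it handles single-word subterms only, and `SYNONYM_PAIRS`, `CONCEPT_INDEX`, `expand`, and `map_term` are hypothetical names. The toy concept index assumes that only the expanded forms from the table are directly mappable.

```python
# Synonym pairs taken from the table above; substitution works in both directions.
SYNONYM_PAIRS = [("inner", "internal"), ("heart", "cardiac"), ("kidney", "renal")]

SYNONYMS = {}
for a, b in SYNONYM_PAIRS:
    SYNONYMS.setdefault(a, []).append(b)
    SYNONYMS.setdefault(b, []).append(a)

# Toy concept index (assumption): term -> "CUI|preferred term".
CONCEPT_INDEX = {
    "internal ear": "C0022889|Labyrinth",
    "cardiac abnormalities": "C0018798|Congenital Heart Defects",
    "renal disease": "C0022658|Kidney Diseases",
}

def expand(term):
    """Return variants of the term with one word replaced by a synonym."""
    words = term.lower().split()
    variants = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants

def map_term(term):
    """Try the original term first; fall back to query expansion on a miss."""
    key = term.lower()
    if key in CONCEPT_INDEX:
        return CONCEPT_INDEX[key]
    for variant in expand(term):
        if variant in CONCEPT_INDEX:
            return CONCEPT_INDEX[variant]
    return None

print(map_term("inner ear"))  # C0022889|Labyrinth
```

Expansion is attempted only after the direct lookup fails, which mirrors the rule that query expansion applies when no concept is found for the original term.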

    • Criteria of synonyms:
      • English terms
      • terms with the same CUI
      • terms known to the Lexicon
      • terms with a part of speech (POS) of adj, noun, or verb. Note that a synonym pair may have the same or different POS.
      • synonyms are identified relative to the preferred term of each CUI

    • Synonym generating procedures:
      • Find all English terms with the same concept (CUI) in MRCONSO
      • Go through each CUI|preferred term
        • Among the above terms, find those with a category of "adj (1)", "noun (128)", or "verb (1024)" in the Lexicon
        • Tag each term as a synonym (Y|N) of the CUI|preferred term
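
The procedure above can be sketched as a small pipeline that produces candidate pairs for the Y|N tagging step. This is a sketch under assumptions, not the actual generation system: the field positions follow the pipe-delimited MRCONSO.RRF layout (CUI first, language second, string fifteenth), the rows are toy data built by a hypothetical `make_row` helper rather than real MRCONSO content, and the `lexicon` and `preferred` dictionaries stand in for lookups against the Lexicon and the preferred term of each CUI.

```python
from collections import defaultdict

# Zero-based field positions in a MRCONSO.RRF line.
CUI, LAT, STR = 0, 1, 14
# Lexicon category values named in the procedure: adj, noun, verb.
ALLOWED_CATEGORIES = {1, 128, 1024}

def make_row(cui, lat, text):
    """Build a toy MRCONSO-style line; all other fields are left empty."""
    fields = [""] * 18
    fields[CUI], fields[LAT], fields[STR] = cui, lat, text
    return "|".join(fields) + "|"

def english_terms_by_cui(lines):
    """Step 1: group lowercased English strings by CUI."""
    groups = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STR and fields[LAT] == "ENG":
            groups[fields[CUI]].add(fields[STR].lower())
    return groups

def candidate_synonyms(groups, preferred, lexicon):
    """Step 2: keep terms known to the lexicon as adj, noun, or verb.

    preferred: CUI -> preferred term; lexicon: term -> set of category values.
    Returns (CUI, preferred term, candidate) rows awaiting Y|N tagging.
    """
    rows = []
    for cui, terms in sorted(groups.items()):
        pt = preferred.get(cui)
        if pt is None:
            continue
        for term in sorted(terms):
            if term == pt.lower():
                continue
            if lexicon.get(term, set()) & ALLOWED_CATEGORIES:
                rows.append((cui, pt, term))
    return rows

# Toy inputs for illustration only.
rows_in = [
    make_row("C0018681", "ENG", "Headache"),
    make_row("C0018681", "ENG", "Cephalgia"),
    make_row("C0018681", "ENG", "Cephalodynia"),  # not in the toy lexicon
    make_row("C0018681", "SPA", "Cefalea"),       # non-English, filtered out
]
preferred = {"C0018681": "Headache"}
lexicon = {"headache": {128}, "cephalgia": {128}}

candidates = candidate_synonyms(english_terms_by_cui(rows_in), preferred, lexicon)
print(candidates)  # [('C0018681', 'Headache', 'cephalgia')]
```

The sketch stops at candidate generation on purpose: the Y|N tag in the last step is a judgment recorded against each CUI|preferred term, so it is left out of the automated part.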

  • II-2. Find related terms in the Lexicon
    Terms that have similar or related concepts (CUIs) are related words. Related words differ slightly in meaning, but they are interchangeable in query expansion to obtain better recall without reducing precision. For example, ache|E0006853|C0234238 and pain|E0045029|C0030193 do not have the same CUI, yet they can be treated as related words in query expansion: "ache" can be substituted by "pain" in the input term "head ache" to produce the expanded term "head pain", because both terms share the same concept (C0018681). Some research focuses on the semantic similarity and relatedness between concepts in UMLS [3-5]. These methods should be reviewed, evaluated, and tested to determine the related words.
    • Application Examples:
      Synonym-1                         Synonym-2           Example term                     CUI|PT
      ache|C0234238|Ache                pain|C0030193|Pain  head ache|head pain              C0018681|Headache
      bladder|C0005682|Urinary Bladder  vesical|None        bladder fistula|vesical fistula  C0005690|Urinary Bladder Fistula

III. Model for Test and Analysis
A model needs to be established as a metric to measure the performance (precision and recall) of the newly derived synonym list. The Sub-Term Mapping Tools (STMT), developed by the Lexical Systems Group (LSG), is a generic tool set with fully configurable options (corpus, synonyms, etc.) that provides comprehensive sub-term related features for query expansion and other NLP applications through Java APIs and command line tools [6]. The Synonym Mapping Tool (SMT) is one of the most commonly used tools in the STMT package and is designed to find concepts in the UMLS-Metathesaurus using synonym substitutions. When the limit on the number of substituted synonyms is fixed, the performance (precision and recall) of SMT in finding mapped concepts depends mainly on the synonym list. The synonyms and related words derived from the above two phases can be easily configured as corpus trees in SMT for a concept mapping test as follows.

  • collect a list of terms with verified (reviewed by experts) CUIs as the gold standard for the test
  • apply MetaMap [7] to find concepts of all terms as baseline-1
  • apply STMT/SMT [6] with the default synonym list to find concepts of all terms as baseline-2
  • configure STMT/SMT to use the newly derived synonym list to find concepts of all terms
  • compare the recall and precision rates from the results of the above steps
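
The comparison in the last step can be sketched as a micro-averaged precision and recall computation over the gold standard. This is one possible scoring scheme, chosen here as an assumption since the plan does not fix one. The function name `precision_recall` is hypothetical, and the system outputs below are invented placeholders used only to exercise the code, not measured results.

```python
def precision_recall(gold, predicted):
    """Micro-averaged precision and recall over all terms.

    gold, predicted: dict mapping term -> set of CUIs.
    """
    tp = fp = fn = 0
    for term, gold_cuis in gold.items():
        pred_cuis = predicted.get(term, set())
        tp += len(pred_cuis & gold_cuis)   # correct CUIs returned
        fp += len(pred_cuis - gold_cuis)   # returned but not in gold
        fn += len(gold_cuis - pred_cuis)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Gold standard: terms with expert-verified CUIs (toy subset).
gold = {
    "head ache": {"C0018681"},
    "kidney disease": {"C0022658"},
    "inner ear": {"C0022889"},
}

# Placeholder outputs for two hypothetical runs.
baseline_run = {"kidney disease": {"C0022658"}}
new_list_run = {
    "head ache": {"C0018681"},
    "kidney disease": {"C0022658"},
    "inner ear": {"C0022889", "C0205102"},
}

for name, run in [("baseline", baseline_run), ("new list", new_list_run)]:
    p, r = precision_recall(gold, run)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Scoring every configuration (MetaMap, the default synonym list, the new synonym list) with the same function against the same gold standard keeps the baselines directly comparable.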