Synonym Enhancement - Motivation and Plan
Mapping a term to UMLS Metathesaurus concepts (CUIs) is an important step in NLP. Normalization and query expansion are two major techniques used in concept mapping to increase recall. A better normalization and a better synonym list can increase recall without reducing the precision of concept mapping.
- Normalization converts different representations of a term into a single normalized form, so that the same concept can be mapped from any of those representations. It requires preprocessing: all collected terms are normalized, and the normalized forms are used as keys in the database. It operates at the lexical level, using syntactic, morphological, and orthographic information.
- Query expansion substitutes synonyms for a subterm of a term to find mapped concepts when no concept can be found by normalizing the original term. Spelling variants, inflectional variants, synonyms, acronyms, abbreviations, expansions, derivational variants, and combinations of the above (fruitful variants) can be used for subterm substitution. It involves both lexical and semantic information.
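The role of normalized forms as database keys can be illustrated with a minimal Python sketch. It is not the Lexical Tools normalization flow: the `normalize` function below is a deliberately simplified stand-in (lowercasing, punctuation stripping, and word sorting only), and `build_index`, `lookup`, and the two sample entries are hypothetical names and data chosen for illustration.

```python
import re


def normalize(term: str) -> str:
    """Simplified normalization: lowercase, drop punctuation, sort the words."""
    lowered = term.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(sorted(no_punct.split()))


def build_index(entries):
    """Preprocess (term, CUI) pairs into a map: normalized form -> set of CUIs."""
    index = {}
    for term, cui in entries:
        index.setdefault(normalize(term), set()).add(cui)
    return index


# Toy entries (assumption: a real index would be built from the Metathesaurus).
INDEX = build_index([
    ("Kidney Diseases", "C0022658"),
    ("Headache", "C0018681"),
])


def lookup(term: str) -> set:
    """Map a term to CUIs by normalizing it and using the result as the key."""
    return INDEX.get(normalize(term), set())


if __name__ == "__main__":
    print(lookup("diseases, kidney"))
    print(lookup("HEADACHE"))
```

Because both the stored terms and the input go through the same function, "diseases, kidney" and "Kidney Diseases" share one key. A term such as "inner ear" that has no key in this toy index returns an empty set, which is the case query expansion is meant to handle.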
The synonyms in the Lexical Tools were developed in the early 1990s. The list is static, and only a few updates have been made over the past decade. A systematic methodology is developed to establish a new system that generates synonyms from the annual releases of the SPECIALIST Lexicon and the UMLS Metathesaurus. Better recall is expected when the newly developed synonym list is in use. The objectives of this task are:
- find related synonymous terms in the SPECIALIST Lexicon based on concepts in UMLS-Metathesaurus
- include the synonym list in the SPECIALIST Lexicon release
- implement the above results in the Lexical Tools
- implement the results in STMT/SMT for a better recall rate in CUI mapping
The phases of this task are described as follows:
- II-1. Find all English terms with the same CUIs in the Lexicon
Terms with the same CUI (concept) in the Metathesaurus are synonyms. For example, headache|E0030922 and cephalgia|E0015916 are synonyms with the same concept (C0018681). Synonyms might have the same or different parts of speech (POS), as shown in the following two tables.
- Examples - synonyms with same POS:
Type       synonym-1   category-1  synonym-2     category-2  CUI|PT
adj-adj    ureteral    adj         ureteric      adj         C0041951|Ureter
adj-adj    emetic      adj         emetogenic    adj         C0013973|Emetics
adj-adj    farsighted  adj         long sighted  adj         C0020490|Hyperopia
adj-adj    inner       adj         internal      adj         C0205102|Internal
noun-noun  wrist       noun        carpal        noun        C0043262|Wrist
noun-noun  calculus    noun        stone         noun        C0006736|Calculi
noun-noun  headache    noun        cephalgia     noun        C0018681|Headache
noun-noun  tumor       noun        neoplasms     noun        C0027651|Neoplasms
verb-verb  happen      verb        occur         verb        C1709305|Occur (action)
verb-verb  autopsy     verb        necropsy      verb        C0004398|Autopsy
verb-verb  acquire     verb        obtain        verb        C1706701|Acquisition (action)
- Examples - synonyms with different POS:
Type       synonym-1       category-1  synonym-2       category-2  CUI|PT
adj-noun   cardiac         adj         heart           noun        C0018787|Heart
adj-noun   farsighted      adj         hyperopia       noun        C0020490|Hyperopia
adj-noun   renal           adj         kidney          noun        C0022646|Kidney
adj-verb   choice          adj         select          verb        C1707391|Choose (action)
adj-verb   mad             adj         anger           verb        C0002957|Anger
noun-verb  autograft       noun        autotransplant  verb        C0559189|Autograft Material
noun-verb  abdominal pain  noun        bellyache       verb        C0000737|Abdominal Pain
- Application example:
In query expansion, "inner" can be replaced by "internal", so that "inner ear" becomes "internal ear" and is mapped to the concept C0022889|Labyrinth. More examples are listed in the following table:
Synonym-1                Synonym-2                   QE Example                                 CUI|PT
inner|C0205102|Internal  internal|C0205102|Internal  inner ear|internal ear                     C0022889|Labyrinth
heart|C0018787|Heart     cardiac|C0018787|Heart      heart abnormalities|cardiac abnormalities  C0018798|Congenital Heart Defects
kidney|C0022646|Kidney   renal|C0022646|Kidney       kidney disease|renal disease               C0022658|Kidney Diseases
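The substitutions in the table can be sketched in Python as follows. This is only an illustration of single-subterm substitution, not the SMT implementation: `SYNONYMS`, `CONCEPTS`, `expand`, and `map_concept` are hypothetical names, and the toy concept index assumes (for the sake of the example) that only the expanded forms are directly indexed.

```python
# Synonym pairs taken from the table above, stored in both directions.
SYNONYMS = {
    "inner": ["internal"], "internal": ["inner"],
    "heart": ["cardiac"], "cardiac": ["heart"],
    "kidney": ["renal"], "renal": ["kidney"],
}

# Toy term -> CUI index (assumption: only these forms are directly mappable).
CONCEPTS = {
    "internal ear": "C0022889",
    "cardiac abnormalities": "C0018798",
    "renal disease": "C0022658",
}


def expand(term: str) -> list:
    """Return variants of the term with exactly one word replaced by a synonym."""
    words = term.lower().split()
    variants = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants


def map_concept(term: str):
    """Try the original term first, then each expanded variant."""
    key = term.lower()
    if key in CONCEPTS:
        return CONCEPTS[key]
    for variant in expand(term):
        if variant in CONCEPTS:
            return CONCEPTS[variant]
    return None


if __name__ == "__main__":
    for query in ("inner ear", "heart abnormalities", "kidney disease"):
        print(query, "->", map_concept(query))
```

The sketch substitutes one word at a time and only single-word subterms; multiword subterms and limits on the number of substitutions would need additional handling.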
- Criteria of synonyms:
- English term
- terms with the same CUI
- terms known to the Lexicon
- terms with a part of speech (POS) of adj, noun, or verb. Note that a synonym pair might have the same or different POS.
- find synonyms to the preferred terms of the CUIs
- Synonym generating procedures:
- Find all English terms with the same concept (CUI) in MRCONSO
- Go through each CUI|preferred term
- Find all of the above terms with a category of "adj (1)", "noun (128)", or "verb (1024)" in the Lexicon
- Tag each term as a synonym (Y|N) of the CUI|preferred term
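The procedure above can be sketched in Python under some simplifying assumptions. The inputs are small in-memory structures rather than the actual MRCONSO and Lexicon files: `conso_rows` holds hypothetical (CUI, language, term) tuples, `preferred_terms` maps each CUI to its preferred term, and `lexicon` maps a term to its POS set. The function only produces candidates; the final Y|N tagging is left as a separate review step.

```python
from collections import defaultdict

# POS categories accepted by the criteria above.
ALLOWED_POS = {"adj", "noun", "verb"}


def generate_candidates(conso_rows, preferred_terms, lexicon):
    """Collect English terms per CUI and keep those known to the lexicon
    with an allowed POS, paired with the CUI's preferred term."""
    terms_by_cui = defaultdict(set)
    for cui, language, term in conso_rows:
        if language == "ENG":
            terms_by_cui[cui].add(term.lower())

    candidates = []
    for cui in sorted(terms_by_cui):
        preferred = preferred_terms.get(cui)
        if preferred is None:
            continue
        for term in sorted(terms_by_cui[cui]):
            if term == preferred.lower():
                continue  # the preferred term is not its own synonym
            for pos in sorted(lexicon.get(term, set()) & ALLOWED_POS):
                candidates.append((term, pos, cui, preferred))
    return candidates


# Toy inputs for illustration only.
ROWS = [
    ("C0018681", "ENG", "Headache"),
    ("C0018681", "ENG", "cephalgia"),
    ("C0018681", "SPA", "cefalea"),          # non-English, skipped
    ("C0018787", "ENG", "Heart"),
    ("C0018787", "ENG", "cardiac"),
    ("C0018787", "ENG", "Heart structure"),  # not in the toy lexicon, skipped
]
PREFERRED = {"C0018681": "Headache", "C0018787": "Heart"}
LEXICON = {"headache": {"noun"}, "cephalgia": {"noun"},
           "heart": {"noun"}, "cardiac": {"adj"}}

if __name__ == "__main__":
    for candidate in generate_candidates(ROWS, PREFERRED, LEXICON):
        print("|".join(candidate))
```

Each output row is a candidate to be tagged Y or N against its CUI|preferred term.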
- II-2. Find related terms in the Lexicon
Terms that have similar or related concepts (CUIs) are related words. Related words differ slightly in meaning. However, they are interchangeable in query expansion to obtain a better recall rate without reducing precision. For example, ache|E0006853|C0234238 and pain|E0045029|C0030193 do not have the same CUI, but they can be treated as related words in query expansion: “ache” can be substituted by “pain” in the input term “head ache” to produce the expanded term “head pain”, because both terms map to the same concept (C0018681). Some research focuses on semantic similarity and relatedness between concepts in the UMLS [3-5]. These methods should be reviewed, evaluated, and tested to determine the related words.
- Application Examples:
Synonym-1                         Synonym-2           Example term                     CUI|PT
ache|C0234238|Ache                pain|C0030193|Pain  head ache|head pain              C0018681|Headache
bladder|C0005682|Urinary Bladder  vesical|None        bladder fistula|vesical fistula  C0005690|Urinary Bladder Fistula
III. Model for Test and Analysis
A model needs to be established as a metric to test the performance (precision and recall) of the newly derived synonym list. The Sub-Term Mapping Tools (STMT), developed by the Lexical Systems Group (LSG), is a generic tool set with fully configurable options (corpus, synonyms, etc.). It provides comprehensive sub-term related features for query expansion and other NLP applications through Java APIs and command line tools. The Synonym Mapping Tool (SMT) is one of the most commonly used tools in the STMT package and is designed to find concepts in the UMLS Metathesaurus using synonym substitutions. When the limit on the number of substituted synonyms is fixed, the performance (precision and recall) of SMT in finding mapped concepts depends mainly on the synonym list. The synonyms and related words derived from the above two phases can be configured easily as corpus trees in SMT for a concept mapping test as follows:
- collect a list of terms with verified (reviewed by experts) CUIs as the gold standard for the test
- apply MetaMap to find the concepts of all terms as baseline-1
- apply STMT/SMT with the default synonym list to find the concepts of all terms as baseline-2
- configure STMT/SMT to use the newly derived synonym list to find the concepts of all terms
- compare the recall and precision rates from the results of the above steps
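The final comparison step could be computed along the lines of the Python sketch below. It assumes each system's output has been reduced to a map from term to a set of CUIs, and it counts matches per (term, CUI) pair; `precision_recall` is a hypothetical helper, and the `baseline` and `with_new_synonyms` values are invented solely to exercise the function, not measured results from MetaMap or SMT.

```python
def precision_recall(gold, predicted):
    """Compute micro-averaged precision and recall over (term, CUI) pairs.

    gold, predicted: dict mapping a term to a set of CUIs.
    """
    true_pos = false_pos = false_neg = 0
    for term, gold_cuis in gold.items():
        found = predicted.get(term, set())
        true_pos += len(found & gold_cuis)
        false_pos += len(found - gold_cuis)
        false_neg += len(gold_cuis - found)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy gold standard using concepts from the examples in this document.
gold = {
    "inner ear": {"C0022889"},
    "head ache": {"C0018681"},
    "kidney disease": {"C0022658"},
}

# Invented system outputs, for illustration only.
baseline = {"kidney disease": {"C0022658"}}
with_new_synonyms = {
    "inner ear": {"C0022889"},
    "head ache": {"C0018681"},
    "kidney disease": {"C0022658", "C0022646"},
}

if __name__ == "__main__":
    for name, output in (("baseline", baseline), ("new synonyms", with_new_synonyms)):
        p, r = precision_recall(gold, output)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Running the same function over the outputs of baseline-1, baseline-2, and the newly configured run keeps the three sets of figures directly comparable.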