Exclusive Filter: A Term Ends with Parenthetic Abbreviation

  • Description:
    A term that ends with a parenthetic abbreviation is not a valid multiword. Such terms are filtered out of the MEDLINE n-gram set. For example, the following terms are invalid multiwords:
    • activating transcription factor 3 (ATF3)
    • acute coronary syndrome (ACS)
    • angiotensin-converting enzyme (ACE)
    • Alzheimer's disease (AD)

  • Filter Algorithm:
    • Logic:

      Description | FilterType | Notes
      Get the end-term | FT_TBD | Aaa bbb bbc (ABB): => (ABB):
      Remove trailing , . : from the end-term if present | FT_TBD | (ABB): => (ABB)
      Check whether the end-term matches the pattern (ABB): the end-term starts with (, the end-term ends with ), and the abbreviation is all upper case | FT_TBD |
      Check if an abbreviation: the 1st character of the input is the same as the 2nd character of the end-term (ABB) | FT_END_TERM_NOT_ABB | Exceptions: valid terms that end with the pattern (ABB), where ABB is not an abbreviation
      Check if not an abbreviation | FT_END_TERM_INV_ABB | Filtered: invalid terms that end with (ABB)

    • Source code: FilterEndTermAbb.java
    • FilterType: FilterType.FT_END_TERM_INV_ABB
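
The filter logic listed above can be sketched roughly as follows. This is an illustrative reading of the steps, not the actual FilterEndTermAbb.java: the class name EndTermAbbSketch, the Result enum, and the classify method are hypothetical. Three assumptions are made. "All upper case" is taken to mean "contains no lower-case letters", so that digits as in (ATF3) are accepted. The first-character comparison is taken to be case-insensitive, so that "acute ... (ACS)" matches. A match on the first character is taken to mean the term is trapped as FT_END_TERM_INV_ABB, while a mismatch is treated as an FT_END_TERM_NOT_ABB exception.

```java
import java.util.Locale;

// Hypothetical sketch of the end-term abbreviation check; NOT the actual
// FilterEndTermAbb.java. Class, enum, and method names are assumptions.
final class EndTermAbbSketch {

    // Labels reuse the FilterType names listed in the logic steps.
    // FT_TBD here means "this filter does not apply to the term".
    enum Result { FT_TBD, FT_END_TERM_NOT_ABB, FT_END_TERM_INV_ABB }

    static Result classify(String term) {
        String input = term.trim();
        if (input.isEmpty()) {
            return Result.FT_TBD;
        }

        // Step 1: the end-term is the last whitespace-delimited token.
        String[] tokens = input.split("\\s+");
        String endTerm = tokens[tokens.length - 1];

        // Step 2: remove trailing ',', '.', ':' characters, if any.
        endTerm = endTerm.replaceAll("[,.:]+$", "");

        // Step 3: the end-term must look like (ABB).
        if (endTerm.length() < 3
                || !endTerm.startsWith("(")
                || !endTerm.endsWith(")")) {
            return Result.FT_TBD;
        }
        String abb = endTerm.substring(1, endTerm.length() - 1);
        // Assumption: "all upper case" means no lower-case letters,
        // so digits such as the 3 in ATF3 are allowed.
        if (!abb.equals(abb.toUpperCase(Locale.ROOT))) {
            return Result.FT_TBD;
        }

        // Step 4: compare the 1st character of the input with the 2nd
        // character of the end-term (assumed case-insensitive).
        char first = Character.toUpperCase(input.charAt(0));
        char abbFirst = Character.toUpperCase(endTerm.charAt(1));
        return first == abbFirst
                ? Result.FT_END_TERM_INV_ABB   // treated as an abbreviation: term is filtered out
                : Result.FT_END_TERM_NOT_ABB;  // (ABB) pattern, but not an abbreviation: exception
    }
}
```

Under these assumptions, classify("acute coronary syndrome (ACS)") yields FT_END_TERM_INV_ABB, while a term without a parenthetic end-term, such as "acute coronary syndrome", stays FT_TBD.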

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

      Lexicon | Filter              | Sample No | Pass No | Trap No | Exp No | Pass-Rate
      2014    | FT_END_TERM_INV_ABB | 875090    | 875090  | 0       | 10     | 100.0000%
      2015    | FT_END_TERM_INV_ABB | 896213    | 896213  | 0       | 15     | 100.0000%
      2016    | FT_END_TERM_INV_ABB | 915583    | 915583  | 0       | 18     | 100.0000%