Exclusive Filter: A Term Ends with Absolute Invalid-End-Terms (AIET)

  • Description:
    If a term ends with an absolute invalid-end-term (AIET), it is not a valid multiword. Such terms are filtered out of the MEDLINE n-gram set.

  • Examples:
    • away from that
    • the source, and
    • the tumors, but

    The absolute invalid-end-terms (AIETs) are derived from the Lexicon. Some end-terms from the invalid lead-end-term candidate list are absolute invalid end-terms, such as "about", "across", "across from", etc. N-grams that end with any of these absolute invalid end-terms are not valid multiwords. In 2014, there were 407 absolute invalid end-terms. Please refer to the design documents of the Lead-End-Terms Model for details.

  • Input Term: core-term
  • Filter Algorithm:
    • Logics:

      • Norm: strip punctuation except for '/.- (FT_TBD)
        • Optional
        • in addition, the => in addition the
        • tissues (such => tissues such
      • Go through all abs-inv-end-terms (AIETs)
        • Case-1: if the AIET is not upper case and inTerm is upper case
          => lowercase inTerm, use inTerm.lc
        • Case-2.1: if the AIET is upper case
        • Case-2.2: if the AIET is not upper case and inTerm is lower case
        • Case-2.3: if the AIET is not upper case and inTerm is mixed case
          => for Case-2.1 to Case-2.3, use inTerm (no change in case)
      Case    AIET   inTerm       inTerm converted
      1       the    FROM THE     from the
      Keep case:
      2.1     W/O    issue W/O    issue W/O
      2.2     the    from the     from the
      2.3.1   the    From the     From the
      2.3.2   nor    in the US    in the US
      2.3.3   as     Hex As       Hex As
      • Check if inTerm is itself an abs-inv-end-term (AIET)
        • Exception: AIETs are valid terms
      • Check if inTerm ends with " " + AIET
        • 3D US => invalid

    • source code: FilterEndTermAbs.java
    • FilterType: FilterType.FT_END_TERM_INV_ABS
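
The logic above can be sketched in Java as follows. This is a minimal illustration and not the code in FilterEndTermAbs.java. The class and method names, the seven-entry sample list, and the exact normalization regex are all assumptions made for the example. In particular, "us" is assumed to be in the list because the document reports "3D US" as invalid. The real list is read from invalidEndTerms.data.abs.

```java
import java.util.List;
import java.util.Locale;

class AietFilterSketch {
    // Hypothetical sample only; the real AIET list is derived from the Lexicon.
    static final List<String> SAMPLE_AIETS =
        List.of("that", "and", "but", "the", "as", "us", "W/O");

    // Optional Norm step: drop punctuation except ' / . - and collapse spaces.
    static String normalize(String term) {
        return term.replaceAll("[\\p{Punct}&&[^'/.\\-]]", "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Upper case here means: contains cased letters, none of them lowercase.
    static boolean isUpperCase(String s) {
        return s.equals(s.toUpperCase(Locale.ROOT))
            && !s.equals(s.toLowerCase(Locale.ROOT));
    }

    // Case-1 lowercases inTerm; Case-2.1 to Case-2.3 keep inTerm unchanged.
    static String adjustCase(String aiet, String inTerm) {
        if (!isUpperCase(aiet) && isUpperCase(inTerm)) {
            return inTerm.toLowerCase(Locale.ROOT);
        }
        return inTerm;
    }

    // Returns true if the term passes the filter, false if it is trapped.
    static boolean passes(String inTerm, List<String> aiets) {
        String term = normalize(inTerm);
        // Exception: a term that is itself an AIET stays valid.
        for (String aiet : aiets) {
            if (adjustCase(aiet, term).equals(aiet)) {
                return true;
            }
        }
        // Trap the term if it ends with " " + AIET.
        for (String aiet : aiets) {
            if (adjustCase(aiet, term).endsWith(" " + aiet)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] terms = {
            "away from that", "the source, and", "FROM THE", "From the",
            "in the US", "Hex As", "3D US", "issue W/O", "the"
        };
        for (String t : terms) {
            String result = passes(t, SAMPLE_AIETS) ? "pass" : "trap";
            System.out.println(t + " => " + result);
        }
    }
}
```

With this sample list, "in the US" and "Hex As" pass because mixed-case terms keep their case, while "3D US" is trapped because an all-uppercase term is lowercased before the comparison. That matches the false positives reported in the accuracy test.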

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
      • ${OUT_DATA}/03.LeadEndTerm/invalidEndTerms.data.abs
    • Result:

      Lexicon  Filter               Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2018     FT_END_TERM_INV_ABS  955564     955558   6        442     99.9994%
      2017     FT_END_TERM_INV_ABS  935276     935273   3        442     99.9997%
      2016     FT_END_TERM_INV_ABS  915583     915580   3        438     99.9997%
      2015     FT_END_TERM_INV_ABS  896213     896210   3        435     99.9997%
      2014     FT_END_TERM_INV_ABS  875090     875087   3        436     99.9997%

      Please note three valid words are filtered out by mistake:
      • 3-D US: three dimensional ultrasound
      • 3D US: three dimensional ultrasound
      • PD US: power Doppler ultrasound
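
The pass rates in the result table appear to be the pass count divided by the sample count, with the pass count equal to the sample count minus the trap count. The short sketch below reproduces two rows under that assumption. The class and method names are hypothetical and not part of the original test code.

```java
import java.util.Locale;

class PassRateSketch {
    // Assumed formula: pass rate = 100 * passNo / sampleNo, shown with 4 decimals.
    static String passRate(long sampleNo, long passNo) {
        return String.format(Locale.ROOT, "%.4f%%", 100.0 * passNo / sampleNo);
    }

    public static void main(String[] args) {
        // 2018 row: 955564 samples, 6 trapped
        System.out.println(passRate(955564, 955564 - 6)); // prints 99.9994%
        // 2014 row: 875090 samples, 3 trapped
        System.out.println(passRate(875090, 875090 - 3)); // prints 99.9997%
    }
}
```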