Exclusive Filter: A Term Contains an Indefinite Article Pattern

  • Description:
    If a term starts with "a " or "A ", its spelling variants (spVars) must co-exist in the nGram set for it to be a valid MWE. The spVar patterns for the indefinite article are:
    • pattern-1: [a term], [a-term], [aterm]
    • pattern-2: [A term], [A-term], [Aterm]

  • Examples:
    • a total
    • A total

    Also, a term that starts with "a disintegrin and metalloprot" might have spVars in a variety of forms. Thus, such terms are an exception to this pattern. For example:

    • a disintegrin and metalloprotease 3
    • a disintegrin and metalloprotease-3
    • A disintegrin and metalloprotease 3
    • A disintegrin and metalloprotease-3

  • Input Term: core-term
  • Filter Algorithm:
    • Logics:

      Description: Get invalid terms
      FilterType: FT_TBD
      Notes:
      • Collect all terms that start with:
        • [a term], [a-term], [aterm]
        • [A term], [A-term], [Aterm]

        and save them to HashMap<normTerm, termSet>
      • Go through the HashMap and find the invalid term set. A term is invalid if all of the following hold:
        • Size of termSet = 1
        • term starts with "a " or "A "
        • term.lc( ) does not start with "a disintegrin and metalloprot"

      Description: Check if the invalid term set contains the term
      FilterType: FT_INDEF_ARTICLE_PAT
      Notes:
      • filtered invalid terms

    • source code: FilterIndefArt.java
    • FilterType: FilterType.FT_INDEF_ARTICLE_PAT
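
    The logic above can be sketched in Java as follows. This is a minimal illustration, not the contents of FilterIndefArt.java: the class name, the method names, and in particular the normKey( ) normalization (dropping the space or hyphen after the leading article while preserving case) are assumptions made for this example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the indefinite-article filter logic.
class IndefArticleFilterSketch {
    // Exception prefix taken from the filter description.
    private static final String EXCEPTION_PREFIX = "a disintegrin and metalloprot";

    // Assumed normalization: map [a term], [a-term], [aterm] to the same key.
    // Case is preserved so that pattern-1 (a) and pattern-2 (A) stay separate.
    static String normKey(String term) {
        if (term.length() > 2
                && (term.charAt(0) == 'a' || term.charAt(0) == 'A')
                && (term.charAt(1) == ' ' || term.charAt(1) == '-')) {
            return term.charAt(0) + term.substring(2);
        }
        return term;
    }

    // Step 1: group candidate terms by normalized key, then collect the
    // terms that have no co-existing spelling variant in the nGram set.
    static Set<String> findInvalidTerms(Collection<String> nGrams) {
        Map<String, Set<String>> variants = new HashMap<>();
        for (String term : nGrams) {
            if (term.startsWith("a") || term.startsWith("A")) {
                variants.computeIfAbsent(normKey(term), k -> new HashSet<>()).add(term);
            }
        }
        Set<String> invalid = new HashSet<>();
        for (Set<String> termSet : variants.values()) {
            if (termSet.size() != 1) {
                continue; // a spelling variant co-exists, so the term is kept
            }
            String term = termSet.iterator().next();
            boolean startsWithArticle = term.startsWith("a ") || term.startsWith("A ");
            boolean isException = term.toLowerCase(Locale.ROOT).startsWith(EXCEPTION_PREFIX);
            if (startsWithArticle && !isException) {
                invalid.add(term);
            }
        }
        return invalid;
    }

    // Step 2: a term passes the filter if it is not in the invalid set.
    static boolean isValid(String term, Set<String> invalidTerms) {
        return !invalidTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> invalid = findInvalidTerms(Arrays.asList(
                "a total", "A total", "a priori", "a-priori",
                "a disintegrin and metalloprotease 3"));
        System.out.println(invalid);
    }
}
```

    With this sample input, "a total" and "A total" are flagged as invalid because neither has a co-existing variant. "a priori" is kept because "a-priori" is also present, and the "a disintegrin and metalloprotease 3" term is kept by the exception rule.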

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

      Lexicon  Filter                Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2018     FT_INDEF_ARTICLE_PAT  955564     955545   19       0       99.9980%
      2017     FT_INDEF_ARTICLE_PAT  935276     935259   17       0       99.9982%
      2016     FT_INDEF_ARTICLE_PAT  915583     915570   13       0       99.9986%
      2015     FT_INDEF_ARTICLE_PAT  896213     896200   13       0       99.9985%
      2014     FT_INDEF_ARTICLE_PAT  875090     875077   13       0       99.9985%
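
    The pass rates in the table are consistent with Pass-Rate = Pass No / Sample No, shown as a percentage with four decimal places. The sketch below reproduces that arithmetic; the formula is inferred from the listed numbers, not stated in the source, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: recompute the Pass-Rate column from the counts.
class PassRateSketch {
    // Assumes Pass-Rate = Pass No / Sample No, formatted to four decimals.
    static String passRate(long passNo, long sampleNo) {
        return String.format(Locale.ROOT, "%.4f%%", 100.0 * passNo / sampleNo);
    }

    public static void main(String[] args) {
        // 2018 Lexicon row: 955545 passed out of 955564 samples.
        System.out.println(passRate(955545L, 955564L));
    }
}
```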