Exclusive Filter: A Term Consists Only of Digits or Stopwords

  • Description:
    If a term consists of nothing but digits and stopwords, it is not a valid multiword. The default stopwords are: of, and, with, for, nos, to, in, by, on, the, (non mesh).

  • Examples:
    • in the
    • the 8:2
    • 1, 2, and
    • 2003 to 2007
    • 50% of

  • Input Term: core-term.lc
  • Filter Algorithm:
    • Logics:

      Description                                      FilterType          Notes
      Get words from inTerm                            FT_TBD
      Norm: lowercase and strip punctuation            FT_TBD
      Check whether each word is a digit or stopword   FT_DIGIT_STOPWORD   Filters invalid terms composed of digits and stopwords only

    • source code: FilterPipe.java
    • FilterType: FilterType.FT_DIGIT_STOPWORD
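
The three logic steps above can be sketched as follows. This is a minimal illustration, not the code in FilterPipe.java: the class name DigitStopwordFilterSketch and the methods normalize, isDigitOrStopword, and isTrapped are hypothetical. It assumes that "strip punctuation" means removing every ASCII punctuation character from each word, that words are separated by whitespace, and it leaves out the phrase-level stopword "(non mesh)", which would need matching before the term is split into words.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DigitStopwordFilterSketch {
    // Single-word default stopwords from the description.
    // Assumption: the phrase-level entry "(non mesh)" is omitted in this sketch.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
        "of", "and", "with", "for", "nos", "to", "in", "by", "on", "the"));

    // Step 2 (assumed form): lowercase, then remove all ASCII punctuation.
    public static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", "");
    }

    // Step 3: a normalized word is a digit string or a stopword.
    public static boolean isDigitOrStopword(String normWord) {
        return normWord.matches("[0-9]+") || STOPWORDS.contains(normWord);
    }

    // Returns true when the term is trapped, i.e. every word is a digit
    // string or a stopword. Words that are empty after normalization are
    // skipped, and a term with no remaining words is not trapped.
    public static boolean isTrapped(String inTerm) {
        boolean sawWord = false;
        // Step 1: get words from the input term.
        for (String word : inTerm.trim().split("\\s+")) {
            String norm = normalize(word);
            if (norm.isEmpty()) {
                continue;
            }
            sawWord = true;
            if (!isDigitOrStopword(norm)) {
                return false;
            }
        }
        return sawWord;
    }

    public static void main(String[] args) {
        String[] terms = {
            "in the", "the 8:2", "1, 2, and", "2003 to 2007", "50% of", "vitamin d 3"
        };
        for (String term : terms) {
            System.out.println(term + " -> " + (isTrapped(term) ? "trapped" : "passed"));
        }
    }
}
```

Under these assumptions the five example terms listed earlier are all trapped, while a term such as "vitamin d 3" passes because "vitamin" is neither a digit string nor a stopword.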

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
      • ${IN_DATA}/stopWords.data
    • Result:

      Lexicon  Filter             Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2018     FT_DIGIT_STOPWORD  955564     955555   9        0       99.9991%
      2017     FT_DIGIT_STOPWORD  935276     935268   8        0       99.9991%
      2016     FT_DIGIT_STOPWORD  915583     915575   8        0       99.9991%
      2015     FT_DIGIT_STOPWORD  896213     896205   8        0       99.9991%
      2014     FT_DIGIT_STOPWORD  875090     875085   6        0       99.9993%
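
The Pass-Rate column appears to be the pass count divided by the sample count, shown as a percentage with four decimal places. The sketch below recomputes the 2018 figure under that assumption; the class name PassRateSketch and the method passRate are hypothetical and are not part of the original test code.

```java
import java.util.Locale;

public class PassRateSketch {
    // Assumed formula: pass rate = Pass No / Sample No, as a percentage
    // rounded to four decimal places.
    public static String passRate(long passNo, long sampleNo) {
        return String.format(Locale.ROOT, "%.4f%%", 100.0 * passNo / sampleNo);
    }

    public static void main(String[] args) {
        // 2018 row: Sample No = 955564, Pass No = 955555
        System.out.println(passRate(955555L, 955564L)); // prints 99.9991%
    }
}
```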