Exclusive Filter: A Term is punctuation or space

  • Description:
    If a term consists only of punctuation and/or spaces, it is not a valid multiword and is filtered out.

  • Examples:
    • "-"
    • $
    • +/-
    • %
    • (+)
    • [...]

  • Input Term: core-term.lc
  • Filter Algorithm:
    • Logic:

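      Description              | FilterType     | Notes
      -------------------------+----------------+------------------------------------------------
      Norm: Strip punctuation  | FT_TBD         |
      Norm: Trim               | FT_TBD         |
      Check if an empty string | FT_PUNCTUATION | filtered invalid terms - punctuation and space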

    • Source code: FilterPunctuation.java
    • FilterType: FilterType.FT_PUNC_SPACE
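
    The three steps listed under the filter algorithm (strip punctuation, trim, check for an empty string) can be sketched roughly as follows. This is a minimal illustration, not the contents of FilterPunctuation.java: the class name PunctuationFilterSketch and the method isValid are hypothetical, and the sketch assumes "punctuation" means the ASCII set matched by Java's \p{Punct} class, which may differ from the set used by the actual filter.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the actual FilterPunctuation.java.
final class PunctuationFilterSketch {
    // Assumption: punctuation = ASCII punctuation matched by \p{Punct},
    // which includes symbols such as $, + and %.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    /** Returns true if the term still has content after normalization. */
    static boolean isValid(String term) {
        // Norm: strip punctuation
        String stripped = PUNCT.matcher(term).replaceAll("");
        // Norm: trim leading and trailing whitespace
        String trimmed = stripped.trim();
        // An empty result means the term was only punctuation and/or space
        return !trimmed.isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {"-", "$", "+/-", "%", "(+)", "[...]", "ice cream"};
        for (String s : samples) {
            System.out.println(s + "\t" + (isValid(s) ? "pass" : "trap"));
        }
    }
}
```

    Under these assumptions, every term in the Examples list normalizes to an empty string and is trapped, while a term containing at least one letter or digit passes.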

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

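      Lexicon | Filter        | Sample No | Pass No | Trap No | Exp No | Pass-Rate
      --------+---------------+-----------+---------+---------+--------+----------
      2018    | FT_PUNC_SPACE | 955564    | 955564  | 0       | 0      | 100.0000%
      2017    | FT_PUNC_SPACE | 935276    | 935276  | 0       | 0      | 100.0000%
      2016    | FT_PUNC_SPACE | 915583    | 915583  | 0       | 0      | 100.0000%
      2015    | FT_PUNC_SPACE | 896213    | 896213  | 0       | 0      | 100.0000%
      2014    | FT_PUNC_SPACE | 875090    | 875090  | 0       | 0      | 100.0000%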
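
    The Pass-Rate figures above are consistent with the pass count divided by the sample count, shown to four decimal places. A small sketch of such a tally is given below. It assumes that definition, and the names PassRateSketch and passRate are hypothetical. It takes the filter as a generic predicate and does not read lexWords.data, whose format is not described here.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical tally helper; names and the rate definition are assumptions.
final class PassRateSketch {
    /** Formats (pass count / sample count) as a percentage with four decimals. */
    static String passRate(List<String> terms, Predicate<String> isValid) {
        // Count the terms that the filter lets through
        long pass = terms.stream().filter(isValid).count();
        // Guard against division by zero on an empty sample
        double rate = terms.isEmpty() ? 0.0 : 100.0 * pass / terms.size();
        // Locale.ROOT keeps the decimal separator as '.'
        return String.format(Locale.ROOT, "%.4f%%", rate);
    }
}
```

    A 100.0000% rate on lexicon words means the filter trapped none of them, which is the expected outcome when the sample contains no terms made up solely of punctuation or space.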