Exclusive Filter: A Term Is All Digits

  • Description:
    If a term contains nothing but digits, punctuation, and spaces, it is not a valid multiword. This filter performs two normalizations (stripping punctuation and stripping spaces) before checking whether what remains is all digits.

  • Examples:
    • "3 + 1"
    • $1,500
    • 192.168.1.1
    • (+/- 0.05)
    • (+15%),
    • [0-5]
    • [192, 168]
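To make the two normalizations concrete, here is a minimal Java sketch that strips punctuation and spaces from the example terms above. It assumes "punctuation" means ASCII punctuation as matched by Java's \p{Punct} regex class; the class name DigitNormDemo and the method stripPunctAndSpace are hypothetical and are not taken from FilterDigit.java.

```java
import java.util.List;

// Hypothetical demo: apply "strip punctuation" and "strip space" to the example terms.
// Assumption: "punctuation" = ASCII punctuation (Java's \p{Punct}); the real filter may differ.
public class DigitNormDemo {
    static String stripPunctAndSpace(String term) {
        // Remove every ASCII punctuation character (e.g. $ , . ( ) + / - % [ ]) and all whitespace.
        return term.replaceAll("[\\p{Punct}\\s]", "");
    }

    public static void main(String[] args) {
        List<String> examples = List.of("3 + 1", "$1,500", "192.168.1.1",
                "(+/- 0.05)", "(+15%),", "[0-5]", "[192, 168]");
        for (String term : examples) {
            // Each example reduces to a digits-only string, e.g. "$1,500" -> "1500".
            System.out.println(term + " -> " + stripPunctAndSpace(term));
        }
    }
}
```

Every example reduces to digits only (for instance, "(+/- 0.05)" becomes "005"), which is why all of them are excluded.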

  • Input Term: core-term.lc
  • Filter Algorithm:
    • Logic:

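      Description                        | FilterType | Notes
      -----------------------------------|------------|------------------------------------------------------------
      Get words from inTerm              | FT_TBD     |
      Norm: strip punctuation and space  | FT_TBD     |
      Check if all digits                | FT_DIGIT   | Filters out invalid terms that are all digits after stripping punctuation and spaces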

    • source code: FilterDigit.java
    • FilterType: FilterType.FT_DIGIT
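The three logic steps can be sketched as a single Java method. This is a hypothetical illustration, not the code in FilterDigit.java: FilterDigitSketch and filter are invented names, the local FilterType enum only mirrors the FT_TBD and FT_DIGIT values named above, words are assumed to be separated by whitespace, and a term that is empty after normalization is assumed not to be trapped.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of the logic table; not the actual FilterDigit.java.
public class FilterDigitSketch {
    // Local stand-in for FilterType: FT_TBD = not trapped here, FT_DIGIT = trapped.
    enum FilterType { FT_TBD, FT_DIGIT }

    static FilterType filter(String inTerm) {
        // Step 1: get words from inTerm (assumed: split on whitespace).
        String[] words = inTerm.trim().split("\\s+");
        // Step 2: normalize: strip ASCII punctuation from each word, join without spaces.
        String norm = Arrays.stream(words)
                .map(w -> w.replaceAll("\\p{Punct}", ""))
                .collect(Collectors.joining());
        // Step 3: trap the term if the normalized string is non-empty and all digits.
        boolean allDigit = !norm.isEmpty()
                && norm.chars().allMatch(Character::isDigit);
        return allDigit ? FilterType.FT_DIGIT : FilterType.FT_TBD;
    }

    public static void main(String[] args) {
        String[] terms = {"192.168.1.1", "(+/- 0.05)", "20/20", "type 2 diabetes"};
        for (String t : terms) {
            System.out.println(t + " -> " + filter(t));
        }
    }
}
```

In this sketch, FT_TBD simply means this filter does not trap the term; only an all-digit result is tagged FT_DIGIT.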

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

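      Lexicon | Filter   | Sample | NoPass | NoTrap | NoExp | NoPass-Rate
      --------|----------|--------|--------|--------|-------|------------
      2018    | FT_DIGIT | 955564 | 955563 | 1      | 0     | 99.9999%
      2017    | FT_DIGIT | 935276 | 935275 | 1      | 0     | 99.9999%
      2016    | FT_DIGIT | 915583 | 915583 | 1      | 0     | 99.9999%
      2015    | FT_DIGIT | 896213 | 896212 | 1      | 0     | 99.9999%
      2014    | FT_DIGIT | 875090 | 875089 | 1      | 0     | 99.9999%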

      The Lexicon contains the valid word "20/20", which this filter traps because it reduces to all digits ("2020") once punctuation is stripped.
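As a check on the table, the NoPass-Rate column can be reproduced for a row, assuming NoPass-Rate = NoPass / Sample, expressed as a percentage and rounded to four decimal places. The helper below (PassRate.noPassRate, a hypothetical name) computes it for the 2018 row.

```java
import java.util.Locale;

// Hypothetical helper. Assumption: NoPass-Rate = NoPass / Sample,
// shown as a percentage rounded to four decimal places.
public class PassRate {
    static String noPassRate(long sample, long noPass) {
        // Locale.ROOT keeps "." as the decimal separator regardless of system locale.
        return String.format(Locale.ROOT, "%.4f%%", 100.0 * noPass / sample);
    }

    public static void main(String[] args) {
        // 2018 row: 955563 of 955564 Lexicon terms pass the filter.
        System.out.println(noPassRate(955564L, 955563L)); // prints 99.9999%
    }
}
```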