Exclusive Filter: A Term is a Number
If a term is a number, it is not a valid (new) multiword, because numbers are already known in the Lexicon. Two normalizations (lowercase and strip punctuation) are performed in this filter.
- Example: five hundred and fifty five
- Input Term: core-term.lc
- Filter Algorithm:
  | Description | FilterType | Notes |
  |---|---|---|
  | Get words from inTerm | FT_TBD | |
  | Norm: lowercase and strip punctuation | FT_TBD | |
  | Check if all words are numbers | FT_LEX_NUMBER | |
- Filtered (invalid) terms: terms that are numbers after lowercasing and stripping punctuation
- source code: FilterNumber.java
- Accuracy Test on Lexicon:
  | Lexicon | Filter | Sample No | Pass No | Trap No | Exp No | Pass-Rate |
  |---|---|---|---|---|---|---|
  | 2018 | FT_LEX_NUMBER | 955564 | 955521 | 43 | 0 | 99.9955% |
  | 2017 | FT_LEX_NUMBER | 935276 | 935233 | 43 | 0 | 99.9954% |
  | 2016 | FT_LEX_NUMBER | 915583 | 915540 | 43 | 0 | 99.9953% |
  | 2015 | FT_LEX_NUMBER | 896213 | 896170 | 43 | 0 | 99.9952% |
  | 2014 | FT_LEX_NUMBER | 875090 | 875049 | 41 | 0 | 99.9953% |
The 41 (2014) or 43 (2015-2018) terms trapped by this filter are numbers that are recorded in the Lexicon.
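
The three steps of the filter algorithm (get words, normalize, check that all words are numbers) can be sketched roughly as follows. This is a minimal illustration, not the actual FilterNumber.java: the class name `NumberFilterSketch`, its methods, and the small hard-coded word set are assumptions that stand in for the real Lexicon number lookup. Treating "and" as a connector and accepting digit strings are also assumptions made so the example above is trapped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a number filter; NOT the actual FilterNumber.java.
class NumberFilterSketch {
    // Assumption: a tiny stand-in for the number entries recorded in the Lexicon.
    private static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "twenty", "thirty", "forty", "fifty",
        "hundred", "thousand", "million"));

    // Norm: lowercase, then strip (remove) ASCII punctuation.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", "");
    }

    // A word counts as a number if it is a known number word or all digits.
    static boolean isNumberWord(String word) {
        return NUMBER_WORDS.contains(word) || word.matches("\\d+");
    }

    // True if every word of the normalized term is a number
    // (the connector "and" is tolerated, but cannot stand alone).
    static boolean isNumber(String inTerm) {
        String norm = normalize(inTerm).trim();
        if (norm.isEmpty()) {
            return false;
        }
        boolean sawNumber = false;
        for (String word : norm.split("\\s+")) {
            if (isNumberWord(word)) {
                sawNumber = true;
            } else if (!word.equals("and")) {
                return false;
            }
        }
        return sawNumber;
    }

    // Exclusive filter: a term passes only if it is not a number.
    static boolean isPass(String inTerm) {
        return !isNumber(inTerm);
    }

    public static void main(String[] args) {
        String[] terms = {"five hundred and fifty five", "1,000", "five year plan"};
        for (String term : terms) {
            System.out.println(term + " -> pass: " + isPass(term));
        }
    }
}
```

In this sketch a trapped term corresponds to `isPass` returning false. A real implementation would replace the hard-coded set with a lookup against the Lexicon's number records.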