Exclusive Filter: A Term is a Single Word

  • Description:
    If a term contains no space, it is a single word (not a multiword). For example:
    • See
    • whatever

  • Filter Algorithm:
    • Logic:

      Check whether the term contains a space (FT_SINGLE_WORD)
      • if it does not, the term is filtered out as a single word

    • Source code: FiltersingleWord.java
    • FilterType: FilterType.FT_SINGLE_WORD
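
The check described above can be sketched in a few lines of Java. This is a minimal illustration, not the contents of FiltersingleWord.java: the class name SingleWordFilterSketch and the methods isSingleWord and removeSingleWords are hypothetical, and the sketch assumes terms arrive already trimmed, so any space is an internal one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual FiltersingleWord.java source.
final class SingleWordFilterSketch {

    // A term is treated as a single word when it contains no space character.
    static boolean isSingleWord(String term) {
        return term.indexOf(' ') < 0;
    }

    // Exclusive filter: drop single words, keep every term that contains a space.
    static List<String> removeSingleWords(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!isSingleWord(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("See", "whatever", "club feet", "club-feet");
        System.out.println(removeSingleWords(terms)); // prints [club feet]
    }
}
```

Note that a hyphenated form such as "club-feet" has no space, so under this rule it counts as a single word.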

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

      Lexicon  Filter          Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2018     FT_SINGLE_WORD  955564     479329   476235   0       50.1619%
      2016     FT_SINGLE_WORD  915583     446928   468655   0       48.8135%
      2015     FT_SINGLE_WORD  896213     431432   464781   0       48.1394%
      2014     FT_SINGLE_WORD  875090     417755   457335   0       47.7385%
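
The figures in each row are consistent with two simple relations: the pass, trap, and exception counts add up to the sample count, and the pass rate equals the pass count divided by the sample count. The short Java sketch below rechecks the 2018 row. The class and method names are hypothetical, and the formula is inferred from the numbers rather than taken from the test code.

```java
import java.util.Locale;

// Hypothetical helper; the formula is inferred from the reported figures.
final class PassRateSketch {

    // Pass rate as a percentage with four decimal places.
    static String passRate(long passNo, long sampleNo) {
        return String.format(Locale.ROOT, "%.4f%%", 100.0 * passNo / sampleNo);
    }

    public static void main(String[] args) {
        // Counts from the 2018 row.
        long sampleNo = 955564;
        long passNo = 479329;
        long trapNo = 476235;
        long expNo = 0;

        System.out.println(passNo + trapNo + expNo == sampleNo); // prints true
        System.out.println(passRate(passNo, sampleNo));          // prints 50.1619%
    }
}
```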

      This filter should not be applied until the very last step, because the inclusive filter for spelling variant patterns needs the single words. For example, "clubfeet", "club-feet", and "club feet" are spelling variants. The multiword "club feet" cannot be found by that filter if both of its single-word spelling variants ("clubfeet" and "club-feet") have already been removed.
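
The ordering constraint can be illustrated with a small Java sketch. It assumes, purely for illustration, that spelling variants are grouped by a key formed by removing spaces and hyphens. That key, and the names FilterOrderSketch, variantKey, and multiwordsWithVariants, are hypothetical and are not the project's actual spelling variant pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of filter ordering; the grouping key is an assumption.
final class FilterOrderSketch {

    // Assumed normalization: drop spaces and hyphens so variants share one key.
    static String variantKey(String term) {
        return term.replace(" ", "").replace("-", "").toLowerCase(Locale.ROOT);
    }

    // Return multiwords (terms with a space) that have at least one other
    // spelling variant present in the input.
    static List<String> multiwordsWithVariants(List<String> terms) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String term : terms) {
            groups.computeIfAbsent(variantKey(term), k -> new ArrayList<>()).add(term);
        }
        List<String> found = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() < 2) {
                continue; // no variant partner, nothing to match against
            }
            for (String term : group) {
                if (term.indexOf(' ') >= 0) {
                    found.add(term);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Single-word filter applied last: the variant group is still intact.
        List<String> all = Arrays.asList("clubfeet", "club-feet", "club feet");
        System.out.println(multiwordsWithVariants(all)); // prints [club feet]

        // Single-word filter applied first: only "club feet" is left, with no variants.
        List<String> filteredFirst = Arrays.asList("club feet");
        System.out.println(multiwordsWithVariants(filteredFirst)); // prints []
    }
}
```

With the single words still present, "club feet" is recognized through its variants. Once they have been removed, the same call finds nothing.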