Precision, Recall, and F1 Analysis for LMW Candidates from the ACR Model - Paper on the ACR Matcher

I. Introduction

To calculate recall, all multiwords (LMWs) in the domain of interest must be identified. In this analysis, we used LMW candidates from the Parenthetic Acronym Pattern matcher (ACR) to calculate precision, recall, and F1 score. The example illustrated below is based on 2015 data.

II. Data-1 (Table-3 in 2016 AMIA paper initial version)

  • Setup
    • Apply the Parenthetic Acronym Pattern matcher to the MEDLINE n-gram set (2015)
    • The lowercased core terms of these acronym expansions are used as LMW candidates
    • 14,400 LMW candidates are retrieved and tagged, first automatically by programs (if known in the Lexicon or in previous invalid tags), and then manually by linguists.
    • 13,170 are valid (TP); 1,230 are invalid (FP), as shown in case 1 in the table below
    • This is used as the gold standard for further analysis of other combinations of filters and matchers

    • Case 2: test on filters
      In practice, the distilled MEDLINE n-gram set is applied as the domain filter instead of applying all 16 filters in sequence
    • Cases 3 to 5: test on a single matcher
    • Cases 6 to 7: test on combinations of filters and matchers
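
The combination cases can be pictured as applying several checks to the same candidate list. The sketch below is a minimal illustration, assuming that a combination keeps only the candidates accepted by every filter and matcher in it (a logical AND). That assumption is consistent with the lower TP counts reported for the combined cases, but it is not confirmed by this page. The function names, the `Check` type, and the toy data are hypothetical stand-ins, not the real filters or matchers.

```python
from typing import Callable, Iterable, List

# Hypothetical type: a filter or matcher modeled as a yes/no test on one term.
Check = Callable[[str], bool]


def apply_combination(candidates: Iterable[str], checks: List[Check]) -> List[str]:
    """Keep candidates accepted by every check (logical AND), preserving order."""
    return [term for term in candidates if all(check(term) for check in checks)]


# Toy stand-ins for illustration only; these are not the real filters or matchers.
TOY_DISTILLED_SET = {"computed tomography", "body mass index"}
TOY_END_WORDS = {"tomography", "syndrome"}


def in_distilled_set(term: str) -> bool:
    return term in TOY_DISTILLED_SET


def has_known_end_word(term: str) -> bool:
    return term.split()[-1] in TOY_END_WORDS


candidates = ["computed tomography", "body mass index", "down syndrome"]
kept = apply_combination(candidates, [in_distilled_set, has_known_end_word])
print(kept)  # ['computed tomography']
```

Under this reading, adding a check can only shrink the retrieved set, which tends to raise precision and lower recall.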

  • Results

    Case  Description                                TP     FP    T. Retrieved  T. Relevant  Precision  Recall  F1
    1     Parenthetic Acronym - Gold Standard        13170  1230  14400         13170        0.9146     1.0000  0.9554
    Filters or a single matcher
    2     Distilled MEDLINE N-gram Set (16 filters)  13165  795   13960         13170        0.9431     0.9996  0.9705
    3     Spelling Variant Pattern Matcher           6837   293   7130          13170        0.9589     0.5191  0.6736
    4     Metathesaurus CUI Pattern matcher          8678   512   9190          13170        0.9443     0.6589  0.7762
    5     EndWord Pattern matcher                    1587   108   1695          13170        0.9363     0.1205  0.2135
    Combination of filters and matchers
    6     SpVar + CUI + Distilled                    5108   129   5237          13170        0.9754     0.3879  0.5550
    7     SpVar + CUI + EndWord + Distilled          703    5     708           13170        0.9929     0.0534  0.1013
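
The Precision, Recall, and F1 columns follow from the counts in each row using the standard definitions: precision = TP / total retrieved, recall = TP / total relevant, and F1 = the harmonic mean of the two. The sketch below recomputes case 3 from its counts. The function name `precision_recall_f1` is illustrative and is not part of the project's programs.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, total_relevant: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from TP, FP, and the gold-standard size."""
    total_retrieved = tp + fp
    precision = tp / total_retrieved if total_retrieved else 0.0
    recall = tp / total_relevant if total_relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Case 3 (Spelling Variant Pattern Matcher): TP = 6837, FP = 293, T. Relevant = 13170
p, r, f1 = precision_recall_f1(6837, 293, 13170)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # 0.9589 0.5191 0.6736
```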

III. Data-2 (Table-3 in 2016 AMIA paper final)

  • Setup
    • Same as in Data-1 (completed all n-grams for 2015 MEDLINE n-gram set)
    • 16,675 LMW candidates are retrieved and tagged, first automatically by programs (if known in the Lexicon or in previous invalid tags), and then manually by linguists.
    • 14,805 are valid (TP - total relevant); 1,870 are invalid (FP - total irrelevant), as shown in case 1 in the table below
    • This is used as the gold standard for further analysis of other combinations of filters and matchers

  • Results

    Case  Description                                TP     FP    FN     TN    Precision  Recall  F1      Accuracy
    1     Parenthetic Acronym - Gold Standard        14805  1870  0      0     0.8879     1.0000  0.9406  0.8879
    Filters or a single matcher
    2     Distilled MEDLINE N-gram Set (16 filters)  14796  1305  9      565   0.9189     0.9994  0.9575  0.9212
    3     Spelling Variant Pattern Matcher           7509   482   7296   1388  0.9397     0.5072  0.6588  0.5336
    4     Metathesaurus CUI Pattern matcher          9488   752   5317   1118  0.9266     0.6409  0.7577  0.6360
    5     EndWord Pattern matcher (top 20)           1710   180   13095  1690  0.9048     0.1155  0.2049  0.2039
    Combination of filters and matchers
    6     SpVar + CUI + Distilled                    5510   206   9295   1664  0.9640     0.3722  0.5370  0.4302
    7     SpVar + CUI + EndWord (20) + Distilled     727    11    14078  1859  0.9851     0.0491  0.0935  0.1551
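
From Data-2 onward the table reports the full confusion matrix, which adds accuracy = (TP + TN) / (TP + FP + FN + TN). In every row TP + FN equals the 14,805 valid candidates and FP + TN equals the 1,870 invalid ones, and in case 1 FN and TN are both 0, so accuracy equals precision. The sketch below recomputes case 4 from its four counts. The `Confusion` class is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for one table row's counts."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        retrieved = self.tp + self.fp
        return self.tp / retrieved if retrieved else 0.0

    @property
    def recall(self) -> float:
        relevant = self.tp + self.fn
        return self.tp / relevant if relevant else 0.0

    @property
    def f1(self) -> float:
        # Equivalent to the harmonic mean of precision and recall.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


# Case 4 (Metathesaurus CUI Pattern matcher) from the table above
case4 = Confusion(tp=9488, fp=752, fn=5317, tn=1118)
print(f"{case4.precision:.4f} {case4.recall:.4f} {case4.f1:.4f} {case4.accuracy:.4f}")
# 0.9266 0.6409 0.7577 0.6360
```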

III. Data-3 (Table-2 in 2017 HealthInf paper final)

  • Setup
    • Similar to Data-1 and 2 (completed all n-grams for 2016 MEDLINE n-gram set)
    • 17,707 LMW candidates are retrieved and tagged, first automatically by programs (if known in the Lexicon or in previous invalid tags), and then manually by linguists.
    • 15,850 are valid (TP - total relevant); 1,857 are invalid (FP - total irrelevant), as shown in case 1 in the table below
    • This is used as the gold standard for further analysis of other combinations of filters and matchers

  • Results

    Case  Description                                TP     FP    FN     TN    Precision  Recall  F1      Accuracy
    1     Parenthetic Acronym - Gold Standard        15850  1857  0      0     0.8951     1.0000  0.9447  0.8951
    Filters or a single matcher
    2     Distilled MEDLINE N-gram Set (16 filters)  15840  1299  10     558   0.9242     0.9994  0.9603  0.9261
    3     Spelling Variant Pattern Matcher           8094   499   7756   1358  0.9419     0.5107  0.6623  0.5338
    4     Metathesaurus CUI Pattern matcher          10056  755   5794   1102  0.9302     0.6344  0.7544  0.6301
    5     EndWord Pattern matcher (top 20)           1804   178   14046  1679  0.9102     0.1138  0.2023  0.1967
    5A    EndWord Pattern matcher (top 33)           2346   251   13504  1606  0.9034     0.1480  0.2544  0.2232
    Combination of filters and matchers
    6     SpVar + CUI + Distilled                    5892   212   9958   1645  0.9653     0.3717  0.5368  0.4257
    7     SpVar + CUI + EndWord (20) + Distilled     777    11    15073  1846  0.9860     0.0490  0.0934  0.1481
    7A    SpVar + CUI + EndWord (33) + Distilled     992    15    14858  1842  0.9851     0.0626  0.1177  0.1600
    8     CUI + EndWord (33) + Distilled             1766   113   14084  1744  0.9399     0.1114  0.1992  0.1982

IV. Automatic Tagging Model

  • Input: LMW candidates
  • Algorithm:
    Tag      Notes
    valid    • known in Lexicon: ${MULTIWORDS}/data/current/inData/inflVars.data.current (inflVars.data from the latest Lexicon)
             • ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.yes
    invalid  • known in the previous ACR tag: ${MULTIWORDS}/data/current/inData/invalidMwForParAcr.data.current (all invalid LMWs from previous tagging)
             • ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.no
    tbd      • unknown: used as LMW candidates
             • ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.tbd
             • Sent to linguists for manual tagging
  • Next Steps:
    Use the automatic tags and the manual tags to calculate precision, recall, and F1. The resulting tagged set is used as the gold standard.
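
The tagging rules in the Algorithm table amount to two set lookups per candidate. The sketch below is a minimal illustration under stated assumptions: the known-valid terms (from the Lexicon) and the known-invalid terms (from previous ACR tagging) are already loaded into in-memory sets of lowercased strings, and the valid lookup is tried before the invalid one. The file formats and the lookup order are not specified on this page, so the function `auto_tag` and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def auto_tag(
    candidates: Iterable[str],
    known_valid: Set[str],
    known_invalid: Set[str],
) -> Dict[str, List[str]]:
    """Split candidates into 'valid', 'invalid', and 'tbd' buckets."""
    tagged: Dict[str, List[str]] = {"valid": [], "invalid": [], "tbd": []}
    for term in candidates:
        if term in known_valid:        # assumed: already in the Lexicon
            tagged["valid"].append(term)
        elif term in known_invalid:    # assumed: tagged invalid in a previous round
            tagged["invalid"].append(term)
        else:                          # unknown: left for manual tagging by linguists
            tagged["tbd"].append(term)
    return tagged


# Toy data for illustration only
tags = auto_tag(
    ["computed tomography", "in this study", "body mass index"],
    known_valid={"computed tomography"},
    known_invalid={"in this study"},
)
print(tags["tbd"])  # ['body mass index']
```

Once the linguists resolve the `tbd` bucket, the valid and invalid totals give the TP and FP counts reported for case 1.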