Distilled Process Log - Walk Through, 2017

This page describes the filtering process that starts from the MEDLINE n-gram set and generates the distilled MEDLINE n-gram set and the candidate multiword list:

ID/Program | In No. | Filtered No. (%) | Out No. | Pass Rate | Acc. Pass Rate | Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 4,962,844,216
  • N=1: 27,261,960
  • N=2: 258,150,841
  • N=3: 887,664,290
  • N=4: 1,650,912,612
  • N=5: 2,138,854,513
Filtered No. (%): 4,940,881,179 (99.56%)
  • N=1: 26,285,088 (96.42%)
  • N=2: 252,428,631 (97.78%)
  • N=3: 879,567,758 (99.09%)
  • N=4: 1,645,868,459 (99.69%)
  • N=5: 2,136,731,243 (99.90%)
Out No.: 21,963,037
  • N=1: 976,872
  • N=2: 5,722,210
  • N=3: 8,096,532
  • N=4: 5,044,153
  • N=5: 2,123,270
Pass Rate: 0.4403%
Acc. Pass Rate: N/A
Notes: From MEDLINE TI & AB to the MEDLINE n-gram set
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30

  • Calculated by Excel (In and Out No. entered manually)
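The two thresholds listed for this step can be sketched as a simple predicate. This is only an illustration, not the program used for the log: it assumes that "length" means the character length of the n-gram and that "word count" means its frequency (WC) in MEDLINE titles and abstracts. The function name and the sample counts are hypothetical.

```python
MAX_LENGTH = 50      # n-grams longer than this are filtered out
MIN_WORD_COUNT = 30  # n-grams with a lower word count are filtered out


def keep_ngram(term: str, word_count: int) -> bool:
    """Return True if the n-gram passes both thresholds."""
    return len(term) <= MAX_LENGTH and word_count >= MIN_WORD_COUNT


# Hypothetical (term, word count) pairs, for illustration only.
samples = [
    ("magnetic resonance imaging", 1200),  # kept
    ("magnetic resonance imaging", 12),    # word count too low
    ("x" * 51, 1200),                      # too long
]
kept = [(term, wc) for term, wc in samples if keep_ngram(term, wc)]
```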
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Param: 1, 01
  • Run Time: 1 Min.
In No.: 21,963,037 | Filtered No.: 0 | Out No.: 21,963,037 | Pass Rate: 100.0000% | Acc. Pass Rate: 100.0000%
  • Create link: ./05.ApplyFilters/nGram.${YEAR}
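The sort step could look roughly like the sketch below. It is not the SortNGramByDcWcTerm program itself. It assumes that each input line has the form `DC|WC|Term` (document count, word count, n-gram) and that the order is descending DC, then descending WC, then the term alphabetically; the log does not state the sort direction. The function name and the sample counts are hypothetical.

```python
def sort_ngrams(lines):
    """Sort 'DC|WC|Term' lines by DC (desc), WC (desc), then term (asc)."""
    records = []
    for line in lines:
        # Split on the first two pipes only, so the term stays intact.
        dc, wc, term = line.rstrip("\n").split("|", 2)
        records.append((int(dc), int(wc), term))
    records.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [f"{dc}|{wc}|{term}" for dc, wc, term in records]


# Hypothetical counts, for illustration only.
example = ["5|9|in the", "7|7|of the", "5|12|to the"]
print(sort_ngrams(example))
```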
Apply General Exclusive Filters
ID-10 | In No.: 21,963,037 | Filtered No.: 14 | Out No.: 21,963,023 | Pass Rate: 99.9999% | Acc. Pass Rate: 99.9999%
  • |
  • (|r|
  • ||
  • Ag|AgCl
  • |D|
  • |E|
  • lambda(||)
ID-11 | In No.: 21,963,023 | Filtered No.: 485 | Out No.: 21,962,538 | Pass Rate: 99.9978% | Acc. Pass Rate: 99.9977%
  • =
  • <
  • +/-
  • >
  • -

  • -->
  • (+)
  • (%)
  • "+"
  • ((-/-))

  • ==>
  • [...]
  • *}
  • *//
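The two rate columns appear to follow from the counts: the pass rate of a filter is its Out No. divided by its In No., and the accumulated pass rate is its Out No. divided by the 21,963,037 n-grams that enter the filter chain. The sketch below recomputes the ID-10 and ID-11 values under that assumption; the function name is hypothetical.

```python
BASELINE = 21_963_037  # n-grams entering the filter chain (In No. of ID-01)

# (filter ID, In No., Out No.) taken from the ID-10 and ID-11 rows
rows = [
    ("ID-10", 21_963_037, 21_963_023),
    ("ID-11", 21_963_023, 21_962_538),
]


def rates(in_no: int, out_no: int, baseline: int = BASELINE):
    """Return (pass rate, accumulated pass rate) as percentage strings."""
    pass_rate = 100.0 * out_no / in_no
    acc_pass_rate = 100.0 * out_no / baseline
    return f"{pass_rate:.4f}%", f"{acc_pass_rate:.4f}%"


for filter_id, in_no, out_no in rows:
    # Filtered No. is simply In No. minus Out No.
    print(filter_id, in_no - out_no, *rates(in_no, out_no))
```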
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Param: 2, 12
  • Run time: 2 Min (norm - strip punc and space)
In No.: 21,962,538 | Filtered No.: 151,764 | Out No.: 21,810,774 | Pass Rate: 99.3090% | Acc. Pass Rate: 99.3067%
  • 2
  • 1
  • 3
  • 10
  • 4

  • 95%
  • 2,
  • 2000
  • 3-5
  • +/-0.5
  • (+/-0.05)
  • $1,500
  • "3 + 1"
  • 55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
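The note "norm - strip punc and space" suggests that a term is treated as a digit term when nothing but digits remains after punctuation and spaces are removed. The sketch below shows one way to express that idea. It is an approximation inferred from the examples above, not the actual Digit filter; the function name is hypothetical and only ASCII punctuation is stripped.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def is_digit_term(term: str) -> bool:
    """True if only ASCII digits remain after stripping punctuation/space."""
    core = term.translate(_STRIP)
    return core != "" and core.isascii() and core.isdigit()


# Examples taken from the list above, plus two terms that should pass through.
for term in ["95%", "+/-0.5", "$1,500", '"3 + 1"', "[192, 168]", "6 months", "="]:
    print(repr(term), is_digit_term(term))
```

Under this sketch a punctuation-only term such as `=` is not a digit term, which fits the log: such terms are already removed by ID-11.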
ID-13
  • Filter: Number
  • InTerm: core-term.lc
  • Create link: ./inData/NRVAR
  • Param: 2, 13
  • Run time: 2 Min
In No.: 21,810,774 | Filtered No.: 4,627 | Out No.: 21,806,147 | Pass Rate: 99.9788% | Acc. Pass Rate: 99.2857%
  • and
  • two
  • one
  • first
  • three

  • first and second
  • one third
  • twenty-eight
  • NINE
  • zeroth and
  • Four hundred and forty-seven

  • zero-one
  • 'half'
  • One"
ID-14 | In No.: 21,806,147 | Filtered No.: 175,769 | Out No.: 21,630,378 | Pass Rate: 99.1939% | Acc. Pass Rate: 98.4854%
  • of the
  • in the
  • to the
  • and the
  • on the

  • In the
  • and/or
  • 50% of
  • 1, 2, and
  • 2003 to
  • 2003 to 2007
  • for >=50%
  • the 8:2
  • -196 to -174

  • OR-462
  • AND-34
  • IN-1130
  • And-1
Apply Exclusive Filters - pattern
ID-20 | In No.: 21,630,378 | Filtered No.: 237,014 | Out No.: 21,393,364 | Pass Rate: 98.9043% | Acc. Pass Rate: 97.4062%
  • tomography (CT)
  • imaging (MRI)
  • resonance imaging (MRI)
  • oxide (NO)
  • reaction (PCR)

  • chain reaction (PCR)
  • polymerase chain reaction (PCR)
  • magnetic resonance imaging (MRI)
  • computed tomography (CT)
  • enzyme-linked immunosorbent assay (ELISA)
  • single nucleotide polymorphisms (SNPs)
  • magnetic resonance (MR) imaging

  • "Standards, Options and Recommendations" (SOR)
  • (CREB)-binding protein (CBP)

  • kinase (ASK)
  • proline-rich polypeptide (PRP)
  • semi-permeable membrane devices (SPMDs)
ID-21 | In No.: 21,393,364 | Filtered No.: 390,425 | Out No.: 21,002,939 | Pass Rate: 98.1750% | Acc. Pass Rate: 95.6286%
  • a significant
  • a single
  • a high
  • a novel
  • a case

  • a very
  • a group
  • a dose-dependent
  • A series
  • A and B
  • a meta-analysis
  • a SIF
  • A alpha C
  • A nonseminomatous

  • a delivery rate per
  • A beta 2m
  • a beta ab
ID-22 | In No.: 21,002,939 | Filtered No.: 136,738 | Out No.: 20,866,201 | Pass Rate: 99.3490% | Acc. Pass Rate: 95.0060%
  • RESULTS:
  • METHODS:
  • CONCLUSIONS:
  • CONCLUSION:
  • BACKGROUND:

  • OBJECTIVE:
  • OBJECTIVE: To
  • MATERIALS AND METHODS:
  • SETTING:
  • PURPOSE: To
  • INTRODUCTION:
  • AIM: The
  • L: -DOPA
  • 95% PI:

  • PHPT:
  • months [95% CI:
  • vs N:
  • mode MIC:
  • [95 % CI:
ID-23 | In No.: 20,866,201 | Filtered No.: 165,069 | Out No.: 20,701,132 | Pass Rate: 99.2089% | Acc. Pass Rate: 94.2544%
  • (n =
  • (P <
  • (P =
  • (p <
  • P <

  • (P < 0.05)
  • 95% CI =
  • P<0.001),
  • CI},
  • US$
  • VSL#3
  • N^N
  • group (n=6) received
  • CYP3A7*1C

  • studies; average
  • n.; Trichoteleia
  • sp. n.; Trichoteleia
ID-24 | In No.: 20,701,132 | Filtered No.: 372,670 | Out No.: 20,328,462 | Pass Rate: 98.1998% | Acc. Pass Rate: 92.5576%
  • two groups
  • 6 months
  • 24 h
  • (ABSTRACT TRUNCATED AT 250 WORDS)
  • the two groups

  • 5 years
  • at 37 degrees
  • 3 times
  • 100 mg
  • January 1,
  • 10 mg/kg
  • 12-year-old
  • at -20 degrees C
  • September 2006
  • 65 years or older with
  • 20 cigarettes per day
  • 3 - 6 months

  • 6 hours plus
  • minutes) per day, 5 days
  • MMR + V
  • 3 mg/EE
  • 317615 x
ID-25 | In No.: 20,328,462 | Filtered No.: 193,559 | Out No.: 20,134,903 | Pass Rate: 99.0478% | Acc. Pass Rate: 91.6763%
  • group (P
  • significant (P
  • years) with
  • significantly (P
  • years) and

  • interval [95%
  • see text] The
  • lt; 0.05) lower
  • CENTRAL) (The
  • nM (SD
  • pOGH (ANG

  • cB72.3(gamma
  • new species (type
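Every ID-25 example above contains an opening bracket without its closing partner, or the reverse. The log does not state the rule, so the sketch below is only an inference from those examples: it flags a term whose parentheses, square brackets, or braces are not balanced. The function name is hypothetical.

```python
# Map each closing bracket to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def has_unbalanced_brackets(term: str) -> bool:
    """True if any bracket in the term lacks a matching partner."""
    stack = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    # Any opener left on the stack was never closed.
    return bool(stack)


for term in ["group (P", "years) with", "see text] The", "tomography (CT)"]:
    print(repr(term), has_unbalanced_brackets(term))
```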
Apply Exclusive Filters - Lead-End-Terms
ID-30 | In No.: 20,134,903 | Filtered No.: 5,313,429 | Out No.: 14,821,474 | Pass Rate: 73.6109% | Acc. Pass Rate: 67.4837%
  • of a
  • that the
  • from the
  • is a
  • of this

  • The results
  • was observed
  • this study was
  • about 50%
  • - but not
  • "what is
  • AND COURSE

  • iT reg
  • of FoxM1b
  • or spinal or conduction
  • or spinal or conduction block,
ID-31 | In No.: 14,821,474 | Filtered No.: 3,068,813 | Out No.: 11,752,661 | Pass Rate: 79.2948% | Acc. Pass Rate: 53.5111%
  • patients with
  • associated with
  • at the
  • suggest that
  • between the

  • in patients with
  • results suggest that
  • MATERIALS AND
  • cross-reacted with
  • (ST 36) and
  • Zusanli (ST 36) and
  • determine whether this could
  • primarily composed of the

  • tilt-in-space and
  • systems, assays and
  • ppm Cu as
  • epidural or spinal or
ID-32 | In No.: 11,752,661 | Filtered No.: 2,963 | Out No.: 11,749,698 | Pass Rate: 99.9748% | Acc. Pass Rate: 53.4976%
  • in a
  • to be
  • with a
  • as a
  • may be

  • In a
  • in A.
  • For one
  • on NO

  • anti-NOR
  • plus AT
  • I/a
  • AS-ON
  • anti-OF
ID-33 | In No.: 11,749,698 | Filtered No.: 1,639,755 | Out No.: 10,109,943 | Pass Rate: 86.0443% | Acc. Pass Rate: 46.0316%
  • to determine
  • In addition,
  • to evaluate
  • to assess
  • to investigate

  • in the presence
  • AT 250
  • As a result,
  • ON THE TREATMENT
  • as a possible treatment for
  • in details,
  • - for example,
  • within working memory
  • for various chronic
  • in 0.1% trifluoroacetic
  • in threatened preterm labor

  • with the MIC90S
  • On PTD
  • plus LHRH-A
  • with the MIC90S of
ID-34 | In No.: 10,109,943 | Filtered No.: 1,647,971 | Out No.: 8,461,972 | Pass Rate: 83.6995% | Acc. Pass Rate: 38.5282%
  • effects of
  • number of
  • use of
  • presence of
  • used to

  • Comparison of
  • low cost of
  • HPV) in
  • NUMBER OF
  • zymography was used to
  • loss of two or more

  • 1 goes to
  • active with the MIC90s of
  • syn. nov. of
  • microg/mmol of
The final result of the filters above is used as the distilled MEDLINE n-gram set
Apply Exclusive Filters - Project domain
ID-40 | In No.: 8,461,972 | Filtered No.: 785,105 | Out No.: 7,676,867 | Pass Rate: 90.7220% | Acc. Pass Rate: 34.9536%
  • of
  • the
  • in
  • to
  • a

  • The
  • We
  • "The
  • linear,
  • "normal"
  • {systematic name:
  • systematic name
  • anterior intermeniscal ligament
  • regional low-flow perfusion

  • Neo.
  • Cannon &
  • Polycentropus
  • Penneys &
  • % (month