Lexicon - The MEDLINE N-Gram Set


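The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST Lexicon. The Lexical Systems Group (LSG) would like to share this n-gram set (n = 1 to 5) with the NLP/MLP community. Each release is listed below with its corpus statistics and can be downloaded from the link in the last column. DNg/Ng is the number of distilled n-grams as a percentage of all n-grams.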
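As an illustration of what "n = 1 to 5" means, the sketch below counts every contiguous run of one to five tokens in a set of sentences. Recurring sequences such as "heart attack" accumulate higher counts, which is what makes an n-gram set useful for spotting candidate multiwords. This is a minimal sketch, not the LSG pipeline: the function names `ngrams` and `count_ngrams` are hypothetical, whitespace tokenization is a simplifying assumption, and the sentence splitting, normalization, and filtering that produce the distilled set are not shown.

```python
from collections import Counter
from typing import Iterable, Iterator


def ngrams(tokens: list[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield every contiguous n-token window of a tokenized sentence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(sentences: Iterable[str], max_n: int = 5) -> Counter:
    """Count all 1- to max_n-grams across sentences.

    Assumption: tokens are obtained with a plain whitespace split; the
    real MEDLINE n-gram set uses its own tokenization rules.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    counts = count_ngrams(["heart attack risk", "risk of heart attack"])
    print(counts["heart attack"])  # the bigram occurs once in each sentence
```

For the two example sentences, "heart attack" is counted twice, while sequences that occur only once, such as "attack risk", are counted once.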

Year  Document Count  Sentence Count     Word Count     N-grams  Distilled N-grams  DNg/Ng (%)  Download
2021      31,850,051     209,685,517  4,365,354,060  28,103,252         11,127,802      39.60%  The MEDLINE n-gram set 2021
2020      30,420,660     196,566,513  4,080,670,967  26,310,808         10,354,021      39.35%  The MEDLINE n-gram set 2020
2019      29,138,919     185,619,887  3,824,268,997  24,666,816          9,595,606      38.90%  The MEDLINE n-gram set 2019
2018      27,837,540     174,395,209  3,585,789,820  23,171,133          8,979,895      38.75%  The MEDLINE n-gram set 2018
2017      26,759,399     163,021,640  3,386,661,350  21,963,037          8,461,972      38.53%  The MEDLINE n-gram set 2017
2016      24,358,442     143,471,776  2,971,013,236  19,325,338          7,402,848      38.31%  The MEDLINE n-gram set 2016
2015      23,343,329     134,834,507  2,786,085,158  18,148,692          6,793,561      37.43%  The MEDLINE n-gram set 2015
2014      22,356,869     126,612,705  2,610,209,406  17,023,819          6,351,392      37.31%  The MEDLINE n-gram set 2014
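The DNg/Ng column can be reproduced from the N-grams and Distilled N-grams columns. The sketch below copies those two counts from the table above and recomputes the percentage for each release; the names `RELEASES` and `distilled_ratio` are hypothetical and used only for illustration.

```python
# Counts copied from the table above:
# year -> (n-grams, distilled n-grams, published DNg/Ng %)
RELEASES = {
    2021: (28_103_252, 11_127_802, "39.60"),
    2020: (26_310_808, 10_354_021, "39.35"),
    2019: (24_666_816, 9_595_606, "38.90"),
    2018: (23_171_133, 8_979_895, "38.75"),
    2017: (21_963_037, 8_461_972, "38.53"),
    2016: (19_325_338, 7_402_848, "38.31"),
    2015: (18_148_692, 6_793_561, "37.43"),
    2014: (17_023_819, 6_351_392, "37.31"),
}


def distilled_ratio(ngrams: int, distilled: int) -> float:
    """Distilled n-grams as a percentage of all n-grams (the DNg/Ng column)."""
    return 100.0 * distilled / ngrams


if __name__ == "__main__":
    for year, (ngrams, distilled, published) in RELEASES.items():
        computed = distilled_ratio(ngrams, distilled)
        print(f"{year}: computed {computed:.2f}%, published {published}%")
```

Rounded to two decimals, each computed value matches the published percentage, so DNg/Ng is simply the ratio of the two count columns.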

References: