• Lexicon - The MEDLINE N-Gram Set

The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST Lexicon. The Lexical Systems Group (LSG) shares this n-gram set (n = 1 to 5) with the NLP/MLP community. Each yearly release can be downloaded from the links below.

| Year | Document Count | Sentence Count | Word Count | N-grams | Distilled N-grams | Download |
|------|----------------|----------------|------------|---------|-------------------|----------|
| 2020 | 30,420,660 | 196,566,513 | 4,080,670,967 | 26,310,808 | 10,354,021 | The MEDLINE n-gram set 2020 |
| 2019 | 29,138,919 | 185,619,887 | 3,824,268,997 | 24,666,816 | 9,595,606 | The MEDLINE n-gram set 2019 |
| 2018 | 27,837,540 | 174,395,209 | 3,585,789,820 | 23,171,133 | 8,979,895 | The MEDLINE n-gram set 2018 |
| 2017 | 26,759,399 | 163,021,640 | 3,386,661,350 | 21,963,037 | 8,461,972 | The MEDLINE n-gram set 2017 |
| 2016 | 24,358,442 | 143,471,776 | 2,971,013,236 | 19,325,338 | 7,402,848 | The MEDLINE n-gram set 2016 |
| 2015 | 23,343,329 | 134,834,507 | 2,786,085,158 | 18,148,692 | 6,793,561 | The MEDLINE n-gram set 2015 |
| 2014 | 22,356,869 | 126,612,705 | 2,610,209,406 | 17,023,819 | 6,351,392 | The MEDLINE n-gram set 2014 |
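
To make the idea of a word n-gram set with n = 1 to 5 concrete, the following is a minimal Python sketch that counts every contiguous word n-gram in a few sentences. It is an illustration only and not the LSG's actual pipeline: the function names `word_ngrams` and `count_ngrams`, the whitespace tokenization, the lowercasing, the per-sentence counting, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(tokens, max_n=5):
    """Yield every contiguous word n-gram for n = 1..max_n."""
    for n in range(1, max_n + 1):
        # A sequence of k tokens contains k - n + 1 n-grams of length n.
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_ngrams(sentences, max_n=5):
    """Count n-grams over sentences; n-grams never cross a sentence boundary."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: naive lowercasing and whitespace tokenization.
        tokens = sentence.lower().split()
        counts.update(word_ngrams(tokens, max_n))
    return counts


# Hypothetical sample sentences, not taken from MEDLINE.
sentences = [
    "Heart attack is a leading cause of death",
    "Aspirin lowers the risk of heart attack",
]
counts = count_ngrams(sentences)

# Show the n-grams that occur more than once.
for ngram, freq in counts.most_common():
    if freq > 1:
        print(freq, ngram)
```

In this sketch, a multiword candidate such as "heart attack" surfaces because it recurs across sentences. A real corpus of this size would need a proper tokenizer and frequency thresholds, neither of which is specified here.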