- UTF-8, XML, ASCII
- 2018 Release:
The SPECIALIST Lexicon is a large syntactic lexicon of biomedical and general English, developed to provide the lexical information needed by the SPECIALIST Natural Language Processing (NLP) System, which includes SemRep, MetaMap, and the Lexical Tools. It is intended to be a general English lexicon that includes many biomedical terms. Coverage includes both commonly occurring English words and biomedical vocabulary drawn from a variety of sources. These include, but are not limited to, MEDLINE citation records, terms in Dorland's Illustrated Medical Dictionary, the 10,000 most frequent words listed in the American Heritage Word Frequency Book, and the 2,000 lexical items used in the controlled definitions of Longman's Dictionary of Contemporary English. The lexicon entry for each lexical item (word or term) records the syntactic, morphological (inflection and derivation), and orthographic (spelling variants) information needed by the SPECIALIST NLP System.
The SPECIALIST Lexicon (the unit lexical record formatted file), along with its relational files, has been released annually as one of the UMLS Knowledge Sources since 1994. In addition to its distribution with the UMLS, it is available as an open source resource subject to these terms and conditions. The XML format of the unit lexical record first became available in 2003 through LexAccess. In 2006, XML schemas and JAXB (Java Architecture for XML Binding) APIs were released. In addition, all files are released in UTF-8 format. In 2009, a pure ASCII file, LEXICON.ascii, was added to the annual release for NLP projects that work only with ASCII. In 2013, all derivations in the Lexicon (including zeroD, suffixD, and prefixD), along with negation information, were added to the annual release (derivation.data, DM.DB) using a systematic methodology. In 2017, a new system was developed to add all synonymous terms in the Lexicon (lexSynonyms) to the synonym database file (SM.DB).
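A project choosing between the UTF-8 files and LEXICON.ascii may want to know which lines of its own input contain non-ASCII characters. The following is a minimal Python sketch of such a check. It is not part of the SPECIALIST distribution, and the function name `non_ascii_lines` and the sample terms are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, Tuple


def non_ascii_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line that contains a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        # str.isascii() requires Python 3.7 or later.
        if not line.isascii():
            yield number, line


# Hypothetical sample terms, not taken from the actual Lexicon files.
sample = ["anesthetic", "Sjögren's syndrome", "naïve"]
flagged = list(non_ascii_lines(sample))
print(flagged)
```

To apply the same check to a file, open it with `encoding="utf-8"` and pass the file object to `non_ascii_lines`, since a text file object iterates line by line.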