• Lexicon
  • UTF-8, XML, ASCII
  • 2014 Release:

The SPECIALIST lexicon is a large syntactic lexicon of biomedical and general English, developed to provide the lexical information needed by the SPECIALIST Natural Language Processing (NLP) System, which includes SemRep, MetaMap, and the Lexical Tools. It is intended to be a general English lexicon that includes many biomedical terms, so its coverage spans both commonly occurring English words and biomedical vocabulary. The lexicon entry for each lexical item (word or term) records the syntactic, morphological, and orthographic information needed by the SPECIALIST NLP System.

The SPECIALIST Lexicon (a file of unit lexical records), along with its relational files, has been released annually as one of the UMLS Knowledge Sources since 1994. In addition to its distribution with the UMLS, it is available as an open source resource subject to these terms and conditions. The XML format of the unit lexical record first became available in 2003 through LexAccess. In 2006, XML schemas and JAXB (Java Architecture for XML Binding) APIs were released. In addition, all files are released in UTF-8 format. In 2009, a pure ASCII file, LEXICON.ascii, was added to the annual release for NLP projects interested only in ASCII. In 2013, all derivations in the Lexicon (including zeroD, suffixD, and prefixD), along with negation information, were added to the annual release (derivation.data); these were derived by a systematic methodology.