# ----------------------------------------------
# SPECIALIST Tagset (T1)
# This is the tagset based on the SPECIALIST
# categories.
#
# See http://lexsrv3.nlm.nih.gov/SPECIALIST/Projects/lexicon/2006/release/LEXICON/DOCS/techrpt.pdf
# for a definition of the principal categories.
#
# The punctuation category tags come primarily from
# the conversion Larry Smith uses to map his tagset
# to the SPECIALIST tags. [Need a cit here]
#
# The Shape tags come from shapes that the Xerox
# PARC tagger identified, or from shapes that
# textTools identifies or hopes to identify.
#
# A "1" in the Open Class column indicates that the tag
# is an open class. This is useful to know when guessing
# a class: we only want to guess open classes.
# An open class is defined to be a lexical category whose
# membership is typically large and which can easily accept
# new members. In English, this includes the categories of
# noun, verb, and adjective. [A Dictionary of Grammatical
# Terms in Linguistics, R.L. Trask, c. 1993, p. 195]
# This tagset also marks adverb as an open class.
#
# We will presume that we have a lexicon filled with
# all the closed class words and tokens.
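#
# A minimal Java sketch of restricting guesses to open classes
# follows. It is for illustration only. The class and method
# names (OpenClassGuesser, candidateTags) are hypothetical and
# are not the tagger's API. The flags are copied from the Open
# Class column of the table below.
#
# ```java
# import java.util.ArrayList;
# import java.util.LinkedHashMap;
# import java.util.List;
# import java.util.Map;
#
# public class OpenClassGuesser {
#     // Hypothetical: a word missing from the lexicon may only be
#     // given a tag whose Open Class flag is 1.
#     public static List<String> candidateTags(Map<String, Boolean> openClass) {
#         List<String> result = new ArrayList<>();
#         for (Map.Entry<String, Boolean> e : openClass.entrySet()) {
#             if (e.getValue()) {
#                 result.add(e.getKey());
#             }
#         }
#         return result;
#     }
#
#     public static void main(String[] args) {
#         // LinkedHashMap keeps insertion (table) order.
#         Map<String, Boolean> openClass = new LinkedHashMap<>();
#         openClass.put("noun", true);
#         openClass.put("adj", true);
#         openClass.put("adv", true);
#         openClass.put("verb", true);
#         openClass.put("aux", false);
#         openClass.put("det", false);
#         openClass.put("prep", false);
#         System.out.println(candidateTags(openClass)); // [noun, adj, adv, verb]
#     }
# }
# ```
#
# Closed-class tags never appear as candidates. This relies on
# the presumption that the lexicon already lists every
# closed-class word.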
#
# A 1 in the Shape column indicates a tag that will not
# be seen identifying a term within an official lexicon.
# Rather, these tags are put on terms recognized by
# shape identifiers, such as numbers, URLs, ...
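#
# A shape identifier could work roughly as in the sketch below.
# This is a sketch under stated assumptions. The class name
# ShapeSketch and all of its regular expressions are
# hypothetical and deliberately simple. This file does not
# describe the real shape identifiers. Only the returned tag
# names (num, real, email, url) come from the table below.
#
# ```java
# import java.util.regex.Pattern;
#
# public class ShapeSketch {
#     // Hypothetical patterns, far simpler than a real identifier.
#     private static final Pattern NUM = Pattern.compile("[0-9]+");
#     private static final Pattern REAL = Pattern.compile("[0-9]*\\.[0-9]+");
#     private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[A-Za-z]+");
#     private static final Pattern URL = Pattern.compile("https?://\\S+");
#
#     // Returns a Shape tag, or null when no shape matches and the
#     // token should instead be looked up in the lexicon.
#     public static String shapeTag(String token) {
#         if (NUM.matcher(token).matches()) return "num";
#         if (REAL.matcher(token).matches()) return "real";
#         if (EMAIL.matcher(token).matches()) return "email";
#         if (URL.matcher(token).matches()) return "url";
#         return null;
#     }
#
#     public static void main(String[] args) {
#         System.out.println(shapeTag("42"));                  // num
#         System.out.println(shapeTag("3.14"));                // real
#         System.out.println(shapeTag("someone@example.org")); // email
#         System.out.println(shapeTag("http://example.org/")); // url
#         System.out.println(shapeTag("aspirin"));             // null
#     }
# }
# ```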
#
# This tagger relies heavily upon the END tag, which should
# be present in all tagsets associated with this tagger.
# The END tag is implicitly placed before the beginning
# and after the end of an utterance (sentence).
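#
# A minimal sketch of that implicit padding follows. It assumes
# the tagger looks at neighbouring tags, so the first and last
# words need a left and right neighbour. This file does not say
# exactly how END is consumed. The class EndPadding and its
# pad() helper are hypothetical. "end" is the tag in the first
# row of the table below.
#
# ```java
# import java.util.ArrayList;
# import java.util.Arrays;
# import java.util.List;
#
# public class EndPadding {
#     // Boundary tag name, taken from the first row of the table.
#     static final String END = "end";
#
#     // Hypothetical helper: returns a new list with END placed
#     // before the first tag and after the last tag.
#     public static List<String> pad(List<String> tags) {
#         List<String> padded = new ArrayList<>(tags.size() + 2);
#         padded.add(END);
#         padded.addAll(tags);
#         padded.add(END);
#         return padded;
#     }
#
#     public static void main(String[] args) {
#         // "The patient improved ." as det noun verb pd
#         List<String> tags = Arrays.asList("det", "noun", "verb", "pd");
#         System.out.println(pad(tags)); // [end, det, noun, verb, pd, end]
#     }
# }
# ```
#
# After padding, the pairs (end, det) and (pd, end) exist. The
# sentence boundaries can then be handled like any other pair
# of neighbouring tags.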
#
# The Java code needs to know about numbers and punctuation.
# The num and punct tags should always remain in any tagset,
# as represented here. Otherwise, alter the
# TagSet.getNumberTagId() and TagSet.getPunctuationTagId()
# methods to correspond to the new tags.
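#
# The following sketch shows why those two tags must survive.
# TagSetSketch is a hypothetical stand-in and is NOT the
# tagger's TagSet class. It only mirrors the two method names
# mentioned above. It assumes a tag's id is its row position in
# the table and that the ids are found by looking up the "num"
# and "punct" rows.
#
# ```java
# import java.util.ArrayList;
# import java.util.Arrays;
# import java.util.List;
#
# public class TagSetSketch {
#     // Tags in table order; a tag's id is its index (assumption).
#     private final List<String> tags;
#
#     public TagSetSketch(List<String> tags) {
#         this.tags = new ArrayList<>(tags);
#     }
#
#     public int getTagId(String tag) {
#         int id = tags.indexOf(tag);
#         if (id < 0) {
#             throw new IllegalStateException("tag missing from tagset: " + tag);
#         }
#         return id;
#     }
#
#     // These fail if "num" or "punct" is renamed or removed.
#     public int getNumberTagId() {
#         return getTagId("num");
#     }
#
#     public int getPunctuationTagId() {
#         return getTagId("punct");
#     }
#
#     public static void main(String[] args) {
#         TagSetSketch ts = new TagSetSketch(Arrays.asList(
#             "end", "noun", "adj", "num", "real", "unknown", "punct"));
#         System.out.println(ts.getNumberTagId());      // 3
#         System.out.println(ts.getPunctuationTagId()); // 6
#     }
# }
# ```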
#
#-+----------+---------------------------------+-----+-----+-----------
# | POS | |Open |Shape|Example
# | tag | Name |Class| |Character
#-+----------+---------------------------------+-----+-----+-----------
end |END |0| |
noun |noun |1| |
adj |adjective |1| |
adv |adverb |1| |
verb |verb |1| |
aux |auxiliary verb "be", "do", "have" |0| |
modal |modal verb "can", "will" |0| |
to |infinitive marker to |0| |
conj |conjunction |0| |
pron |pronoun |0| |
compl |complementizer(that) |0| |
det |determiner |0| |
pos |genitive marker |0| |
prep |preposition |0| |
num |number or numeric |0|1|
real |real number |0|1|
unknown |unknown |0|1|
punct |punctuation |0|1|
pd |end of sentence period |0| |.
cm |comma |0| |,
hy |hyphen |0| |-
cl |colon |0| |:
; |semiColon |0| |;
ap |right quote or double quote |0| |'"
bq |left quote (backquote) |0| |`
lp |left parenthesis |0| |(
rp |right parenthesis |0| |)
~ |tilde |0| |~
! |bang |0| |!
@ |at sign |0| |@
pound |pound sign |0| |#
$ |dollar sign |0| |$
% |percent sign |0| |%
^ |caret sign |0| |^
& |and sign |0| |&
* |asterisk |0| |*
= |equal sign |0| |=
_ |underBar sign |0| |_
+ |plus sign |0| |+
{ |left curly bracket |0| |{
} |right curly bracket |0| |}
bar |bar |0| ||
[ |left bracket |0| |[
] |right bracket |0| |]
\ |backslash |0| |\
/ |slash |0| |/
< |lessThan |0| |<
> |greaterThan |0| |>
? |questionMark |0| |?
tab |tab |0| |
shape |shape |0|1|
prefix |prefix |0|1|
money |money |0|1|
phone |phone number |0|1|
date |date |0|1|
url |URL |0|1|
email |EMAIL address |0|1|
unitOfMeasure|unit of measure |0|1|
chem |chemical |0|1|
propername |proper name |0|1|
acronym |acronym |0|1|
localAcronym |local acronym |0|1|
percent |percent number |0|1|
fraction |fraction |0|1|
range |range |0|1|
glob |glob |0|1|
equation |equation |0|1|
levelOfSignificance|level of significance |0|1|
experimentSize|experiment size |0|1|
none |none |0|0|
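#
# The table above could be read as in the hypothetical sketch
# below. It is not the tagger's loader. It assumes only the
# layout visible in this file: '#' starts a comment line, and
# fields are separated by '|'. Because of that, '#' and '|'
# presumably cannot be used as tags themselves, which would
# explain the "pound" and "bar" rows. The class and field names
# are illustrative.
#
# ```java
# import java.util.ArrayList;
# import java.util.List;
#
# public class TagsetRowParser {
#     // One table row: tag, name, open-class flag, shape flag,
#     // and the optional example character.
#     public static final class Row {
#         public final String tag, name, example;
#         public final boolean open, shape;
#
#         public Row(String tag, String name, boolean open, boolean shape, String example) {
#             this.tag = tag;
#             this.name = name;
#             this.open = open;
#             this.shape = shape;
#             this.example = example;
#         }
#     }
#
#     // A split limit of 5 keeps the example column intact even
#     // when it is '|' itself (the "bar" row). A blank flag column
#     // is read as 0. The example is not trimmed, so a tab survives.
#     public static Row parse(String line) {
#         String[] f = line.split("\\|", 5);
#         if (f.length < 4) {
#             throw new IllegalArgumentException("malformed row: " + line);
#         }
#         String example = f.length == 5 ? f[4] : "";
#         return new Row(f[0].trim(), f[1].trim(),
#                 "1".equals(f[2].trim()), "1".equals(f[3].trim()), example);
#     }
#
#     // Skips comment and blank lines, parses everything else.
#     public static List<Row> parseAll(List<String> lines) {
#         List<Row> rows = new ArrayList<>();
#         for (String line : lines) {
#             if (line.trim().isEmpty() || line.startsWith("#")) continue;
#             rows.add(parse(line));
#         }
#         return rows;
#     }
#
#     public static void main(String[] args) {
#         List<String> lines = List.of(
#             "#-+----------+",
#             "noun |noun |1| |",
#             "bar |bar |0| ||",
#             "url |URL |0|1|");
#         for (Row r : parseAll(lines)) {
#             System.out.println(r.tag + " open=" + r.open + " shape=" + r.shape + " example=" + r.example);
#         }
#     }
# }
# ```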