[features total 134]
Apart from POS, the following features are specified and allotted in the five domains of each English word or MWE in the square brackets:
1. 1st domain – word structural information
the numbers represent: [7]
[1+(space)+single word token (“1” – most high-frequency sense)
[2+(space)+single word token (“2” – more high-frequency sense)
[3+(space)+single word token (“3” – general high-frequency sense)
[7+(space)+single word token (“7” – never chosen for E-C MT)
[4+(1~10/11/12...)+head of MWE (“4” – most high-frequency sense)
[5+(1~10/11/12...)+head of MWE (“5” – general frequency sense)
[6+(1~10/11/12...)+head of MWE (“6” – never chosen for E-C MT)
(1~10/11/12...) represents the position of the head in the MWE
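As an illustration of how the seven codes partition into single-word and MWE entries, the following is a minimal sketch in Python. The names `StructInfo` and `decode_first_domain` are hypothetical, and the code takes the number and the head position as separate integers, since the exact serialization of an entry is not spelled out here.

```python
from typing import NamedTuple, Optional


class StructInfo(NamedTuple):
    kind: str                      # "word" or "mwe"
    sense_rank: str                # frequency label copied from the list above
    head_position: Optional[int]   # 1-based position of the head; None for single words


# Labels follow the seven codes listed above.
WORD_CODES = {
    1: "most high-frequency",
    2: "more high-frequency",
    3: "general high-frequency",
    7: "never chosen for E-C MT",
}
MWE_CODES = {
    4: "most high-frequency",
    5: "general frequency",
    6: "never chosen for E-C MT",
}


def decode_first_domain(code: int, head_position: Optional[int] = None) -> StructInfo:
    """Interpret a first-domain code (hypothetical helper, not part of the original system)."""
    if code in WORD_CODES:
        return StructInfo("word", WORD_CODES[code], None)
    if code in MWE_CODES:
        if head_position is None or head_position < 1:
            raise ValueError("an MWE code needs the 1-based position of its head")
        return StructInfo("mwe", MWE_CODES[code], head_position)
    raise ValueError(f"unknown first-domain code: {code}")
```

For example, `decode_first_domain(5, 2)` describes an MWE with a general frequency sense whose head is its second word.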
2. 2nd domain – POS information [17]
noun
pron
verb
adj
adv
num
det – determiner (the)
aux – auxiliary verb (be, have, do, can, may…)
wh – wh-words (who, when, as, though)
conj – coordinating conjunctions
prep – prepositions (in, for, according to, on behalf of…)
pp – prepositional phrase (on bad terms…)
infs – infinitive sign (to, in order to, so as to)
expr – an expression, syntactically independent
echo – a sound
prefix
suffix
3. 3rd domain -- Word-ending information: [11]
-0
-s (children)
-0s (police)
-v (wrote)
-ed (bought)
-en (written)
-ing (feeling, meeting)
-ings
-aaa (put)
-aba (come-came-come)
-aab (beat-beat-beaten)
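The last three tags appear to encode which of a verb's three principal parts (base, past, past participle) coincide. A minimal sketch of that reading follows; the function name `principal_part_pattern` is hypothetical, and verbs whose three forms all differ fall outside the three listed patterns and yield `None`.

```python
from typing import Optional


def principal_part_pattern(base: str, past: str, participle: str) -> Optional[str]:
    """Return "-aaa", "-aba" or "-aab" for the patterns listed above, else None."""
    base, past, participle = (s.lower() for s in (base, past, participle))
    if base == past == participle:
        return "-aaa"          # put-put-put
    if base == participle:
        return "-aba"          # come-came-come
    if base == past:
        return "-aab"          # beat-beat-beaten
    return None                # pattern not covered by the three tags above
```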
4. 4th domain – syntactic information [4]
neg – negative (no, none…)
present – present time
past – past time
future – future time
4.1 Word/MWE form information: [6]
fup – first letter in upper-case
aup – all letters in upper-case
acro – acronym
endthat – MWE ending in "that" (it is said that…)
endprep – MWE ending in a preposition (be confident in, rely on…)
endto – MWE ending in an infinitive "to" (be determined to)
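Most of these form features can be read off the surface string. The sketch below shows one way to do so, under several assumptions: `form_features` and the short `PREPOSITIONS` list are hypothetical, an all-caps entry receives only `aup`, a final "to" is always treated as the infinitive sign (in practice it may be a preposition, which needs a lexical decision), and `acro` is not detected at all.

```python
# Hypothetical, deliberately short preposition list for illustration only.
PREPOSITIONS = {"in", "on", "of", "for", "with", "at", "by", "from"}


def form_features(entry: str) -> set:
    """Guess fup/aup/endthat/endprep/endto from the surface form of a word or MWE."""
    features = set()
    words = entry.split()
    if not words:
        return features

    letters = [c for c in entry if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        features.add("aup")            # assumption: all-caps entries get "aup" only
    elif words[0][0].isupper():
        features.add("fup")

    if len(words) > 1:                 # the end* features apply to MWEs only
        last = words[-1].lower()
        if last == "that":
            features.add("endthat")
        elif last == "to":
            features.add("endto")      # assumption: final "to" is the infinitive sign
        elif last in PREPOSITIONS:
            features.add("endprep")
    return features
```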
4.2 for nouns: [9]
action – action noun
ofnpa – in “action-noun + X”: X is normally its patient
ofnag – in “action-noun + X”: X is normally its agent
static – static noun
uncount – uncountable noun
unit – a unit (metre, feet, gram…)
proper – a proper noun (Canada, Tokyo, Obama, IBM…)
nomodinoun – almost never used to modify a noun (means)
moneysign – a currency sign ($, ¥…)
4.3 for numerals: [19]
ordi – ordinal (first, 2nd, tenth…)
card – cardinal (1, two, million…)
predet – can come before determiners (all, both, either, none of)
singular – singular (1)
plural – plural (2, 10, 100…)
indef – indefinite amount (some)
demo – demonstrative (this, those)
date – indicates “date” (1 - 31)
hour – indicates “hour” (1 - 24)
minute – indicates “minute” (1 - 60)
bage – indicates “baby’s age” (1 - 3)
tage – indicates “toddler’s age” (4 - 5)
cage – indicates “child’s age” (6 - 9)
tnage – indicates “teenager’s age” (10 - 15)
yage – indicates “youth’s age” (16 - 41)
mage – indicates “middle age” (42 - 69)
oage – indicates “old age” (70 - 112)
1digit – indicates a one-digit number (1, 2, 3…9)
2digit – indicates a two-digit number (10, 11, 12…99)
3digit – indicates a three-digit number (100, 101, 110…999)
4digit – indicates a four-digit number (1000, 1001, 1110…1999)
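Because the numeric ranges above overlap, a single number can carry several of these features at once. The following sketch shows how the range-based features could be derived for a positive integer. The name `numeral_features` is hypothetical, the bounds are copied from the list as given (including the 1999 upper bound for `4digit`), and assigning every matching feature simultaneously is an assumption rather than documented behaviour.

```python
# Inclusive (low, high) bounds copied from the list above.
RANGES = {
    "date": (1, 31),
    "hour": (1, 24),
    "minute": (1, 60),
    "bage": (1, 3),
    "tage": (4, 5),
    "cage": (6, 9),
    "tnage": (10, 15),
    "yage": (16, 41),
    "mage": (42, 69),
    "oage": (70, 112),
    "1digit": (1, 9),
    "2digit": (10, 99),
    "3digit": (100, 999),
    "4digit": (1000, 1999),   # upper bound as listed in the source
}


def numeral_features(n: int) -> set:
    """Collect the cardinal-number features whose range contains n."""
    features = {"card"}
    if n == 1:
        features.add("singular")
    elif n >= 2:
        features.add("plural")
    for name, (low, high) in RANGES.items():
        if low <= n <= high:
            features.add(name)
    return features
```

Under these assumptions, 45 is tagged `card`, `plural`, `minute`, `mage` and `2digit`, which lets later rules choose among a time reading, an age reading, and a plain quantity.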
4.4 for pronouns: [5]
accusat – accusative case (him)
nominat – nominative case (we)
reflex – reflexive (oneself)
singular – singular
plural – plural
4.5 for verbs: [15]
vt – transitive
vi – intransitive
copula – copula
linkall – copula which can take n/adj/ved/ven as its complement (be)
linkadj – copula which can take adj/ved/ven as its complement (get)
sobj – can take a single obj (destroy)
dobj – can take a double obj (give)
ingobj – can take a Ving obj (start, keep, stop…)
thatobj – can take a that-clause as its obj (say)
whobj – can take a wh-clause as its obj (ask)
toobj – can take a to-infinitive verb as its obj (want)
nouncomp – can take a noun as its complement (call)
tocomp – can take a to-infinitive verb as its complement (ask)
adjcomp – can take an adj as its complement (paint)
ingcomp – can take a Ving as its complement (keep)
4.5 for adjectives: [5]
ender – adj/adv that takes the "-er" ending in the comparative
endmore – adj/adv that takes "more" in the comparative
prepo – usually comes before the word it modifies
postpo – usually comes after the word it modifies
infmodi – often followed by an infinitive-phrase (difficult, eager…)
4.6 for adverbs: [6]
ender – adj/adv that takes the "-er" ending in the comparative
endmore – adj/adv that takes "more" in the comparative
minf -- can be used to modify infinitives (merely, just…)
mnum -- can be used to modify numerals (only, about, just…)
madjonly -- can only be used to modify adjectives
sentent – usually used as a sentential adverb (it be said that)
4.7 for conjunctions: [3]
conjand – conjunction "and"
conjbut – conjunction "but"
conjor – conjunction "or"/"nor"
4.8 for infinitive sign: [2]
infw – single word (to)
infp – MWE (in order to, ready to, so as to…)
4.9 for auxiliaries: [6]
auxdo -- auxiliary "do"
auxhave -- auxiliary "have"
auxbe -- auxiliary "be"
modal – modal auxiliaries (can, may, should, would rather…)
tense – tense auxiliaries (will, would, shall, should)
quasiaux – quasi-auxiliaries (may just as well, be more likely to)
4.10 for wh-words: [7]
question – can be used as a question word (who, which, why…)
objclaus – can introduce an object clause (which, what, if…)
whinf – can introduce an infinitive-phrase (how, which, what…)
person – a person (who)
advsubj – can be the subject of the clause (who, which, that…)
anonsubj – cannot be the subject of the clause (why…)
adverbial – can introduce an adverbial clause (though, if…)
4.11 for prepositions or pp: [3]
advb – mostly used as an adverbial (during, by…)
attr – mostly used as an attributive (like, regarding…)
ingpred – when a Ving follows it, the Ving is mostly a quasi-predicate (by…)
4.12 for punctuation: [9]
symbolic – a symbol (&, #…)
fullstop – full stop (“.”, “…”)
quesmark – question mark (?)
bracket – brackets (“”, (, ))
hyphen – hyphen (-)
comma – comma (,)
colon – colon (:)
title – (#)
semicolon – (;)
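The punctuation features lend themselves to a simple lookup table. The sketch below is one possible encoding; `PUNCT_FEATURES` and `punctuation_features` are hypothetical names, each token maps to a set because "#" is listed under both `symbolic` and `title`, and ";" is the assumed character for the `semicolon` feature.

```python
# Token -> set of punctuation features, following the list above.
PUNCT_FEATURES = {
    "&": {"symbolic"},
    "#": {"symbolic", "title"},   # "#" appears under both features
    ".": {"fullstop"},
    "…": {"fullstop"},
    "?": {"quesmark"},
    "“": {"bracket"},
    "”": {"bracket"},
    "(": {"bracket"},
    ")": {"bracket"},
    "-": {"hyphen"},
    ",": {"comma"},
    ":": {"colon"},
    ";": {"semicolon"},           # assumption: ";" is the intended character
}


def punctuation_features(token: str) -> set:
    """Return the punctuation features of a token, or an empty set if unlisted."""
    return set(PUNCT_FEATURES.get(token, ()))
```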