Database of American English -- useful.


Chris

May 7, 2008, 11:12:14 PM
to sswl.linguistics
Michael showed me this:

http://www.americancorpus.org/

I don't think we will ever have pure text in our database
(other than examples),
since pure text is not structured enough to allow comparative searches
(e.g., find all the languages such that X, where X is a
syntactic feature).
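
To make the kind of comparative search meant here concrete, a rough sketch follows. It assumes each language is stored as a set of property/value pairs; the language names, property names, and values are made-up placeholders, not entries from our database.

```python
# Hypothetical records: each language maps syntactic properties to values.
# Names and values are illustrative placeholders, not real database entries.
LANGUAGES = {
    "Language A": {"Verb-Object": "yes", "Adposition-Noun": "yes"},
    "Language B": {"Verb-Object": "no", "Adposition-Noun": "no"},
    "Language C": {"Verb-Object": "yes", "Adposition-Noun": "no"},
}


def languages_with(feature, value, db=LANGUAGES):
    """Return the sorted names of languages whose `feature` equals `value`."""
    return sorted(name for name, props in db.items() if props.get(feature) == value)


print(languages_with("Verb-Object", "yes"))
```

A search like this only works because every record has the same shape, which is exactly what free-running text lacks.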

Chris

Michael

May 8, 2008, 12:32:03 PM
to sswl.linguistics
Wouldn't it be cool, though, if all of our examples were tagged
syntactically for parts of speech? We're *sort* of doing this, with
our glossing, but not in such a way that you could say, give me all
the verbs for Icelandic that co-occur with prepositions, etc. (Jim
will have to tell me if what I just said even makes sense)
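
Just to sketch what such a query might look like if the examples were tagged: the snippet below assumes each example is stored as a list of (token, tag) pairs with a small made-up tag set (V, P, N). The tokens are placeholders rather than real Icelandic or English words, and "co-occur" is read loosely as "appear in the same example".

```python
# Hypothetical tagged examples: (language, [(token, part-of-speech tag), ...]).
# Tokens are placeholders, not real data; the tag set (V, P, N) is assumed.
EXAMPLES = [
    ("Icelandic", [("w1", "N"), ("w2", "V"), ("w3", "P"), ("w4", "N")]),
    ("Icelandic", [("w5", "N"), ("w6", "V")]),
    ("English", [("w7", "V"), ("w8", "P"), ("w9", "N")]),
]


def verbs_with_prepositions(language, examples=EXAMPLES):
    """Collect verbs from examples of `language` that also contain a preposition."""
    verbs = set()
    for lang, tokens in examples:
        if lang != language:
            continue
        tags = {tag for _, tag in tokens}
        if "P" in tags:
            # Keep every verb in an example that has at least one preposition.
            verbs.update(tok for tok, tag in tokens if tag == "V")
    return sorted(verbs)


print(verbs_with_prepositions("Icelandic"))
```

The query itself is simple; the hard part is getting the tags in the first place.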

I have no idea how that would even be possible. I know that Mark
Davies uses automated parsing software that relies on machine
learning. That seems infeasible for us, as we'd have to train
the parser on every imaginable language, or use existing dictionaries
or the like, and we'd run out of those as soon as we start
entering data for less common languages.