extracting financial information from text

Ryan

May 8, 2017, 3:29:48 PM
to nltk-users

This rate environment also produced investment portfolio buying opportunities
resulting in a $102.1 million increase in average investments. Investment
interest income in 2000 was $10.4 million higher than the prior year as a result
of the higher outstanding as well as an increase in the average investment
yield from 6.88% to 7.35%. Rising market rates in the latter half of 1999 and
first half of 2000 increased the yield on new investments and were the primary
cause of the increase in average investment yield.


e.g. 2000, $10.4, 7.35%

I want to be able to pull data out of text like the passage above, including years, values, commentary, etc. I've been experimenting with POS tagging, NER (Stanford), and regexes. Does anyone have other options I should look into? I came across Conditional Random Fields today but haven't looked into them yet.

Thanks


Alex Rudnick

May 8, 2017, 4:47:22 PM
to nltk-...@googlegroups.com
As a first step, you should collect a small test set, so you can
experiment with it and see how well different methods do. Build the
simplest extractor that you can manage, then see what kinds of errors
it makes on your test set. Then tweak it so that it does better! Maybe
you'll decide that you need to try different methods -- maybe just
regexes (or taggers + regexes) don't get you as far as you want. Or
maybe you can get great accuracy with the methods you already know
about!
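
To make that concrete, here is a rough sketch of what a "simplest
extractor" plus a tiny hand-labeled test set might look like in plain
Python. Everything in it is an assumption made up for illustration: the
three patterns, the names extract and score, and the gold labels, which
were written by hand from the sample passage. It is a baseline to
measure against, not a recommended final design.

```python
import re

# Hypothetical baseline patterns; real filings will need more cases.
MONEY = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:thousand|million|billion))?"
)
PERCENT = re.compile(r"\d+(?:\.\d+)?%")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

PATTERNS = (("money", MONEY), ("percent", PERCENT), ("year", YEAR))


def extract(text):
    """Return the set of (kind, matched string) pairs the regexes find."""
    found = set()
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.add((kind, match.group()))
    return found


def score(predicted, gold):
    """Return (precision, recall) of predicted items against gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# One test item: text taken from the sample passage, labels written by hand.
SAMPLE = (
    "This rate environment also produced investment portfolio buying "
    "opportunities resulting in a $102.1 million increase in average "
    "investments. Investment interest income in 2000 was $10.4 million "
    "higher than the prior year as a result of the higher outstanding as "
    "well as an increase in the average investment yield from 6.88% to 7.35%."
)
GOLD = {
    ("money", "$102.1 million"),
    ("money", "$10.4 million"),
    ("year", "2000"),
    ("percent", "6.88%"),
    ("percent", "7.35%"),
}

if __name__ == "__main__":
    predicted = extract(SAMPLE)
    print("missed:", sorted(GOLD - predicted))
    print("spurious:", sorted(predicted - GOLD))
    print("precision, recall:", score(predicted, GOLD))
```

The "missed" and "spurious" lists are the part worth reading closely.
For example, the year pattern as written will also fire on the digits
in something like "$2000", and that is exactly the kind of error a test
set is meant to surface before you decide whether taggers or a CRF are
worth the extra effort.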

Eventually, collect another test set and see how your best method does
on that one.
--
-- alexr