extracting financial information from text

Ryan

May 8, 2017, 3:29:48 PM

to nltk-users

This rate environment also produced investment portfolio buying opportunities
resulting in a $102.1 million increase in average investments. Investment
interest income in 2000 was $10.4 million higher than the prior year as a result
of the higher outstanding as well as an increase in the average investment
yield from 6.88% to 7.35%. Rising market rates in the latter half of 1999 and
first half of 2000 increased the yield on new investments and were the primary
cause of the increase in average investment yield.

e.g. 2000, $10.4, 7.35%

I want to be able to pull data out of text like the passage above, including years, values, commentary, etc. I've been experimenting with POS tagging, NER (Stanford), and regexes. Does anyone have other options I should look into? I came across Conditional Random Fields today but haven't looked into them yet.

Thanks

Alex Rudnick

May 8, 2017, 4:47:22 PM

to nltk-...@googlegroups.com

As a first step, you should collect a small test set, so you can
experiment with it and see how well different methods do. Build the
simplest extractor that you can manage, then see what kinds of errors
it makes on your test set. Then tweak it so that it does better! Maybe
you'll decide that you need to try different methods -- maybe just
regexes (or taggers + regexes) don't get you as far as you want. Or
maybe you can get great accuracy with the methods you already know
about!
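
To make that concrete, here is a rough sketch of a "simplest extractor" plus
an error report over a tiny test set. The patterns, the label names, and the
`extract` / `report_errors` helpers are all assumptions made up for
illustration, and the test sentences are just the ones from the original post;
a real test set would need many more examples, labeled by hand.

```python
import re

# Hypothetical patterns, written against the sample paragraph only.
PATTERNS = {
    "money": re.compile(
        r"\$\d[\d,]*(?:\.\d+)?(?:\s+(?:thousand|million|billion))?"
    ),
    "percent": re.compile(r"\d+(?:\.\d+)?%"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def extract(text):
    """Return a set of (label, matched_text) pairs found in text."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((label, match.group()))
    return found


# Tiny hand-labeled test set: (sentence, expected extractions).
TEST_SET = [
    (
        "This rate environment also produced investment portfolio buying "
        "opportunities resulting in a $102.1 million increase in average "
        "investments.",
        {("money", "$102.1 million")},
    ),
    (
        "Investment interest income in 2000 was $10.4 million higher than "
        "the prior year as a result of the higher outstanding as well as an "
        "increase in the average investment yield from 6.88% to 7.35%.",
        {
            ("year", "2000"),
            ("money", "$10.4 million"),
            ("percent", "6.88%"),
            ("percent", "7.35%"),
        },
    ),
]


def report_errors(test_set):
    """Print what the extractor missed or wrongly produced per sentence."""
    for text, expected in test_set:
        found = extract(text)
        missed = expected - found
        spurious = found - expected
        if missed or spurious:
            print(text[:60] + "...")
            print("  missed:  ", sorted(missed))
            print("  spurious:", sorted(spurious))


report_errors(TEST_SET)
```

Looking at the missed and spurious items is what tells you which tweak to try
next. For example, the year pattern above would also fire on a four-digit
amount such as "$2000", which is the kind of error a larger test set should
surface.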

Eventually, collect another test set and see how your best method does
on that one.
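
One way to compare the two test sets is to score the same extractor on both
with precision and recall. The sketch below is only illustrative:
`toy_extractor` is a hypothetical stand-in for whatever method you end up
with, and the two one-or-two-sentence "sets" are fragments of the paragraph
from the original post, far too small for real conclusions.

```python
import re


def toy_extractor(text):
    # Hypothetical stand-in: finds only dollar amounts and percentages.
    return set(re.findall(r"\$\d+(?:\.\d+)?|\d+(?:\.\d+)?%", text))


def precision_recall(extractor, examples):
    """Score an extractor on (text, expected_set) pairs."""
    tp = fp = fn = 0
    for text, expected in examples:
        found = extractor(text)
        tp += len(found & expected)   # correct extractions
        fp += len(found - expected)   # extracted but not expected
        fn += len(expected - found)   # expected but not extracted
    # Report 0.0 when a ratio is undefined (nothing found / nothing expected).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Set used while developing and tweaking the extractor.
dev_set = [
    ("resulting in a $102.1 million increase in average investments.",
     {"$102.1"}),
    ("average investment yield from 6.88% to 7.35%.",
     {"6.88%", "7.35%"}),
]

# Second set, looked at only once the method is settled.
held_out = [
    ("Rising market rates in the latter half of 1999 and first half of 2000 "
     "increased the yield on new investments",
     {"1999", "2000"}),
]

print("dev:     ", precision_recall(toy_extractor, dev_set))
print("held out:", precision_recall(toy_extractor, held_out))
```

Here the toy extractor looks perfect on the first set but finds nothing on the
second, because it was never written to handle years. A gap like that only
shows up on data you did not tune against.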