Active Learning papers focused on "skewed labels" or "rare events" (Pr[Y=1] < 0.01)?

shira.mit...@gmail.com

Jan 22, 2019, 9:47:23 AM
to Active Learning (Machine Learning)
Hi Active Learning folks!

I did my PhD/postdoc in statistics, and now work at the NYC Mayor's Office. We are frequently faced with an Active Learning scenario:
  • Variables X are available for all buildings. (e.g. X = water usage, building complaints, etc.)
  • Variable Y is collected by inspectors. (e.g. Y = 1 if the building is vacant, rare in NYC!)
  • Each week, we can send inspectors to some buildings to collect Y.
  • Where do we send the inspectors?
I read Burr Settles' awesome textbook (http://active-learning.net/). Section 7.4, "Skewed Label Distributions," is very relevant to many of our scenarios (e.g. Y = vacancy, where Pr[Y=1] is tiny). I would like to choose which buildings to inspect based on expected error reduction (Chapter 4) rather than heuristics (Chapters 2-3).
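To make the question concrete, here is a minimal sketch of what I mean by expected error reduction. It is not code from the textbook. The kernel-smoothed estimator, the prior rate of 0.01, the function names, and the toy data are all assumptions made for illustration. I also assume log loss (entropy) as the error measure, since 0/1 loss tells you very little when always predicting Y = 0 is nearly always correct.

```python
import math

def predict(x, labeled, bandwidth=1.0, prior_pos=0.01, prior_weight=1.0):
    """Kernel-smoothed estimate of Pr[Y=1 | x], shrunk toward a small prior rate.

    This is a stand-in model (an assumption); any probabilistic classifier
    that can be refit quickly could take its place.
    """
    num = prior_pos * prior_weight
    den = prior_weight
    for xi, yi in labeled:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-dist2 / (2.0 * bandwidth ** 2))
        num += w * yi
        den += w
    return num / den

def entropy(p):
    """Expected log loss of predicting p when the true rate is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def expected_future_loss(i, labeled, pool):
    """Expected total log loss on the rest of the pool after inspecting pool[i].

    The unknown label is averaged out using the current model's Pr[Y=1 | x].
    """
    candidate = pool[i]
    rest = pool[:i] + pool[i + 1:]
    p1 = predict(candidate, labeled)
    total = 0.0
    for y, p_y in ((1, p1), (0, 1.0 - p1)):
        augmented = labeled + [(candidate, y)]  # "retrain" with the guessed label
        loss = sum(entropy(predict(u, augmented)) for u in rest)
        total += p_y * loss
    return total

def choose_query(labeled, pool):
    """Index of the pool item whose inspection minimizes expected future loss."""
    return min(range(len(pool)),
               key=lambda i: expected_future_loss(i, labeled, pool))

# Toy data (hypothetical): X = (complaint score, months of near-zero water usage)
labeled = [((0.1, 0.0), 0), ((0.9, 3.0), 1), ((0.2, 0.0), 0)]
pool = [(0.15, 0.0), (0.8, 2.0), (0.5, 1.0), (0.95, 4.0)]
best = choose_query(labeled, pool)
print("inspect next:", pool[best])
```

Each candidate requires one refit per possible label, so the cost grows with the pool size times the cost of retraining. That is part of why I am asking about approximations or packages that scale to all the buildings in the city.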

What papers should I read? Are there software packages you would recommend? (Our team uses R and Python.)

Thank you so much!
Shira