Hi Active Learning folks!
I did my PhD/postdoc in statistics, and now work at the NYC Mayor's Office. We are frequently faced with an Active Learning scenario:
- Variables X are available for all buildings. (e.g. X = water usage, building complaints, etc.)
- Variable Y is collected by inspectors. (e.g. Y = 1 if the building is vacant, rare in NYC!)
- Each week, we can send inspectors to some buildings to collect Y.
- Where do we send the inspectors?
I read Burr Settles' awesome textbook (http://active-learning.net/). Section 7.4, Skewed Label Distributions, is very relevant for many of our scenarios (e.g. Y = vacancy, Pr[Y=1] is tiny). I would like to sample based on expected error reduction (Chapter 4) rather than heuristics (Chapters 2-3).
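To make the goal concrete, here is a rough sketch of the kind of expected-error-reduction selection I have in mind. Everything in it is an assumption for illustration: the kernel-smoothed estimator of Pr[Y=1 | x] is just a cheap stand-in model (chosen because "retraining" is only adding a point), the 0/1 risk may not be the right loss when positives are this rare, and the names `predict_proba`, `expected_future_error`, and `choose_building` and the toy data are hypothetical.

```python
import math

def kernel(a, b, bandwidth=1.0):
    """Gaussian similarity between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def predict_proba(x, labeled, prior=0.05, strength=1.0):
    """Kernel-smoothed estimate of Pr[Y=1 | x], shrunk toward a small prior
    (assumed value) to reflect that Y=1 is rare."""
    num = strength * prior
    den = strength
    for xi, yi in labeled:
        w = kernel(x, xi)
        num += w * yi
        den += w
    return num / den

def expected_future_error(i, labeled, pool):
    """Expected 0/1 risk on the rest of the pool after labeling pool[i].

    For each possible label y, add (pool[i], y) to the labeled set, score the
    remaining pool with the updated model, and weight the resulting risk by
    the current model's Pr[Y=y | pool[i]].
    """
    candidate = pool[i]
    others = [x for j, x in enumerate(pool) if j != i]
    p1 = predict_proba(candidate, labeled)
    total = 0.0
    for y, p_y in ((1, p1), (0, 1.0 - p1)):
        augmented = labeled + [(candidate, y)]
        risk = 0.0
        for x in others:
            p = predict_proba(x, augmented)
            risk += min(p, 1.0 - p)  # estimated 0/1 error at x
        total += p_y * risk
    return total

def choose_building(labeled, pool):
    """Index of the pool item whose label minimizes expected future error."""
    return min(range(len(pool)),
               key=lambda i: expected_future_error(i, labeled, pool))

# Toy data: (features, label) for inspected buildings, features only for the pool.
labeled = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((4.0, 4.0), 1)]
pool = [(0.1, 0.2), (2.5, 2.5), (3.9, 4.1), (10.0, 10.0)]

best = choose_building(labeled, pool)
print("inspect next:", pool[best])
```

This scores every candidate against every other pool item, so the cost grows quadratically with the pool size. With all NYC buildings in the pool, I assume some subsampling or batching would be needed.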
What papers should I read? Are there software packages you would recommend? (Our team uses R and Python.)
Thank you so much!
Shira