http://downloads.cloudmade.com/europe/western_europe/belgium#downloads_breadcrumbs
-- André
3) Average delta of price per word in the description compared to the average price of its city:
bathroom -> +0k
sauna -> +100k
2 bedrooms -> -150k
Based on all these mappings, you get a formula (start by simply summing them) for a single ad, which results in a "predicted price".
Try that formula on every ad in the database and compare it with the real price of the ad.
For 80% of the cases, the prediction should be within 5% of the real price.
The other 20% of the cases are the bargains and anti-bargains.
Now, when a new ad is added to immoweb, we know whether it's probably a bargain or not :)
Filter out all ads without description or price or ...
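A minimal sketch of the idea above, assuming a purely additive formula (city average plus the sum of the keyword deltas). The city name, the 300k city average, the dict layout of an ad and all function names are made up for illustration; only the three keyword deltas come from the example list.

```python
# Hypothetical lookup tables. The city and its average are invented;
# the keyword deltas are the ones from the example mapping.
CITY_AVERAGE = {"Gent": 300_000}
KEYWORD_DELTA = {"bathroom": 0, "sauna": 100_000, "2 bedrooms": -150_000}


def usable(ad):
    """Keep only ads that have both a description and a price."""
    return bool(ad.get("description")) and bool(ad.get("price"))


def predicted_price(city, description):
    """City average plus the delta of every known keyword in the description."""
    text = description.lower()
    return CITY_AVERAGE[city] + sum(
        delta for keyword, delta in KEYWORD_DELTA.items() if keyword in text
    )


def classify(ad, tolerance=0.05):
    """Label an ad by how far its real price is from the predicted price."""
    predicted = predicted_price(ad["city"], ad["description"])
    deviation = (ad["price"] - predicted) / predicted
    if deviation < -tolerance:
        return "bargain"
    if deviation > tolerance:
        return "anti-bargain"
    return "normal"


ad = {"city": "Gent", "description": "House with sauna, 2 bedrooms", "price": 200_000}
if usable(ad):
    print(predicted_price(ad["city"], ad["description"]), classify(ad))
```

Simple substring matching is only a stand-in here; real descriptions would need proper tokenizing, and the deltas would have to be computed from the database first.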
This kind of stuff works, if and only if there's enough data. There is a critical amount of data we need for this to work. Having 100 times more data can mean an accuracy improvement from 70% (useless) to 99% (useful). I am not sure if enough houses are being sold for there to be enough data.
1) Watch out for outliers. Pretty much all possible ways of computing an average tend to be quite sensitive to outliers, so everything could get messed up by a relatively small number of prices that somehow have a few too many zeros for instance. Throwing more data at the problem should mitigate this issue, but even with substantial amounts of data you might still want to make sure your model is robust enough to deal with outliers.
2) Take trends into consideration. A price that would've been considered a bargain a few months or years ago might not be a bargain anymore today, so I'm guessing it would make sense to take trends into account by, e.g., using something like exponential smoothing instead of simply computing averages.
3) Maybe even take seasonality into consideration. I'm not sure to what extent this is true, but I've heard that December is a good month for getting an apartment in some cities because most people tend to wait for the new year when planning to move. Such seasonal effects could be taken into account by using a more advanced modeling technique such as exponential smoothing as well.
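To make points 1) and 2) a bit more concrete, here is a rough sketch, assuming prices are grouped per month: take the median per month (so a price with a few zeros too many does not move the result) and then apply simple exponential smoothing so that recent months weigh more. The function names, the alpha value and the sample prices are all invented for illustration. Plain exponential smoothing does not model seasonality; that would need a seasonal variant such as Holt-Winters, which is not shown here.

```python
from statistics import median


def monthly_medians(prices_by_month):
    """Median price per month; far less sensitive to outliers than the mean."""
    return [median(prices) for prices in prices_by_month]


def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up prices for three consecutive months; the second month contains
# a typo-style outlier (two zeros too many).
prices_by_month = [
    [200_000, 210_000, 220_000],
    [220_000, 230_000, 24_000_000],
    [240_000, 250_000, 260_000],
]

medians = monthly_medians(prices_by_month)
trend = exponential_smoothing(medians, alpha=0.5)
print(medians)
print(trend)
```

The last value of the smoothed series could then replace the plain city average as the baseline for "what is a normal price right now".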
Just my 2 cents,
-Klaas