The major changes in this release improve and stabilize the top-N recommendation and evaluation APIs. We now have a 'candidate selector' component, and the top-N recommender implementation uses an 'unrated items' candidate selector by default. This means you can now ask for recommendations without specifying a candidate list yourself - the selector memorizes each user's rated items from the data passed to 'fit' and leaves them out of the candidates. One consequence: you need to 'adapt' a predictor into a recommender before you call 'fit' to train it, so that the candidate selector sees the training data.
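To illustrate the idea, here is a minimal, self-contained sketch of an 'unrated items' candidate selector wrapped around a predictor. It is not the library's implementation: the class names (`UnratedCandidateSelector`, `TopN`, `ItemMean`), the tuple-based ratings format, and the method names are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class UnratedCandidateSelector:
    """Hypothetical selector: remembers what each user rated during fit."""

    def __init__(self):
        self.rated_ = defaultdict(set)
        self.items_ = set()

    def fit(self, ratings):
        # ratings: iterable of (user, item, rating) tuples (assumed format)
        for user, item, _rating in ratings:
            self.rated_[user].add(item)
            self.items_.add(item)
        return self

    def candidates(self, user):
        # Every known item the user has not rated; unknown users get all items.
        return sorted(self.items_ - self.rated_.get(user, set()))


class ItemMean:
    """Toy predictor (assumption): scores an item by its mean rating."""

    def fit(self, ratings):
        sums, counts = defaultdict(float), defaultdict(int)
        for _user, item, rating in ratings:
            sums[item] += rating
            counts[item] += 1
        self.means_ = {item: sums[item] / counts[item] for item in sums}
        return self

    def predict(self, user, item):
        return self.means_.get(item, 0.0)


class TopN:
    """Hypothetical wrapper that turns a predictor into a recommender."""

    def __init__(self, predictor, selector=None):
        self.predictor = predictor
        self.selector = selector or UnratedCandidateSelector()

    def fit(self, ratings):
        ratings = list(ratings)
        # The selector only learns rated items if it is part of fit,
        # which is why the wrapping has to happen before training.
        self.selector.fit(ratings)
        self.predictor.fit(ratings)
        return self

    def recommend(self, user, n, candidates=None):
        if candidates is None:
            candidates = self.selector.candidates(user)
        scored = [(self.predictor.predict(user, item), item) for item in candidates]
        # Highest score first; ties broken by item id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [item for _score, item in scored[:n]]


ratings = [
    ("u1", "a", 5.0), ("u1", "b", 3.0),
    ("u2", "b", 4.0), ("u2", "c", 2.0),
    ("u3", "a", 4.0), ("u3", "c", 5.0), ("u3", "d", 1.0),
]
rec = TopN(ItemMean()).fit(ratings)
print(rec.recommend("u1", 2))  # no candidate list needed
```

Because the wrapper is built before `fit` is called, the selector sees the same training data as the predictor. Wrapping an already-trained predictor would leave the selector with no record of what each user rated.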
We also have the new RecListAnalysis API that makes it easy to correctly compute top-N evaluation metrics. It was possible to compute nDCG before, but it was difficult to do so correctly, and computing recall required a lot of manual work. Now that we have more internal experience with the APIs, we have settled on this design to make it straightforward to compute a wide range of top-N metrics. Pull requests are welcome for new metrics!
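For readers unfamiliar with these metrics, the following is a rough sketch of what per-list recall and nDCG compute, using only the standard library. It does not use RecListAnalysis and may differ from the library's definitions (for example in the discount or in how the ideal ranking is truncated); the function names and the dict-based data layout are assumptions made for this example.

```python
import math


def recall(recs, truth):
    """Fraction of the user's test items that appear in the list."""
    if not truth:
        return None  # undefined when the user has no test items
    hits = sum(1 for item in recs if item in truth)
    return hits / len(truth)


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(recs, truth):
    """DCG of the list divided by the DCG of the best list of that length.

    truth maps item -> rating; items missing from truth contribute zero gain.
    """
    ideal = dcg(sorted(truth.values(), reverse=True)[:len(recs)])
    if ideal == 0:
        return None
    actual = dcg([truth.get(item, 0.0) for item in recs])
    return actual / ideal


def evaluate(rec_lists, truth_by_user):
    """Compute both metrics for every user's recommendation list."""
    results = {}
    for user, recs in rec_lists.items():
        truth = truth_by_user.get(user, {})
        results[user] = {"recall": recall(recs, truth), "ndcg": ndcg(recs, truth)}
    return results


rec_lists = {"u1": ["a", "b", "c"], "u2": ["a", "d", "c"]}
truth_by_user = {
    "u1": {"a": 5.0, "c": 3.0, "d": 4.0},
    "u2": {"a": 5.0, "c": 3.0, "d": 4.0},
}
results = evaluate(rec_lists, truth_by_user)
for user, metrics in results.items():
    print(user, metrics)
```

Details such as matching each list to the right user's test data and handling users with no test items are where hand-written evaluation code tends to go wrong, which is the kind of bookkeeping a dedicated analysis API can take over.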
Enjoy :)