User features to tailor recs in UR queries?

Noelia Osés Fernández

Dec 5, 2017, 10:59:54 AM

to actionml-user, us...@predictionio.incubator.apache.org

Hi all,

I have seen how to use item properties in queries to tailor the recommendations returned by the UR.

But I was wondering whether it is possible to use user characteristics to do the same. For example, I want to query for recs from the UR but only taking into account the history of users that are female (or only using the history of users in the same county). Is this possible to do?

I've been reading the UR docs but couldn't find info about this.

Thank you very much!

Best regards,

Noelia

Pat Ferrel

Dec 5, 2017, 11:38:47 AM

to Noelia Osés Fernández, actionml-user, us...@predictionio.incubator.apache.org

The user’s possible indicators of taste are encoded in the usage data. Gender and other “profile”-type data can be encoded as (user-id, gender, gender-id), but this is used as a secondary indicator, not as a filter. Only item properties are used as filters, for some very practical reasons. For one thing, items are what you are recommending, so you would have to establish some relationship between items and the gender of buyers. The UR does this with user data in secondary indicators but does not filter by these because they are calculated properties, not ones assigned by humans, like “in-stock” or “language”.

Location is an easy secondary indicator but needs to be encoded with “areas”, not lat/lon, so something like (user-id, location-of-purchase, country-code+postal-code). This would be triggered when a primary event happens, such as a purchase. This way location is accounted for in making recommendations without your having to do anything but feed in the data.
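A minimal sketch of what such a location event could look like, assuming the usual PredictionIO event JSON shape with the area stored as the target entity. The indicator name `location-of-purchase`, the `country-postal` id format, and the sample ids are only illustrative, not something the UR prescribes.

```python
import json
from datetime import datetime, timezone


def location_indicator(user_id, country_code, postal_code, when):
    """Build a (user-id, location-of-purchase, country-code+postal-code) event."""
    return {
        "event": "location-of-purchase",  # illustrative indicator name
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        # An "area" id, not lat/lon: country code plus postal code.
        "targetEntityId": f"{country_code}-{postal_code}",
        "eventTime": when.isoformat(),
    }


# Emitted alongside the primary event (for example a purchase).
event = location_indicator(
    "u1234", "ES", "20009",
    datetime(2017, 12, 5, 10, 59, 54, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```

The point of the encoding is that the area becomes just another "item" id that the correlation test can work with.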

Lat/lon proximity filters are not implemented but are possible.

One thing to note is that fields used to filter or boost are very different from user taste indicators. For one thing, they are never tested for correlation with the primary event (purchase, read, watch, …), so they can be very dangerous if used unwisely. They are best used for business rules like only show “in-stock” or in this video carousel show only videos of the “mystery” genre. But if you use user profile data to filter recommendations you can distort what is returned and get bad results. We once had a client that wanted to do this against our warnings, filtering by location, gender, and several other things known about the user, and got 0 lift in sales. We convinced them to try without the “business rules” and they got good lift in sales. User taste indicators are best left to the correlation test by inputting them as user indicator data, except where you purposely want to reduce the recommendations to a subset for a business reason.

Put more simply, business rules can kill the value of a recommender; let it figure out whether an indicator matters. And always remember that indicators apply to users, while filters and boosts apply to items and known properties of items. It may seem like genre is both a user taste indicator and an item property, but if you input it in 2 ways it can be used in 2 ways: 1) to make better recommendations, 2) in business rules. The two are stored and used in completely different ways.
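To make the "business rule" side concrete, here is a rough sketch of a query that restricts results by an item property. It assumes the UR query shape with a `fields` list of `name`/`values`/`bias` entries, where a negative bias acts as a filter and a positive one as a boost; that convention is from memory of the UR docs, so please check it against the docs for your version. The helper name and sample values are made up.

```python
import json


def ur_query(user_id, rules, num=10):
    """Build a query dict; each rule is (property name, values, bias)."""
    fields = [
        {"name": name, "values": values, "bias": bias}
        for name, values, bias in rules
    ]
    return {"user": user_id, "num": num, "fields": fields}


# Business rule: this carousel shows only items whose "genre" is "mystery".
# bias = -1 is assumed to mean "filter" here (verify for your UR version).
query = ur_query("u1234", [("genre", ["mystery"], -1)])
print(json.dumps(query, indent=2))
```

Note that nothing about the user's profile appears in the rule: the filter is on a known property of the items, and the user's taste comes only from the recorded indicators.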

--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-use...@googlegroups.com.
To post to this group, send email to action...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAMysefu-8mOgh3NsRkRVN6H6bRm6hR%2B1HuryT4wqgtXZD3norg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Noelia Osés Fernández

Dec 12, 2017, 4:29:26 AM

to Pat Ferrel, actionml-user, us...@predictionio.incubator.apache.org

Thank you Pat!

So if I'm understanding correctly, I could set a user profile property as follows:

{
   "event" : "$set",
   "entityType" : "user",
   "entityId" : "u1234",
   "properties" : {
      "gender": "female"
   },
   "eventTime" : "2015-10-05T21:02:49.228Z"
}

Although this is not recommended. Right?

--

Noelia Osés Fernández, PhD
Senior Researcher
no...@vicomtech.org
+[34] 943 30 92 30
Data Intelligence for Energy and Industrial Processes

Pat Ferrel

Dec 12, 2017, 11:53:14 AM

to Noelia Osés Fernández, actionml-user, us...@predictionio.incubator.apache.org

In our experiments profile attributes have very little benefit, if any. Yes, you can do that, but you have to use some advanced techniques to choose an LLR threshold or the model is likely (with default tuning values) to have 100% density, meaning both genders like the item. This is an effect of the default tuning, which bypasses threshold calculation because it is not needed for most data, but gender has only 2 possible values and the default tuning allows 50. Even if you said choose only 1, the difference in LLR score may be insignificant.
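For readers who have not met LLR before, this is a small sketch of the log-likelihood ratio on a 2x2 co-occurrence table, written the way it is usually presented (Dunning's test via entropies). It is not the UR's actual code, and the counts are invented purely to show how a mild gender skew gives a much smaller score than a strong one.

```python
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: users with both events      k12: primary event only
    k21: secondary value only        k22: neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Invented counts: 100 buyers of an item, 1000 other users split 50/50.
weak = llr(52, 48, 500, 500)    # buyers are 52% female: barely any signal
strong = llr(90, 10, 500, 500)  # buyers are 90% female: clear signal
print(weak, strong)
```

With only two possible values and no threshold, both would be kept for every item, which is the 100% density problem described above.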

If you have a strong gender preference for items in your data it might be worth the t-digest & cross-validation tests but again in our experiments there are 2-3 very helpful secondary indicators and a whole lot of useless ones.

Pick a few things that show a user’s taste, like search terms or browsing behavior (detailed product page views), along with your primary indicator, and start there. Create a baseline cross-validation score with a gold-standard dataset. Then add indicators to see whether the score improves or not. You should A/B test even when the cross-validation score seems to improve.

In several experiments it seems that the more indicators you have, the more you see diminishing returns. We got a 26% lift by using several indicators on the Rotten Tomatoes movie review recommender, but the last few only gave fractions of a percent. That 26% is relative to using “like” alone.

https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

dni...@gmail.com

Dec 27, 2017, 12:33:03 PM

to actionml-user

On Tuesday, December 5, 2017, 13:38:47 (UTC-3), pat wrote:

Location is an easy secondary indicator but needs to be encoded with “areas”, not lat/lon, so something like (user-id, location-of-purchase, country-code+postal-code). This would be triggered when a primary event happens, such as a purchase. This way location is accounted for in making recommendations without your having to do anything but feed in the data.

How do I feed the data? For example, if I record the user's browser, how should I perform queries for a new/anonymous user to take advantage of the browser field? Should I create a new user with a unique id and register a "browser event" before performing the query?

Pat Ferrel

Dec 27, 2017, 2:18:15 PM

to dni...@gmail.com, actionml-user

Using things you think of as User profile properties is tricky but can be done as “secondary events”:

1) Secondary events like (browser-preference), or any secondary event, must be tied to the same users as the primary event. You don’t need to check every event; this is done internally so that there is one collection of all users with a primary event used to calculate the model, and all users’ events are stored for the recommendation query.

2) Secondary events are trickier to use than primary ones. All event types are assumed to have a large number of possible items. Since each secondary event type can have a different set of items (like browser-type for browser-preference), this rule may be violated. We have ways to tune for far fewer items per event type, but it is not automatic and involves some analytics.

TL;DR: send a new secondary event composed of (user-id, “browser-preference”, browser-id), but this will not be useful without finding an LLR threshold to yield the right density in the model produced.
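As a rough illustration of where such a threshold would live, here is a sketch that builds the algorithm parameters for the UR's engine.json. It assumes an `indicators` list with per-indicator `maxCorrelatorsPerItem` and `minLLR` settings and the primary event listed first; those names are from memory of the UR docs and should be verified for your version. The app name and the numbers are placeholders, not recommended values: the real threshold has to come from your own analysis.

```python
import json


def ur_algorithm_params(app_name, primary, secondary, profile_like):
    """Assemble UR algorithm params; the first indicator is the primary event.

    profile_like maps an indicator name to (maxCorrelatorsPerItem, minLLR).
    """
    indicators = [{"name": primary}]
    # Indicators with many possible items: default tuning is assumed to be fine.
    indicators += [{"name": name} for name in secondary]
    # Few-valued, profile-like indicators: tuned explicitly.
    indicators += [
        {"name": name, "maxCorrelatorsPerItem": max_corr, "minLLR": min_llr}
        for name, (max_corr, min_llr) in profile_like.items()
    ]
    return {"appName": app_name, "indicators": indicators}


params = ur_algorithm_params(
    "my-app",                            # placeholder app name
    "purchase",
    ["detail-view", "search-terms"],
    {"browser-preference": (1, 5.0)},    # placeholder tuning values
)
print(json.dumps(params, indent=2))
```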

BTW, for event types with about as many items as the primary event, like “detail-view” or “search-terms”, no special tuning is required since the default tuning is likely to work fine. Only when you are using something you think of as “user profile” type data (or events with few items) as a secondary event do you need to tune specially. All data used for collaborative filtering MUST be encoded as the single primary event type or as a secondary event, so all data is tied to a user even if it is also tied to an item, like an item property. For example, a user property: (user-id, “gender”, gender-id); an item property: (user-id, “category-preference”, category-id). The latter may be triggered at the same time as the primary event just so we know that the category-preference indicates a true preference. So in some cases many events will be triggered by the same user action, like a purchase; in fact that would be a good time to trigger a browser preference too.
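A small sketch of "many events triggered by the same user action", assuming the usual PredictionIO event JSON shape. The function names and sample ids are hypothetical; the event names are the ones used in this thread.

```python
from datetime import datetime, timezone


def indicator(user_id, name, target_id, when):
    """One (user-id, event-name, item-id) indicator event."""
    return {
        "event": name,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": target_id,
        "eventTime": when.isoformat(),
    }


def events_for_purchase(user_id, item_id, category_id, browser_id, when):
    """Fan one purchase out into the primary event plus secondary indicators."""
    return [
        indicator(user_id, "purchase", item_id, when),                 # primary
        indicator(user_id, "category-preference", category_id, when),  # item property
        indicator(user_id, "browser-preference", browser_id, when),    # profile-like
    ]


events = events_for_purchase(
    "u1234", "sku-42", "electronics", "firefox",
    datetime(2017, 12, 27, 14, 18, 15, tzinfo=timezone.utc),
)
for e in events:
    print(e["entityId"], e["event"], e["targetEntityId"])
```

Every event carries the same user id, which is what ties the secondary indicators to the users who have the primary event.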

dni...@gmail.com

Dec 27, 2017, 4:46:20 PM

to actionml-user

Uhm, what I need is something like asking the Universal Recommender "hey, I'm user x with this complementary information (browser, locale, time, etc.), can you give me a recommended product of this category?", and as a result of that query a view event is generated for the recommended product.

Anyway, I'd better come back later when I'm able to parse all this stuff, hehe. Sorry for hijacking this thread, I somehow thought this was related to my use case.

Pat Ferrel

Dec 27, 2017, 5:50:46 PM

to dni...@gmail.com, actionml-user

It may be the appropriate thread.

Do you have some type of conversion you are planning to promote with a recommender? Like a purchase, read, view, etc.? This is what I was calling the “primary event”. The other information about the user may be of weak value compared to the primary event. Don’t look at the user profile-ish info first. See what actions a user takes that you can record. For instance, if you want a user to see a recommended item and you have other people’s purchases, use purchases as the primary event; it is what you want to promote, what you want to get the user to do. Then you may also have view events (since you mention showing the user some item). If all users can browse or search for items, this “view” will also be a secondary event. If you have search, then search terms can also be a secondary event. These are all much more predictive and easier to use than user-profile-ish data. But you can use that too, as I say below, if you are willing to go to the trouble of some analytics or experiments.

For a recommender, deciding on the “conversion event” that you want to promote is the very first question. The others fall out of that and of what you are able to record.

Diego Nieto Cid

Dec 27, 2017, 6:52:05 PM

to Pat Ferrel, actionml-user

2017-12-27 19:50 GMT-03:00 Pat Ferrel <p...@occamsmachete.com>:

It may be the appropriate thread.

Cool :)

Do you have some type of conversion you are planning to promote with a recommender? Like a purchase, read, view, etc?

Yes, I do have buy events. The flow is like this:

1. I got a visit from a user (either returning or new) and offer him some product in a given category, recording a view of that product.

2. The user then decides to either buy it or not. If he buys, I'll record the purchase event.

My expectation was that the CCO algorithm would allow me to incorporate user metadata, like device, browser, locale, timezone, etc, to improve the decision at step 1.

But, even after reading the PredictionIO site and quick starts, I could not determine whether the secondary events sent before the query to establish that metadata would be available to the query being performed afterwards.

This is what I was calling the “primary event”. The other information about the user may be of weak value compared to the primary event. Don’t look at the user profile-ish info first. See what actions a user takes that you can record. For instance, if you want a user to see a recommended item and you have other people’s purchases, use purchases as the primary event; it is what you want to promote, what you want to get the user to do.

So, purchases will be the primary event. Although, I didn't expect view to be secondary; I'll have to re-read the documentation.

Then you may also have view events (since you mention showing the user some item). If all users can browse or search for items, this “view” will also be a secondary event. If you have search, then search terms can also be a secondary event. These are all much more predictive and easier to use than user-profile-ish data. But you can use that too, as I say below, if you are willing to go to the trouble of some analytics or experiments.

We'll surely do some experiments. What confuses me is what I said above about the point in time from which events are taken into account in queries, because I won't have a previous "login" event to record that profile-ish data; it all happens at the time when the recommendation is made (in step 1).

Pat Ferrel

Dec 28, 2017, 12:37:00 PM

to Diego Nieto Cid, actionml-user

PredictionIO is an ML framework. Every Engine/Template has its own input and query spec. You will find no such docs for the Universal Recommender on the PIO site. In the Gallery where you see mention of the UR, there is a link to docs and support, which leads here: http://ActionML.com/docs, which is part PIO docs and part UR.

The primary event is what you want to happen when you recommend. In your case it is a “buy”, right? Who cares about views if they don’t end up buying. So the user behavior that is most important is “buy” behavior, the primary event. You want a recommender to learn what the user would prefer to buy. View preference may be due to flash images, curiosity, or other motivations, but views often do not lead to something the user buys. The UR will use the user’s buys and find views that lead to buys. Likewise with other secondary events. The caveats about secondary events mentioned below still apply.

You may be violating ML input requirements if a visiting user only sees recommendations. This will “overfit”, meaning the user will see self-fulfilling recommendations based on the first purchase and never see other items. The choice of how users discover items must be more open, involving search and browse. Or you can mix in random recommendations to a significant degree. ML must learn the user’s preferences, and if you only offer one choice there will be very little chance to learn.

Diego Nieto Cid

Dec 28, 2017, 1:29:12 PM

to Pat Ferrel, actionml-user

2017-12-28 14:36 GMT-03:00 Pat Ferrel <p...@occamsmachete.com>:

You may be violating ML input requirements if a visiting user only sees recommendations. This will “overfit” meaning the user will see self-fulfilling recommendations based off the first purchase and never see other items.

Oh, I see. Yes, this overfitting would be a problem.

The choice of how users discover items must be more open, involving search and browse. Or you can mix in random recommendations to a significant degree. ML must learn the user’s preferences, and if you only offer one choice there will be very little chance to learn.

Currently what we are doing is picking a random product from the required category, where each of these choices has a weight parameter. We will be gathering conversions with this approach for some time. The final goal is to improve the selection, and in my attempt at using the recommender I'm probably failing to achieve it.

Since the user won't be able to pick the product voluntarily, one way would be to learn and adjust those weights as new data comes in. I'm not sure if I can do that with PredictionIO. However, one possibility that comes to mind is a quick hack: use the recommendation scores as weights and keep the random selection.
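That quick hack could look roughly like the sketch below, assuming the recommender's scores are available as a plain item-to-score mapping. The function name, the `floor` weight for unscored products, and the sample ids are all made up for illustration.

```python
import random


def pick_weighted(candidates, scores, floor=0.1, rng=random):
    """Pick one product at random, weighted by its recommendation score.

    Products without a score still get a small `floor` weight so they keep
    being shown now and then.
    """
    weights = [max(scores.get(c, 0.0), floor) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


category_products = ["p1", "p2", "p3"]
rec_scores = {"p3": 2.5, "p1": 0.7}  # hypothetical scores for this user
print(pick_weighted(category_products, rec_scores))
```

The floor matters: with a weight of zero, unscored products would never be shown and so could never collect the data needed to earn a score.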

Anyway, it's starting to feel like I'm venturing into deep and unknown waters that require me to read a lot about ML. Maybe some other template is better suited to this purpose. :)

Pat Ferrel

Dec 28, 2017, 2:37:04 PM

to Diego Nieto Cid, actionml-user

This is a valid use of the CCO algorithm and I don’t know of a better solution. You just have to allow random recommendations to keep from overfitting, and this must be constant, not just for some period of time. Other recommenders do not allow more than one event type, so until the user makes a purchase there will be nothing to recommend. The UR will recommend for any primary or secondary event recorded for a user.

You need either unsupervised or semi-supervised learning, because you can only train from users’ behaviors (or profile data), and that narrows down the field of ML considerably. In any case, for these types of learning you have to gather unbiased data from users. So you are back to mixing random recs and collaborative filtering recs.

Adding some method of browsing or searching inventory would help solve the overfitting problem and give you more secondary events, but random recs + CF recs would also do the trick.

BTW, by “random” I mean a mix of truly random, popular, trending, promoted, or based on item properties: anything that is not an actual CF rec.
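One way to sketch the mixing, assuming you already have a ranked list of CF recs and a separate pool of non-CF ("random" in the broad sense above) candidates. The function name and the `explore_rate` value are illustrative; how much to mix in is something to find by experiment.

```python
import random


def mixed_recs(cf_recs, fallback_recs, num=4, explore_rate=0.3, rng=random):
    """Fill `num` slots, taking each slot from the non-CF pool with
    probability `explore_rate` and from the CF ranking otherwise."""
    cf = list(cf_recs)
    fallback = [item for item in fallback_recs if item not in cf]
    out = []
    while len(out) < num and (cf or fallback):
        use_fallback = bool(fallback) and (not cf or rng.random() < explore_rate)
        if use_fallback:
            # Non-CF slot: any remaining candidate, chosen uniformly.
            out.append(fallback.pop(rng.randrange(len(fallback))))
        else:
            # CF slot: next best-ranked recommendation.
            out.append(cf.pop(0))
    return out


print(mixed_recs(["a", "b", "c"], ["x", "y", "c"], num=4))
```

Because the mix is applied on every request, the exploration stays constant rather than being a one-off data-gathering phase.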

We have run into this kind of application before and see the same issues here. You have a concept for an app but the data science does not support the idea exactly as defined. Always follow the data science or you may get an app that has no value. When designing a new UX, not everything will work. Try it out by doing an A/B test where one cohort gets nothing but the way you recommend now (random?) and the other gets recommendations based on CCO mixed with “random”. If you get lift in conversion rate, demand a raise ;-) But also remember that it may take some time to show significance.
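On "it may take some time to show significance": a quick sketch of a standard two-proportion z-test on conversion counts, using only the Python standard library. This is a generic statistics check, not something the UR or PIO provides, and the counts are invented.

```python
from math import erf, sqrt


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for conversion rates of cohorts A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 1 - erf(abs(z) / sqrt(2))
    return z, p_value


# Same 10% vs 15% conversion rates, at two different sample sizes.
_, small_p = two_proportion_z(10, 100, 15, 100)
_, large_p = two_proportion_z(100, 1000, 150, 1000)
print(small_p, large_p)
```

The same observed lift is far from significant with 100 users per cohort and clearly significant with 1000, which is why the test has to run long enough.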
