Universal Recommender: How to Test Recommendation Results?


mat...@aydus.com

Aug 14, 2016, 10:21:34 PM
to actionml-user
PredictionIO 0.9.7 and the Universal Recommender are set up and technically seem to be working okay. However, I'm having a hard time testing recommendation results. The current recommendations seem only slightly better than random. Any advice greatly appreciated.

Simplified model has three events: buy, like, view.
Buy is the primary event.
Total items = 100.

To test, I've created 10 users. Each user is shown two items at a time. To keep it very simple, each user has a color preference (white or black), and the user chooses the item whose color best matches their preference.
Two "view" events are fired for each item pair displayed. One like event is fired (based on color preference). Only 2-3 buy events are fired, on very specific black or white items.
Each user is shown a total of 100 items.
Train/deploy is done after 2 users' worth of data.
To start, $set events are fired on all 100 items. All items have the same category.
A $set user event is created once for each user.
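
For reference, the events are fired with the PredictionIO Python SDK, roughly like this minimal sketch (the access key, server URL, and ids are placeholders):

import predictionio

client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",    # placeholder
    url="http://localhost:7070",     # default event server port
)

# $set each item once, all with the same category
for i in range(100):
    client.create_event(
        event="$set",
        entity_type="item",
        entity_id="i%d" % i,
        properties={"categories": ["shoes"]},
    )

# a user views both displayed items and likes the one matching their color
client.create_event(event="view", entity_type="user", entity_id="u0",
                    target_entity_type="item", target_entity_id="i1")
client.create_event(event="like", entity_type="user", entity_id="u0",
                    target_entity_type="item", target_entity_id="i1")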

For each additional user, we expect more and more relevant color choices to be presented. For example, the first couple of users have zero data to draw from and get popularity results with score = 0. But these users add data showing a trend toward either white or black items, so that when subsequent users are served recommendation results with score > 0, they see more and more black items or more and more white items.

In the recommendations being served up we noticed two major things:
1. A buy is such a strong indicator (and we have so few) that it tends to trump color preference. Indicating a preference for white items leads quickly to black items (with a buy event).
2. The first few selections seem skewed towards color. However, even the 10th user with a white color preference is presented with white items only right at the end (many black items are shown first).

I also tried using just like and view events, to keep it even simpler, but got similar results.

query.json (using bias instead of filter so popularity results are returned if no recommendations exist)

{
  "user": "u0",
  "fields": [
    {
      "name": "categories",
      "values": ["shoes"],
      "bias": 2
    }
  ]
}
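
The query is sent to the deployed engine in the usual way (assuming the default deploy port 8000):

curl -H "Content-Type: application/json" -d @query.json http://localhost:8000/queries.json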


engine.json

{
  "comment": "This config file uses default settings for all but the required values; see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params": {
      "name": "sample-daydream-data.txt",
      "appName": "DaydreamUniversal",
      "eventNames": ["buy", "view", "like"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.executor.memory": "4g",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventNames",
      "name": "ur",
      "params": {
        "appName": "DaydreamUniversal",
        "indexName": "urindex",
        "typeName": "items",
        "eventNames": ["buy", "view", "like"],
        "blacklistEvents": ["view"],
        "num": 2,
        "backfillField": {
          "eventNames": ["buy", "like"]
        }
      }
    }
  ]
}



Pat Ferrel

Aug 15, 2016, 11:30:40 AM
to mat...@aydus.com, actionml-user
The recommender effect can’t happen at such a small scale. Recommenders are big-data apps and need big data to work. They find people of similar taste and use this to recommend; with only a few people there is no way to find these groups of similar taste.

Nothing wrong with your choice of setup and events afaict. But I don’t see how you are using color as an indicator. If you mean “color-preference” instead of “like”, then maybe you should rename it. A convention with the UR is to name an event to describe the item type, so if you are sending a color when you get a like, you would be capturing a “color-preference” on a color-id.

Two different colors are also impossible to use; the UR is set to expect many different id values for an indicator. This is being addressed in UR 0.4.0, which will be released later this summer. It should detect (with a larger number of users) useful indicators with very small numbers of ids.



mat...@aydus.com

Aug 15, 2016, 11:59:41 AM
to actionml-user, mat...@aydus.com, p...@occamsmachete.com
Thanks as always, Pat! Very helpful. Understood that our amount of data is simply too small. The example above is simplified/contrived, so like is the appropriate event, since color preference was just a way to test results.

One significant change we're thinking of is including dislikes. However, in this case, does it make sense? If only two items are displayed and one is liked, we could fire the like (and then a dislike on the other item). However, we're not sure what effect (if any) this will have on recommendations.

Pat Ferrel

Aug 15, 2016, 1:39:42 PM
to mat...@aydus.com, actionml-user
Not sure what you mean by “only 2 items are shown”. You should be aware that recommenders will not work unless there are other ways for users to find things. You must have ways to browse or search, or to enter the site from Google.

In the real app the user will have many opportunities to see many items, and in this case a “dislike” can be very useful. Search this forum or Google it; I’ve written about using dislikes in reference to Rottentomatoes’ “rotten” reviews.

mat...@aydus.com

Aug 15, 2016, 3:31:13 PM
to actionml-user, mat...@aydus.com
To clarify "only 2 items are shown". The idea is to show people two items at a time and have them choose which one they like best. Then show them another two (and another two etc.). The expectation is that as we learn about what a person likes, we can serve more and more relevant items. In an ideal scenario, the person is being shown more and more relevant items until they choose to buy. Is there any reason why Universal Recommender can't handle this?

The dislike/Rottentomatoes posts are interesting. This makes sense when the dislike is implicit. In our case, our dislike is implicit, i.e. "given a choice between item A and B, which do you prefer?" doesn't explicitly mean one item is disliked. But if the more data the UR gets the merrier, it makes sense that this would work in this case (or at worst be ignored).

Pat Ferrel

Aug 16, 2016, 10:52:24 AM
to mat...@aydus.com, actionml-user
First, the RT example is explicit (maybe you meant that).

No reason at all it won’t handle this, if the choices are not recommendations. If you only show recommendations you will get an overfit model. How do you choose which 2 to show?

We did a demo of a video trainer here: guide.actionml.com. You need to sign up so we can remember choices. Once signed up, go to the trainer page and see how we’ve done it. The choice of what to show was clustered popular videos. On each page of videos you can like or dislike explicitly, and you can do that for several at a time. Each page is a cluster, so the variance between selections on each page is the most differentiating; this helps make recommendations better quicker. They are popular, so people are more likely to know about them. They are not recommendations, so the model does not get overfit.

"Over fitting" is like "self-fulfilling". Recommenders *all* (not just the UR) need to be fed data where the user was not shown only recommendations or they will overfit. This happens in the demo by browsing and searching videos with user specified criteria, see the gear icon for options.




mat...@aydus.com

Aug 17, 2016, 1:48:50 PM
to actionml-user, mat...@aydus.com, p...@occamsmachete.com
Very insightful and helpful. Then our model is definitely "overfitting", because our choices are recommendations. The video trainer looks really interesting, and we'll have to completely rethink our project based on this. After all our UR reading, we never really appreciated the training vs. recommendation separation. Note that the video trainer site looks to be down. The account verification link in the email times out.

Pat Ferrel

Aug 17, 2016, 2:27:04 PM
to mat...@aydus.com, actionml-user
It’s up, we just let the cert expire—I’ll fix that today.




mat...@aydus.com

Aug 18, 2016, 10:52:20 AM
to actionml-user, mat...@aydus.com
I was thinking of ways to solve the overfit-model problem. My understanding is that if training is done with 100% recommendations, the result is an overfit model (and bad recommendations). To avoid this, could training be done with a mixture of recommendations, popularity (or other back-fill), and random?

For example, we could train with a blend of data:
1. Back-fill since these are likely to be generally relevant.
2. Random data since this is a requirement for training.
3. New items since we want to push new items into recommendations more quickly than old items.
4. Recommendations since we also want to get an explicit like/dislike on recommended items.

And maybe adjust the mix of these to 40% back-fill, 30% random, 10% new, 20% recommended.
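
A sketch of the kind of blending I mean, where the source functions are placeholders for back-fill, random, new, and recommended feeds:

import random

def blended_page(n, sources, weights):
    """Fill a page of n items, picking a source per slot by weight."""
    page = []
    while len(page) < n:
        draw = random.choices(sources, weights=weights, k=1)[0]
        item = draw()  # each source callable yields its next candidate item
        if item is not None and item not in page:
            page.append(item)
    return page

# e.g. blended_page(2, [next_backfill, next_random, next_new, next_rec],
#                   [0.4, 0.3, 0.1, 0.2])   # hypothetical source functions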

Pat Ferrel

Aug 18, 2016, 1:22:13 PM
to mat...@aydus.com, actionml-user
Training uses events triggered by user actions (or sometimes user attributes), for instance “buy”. In any case we expect that some of the events will be because of users interacting with a recommendation; if not, we did not do our job :-) The recommender optimizes to help users find what to “buy” by capturing all “buy” events and recommending new things to buy. If a user buys a recommendation, that is good data we want, but there must be some other way for a buy to happen.

On a web site there may be 90% of the traffic coming in through Google searches. Then the user browses or searches on the site. Eventually they buy and the event is recorded. As long as this buy, which is not influenced by the recommender, can happen, then recording all buys is perfectly fine. This covers 99% of cases, since discovery is multi-faceted; not all discovery comes through recommendations. Even Netflix throws in carousels for “recently added” or “popular on Netflix”, etc.

I can’t quite picture your application, but if it has other ways for people to find things, like search or browsing by category, you are probably fine. Only in the case where you show nothing but recommendations will you become overfit.

In this last case there is another problem; what do you show the new user? Clearly not recommendations since we know nothing about their preferences. So what will you show?

Using the UR out of the box, a new user will get popular items recommended, but as soon as they take a few actions they will get recommendations from then on. If they only see these recommendations, then the only non-recommender-influenced likes will be the first few, and these will be self-reinforcing since the user is only ever shown recommendations from then on.

I explain all this because your wording below is odd. To solve this with a UI that shows 2 items and lets the user pick one, you would want to mix in popular items, promoted items, and maybe random items, and not show only recommendations. But this is necessary only for an app that has no other form of discovery.



Federico Reggiani

Aug 18, 2016, 1:57:20 PM
to actionml-user, p...@occamsmachete.com
Can we see the new features of UR 0.4 anywhere?

Pat Ferrel

Aug 18, 2016, 4:40:40 PM
to Federico Reggiani, actionml-user
No page if that’s what you mean. Request features on the github repo or here.

Expected in 0.4.0: 

1) itemsets for mixing shopping cart recs with user- and item-based recs, all using the same data. Shopping cart works in 0.3.0 but requires a separate model from user and item recs
2) will allow custom ranking of items as a fallback from recommendations; only popularity is allowed now. We will supply random ranking, but the user can implement their own that will not require events for an item (popularity requires events). A custom method might be based on some promotional or commercial value, or any other user-defined ranking. A long explanation for making all items recommendable in some way, whereas items without events are not possible to recommend now.
3) pagination
4) definable thresholds for indicators with small ranges, like “gender” with only 2 possible values. Thresholds will allow indicators with small dimensionality to affect recs; they are not very useful now.

#1 and #3 are working in branches, #2 is yet to be done, and I’m working on #4 today. There will also be some changes to our analysis suite that accompanies the UR, which does a little of what is called hyper-parameter search for tuning.

We’re thinking mid-September.
 

mat...@aydus.com

Aug 19, 2016, 11:37:48 AM
to actionml-user, mat...@aydus.com, p...@occamsmachete.com
On Thursday, August 18, 2016 at 10:22:13 AM UTC-7, Pat Ferrel wrote:
Training uses events triggered by user actions (or sometimes user attributes), for instance “buy”. In any case we expect that some of the events will be because of users interacting with a recommendation; if not, we did not do our job :-) The recommender optimizes to help users find what to “buy” by capturing all “buy” events and recommending new things to buy. If a user buys a recommendation, that is good data we want, but there must be some other way for a buy to happen.

***This is clear!

On a web site there may be 90% of the traffic coming in through Google searches. Then the user browses or searches on the site. Eventually they buy and the event is recorded. As long as this buy, which is not influenced by the recommender, can happen, then recording all buys is perfectly fine. This covers 99% of cases, since discovery is multi-faceted; not all discovery comes through recommendations. Even Netflix throws in carousels for “recently added” or “popular on Netflix”, etc.

***Understood that this is a typical use case for PredictionIO. 

I can’t quite picture your application, but if it has other ways for people to find things, like search or browsing by category, you are probably fine. Only in the case where you show nothing but recommendations will you become overfit.

***The idea is to show a pair of items. The items are to be as relevant as possible. The goal is to generate a buy from one of the items. To begin, items will not be relevant. But each indicator we get (e.g. like this, don't like that) allows the two items shown to be more and more relevant.

In this last case there is another problem; what do you show the new user? Clearly not recommendations since we know nothing about their preferences. So what will you show?

***We were thinking popularity but it might be better to start with random (or a blend of popular + random + new).

Using the UR out of the box, a new user will get popular items recommended, but as soon as they take a few actions they will get recommendations from then on. If they only see these recommendations, then the only non-recommender-influenced likes will be the first few, and these will be self-reinforcing since the user is only ever shown recommendations from then on.

***Understood.
 
I explain all this because your wording below is odd. To solve this with a UI that shows 2 items and lets the user pick one, you would want to mix in popular items, promoted items, and maybe random items, and not show only recommendations. But this is necessary only for an app that has no other form of discovery.

***Thanks! I think we're generally on the same page now. Our app does not have another form of discovery. The trick seems to be to create a model that is able to quickly start making good recommendations. For example, is showing one recommendation for every ten random items a good ratio for training?

Pat Ferrel

Aug 19, 2016, 1:37:17 PM
to mat...@aydus.com, actionml-user
How many items do you have?

You are getting close to a situation that requires sampling choices, like the sampling in a multi-armed bandit. If there is no other way for the user to browse or discover items, then the 2 you show will always need to include non-recommended items at some rate. This rate will diminish over time, and in MABs we use different algorithms for this. If you don’t, you will quickly overfit and show boring, repetitive items to the users, so purchases will increase at first and then decrease, possibly to 0. So this sampling method is super important, or the recommender will eventually be a throttle on sales.

Note that the sampling method should be based on individual users, not just a global % of “shows” that are random or popular. It will show lots of non-recommended things at first, then taper off to show them less often as the user builds up a good history profile. There is science behind this, but I can’t say more without a lot more detail.

I think I understand what you are trying to do, which is to ideally show only what the user wants, but the recommender has to have a way to observe changing wants. 
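
The simplest per-user version of that taper is an epsilon-greedy style rule with a decaying exploration rate; a sketch only, where the decay shape and floor are guesses you would tune:

import math
import random

def explore_probability(n_user_events, floor=0.1):
    """High exploration for a new user, tapering toward a floor
    as their event history grows."""
    return max(floor, 1.0 / math.sqrt(1.0 + n_user_events))

def pick_source(n_user_events):
    if random.random() < explore_probability(n_user_events):
        return "non-recommended"   # random, popular, clustered, ...
    return "recommended"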

Pat Ferrel

Aug 19, 2016, 1:50:19 PM
to Matthew Valenti, actionml-user
BTW a good way to get the recommender trained quickly is what we used in the guide.

We clustered items by user buying patterns, so a cluster would include things bought by the same people. Then we sorted each cluster by popularity and chose to show the top few—on the theory that popular items are more likely to be known and preferred by the new user.

The clustering yields collections of items that differ from other clusters by the users who bought them, and so should give you items that have the most differentiating value. When the user chooses one, they are in effect showing a tendency to have the same taste as the users who bought the clustered items. This should have the effect of training the model quicker than showing random items, though in your case you would want to add random too.
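
To give the flavor of that clustering step, a rough sketch with Spark MLlib's KMeans; the input file and its "itemId,0,1,0,..." bought-by-which-users vector format are assumptions, not the code the guide actually ran:

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans
from pyspark.mllib.linalg import Vectors

sc = SparkContext(appName="item-clusters")

# one row per item: "itemId,0,1,0,1,..." flags of which users bought it,
# built from the buy events (hypothetical export)
lines = sc.textFile("item_user_vectors.csv").map(lambda l: l.split(","))
ids = lines.map(lambda p: p[0])
vecs = lines.map(lambda p: Vectors.dense([float(x) for x in p[1:]])).cache()

model = KMeans.train(vecs, k=10, maxIterations=20)

# pair each item with its cluster; sorting within a cluster by
# popularity would use buy counts from the same events
assignments = ids.zip(model.predict(vecs)).collect()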

mat...@aydus.com

Aug 22, 2016, 11:51:45 AM
to actionml-user, mat...@aydus.com


On Friday, August 19, 2016 at 10:37:17 AM UTC-7, Pat Ferrel wrote:
How many items do you have?

*** 10,000, but this is arbitrary. We could have as few or as many products as we need to make this model work.

You are getting close to a situation that requires sampling choices, like the sampling in a multi-armed bandit. If there is no other way for the user to browse or discover items, then the 2 you show will always need to include non-recommended items at some rate. This rate will diminish over time, and in MABs we use different algorithms for this. If you don’t, you will quickly overfit and show boring, repetitive items to the users, so purchases will increase at first and then decrease, possibly to 0. So this sampling method is super important, or the recommender will eventually be a throttle on sales.

*** Very interesting! The multi-armed bandit problem looks complex.

Note that the sampling method should be based on individual users, not just a global % of “shows” that are random or popular. It will show lots of non-recommended things at first, then taper off to show them less often as the user builds up a good history profile. There is science behind this, but I can’t say more without a lot more detail.

*** It seems a good sampling method is the key to getting recommendations working for our specific case. Is there any reference material or example code available? Not sure if sampling method algorithms are somewhat standard, or if they vary widely and we'll need to figure this out on our own.

I think I understand what you are trying to do, which is to ideally show only what the user wants, but the recommender has to have a way to observe changing wants. 

Yes - exactly, and understood there needs to be a balance of recommendations and non-recommendations.

mat...@aydus.com

Aug 22, 2016, 12:18:34 PM
to actionml-user, mat...@aydus.com


On Friday, August 19, 2016 at 10:50:19 AM UTC-7, Pat Ferrel wrote:
BTW a good way to get the recommender trained quickly is what we used in the guide.

We clustered items by user buying patterns, so a cluster would include things bought by the same people. Then we sorted each cluster by popularity and chose to show the top few—on the theory that popular items are more likely to be known and preferred by the new user.

*** This is a clever approach.

The clustering yields collections of items that differ from other clusters by the users who bought them, and so should give you items that have the most differentiating value. When the user chooses one, they are in effect showing a tendency to have the same taste as the users who bought the clustered items. This should have the effect of training the model quicker than showing random items, though in your case you would want to add random too.
  
*** I'm wondering how to implement something like this. Do any of the current back-fills take into account user properties? It seems like this could be a good way to get a decent segment.

2) will allow custom ranking of items as a fallback from recommendations; only popularity is allowed now. We will supply random ranking, but the user can implement their own that will not require events for an item (popularity requires events). A custom method might be based on some promotional or commercial value, or any other user-defined ranking. A long explanation for making all items recommendable in some way, whereas items without events are not possible to recommend now.
 
*** I have been trying to figure out how to use PIO to return random results (and to respect blacklistEvents). Will this be a feature of the upcoming version 0.4.0?
 



Pat Ferrel

Aug 22, 2016, 7:12:35 PM
to mat...@aydus.com, actionml-user

On Aug 22, 2016, at 9:18 AM, mat...@aydus.com wrote:



On Friday, August 19, 2016 at 10:50:19 AM UTC-7, Pat Ferrel wrote:
BTW a good way to get the recommender trained quickly is what we used in the guide.

We clustered items by user buying patterns, so a cluster would include things bought by the same people. Then we sorted each cluster by popularity and chose to show the top few—on the theory that popular items are more likely to be known and preferred by the new user.

*** This is a clever approach.

The clustering yields collections of items that differ from other clusters by the users who bought them, and so should give you items that have the most differentiating value. When the user chooses one, they are in effect showing a tendency to have the same taste as the users who bought the clustered items. This should have the effect of training the model quicker than showing random items, though in your case you would want to add random too.
  
*** I'm wondering how to implement something like this. Do any of the current back-fills take into account user properties? It seems like this could be a good way to get a decent segment.

User properties can be used by the recommender as preference indicators. They are often pretty weak, so I’d check them against your “buy” indicator with our MAP@k analysis tool. They will require some degree of search for either the LLR threshold or the # of indicators per user-property. For instance, the # of indicators for gender should be 1, or an LLR value that is even more restrictive. But it’s a possible help for cold start. I’d also compare them to random or popular with the MAP@k tool. If they are little better than random, they may not be worth using.
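
For reference, MAP@k here is the usual mean average precision over held-out events, roughly

\mathrm{MAP@}k \;=\; \frac{1}{|U|} \sum_{u \in U} \frac{1}{\min(k,\, m_u)} \sum_{i=1}^{k} P_u(i)\,\mathrm{rel}_u(i)

where m_u is the number of held-out items for user u, P_u(i) is precision at cutoff i, and rel_u(i) is 1 if the i-th recommended item was held out for u. The analysis tool's exact normalization may differ.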


2) will allow custom ranking of items as a fallback from recommendations; only popularity is allowed now. We will supply random ranking, but the user can implement their own that will not require events for an item (popularity requires events). A custom method might be based on some promotional or commercial value, or any other user-defined ranking. A long explanation for making all items recommendable in some way, whereas items without events are not possible to recommend now.
 
*** I have been trying to figure out how to use PIO to return random results (and to respect blacklistEvents). Will this be a feature of the upcoming version 0.4.0?

Yes, 0.4.0, which will also allow user-specified ranking of all items, like a commercial or promotion value or whatever. Random will be supplied; the others will be supported by telling the UR which item property to use in fallback ranking, then setting that property with $set events.

 





mat...@aydus.com

Aug 26, 2016, 7:05:29 PM
to actionml-user, mat...@aydus.com
Thanks for your continued help/patience Pat! This has been a lot to think about. Here's our new proposed approach (which is really just a summary of this post):

1. Get as much information about the person as possible to start.

2. Send user events to PIO. This will hopefully help seed/cluster the Universal Recommender for better/faster future recommendations. Noted that these may not perform better than random -- we'd have to test this. The MAP@k tool looks really interesting.

To clarify your note: "We clustered items by user buying patterns so a cluster would include things bought by the same people". Does this mean we can somehow query PIO to get a list of clusters?

3. At this stage, PIO has no user recommendations. So start by showing a combination of back-fill popular + random, with a 25/75 split. Showing 25% popular items will at least show people some reasonable items to start.
Showing 75% random will allow for the required Universal Recommender training -- we want more training up front and we'll reduce this as we get user event data.

In theory, we could do 100% random and just make this step a training step. But I have no idea how to decide how many training items to show before we would get relevant results.

4. At this stage, we have recommendations. So show a blend of random + recommended (+ new?).

5. Over time we'll want to adjust the sampling rate of random vs. recommended. It sounds like this gets complicated quickly. I haven't been able to find a specific algorithm but am thinking of a logarithmic curve that increases recommendations based on the number of item events (for a user), i.e. the more preferences a user has provided -- the better the recommendations -- so increase the number of recommendations.
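
For example, something shaped like this, where the constants are pure guesses we'd have to tune:

import math

def recommended_fraction(n_user_events, saturation=50, ceiling=0.9):
    """Log growth in the share of recommended items, capped below 1
    so some non-recommended items are always mixed in."""
    frac = math.log1p(n_user_events) / math.log1p(saturation)
    return min(ceiling, ceiling * frac)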

Questions:
a. Do new items need to be treated in a particular way? If there's a trickle of new items every week, or a large number of items added each month -- is it sufficient to show these as part of the random shows? Or do we need to get new products into circulation quickly and generate a significant number of events early? The concern is that new products (even if they have the potential to be highly popular/recommended) will generally lag behind existing items.

b. I know random items are coming in UR version 0.4.0. But if we wanted to build something similar (e.g. new items, promotional items) -- is there any documentation on how to do this? It looks like we need to customize the serving component as described here: http://predictionio.incubator.apache.org/templates/recommendation/customize-serving/

Matthew Valenti

Aug 29, 2016, 2:10:15 AM
to Pat Ferrel, actionml-user

Agreed!

 

1) Understood that random will only be available as back-fill. Is it possible to use a query with a dummy user and explicitly set the backfillField? e.g.

 

{
  "user": "xxx",
  "backfillField": {
    "backfillType": "random"
  }
}

If not, is the best way to implement random (with blacklist) by adding custom business logic to the serving component as mentioned here?
http://predictionio.incubator.apache.org/customize/

 

I did do some reading on MAB, but it made zero sense when it came to sampling specifics. Will dig deeper on this.

2) Thanks for highlighting the pitfalls in more detail. I’m hoping to find specific MAB sampling recommendations to follow. While the concept here is simple – the specific mix of items is not.

 

Even though the app is showing only two items at a time, I don’t see that as being a core problem. The first X items shown could be 100% training, and then, as you recommended, taper off training items and trickle in recommendations. And in theory, if people are not getting good results, put them through another round of training.

 

But noted that a careful balance here is required.

 

From: Pat Ferrel [mailto:p...@actionml.com]
Sent: Saturday, August 27, 2016 3:32 PM
To: Matthew Valenti <mat...@aydus.com>
Cc: actionml-user <action...@googlegroups.com>
Subject: Re: Universal Recommender: How to Test Recommendation Results?

 

I can’t give precise answers without doing calculations and a little research. The code you need will involve some customizing. But I can say that the UR and your app leave you with 2 problems unsolved:

 

1) how to calculate random or popular sampling with recs. This is being addressed in UR 0.4.0, but only as backfill, not as a diminishing % of recs. If you ask for n recs it fills with recs first; then if it doesn’t have enough, it adds popular items; then if still not enough, it adds from random. This IMO does not solve your problem. I would have to do some research on MAB sampling to suggest a specific mix solution. You can do the same.

 

2) optimizing training. As I understand things, you do not have a separate trainer part of the app. If so, you may want the first few choices the user makes to be part of a virtual training session. They would be shown things in the same way as normal, 2 items, but the items would be chosen from training items, not recs. So the user wouldn’t know they were training the system. To do what I did on the guide, you have to cluster the input data with a clustering algorithm, and I don’t think there is an existing template for PredictionIO, though MLlib in Spark has the raw algo. You would need to write your own template. In my case I also added the step of sorting by popularity to show only the top few from each cluster, so this would be custom code anyway.

 

Your app has a heavy reliance on Data Science/ML, or it may fail to do what you want and actually do the opposite. You may need to invest in learning these algos more deeply, or get a data scientist familiar with ML to spend some time on the solution. These complexities arise from the app UI requirements: showing only 2 things from recs, with no other form of discovery. This is certainly possible, but the UI will lead to a degenerate situation if you don't address #1 correctly.

 

You can get away without training, but it will take longer to get good broad data from the user. The training method guarantees broad data. The amount of random/popular mixed in will either lead to fewer sales than are possible (if you mix in too much) or degenerate into an overfit situation and lead to quickly diminishing sales.

 

 

 


Pat Ferrel

Aug 29, 2016, 9:08:55 PM
to Matthew Valenti, actionml-user
Periodic training may be a good idea on its own, because it will catch individuals' changing tastes or their current interests. It will also solve the overfit problem to some extent, but it is still not showing unpopular or new items that the user may actually like.


On Aug 29, 2016, at 8:21 AM, Pat Ferrel <p...@actionml.com> wrote:

1) it doesn’t support this yet. You can get all backfill as a cascade, popular first then other forms, but you won’t be able to tell which is which. A fairly simple mod, so file an enhancement issue on Github, do the mod (a PR would be appreciated), or you can pay us to do it quicker if you need it right away.
2) I shouldn’t have called this a "problem"; it poses problems for the machine learning part, such as how fast you can learn, and having only one discovery method means you need to avoid overfit. The UI is fine and may be a key app feature. I only meant to say that the UI makes it more important to address certain problems than would be needed with a UI like Amazon’s.

RE sampling, I can’t offer much help without more research. Thompson sampling is used. Also search for Bayesian Bandits, which use a form of this. Greedy sampling is most often used with MABs and will lead to convergence too fast to trust (for your case), and it has been pointed to as an example of how NOT to do A/B test sampling with MABs.
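
The core of Thompson sampling is small; a minimal sketch over the kind of arms discussed here (the arm names and buy-as-reward choice are assumptions, and state is kept per user per the note above):

import random

class ThompsonMixer:
    """One instance per user; arms are item sources, not items."""
    def __init__(self, arms=("recs", "popular", "random")):
        self.stats = {arm: [1, 1] for arm in arms}   # Beta(1, 1) priors

    def choose_arm(self):
        # sample a plausible conversion rate per arm, use the best draw
        draws = {arm: random.betavariate(a, b)
                 for arm, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, arm, bought):
        # bought items bump successes; everything else bumps failures
        self.stats[arm][0 if bought else 1] += 1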


Pat Ferrel

Sep 8, 2016, 8:51:46 PM
to Matthew Valenti, actionml-user

On Sep 5, 2016, at 10:46 AM, Matthew Valenti <mat...@aydus.com> wrote:

My current thinking is to use 3 arms: recommendations, popularity, random. In theory, it doesn’t matter how many arms. We could also add additional arms: trending, hot, new.
The Bayesian Bandits algorithm would be responsible for selecting items (to show the user) from each of the arms. A reward in our case would be the user clicking the buy button for an item.
This should result in a good number of popular, random, etc. items being shown to start. As recommendations become stronger, more recommendations would be shown (and fewer popular, random, etc. items).
This would also continually create events on non-recommended items, solving the over-training issue.

Yes, but remember this is per individual, and during the training period ignore “exploit” feedback. You can try many arms, but my intuition (without having seen the app) is to use a small number, or the MAB as a whole will take too long to converge. Think of A/B testing, which is one of the more popular uses. Intuition again would lead me to try recs & (popular | trending | hot) & random. I’d pick from the pop models based on your product types. If they typically show quick popularity changes, try hot; if not, use popular for some period—use the method that matches the popularity volatility of your items.

BTW you need to think about flooding too, which is showing the same thing over and over. There are protections built into the UR for this, but they require you to flag items not to show in some way. You can tell the UR to blacklist some items in the query, or tell the UR that any item the user has performed certain actions on should not be shown.
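
For example, a per-query blacklist looks roughly like this (check the field name against the UR docs for your version):

{
  "user": "u0",
  "num": 2,
  "blacklistItems": ["i1", "i42"]
}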

For a new user, start by showing ~10-20 items that are random, so that we are collecting a good number of events across all items (before we begin showing popular, trending, etc.)
Then perform this type of random training periodically, as you’ve suggested.
Segmenting (as you suggested) would be better than random, but for phase 1 random will be much easier to implement.

For the initial 10-20 items used to get a sense of user taste, it would be better to cluster and pick the items from clusters; this will maximize the differentiating value of the items. And you may want to allow people to perform the primary action without buying. In other words, just ask them if they have bought or would strongly consider buying. If you ask them to buy, this may be too high a bar for training, and so you may get no data.

1. Now for the hard part! I’ve spent some time in the code/doco but can’t find a way to do (what I think) you’re recommending here.
Are you saying there’s a way to return one query result that includes popular, trending, hot (and in v0.4 random) items? I can’t find a way to do this.
Ideally, the query would return X number of each, i.e. 1 popular, 1 hot, or 2 popular, 2 hot, etc.
But let’s say it’s possible. Then the modification needed is to produce results including the backfillType, e.g.
{
  "itemScores": [
    {
      "item": "i1",
      "score": 0,
      "backfillType": "popular"
    },
    {
      "item": "i2",
      "score": 0,
      "backfillType": "hot"
    }
  ]
}
I thought the sr variable in esClient.scala > search might show where results are coming from, but I can’t relate any fields to the backfillType.
I also thought it might be possible for the query to override the backfillType, but that looks like it’s all being set up in the training step (not in the query).

Yes, to use all the filters and boosts, most if not all items must be rankable at query time, so this is set up at train time. That doesn’t mean it can’t be controlled at query time. You can ask for purely pop-model-based recs by omitting both a user-id and item-id from the query. Also, in the results any item with a score of 0 is from the pop model. But yes, there is only one pop model allowed, and yes, if we did allow more than one, the query would need to specify which to use. All doable.

I like the idea of being able to contribute this modification but I’m not sure I’m capable. Who would I talk to about the cost of funding this customization?


Sure, send a private email.

Pat Ferrel

Sep 17, 2016, 11:07:32 AM
to Matthew Valenti, actionml-user
Something very similar is already slated for UR v0.4.0, to be released next week. The sampling is not included and would be the thing to add.

1) it doesn’t support this yet. You can get all backfill as a cascade, popular first then other forms, but you won’t be able to tell which is which. A fairly simple mod, so file an enhancement issue on Github, do the mod (a PR would be appreciated), or you can pay us to do it quicker if you need it right away.
---


gcs...@gmail.com

Sep 23, 2016, 2:31:39 PM
to actionml-user, p...@actionml.com, mat...@aydus.com

I'm thinking of ways to build artificial training and testing datasets. I have to dust off my statistics books, but the idea is to figure out some statistical properties a priori and generate the datasets. I was going to take a look at some Markov chains.
Any suggestions on how probability theory can help? Any recommendations for packages (R?)?
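
For instance, I was picturing something as simple as two latent taste groups; all of the parameters here are made up:

import random

N_USERS, N_ITEMS = 1000, 100

def generate_events():
    """Biased synthetic events: each user favors one half of the catalog."""
    events = []
    for u in range(N_USERS):
        prefers_low = random.random() < 0.5   # which half this user likes
        for _ in range(random.randint(10, 40)):
            i = random.randrange(N_ITEMS)
            in_taste = (i < N_ITEMS // 2) == prefers_low
            events.append(("view", "u%d" % u, "i%d" % i))
            if random.random() < (0.5 if in_taste else 0.05):
                events.append(("like", "u%d" % u, "i%d" % i))
            if random.random() < (0.05 if in_taste else 0.005):
                events.append(("buy", "u%d" % u, "i%d" % i))
    return events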

Thanks

Gustavo

Pat Ferrel

Sep 24, 2016, 11:58:53 AM
to gcs...@gmail.com, actionml-user, mat...@aydus.com
What would be the purpose of the artificial datasets? There are some existing real datasets, but they very seldom come with multiple events.

Gustavo Frederico

Sep 24, 2016, 1:44:18 PM
to Pat Ferrel, actionml-user, mat...@aydus.com
I couldn't find real datasets with purchase events. Did you find any?

Gustavo

Pat Ferrel

Sep 25, 2016, 11:39:36 AM
to Gustavo Frederico, actionml-user, mat...@aydus.com
There is one called epinions that has 2 events. It is from an ancient dating site where people “liked” profiles and “trusted” people’s opinions. It’s hosted in a couple of places on the web—google it. We use it in Mahout to test cross-occurrence. You can feed it into the UR with something like the Python SDK. We also have private datasets that we’ve gotten the rights to use but can’t share.

What use do you have for this?

gcs...@gmail.com

Sep 25, 2016, 11:25:05 PM
to actionml-user, mat...@aydus.com


Well, the PIO+UR sample dataset seemed a bit small and simple. I was looking for more evidence that "the thing works", since I don't have access to real data. If I knew some bias parameters (some user-product correlation measurements) in the artificial dataset, I could have either an empirical evaluation of recommendations or run analysis-tools.

Has anyone run analysis-tools on the PIO+UR sample dataset? I know it's small, but e-commerce data also tends to be sparse.
Or maybe someone can answer: in a real dataset, what's an approximate overall ratio of view to buy events?

Thanks

Gustavo
