{
  "user": "u0",
  "fields": [
    {
      "name": "categories",
      "values": ["shoes"],
      "bias": 2
    }
  ]
}
{
  "comment": "This config file uses default settings for all but the required values; see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params": {
      "name": "sample-daydream-data.txt",
      "appName": "DaydreamUniversal",
      "eventNames": ["buy", "view", "like"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.executor.memory": "4g",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity-based backfill; must add eventNames",
      "name": "ur",
      "params": {
        "appName": "DaydreamUniversal",
        "indexName": "urindex",
        "typeName": "items",
        "eventNames": ["buy", "view", "like"],
        "blacklistEvents": ["view"],
        "num": 2,
        "backfillField": {
          "eventNames": ["buy", "like"]
        }
      }
    }
  ]
}
Training uses events triggered by user actions (or sometimes user attributes), for instance “buy”. In any case we expect that some of the events will come from users interacting with a recommendation; if not, we did not do our job :-) The recommender optimizes to help users find what to “buy” by capturing all “buy” events and recommending new things to buy. If a user buys a recommendation, that is good data we want, but there must be some other way for a buy to happen.
On a web site, 90% of the traffic may come in through Google search. Then the user browses or searches on the site. Eventually they buy and the event is recorded. As long as this buy, which is not influenced by the recommender, can happen, then recording all buys is perfectly fine. This covers 99% of cases, since discovery is multi-faceted; not all discovery comes through recommendations. Even Netflix throws in carousels for “recently added” or “popular on Netflix”, etc.
I can’t quite picture your application, but if it has other ways for people to find things, like search or browsing by category, you are probably fine. Only in the case where you show nothing but recommendations will you become overfit.
In this last case there is another problem: what do you show the new user? Clearly not recommendations, since we know nothing about their preferences. So what will you show?
Using the UR out of the box, a new user will get popular items recommended, but as soon as they take a few actions they will get recommendations from then on. If they only see these recommendations, then the only non-recommender-influenced likes will be the first few, and these will be self-reinforcing since the user is only ever shown recommendations from then on.
I explain all this because your wording below is odd. To solve this with a UI that shows 2 items and lets the user pick one, you would want to mix in popular items, promoted items, and maybe random items and not show only recommendations. But this is necessary only for an app that has no other form of discovery.
How many items do you have?
You are getting close to a situation that requires sampling choices like the sampling in a multi-armed bandit. If there is no other way for the user to browse or discover items, then the 2 you show will always need to include non-recommended items at some rate. This rate should diminish over time, and in MABs we use different algorithms for this. If you don’t, you will quickly overfit and show boring, repetitive items to the user, so purchases will increase at first then decrease, possibly to 0. So this sampling method is super important or the recommender will eventually be a throttle on sales.
Note that the sampling method should be based on individual users, not just a global % of “shows” that are random or popular. It will show lots of non-recommended things at first, then taper off to show them less often as the user builds up a good history profile. There is science behind this but I can’t say more without a lot more detail.
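To make that per-user taper concrete, here is a minimal Scala sketch, assuming a simple exponential decay on the user's event count; none of these names come from the UR:

// Hypothetical sketch of per-user exploration tapering (not UR API).
// The rate starts near 1.0 for a user with no history and decays
// toward a floor as the user's event count grows.
object ExplorationRate {
  val floor = 0.05     // always keep some non-recommended items mixed in
  val halfLife = 20.0  // events until the rate halves (assumed constant)

  def forUser(eventCount: Long): Double =
    math.max(floor, math.pow(0.5, eventCount / halfLife))
}

// A user with 0 events gets rate 1.0 (all popular/random items); at
// ~20 events it is ~0.5; past ~100 events it sits at the 0.05 floor.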
I think I understand what you are trying to do, which is to ideally show only what the user wants, but the recommender has to have a way to observe changing wants.
BTW a good way to get the recommender trained quickly is what we used in the guide. We clustered items by user buying patterns so a cluster would include things bought by the same people. Then we sorted the cluster by popularity and chose to show the top few, on the theory that popular items are more likely to be known and preferred by the new user.
The clustering yields collections of items that differ from other clusters by the users who bought them and so should give you items that have the most differentiating value. When the user chooses one they are in effect showing a tendency to have the same taste as the users who bought the clustered items. This should have the effect of training the model quicker than showing random items, though in your case you would want to add random too.
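For reference, a rough Scala sketch of that clustering step, assuming Spark MLlib's RDD-based KMeans; buyEvents (an RDD of (itemId, userIndex) pairs from deduplicated "buy" events) and numUsers are assumed inputs, and there is no existing PredictionIO template for this:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// One sparse vector per item, with a 1.0 at each user who bought it.
val itemVectors = buyEvents
  .groupByKey()
  .mapValues(users => Vectors.sparse(numUsers, users.map(u => (u, 1.0)).toSeq))

// k and iteration count are assumptions; tune for your catalog.
val model = KMeans.train(itemVectors.values, 10, 20)

// Assign items to clusters; downstream, sort each cluster by buy
// count and show only the top few during the training session.
val clustered = itemVectors.mapValues(v => model.predict(v))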
2) will allow custom ranking of items as a fallback from recommendations; it only allows popularity now. We will supply random ranking, but users can implement their own that will not require events for an item (popularity requires events). A custom method might be based on some promotional or commercial value or any other user-defined ranking. A long explanation for making all items recommendable in some way, where items without events cannot be recommended now.
On Aug 22, 2016, at 9:18 AM, mat...@aydus.com wrote:
On Friday, August 19, 2016 at 10:50:19 AM UTC-7, Pat Ferrel wrote:
BTW a good way to get the recommender trained quickly is what we used in the guide. We clustered items by user buying patterns so a cluster would include things bought by the same people. Then we sorted the cluster by popularity and chose to show the top few, on the theory that popular items are more likely to be known and preferred by the new user.
*** This is a clever approach.
The clustering yields collections of items that differ from other clusters by the users who bought them and so should give you items that have the most differentiating value. When the user chooses one they are in effect showing a tendency to have the same taste as the users who bought the clustered items. This should have the effect of training the model quicker than showing random items, though in your case you would want to add random too.
*** I'm wondering how to implement something like this? Do any of the current back-fills take into account user properties? It seems like this could be a good way to get a decent segment.
2) will allow custom ranking of items as a fallback from recommendations; it only allows popularity now. We will supply random ranking, but users can implement their own that will not require events for an item (popularity requires events). A custom method might be based on some promotional or commercial value or any other user-defined ranking. A long explanation for making all items recommendable in some way, where items without events cannot be recommended now.
*** Have been trying to figure out how to use PIO to return random results (and to respect blacklistEvents). Will this be a feature of the upcoming version 0.4.0?
1) Understood that random will only be available as a back-fill. Is it possible to use a query with a dummy user and explicitly set the backfillField? e.g.
{
  "user": "xxx",
  "backfillField": {
    "backfillType": "random"
  }
}
If not, is the best way to implement random (with blacklist) by adding custom business logic to the serving component as mentioned here?
http://predictionio.incubator.apache.org/customize/
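If you do go the custom-serving route, here is a hedged sketch of what that component might look like. LServing is PredictionIO's base class, but the Query/PredictedResult/ItemScore field names and the blacklistedFor helper are assumptions about the template, not its real API:

import org.apache.predictionio.controller.LServing
import scala.util.Random

class Serving extends LServing[Query, PredictedResult] {
  override def serve(query: Query,
                     predictedResults: Seq[PredictedResult]): PredictedResult = {
    // blacklistedFor is an assumed helper resolving the user's
    // blacklist events to a set of item ids.
    val blacklist = blacklistedFor(query.user)
    val kept = predictedResults.head.itemScores
      .filterNot(s => blacklist.contains(s.item))
    // Shuffle so the served items are a random draw, not top-scored.
    PredictedResult(Random.shuffle(kept.toList).take(2).toArray)
  }
}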
I did do some reading on MAB but it made zero sense when it came to sampling specifics. Will dig deeper on this.
2) Thanks for highlighting the pitfalls in more detail. I’m hoping to find specific MAB sampling recommendations to follow. While the concept here is simple, the specific mix of items is not.
Even though the app is showing only two items at a time, I don’t see that as being a core problem. The first X items shown could be 100% training and then, as you recommended, taper off training items and trickle in recommendations. And in theory, if people are not getting good results, put them through another round of training.
But noted that a careful balance here is required.
From: Pat Ferrel [mailto:p...@actionml.com]
Sent: Saturday, August 27, 2016 3:32 PM
To: Matthew Valenti <mat...@aydus.com>
Cc: actionml-user <action...@googlegroups.com>
Subject: Re: Universal Recommender: How to Test Recommendation Results?
I can’t give precise answers without doing calculations and a little research. The code you need will involve some customizing. But I can say that the UR and your app leave you with 2 problems unsolved:
1) how to calculate random or popular sampling with recs. This is being addressed in UR 0.4.0, but only as backfill, not as a diminishing % of recs. If you ask for n recs, it fills with recs first; if it doesn’t have enough, it adds popular items; if still not enough, it adds from random (sketched below). This IMO does not solve your problem. I would have to do some research on MAB sampling to suggest a specific mix solution. You can do the same.
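For clarity, that fill order amounts to something like this (a sketch, not the UR's actual code):

// Fill n result slots: recs first, then popular, then random,
// skipping duplicates.
def fill(n: Int,
         recs: Seq[String],
         popular: Seq[String],
         random: Seq[String]): Seq[String] =
  (recs ++ popular ++ random).distinct.take(n)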
2) optimizing training. As I understand things you do not have a separate trainer part of the app. If so you may want the first few choices the user makes to be part of a virtual training session. They would be shown things in the same way as normal, 2 items, but the items would be chosen from training items, not recs, so the user wouldn’t know they were training the system. To do what I did on the guide, you have to cluster the input data with a clustering algorithm, and I don’t think there is an existing template for PredictionIO, though MLlib in Spark has the raw algo. You would need to write your own template. In my case I also added the step of sorting by popularity to show only the top few from each cluster, so this would be custom code anyway.
Your app has a heavy reliance on data science/ML, or it may fail to do what you want and actually do the opposite. You may need to invest in learning these algos more deeply or get a data scientist familiar with ML to spend some time on the solution. These complexities arise from the app UI requirements: showing only 2 things from recs, with no other form of discovery. This is certainly possible, but the UI will lead to a degenerate situation if you don't address #1 correctly.
You can get away without training, but it will take longer to get good broad data from the user; the training method guarantees broad data. The amount of random/popular mixed in will either lead to fewer sales than are possible (if you mix in too much) or degenerate into an overfit situation and lead to quickly diminishing sales.
My current thinking is to use 3 arms: recommendations, popularity, random. In theory, it doesn’t matter how many arms; we could also add additional arms: trending, hot, new.
The Bayesian Bandits algorithm will be responsible for selecting items (to show the user) from each of the arms. A reward in our case would be the user clicking the buy button for an item.
This should result in a good number of popular, random, etc. items being shown to start. As recommendations become stronger, more recommendations would be shown (and fewer popular, random, etc. items).
This would also continually create events on non-recommended items, solving the overtraining issue.
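To make the arm selection concrete, here is a self-contained Scala sketch of Beta-Bernoulli Thompson sampling over those three arms; this is illustrative only and not part of the UR:

import scala.util.Random

case class Arm(name: String, var successes: Int = 1, var failures: Int = 1)

val arms = Seq(Arm("recs"), Arm("popular"), Arm("random"))

// Sum of k Exp(1) draws is Gamma(k, 1); valid for integer shape k.
def gammaSample(shape: Int): Double =
  (1 to shape).map(_ => -math.log(1.0 - Random.nextDouble())).sum

// Beta(a, b) as X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
def betaSample(a: Int, b: Int): Double = {
  val x = gammaSample(a)
  val y = gammaSample(b)
  x / (x + y)
}

// Pick the arm whose sampled buy rate is highest; after serving,
// update the chosen arm with whether the user actually bought.
def chooseArm(): Arm = arms.maxBy(a => betaSample(a.successes, a.failures))
def reward(arm: Arm, bought: Boolean): Unit =
  if (bought) arm.successes += 1 else arm.failures += 1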
For a new user, start by showing ~10-20 items that are random so that we are collecting a good number of events across all items (before we begin showing popular, trending, etc.)
Then perform this type of random training periodically as you’ve suggested.
Segmenting (as you suggested) would be better than random but for phase 1 random will be much easier to implement.
1. Now for the hard part! I’ve spent some time in the code/docs but can’t find a way to do (what I think) you’re recommending here.
Are you saying there’s a way to return one query result that includes popular, trending, hot (and in v0.4 random) items? I can’t find a way to do this.
Ideally, the query would return X of each type, e.g. 1 popular and 1 hot, or 2 popular and 2 hot, etc.
But let’s say it’s possible. Then the modification needed is to produce results including the backfillType, e.g.
{
  "itemScores": [{
    "item": "i1",
    "score": 0,
    "backfillType": "popular"
  }, {
    "item": "i2",
    "score": 0,
    "backfillType": "hot"
  }]
}
I thought that in esClient.scala > search > the sr variable might return where results are coming from, but I can’t relate any fields to the backfillType.
I also thought it might be possible for the query to override the backfillType but that looks like it’s all being setup in the training step (not in the query).
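One hypothetical workaround in the meantime: keep separate ranked lists per type (from separate queries or precomputed ranks) and merge and tag them in application code. TaggedScore and the inputs below are made-up names, not UR API:

case class TaggedScore(item: String, score: Double, backfillType: String)

// Take perType items from each source list and label them, yielding
// the itemScores-with-backfillType shape shown above.
def mergeAndTag(perType: Int,
                lists: Map[String, Seq[(String, Double)]]): Seq[TaggedScore] =
  lists.toSeq.flatMap { case (source, items) =>
    items.take(perType).map { case (item, score) =>
      TaggedScore(item, score, source)
    }
  }

// e.g. mergeAndTag(2, Map("popular" -> popularList, "hot" -> hotList))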
I like the idea of being able to contribute this modification but I’m not sure I’m capable. Who would I talk to about the cost of funding this customization?