Graph based recommendation engine


Michael

Nov 20, 2017, 3:36:18 AM
to ArangoDB
Hi,

I am trying to model a simple search/recommendation engine for finding property listings based on price and location, but also on how important each of these two variables is to the person searching.
So the idea is that even if I say I want a lower price but value location more, it will recommend a property that is close to my location but maybe more expensive than I want. A sort of fuzzy search.

From what I understand a graph db should be very well suited for this?

Any feedback that can point me in the right direction would be much appreciated!

Thanks!
Michael

Wilfried Gösgens

Nov 21, 2017, 8:32:22 AM
to ArangoDB
Hi,
A general introduction to what graphs are and what can be done with them can be found here: https://docs.arangodb.com/3.2/Manual/Graphs/#coming-from-a-relational-background---whats-a-graph

The ArangoDB pattern matching algorithm offers a wide range of tools to achieve what you are searching for.

You may find this case study interesting, as it discusses exactly your topic:
https://arangodb.com/why-arangodb/case-studies/aboutyou-data-driven-personalization-with-arangodb/

Cheers,
Willi

Michael

Nov 21, 2017, 9:05:38 AM
to ArangoDB
Hi Willi,

Thanks for taking the time to reply!

That use case is indeed interesting, but as they don't go into specifics it doesn't help me that much.

From what I have understood so far, I should use a content-based filtering approach where each listing attribute is a vertex. I can then weight the user's relationships with these attributes and get some sort of listing score/distance.
So if London is an important location to the user, it will get a higher score. But I guess I would also have to create relationships with all neighbouring locations, with a lower weight, for these to be "recommended" as well?
I'm not sure how I would apply this approach to prices. Maybe create a price vertex collection holding "price interval" docs that have weighted relationships with the user, depending on how close the price is to what the user wants?

I haven't found many examples online, so any advice is much appreciated! Most examples use a simple collaborative filtering approach based on social interactions.

Thanks!
Michael

Simran Brucherseifer

Nov 22, 2017, 12:08:49 PM
to ArangoDB
Hi Michael,

While I don't have ready-made instructions for how to model your use case with a graph, or for what implications the design may have, I think you should take a look at our learning material about ArangoDB (not necessarily limited to the graph aspects).
Once you know what ArangoDB offers, you can choose the right tool for the job.

For example, another user uses aggregation as a first step towards a type of recommendation:


You can learn about graph basics with our graph course:

More details about AQL graph traversals are in our documentation:

Especially see the traversal options under the Syntax headline:

There's also Pregel since 3.2:

Also see our performance course, however, because it covers the basics of indexing as well as a comparison of different data models (using the example dataset):

A course specifically about data modeling is definitely on our shortlist, and the use case "recommendation engine" is something we have thought of already. Don't expect it in the near term though.

Some thoughts about your data modeling ideas:

Are prices proper entities in such a model? A price feels more like a vertex attribute of places for rent, but you wouldn't want to connect apartments with searching users.
Prices as weight attributes of edges seem like an interesting idea, but wouldn't that also mean creating a lot of edges between apartments and locations?
I guess one could store the actual price in the apartment vertex, and link price range vertices to them (so a limited number of price vertices).
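
To make the "limited number of price vertices" idea a bit more concrete, here is a minimal sketch in plain Python (not AQL). It only builds the edge documents one might insert. The collection names `apartments` and `priceRanges`, the bucket width of 500 and the key format are assumptions made up for illustration; only the `_from` / `_to` attributes are what ArangoDB edge documents actually use.

```python
PRICE_STEP = 500  # hypothetical bucket width, pick whatever suits the market


def price_range_key(price, step=PRICE_STEP):
    """Map a price to the key of its price range vertex, e.g. 1250 -> price_1000_1500."""
    low = (price // step) * step
    return f"price_{low}_{low + step}"


def build_price_edges(apartments):
    """Create one edge per apartment pointing at its price range vertex.

    The actual price stays on the apartment document; the edge only links
    it to one of a small, fixed set of range vertices.
    """
    return [
        {
            "_from": f"apartments/{a['_key']}",  # hypothetical collection name
            "_to": f"priceRanges/{price_range_key(a['price'])}",  # hypothetical collection name
        }
        for a in apartments
    ]
```

With this layout the number of price vertices depends only on the bucket width, not on the number of apartments.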

But is that really required for a fuzzy search? Couldn't you either start with a geospatial search and then filter / limit / sort the results based on price, or filter by price and then sort based on distance?
If indexes can be utilized, it should not be a problem to issue additional requests for matches from the next higher price segment, or a larger radius.

You could use a slightly wider price range or a slightly larger search radius than the user entered to allow for some fuzziness in general, which might already accomplish what you aim for.
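
A rough sketch of how that widening could depend on what the user cares about, written in plain Python rather than AQL so it stays independent of any particular query. Everything here is an assumption for illustration: the `price_weight` parameter, the `slack` factor of 50%, the listing fields `name`, `price`, `lat`, `lon`, and the simple linear score. In a real setup the distance filter would be the part to hand to a geo index.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def fuzzy_search(listings, center, max_price, radius_km, price_weight, slack=0.5):
    """Return listing names, best match first.

    price_weight is in [0, 1]: 1 means price matters most, 0 means location
    matters most. The less important criterion is relaxed by up to `slack`.
    """
    price_limit = max_price * (1 + slack * (1 - price_weight))
    radius_limit = radius_km * (1 + slack * price_weight)
    hits = []
    for item in listings:
        dist = haversine_km(center[0], center[1], item["lat"], item["lon"])
        if item["price"] <= price_limit and dist <= radius_limit:
            # Normalised penalties: 0 is ideal, larger is worse.
            score = (price_weight * item["price"] / max_price
                     + (1 - price_weight) * dist / radius_km)
            hits.append((score, item["name"]))
    return [name for _, name in sorted(hits)]
```

A user who values location gets a generous price limit and a tight radius, and the other way round, so a nearby but slightly too expensive apartment can still show up for the first kind of user.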

Regarding locations, are they really point-based? In large cities, you would probably want to search for offers in certain districts, which are rather polygons (with holes?). On the other hand, if you use center point + radius to represent a district geographically, you already add some fuzziness.

Best,
Simran

Michael

Nov 27, 2017, 7:55:24 AM
to ArangoDB
Hi Simran,

Thank you very much for that comprehensive reply!

I will check out all your links. A course on modelling is a good idea, as ArangoDB is multi-model and therefore naturally presents interesting new use cases and applications.

Your thoughts on my modelling ideas clarify why I got a bit stuck on how to model this scenario as it’s not really a natural fit.
I totally agree that price makes more sense as a vertex attribute. However, in the context of a recommendation engine I saw some models online where the attributes are vertices with weights on the edges, and thought there might be some clever model that would work. But maybe this doesn’t work with prices and locations, as these are more factual properties and can’t/shouldn't really be weighted. As you point out, searching users shouldn’t be connected to apartments. Maybe your idea for a fuzzy search is better, where I e.g. increase the price range if a user is less interested in price or, vice versa, increase the radius if price is more important.

You are correct that locations aren’t really point-based. Apartments would belong to a polygon district or just a simple tag/category. In theory, if an apartment is close to another district I could also add it to that district with a lower weight, but then I might as well just use a point-based search.

I guess a graph-based recommendation engine would make more sense for my use case if I start using views, matches etc. or types of apartments/features to base my recommendations on?

Thanks once again for your feedback!
Michael

Simran Brucherseifer

Dec 1, 2017, 8:45:41 AM
to ArangoDB
"Apartments would belong to a polygon district ..."

You might wanna use this AQL function, but be sure to combine it with a geo-index accelerated filter
(I guess the min and max values of x and y of all polygon points for a bounding box?):
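
To illustrate the bounding-box guess, here is a minimal sketch in plain Python (not AQL, and not the implementation of any ArangoDB function). The box is simply the min and max of x and y over all polygon points; it serves as a cheap prefilter before the exact point-in-polygon test, which here is a standard ray-casting check. The field names `name` and `pos` are made up for the example, and holes in polygons are not handled.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) over all polygon points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(point, box):
    """Cheap check that an index-backed range filter could answer."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Ray casting: toggle on every edge crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apartments_in_district(apartments, polygon):
    """Prefilter by bounding box, then run the exact test on the survivors only."""
    box = bounding_box(polygon)
    candidates = [a for a in apartments if in_box(a["pos"], box)]
    return [a["name"] for a in candidates if in_polygon(a["pos"], polygon)]
```

The point of the two-step approach is that the expensive polygon test only runs on the few candidates that pass the box check.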

"... or just a simple tag/category."

That would be better performance-wise, as it could use a hash (array) index.

Something to consider: if you wanted to search for a user-specified district, but also include neighboring districts, then a graph to store district relations could come in handy.
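
A small sketch of what such a district graph could give you, again in plain Python instead of an AQL traversal. The district names, the adjacency data and the `decay` factor are invented for illustration; in the database this would be an edge collection between district vertices and a traversal with a depth limit.

```python
def districts_with_weights(start, graph, max_depth=1, decay=0.5):
    """Breadth-first expansion from the user-specified district.

    The start district gets weight 1.0; every further hop multiplies the
    weight by `decay`, so direct neighbours rank below the district the
    user actually asked for.
    """
    weights = {start: 1.0}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for district in frontier:
            for neighbour in graph.get(district, []):
                if neighbour not in weights:  # keep the weight of the shortest path
                    weights[neighbour] = decay ** depth
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return weights


# Hypothetical adjacency data: district -> directly neighbouring districts.
NEIGHBOURS = {
    "Centre": ["North", "East"],
    "North": ["Centre", "Far North"],
    "East": ["Centre"],
    "Far North": ["North"],
}
```

The resulting weights could then be combined with a simple tag/category lookup, so that apartments tagged with a neighbouring district are included but ranked lower.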


Of course, you need to have this polygon data and the relationships. A simple radius-based search does not require any extra data.

If you track things like views, matches or how many types of apartments/features there are, then there are probably many ways to calculate an edge weight based on these numbers.
But you could also store some kind of ranking based on that per apartment offer and use it in a document-style rather than a graph-style query.