Pagination over sorted 1-N relationship

97 views
Skip to first unread message

Andrey Verbin

unread,
May 18, 2015, 10:58:17 AM5/18/15
to orient-...@googlegroups.com
Hi there!

Let say I'm building something like Twitter. There is a User with 20000 followers modeled as a relationships between User and Follow classes. Every user reference N Follow instances which in turn has a link to follower user and also creation date. I want to display paginated list of followers but not sure how to do it with OrientDB. What would be the best way to display paginated list of such followers sorted by creation date? 

Thanks

ste...@activitystream.com

unread,
May 19, 2015, 10:02:06 AM5/19/15
to orient-...@googlegroups.com
Hi,

As far as I know there is no way to do that without:
  - Scanning all the classes incoming edges
  - Fetch the edges + vertexes
  - Keep them all in memory while they are being sorted
  - Return the top n edges/vertexes 

This happens behind the scenes and is not problematic on the query side but it is on the performance side.

This is one of the issues we have come across when working with dense graphs and the changes, needed to fix this, are on the backlog.

I would love fore someone to tell me that I'm terribly wrong here.

Regards,
 -Stefan

Andrey Verbin

unread,
May 19, 2015, 11:19:52 AM5/19/15
to orient-...@googlegroups.com
Hi Stefan, 

I was thinking that one way I can do it is to create sorted index on (user_id, follower_id, date_created) and then select PAGE_SIZE entries form it. Rest is easy - fetch selected entries and pass them to web page for rendering.  I’m still not sure if this is practical so any advice from community will be helpful.

Thanks,
Andrey

On 19 мая 2015 г., at 17:02, ste...@activitystream.com wrote:

Hi,

As far as I know there is no way to do that without:
  - Scanning all the "incoming classes" edges

--

---
You received this message because you are subscribed to a topic in the Google Groups "OrientDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/orient-database/5HXM9ysEIsE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stefán

unread,
May 19, 2015, 12:03:18 PM5/19/15
to orient-...@googlegroups.com
Hi,

You can, if all the information exist on the edge, create that index on in, date_created (the composition you propose will not give all the speed you could get).

I though you were talking about the date_created belonging to the vertex (rather than the edge) but I understand now that you are talking about the date that the relationship was created.

Regards,
 -Stefán

Andrey Verbin

unread,
May 19, 2015, 8:33:35 PM5/19/15
to orient-...@googlegroups.com
Hi,

I document here my findings about pagination over relations. There are four 1-N relation types: LINKMAP, LINKSET, LINKLIST and LINKBAG. They seems to be stored as a list of rids right inside vertex/document (which is cool). LINKMAP, LINKSET, LINKLIST
are loaded all at once when you ask for corresponding document/vertex fields - this makes them unusable to store large number of entries. BTW in my testing I found that 10000 links to empty documents takes only 170KB. LINKBAG on the other side can be partially loaded and potentially contain infinite number of entries.

Unfortunately all these goodies completely useless when you want to display paginated list. MAP, SET and LIST cannot be used since you will have to load entire collection, sort it and only then display some small portion of it. LINKBAG does not maintain order so it is also useless. The only method which I can think of is to use “relational” way of doing things.
1. Add “foreign key” aka "one directional link" from child to parent. CREATE LINK parent FROM Child.parentId TO Parent.rid
2. Create SB-Tree index on “child” vertex/document which will store entries in correct order. Something like this: CREATE INDEX xyz ON Child (ParentId, SortKey) NOT UNIQUE
Since this is SB-Tree index hopefully index data will be sorted by “ParentId" and then “SortKey" which is exactly the order we need to display in a web page.
3. Select child entries from index which match your parent and ordering/sorting criteria. SELECT FROM INDEX:xyz WHERE ParentId = ? and SortKey > ? LIMIT PAGE_SIZE


On 19 мая 2015 г., at 19:03, Stefán <ste...@gmail.com> wrote:

Hi,

You can, if all the information exist on the edge, create that index on out, date_created (the composition you propose will not give all the speed you could get).

ste...@activitystream.com

unread,
May 20, 2015, 3:48:56 AM5/20/15
to orient-...@googlegroups.com
Hi,

Does this mean you are not using graph and edges to connect between these?
And if so then why not?

Regards,
 -Stefan
To unsubscribe from this group and all its topics, send an email to orient-database+unsub...@googlegroups.com.

Andrey Verbin

unread,
May 20, 2015, 4:55:03 AM5/20/15
to orient-...@googlegroups.com
Hi,

I use document api (ODocument and friends). Graph vs Document vs Object api  very confusing for a beginner like me as every API has it trade offs which are not clear. I opted for the most powerful API available which I hope is Document API.

For my scenario Edges add overhead and don't bring any value. Since I need sorted list of child records, I always access child records thru index as this is the only way to get them sorted. So I don't need Edges to connect my nodes, they already connected thru index. 

Thanks


20 мая 2015 г., в 10:48, ste...@activitystream.com написал(а):

Reply all
Reply to author
Forward
0 new messages