Real World Performance Status


Andrew Woods

Aug 15, 2016, 11:33:15 PM
to fedor...@googlegroups.com

Hello All,

You may be aware of recent discussions relating to performance implications of certain modeling techniques for collections of resources. Specifically, the issue arises in the scenario where a Fedora resource has references to a large number of “member” resources; where “large” is defined as 1000+.


Details of the scenario and performance tests are available on the wiki:


The intent of this message is to highlight the status of three potential solution approaches.


1. Reverse membership relationships

This is a client-side solution that recommends modeling membership relationships from the member to the collection (memberOf) instead of from the collection to its members (hasMember). The advantage of this approach is that the members are not loaded when the collection is requested. However, all of the members can still be retrieved by requesting the collection with an optional header indicating inclusion of “inbound references” [1].
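As a rough illustration of the optional header mentioned above, the following sketch builds (but does not send) a GET request for a collection that asks for inbound references via the Prefer header described in [1]. The function name, the example URL, and the choice of text/turtle are assumptions made for illustration only; check [1] for the exact header syntax supported by your Fedora version.

```python
from urllib.request import Request

# Prefer-header "include" value for inbound references, as documented in [1].
FEDORA_INBOUND_REFERENCES = (
    "http://fedora.info/definitions/v4/repository#InboundReferences"
)


def build_collection_request(collection_url: str) -> Request:
    """Build a GET request that asks Fedora to include inbound references.

    Hypothetical helper: the name and the Accept type are illustrative choices.
    The request is only constructed here, not sent.
    """
    prefer = f'return=representation; include="{FEDORA_INBOUND_REFERENCES}"'
    return Request(
        collection_url,
        headers={"Prefer": prefer, "Accept": "text/turtle"},
        method="GET",
    )


if __name__ == "__main__":
    # Assumed local repository URL, for demonstration only.
    req = build_collection_request("http://localhost:8080/rest/my-collection")
    print(req.get_method(), req.full_url)
    print("Prefer:", req.get_header("Prefer"))
```

With memberOf modeling, the triples returned for such a request would be the members' own statements pointing at the collection, rather than statements stored on the collection itself.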


There are two pull requests from Esmé Cowles that introduce this into the Hydra layer:

Although this approach represents a sound modeling technique, it may not be required if one of the following two solutions is realized.


2. Minimize loading members

This is a solution within the Fedora codebase that would optimize performance of retrieving references to members without incurring the overhead of actually loading each of the members from the underlying ModeShape (JCR).
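To make the cost difference concrete, here is a toy sketch of the idea. It is not Fedora or ModeShape code: ToyRepository and its methods are hypothetical names, and the "load" is simulated with a counter. It only contrasts resolving member identifiers by loading every member node against reading the reference values directly.

```python
from dataclasses import dataclass


@dataclass
class ToyRepository:
    """Hypothetical in-memory stand-in for a node store (not the ModeShape API)."""

    references: dict[str, list[str]]  # collection id -> referenced member ids
    loads: int = 0  # number of simulated full node loads

    def load_node(self, node_id: str) -> dict:
        # Stands in for the expensive step: materializing a full node.
        self.loads += 1
        return {"id": node_id}

    def member_ids_by_loading(self, collection_id: str) -> list[str]:
        # Loads every member just to learn its identifier.
        return [self.load_node(m)["id"] for m in self.references[collection_id]]

    def member_ids_from_references(self, collection_id: str) -> list[str]:
        # Reads the reference values only; no member node is loaded.
        return list(self.references[collection_id])


if __name__ == "__main__":
    members = [f"member-{i}" for i in range(1000)]
    repo = ToyRepository(references={"collection": members})
    repo.member_ids_by_loading("collection")
    print("loads when loading each member:", repo.loads)
    repo.loads = 0
    repo.member_ids_from_references("collection")
    print("loads when reading references only:", repo.loads)
```

Both methods return the same identifiers; the difference is only in how many node loads are incurred along the way, which is the overhead this approach aims to avoid.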


There is an early draft from Ben Armintor that begins to explore this option:

Related Jira Ticket:


3. ModeShape queries

This is a solution within the Fedora codebase that would change how members are accessed: instead of traversing references and loading the underlying resources, Fedora would make targeted, optimized queries to the underlying ModeShape (JCR). It is worth noting that this approach would also benefit the collecting of “inbound references” mentioned in approach #1.


There is a very early draft from Ben Armintor that begins to explore this option:


This is an important issue that will require attention from more of the community. If you are interested and available to help move solution 2 or 3 forward, please respond.


Regards,

Andrew

[1] https://wiki.duraspace.org/display/FEDORA4x/RESTful+HTTP+API+-+Containers#RESTfulHTTPAPI-Containers-GETRetrievethecontentoftheresource


Tom Johnson

Aug 16, 2016, 12:12:29 AM
to fedor...@googlegroups.com
Thanks for this update, Andrew.

PR #1086 looks like a very substantial refactor, but I'm wondering whether there's any downside. If there's a way to avoid costly and wasted loads of JCR nodes in resolving references, that seems like a good optimization to pursue.

Likewise for solution 3, though I'm even less certain I understand the implications of that one.

I guess my point is: +1 for work that improves the low-level characteristics of the system, even as we pursue client-side solutions to present-day bottlenecks.

- Tom
