Performance issues with creating objects via RDF and REST

Peter Eichman

Dec 4, 2014, 1:44:37 PM
to fedor...@googlegroups.com
Hello all,

I have been testing a prototype of a simple batch loader that creates objects in a Fedora 4 repository by POSTing Turtle representations of the RDF. I have been running it with groups of 1000, and I have noticed that the time it takes to create the object in Fedora appears to be increasing roughly logarithmically with the number of objects already in the repository. With a fresh, practically empty repository, 100 objects are created in 5-10 seconds. However, by the time there are 4000 objects in the repository, each group of 100 takes roughly 3-4 minutes to create.

Each batch of 1000 has its own parent container resource. Is this behavior of Fedora taking longer to create an object the more objects there are in the repository to be expected? Each of the RDF resources I am POSTing has multiple blank nodes, which Fedora is instantiating as /.well-known/genid/{uuid}. Could this be a source of the slowdown?

Thanks,
-Peter

Andrew Woods

Dec 4, 2014, 2:27:23 PM
to fedor...@googlegroups.com, Mohamed Mohideen
Hello Peter,
Quite likely the slowdown you are seeing is a result of the blanknodes all being created under the same resource: /.well-known/genid
I just scripted the creation of 43,000 containers (naked, no RDF bodies) with no noticeable performance degradation. 
Are you in a position to ingest your containers without blanknodes, as a test?
It would also be very helpful if you could create an integration test that demonstrates the issue so that we can prove it has been resolved (once it is resolved). Mohamed, can you help with the IT?
Thanks,
Andrew

Peter Matthew Eichman

Dec 4, 2014, 3:00:39 PM
to fedor...@googlegroups.com, Mohamed Mohideen Abdul Rasheed
Hello Andrew,

Yes, I was able to run a quick test where I just stripped out the predicates featuring blank nodes, and performance is back up to ~3 seconds to create 100 objects. I am now working on refactoring some of the test data to have the same number of predicates, by replacing the blank nodes with named nodes.
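
A minimal sketch of that kind of replacement, assuming the Turtle uses labeled blank nodes (`_:b0`) rather than anonymous `[ ... ]` blocks. The function name `name_blank_nodes` and the choice of hash URIs (`<#b0>`) as the named nodes are illustrative assumptions, not something taken from the test data or from Fedora itself:

```shell
# Hypothetical helper: rewrite labeled blank nodes (_:b0) as hash URIs (<#b0>).
# Assumes labels contain only letters, digits, '_' and '-', and that the
# text "_:label" never appears inside a string literal or an IRI.
# Reads Turtle on stdin and writes the rewritten Turtle to stdout.
name_blank_nodes() {
    sed -E 's/_:([A-Za-z0-9_-]+)/<#\1>/g'
}
```

It could be used as `name_blank_nodes < mods.ttl > mods-named.ttl`, where both file names are placeholders. A regex rewrite like this is only a quick test aid; a real conversion would go through an RDF parser.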

Are there any guidelines in the documentation about creating integration tests?

Thanks,
-Peter

Andrew Woods

Dec 4, 2014, 7:18:57 PM
to fedor...@googlegroups.com, Mohamed Mohideen Abdul Rasheed
Hello Peter,
You should be able to use FedoraLdpIT.java as an example:

You will basically want to create a new class in the fcrepo-http-api module, named <whatever>IT.java, that extends AbstractResourceIT.java.
That should do it. 

We have a fairly thin wiki page documenting the project's testing practices. It would be fantastic if you could help include some notes on this page based on your experience writing this new integration test:

Regards,
Andrew

Peter Matthew Eichman

Dec 9, 2014, 3:30:24 PM
to fedor...@googlegroups.com, Mohamed Mohideen Abdul Rasheed
Andrew,

I am currently working on implementing that test. When I am done, what is the procedure for contributing back to the fcrepo4 project? Would I fork and issue a pull request?

Peter Matthew Eichman

Dec 10, 2014, 2:18:45 PM
to fedor...@googlegroups.com, Mohamed Mohideen Abdul Rasheed
Okay, my current implementation of the test has it attempting to create 5000 new container resources, each with 13 blank nodes, and catching any IOException thrown by the HTTP client and reporting that as a test failure. (Currently, it fails on my dev machine after about 3800 resources are created.)

So basically, it is an exhaustion test, trying to get the Fedora server to fail due to too many blank nodes being created. Is this the right strategy? Also, it takes a while to run (about 6 minutes on my machine); how much of a concern is this?

Thanks,
-Peter

Andrew Woods

Jan 23, 2015, 6:12:29 PM
to Peter Eichman, fedor...@googlegroups.com
Hello Peter,
I wanted to circle back on this thread, as there have been some recent developments. 
The primary issue that I believe you were running into was (as you suspected) related to the performance impacts of Fedora4 automatically generating thousands of *direct* children off of the same parent resource when creating blanknodes. This issue is being addressed with the following ticket and should be completed early next week.

Testing against a preliminary fix to FCREPO-1258, I was still running into the errors that you noted with the integration test offered in:

It turns out that the test itself is exhausting socket resources, which causes the failure. I therefore ran an analogous test with 'curl', looping on the creation of new resources that include the mods-rdf that you provided. Here is the script that I ran (where "mods.ttl" is your file):
=================
time for x in {0..50000}; do echo "--- $x"; curl -s -XPOST --data-binary @mods.ttl -H"Content-Type: text/turtle" localhost:8080/fcrepo/rest/test/collection0; done
=================
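
To see roughly where a slowdown begins, a variant of the loop above could report elapsed time per batch instead of a single total. This is only a sketch: the function name `time_batches` and the batch size in the usage line are assumptions, while the curl arguments are the ones from the script above.

```shell
# Hypothetical helper: run a command TOTAL times and, after every BATCH
# runs, print "<runs so far> <seconds taken by that batch>".
# Stops and returns 1 as soon as the command exits with a non-zero status.
# Usage: time_batches TOTAL BATCH command [args...]
time_batches() {
    total=$1
    batch=$2
    shift 2
    i=0
    start=$(date +%s)
    while [ "$i" -lt "$total" ]; do
        "$@" > /dev/null || return 1
        i=$((i + 1))
        if [ $((i % batch)) -eq 0 ]; then
            now=$(date +%s)
            echo "$i $((now - start))"
            start=$now   # the next batch is timed from here
        fi
    done
}
```

For example: `time_batches 50000 1000 curl -s -XPOST --data-binary @mods.ttl -H"Content-Type: text/turtle" localhost:8080/fcrepo/rest/test/collection0`. Note that curl exits with status 0 on HTTP error responses unless `-f` is added, so without it the loop stops only on connection-level failures.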

After the creation of about 40,000 resources (which amounts to 40,000 resources plus 520,000 blanknodes, at 13 blanknodes each), performance began to degrade, but creation continued without failure. I believe that is an issue completely unrelated to blanknode creation, and it is being tracked here:

In short, once FCREPO-1258 is resolved, I hope the blanknode performance issue you were seeing will have been improved if not resolved. I look forward to your feedback.
Regards,
Andrew

