Performance / Pairtree


Kilian Amrhein

Jan 18, 2021, 12:52:20 PM
to Fedora Tech
Hello All,

We are about to start ingesting data into our Fedora 5.1.1 repository.
As we are storing data from multiple data providers, we would put organizational identifiers on the first level of the hierarchy and store all of an organization's objects under its identifier, e.g. http://localhost:8080/fcrepo/rest/orgA/1cbc1b7e-501d-4fa1-9e6a-1dd0e90dadf2
There may well be (far) more than 100,000 objects per organization.

I've seen that creating a pairtree (like so: http://localhost:8080/fcrepo/rest/orgA/1c/bc/1b/7e/1cbc1b7e-501d-4fa1-9e6a-1dd0e90dadf2) is no longer the default setting, but I couldn't find an explanation why.
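
For illustration, the pairtree layout in the URL above can be derived from the identifier by splitting its leading characters into fixed-length segments. This is only a sketch, not Fedora's own minter code: the function name `pairtree_path` and its defaults (four segments of two characters, hyphens skipped) are assumptions chosen to match the example URL.

```python
def pairtree_path(identifier: str, levels: int = 4, width: int = 2) -> str:
    """Return a pairtree-style path for an identifier (hypothetical helper)."""
    # Assumption: hyphens are skipped so that no path segment contains one.
    compact = identifier.replace("-", "")
    if len(compact) < levels * width:
        raise ValueError("identifier too short for the requested pairtree depth")
    # Cut the leading characters into `levels` segments of `width` characters.
    segments = [compact[i * width:(i + 1) * width] for i in range(levels)]
    # The full identifier stays as the last path element.
    return "/".join(segments + [identifier])


if __name__ == "__main__":
    print(pairtree_path("1cbc1b7e-501d-4fa1-9e6a-1dd0e90dadf2"))
```

With these defaults, no intermediate level can have more than 256 children for hexadecimal identifiers, which is the point of the layout.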

There once was a recommendation not to have too many objects on one level. So what would be the current recommendation for us (mainly performance-wise), with Fedora 6 on the way? Should we leave it off or switch it on? Does anyone have experience with such a high number of objects on the same level with the current Fedora versions?

Cheers
Kilian

Jared Whiklo

Jan 18, 2021, 1:34:20 PM
to fedor...@googlegroups.com
Hi Kilian,

IIRC, work was done on the ModeShape side to reduce issues around
having a large number of resources on a single level, and I believe the
decision was that if you (as a repository owner/admin) felt you needed
more levels, you could add them yourself.

I will note that in 5.1.1 you can re-enable the pairtree structure in
the fcrepo-config.xml [1]; you can find more information on the
configuration options here [2].

Essentially, copy the existing fcrepo-config.xml out of the webapp to
some location, and then when starting your container set a JAVA_OPT of
-Dfcrepo.spring.configuration=file:/path/to/your/config.xml

cheers,
jared


[1]
https://github.com/fcrepo/fcrepo/blob/fcrepo-5.1.1/fcrepo-webapp/src/main/resources/spring/fcrepo-config.xml#L137-L143
[2]
https://wiki.lyrasis.org/display/FEDORA51/Configuration+Options+Inventory

--
Jared Whiklo
jwh...@gmail.com



Kilian Amrhein

Jan 27, 2021, 1:57:40 PM
to Fedora Tech
Thanks Jared for your quick reply,

In the meantime I did some performance testing and posted the results as a comment in the wiki [1]. There is a big difference between pairtree on and off. With pairtree off, POSTing gets significantly slower the more children there are. That stays the same if I stop the script, restart the container to free memory, and continue posting to the same resource (which shouldn't be necessary anyway).
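
A slowdown like this can be measured by timing fixed-size batches of POSTs against the same parent and comparing the rates per batch. The following is only a sketch, not the script used for the wiki results: `post_child` and `measure_batches` are hypothetical names, and it assumes an unauthenticated Fedora that accepts a bodiless POST to the parent and mints the child identifier itself.

```python
import time
import urllib.request
from typing import Callable, List


def post_child(parent_url: str) -> None:
    """Create one child under parent_url with a bodiless POST (assumed to suffice)."""
    req = urllib.request.Request(parent_url, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


def measure_batches(create: Callable[[], None], batches: int, batch_size: int,
                    clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Return the ingest rate (objects per minute) of each consecutive batch."""
    rates = []
    for _ in range(batches):
        start = clock()
        for _ in range(batch_size):
            create()
        elapsed = clock() - start
        # Guard against a zero interval from a coarse clock.
        rates.append(batch_size / elapsed * 60 if elapsed > 0 else float("inf"))
    return rates


# Example use against a local test instance (URL is an assumption):
# parent = "http://localhost:8080/fcrepo/rest/orgA"
# for n, rate in enumerate(measure_batches(lambda: post_child(parent), 10, 1000), 1):
#     print(f"batch {n}: {rate:.0f} obj/minute")
```

A falling rate from one batch to the next, with everything else unchanged, would point at the number of direct children rather than at memory pressure in the client script.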

We would rather stick with the default/recommended setting (pairtree off), as we assume there is a reason for the recommendation. The rate (~200 obj/minute) is still faster than we need (the tool that generates the resource packages uploaded to Fedora is slower). But we are unsure what happens if there are, let's say, half a million or more direct children. Would we still be able to add more at a reasonable rate, or would it break down altogether? Is there anything we could change in the (memory?) settings to speed the whole thing up? Maybe use Postgres instead of MySQL? Or throw heaps of memory at it?
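
As a rough lower bound on the time involved: if the ~200 obj/minute rate held constant (an optimistic assumption, since the tests show it dropping as children are added), half a million objects would take about 2,500 minutes, or roughly 42 hours. The helper name below is hypothetical.

```python
def minutes_to_ingest(total_objects: int, rate_per_minute: float) -> float:
    """Ingest time in minutes at a constant rate (best case, no slowdown)."""
    return total_objects / rate_per_minute


minutes = minutes_to_ingest(500_000, 200)
hours = minutes / 60
print(f"{minutes:.0f} minutes, about {hours:.1f} hours")
```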

What I also didn't expect is that Fedora 6 is slower. The only difference is that it runs in Docker (and uses Postgres). In Andrew's demo it looks way faster, although I don't know exactly which script and configuration he is running there [2]. He speaks of 4 million ingested objects, but I am not sure whether they are kept, as there are also DELETE requests showing up under the first graph. Also, the ingest rate there does not slow down as it does in my tests. So, any suggestions as to what the difference might be?

I couldn't get it to run with the changed Spring config and pairtree switched on as described here [3]. It looks like the custom fcrepo.properties / fcrepo-config.xml is being used, but still no pairtree is created, so unfortunately I couldn't test that (yet). This is what I did:

$ docker run -p8080:8080 -v/home/myuser/fcrepo.properties:/fcrepo.properties:Z -v/home/myuser/fcrepo-config.xml:/fcrepo-config.xml:Z  -e CATALINA_OPTS="-Dfcrepo.config.file=/fcrepo.properties" --name=fcrepo fcrepo/fcrepo
$ docker exec -it fcrepo bash
root@b33edc54e1a7:/usr/local/tomcat# cat /fcrepo.properties
fcrepo.spring.configuration=file:/fcrepo-config.xml

The fcrepo-config.xml has the first bean under PID-Minter enabled and the c-namespace added in the beans node.

What am I doing wrong here?


Sorry for putting multiple issues in this one reply...

Hope someone can help! :)


Jared Whiklo

Jan 27, 2021, 4:06:25 PM
to fedor...@googlegroups.com
Hey Kilian,

I believe you said you are using Fedora 5.1.1.

The demo you are referencing is for the upcoming release of Fedora 6.0.0.

Unless I am mistaken, which happens a lot.

cheers,

jared

Kilian Amrhein

Jan 28, 2021, 6:39:24 AM
to Fedora Tech
Hi,

Sorry I didn't make that clearer. I tested 5.1.1 first, but then thought it would be a good idea to also test version 6, which ended up being even slower. Then I wondered why it was faster in the demo, but couldn't find an answer to that.

cheers, Kilian

Jared Whiklo

Feb 4, 2021, 9:33:07 AM
to fedor...@googlegroups.com
Hi Kilian,

So I did a simple test in the vein of the n-children test you ran.


I didn't add as many children as you did, but I did notice a difference between MySQL and the other two databases.

I'll give it a test with Tomcat next to see if that is adding any more latency.

But we're adding this to our list of things to review. 

Cheers,
Jared
