RDF4J server performance

Jo-Jo

Mar 23, 2021, 6:42:43 PM

to RDF4J Users

Hello!

I recently posted on here asking for some help obtaining an independent Console distribution so I could set up a Docker image with pre-loaded data that is available at RDF4J server start time -- thanks to your help, I was able to get this set up! However, I have noticed that performance has been very sporadic; in some cases, I am able to send a query via GET request to the /rdf4j-server/repositories/[REPOSITORY ID] endpoint and get a response back very quickly, but usually (I would say 95%+ of the time), my requests time out (after a minute).

I was wondering if you had any ideas why this might be, or how I could improve the performance of my RDF4J server running inside my Docker container.

Thank you so much for your help!

Jo-Jo

Mar 23, 2021, 7:00:59 PM

to RDF4J Users

I am also noticing that in some cases (but not always), after a request fails and I try it again, it will succeed almost immediately, and subsequent requests will succeed as well. This gives me the impression that there might be some sort of "warming-up" process that is triggered by the first request, and allows the subsequent requests to process quickly? I apologize if my analysis is very naive!

For further information, I am using in-memory data stores with RDFS inferencing, with all of the default options.

hmott...@gmail.com

Mar 26, 2021, 10:45:12 AM

to RDF4J Users

Hi,

The RDFS inferencer has to read through all your data when it gets initialised in order to create its cached forward-chained schema. This could be what is happening.

I just checked the source code and I'm afraid I haven't added any debug logging for the initialisation.

Do you see the same timeout if you don't use the RDFS inferencer?

Also, what exactly is the timeout error you are getting?

Cheers,

Håvard

Jo-Jo

Mar 29, 2021, 2:35:22 PM

to RDF4J Users

Hello, thank you for the response!

Your explanation makes sense to me. I can try to check without the RDFS inferencer, but we really do need inference! I also wanted to clarify that the timeout is not triggered by RDF4J server itself, but by the 60-second timeout in the container where we have deployed the server (so there is no error message from the server).

Qualitatively, I can tell you that after server startup, loading the query interface in the RDF4J workbench is extremely slow (takes on the order of multiple minutes to load initially), and HTTP queries to the server take just as long. We have noticed that after a few initial queries, the server memory usage seems to increase dramatically, and once it reaches a plateau, we are able to execute queries very quickly, which leads me to believe your explanation that it needs to cache the forward-chained schema, and begins to do so in earnest once it receives a query.

Because we would like to reduce the startup time of our RDF4J server (and avoid having to manually make queries in order to trigger the caching behavior):

- Can we configure our RDF4J server to preemptively begin the caching process as soon as it starts up?

- We are currently using the in-memory data store; would moving to a native disk store have any meaningful consequences on performance?

- Any other ideas on how to improve query performance or caching time?

- Would you be available/interested in hopping on a short call sometime this week to discuss debugging strategies? I would be very happy to!
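For anyone wanting to script the warm-up from the first question rather than running queries by hand, here is a minimal sketch. It assumes the slow first requests really are caused by initialisation that only starts on first access, as suspected above, and that any successful query is enough to trigger it. The base URL, the repository ID passed in, the trivial ASK query, and the retry settings are all placeholders, not anything RDF4J prescribes. It sends the query as a GET request to the /rdf4j-server/repositories/[REPOSITORY ID] endpoint with a client-side timeout much longer than 60 seconds, and retries until one request succeeds.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumption: placeholder address; adjust to wherever the server is reachable.
BASE_URL = "http://localhost:8080/rdf4j-server"
# Assumption: any cheap query should do; this one only asks whether a triple exists.
WARMUP_QUERY = "ASK { ?s ?p ?o }"


def build_query_url(base_url, repository_id, query):
    """Return the GET URL that sends `query` to one repository."""
    repo = urllib.parse.quote(repository_id, safe="")
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/repositories/{repo}?{params}"


def warm_up(base_url, repository_id, attempts=10, timeout_s=600, pause_s=5,
            opener=urllib.request.urlopen):
    """Send the warm-up query until it succeeds; return True on success.

    `opener` is injectable so the retry logic can be exercised without a server.
    """
    url = build_query_url(base_url, repository_id, WARMUP_QUERY)
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    for attempt in range(1, attempts + 1):
        try:
            with opener(request, timeout=timeout_s) as response:
                if response.status == 200:
                    return True
        except OSError as exc:  # covers URLError, HTTPError and socket timeouts
            print(f"warm-up attempt {attempt} failed: {exc}")
        time.sleep(pause_s)
    return False
```

Calling `warm_up(BASE_URL, "my-repo")` (with a hypothetical repository ID) from the container's startup script, before the server is exposed to clients, would keep the slow first request from hitting the 60-second limit. It does not make the initialisation itself any faster.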

Thanks so much for your help!

Jo-Jo

Mar 29, 2021, 6:02:34 PM

to RDF4J Users

Relatedly, I'd also like to note that when loading around 600 MB of data into a `native-rdfs` store with default options, I get an OOM error even with 16 GB of memory (-Xmx16g)! What is taking up so much memory? Any idea how to resolve this?

hmott...@gmail.com

Apr 7, 2021, 1:04:15 PM

to RDF4J Users

Try using IsolationLevel.NONE (`IsolationLevels.NONE` in the Java API) when loading the data.

As for debugging the inferencer, I have opened an issue, and you are very welcome to contribute :) https://github.com/eclipse/rdf4j/issues/2947

I've also created an issue to improve performance for sails that use persistence: https://github.com/eclipse/rdf4j/issues/2974