Multiple Fuseki?


Jonas Waeber

Apr 18, 2018, 3:35:25 AM4/18/18
to Skosmos Users
Hi Osma

Would it be possible to store the various vocabularies in different Fuseki triple stores?

The speed of Fuseki is currently the biggest obstacle to expanding the number of concepts/triples accessible through Skosmos.

Would it be possible to expand Skosmos to access multiple stores?
Would it be possible to abstract multiple Fuseki servers as a single server?

Best regards

Jonas

Osma Suominen

Apr 18, 2018, 6:44:27 AM4/18/18
to skosmo...@googlegroups.com
Hi Jonas!

I see you are really trying to push the limits of Fuseki and Skosmos :)

You can configure each vocabulary in vocabularies.ttl to use a different
SPARQL endpoint (and of course different named graph). This works
already. The only thing you'll lose is the global search, which
currently relies on having a single endpoint where all the vocabularies
are accessible.
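For illustration, a per-vocabulary endpoint entry in vocabularies.ttl might look roughly like this (the vocabulary id, title, graph and endpoint URLs are all placeholders; `skosmos:sparqlEndpoint` and `skosmos:sparqlGraph` are the relevant properties, assuming a Skosmos-1.x-style configuration):

```turtle
@prefix skosmos: <http://purl.org/net/skosmos#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix : <#> .

:myvocab a skosmos:Vocabulary, void:Dataset ;
    dc:title "My Vocabulary"@en ;
    void:uriSpace "http://example.org/myvocab/" ;
    skosmos:sparqlGraph <http://example.org/myvocab/> ;
    # per-vocabulary endpoint: this vocabulary lives in its own Fuseki
    skosmos:sparqlEndpoint <http://localhost:3031/myvocab/sparql> .
```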

But just having several Fusekis running on the same server probably
won't help. Fuseki is already quite good at running queries in parallel
on multiple CPUs.

What I would suggest is:

- check that the machine has enough RAM (both for Fuseki and for OS disk
caches, perhaps split 50/50)
- check that your disks are fast enough (i.e. SSD)
- make sure you use an HTTP cache in front of Fuseki with a long TTL (I
guess your data sets are fairly static) - see the InstallTutorial on how
to configure Varnish, or you could use nginx too if you like
- rebuild the TDB once in a while - it will grow over time if you do
updates and eat your disk space, and disk caching will become less
efficient since a smaller proportion fits into RAM
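On the caching point, a minimal nginx sketch of an HTTP cache in front of Fuseki could look like this (all ports, paths and zone names are examples, not a tested configuration; note that only GET queries are cached here, since POSTed SPARQL queries bypass a method-based cache):

```nginx
# Cache GET SPARQL results with a long TTL for fairly static data.
proxy_cache_path /var/cache/nginx/fuseki keys_zone=fuseki:10m
                 max_size=2g inactive=7d;

server {
    listen 8080;
    location /ds/sparql {
        proxy_pass http://localhost:3030/ds/sparql;
        proxy_cache fuseki;
        proxy_cache_methods GET HEAD;   # POSTed queries go straight through
        proxy_cache_valid 200 7d;       # long TTL, as suggested above
        proxy_ignore_headers Cache-Control Expires;
    }
}
```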

If none of the above helps, think about whether it would be possible to
have more CPU cores available for Fuseki. I think the ideal setup for
you could be a single machine with lots of CPU cores - perhaps 16 or more.

If you can't have that, then run Fuseki on multiple servers, one Fuseki
per server, and spread out your vocabularies. If you still want global
search to function, you will need to have one default Fuseki endpoint
(set in config.inc) that has all the vocabularies in named graphs.
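The default endpoint setting in config.inc would then be something along these lines (the constant name follows the Skosmos 1.x config.inc convention as I recall it; the URL is a placeholder):

```php
<?php
// Default endpoint used for global search across all vocabularies.
// Vocabularies without their own skosmos:sparqlEndpoint also use this.
define("DEFAULT_ENDPOINT", "http://fuseki-main.example.org:3030/ds/sparql");
```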

You can also ask the Jena users list for help in setting up a big Fuseki
installation, there are people on the list who know a lot more about
these setups than I do.

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Jonas Waeber

Apr 19, 2018, 2:30:58 AM4/19/18
to Skosmos Users
Hi Osma

Thanks for your detailed answer.


I see you are really trying to push the limits of Fuseki and Skosmos :)


Yes, it is an interesting endeavor. Unfortunately it will most likely stay theoretical, as I am only staying on the project for two more months and our financial resources are limited.

 
You can configure each vocabulary in vocabularies.ttl to use a different
SPARQL endpoint (and of course different named graph). This works
already. The only thing you'll lose is the global search, which
currently relies on having a single endpoint where all the vocabularies
are accessible.

Then this is not an option, as the global search is the core feature of Bartoc Skosmos, even though it is currently way too slow to be very useful.

But just having several Fusekis running on the same server probably
won't help. Fuseki is already quite good at running queries in parallel
on multiple CPUs.

What I would suggest is:

- check that the machine has enough RAM (both for Fuseki and for OS disk
caches, perhaps split 50/50)

Probably the biggest problem. Everything is currently running on a single 16 GB machine (Skosmos, Fuseki, Varnish, upload routines). It would probably make more sense to run Skosmos & Varnish on one server and Fuseki on another. Fuseki reserves 8 GB, which leaves 8 GB for the rest, and Fuseki runs out of RAM from time to time.
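For reference, the Fuseki launcher script honors the JVM_ARGS environment variable, so the heap reservation described above can be made explicit at startup. A sketch, with placeholder paths and sizes (on a 16 GB machine a fixed 6-8 GB heap is a common starting point, leaving the rest to the OS page cache that TDB relies on):

```shell
# Fixed heap for Fuseki; the remaining RAM serves as OS disk cache for TDB.
# Paths, dataset name and sizes are examples only.
JVM_ARGS="-Xmx8G -Xms8G" /opt/fuseki/fuseki-server \
    --loc=/var/fuseki/databases/ds /ds
```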
 
- check that your disks are fast enough (i.e. SSD)

Done.
- make sure you use an HTTP cache in front of Fuseki with a long TTL (I
guess your data sets are fairly static) - see the InstallTutorial on how
to configure Varnish, or you could use nginx too if you like

Done.
- rebuild the TDB once in a while - it will grow over time if you do
updates and eat your disk space, and disk caching will become less
efficient since a smaller proportion fits into RAM

Does this happen when reloading vocabularies with a PUT request to Fuseki? Or would I have to actually delete the index files and then load all the vocabularies from scratch?
The issue here is that downloading, processing and uploading now takes more than a day for all vocabularies, and uses all the RAM for large vocabularies.
 
If none of the above helps, think about whether it would be possible to
have more CPU cores available for Fuseki. I think the ideal setup for
you could be a single machine with lots of CPU cores - perhaps 16 or more.

This is definitely a bottleneck then. The current server has only 2 cores... I will have to see if this can be changed.
 

If you can't have that, then run Fuseki on multiple servers, one Fuseki
per server, and spread out your vocabularies. If you still want global
search to function, you will need to have one default Fuseki endpoint
(set in config.inc) that has all the vocabularies in named graphs.

You can also ask the Jena users list for help in setting up a big Fuseki
installation, there are people on the list who know a lot more about
these setups than I do.

That would be interesting to set up. I will see if I have the time to get into this.

Best regards,

Jonas

Osma Suominen

Apr 19, 2018, 6:55:15 AM4/19/18
to skosmo...@googlegroups.com
Hi Jonas!

Jonas Waeber wrote on 19.04.2018 at 09:30:
> Yes, it is an interesting endeavor. Unfortunately it will most likely
> stay theoretical, as I am only staying on the project for two more months
> and our financial resources are limited.

Oh, what a pity. What is the future of the Bartoc Skosmos installation then? Are you (as an institution) planning to continue maintaining it?

> Then this is not an option as the global search is the core feature of
> Bartoc Skosmos, even though it is currently way too slow to be very useful.

Yes, global search is a big challenge. We are not happy with how it
works in Finto.fi either, and it's much smaller than your installation.

> Probably the biggest problem. Everything is currently running on a 16 GB
> machine. (e.g. Skosmos, Fuseki, Varnish, Upload Routines). It would
> probably make more sense to run Skosmos & Varnish on a Server and Fuseki
> on another. Fuseki reserves 8 GB, which leaves 8 GB for the rest. Fuseki
> runs out of RAM from time to time.

16 GB isn't very much for what you are doing. I would recommend
increasing this, at least double the amount would be good.

> Does this happen when reloading vocabularies with PUT request to the
> Fuseki? Or would I have to actually delete the index files and then load
> all the vocabularies from scratch.

Unfortunately TDB tends to grow each time you update the data, whether
using PUT or POST or SPARQL updates. Even if the number of triples stays
the same, the size on disk tends to grow. With TDB1 the only way to fix
this is to start over with an empty database. With the new TDB2 there is
also a "compact" operation which will essentially rebuild the database,
but you will have to take down Fuseki for the operation. Even then you
will end up with two copies of the database - old and new - and you will
have to manually delete the old database. So it doesn't help much.
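For the TDB2 case, the compact operation is a command-line tool in the Jena distribution; a sketch with a placeholder database path (Fuseki must be stopped first, as noted above):

```shell
# TDB2 only: rebuild the database into a new "Data-NNNN" generation
# next to the old one. The old generation must be deleted by hand
# afterwards to actually reclaim the disk space. Path is a placeholder.
tdb2.tdbcompact --loc=/var/fuseki/databases/ds
```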

> The issue here is that downloading, processing and uploading now takes
> more than a day for all vocabularies, and uses all the RAM for large
> vocabularies.

Maybe the TDB could be built on another machine? It's just a bunch of
files in a directory, so easy to transfer or to use some kind of shared
filesystem.
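A sketch of that build-elsewhere approach, with placeholder paths, hostnames and filenames (tdbloader is the TDB1 bulk loader; TDB2 has a corresponding tdb2.tdbloader):

```shell
# Bulk-load the TDB on a separate build machine, then ship the resulting
# directory to the Fuseki server. All paths and hosts are examples.
tdbloader --loc=/tmp/tdb-build vocab1.ttl vocab2.ttl
rsync -a --delete /tmp/tdb-build/ fuseki-host:/var/fuseki/databases/ds/
# Stop Fuseki before swapping in the new database, then restart it.
```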

> This is definitely a bottleneck then. The current server has only 2
> cores... Will have to see if this can be changed.

Sounds like adding more cores (and RAM) would be the easiest way of
improving the situation!

For the record, Finto.fi currently has 4 (virtual) CPU cores, 16GB RAM
and a fast SAN disk (at least partly SSD backed). We are currently
setting up new servers with 4 cores and 32GB RAM.

-Osma