fedora cluster performance


P_ire

Oct 22, 2015, 5:39:21 AM
to Fedora Community
Hello,

We are noticing terrible performance when using a Fedora 4.2 cluster with ActiveFedora 9.3.0 / Hydra. (I attempted to test a Fedora 4.4 cluster but came up against this bug: https://jira.duraspace.org/browse/FCREPO-1739. Is this bug being worked on at the moment?)

I believe I have Fedora configured correctly, but I want to ask whether there is something I am forgetting. When I swap the cluster out for a single unclustered Fedora instance, everything works with the usual performance, but with the cluster it is 10-20 times slower, taking 1-2 minutes for simple actions such as creating a collection.

When I interact directly with the Fedora REST API it seems to behave normally. Has anyone else had issues like this? Could this be an issue with ActiveFedora? And is the issue with Fedora 4.4 clustering being resolved?
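
As a rough way to compare, one can time a container creation directly against the REST API and then perform the equivalent action through the Hydra head. A minimal sketch, assuming curl is available; the hostname, port and context path below are placeholders for one of the clustered nodes:

# time a direct container creation against the cluster's REST endpoint
# (http://localhost:8080/fcrepo/rest is a placeholder; adjust to your deployment)
curl -s -o /dev/null -w "POST took %{time_total}s\n" \
     -X POST http://localhost:8080/fcrepo/rest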

Here are the configuration files:

JAVA_OPTS:

JAVA_OPTS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms2048m -Xmx4096m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -Dfcrepo.home=/opt/fedora/data -Dfcrepo.modeshape.configuration=file:///opt/fedora/etc/modeshape.json -Djava.net.preferIPv4Stack=true -Djgroups.udp.mcast_addr=239.42.42.42 -Dfcrepo.ispn.jgroups.configuration=/opt/fedora/etc/jgroups-fcrepo-tcp.xml -Dfcrepo.ispn.configuration=/opt/fedora/etc/infinispan.xml"


jgroups-fcrepo-tcp.xml:

<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
    <TCP bind_port="7800"
        loopback_separate_thread="true"
        loopback="true"
        recv_buf_size="${tcp.recv_buf_size:5M}"
        send_buf_size="${tcp.send_buf_size:640K}"
        max_bundle_size="64K"
        max_bundle_timeout="30"
        use_send_queues="true"
        sock_conn_timeout="300"
        timer_type="new3"
        timer.min_threads="4"
        timer.max_threads="10"
        timer.keep_alive_time="3000"
        timer.queue_max_size="500"
        thread_pool.enabled="true"
        thread_pool.min_threads="1"
        thread_pool.max_threads="10"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="true"
        thread_pool.queue_max_size="10000"
        thread_pool.rejection_policy="discard"
        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="1"
        oob_thread_pool.max_threads="8"
        oob_thread_pool.keep_alive_time="5000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="discard"/>
    <MPING timeout="1000"
        num_initial_members="1"/>
    <MERGE2 max_interval="30000"
        min_interval="10000"/>
    <FD_ALL timeout="150000"/>
    <VERIFY_SUSPECT timeout="150000" />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
        discard_delivered_msgs="true"/>
    <UNICAST timeout="600,900,2500"/>
    <pbcast.STABLE stability_delay="2000" desired_avg_gossip="50000"
        max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="6000"
        view_bundling="true"/>
    <MFC max_credits="2M"
        min_threshold="0.4"/> 
    <FRAG2 frag_size="60K" />
    <pbcast.STATE_TRANSFER />
</config>



modeshape.json (ModeShape repository configuration):


{
    "name" : "repo",
    "jndiName" : "",
    "workspaces" : {
        "predefined" : ["default"],
        "default" : "default",
        "allowCreation" : true
    },
    "storage" : {
        "cacheName" : "FedoraRepository",
        "cacheConfiguration" : "/opt/fedora/etc/infinispan.xml",
        "binaryStorage" : {
            "type" : "cache",
            "dataCacheName" : "FedoraRepositoryBinaryData",
            "metadataCacheName" : "FedoraRepositoryMetaData"
        }
    },
    "security" : {
        "anonymous" : {
            "roles" : ["readonly","readwrite","admin"],
            "useOnFailedLogin" : false
        },
        "providers" : [
            { "classname" : "org.fcrepo.auth.common.BypassSecurityServletAuthenticationProvider" }
        ]
    },
    "node-types" : ["fedora-node-types.cnd"]
}


infinispan.xml:


<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:6.0 http://www.infinispan.org/schemas/infinispan-config-6.0.xsd"
            xmlns="urn:infinispan:config:6.0">

  <global>

    <globalJmxStatistics enabled="true" allowDuplicateDomains="true"/>

    <!-- Defines the global settings shared by all caches -->
    <transport clusterName="test-cluster">
      <properties>
        <property name="configurationFile" value="/opt/fedora/etc/jgroups-fcrepo-tcp.xml"/>
      </properties>
    </transport>
  </global>

  <default>
    <!--
       Defines the default behavior for all caches, including those created dynamically (e.g., when a
       repository uses a cache that doesn't exist in this configuration).
     -->
    <clustering mode="distribution">
      <sync/>
      <l1 enabled="false" lifespan="0" onRehash="false"/>
      <hash numOwners="${fcrepo.ispn.numOwners:2}"/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>
  </default>

  <namedCache name="FedoraRepository">
    <!--
        Our Infinispan cache needs to be transactional. However, we'll also configure it to
        use pessimistic locking, which is required whenever applications will be concurrently
        updating nodes within the same process. If you're not sure, use pessimistic locking.
     -->
    <clustering mode="replication">
      <sync/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>

    <locking concurrencyLevel="1000" lockAcquisitionTimeout="15000" useLockStriping="false" />

    <deadlockDetection enabled="true" spinDuration="1000"/>


    <eviction maxEntries="500"  strategy="LIRS" threadPolicy="DEFAULT"/>

    <transaction
        transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
        transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
    <!--
        Define the cache loaders (i.e., cache stores). Passivation is false because we want *all*
        data to be persisted, not just what doesn't fit into memory. Shared is false because there
        are no other caches sharing this file store. We set preload to false for lazy loading;
        may be improved by preloading and configuring eviction.

        We can have multiple cache loaders, which get chained. But we'll define just one.
     -->
    <persistence passivation="false">
      <singleFile shared="false"
                  preload="false"
                  fetchPersistentState="true"
                  purgeOnStartup="false"
                  location="${fcrepo.ispn.repo.cache:target/FedoraRepository/storage}"/>
    </persistence>
  </namedCache>

  <namedCache name="FedoraRepositoryMetaData">
    <!--
        Our Infinispan cache needs to be transactional. However, we'll also configure it to
        use pessimistic locking, which is required whenever applications will be concurrently
        updating nodes within the same process. If you're not sure, use pessimistic locking.
     -->
    <clustering mode="replication">
      <sync/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>

    <locking concurrencyLevel="1000" lockAcquisitionTimeout="15000" useLockStriping="false" />

    <deadlockDetection enabled="true" spinDuration="1000"/>


    <eviction maxEntries="500"  strategy="LIRS" threadPolicy="DEFAULT"/>

    <transaction
            transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
            transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
    <!--
        Define the cache loaders (i.e., cache stores). Passivation is false because we want *all*
        data to be persisted, not just what doesn't fit into memory. Shared is false because there
        are no other caches sharing this file store. We set preload to false for lazy loading;
        may be improved by preloading and configuring eviction.

        We can have multiple cache loaders, which get chained. But we'll define just one.
     -->
    <persistence passivation="false">
      <singleFile shared="false"
                  preload="false"
                  fetchPersistentState="true"
                  purgeOnStartup="false"
                  location="${fcrepo.ispn.cache:target/FedoraRepositoryMetaData/storage}"/>
    </persistence>
  </namedCache>

  <namedCache name="FedoraRepositoryBinaryData">
    <!--
        Our Infinispan cache needs to be transactional. However, we'll also configure it to
        use pessimistic locking, which is required whenever applications will be concurrently
        updating nodes within the same process. If you're not sure, use pessimistic locking.
     -->
    <clustering mode="distribution">
      <sync replTimeout="${fcrepo.ispn.replication.timeout:10000}" />
      <l1 enabled="false" lifespan="0" onRehash="false"/>
      <hash numOwners="${fcrepo.ispn.numOwners:2}" numSegments="40"/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>

    <locking concurrencyLevel="1000" lockAcquisitionTimeout="15000" useLockStriping="false" />

    <deadlockDetection enabled="true" spinDuration="1000"/>


    <eviction maxEntries="100"  strategy="LIRS" threadPolicy="DEFAULT"/>

    <transaction
            transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
            transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
    <!--
        Define the cache loaders (i.e., cache stores). Passivation is false because we want *all*
        data to be persisted, not just what doesn't fit into memory. Shared is false because there
        are no other caches sharing this file store. We set preload to false for lazy loading;
        may be improved by preloading and configuring eviction.

        We can have multiple cache loaders, which get chained. But we'll define just one.
     -->
    <persistence passivation="false">
      <singleFile shared="false"
                  preload="false"
                  fetchPersistentState="true"
                  purgeOnStartup="false"
                  location="${fcrepo.ispn.binary.cache:target/FedoraRepositoryBinaryData/storage}"/>
    </persistence>
  </namedCache>
</infinispan>


Andrew Woods

Oct 22, 2015, 8:27:14 AM
to P_ire, Fedora Community
Hello P_ire,
What you are seeing is in line with the testing that others in the community have documented:
Test plan:

Test results:

Older, but similar results:

The basic clustering that has been demonstrated in Fedora4 is well suited for the high-availability, read-heavy, replicated use cases. In the current replicated mode [1], ingest requests block until a new resource has been replicated across every node in the cluster; hence the performance hit on collection creation. There is an alternative configuration, distributed mode [2], where you define n-copies of a resource across your x-number of clustered servers. It would be very informative to hear back on your experiences with the distributed mode.
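
For reference, in the infinispan.xml you posted the two modes differ only in the <clustering> element of each named cache. A minimal sketch of the two variants, taken from your own configuration (the numOwners value is illustrative):

    <!-- replicated: every node holds a full copy; writes block until all nodes have it -->
    <clustering mode="replication">
      <sync/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>

    <!-- distributed: each entry is stored on numOwners nodes only -->
    <clustering mode="distribution">
      <sync/>
      <l1 enabled="false" lifespan="0" onRehash="false"/>
      <hash numOwners="${fcrepo.ispn.numOwners:2}"/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>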

That raises the question, however, of the exact nature of your use case. What are you hoping to achieve with clustering? What scenario(s) are you intending to address with a clustered Fedora4?

Regarding https://jira.duraspace.org/browse/FCREPO-1739, I believe investigation of that ticket has only recently begun, by a developer who is interested in helping the community fix high-priority bugs but who is not a stakeholder in the clustering feature. It would be great if a clustering stakeholder like yourself added a comment to that ticket to coordinate with the in-progress work.

Regards,
Andrew


P_ire

Oct 22, 2015, 8:56:10 AM
to Fedora Community, tierna...@gmail.com
Hi,

Thank you for the quick reply.

Our use case for now is to replace a single Fedora instance to improve high availability while handling the same traffic (reads and writes). We have issues with a single Fedora instance whereby it falls over (eats up all available memory and dies), so we hoped a clustered Fedora would be the solution. In future we hope to separate ingest and access, and so could have Fedora instance(s)/clusters suited to reads (replicated mode) and others suited to writes (distributed mode).

I'll have a look at the documentation you have listed, deploy a distributed cluster, and report back.

P_ire

Oct 22, 2015, 11:46:32 AM
to Fedora Community, tierna...@gmail.com
Hi,

Can you confirm that the change required to configure distributed clustering is to edit infinispan.xml as follows? The commented-out block is the previous replication configuration, and the block after it is the replacement (see also the full cache sketch at the end of this message):

<!--
    <clustering mode="replication">
      <sync/>
      <stateTransfer chunkSize="100" fetchInMemoryState="true"/>
    </clustering>
-->

<clustering mode="distribution">
  <sync/>
  <l1 enabled="false" lifespan="0" onRehash="false"/>
  <hash numOwners="1"/>
  <stateTransfer fetchInMemoryState="true"/>
</clustering>
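
For clarity, with that change applied the FedoraRepository namedCache would read roughly as below. This is a sketch based on the infinispan.xml posted earlier; I am assuming the FedoraRepositoryMetaData cache needs the same change, while FedoraRepositoryBinaryData already uses distribution mode:

  <namedCache name="FedoraRepository">
    <!-- distribution instead of replication; numOwners="1" keeps a single copy per entry -->
    <clustering mode="distribution">
      <sync/>
      <l1 enabled="false" lifespan="0" onRehash="false"/>
      <hash numOwners="1"/>
      <stateTransfer fetchInMemoryState="true"/>
    </clustering>
    <locking concurrencyLevel="1000" lockAcquisitionTimeout="15000" useLockStriping="false" />
    <deadlockDetection enabled="true" spinDuration="1000"/>
    <eviction maxEntries="500" strategy="LIRS" threadPolicy="DEFAULT"/>
    <transaction
        transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
        transactionMode="TRANSACTIONAL" lockingMode="PESSIMISTIC"/>
    <persistence passivation="false">
      <singleFile shared="false"
                  preload="false"
                  fetchPersistentState="true"
                  purgeOnStartup="false"
                  location="${fcrepo.ispn.repo.cache:target/FedoraRepository/storage}"/>
    </persistence>
  </namedCache>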

P_ire

Oct 27, 2015, 8:44:40 AM
to Fedora Community, tierna...@gmail.com
I think I am now using distributed clustering; it says "DIST_SYNC" on the REST API page. But performance is still the same, taking about a minute to create a collection.
Is this expected performance? It seems excessively slower than a single instance.