Anybody clustering out there?

Daniel Lamb

Oct 29, 2015, 12:08:26 PM

to fedor...@googlegroups.com

All,

Some of us in the community are starting to take deeper looks at the clustering capabilities of Fedora. There was a meeting earlier this week about it, and pursuant to this, I’d like to solicit the community for any information people have on the subject.

Has anyone out there attempted this yet? Do you have any results or issues that you’d like to share? What was the use case you were trying to solve?

If you haven’t attempted it yet, but are considering it, what’s your use case? Are you interested in high availability, sharding for distributed storage, or some mix of the two? Is there anything in particular you would like to see addressed?

Anything you could share will help us move this topic forward, and will hopefully result in some sort of community knowledge base on the subject, since it’s likely to be a shared concern for many of the people and organizations using Fedora.

Thanks,

Danny

Alexander

Oct 29, 2015, 12:27:31 PM

to Fedora Tech

I'm trying to build a two-node cluster now, for availability purposes. The goal is to have a standby node with the data in case something bad happens to the main node. I want full Infinispan replication, which seems good for my use case. That means I'll have two equal nodes, and I can use them to distribute load as well, if I want to.

I'm working with a Sufia installation, which (I think) uses Fedora 4.1. I started with Fedora 4.2, which for a while was the latest version that supported clustering; it looks like a bug has now been fixed in the newest version, but I haven't tried it yet.

I'm installing Fedora on a fresh AWS EC2 machine (Ubuntu OS). Here are some bash scripts that help me with the installation:

f-install.sh

f-build1.sh

f-build2.sh

f-run1.sh

f-run2.sh

Stefano Cossu

Oct 29, 2015, 12:29:41 PM

to fedor...@googlegroups.com

Hi Danny,
At the AIC we are planning to start with a single 16-core, 16 GB RAM Fedora server and a replication server for backup. As we assess bottlenecks during the beta phase, we might consider clustering if Fedora turns out to be one of them.

Clustering would be most desirable for improving write performance and reads of large binary files. We want to use Fedora as little as possible for reading metadata; for that we will rely on the triplestore and Solr indexes.

Hope this helps,
Stefano

--
You received this message because you are subscribed to the Google Groups "Fedora Tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fedora-tech...@googlegroups.com.
To post to this group, send email to fedor...@googlegroups.com.
Visit this group at http://groups.google.com/group/fedora-tech.
For more options, visit https://groups.google.com/d/optout.

--

Stefano Cossu
Director of Application Services, Collections

The Art Institute of Chicago
116 S. Michigan Ave.
Chicago, IL 60603
312-499-4026

DRI-P

Oct 29, 2015, 1:38:54 PM

to Fedora Tech

Good afternoon,

I have been trying out clustering with Fedora 4.2. I have created an Ansible deployment, if anyone is interested, at https://github.com/Digital-Repository-of-Ireland/ansible-fedora4

Our use case is to add failover to our existing single Fedora instance. Our usage requires acceptable write performance.

On testing with our Hydra stack, I am noticing terrible write performance: creating a collection takes 10 times longer to complete. I'm wondering if I have the configuration correct (I think I am using distributed replication, which is recommended for best write performance). Should I try Fedora 4.2 (assuming the clustering bug is fixed)?

I have posted my configuration here previously: https://groups.google.com/forum/#!topic/fedora-community/pnzkD470Ab0

Thanks

Andrew Woods

Oct 29, 2015, 4:34:57 PM

to fedor...@googlegroups.com

Thanks, Danny (and all), for raising this topic.

It would appear that the Fedora community is approaching a critical mass of interest in, and testing of, clustering. At the same time, there are a variety of questions that would potentially benefit from the attention of like-minded stakeholders.

I could see value in documenting the:

- specific scenarios clustering-stakeholders respectively aim to address with the feature,

- specific installation environments and configurations, and

- roadblocks, issues and successes that are being met in the process.

As it stands right now, the clustering investigations seem to be occurring in isolation. Is there room for an informal "clustering tiger team" [1] to collectively work through common problems?

For example, there is a wealth of untapped configurations that could likely prove very effective:

- https://github.com/infinispan/infinispan/wiki/RAC:-Reliable-Asynchronous-Clustering

- https://github.com/infinispan/infinispan/wiki/Handling-cluster-partitions

- https://github.com/infinispan/infinispan/wiki/Clustered-listeners

- https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Replication

- https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-Infinispan

- etc, etc, etc

What is needed, however, is a team of stakeholders with actual use cases against which to explore such opportunities.

Thoughts?

Andrew

[1] https://en.wikipedia.org/wiki/Tiger_team

Benjamin Pennell

Oct 30, 2015, 11:15:39 AM

to Fedora Tech

We're also now starting to look into clustering for replication purposes, largely to wrap our heads around how it would replace our current usage of iRODS.

It looks like fcrepo's master has a significantly different version of Infinispan than fcrepo 4.4 (6.06 => 7.2.3), which has some definite implications for configuration. Would you recommend using master instead of the current release as a starting point for investigating clustering, for repositories that aren't anywhere near production? I'm actually having some difficulty with the clustering examples on the wiki and on GitHub, since there seem to be properties in the example configurations that ModeShape is rejecting.

Is clustering going to be covered at all at the Fedora camp in Durham? It is on the proposed schedule but has a strikethrough.

DRI-P

Oct 30, 2015, 11:33:03 AM

to Fedora Tech

I have also struggled to find the correct clustering configurations; I've had to resort to trial and error until it works. A standard configuration on the wiki would be useful.

Andrew Woods

Oct 30, 2015, 1:12:31 PM

to fedor...@googlegroups.com

Hello Ben,

If you are starting to investigate clustering now, I would recommend starting with the master branch of fcrepo4. As you mentioned, recent updates that followed the 4.4.0 release include an Infinispan upgrade that impacts its configuration file. The examples in the codebase should allow you to get a cluster up, but your help translating that into the wiki documentation would be a valued contribution:

https://wiki.duraspace.org/display/FEDORA4x/Deploying+a+Fedora+Cluster

As asked in my previous email on this thread, "Would it be helpful to establish a coordinated, collaborative community effort around clustering"?

Regarding the upcoming Fedora camp in Durham, it is not anticipated that clustering will be covered.

Regards,

Andrew

Alexander Mikhailov

Oct 30, 2015, 1:19:12 PM

to fedor...@googlegroups.com

I think we could pick a few particularly popular use cases (not too many) and cover them thoroughly, from the bottom up, for complete Fedora beginners. No external knowledge assumed (about Infinispan, ModeShape, JGroups, LevelDB...).

It would also be great to have those typical configurations implemented as automated tests.

I can use the scripts I mentioned above for nearly automatic deployment of a cluster. I'd like to have something like that integrated into the Fedora test suite, but I'm missing several things. How could those scripts run in a Maven environment (right now they are bash scripts)? How can a cluster be tested automatically after launching?

Daniel Lamb

Oct 30, 2015, 1:22:50 PM

to fedor...@googlegroups.com

Alexander,

You could use mvn to call ant tasks, which would really be just wrappers for your bash scripts.

I’m sure there are other options, but that’s the first thing that comes to mind.

~Danny

Andrew Woods

Oct 30, 2015, 1:32:21 PM

to fedor...@googlegroups.com

Hello Alexander,

Establishing a small set of clustering use cases, surrounding documentation, deployment scripts, and automated testing sounds like an excellent objective. Maven is likely not the ideal tool for spinning up a cluster and associated tests; however, a Vagrant box or even P_ire's Ansible scripts [1] could be used for the purpose of spinning up a consistent environment. We would want to talk more about what the "test scripts" would be exercising, but they could either exist within the scripted environment or as a separate tool.

If you would be willing to start a list of suggested use cases, I have created a wiki location for collecting them [2]. This page can also be used for collecting ideas around successful configurations, tools, environments, etc.

Thanks,

Andrew

[1] https://github.com/Digital-Repository-of-Ireland/ansible-fedora4

[2] https://wiki.duraspace.org/display/FF/Design+-+Clustering+Recipes

Alexander Mikhailov

Oct 30, 2015, 1:33:33 PM

to fedor...@googlegroups.com

Daniel, I think that from inside a Maven project I would have a much better ability to tailor config files (which is most of the work those scripts do) and, in addition, to check the results. I'd really rather not have several levels of wrappers (Maven over Ant over bash over string manipulators like sed and awk) instead of straightforward Java with readily available XML parsers (plus, again, checking that the cluster actually passes tests after launching).
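
As an illustration of the XML-parser approach, a minimal Java sketch that edits a config with the JDK's built-in DOM and Transformer APIs instead of sed/awk. The XML fragment, element names, and addresses in `main` are hypothetical stand-ins loosely modeled on a JGroups TCP stack, not a copy of any Fedora configuration file, and the `setAttribute` helper is illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: tailor an XML config with a real parser instead of sed/awk.
class ConfigTailor {

    /** Returns a copy of `xml` with attribute `attr` set to `value` on every `tag` element. */
    static String setAttribute(String xml, String tag, String attr, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setAttribute(attr, value);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {  // parser, I/O and transformer errors
            throw new IllegalArgumentException("could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, simplified JGroups-style fragment; not an actual Fedora config file.
        String template = "<config><TCP bind_addr=\"127.0.0.1\" bind_port=\"7800\"/>"
                + "<TCPPING initial_hosts=\"localhost[7800]\"/></config>";
        // Per-node values for a hypothetical two-host cluster.
        String node1 = setAttribute(template, "TCP", "bind_addr", "10.0.0.11");
        node1 = setAttribute(node1, "TCPPING", "initial_hosts", "10.0.0.11[7800],10.0.0.12[7800]");
        System.out.println(node1);
    }
}
```

The same template can be rendered once per node, and a test can parse the result back to check it, which is harder to do reliably with sed.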

What I need is an understanding of the project's "traditions" (right now I create directories f1 and f2 in /home/ubuntu, the default user's home directory on a fresh AWS EC2 instance; does Fedora have other, more conventional ways to install two instances on the same machine?) and an API to check that the cluster is working (what is a good set of tests for a cluster?).
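
One possible shape for such a check, sketched under assumptions: Java 11+ (`java.net.http`), Fedora 4 REST behavior where a `PUT` with no body creates a container (201 Created) and a `GET` on it returns 200, and two hypothetical node URLs (for example the f1 and f2 instances on ports 8080 and 8081). It writes a fresh resource through one node and reads it back through the other; the class and method names are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

// Sketch of a two-node smoke test: write through node A, read back through node B.
class ClusterSmokeTest {

    /** Joins a REST base URL and a resource path without doubling or dropping slashes. */
    static URI resourceUri(String restBase, String path) {
        String base = restBase.endsWith("/") ? restBase.substring(0, restBase.length() - 1) : restBase;
        String p = path.startsWith("/") ? path.substring(1) : path;
        return URI.create(base + "/" + p);
    }

    /** Passes only if the write was accepted and the other node can see the resource. */
    static boolean passed(int putStatus, int getStatus) {
        return putStatus == 201 && getStatus == 200;
    }

    static boolean run(String nodeA, String nodeB) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String path = "cluster-smoke-" + UUID.randomUUID();  // fresh resource for every run

        HttpRequest put = HttpRequest.newBuilder(resourceUri(nodeA, path))
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
        int putStatus = http.send(put, HttpResponse.BodyHandlers.discarding()).statusCode();

        HttpRequest get = HttpRequest.newBuilder(resourceUri(nodeB, path)).GET().build();
        int getStatus = http.send(get, HttpResponse.BodyHandlers.discarding()).statusCode();

        System.out.printf("PUT %s -> %d, GET %s -> %d%n",
                resourceUri(nodeA, path), putStatus, resourceUri(nodeB, path), getStatus);
        return passed(putStatus, getStatus);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical node URLs; pass real ones as arguments.
        String nodeA = args.length > 0 ? args[0] : "http://localhost:8080/fedora/rest";
        String nodeB = args.length > 1 ? args[1] : "http://localhost:8081/fedora/rest";
        System.out.println(run(nodeA, nodeB) ? "cluster OK" : "cluster check FAILED");
    }
}
```

A read issued immediately after a write may race with replication in some configurations, so a real test might retry the `GET` a few times before declaring failure.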

Alexander Mikhailov

Nov 2, 2015, 12:00:33 PM

to fedor...@googlegroups.com

Is there a good way to observe the clustering process between several hosts?

I'm having trouble getting a couple of nodes to merge into a cluster. The nodes seem to work fine when they are on the same physical machine (but in different JVMs); however, when I put them on different (virtual) machines, they stop exchanging updates.

I'd like to watch the nodes find each other and join the cluster; then I think I can figure out where the process fails.
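
When nodes cluster on one machine but not across machines, one thing worth ruling out before digging into discovery logs is plain network reachability between the hosts (firewalls, EC2 security groups, a transport bound to localhost). A minimal Java sketch of that check; the peer addresses and port 7800 are placeholders, so substitute whatever your JGroups configuration actually uses. It says nothing about whether discovery itself works, and it does not cover UDP multicast.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: check whether a TCP port on another host accepts connections.
// Rules out basic network problems only; it does not exercise JGroups discovery.
class PortCheck {

    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {  // refused, timed out, or unknown host
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical peer addresses and port; use the values from your own cluster config.
        String[] peers = args.length > 0 ? args : new String[] {"10.0.0.11", "10.0.0.12"};
        int port = 7800;
        for (String peer : peers) {
            System.out.printf("%s:%d reachable: %b%n", peer, port, reachable(peer, port, 2000));
        }
    }
}
```

Running it from each node against the other is the useful direction: a check that passes only from one side usually points at a firewall or bind-address problem on the other.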

DRI-P

Nov 11, 2015, 10:51:49 AM

to Fedora Tech

Hi again,

I have updated the ansible-fedora4 deployment to use fcrepo version 4.4.1 snapshot. Here is the link:

https://github.com/Digital-Repository-of-Ireland/ansible-fedora4

- p

Andrew Woods

Nov 12, 2015, 6:30:25 PM

to fedor...@googlegroups.com

Hello All,

As an update, I have pulled the salient points from this thread into the following document:

https://wiki.duraspace.org/display/FF/Design+-+Clustering+Recipes

There appears to be common interest in Fedora clustering for "high availability and/or fail-over". Concerns that have been raised relate to documentation and write performance. On that page are also successful scripts and configuration opportunities to explore.

We can continue to collect use cases, tooling, and experiences on this page, and potentially decide to work as a team towards specific and progressing goals. That effort will need to spring from within the community, but will certainly be met with project support.

Regards,

Andrew

DRI-P

Nov 13, 2015, 10:01:02 AM

to Fedora Tech

Update:

I have been testing with fcrepo 4.4.1 snapshot, and I think I'm noticing greatly improved performance over fcrepo 4.2. As a single instance it seems faster. Clustered, I'm less sure what I'm running (I assume I'm using distributed clustering, but I'm not sure: the landing REST page has changed and no longer shows any clustering information; I'm using the default conf files).

However, I have come up against what I think is a bug. When creating a collection, our Hydra/ActiveFedora stack fails. We are using the Sufia version committer to version our XML datastreams. We get the following error:

Nov 6 15:05:48 vm50 rails[30785]: [15:05:48.363621 ] [FATAL]: Ldp::NotFound (<html><head><title>Apache Tomcat/7.0.52 (Ubuntu) - Error report</title><style></style> </head><body><h1>HTTP Status 404 - Not Found</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Not Found</u></p><p><b>description</b> <u>The requested resource is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.52 (Ubuntu)</h3></body></html>):#012 app/actors/dri/object/actor.rb:31:in `version_and_record_committer'#012 app/controllers/collections_controller.rb:164:in `create'

In the Fedora access logs in Tomcat, it seems to get a 404 when attempting versioning:

- - [13/Nov/2015:14:41:07 +0000] "GET /fedora/rest/prod/gf/06/g2/91/gf06g291g/descMetadata/fcr:versions HTTP/1.1" 404 30
- - [13/Nov/2015:14:41:08 +0000] "POST /fedora/rest/prod/gf/06/g2/91/gf06g291g/descMetadata/fcr:versions HTTP/1.1" 201 -
- - [13/Nov/2015:14:41:08 +0000] "GET /fedora/rest/prod/gf/06/g2/91/gf06g291g/descMetadata/fcr:versions HTTP/1.1" 200 1591
- - [13/Nov/2015:14:41:08 +0000] "GET /fedora/rest/prod/gf/06/g2/91/gf06g291g/properties/fcr:versions HTTP/1.1" 404 30
- - [13/Nov/2015:14:41:08 +0000] "POST /fedora/rest/prod/gf/06/g2/91/gf06g291g/properties/fcr:versions HTTP/1.1" 404 985

If I navigate to this location, it is there. Also, if I check Fedora afterwards, the collection has been created correctly. This error does not occur with a single-instance Fedora 4.4.1, so I assume it's a bug in clustered Fedora. Here are the access logs from the single-instance Fedora 4.4.1:

- - [06/Nov/2015:15:57:57 +0000] "GET /fedora/rest/prod/z3/16/q1/67/z316q1679/descMetadata/fcr:versions HTTP/1.1" 404 30
- - [06/Nov/2015:15:57:57 +0000] "POST /fedora/rest/prod/z3/16/q1/67/z316q1679/descMetadata/fcr:versions HTTP/1.1" 201 -
- - [06/Nov/2015:15:57:57 +0000] "GET /fedora/rest/prod/z3/16/q1/67/z316q1679/descMetadata/fcr:versions HTTP/1.1" 200 1910
- - [06/Nov/2015:15:57:58 +0000] "GET /fedora/rest/prod/z3/16/q1/67/z316q1679/properties/fcr:versions HTTP/1.1" 404 30
- - [06/Nov/2015:15:57:58 +0000] "POST /fedora/rest/prod/z3/16/q1/67/z316q1679/properties/fcr:versions HTTP/1.1" 201 -
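
To compare the two runs mechanically, a small sketch that scans access-log lines in the format quoted above and reports `POST .../fcr:versions` requests that did not return 201 Created. The regular expression is written only for the lines shown here and is an assumption about the log format, not a general Tomcat access-log parser; `VersionLogScan` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find POST .../fcr:versions requests that did not return 201 Created.
class VersionLogScan {

    // Captures: timestamp, method, path, status.
    private static final Pattern LINE =
            Pattern.compile("\\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3})");

    static List<String> failedVersionPosts(List<String> lines) {
        List<String> failures = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) continue;  // skip lines that are not access-log entries
            String method = m.group(2), path = m.group(3);
            int status = Integer.parseInt(m.group(4));
            if (method.equals("POST") && path.endsWith("/fcr:versions") && status != 201) {
                failures.add(m.group(1) + " " + path + " -> " + status);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Two of the clustered-instance lines quoted above.
        List<String> clustered = List.of(
            "- - [13/Nov/2015:14:41:08 +0000] \"POST /fedora/rest/prod/gf/06/g2/91/gf06g291g/descMetadata/fcr:versions HTTP/1.1\" 201 -",
            "- - [13/Nov/2015:14:41:08 +0000] \"POST /fedora/rest/prod/gf/06/g2/91/gf06g291g/properties/fcr:versions HTTP/1.1\" 404 985");
        failedVersionPosts(clustered).forEach(System.out::println);
    }
}
```

On the clustered lines above it reports only the `properties/fcr:versions` POST (404); on the single-instance lines it reports nothing, since both version POSTs there returned 201.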

Any ideas what the issue might be?

Thanks

- p

Andrew Woods

Nov 13, 2015, 10:33:28 AM

to fedor...@googlegroups.com

Hello DRI-P,

If you have your cluster set up in distributed mode, that means that the repository resources are distributed across the cluster (i.e. every resource is not on every cluster node). My initial guess would be that you are making a <resource>/properties/fcr:versions request on a cluster node that does not contain that resource.
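
To make the replicated/distributed distinction concrete, a toy model of where copies of a resource live: with as many owners as nodes (replication) every node holds every resource, while with fewer owners each resource lives only on a subset. The hash-based placement and the `owners` helper are hypothetical simplifications for illustration, not Infinispan's actual consistent-hash algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of data placement in a cluster. NOT Infinispan's real algorithm:
// owners are picked by a simple hash, only to contrast replicated vs distributed.
class OwnerPlacement {

    static List<String> owners(String key, List<String> nodes, int numOwners) {
        int n = nodes.size();
        int copies = Math.min(numOwners, n);           // cannot have more owners than nodes
        int start = Math.floorMod(key.hashCode(), n);  // deterministic "primary" owner
        List<String> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % n));    // following nodes hold the extra copies
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2");
        String key = "/prod/gf/06/g2/91/gf06g291g/properties";
        System.out.println("replicated  (2 owners): " + owners(key, nodes, 2));
        System.out.println("distributed (1 owner):  " + owners(key, nodes, 1));
    }
}
```

In this model, a two-node cluster with two owners per resource places everything on both nodes, so placement only differs from replication when there are fewer owners than nodes.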

How are you routing your calls to the various cluster nodes? Is there a load balancer?

Regarding "the landing REST page is changed now and doesn't show any clustering information": yes, that was recently removed as part of the effort to move towards a cleanly specified REST API that does not expose underlying implementation assumptions.

Andrew

DRI-P

Nov 13, 2015, 10:38:08 AM

to Fedora Tech

Thank you for the clarification; I must be using distributed mode, then. This is the mode I would prefer since, as you mentioned, it has better write performance.

So currently I'm testing it by hitting only one side of the two-node cluster. Does it matter which node the resource is on, given that it's failing on a POST, not a GET?

I could test with a load balancer, but how would the load balancer know which cluster node the resource is on? My experience with HAProxy is that it routes traffic based on a whole node being up or down.

- p

Andrew Woods

Nov 13, 2015, 10:43:36 AM

to fedor...@googlegroups.com

Let me echo your questions to the others on this list who are working with clustering...

DRI-P

Nov 13, 2015, 10:50:37 AM

to Fedora Tech

It's worth noting that after I create the collection and get the 404 on POST /fedora/rest/prod/gf/06/g2/91/gf06g291g/properties/fcr:versions, I can navigate to it manually on both sides of the cluster. So it exists.