SyncDelay attribute of Jackrabbit Cluster in repository configuration


Lucas Vossberg

Jul 30, 2017, 5:24:36 AM
to Hippo Community
Hi,

I'm trying to reduce the time updates take to replicate to the local repositories in a multi-node setup.

The Jackrabbit Wiki describes the "syncDelay" attribute of a cluster configuration [3] and states a default value of 5 sec.
My repository.xml does not set this "syncDelay" attribute:

<Cluster>
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        ...
    </Journal>
</Cluster>

Still, updates take only about 0.5 to 1 second to arrive at the other nodes. So is "syncDelay" configured somewhere else?
Is hippo-repository somehow changing that value?
I'd greatly appreciate it if someone could point me in the right direction.
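
For reference, a sketch of where the attribute would go according to the Jackrabbit clustering page [3]. The 2000 ms value is an arbitrary assumption chosen for illustration, and the journal settings stay elided:

```xml
<!-- Sketch only: syncDelay is given in milliseconds; 2000 is an assumed example value. -->
<Cluster syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        ...
    </Journal>
</Cluster>
```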

Thank you and
kind regards,
Lucas

FYI: I want to reduce the update delay because of the Post-Redirect-Get pattern [1] combined with load balancing without sticky sessions [4]:
On a form submit, the data is first saved to the repository and then the user is redirected to the next page. In a multi-node setup without sticky sessions, this request is often answered by a node that has not yet received the form data in its local repository copy.
The mechanism used by Jackrabbit to keep track of the sync state is described on the Repository Maintenance page [2].

[1] https://www.onehippo.org/12/library/concepts/component-development/hst-2-forms.html
[2] https://www.onehippo.org/12/library/enterprise/installation-and-configuration/repository-maintenance.html
[3] https://wiki.apache.org/jackrabbit/Clustering
[4] https://en.wikipedia.org/wiki/Load_balancing_(computing)#Persistence

Lucas Vossberg

Jul 30, 2017, 11:36:34 AM
to Hippo Community
Hi,

after further looking into the problem I've now found out that the delay in syncing updates between two copies of the repository is not the reason for my Post-Redirect-Get problem. If the load balancer directs the user to a different node, the saved form data cannot be retrieved at all, even after a sync has happened. I'll have to look into this.

Regarding the SyncDelay attribute, I'm still curious why updates don't take the default 5 seconds. I've looked at Hippo CMS's own version of Jackrabbit [1].
The RepositoryConfigurationParser class [2] also defines the default "SyncDelay" value as 5000 milliseconds:

line 198: public static final String DEFAULT_SYNC_DELAY = "5000";

I haven't found any code where this value is overridden...

Lucas





[1] https://www.onehippo.org/12/library/concepts/content-repository/patched-jackrabbit-versions-included-in-hippo-repository.html
[2] https://code.onehippo.org/cms-community/hippo-jackrabbit/blob/6d3880f708984789733e0fbd152cc2f2261e7105/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/config/RepositoryConfigurationParser.java

Ard Schrijvers

Jul 30, 2017, 12:20:07 PM
to hippo-c...@googlegroups.com
Hey Lucas,

On Sun, Jul 30, 2017 at 5:36 PM, Lucas Vossberg <voss...@exedra.de> wrote:
> Hi,
>
> after further looking into the problem I've now found out that the delay in
> syncing updates between two copies of the repository is not the reason for
> my Post-Redirect-Get problem. If the load balancer directs the user to a
> different node, the saved form data cannot be retrieved at all, even after a
> sync has happened. I'll have to look into this.

You are making some incorrect assumptions. Why the form data cannot be
retrieved in your setup I don't know, you have to sort this out. Can
it be retrieved on the cluster node that saved the form data?

Either way, this is where your assumptions go wrong:

1) You do not need a sync to make a 'new node' available on a
different cluster node for *fetching*. When you fetch a node by UUID
and that node is not in the local cache, a database lookup is done,
regardless of whether a sync has happened since the node was written
to the database (by another cluster node). Since the HST 2 forms work
with UUIDs, you can use them in stateless websites without requiring a
cluster sync. This is how I designed the HST form data to work in
stateless webapps many years ago.

2) The cluster sync does update JCR nodes in the local cache, it does
update the local Lucene indexes, and it triggers JCR event listeners.
As explained, to fetch a new node by UUID that was not retrieved
earlier (and thus is not in the local cache), you do not need a
cluster sync.

3) The cluster sync delay configuration is the *maximum* time it takes
before a sync is done. However, any JCR Session refresh *does* trigger
a direct cluster sync. If some code triggers a JCR session refresh
every, say, 0.5 seconds, then the sync is done every 0.5 seconds. This
is why session refreshes need to be done with care (and why the HST
uses a 'localRefresh' in its session pools, which we added to the
Hippo repository: a refresh that does not trigger a far more expensive
cluster sync).
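
Point 3 can be illustrated with a toy model. This is a sketch under stated assumptions, not Jackrabbit code: the class SyncModel and its method are hypothetical, periodic syncs are assumed to fire at fixed multiples of the sync delay, and every session refresh is assumed to trigger an immediate sync.

```java
import java.util.List;

/** Toy model (not Jackrabbit code) of when a remote change is picked up by a cluster node. */
final class SyncModel {
    private SyncModel() {}

    /**
     * Returns the time in ms at which a change written at writeTime is picked up.
     * Assumptions: periodic syncs happen at multiples of syncDelay, and each
     * entry in refreshTimes is a session refresh that syncs immediately.
     */
    static long pickedUpAt(long writeTime, long syncDelay, List<Long> refreshTimes) {
        if (syncDelay <= 0 || writeTime < 0) {
            throw new IllegalArgumentException("syncDelay must be positive, writeTime non-negative");
        }
        // First periodic sync at or after the write (integer ceiling division).
        long earliest = ((writeTime + syncDelay - 1) / syncDelay) * syncDelay;
        for (long refresh : refreshTimes) {
            // A refresh after the write can pick up the change before the timer fires.
            if (refresh >= writeTime && refresh < earliest) {
                earliest = refresh;
            }
        }
        return earliest;
    }
}
```

In this model, a change written at 1200 ms with a 5000 ms sync delay and no refreshes is picked up at 5000 ms. With a refresh every 500 ms it is picked up at 1500 ms, which would match updates arriving well under the configured delay.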

HTH,

Regards Ard



--
Hippo Netherlands, Oosteinde 11, 1017 WT Amsterdam, Netherlands
Hippo USA, Inc. 71 Summer Street, 2nd Floor Boston, MA 02110, United
states of America.

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Lucas Vossberg

Jul 30, 2017, 6:18:40 PM
to Hippo Community
Hi Ard,

thank you so much for taking the time to explain, that really helped a lot. I'll leave the SyncDelay as it is.


>Can it be retrieved on the cluster node that saved the form data?
Yes, no problem there.

On the other node I'm getting an ItemNotFoundException for

Node persistedFormData = session.getNodeByIdentifier(uuid);

even if uuid is valid on that call.
I've changed the code to do

Thread.sleep(1000);
persistedFormData = session.getNodeByIdentifier(uuid);

when an exception was raised. I need to do this between 1-3 times until the query gets a valid result.

Atm I can live with this workaround. With more time I'll try to find the reason for this delay. I'm sure it has to do with my setup.
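
The sleep-and-retry workaround can be written as a bounded retry. This is a sketch only: Retry and withRetries are hypothetical helper names, not part of JCR or Hippo, and the attempt count and sleep interval are assumptions.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of JCR or Hippo: retries an action a bounded number of times. */
final class Retry {
    private Retry() {}

    /** Calls action until it succeeds or maxAttempts is reached, sleeping between attempts. */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Real code should catch only the expected exception type
                // (ItemNotFoundException in the case discussed here).
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        // All attempts failed: rethrow the last failure.
        throw last;
    }
}

// Assumed usage with a JCR session (requires the JCR API on the classpath):
// Node persistedFormData = Retry.withRetries(() -> session.getNodeByIdentifier(uuid), 3, 1000L);
```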

Lucas

Ard Schrijvers

Jul 31, 2017, 5:07:24 AM
to hippo-c...@googlegroups.com
Hey Lucas,

On Mon, Jul 31, 2017 at 12:18 AM, Lucas Vossberg <voss...@exedra.de> wrote:
> Hi Ard,
>
> thank you so much for taking the time to explain, that really helped a lot.
> I'll leave the SyncDelay as it is.
>
>>Can it be retrieved on the cluster node that saved the form data?
> Yes, no problem there.
>
> On the other node I'm getting an ItemNotFoundException for
>
> Node persistedFormData = session.getNodeByIdentifier(uuid);
>
> even if uuid is valid on that call.
> I've changed the code to do
>
> Thread.sleep(1000);
> persistedFormData = session.getNodeByIdentifier(uuid);
>
> when an exception was raised. I need to do this between 1-3 times until the
> query gets a valid result.

Do you have the database clustered as well? I just checked the entire
call stack that is executed for a #getNodeByIdentifier and this
results in a database call, regardless whether a cluster sync has been
done or not (aka, it should work as I described in my previous mail, I
just validated it in the code)

>
> Atm I can live with this workaround. With more time I'll try to find the
> reason for this delay. I'm sure it has to do with my setup.

Well....if you have your database clustered and the cluster node
instances connect to different databases, then your observed behavior
does resonate with me. Otherwise, I can't really understand it. I'd be
interested however to know what is causing your observed problem. I
did not test it myself, but did not hear about your observed problem
before.

Please let me know

Regards Ard

Lucas Vossberg

Jul 31, 2017, 5:31:17 AM
to Hippo Community
Hi,

there is only a single instance of AWS RDS (MySQL Community Edition) running. Both Site nodes use it for the repository.

The 1-3 second delay until I can fetch the new data corresponds with the time it takes REPOSITORY_LOCAL_REVISIONS [1] to show all nodes are up-to-date. But this may just be a coincidence, and the cause may have nothing to do with the sync activity.
In any case I'll update this post with the reason for the delay once I've found it :-)

Regards,
Lucas

[1] https://www.onehippo.org/12/library/enterprise/installation-and-configuration/repository-maintenance.html





Ard Schrijvers

Jul 31, 2017, 5:40:22 AM
to hippo-c...@googlegroups.com
Hey Lucas,

I am curious about your findings because they do not match how the
code (should) work. Please let me know what you find.

Regards Ard

Lucas Vossberg

Jul 31, 2017, 8:50:37 PM
to Hippo Community
Hi Ard,

I finally managed to get rid of the delay. All that was needed is a session.refresh. This is my code to fetch the formmap data:
session = request.getRequestContext().getSession();
session.refresh(true);
Node persistedFormData = session.getNodeByIdentifier(uuid);

I've no idea why this is necessary. From your explanations I understand no refresh should be necessary.
On the other hand there is this detailed description of ItemStateManagement [1]. From those examples (Use case 3: save)  I think it is necessary to refresh the local cache on the 2nd node? Or is all this not relevant because I'm accessing the underlying database directly by using the uuid query?

Btw: To make the above code work one needs to add the liveusers group to the "formdata" domain. [2]

Kind regards,
Lucas

[1] https://wiki.apache.org/jackrabbit/ItemStateManagement
[2] https://www.onehippo.org/12/library/concepts/security/domains.html

Ard Schrijvers

Aug 1, 2017, 5:24:03 AM
to hippo-c...@googlegroups.com
On Tue, Aug 1, 2017 at 2:50 AM, Lucas Vossberg <voss...@exedra.de> wrote:
> Hi Ard,
>
> I finally managed to get rid of the delay. All that was needed is a
> session.refresh. This is my code to fetch the formmap data:
> session = request.getRequestContext().getSession();
> session.refresh(true);
> Node persistedFormData = session.getNodeByIdentifier(uuid);
>
> I've no idea why this is necessary. From your explanations I understand no
> refresh should be necessary.
> On the other hand there is this detailed description of ItemStateManagement
> [1]. From those examples (Use case 3: save) I think it is necessary to
> refresh the local cache on the 2nd node? Or is all this not relevant because
> I'm accessing the underlying database directly by using the uuid query?

I can only understand this behavior *if* the node would be fetched for
example by path, and the parent would already be in local cache (and
thus not yet updated and not yet having the child node). *However*,
since the fetch is by *new* UUID, the only thing a cluster node can
do, regardless whether it has anything in cache or not, is trying to
fetch the UUID from the database. A session refresh shouldn't impact
anything. This is how I think it should be.

However, your observed behavior is different. I am now going to guess:
Perhaps the new node can be fetched by UUID on the other cluster node,
but when checking the read-access, it fails because for example the
parent node was already in local cache and not updated (so the parent
doesn't see the newly fetched node as child). Note, I am just guessing
here.
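
The difference between the two kinds of lookup can be sketched with a toy model. It is not Jackrabbit code and only loosely mirrors the guess above: CachedNodeStore and its methods are hypothetical, the "database" is a plain map shared by all cluster nodes, and the cached child list stands in for a parent node held in the local cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (not Jackrabbit code) of one cluster node's local cache in front of a shared database. */
final class CachedNodeStore {
    private final Map<String, String> database;                 // shared: id -> data
    private final Map<String, String> cache = new HashMap<>();  // local: id -> data
    private final Set<String> cachedChildIds = new HashSet<>(); // local child list, stale until sync

    CachedNodeStore(Map<String, String> sharedDatabase) {
        this.database = sharedDatabase;
        this.cachedChildIds.addAll(sharedDatabase.keySet());
    }

    /** Fetch by id: a cache miss falls through to the database, no sync needed. */
    String getById(String id) {
        return cache.computeIfAbsent(id, database::get);
    }

    /** Check via the cached parent: only sees children known at the last sync. */
    boolean parentListsChild(String id) {
        return cachedChildIds.contains(id);
    }

    /** Cluster sync: bring the cached child list up to date with the database. */
    void sync() {
        cachedChildIds.clear();
        cachedChildIds.addAll(database.keySet());
    }
}
```

In this model a node written by another cluster node is found by id right away, while anything that goes through the cached parent misses it until the next sync.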

Either way, I am interested in finding the root cause. At this moment,
I unfortunately do not have the time to set this all up. You can
create a repo jira issue for it if you want.

Note also that I would not recommend session#refresh as a workaround.
I'd recommend the setup that we at Hippo typically use: cluster node
affinity *without* relying on HTTP sessions, see [1].
Note you need to be a customer to read that page. You can contact
sales / helpdesk if you do not have an account.

HTH,

Regards Ard

[1] https://www.onehippo.org/library/enterprise/installation-and-configuration/hippo-cms-loadbalancing-requirements.html

Lucas Vossberg

Aug 1, 2017, 6:53:07 AM
to Hippo Community
Hi Ard,
thank you for your further suggestions. Please see my inline comments below.


> I can only understand this behavior *if* the node would be fetched for
> example by path, and the parent would already be in local cache (and
> thus not yet updated and not yet having the child node). *However*,
> since the fetch is by *new* UUID, the only thing a cluster node can
> do, regardless whether it has anything in cache or not, is trying to
> fetch the UUID from the database. A session refresh shouldn't impact
> anything. This is how I think it should be.
>
> However, your observed behavior is different. I am now going to guess:
> Perhaps the new node can be fetched by UUID on the other cluster node,
> but when checking the read-access, it fails because for example the
> parent node was already in local cache and not updated (so the parent
> doesn't see the newly fetched node as child). Note, I am just guessing
> here.

This sounded like a good idea. I'm creating a random folder hierarchy under the /formdata root node to save the new node there.
Unfortunately this is not the cause of the delay: I've changed the code to create the new node directly under /formdata. I can see the nodes in the console view. So at least after the second form submit, the other cluster node should have the root node /formdata in its local cache? But this didn't help; I still see the delay with every request.
 

> Either way, I am interested in finding the root cause. At this moment,
> I unfortunately do not have the time to set this all up. You can
> create a repo jira issue for it if you want.
>
> Note also that I would not recommend as workaround the
> session#refresh. I'd recommend the setup that we at Hippo typically
> do: cluster node affinity *without* relying on http sessions, see [1].
> Note you need to be a customer to read that page. You can contact
> sales / helpdesk if you do not have an account

Atm I'm bound to Amazon's Application Load Balancer, which only does cookie-based node affinity on the HTTP level (aka sticky sessions). There is an nginx instance in front of each node. I'll have to redesign my setup to have nginx do the actual load balancing. This will indeed be better than using cookies. Though not having to rely on any node affinity is my final goal.

Regards,
Lucas
 

Ard Schrijvers

Aug 1, 2017, 7:41:55 AM
to hippo-c...@googlegroups.com
On Tue, Aug 1, 2017 at 12:53 PM, Lucas Vossberg <voss...@exedra.de> wrote:
> Hi Ard,
> thank you for your further suggestions. Please see my inline comments below.
>
>> I can only understand this behavior *if* the node would be fetched for
>> example by path, and the parent would already be in local cache (and
>> thus not yet updated and not yet having the child node). *However*,
>> since the fetch is by *new* UUID, the only thing a cluster node can
>> do, regardless whether it has anything in cache or not, is trying to
>> fetch the UUID from the database. A session refresh shouldn't impact
>> anything. This is how I think it should be.
>>
>> However, your observed behavior is different. I am now going to guess:
>> Perhaps the new node can be fetched by UUID on the other cluster node,
>> but when checking the read-access, it fails because for example the
>> parent node was already in local cache and not updated (so the parent
>> doesn't see the newly fetched node as child). Note, I am just guessing
>> here.
>
>
> This sounded like a good idea. I'm creating a random folder hierarchy under
> /formdata root node to save the new node there.
> Unfortunately this is not the cause of the delay: I've changed the code to
> create the new node directly under /formdata. I can see the nodes in the
> console view. So at least after the second form submit the node should have
> the root node /formdata in it´s local cache?. But this didnt help, I'm
> experiencing the delay again with every request.

I should reproduce this locally so I can validate my claims.
Unfortunately I have not had time for it yet.

>
>>
>>
>> Either way, I am interested in finding the root cause. At this moment,
>> I unfortunately do not have the time to set this all up. You can
>> create a repo jira issue for it if you want.
>>
>> Note also that I would not recommend as workaround the
>> session#refresh. I'd recommend the setup that we at Hippo typically
>> do: cluster node affinity *without* relying on http sessions, see [1].
>> Note you need to be a customer to read that page. You can contact
>> sales / helpdesk if you do not have an account
>
> Atm I'm bound to Amazons Application Load Balancer which only does cookie
> based node affinity on the http level (aka sticky sessions). There is a
> nginx instance in front of each node. I'll have to redesign my setup to have
> nginx do the actual load balancing. This will indeed be better than using
> cookies. Though not having to rely on any node affinity is my final goal.

The idea is to use a cookie, but just not an HTTP session cookie! The
load balancer itself can set the cookie (for example serverId). The
application doesn't need to know anything about this. I am quite sure
you must be able to set this up with Amazon's Application Load
Balancer. IIRC we do this with Amazon ELB in the same way. [1]
describes an example with HAProxy. The concept is the same for all
load balancers. Thus, on the application level you stay stateless (no
HTTP session); only on the load balancer level do you inject a cookie
that is used for cluster node affinity.
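
The concept can be sketched as a toy model. This is not how any particular load balancer is implemented: AffinityBalancer is a hypothetical class, the cookie name serverId is only the example name mentioned above, and round robin for first-time visitors is an assumption.

```java
import java.util.List;
import java.util.Optional;

/** Toy model (not a real load balancer) of cookie-based cluster node affinity. */
final class AffinityBalancer {
    private final List<String> nodes;
    private int next = 0;

    AffinityBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    /**
     * Picks the node for a request. serverIdCookie is the value of the balancer's
     * own cookie, empty on a first visit. The returned node name is what the
     * balancer would send back as the cookie value; the application never sees it.
     */
    String route(Optional<String> serverIdCookie) {
        if (serverIdCookie.isPresent() && nodes.contains(serverIdCookie.get())) {
            // Known node: stick to it.
            return serverIdCookie.get();
        }
        // No cookie, or the node is gone: pick the next node round robin.
        String chosen = nodes.get(next);
        next = (next + 1) % nodes.size();
        return chosen;
    }
}
```

A request carrying serverId=node1 always lands on node1, so a redirect after a form submit is answered by the node that saved the data, while the web application itself keeps no HTTP session.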

Lucas Vossberg

Aug 1, 2017, 8:46:46 AM
to Hippo Community
Sorry, I didn't realize the different cookie types/usages. The "sticky session" cookie of the ALB is doing just what you're describing.

Unfortunately it is set with a lifetime of one week, even if the "stickiness" of the session is only set to 5 seconds. From a user's perspective this feels unnecessary, which is why I'd like to avoid the cookie.

Lucas

Ard Schrijvers

Aug 1, 2017, 10:21:44 AM
to hippo-c...@googlegroups.com
On Tue, Aug 1, 2017 at 2:46 PM, Lucas Vossberg <voss...@exedra.de> wrote:
> Sorry, I didn't realize the different cookie types/usages. The "sticky
> session" cookie of the ALB is doing just what you're describing.
>
> Unfortunately it is set with a lifetime of one week, even if the
> "stickyness" of the session is only set to 5 seconds. From a user´s
> perspective this feels unnecessary and why I'ld like to avoid the cookie.

Well....a user (website visitor) doesn't notice, right?

I think for now that is preferable to the session#refresh, but that
is your choice :-)

Regards Ard

>
> Lucas
>
>>
>> The idea is to use a cookie, but just not a http session cookie! The
>> load balancer itself can set the cookie (for example serverId). The
>> application doesn't need to know anything about this. I am quite sure
>> you must be able to set this up with Amazons Application Load
>> Balancer. IIRC we do this with amazon elb as well in this way. [1]
>> describes an example with HAProxy. The concept is for all the load
>> balancers the same. Thus, on application level you stay stateless (no
>> http session), just on loadbalancer level you inject a cookie that is
>> used for cluster node affinity
>>
>> HTH,
>>
>> Regards Ard
>>
>> [1]
>> https://www.onehippo.org/library/enterprise/installation-and-configuration/hippo-cms-loadbalancing-requirements.html
>>
>