3-replica cluster, full HA


Vlad CopyLove

Nov 20, 2018, 12:12:29 PM
to XtreemFS
Hello,

I need 3 nodes to work with the same data set, tolerating at least one failed node for writes, and with a sole surviving node still able to serve reads.
Is it possible to make such a setup? That would mean the DIR and metadata also need to be replicated to each node.

From the docs I see that it will select the local node for reads, which is great.

I also see that writes are simultaneous, but does that mean the client itself handles the write to all nodes, or is there some other mechanism in place?
For example, does the client start the next write while a node propagates the replicas to the other nodes and reports write-complete later, or does it wait and open connections to all replicas, multiplying traffic and falling behind the slowest replica with the whole write queue?

Thanks,

vlad

Robert Schmidtke

Nov 26, 2018, 4:10:29 AM
to XtreemFS
Hi Vlad,

DIR and MRC replication is available; however, it is not recommended for production use, as there are some race conditions during failover that we have not addressed, and I cannot promise when or whether we will have the resources to fix these issues.
That being said, OSD replication works, and using quorum policies you can achieve the setup that you describe.

The client writes to one node, which then triggers replication to the other nodes (either synchronously or asynchronously).
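For a 3-replica volume, that would look roughly like this (a sketch, not verified; /mnt/xtreemfs stands in for your mount point, and WqRq is the majority-quorum policy from the user guide):

  # Set the default replication policy for new files on the mounted volume.
  # WqRq = write quorum / read quorum; with 3 replicas, any 2 form a majority.
  xtfsutil --set-drp --replication-policy WqRq --replication-factor 3 /mnt/xtreemfs

Note that with a strict quorum, writes survive one failed node, but a single surviving replica is not a majority, so quorum reads would also need 2 of 3 nodes.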

Cheers
Robert

Vlad Kopylov

Nov 27, 2018, 10:16:56 AM
to xtre...@googlegroups.com
I see. Thank you Robert.

Maybe shoot for multiple MRCs with a primary, or even multi-primary.

Looks like quorum helps with write/file distribution. If available, an example of how it can pick the proper read OSD would help.

-v

Robert Schmidtke

Nov 27, 2018, 11:40:25 AM
to XtreemFS
Hi Vlad,

I recommend reading the replication section of the user guide: http://xtreemfs.org/xtfs-guide-1.5.1/index.html#tth_sEc6.1

As to the OSD selection for reads, you may want to consider replica selection policies: http://xtreemfs.org/xtfs-guide-1.5.1/index.html#tth_sEc7.3
You could use a datacenter map or Vivaldi coordinate-based selection policies: http://xtreemfs.org/xtfs-guide-1.5.1/index.html#sec:osd_select_policy
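In xtfsutil terms, switching the volume over would look roughly like this (a sketch, not verified; /mnt/xtreemfs is a placeholder mount point):

  # Sort replicas for reads by datacenter distance (or use 'vivaldi'):
  xtfsutil --set-rsp dcmap /mnt/xtreemfs
  # Place new replicas with the same logic:
  xtfsutil --set-osp dcmap /mnt/xtreemfs

The dcmap variants additionally require a datacenter map configured on the DIR service, as described in the guide.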

Cheers
Robert


Vlad Kopylov

Nov 30, 2018, 11:28:38 AM
to xtre...@googlegroups.com
Thank you Robert.
Wow, Vivaldi network coordinates for OSDs and clients are awesome. I guess that is what I need.
This certainly makes XtreemFS stand out; no one else has something like this. Even Ceph with its CRUSH maps doesn't include the client in the picture, which undermines the CRUSH map approach as a whole.
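If I read the guide right, the client also needs to be mounted with Vivaldi enabled so that it computes its own coordinates, something like (untested on my side; host and volume names are placeholders):

  mount.xtreemfs --vivaldi-enable dir-host/myvolume /mnt/xtreemfs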

-v