striping + replication problems

nigeo...@gmail.com

May 10, 2013, 8:02:30 AM
to xtre...@googlegroups.com
Hello,
I am trying to configure XtreemFS with striping + replication. I have 2 Windows Server 2008 R2 nodes running the Java XtreemFS servers and a Java XtreemFS client (based on libxtreemfs or your Hadoop implementation). I set up 1 DIR, 1 MRC and 6 OSDs (32640, 32641, ...). Here is the information about my volume:

My test volume
selectable OSDs  a66f00a0-1a21-12e1-bddb-0810200c9a88, b66f00a0-1a21-12e1-bddb-0810200c9a88, c66f00a0-1a21-12e1-bddb-0810200c9a88, d66f00a0-1a21-12e1-bddb-0810200c9a88, e66f00a0-1a21-12e1-bddb-0810200c9a88, f66f00a0-1a21-12e1-bddb-0810200c9a88, h66f00a0-1a21-12e1-bddb-0810200c9a88, i66f00a0-1a21-12e1-bddb-0810200c9a88, j66f00a0-1a21-12e1-bddb-0810200c9a88, k66f00a0-1a21-12e1-bddb-0810200c9a88
striping policy         STRIPING_POLICY_RAID0, 128, 2
access policy         ACCESS_CONTROL_POLICY_NULL
osd policy                 1000,3000
replica policy         3000
#files                          7
#directories          1
free disk space:          2.53 TB
occupied disk space:  121.87 MB

I tried to configure "ronly" replication with replication factor 3 and striping width 2. I successfully uploaded 7 files and then downloaded them (custom Java client). Then I simulated a failure by stopping 1 OSD (either a66f00a0-1a21-12e1-bddb-0810200c9a88 or b66f00a0-1a21-12e1-bddb-0810200c9a88). The download failed:

[ D | -                    | main            |   1 | May 10 14:34:02 ] java.io.IOException: sending RPC failed: server '/192.168.0.2:32640' not reachable (java.net.ConnectException: Connection refused: no further information)
 ...                                           org.xtreemfs.foundation.pbrpc.client.RPCResponse.get(RPCResponse.java:69)
 ...                                           org.xtreemfs.common.libxtreemfs.RPCCaller.syncCall(RPCCaller.java:95)
 ...                                           org.xtreemfs.common.libxtreemfs.RPCCaller.syncCall(RPCCaller.java:67)
 ...                                           org.xtreemfs.common.libxtreemfs.FileHandleImplementation.read(FileHandleImplementation.java:265)
 ...                                           com.tarmin.xtreemfs.client.XFSInputStream.read(XFSInputStream.java:69)
 ...                                           java.io.DataInputStream.read(DataInputStream.java:83)
 ...                                           org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
 ...                                           org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
 ...                                           com.tarmin.xtreemfs.client.DownloadClient.main(DownloadClient.java:35)
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@27972e3a no 882324
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] reconnect to server still blocked locally to avoid flooding (server: /192.168.0.2:32640)
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@365bf624 no 882325
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] reconnect to server still blocked locally to avoid flooding (server: /192.168.0.2:32640)
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@4f7cd15d no 882326
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] reconnect to server still blocked locally to avoid flooding (server: /192.168.0.2:32640)
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@b955970 no 882327
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] reconnect to server still blocked locally to avoid flooding (server: /192.168.0.2:32640)
[ D | RPCNIOSocketClient   | main            |   1 | May 10 14:34:02 ] sending request org.xtreemfs.foundation.pbrpc.client.RPCClientRequest@7a22ce00 no 882328 

If I stop some of the other 4 OSDs instead, the download works without problems. Here is the replica map of one of the uploaded files:

osd_uuids: "a66f00a0-1a21-12e1-bddb-0810200c9a88"
osd_uuids: "b66f00a0-1a21-12e1-bddb-0810200c9a88"
replication_flags: 3
striping_policy {
  type: STRIPING_POLICY_RAID0
  stripe_size: 128
  width: 2
}

osd_uuids: "k66f00a0-1a21-12e1-bddb-0810200c9a88"
osd_uuids: "d66f00a0-1a21-12e1-bddb-0810200c9a88"
replication_flags: 17
striping_policy {
  type: STRIPING_POLICY_RAID0
  stripe_size: 128
  width: 2
}

osd_uuids: "h66f00a0-1a21-12e1-bddb-0810200c9a88"
osd_uuids: "e66f00a0-1a21-12e1-bddb-0810200c9a88"
replication_flags: 17
striping_policy {
  type: STRIPING_POLICY_RAID0
  stripe_size: 128
  width: 2
}
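
To make the map above easier to read, here is a minimal sketch of how RAID0 striping would assign file data to the two OSDs of one replica. It is only an illustration and not the XtreemFS code: the class and method names are hypothetical, and it assumes that stripe_size is given in KiB and that objects are distributed round-robin over the OSDs in the order listed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of XtreemFS.
class Raid0MappingSketch {

    // Assumption: stripe_size (128 in the map above) is given in KiB.
    static long objectNumber(long byteOffset, int stripeSizeKiB) {
        return byteOffset / (stripeSizeKiB * 1024L);
    }

    // Assumption: RAID0 places objects round-robin on the replica's OSDs.
    static String osdForObject(long objectNo, List<String> replicaOsds) {
        return replicaOsds.get((int) (objectNo % replicaOsds.size()));
    }

    public static void main(String[] args) {
        // OSDs of the first replica from the map above (width = 2).
        List<String> firstReplica = Arrays.asList(
                "a66f00a0-1a21-12e1-bddb-0810200c9a88",
                "b66f00a0-1a21-12e1-bddb-0810200c9a88");

        long[] offsets = {0L, 128L * 1024L, 256L * 1024L};
        for (long offset : offsets) {
            long objectNo = objectNumber(offset, 128);
            System.out.println("offset " + offset + " -> object " + objectNo
                    + " -> OSD " + osdForObject(objectNo, firstReplica));
        }
    }
}
```

Under these assumptions every second 128 KiB object of the first replica lives on a66f00a0-... and the others on b66f00a0-..., so a reader that only uses the first replica needs both of these OSDs for any file larger than one stripe.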

I also tried the same test without striping (factor = 1): downloading works even with a66f00a0-1a21-12e1-bddb-0810200c9a88 or b66f00a0-1a21-12e1-bddb-0810200c9a88 removed. What am I doing wrong? I expected replication factor = 3 to tolerate the failure of 1 node even with striping. Please help.
Greetings,
Nikolay 

Michael Berlin

May 13, 2013, 9:16:37 AM
to xtre...@googlegroups.com
Hi,

You ran into a limitation of the current code :-(

With striped replicas, the client always uses only the first replica. We
fixed this in the C++ client library in 1.4, but not yet in the Java
libxtreemfs client.
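
To illustrate the behavior one would expect instead, here is a rough sketch of trying the replicas in order until the OSD responsible for an object is reachable. This is not the patch and not the libxtreemfs API: the class, the method and the "unreachable" set are hypothetical, objects are assumed to be placed round-robin within each replica, and the sketch ignores retry timing and whether a replica is complete.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of libxtreemfs.
class ReplicaFailoverSketch {

    // Returns the first reachable OSD that holds the object, or null if
    // every replica's OSD for this object is unreachable.
    static String pickOsd(long objectNo, List<List<String>> replicas,
                          Set<String> unreachable) {
        for (List<String> replicaOsds : replicas) {
            // Assumption: RAID0 round-robin placement within each replica.
            String candidate =
                    replicaOsds.get((int) (objectNo % replicaOsds.size()));
            if (!unreachable.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The three replicas from the map in the original post.
        List<List<String>> replicas = Arrays.asList(
                Arrays.asList("a66f00a0-1a21-12e1-bddb-0810200c9a88",
                              "b66f00a0-1a21-12e1-bddb-0810200c9a88"),
                Arrays.asList("k66f00a0-1a21-12e1-bddb-0810200c9a88",
                              "d66f00a0-1a21-12e1-bddb-0810200c9a88"),
                Arrays.asList("h66f00a0-1a21-12e1-bddb-0810200c9a88",
                              "e66f00a0-1a21-12e1-bddb-0810200c9a88"));

        // Simulate the stopped OSD from the test.
        Set<String> unreachable = new HashSet<>(
                Arrays.asList("a66f00a0-1a21-12e1-bddb-0810200c9a88"));

        // Object 0 falls back to the second replica, object 1 stays on the first.
        System.out.println("object 0 -> " + pickOsd(0, replicas, unreachable));
        System.out.println("object 1 -> " + pickOsd(1, replicas, unreachable));
    }
}
```

A client that stops after the first entry of the replica list would fail for object 0 in this scenario, which matches the "Connection refused" error in the log.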

I've created an issue for it and posted a patch there:
http://code.google.com/p/xtreemfs/issues/detail?id=288

Please try the patch and give feedback on the issue tracker.

Best regards,
Michael


