Hello again,
I have finally got everything I need from XtreemFS working in one form or another, and I am now testing the SSL features. I followed the user guide to set up a quick test in which the server hosting all of the daemons is also the client. The problem is in the mkfs.xtreemfs step: it hangs for 60 seconds (the connect timeout) and then fails with a connection reset followed by a timeout error:
root@win3:~# mkfs.xtreemfs -d DEBUG --pkcs12-file-path=/etc/xos/xtreemfs/truststore/certs/client.p12 --pkcs12-passphrase='*********' localhost/disk2
Trying to create the volume: localhost/disk2
Using options:
Mode: 777
Access Control Policy: POSIX
Default striping policy: RAID0
Default stripe size (object size): 128
Default stripe width (# OSDs): 1
[ D | 1/ 4 04:13:48.353 | 0x26bfae0 ] Created a new libxtreemfs Client object (version 1.4 (Salty Sticks))
[ I | 1/ 4 04:13:48.353 | 0x26bfae0 ] SSL support activated.
[ I | 1/ 4 04:13:48.353 | 0x26bfae0 ] SSL support using PKCS#12 file /etc/xos/xtreemfs/truststore/certs/client.p12
[ D | 1/ 4 04:13:48.356 | 0x26bfae0 ] tmp file name:/tmp/pmK9igS2 /tmp/ct6RT8Yw
[ D | 1/ 4 04:13:48.356 | 0x26b1260 ] Starting RPC client.
[ D | 1/ 4 04:13:48.356 | 0x26b1260 ] Running in SSL mode.
[ D | 1/ 4 04:13:48.357 | 0x26bfae0 ] Generated client UUID: 5ognN5GI-Dwfj-rBoW-ipQv-Y3X4ExtlHAz7
[ D | 1/ 4 04:13:48.357 | 0x26b1260 ] new connection for localhost:32636
[ D | 1/ 4 04:13:48.357 | 0x26b1260 ] connect timeout is 60 seconds
[ D | 1/ 4 04:13:48.357 | 0x26b1260 ] resolved: localhost
[ D | 1/ 4 04:14:48.361 | 0x26b1260 ] Connection reset, next reconnect in 0 seconds.
[ E | 1/ 4 04:14:48.362 | 0x26b1260 ] operation failed: call_id=1 errno=5 message=connection to 'localhost:32636' timed out
[ E | 1/ 4 04:14:48.362 | 0x26bfae0 ] The client encountered a communication error sending a request to the server: localhost:32636. Error: connection to 'localhost:32636' timed out
Failed to create the volume, error:
connection to 'localhost:32636' timed out
[ D | 1/ 4 04:14:48.362 | 0x26bfae0 ] RPC client stopped.
I should also note that both the MRC and OSD services crash every time this is run. They stay marked as crashed until I restart them.
root@win3:/var/log/xtreemfs# tail -n20 mrc.log osd.log
==> mrc.log <==
[ I | BabuDBImpl | DiskLogger | 23 | Jan 04 04:13:44 ] has been successfully started.
[ I | CheckpointerImpl | ChkptrThr | 22 | Jan 04 04:13:44 ] Thread ChkptrThr started
[ I | BabuDBImpl | ChkptrThr | 22 | Jan 04 04:13:44 ] has been successfully started.
[ I | BabuDBImpl | MRC | 1 | Jan 04 04:13:44 ] BabuDB for Java is running (version 0.5.6)
[ E | HeartbeatThread | MRC | 1 | Jan 04 04:17:29 ] an error occurred while initially contacting the Directory Service: java.io.IOException: Request finally failed after 15 tries.
[ E | MRCRequestDispatcher | MRC | 1 | Jan 04 04:17:29 ] STARTUP FAILED!
[ E | MRCRequestDispatcher | MRC | 1 | Jan 04 04:17:29 ] java.io.IOException: cannot initialize service at XtreemFS DIR: java.io.IOException: Request finally failed after 15 tries.
... org.xtreemfs.common.HeartbeatThread.initialize(HeartbeatThread.java:271)
... org.xtreemfs.mrc.MRCRequestDispatcher.startup(MRCRequestDispatcher.java:395)
... org.xtreemfs.mrc.MRC.<init>(MRC.java:39)
... org.xtreemfs.mrc.MRC.main(MRC.java:105)
[ E | MRCRequestDispatcher | MRC | 1 | Jan 04 04:17:29 ] root cause: java.io.IOException: Request finally failed after 15 tries.
... org.xtreemfs.dir.DIRClient.syncCall(DIRClient.java:406)
... org.xtreemfs.dir.DIRClient.xtreemfs_service_get_by_uuid(DIRClient.java:234)
... org.xtreemfs.dir.DIRClient.xtreemfs_service_get_by_uuid(DIRClient.java:228)
... org.xtreemfs.common.HeartbeatThread.registerServices(HeartbeatThread.java:339)
... org.xtreemfs.common.HeartbeatThread.initialize(HeartbeatThread.java:151)
... org.xtreemfs.mrc.MRCRequestDispatcher.startup(MRCRequestDispatcher.java:395)
... org.xtreemfs.mrc.MRC.<init>(MRC.java:39)
... org.xtreemfs.mrc.MRC.main(MRC.java:105)
==> osd.log <==
[ I | FleaseStage | FleaseSt | 25 | Jan 04 04:13:51 ] Flease (version 0.2.4 (trunk)) ready
[ I | RPCUDPSocketServer | UDPComStage | 14 | Jan 04 04:13:51 ] UDP socket on port 32640 ready
[ I | FleaseStage | FleaseSt | 25 | Jan 04 04:13:51 ] Thread FleaseSt started
[ I | RPCUDPSocketServer | UDPComStage | 14 | Jan 04 04:13:51 ] Thread UDPComStage started
[ E | HeartbeatThread | OSD | 1 | Jan 04 04:17:36 ] an error occurred while initially contacting the Directory Service: java.io.IOException: Request finally failed after 15 tries.
[ E | OSDRequestDispatcher | OSD | 1 | Jan 04 04:17:36 ] STARTUP FAILED!
[ E | OSDRequestDispatcher | OSD | 1 | Jan 04 04:17:36 ] java.io.IOException: cannot initialize service at XtreemFS DIR: java.io.IOException: Request finally failed after 15 tries.
... org.xtreemfs.common.HeartbeatThread.initialize(HeartbeatThread.java:271)
... org.xtreemfs.osd.OSDRequestDispatcher.start(OSDRequestDispatcher.java:504)
... org.xtreemfs.osd.OSD.<init>(OSD.java:32)
... org.xtreemfs.osd.OSD.main(OSD.java:102)
[ E | OSDRequestDispatcher | OSD | 1 | Jan 04 04:17:36 ] root cause: java.io.IOException: Request finally failed after 15 tries.
... org.xtreemfs.dir.DIRClient.syncCall(DIRClient.java:406)
... org.xtreemfs.dir.DIRClient.xtreemfs_service_get_by_uuid(DIRClient.java:234)
... org.xtreemfs.dir.DIRClient.xtreemfs_service_get_by_uuid(DIRClient.java:228)
... org.xtreemfs.common.HeartbeatThread.registerServices(HeartbeatThread.java:339)
... org.xtreemfs.common.HeartbeatThread.initialize(HeartbeatThread.java:151)
... org.xtreemfs.osd.OSDRequestDispatcher.start(OSDRequestDispatcher.java:504)
... org.xtreemfs.osd.OSD.<init>(OSD.java:32)
... org.xtreemfs.osd.OSD.main(OSD.java:102)
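To compare the error timestamps across mrc.log and osd.log without reading the stack traces, the error lines can be pulled out with a short script. This is only a sketch: the field names (component, thread, thread_id) are guesses at what the columns in the log lines above mean, and the helper names are made up for illustration.

```python
import re
import sys

# Matches lines such as:
# [ E | HeartbeatThread | MRC | 1 | Jan 04 04:17:29 ] an error occurred ...
# The meaning of each column is an assumption based on the pasted logs.
LOG_LINE = re.compile(
    r"^\[\s*(\w)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(\d+)\s*\|\s*(.*?)\s*\]\s*(.*)$"
)

def parse_line(line):
    """Return a dict for a service log line, or None for stack-trace lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    level, component, thread, thread_id, timestamp, message = match.groups()
    return {
        "level": level,
        "component": component,
        "thread": thread,
        "thread_id": int(thread_id),
        "timestamp": timestamp,
        "message": message,
    }

def error_records(lines):
    """Keep only the parsed records whose level is 'E'."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec is not None and rec["level"] == "E"]

if __name__ == "__main__":
    # Usage: python log_errors.py mrc.log osd.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            for rec in error_records(handle):
                print(path, rec["timestamp"], rec["component"], rec["message"])
```

Run against the two logs above, this would list only the HeartbeatThread and dispatcher errors with their timestamps, which makes it easier to line them up against the time mkfs.xtreemfs was started.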
I followed the instructions nearly verbatim (using the same internal password throughout), so all of the service configs contain the following lines:
ssl.enabled = true
ssl.service_creds.pw = ********
ssl.service_creds.container = pkcs12
ssl.service_creds = /etc/xos/xtreemfs/truststore/certs/SERVICE.p12
ssl.trusted_certs = /etc/xos/xtreemfs/truststore/certs/trusted.jks
ssl.trusted_certs.pw = *********
ssl.trusted_certs.container = jks
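Since the same ssl.* block is repeated in every service config, a quick sanity check is to confirm that each config really has SSL enabled and that the credential and trust store files it names exist. The sketch below assumes the configs are plain `key = value` files like the excerpt above; the function names are hypothetical and the config file paths have to be supplied on the command line. It does not open the keystores or verify the passwords.

```python
import os
import sys

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def check_ssl_settings(props, exists=os.path.isfile):
    """Return a list of problems found in the ssl.* settings (empty if none)."""
    problems = []
    if props.get("ssl.enabled", "false").lower() != "true":
        problems.append("ssl.enabled is not true")
    for key in ("ssl.service_creds", "ssl.trusted_certs"):
        path = props.get(key)
        if not path:
            problems.append(f"{key} is missing")
        elif not exists(path):
            problems.append(f"{key} points to a missing file: {path}")
        # Only the two container types used in the excerpt above are accepted here.
        if props.get(key + ".container") not in ("pkcs12", "jks"):
            problems.append(f"{key}.container is missing or not pkcs12/jks")
    return problems

if __name__ == "__main__":
    # Usage: python check_ssl.py <dir config> <mrc config> <osd config>
    for config_path in sys.argv[1:]:
        with open(config_path) as handle:
            found = check_ssl_settings(parse_properties(handle.read()))
        print(config_path, "OK" if not found else found)
```

Running it as the user the daemons run under also shows whether that user can see the files at all, because `os.path.isfile` returns False for paths it cannot reach.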
As best I can tell, the errors indicate that the MRC and OSD services cannot communicate with the DIR service: after 15 failed attempts to contact it, they give up and their startup fails. Has this behavior been observed before? Any ideas on where I should go from here?
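One way to narrow this down is to check whether each service is accepting TCP connections at all, before SSL comes into play. The sketch below is a plain TCP probe and makes some assumptions: port 32636 is the MRC port from the mkfs.xtreemfs output, 32640 is taken from the OSD log (where it is shown for UDP, so using it for TCP is a guess), and 32638 for the DIR does not appear anywhere in the output above and should be checked against the DIR config.

```python
import socket

# Assumed ports: 32636 (MRC) and 32640 (OSD) appear in the output above;
# 32638 for the DIR is an assumption and may differ on a given install.
SERVICES = {"DIR": 32638, "MRC": 32636, "OSD": 32640}

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in SERVICES.items():
        if tcp_port_open("localhost", port):
            state = "accepting connections"
        else:
            state = "not reachable"
        print(f"{name} localhost:{port}: {state}")
```

If the DIR port accepts connections but the MRC and OSD still cannot register, the SSL handshake or the certificates are the more likely place to look. If it does not accept connections, the DIR log is the next thing to read.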
Current Setup:
Debian 6 64-bit
2.6.32-5-amd64