SDFS 2.0 RC2 Released


Sam Silverberg

Mar 28, 2014, 8:31:44 PM
to dedupfilesystem-...@googlegroups.com
Fixes based on feedback from users.


Fixes:
* Fixed Red Hat/CentOS 6.5 library dependency issue.
* Fixed write failures caused by a slow disk subsystem. This issue only affects Variable Block Deduplication.
* Fixed slow performance on large volumes caused by bitmap scans on every put.
* Fixed sync error loop in FileStore.
* Fixed replication script to point to updated libraries.
* Fixed Rabin hashing memory allocation performance issue.

Tom Klein

Mar 29, 2014, 6:58:36 PM
to dedupfilesystem-...@googlegroups.com
Will give it a go on Monday and see if it fixes the replication issue I had with RC1.

Cheers!

Svein-Erik Lund

Mar 29, 2014, 8:19:37 PM
to dedupfilesystem-...@googlegroups.com


On Saturday, March 29, 2014 1:31:44 AM UTC+1, Sam Silverberg wrote:
 
Fixes :
* Fixed replication script to point to updated libraries.

I'm looking at Opendedup for a project where I need replication, and to my disappointment replication failed on both RC1 and RC2.
I've attached the error message I get when testing with the web UI. Am I missing something, or is the replication feature in the web UI not working?

Also, I think it would be useful to provide a copy of the web UI for bare-metal installations, with some documentation on how to set it up.



Replication | error | vol02 | opendedup02.XXXXXXXXXX | Unable to complete replication task for Hourly because java.lang.NumberFormatException: For input string: "" | 100% | 79A7076D-B684-559D-C0F6-5A8612E2FD5A | 03/30/2014 01:04:07 | 03/30/2014 01:04:07

Sam Silverberg

Mar 30, 2014, 1:49:35 PM
to dedupfilesystem-...@googlegroups.com
Sven,

2.0rc2 is not yet compatible with the virtual appliance. You will need to run replication from the command line.
--
You received this message because you are subscribed to the Google Groups "dedupfilesystem-sdfs-user-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dedupfilesystem-sdfs-u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tom Klein

Apr 2, 2014, 10:43:09 AM
to dedupfilesystem-...@googlegroups.com
Replication has been working fine since RC2.

Cheers! 

Sam Silverberg

Apr 2, 2014, 11:03:34 AM
to dedupfilesystem-...@googlegroups.com
Thanks for the feedback.



Tom Klein

Apr 3, 2014, 3:42:09 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,

One more thing on the replication service.

I think the program is not closing its connections properly on localhost (the slave).
After a replication is done, netstat shows a lot of these connections waiting to close:

tcp        1      0 127.0.0.1:44213         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:42439         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:51398         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:48714         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33229         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:51420         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:41333         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:54479         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:52496         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33172         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:36006         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:48965         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:40890         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:40328         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:43614         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:57525         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:42678         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:47053         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33044         127.0.0.1:6442          CLOSE_WAIT

I think this is what causes the following error, which I found in the logs when the replication failed:

2014-04-03 04:00:00,004 [QuartzScheduler_Worker-1] INFO sdfs  - replicating 10.132.96.215:6442:/ to localhost:6442:Replica-%d
2014-04-03 04:00:00,009 [QuartzScheduler_Worker-1] INFO sdfs  - Will keep [ALL] copies of the replicated folder [/]
2014-04-03 04:00:00,009 [QuartzScheduler_Worker-1] WARN sdfs  - unable to finish executing replication
java.io.IOException: java.net.SocketException: Too many open files
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:39)
        at org.opendedup.sdfs.mgmt.cli.ProcessArchiveOutCmd.runCmd(ProcessArchiveOutCmd.java:24)
        at org.opendedup.sdfs.replication.ReplicationService.getRemoteArchive(ReplicationService.java:245)
        at org.opendedup.sdfs.replication.ReplicationService.replicate(ReplicationService.java:143)
        at org.opendedup.sdfs.replication.ReplicationJob.execute(ReplicationJob.java:15)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: java.net.SocketException: Too many open files
        at java.net.Socket.createImpl(Socket.java:447)
        at java.net.Socket.getImpl(Socket.java:510)
        at java.net.Socket.bind(Socket.java:631)
        at sun.security.ssl.BaseSSLSocketImpl.bind(BaseSSLSocketImpl.java:114)
        at sun.security.ssl.SSLSocketImpl.bind(SSLSocketImpl.java:65)
        at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:447)
        at sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:140)
        at org.apache.commons.httpclient.contrib.ssl.EasySSLProtocolSocketFactory.createSocket(EasySSLProtocolSocketFactory.java:188)
        at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.connectAndGet(MgmtServerConnection.java:65)
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.connectAndGet(MgmtServerConnection.java:45)
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:35)
        ... 6 more
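CLOSE_WAIT on the client side means the peer (here the management listener on port 6442) has already closed its end of the connection, but the local process has not yet called close() on its socket. Each such socket keeps a file descriptor open, which fits the "Too many open files" error in the trace above. A minimal sketch for tracking that count between replication runs, assuming the `netstat -tan` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `count_close_wait` is hypothetical:

```shell
# Count TCP sockets in CLOSE_WAIT whose foreign (remote) address ends in :PORT.
# Reads `netstat -tan` style lines on stdin:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_close_wait() {
    port="$1"
    # Header lines are skipped because their first field does not start with "tcp".
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "CLOSE_WAIT" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}
```

Usage: `netstat -tan | count_close_wait 6442`. A number that grows after every run and only drops when the replication service is killed points at the client side not releasing its connections.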

Sam Silverberg

Apr 4, 2014, 11:24:20 AM
to dedupfilesystem-...@googlegroups.com
Thanks Tom,

I just fixed this error for the next release. In the meantime, I would add the following limits and then reboot the system so the changes take effect:

echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
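It is worth checking what limit the running process actually received. `/etc/security/limits.conf` is applied by the pam_limits module when a session starts, so new values only reach processes started after the change (hence the reboot), and a service started outside a PAM session may not get them at all. If sockets are leaking, a higher limit only delays the error. A minimal sketch for Linux, reading `/proc/<pid>/fd` and `/proc/<pid>/limits`; the function name `fd_usage` is hypothetical:

```shell
# Print "<open fds> <soft Max open files limit>" for a PID, using Linux /proc.
# Run as root (or as the process owner) so /proc/<pid>/fd is readable.
fd_usage() {
    pid="$1"
    if [ ! -r "/proc/$pid/limits" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe).
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $soft"
}
```

Usage: `fd_usage "$(pgrep -f org.opendedup.sdfs.replication.ReplicationService)"`, assuming exactly one matching process. If the first number keeps climbing toward the second between runs, descriptors are leaking rather than the limit being too low.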


Tom Klein

Apr 4, 2014, 4:35:10 PM
to dedupfilesystem-...@googlegroups.com
Cheers!

I've written my own quick-start guide that I normally copy and paste from, so I'm fairly sure the limits are in place, but I'll make sure they're there.

Have a great weekend!  

Tom Klein

Apr 7, 2014, 4:09:49 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,

I've checked the limits and those entries were there. 

Sam Silverberg

Apr 7, 2014, 1:58:53 PM
to dedupfilesystem-...@googlegroups.com
Tom,

I will send you a fix shortly. It would be great if you could validate it before the next release.



Tom Klein

Apr 8, 2014, 4:56:20 AM
to dedupfilesystem-...@googlegroups.com
Cheers! 

I'll keep checking the discussion group. 

Sam Silverberg

Apr 10, 2014, 8:28:14 PM
to dedupfilesystem-...@googlegroups.com
Here is the fixed code. Please validate that it fixes your replication issue.

Copy it to /usr/share/sdfs/lib/.



sdfs.jar
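A minimal sketch of installing the patched jar while keeping the previous one for rollback, assuming the attachment was saved locally as `sdfs.jar`; the timestamped backup naming and the function name `install_patched_jar` are hypothetical. Later in the thread both sides are remounted (Tom rebooted) after the jar is replaced, so that the new code is actually loaded.

```shell
# Copy a jar into a library directory, first keeping a timestamped backup
# of any existing jar with the same name.
install_patched_jar() {
    src="$1"      # e.g. ./sdfs.jar as saved from the attachment (assumption)
    libdir="$2"   # e.g. /usr/share/sdfs/lib
    dest="$libdir/$(basename "$src")"
    if [ -f "$dest" ]; then
        # -p preserves mode and timestamps of the old jar in the backup.
        cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp "$src" "$dest"
}
```

Usage (as root): `install_patched_jar ./sdfs.jar /usr/share/sdfs/lib`, then remount or reboot. To roll back, copy the `sdfs.jar.bak.*` file back over `sdfs.jar`.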

Tom Klein

Apr 11, 2014, 9:04:18 AM
to dedupfilesystem-...@googlegroups.com
Thanks Sam!

Will give it a go today and see if it resolves the issue. 

Tom Klein

Apr 12, 2014, 8:29:47 AM
to dedupfilesystem-...@googlegroups.com
Still the same I'm afraid.

tcp        1      0 127.0.0.1:46867         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:51365         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:56057         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:60525         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:51028         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:55383         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:35009         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:36369         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:39412         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:45538         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:35366         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:37813         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33524         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:36292         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:43490         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:58049         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:57701         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:45715         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:58240         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:55910         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:50456         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:37068         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:56585         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:60182         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:49022         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:50432         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:55083         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:46765         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:42328         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:34117         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:45304         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:56703         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:47551         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:59921         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33150         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:38031         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:33300         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:46366         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:53183         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:46773         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:55500         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:39980         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:49085         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:44159         127.0.0.1:6442          CLOSE_WAIT

tcp        1      0 127.0.0.1:51072         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:54625         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:51310         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:38288         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:53163         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:42786         127.0.0.1:6442          CLOSE_WAIT
tcp        1      0 127.0.0.1:32831         127.0.0.1:6442          CLOSE_WAIT

Tom Klein

Apr 12, 2014, 8:33:04 AM
to dedupfilesystem-...@googlegroups.com
After killing the replication service the close waits are all gone.

Sam Silverberg

Apr 12, 2014, 10:27:37 AM
to dedupfilesystem-...@googlegroups.com
You will need to remount both sides as well.



Tom Klein

Apr 12, 2014, 5:43:20 PM
to dedupfilesystem-...@googlegroups.com
Yes, I gave it a full reboot after the patch, so it's been remounted.

Sam Silverberg

Apr 14, 2014, 12:58:20 PM
to dedupfilesystem-...@googlegroups.com
Tom,

I will take another look at this. In my test I got past the "too many open files" error, but some files were still left open. Are you still getting the "too many open files" errors?



Tom Klein

Apr 15, 2014, 4:20:36 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,

No, I haven't seen that error since the patch.
Only a lot of connections in CLOSE_WAIT.

Cheers,

Tom 

Tom Klein

Apr 17, 2014, 4:07:52 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,

The error came back.

java.io.IOException: java.net.SocketException: Too many open files
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:63)
        at org.opendedup.sdfs.mgmt.cli.ProcessArchiveOutCmd.runCmd(ProcessArchiveOutCmd.java:25)
        at org.opendedup.sdfs.replication.ReplicationService.getRemoteArchive(ReplicationService.java:245)
        at org.opendedup.sdfs.replication.ReplicationService.replicate(ReplicationService.java:143)
        at org.opendedup.sdfs.replication.ReplicationJob.execute(ReplicationJob.java:15)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: java.net.SocketException: Too many open files
        at java.net.Socket.createImpl(Socket.java:447)
        at java.net.Socket.getImpl(Socket.java:510)
        at java.net.Socket.bind(Socket.java:631)
        at sun.security.ssl.BaseSSLSocketImpl.bind(BaseSSLSocketImpl.java:114)
        at sun.security.ssl.SSLSocketImpl.bind(SSLSocketImpl.java:65)
        at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:447)
        at sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:140)
        at org.apache.commons.httpclient.contrib.ssl.EasySSLProtocolSocketFactory.createSocket(EasySSLProtocolSocketFactory.java:188)
        at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:51)
        ... 6 more

It seems the replication service is not closing its connections.
If I just kill the replication service, everything works fine again.
No reboot needed.

Tom Klein

Apr 17, 2014, 6:07:02 AM
to dedupfilesystem-...@googlegroups.com
For now I've found a small workaround: kill the process and start it again from cron, so I don't need to restart the machine to clear out the connections.

I just run the following from a script.

jps -l | grep org.opendedup.sdfs.replication.ReplicationService | cut -d ' ' -f 1 | xargs -rn1 kill
/sbin/sdfsreplicate /etc/sdfs/replication.replica.props &
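Restarting on a fixed schedule can kill a replication that happens to be running at that moment. A minimal sketch of a variant, assuming you would rather restart only once leaked connections cross a threshold: PORT=6442 is taken from the netstat output in this thread, while THRESHOLD=500 and all function names are hypothetical and not part of SDFS.

```shell
# Hypothetical cron helper: restart the SDFS replication service only when
# CLOSE_WAIT sockets to the management port have piled up.
PORT="${PORT:-6442}"
THRESHOLD="${THRESHOLD:-500}"   # assumption; tune to your file-descriptor limit

# Count CLOSE_WAIT sockets whose foreign address ends in :$PORT (netstat -tan layout).
leaked_sockets() {
    netstat -tan 2>/dev/null | awk -v port="$PORT" \
        '$6 == "CLOSE_WAIT" && $5 ~ (":" port "$") { n++ } END { print n + 0 }'
}

# Succeeds (exit status 0) when the given count has reached THRESHOLD.
should_restart() {
    [ "$1" -ge "$THRESHOLD" ]
}

# Same kill/start sequence as the workaround above; the output redirect is an
# addition so cron is not left holding the job's output pipe open.
restart_replication() {
    jps -l | grep org.opendedup.sdfs.replication.ReplicationService \
        | cut -d ' ' -f 1 | xargs -rn1 kill
    /sbin/sdfsreplicate /etc/sdfs/replication.replica.props >/dev/null 2>&1 &
}

# Entry point: end the script that cron runs with a call to `main`.
main() {
    count=$(leaked_sockets)
    if should_restart "$count"; then
        restart_replication
    fi
}
```

Save it as a script with a final `main` line and schedule it from cron; the kill and start commands are unchanged from the workaround apart from the redirect.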

Sam Silverberg

Apr 17, 2014, 4:31:03 PM
to dedupfilesystem-...@googlegroups.com
I have another fix and will email it to you later today.

Sam Silverberg

Apr 17, 2014, 6:17:26 PM
to dedupfilesystem-...@googlegroups.com
Tom,

This will fix the issue permanently. Thanks for the feedback!
sdfs.jar

Tom Klein

Apr 18, 2014, 7:58:47 PM
to dedupfilesystem-...@googlegroups.com
Thanks Sam!
Will give this a try on Tuesday because of the Easter weekend.

Have a nice Easter.

Sam Silverberg

Apr 18, 2014, 8:10:41 PM
to dedupfilesystem-...@googlegroups.com
Thanks. I no longer see a proliferation of connections, so I think you are good to go.

Have a good Easter as well.


Tom Klein

Apr 24, 2014, 4:16:27 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,

I've installed the final release, and it resolved the issue with connections stuck in CLOSE_WAIT.
No restarts needed anymore.

Great job!

Thanks! 

Bury Huang

May 1, 2014, 11:07:15 AM
to dedupfilesystem-...@googlegroups.com
Hi Sam,
Is this fix included in the download? I seem to have a similar replication issue: it's hanging (or very slow). I'd like to try your latest code to see if things improve.

Thanks,
Bury

Sam Silverberg

May 1, 2014, 11:12:26 AM
to dedupfilesystem-...@googlegroups.com
The latest code fixes this issue.