SDFS Buckles Under Load (Windows 10 build 15014.rs_prerelease.170115-1253, x86-64)

161 views
Skip to first unread message

Tyson Key

unread,
Feb 13, 2017, 9:45:25 AM2/13/17
to dedupfilesystem-sdfs-user-discuss
Hi,

I started playing with OpenDedup, under a preview build of Windows 10, last night, and was able to induce several different failure modes, when testing under a relatively-modest load, on a Core2Duo laptop, with 4GB of RAM (which is not exactly state-of-the-art, to be honest), involving moving/copying about 5-10GB of files using Windows Explorer, whilst attempting to manipulate a local volume using sdfscli. 

The volume started life as 20GB, but I attempted to expand it to 25GB, only for Explorer to continue to misleadingly report it as 20GB - even after eventually manually editing the XML definition file to fake being cleanly unmounted, thanks to the integrity checking routines failing, post-crash, and leaving a nice big 200GB+ log file (which compresses to 500MB, using WinRAR - and would probably provide a nice postmortem, if anyone actually has the disk space, bandwidth, tools, willpower, and time to comb through it, without going insane; although I managed to read part of it, using a port of the UNIX "tail" command, and saw that an attempt to read an invalid hashtable ID was made).

Often, the sdfscli tool will simply stall, and eventually fail with the following output, after the server process seemingly just dies:


C
:\Windows\System32>sdfscli  --volume-info
java
.io.IOException: java.net.SocketTimeoutException: Read timed out
        at org
.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:77)
        at org
.opendedup.sdfs.mgmt.cli.ProcessVolumeInfo.runCmd(ProcessVolumeInfo.java:19)
        at org
.opendedup.sdfs.mgmt.cli.SDFSCmdline.parseCmdLine(SDFSCmdline.java:124)
        at org
.opendedup.sdfs.mgmt.cli.SDFSCmdline.main(SDFSCmdline.java:578)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java
.net.SocketInputStream.socketRead0(Native Method)
        at java
.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java
.net.SocketInputStream.read(SocketInputStream.java:170)
        at java
.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun
.security.ssl.InputRecord.readFully(InputRecord.java:465)
        at sun
.security.ssl.InputRecord.read(InputRecord.java:503)
        at sun
.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
        at sun
.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
        at sun
.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:735)
        at sun
.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
        at java
.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java
.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at org
.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
        at org
.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(MultiThreadedHttpConnectionManager.java:1565)
        at org
.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
        at org
.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at org
.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org
.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org
.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org
.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org
.opendedup.sdfs.mgmt.cli.MgmtServerConnection.getResponse(MgmtServerConnection.java:65)
       
... 3 more


At least once, I managed to hang Explorer, to the point where the process required restarting as Administrator; and seemingly got my volume into an inconsistent state, where the mount-time integrity check gets stuck at 98%, even if I leave it overnight (which meant that I stayed awake until 3am, and now feel like garbage!), but there doesn't seem to be much/any noticeable corruption, from randomly looking at files, now (although not all of the modification date metadata appears).

Despite actually having something approximating 11GB+ of data in the volume, sdfscli only reports the existence of newly-added stuff, post-crash:

C:\WINDOWS\system32>sdfscli  --volume-info
Files : 3548
Volume Capacity : 25 GB
Volume Current Logical Size : 7.35 GB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 3.86 GB
Unique Blocks Stored: 4.34 GB
Unique Blocks Stored after Compression : 4.36 GB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : 40.95%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 40.75%
Compression Rate: -0.33%


Additionally, if the server/engine process dies, Windows Explorer will report "An unexpected error is keeping you from moving the file. If you continue to receive this error, you can use the error code to search for help with this problem. Error 0x800705AA: Insufficient system resources exist to complete the requested service".

Attaching one of my logs, and configuration file, if it helps.

Tyson.



BigStore-volume-cfg.zip
BigStore-volume-cfg.xml

Tyson Key

unread,
Feb 13, 2017, 10:12:50 AM2/13/17
to dedupfilesystem-sdfs-user-discuss
Managed to break it again, and got a 1GB+ log - which mostly has the following, at the end:

2017-02-13 14:53:51,574 [sdfs] [org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter] [125] [Thread-30]  - error getting string result for 1391719488346728699
java.io.IOException: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1235)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
... 6 more
2017-02-13 14:53:51,575 [sdfs] [org.opendedup.sdfs.filestore.HashBlobArchive] [1234] [Thread-30]  - error getting size
com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
2017-02-13 14:53:51,575 [sdfs] [org.opendedup.sdfs.filestore.BatchFileChunkStore] [674] [Thread-30]  - unable to get strings for 1391719488346728699
java.io.IOException: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1235)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
... 6 more
2017-02-13 14:53:51,575 [sdfs] [org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter] [125] [Thread-30]  - error getting string result for 1391719488346728699
java.io.IOException: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1235)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
... 6 more
2017-02-13 14:53:51,576 [sdfs] [org.opendedup.sdfs.filestore.HashBlobArchive] [1234] [Thread-30]  - error getting size
com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
2017-02-13 14:53:51,576 [sdfs] [org.opendedup.sdfs.filestore.BatchFileChunkStore] [674] [Thread-30]  - unable to get strings for 1391719488346728699
java.io.IOException: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1235)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
... 6 more
2017-02-13 14:53:51,576 [sdfs] [org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter] [125] [Thread-30]  - error getting string result for 1391719488346728699
java.io.IOException: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1235)
at org.opendedup.sdfs.filestore.HashBlobArchive.getStrings(HashBlobArchive.java:171)
at org.opendedup.sdfs.filestore.BatchFileChunkStore.getStringResult(BatchFileChunkStore.java:667)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.addStringResult(MultiDownload.java:103)
at org.opendedup.sdfs.filestore.cloud.MultiDownload.access$1(MultiDownload.java:99)
at org.opendedup.sdfs.filestore.cloud.MultiDownload$KeyGetter.run(MultiDownload.java:122)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key 1391719488346728699.
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2354)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2324)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.opendedup.sdfs.filestore.HashBlobArchive.getHashesString(HashBlobArchive.java:1217)
... 6 more


From sdfscli:

C:\Windows\System32>mountsdfs -v BigStore -m B
Running Program SDFS Version 3.3.6
reading config file = C:\Program Files\sdfs\etc\BigStore-volume-cfg.xml
Loading Existing Hash Tables |))))))))))))))))))))))))))))))))))))))))))))))))))| 100%

Loading BloomFilters |))))))))))))))))))))))))))))))))))))))))))))))))))| 100%

Waiting for last bloomfilters to load
ReCreating BloomFilters |))))))))))))))))))))))))))))))))))))))))))))))))))| 100%

ReCreating BloomFilters |))))))))))))))))))))))))))))))))))))))))))))))))))| 100%

Waiting for all BloomFilters creation threads to finish done
Loaded entries 1069726
Running Consistancy Check on DSE, this may take a while
Scanning DSE |)))))))))))))))))))))))))))))))))))))))))))))))]  | 95% ^C

Tyson.
Reply all
Reply to author
Forward
0 new messages