I'm unable to read written data


Andrey

Sep 14, 2016, 12:13:33 PM
to dedupfilesystem-sdfs-user-discuss
Hi,
I'm having trouble reading data back from a volume: for one particular file I can only read about 29% of it (and 54% on the next attempt).
I didn't see any exceptions during the write operations.

my volume config is:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><subsystem-config version="3.2.1">
<locations dedup-db-store="/sdfs/volumes/awspool/ddb" io-log="/sdfs/volumes/awspool/ioperf.log"/>
<io chunk-size="1024" claim-hash-schedule="0 59 23 * * ?" dedup-files="true" hash-type="VARIABLE_MURMUR3" log-level="1" max-file-inactive="900" max-file-write-buffers="1" max-open-files="1024" max-variable-segment-size="32" meta-file-cache="1024" read-ahead="true" safe-close="true" safe-sync="true" write-threads="16"/>
<permissions default-file="0644" default-folder="0755" default-group="0" default-owner="0"/>

<sdfscli enable="true" enable-auth="false" listen-address="localhost" password="****" port="6442" salt="****"/>
<local-chunkstore allocation-size="54975581388800" average-chunk-size="8192" chunk-store="/sdfs/volumes/awspool/chunkstore/chunks" chunkstore-class="org.opendedup.sdfs.filestore.BatchFileChunkStore" cluster-config="/etc/sdfs/jgroups.cfg.xml" cluster-dse-password="****" cluster-id="sdfs-cluster" compress="true" enabled="true" encrypt="false" encryption-iv="****" encryption-key="******" gc-class="org.opendedup.sdfs.filestore.gc.PFullGC" hash-db-store="/sdfs/volumes/awspool/chunkstore/hdb-1303545904338133879" hashdb-class="org.opendedup.collections.ProgressiveFileBasedCSMap" io-threads="8" low-memory="false" max-repl-batch-sz="128">
<network enable="false" hostname="0.0.0.0" port="2222" use-ssl="false"/>
<extended-config allow-sync="false" block-size="1024 MB" default-bucket-location="us-east-1" delete-unclaimed="true" glacier-archive-days="0" io-threads="16" local-cache-size="100 GB" map-cache-size="200" read-speed="0" simple-s3="false" sync-check-schedule="4 59 23 * * ?" sync-files="true" upload-thread-sleep-time="600000" write-speed="0"/>
<aws aws-access-key="****" aws-aim="false" aws-bucket-name="*****" aws-secret-key="****" chunkstore-class="org.opendedup.sdfs.filestore.cloud.BatchAwsS3ChunkStore" enabled="true"/>
</local-chunkstore>
<volume allow-external-links="true" capacity="50 TB" closed-gracefully="false" cluster-block-copies="2" cluster-id="sdfs-cluster" cluster-rack-aware="false" cluster-response-timeout="1024000" current-size="0" dse-comp-size="0" dse-size="0" duplicate-bytes="0" maximum-percentage-full="0.95" name="awspool" path="/sdfs/volumes/awspool/files" perf-mon-file="/sdfs//logs/volume-awspool-perf.json" read-bytes="0.0" read-timeout-seconds="-1" serial-number="1303545904338133879" sync-files="false" use-dse-capacity="true" use-dse-size="true" use-perf-mon="false" volume-clustered="false" write-bytes="0" write-timeout-seconds="-1"/></subsystem-config>

In the logs I see:

2016-09-14 10:04:49,863 [sdfs] [org.opendedup.collections.ProgressiveFileByteArrayLongMap] [347] [pool-1-thread-1]  - sz=89478407 maxSz=67108805
2016-09-14 10:04:49,865 [sdfs] [org.opendedup.collections.ProgressiveFileByteArrayLongMap] [352] [pool-1-thread-1]  - set table to size 2147481768
2016-09-14 10:04:49,893 [sdfs] [org.opendedup.collections.ProgressiveFileByteArrayLongMap] [467] [pool-1-thread-1]  - Percentage full=0.0 full=false
2016-09-14 12:53:12,613 [sdfs] [org.opendedup.sdfs.filestore.HashBlobArchive$2] [271] [pool-5-thread-3]  - unable to fetch hashmap [7620927707499785126]
java.lang.Exception: unable to find file /sdfs/volumes/awspool/chunkstore/chunks/762/7620927707499785126
        at org.opendedup.sdfs.filestore.HashBlobArchive$2.load(HashBlobArchive.java:268)
        at org.opendedup.sdfs.filestore.HashBlobArchive$2.load(HashBlobArchive.java:1)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
...
2016-09-14 12:53:12,768 [sdfs] [org.opendedup.sdfs.filestore.HashBlobArchive] [1140] [pool-5-thread-16]  - unable to read at 967986517 0 flen 0 file=/sdfs/volumes/awspool/chunkstore/chunks/762/7620927707499785126
java.util.concurrent.ExecutionException: java.io.IOException: unable to read 7620927707499785126
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:455)
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:143)
        at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3573)
        at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2306)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
        at org.opendedup.sdfs.filestore.HashBlobArchive.getChunk(HashBlobArchive.java:1101)
        at org.opendedup.sdfs.filestore.HashBlobArchive.getBlock(HashBlobArchive.java:658)
        at org.opendedup.sdfs.filestore.cloud.BatchAwsS3ChunkStore.getChunk(BatchAwsS3ChunkStore.java:255)
        at org.opendedup.sdfs.filestore.ChunkData.getChunk(ChunkData.java:198)
        at org.opendedup.collections.ProgressiveFileBasedCSMap.getData(ProgressiveFileBasedCSMap.java:767)
        at org.opendedup.sdfs.filestore.HashStore.getHashChunk(HashStore.java:216)
        at org.opendedup.sdfs.servers.HashChunkService.fetchChunk(HashChunkService.java:155)
        at org.opendedup.sdfs.servers.HCServiceProxy.fetchChunk(HCServiceProxy.java:588)
        at org.opendedup.sdfs.io.WritableCacheBuffer$Shard.run(WritableCacheBuffer.java:1190)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: unable to read 7620927707499785126
        at org.opendedup.sdfs.filestore.HashBlobArchive$2.load(HashBlobArchive.java:274)
        at org.opendedup.sdfs.filestore.HashBlobArchive$2.load(HashBlobArchive.java:1)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)

Later in the logs these messages just repeat, like this:
2016-09-14 16:09:08,730 [sdfs] [org.opendedup.sdfs.filestore.HashBlobArchive] [1140] [pool-5-thread-16]  - unable to read at 1475854 0 flen 0 file=/sdfs/volumes/awspool/chunkstore/chunks/-826/-8268176452249939125
java.util.concurrent.ExecutionException: java.io.IOException: unable to read -8268176452249939125

On the next try I got 54% of the file (I'm reading the file sequentially).


Is there any way to deal with this? Or are my volume parameters bad?

Sam Silverberg

Sep 14, 2016, 1:07:09 PM
to dedupfilesystem-...@googlegroups.com
I have not tested with that large of a block size. I think that might be the issue. Is there a reason you set the block size to 1024MB? Is this a test system or production data?
I would create a new sdfs volume and use a block size of 60MB-100MB. 
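For illustration only (not a tested setting), the knob involved is the block-size attribute in the extended-config element of the volume XML you posted; with only that value changed and all other attributes left as in your current config it would look something like:

<extended-config block-size="100 MB" local-cache-size="100 GB" ... />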


Andrey

Sep 14, 2016, 1:24:06 PM
to dedupfilesystem-sdfs-user-discuss
>> Is there a reason you set the block size to 1024MB?
I found that the maximum block size is 2048 MB - 1 (with a larger block size the system is not able to operate), so I used half of that.
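(I assume that limit comes from a signed 32-bit byte offset: 2^31 bytes = 2,147,483,648 bytes = 2048 MB, so a block of 2048 MB or more can no longer be addressed with an int.)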

What I'm trying to achieve is to have as few blocks on S3 as possible, and I also want the blocks to be full, so I set upload-thread-wait to 5 min.

>> I would create a new sdfs volume and use a block size of 60MB-100MB.
100 MB is a small block for us, as it would result in 10 times more objects on S3. My tests with smaller blocks (50 MB) were successful (with smaller files).
Initially I wanted to use 2 GB blocks, but that failed.
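Rough arithmetic behind the "10 times more objects" point, assuming for example 1 TB of unique data just for scale:

1 TB / 1024 MB per block ≈ 1,024 objects on S3
1 TB / 100 MB per block  ≈ 10,486 objects on S3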


>> Is this a test system or production data?
This is something like the final testing of our system, so it is a test; we can afford to lose all that data.

Could the reason also be that we have big files (some files are 1 TB)?


>> I have not tested with that large of a block size.
Oh, OK. Do you plan to do that kind of testing in the future?

Sam Silverberg

Sep 14, 2016, 1:42:38 PM
to dedupfilesystem-...@googlegroups.com
I think the issue with the large blocks is that each block is 1/100th of the cache, so the local block is being aged out of the cache before the I/O is done using it. If the local cache were 1 TB this might not be an issue. You might want to try that, or just use smaller blocks.
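If you keep the 1024 MB blocks and just raise the cache, the attribute to change is local-cache-size in the extended-config element of your posted config, e.g. (other attributes unchanged; I'm assuming the size parser accepts TB, otherwise use "1024 GB"):

<extended-config block-size="1024 MB" local-cache-size="1 TB" ... />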

Andrey

Sep 16, 2016, 1:23:29 AM
to dedupfilesystem-sdfs-user-discuss
Hi,
that actually worked: with a cache of 1 TB I was able to restore from that volume, which is cool :) Thank you!

I have two more questions for you:

1. Every day I see an NPE; it looks like some scheduled task is failing:
2016-09-16 02:41:08,545 [sdfs] [org.opendedup.collections.ProgressiveFileBasedCSMap] [323] [QuartzScheduler_Worker-1]  - removed [0] records
2016-09-16 02:41:08,588 [sdfs] [org.opendedup.mtools.SyncFS] [108] [QuartzScheduler_Worker-1]  - Cloud Storage Conistancy Check failed
java.lang.NullPointerException
        at org.opendedup.mtools.SyncFS.init(SyncFS.java:77)
        at org.opendedup.mtools.SyncFS.<init>(SyncFS.java:59)
        at org.opendedup.fsync.GCJob.execute(GCJob.java:32)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
2016-09-16 02:41:08,593 [sdfs] [org.opendedup.fsync.GCJob] [34] [QuartzScheduler_Worker-1]  - SyncFS Job Failed
java.lang.NullPointerException
        at org.opendedup.mtools.SyncFS.init(SyncFS.java:109)
        at org.opendedup.mtools.SyncFS.<init>(SyncFS.java:59)
        at org.opendedup.fsync.GCJob.execute(GCJob.java:32)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Is this OK?

2. And another question:
I tried to build SDFS from source but failed.
a) jfuse/src/build.xml is outdated: source/target was set to 1.5 and compilation failed, so I changed it to 1.8 (see the snippet at the end of this message).
b) jfuse/src/jni/javafs.c cannot be compiled because javafs_destroy(void *mt) has no env variable defined; I made a small patch to pass it in.
c) As I found, you earlier mentioned https://sourceforge.net/projects/fuse-j/ as an external dependency, but that project is outdated,
   so I got several errors on build: "duplicate member `array`" and "no member named ‘read__Ljava_nio_ByteBuffer_J_Ljava_nio_ByteBuffer_J’".
Could you please give me any hints on these?
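For reference, the change in (a) was just bumping the javac source/target in jfuse/src/build.xml, roughly like this (a sketch, not the exact file contents; srcdir/destdir here are placeholders):

<javac srcdir="${src}" destdir="${build}" source="1.8" target="1.8"/>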