fuse: fork: Cannot allocate memory


Ian P. Christian

Sep 7, 2011, 8:46:23 AM
to dedupfilesystem-sdfs-user-discuss
When trying to do a mount, I get an error from fuse saying it can't
allocate memory. I don't think this is related to how much memory I'm
giving Java, but I'm not sure what is causing it.

Here's the full output:


root@thq-vmstore01:/usr/share/sdfs# mount.sdfs -v esxbackup -m /store/backups
Running SDFS Version 1.0.9
reading config file = /etc/sdfs/esxbackup-volume-cfg.xml
Loading
########################################################################################################################################################
Running Consistancy Check on DSE, this may take a while
Scanning DSE Finished
Succesfully Ran Consistance Check for [0] records, recovered [0]
22:52:49.547 main INFO [fuse.FuseMount]: Mounted filesystem
fuse: fork: Cannot allocate memory
22:52:49.776 main INFO [fuse.FuseMount]: Filesystem is unmounted


I'm not getting anything in syslog, and as far as I can tell, nothing
is entering swap.
It's a 1000GB volume with a 4KB chunk size, and I've given the mount
script -Xmx12g. The server itself only has 10GB of RAM and 2GB of
swap, but I upped that Xmx from 8 to 12 just to see what would happen -
apparently the result is the same though.

Any suggestions here? How can I go about getting more debug
information?
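One hedged avenue for more output: the mount command (visible in the crash output later in this thread) passes -Dfuse.logging.level=INFO to the JVM. Assuming the FuseLog logger honours the standard DEBUG level - an assumption, not something confirmed in this thread - the flag could be bumped like this:

```shell
# Swap the logging level flag from INFO to DEBUG. Demo against the flag
# string taken from the mount command in this thread; FuseLog actually
# honouring DEBUG is an assumption.
echo '-Dfuse.logging.level=INFO -Xmx12g -Xms2g' \
  | sed 's/fuse.logging.level=INFO/fuse.logging.level=DEBUG/'
# prints: -Dfuse.logging.level=DEBUG -Xmx12g -Xms2g
# To apply for real, run the same sed with -i against /sbin/mount.sdfs.
```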

Sam Silverberg

Sep 7, 2011, 11:24:12 AM
to dedupfilesystem-...@googlegroups.com
Ian,

I am not sure exactly what the issue is, but I suspect it's related to memory. Can you send me your SDFS config so I can test it here?

Thanks,

Sam

Ian P. Christian

Sep 7, 2011, 1:16:10 PM
to dedupfilesystem-sdfs-user-discuss
I stupidly trashed the data after posting, and re-created it again.

I did this because I suspected I might have broken things - earlier
that volume died due to lack of memory (the OOM killer killed Java). I
assumed this might have caused a strange edge-case bug, so I thought
I'd re-try.

So far, so good - I'm copying things over onto it again.

Ian P. Christian

Sep 8, 2011, 5:56:07 AM
to dedupfilesystem-sdfs-user-discuss
I believe I've managed to re-create a similar problem by doing the same thing.

If I have a 1000GB volume, when does it become full? When 1000GB has
been written to it (and therefore df shows it as full), or once 1000GB
of unique chunks has been written?

I assumed it's the latter - and tried to rsync a large amount of data
onto the volume. The rsync died about 8 hours in:

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]:
Broken pipe (32)
rsync: write failed on
"/store/backups/backup/esx06/orac-slave-new/orac-slave-new-2011-09-02_13-19-01/oral-slave-new-flat.vmdk":
Software caused connection abort (103)
rsync: stat "/store/backups/backup/esx06/orac-slave-new/orac-slave-new-2011-09-02_13-19-01/.oral-slave-new-flat.vmdk.JCgfIi"
failed: Transport endpoint is not connected (107)
rsync: connection unexpectedly closed (3121 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at
io.c(601) [sender=3.0.7]

This is what was shown in the console containing the mount command:

/sbin/mount.sdfs: line 4: 6367 Killed
/usr/share/sdfs/jre1.7.0/bin/java
-Djava.library.path=/usr/share/sdfs/bin/
-Dorg.apache.commons.logging.Log=fuse.logging.FuseLog
-Dfuse.logging.level=INFO -Xmx12g -Xms2g -server -XX:+UseG1GC
-XX:+UseCompressedOops -classpath
/usr/share/sdfs/lib/jacksum.jar:/usr/share/sdfs/lib/trove-3.0.0a3.jar:/usr/share/sdfs/lib/slf4j-api-1.5.10.jar:/usr/share/sdfs/lib/slf4j-log4j12-1.5.10.jar:/usr/share/sdfs/lib/quartz-1.8.3.jar:/usr/share/sdfs/lib/commons-collections-3.2.1.jar:/usr/share/sdfs/lib/log4j-1.2.15.jar:/usr/share/sdfs/lib/jdbm.jar:/usr/share/sdfs/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/sdfs/lib/bcprov-jdk16-143.jar:~/java_api/sdfs-bin/lib/commons-codec-1.3.jar:/usr/share/sdfs/lib/commons-httpclient-3.1.jar:/usr/share/sdfs/lib/commons-logging-1.1.1.jar:/usr/share/sdfs/lib/commons-codec-1.3.jar:/usr/share/sdfs/lib/java-xmlbuilder-1.jar:/usr/share/sdfs/lib/jets3t-0.7.4.jar:/usr/share/sdfs/lib/commons-cli-1.2.jar:/usr/share/sdfs/lib/simple-4.1.21.jar:/usr/share/sdfs/lib/jdokan.jar:/usr/share/sdfs/lib/commons-io-1.4.jar:/usr/share/sdfs/lib/sdfs.jar
fuse.SDFS.MountSDFS $*

and in kern.log, I can see it was killed by the OOM killer again:

Sep 8 00:52:49 thq-vmstore01 kernel: [221022.054324] Out of memory:
Kill process 6367 (java) score 1000 or sacrifice child
Sep 8 00:52:49 thq-vmstore01 kernel: [221022.054473] Killed process
6367 (java) total-vm:20096772kB, anon-rss:9707900kB, file-rss:0kB


The first attempt to remount resulted in:

root@thq-vmstore01:/usr/share/sdfs# mount.sdfs -v backup2 -m /store/backups
Running SDFS Version 1.0.9
reading config file = /etc/sdfs/backup2-volume-cfg.xml
Loading #######################################################################################################################################################


Running Consistancy Check on DSE, this may take a while
Scanning DSE Finished
Succesfully Ran Consistance Check for [0] records, recovered [0]

09:36:56.211 main INFO [fuse.FuseMount]: Mounted filesystem
fuse: bad mount point `/store/backups': Transport endpoint is not connected
09:36:56.235 main INFO [fuse.FuseMount]: Filesystem is unmounted

The second attempt:

root@thq-vmstore01:/usr/share/sdfs# mount.sdfs -v backup2 -m /store/backups
Running SDFS Version 1.0.9
reading config file = /etc/sdfs/backup2-volume-cfg.xml
Loading #######################################################################################################################################################
09:50:15.522 main INFO [fuse.FuseMount]: Mounted filesystem


fuse: fork: Cannot allocate memory

09:50:15.756 main INFO [fuse.FuseMount]: Filesystem is unmounted


Attached is the XML config file. Let me know if I can do anything
else to help!

Thanks,


On 7 September 2011 18:16, Ian P. Christian <poo...@pookey.co.uk> wrote:
> I stupidly trashed the data after posting, and re-created it again.

--
Blog: http://pookey.co.uk/blog
Follow me on twitter: http://twitter.com/ipchristian

backup2-volume-cfg.xml

Sam Silverberg

Sep 8, 2011, 10:50:32 AM
to dedupfilesystem-...@googlegroups.com
Ian,

I think your system might not be capable of using that much memory. You mentioned before that the total system memory was 10 GB plus 2 GB of swap, yet in the output above it looks like you are trying to allocate 12 GB to Java.

For a 1 TB volume you will need at most 8 GB of memory. Try editing mount.sdfs and changing the heap flags to -Xmx8000m -Xmn1600m.
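The 8 GB figure is roughly what a per-unique-chunk in-memory hash entry would predict. A back-of-envelope sketch (the ~32 bytes per entry is an assumption for illustration, not an SDFS-documented number):

```shell
# 1000 GiB volume at a 4 KiB chunk size -> maximum number of unique chunks
chunks=$(( 1000 * 1024 * 1024 * 1024 / 4096 ))   # 262,144,000 chunks
# Assume ~32 bytes of in-memory metadata per chunk (a guess, not documented)
heap_bytes=$(( chunks * 32 ))                    # 8,388,608,000 bytes
echo "$chunks chunks -> ~$(( heap_bytes / 1000000000 )) GB of heap"
# prints: 262144000 chunks -> ~8 GB of heap
```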

If that does not work, reduce the size of the local-chunkstore from allocation-size="1073741824000" to allocation-size="805306368000" and change the flags to -Xmx7000m -Xmn1600m.
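For reference, both allocation-size values are exact GiB counts, so this edit shrinks the volume from 1000 GiB to 750 GiB. A quick check of the constants, with the actual edit left as a commented-out suggestion (the config path is taken from the mount output above):

```shell
# Verify the two allocation-size constants:
test $(( 1000 * 1024 * 1024 * 1024 )) -eq 1073741824000 && echo '1000 GiB ok'
test $((  750 * 1024 * 1024 * 1024 )) -eq  805306368000 && echo '750 GiB ok'
# Hypothetical one-liner to apply the change (back up the file first):
# sed -i 's/allocation-size="1073741824000"/allocation-size="805306368000"/' \
#     /etc/sdfs/backup2-volume-cfg.xml
```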

It looks like you are getting about 85% deduplication rates and are using at most 300GB of space for 2TB of data.

Ian P. Christian

Sep 8, 2011, 4:22:11 PM
to dedupfilesystem-...@googlegroups.com
The first suggestion didn't fix it - but the second did.

Thanks very much!

However - what does this mean exactly? What are the effects of this change?

Thanks again
