I have two short questions about the fourth step of generating the reference files.
First: when I run this command, I get strange results:
module load Java/1.7.0_21
/home/ulg/genan/vhahaut/JAFFA-version-1.06/tools/bin/reformat fastawrap=0 in=JAFFA-cow/bosTau6.fa out=stdout.fa | sed 's/ /__/g' > bosTau6_fixed.fa
ModuleCmd_Load.c(208):ERROR:105: Unable to locate a modulefile for 'oracle-jdk/1.7_64bit'
ModuleCmd_Load.c(208):ERROR:105: Unable to locate a modulefile for 'pigz'
ModuleCmd_Load.c(208):ERROR:105: Unable to locate a modulefile for 'samtools'
java -ea -Xmx200m -cp /home/ulg/genan/vhahaut/JAFFA-version-1.06/tools/bbmap/current/ jgi.ReformatReads fastawrap=0 in=JAFFA-cow/bosTau6.fa out=stdout.fa
Executing jgi.ReformatReads [fastawrap=0, in=JAFFA-cow/bosTau6.fa, out=stdout.fa]
Input is being processed as unpaired
Exception in thread "Thread-1" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at stream.FastaReadInputStream.fillBuffer(FastaReadInputStream.java:442)
at stream.FastaReadInputStream.nextHeader(FastaReadInputStream.java:310)
at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:205)
at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:137)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:745)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:737)
Is this expected? (The command seems to take a long time to finish; it has already been running for 4 hours.)
The second question concerns the sed part of this command line: is it really sed 's/*space*/*underscore underscore*/g', or am I missing something?
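To make the sed question concrete, here is a small check on a made-up FASTA record. The header text and the function name fix_header_spaces are invented for illustration; only the sed expression comes from the command above. Every space becomes two underscores, and lines without spaces pass through unchanged.

```shell
# Hypothetical wrapper around the sed expression from the command above.
fix_header_spaces() {
    sed 's/ /__/g'
}

# Made-up two-line FASTA record; only the header contains spaces.
printf '>chr1 Bos taurus test\nACGTACGT\n' | fix_header_spaces
# >chr1__Bos__taurus__test
# ACGTACGT
```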
I am using:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Linux Redhat 2.6.32-504.30.3.el6.x86_64
Thanks in advance!
Vincent
Hi Vincent,
The sed part of the command is okay, I think. It looks like Java is running out of heap memory when the reformat command is executed. On a normal-sized genome this command should not take longer than a minute. I think you should kill the job and check your memory settings for Java. You might need to set the heap size to be larger. I’m not really an expert in this, but a quick Google search gives some info like this, http://www.mkyong.com/java/find-out-your-java-heap-memory-size/, which might be useful. If you’ve got a system administrator, maybe they can help. My MaxHeapSize is 32038191104, if that’s any help.
You could use the fastx tools command fasta_formatter to do the same thing as reformat, but I think there’s a really high chance you would run into the same problem when you run JAFFA, as it also uses the reformat command in its pipeline.
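As a rough tool-free sketch of the same idea, the unwrapping could be approximated with awk, assuming fastawrap=0 simply means "put each sequence on a single line". The function name unwrap_fasta is hypothetical and this is not what reformat or JAFFA does internally; it only streams the file line by line.

```shell
# Hypothetical helper: join wrapped sequence lines so each record is
# one header line followed by one sequence line.
unwrap_fasta() {
    awk '
        /^>/ { if (seq) print ""; print; seq = 0; next }  # close previous sequence, emit header
        { printf "%s", $0; seq = 1 }                      # append sequence text without a newline
        END { if (seq) print "" }                         # terminate the last sequence
    ' "$@"
}

# Made-up input with a wrapped sequence.
printf '>a b\nAC\nGT\n>c\nTT\n' | unwrap_fasta | sed 's/ /__/g'
# >a__b
# ACGT
# >c
# TT
```

With a real file this would be called as unwrap_fasta JAFFA-cow/bosTau6.fa | sed 's/ /__/g' > bosTau6_fixed.fa, although it would not change what the JAFFA pipeline itself runs.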
Best of luck.
Cheers,
Nadia.
Indeed, the MaxHeapSize may not be appropriate. I will try to find a way to deal with that.
uintx AdaptivePermSizeWeight = 20 {product}
intx CompilerThreadStackSize = 0 {pd product}
uintx ErgoHeapSizeLimit = 0 {product}
uintx InitialHeapSize := 536870912 {product}
uintx LargePageHeapSizeThreshold = 134217728 {product}
uintx MaxHeapSize := 536870912 {product}
uintx MaxPermSize = 85983232 {pd product}
uintx PermSize = 21757952 {pd product}
intx ThreadStackSize = 1024 {pd product}
intx VMThreadStackSize = 1024 {pd product}
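The MaxHeapSize values are byte counts, so a quick conversion makes the two machines easier to compare. The sketch below uses hypothetical helper names (bytes_to_mib, max_heap_bytes) and assumes the listing comes from something like java -XX:+PrintFlagsFinal -version. Note also that the java line echoed by reformat passes -Xmx200m, which sets the maximum heap to 200 MB for that run whatever the default MaxHeapSize is.

```shell
# Convert a byte count (as printed for MaxHeapSize) to whole mebibytes.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Pull the MaxHeapSize value out of JVM flag output read from stdin.
max_heap_bytes() {
    awk '$2 == "MaxHeapSize" { print $4 }'
}

line='uintx MaxHeapSize := 536870912 {product}'
bytes_to_mib "$(printf '%s\n' "$line" | max_heap_bytes)"   # prints 512
bytes_to_mib 32038191104                                   # prints 30554
```

So the default here is 512 MiB, against roughly 30 GiB on the other machine.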
Thanks!
I installed java 1.7.0_79, added it to my path, and tried the command again on a Debian Linux machine which has the amount of memory you suggested.
I got this error while running the reformat script:
../tools/bin/reformat fastawrap=0 in=hg19_genCode.fasta out=stdout.fa | sed 's/ /__/g' > hg19_genCode.fa
......./tools/bbmap/reformat.sh: line 123: module: command not found
......./tools/bbmap/reformat.sh: line 124: module: command not found
......./tools/bbmap/reformat.sh: line 125: module: command not found
java -ea -Xmx200m -cp ......./tools/bbmap/current/ jgi.ReformatReads fastawrap=0 in=hg19_genCode.fasta out=stdout.fa
Executing jgi.ReformatReads [fastawrap=0, in=hg19_genCode19.fasta, out=stdout.fa]
Input is being processed as unpaired
Input: 99900 reads 195726195 bases
Output: 99900 reads (100.00%) 195726195 bases (100.00%)
Time: 4.425 seconds.
Reads Processed: 99900 22.58k reads/sec
Bases Processed: 195m 44.24m bases/sec
Is it normal that you're calling module load here? It seems that my JAFFA can't find them since they are not in module avail on my machine. Should I replace them with something else (like PATH=./samtools:$PATH)? Does the command JAFFA-bpipe run work without these modules?
#########################
function reformat() {
    #module unload oracle-jdk
    #module unload samtools
    module load oracle-jdk/1.7_64bit
    module load pigz
    module load samtools
    local CMD="java $EA $z -cp $CP jgi.ReformatReads $@"
    echo $CMD >&2
    $CMD
}
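One possible way to make such a function tolerate machines without environment modules is to guard each call, as sketched below. try_module_load is a hypothetical helper, not part of JAFFA or BBMap, and the sketch assumes java, pigz and samtools are already reachable through PATH when no module system exists. In the run above, the reformat step still completed after the "module: command not found" messages.

```shell
# Hypothetical helper (not part of JAFFA or BBMap): call `module load`
# only when a `module` command or function is defined in this shell.
try_module_load() {
    if command -v module >/dev/null 2>&1; then
        # If module load reports failure, note it and carry on.
        module load "$1" || echo "could not load module '$1'; relying on PATH" >&2
    else
        echo "no module command; relying on PATH for '$1'" >&2
    fi
    return 0
}

try_module_load samtools
```

Inside the function, the three module load lines would then become try_module_load calls with the same arguments.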
Thanks in advance,
Vincent