problem running Rseg with deadzones


Noboru Sakabe

Apr 29, 2011, 7:01:32 PM
to RSEG Users
Hi, I can run Rseg successfully, but when I use the exact same
command line with -d added, Rseg crashes:

[LOADING_DATA] chromosomes
[LOADING_DATA] dead zones
[LOADING_DATA] ../../../alignments_02/map.bed
[LOADING_DATA] /home/noboru/seq/mnlab/ip_data/input/seq_join02_2011-03-10/alignments_01/map.bed
[LOADING_DATA] separating deserts
[Remove duplicate reads]
[Selecting bin size] use Hideaki's empirical method: ERROR: could not allocate memory

The problem seems to be that my deadzone file (25M) is too large for
Rseg. When I tried with only a few lines of the deadzone file, it
seemed to work fine.

I wonder if the fix is easy, something like changing a variable and
recompiling.

Thanks.

Song, Qiang

Apr 29, 2011, 7:21:16 PM
to rseg-s...@googlegroups.com
Hi Noboru,

I am sorry, but it may take some time to figure out the cause of this memory allocation error.
How many reads are there in your mapped reads file? Could you please try setting the bin size
manually with the -b option? For example, set it to the bin size that Rseg computes when run
without deadzones.

Best,
Song Qiang
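
A quick way to answer the read-count question is to count the data lines in each BED file. The sketch below assumes one mapped read per line and skips blank lines and any track, browser, or # header lines; count_bed_reads is a hypothetical helper name, not part of RSEG.

```shell
# Count mapped reads in a BED file, assuming one read per line.
# Blank lines and track/browser/# header lines are not counted.
count_bed_reads() {
    awk '!/^(track|browser|#)/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

Called as count_bed_reads map.bed, it prints a single integer.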

Noboru Jo Sakabe

Apr 29, 2011, 7:39:55 PM
to rseg-s...@googlegroups.com
Thanks for your quick reply.
Your solution worked, thanks!
My files are huge: my treatment has ~40M reads and my input ~30M.
But Rseg does not seem to have a problem with my IP data; it seems to die because of the deadzones.

One other bug I found was with the deadzones binary. It also had a memory allocation problem, and I posted a bug report in your bug tracker. I couldn't make deadzones work, so I compiled a list of unmappable regions based on the UCSC track.

One other issue I'd like to report is that the sorter binary crashes with my huge files. I was able, however, to use Linux's sort instead (sort -k1,1 -k2,2g MYFILE.bed > MYFILE.sorted.bed). I mention it just in case someone else runs into this problem.

Thanks for making such a useful tool available!

Noboru
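
For anyone hitting the same sorter crash, the workaround above can be wrapped up as in the sketch below. It assumes GNU sort and a tab-separated BED file with integer start coordinates; the function name sort_bed is made up for illustration. It uses -k2,2n rather than -k2,2g, since plain numeric comparison is enough for integer coordinates, and sets LC_ALL=C so that chromosome names are compared byte-wise regardless of locale.

```shell
# Sort a BED file by chromosome name (field 1), then numerically by
# start coordinate (field 2). Sorted output goes to stdout.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}
```

Called as sort_bed MYFILE.bed > MYFILE.sorted.bed. For very large files, GNU sort's -S (buffer size) and -T (temporary directory) options may help control memory and scratch-space use.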

Song, Qiang

Apr 29, 2011, 8:08:56 PM
to rseg-s...@googlegroups.com
Hi Noboru,

In case you have not seen it, here is my reply to the issue about deadzones:

=====

Song Qiang
2011-04-29 13:52:44 PDT
First of all, we previously pre-computed the deadzones for 36bp reads and the
mouse mm9 genome, and we have added the pre-computed deadzone file to the RSEG
website (http://smithlab.cmb.usc.edu/histone/rseg/).

Running the deadzones program is memory- and time-consuming. When we computed
this set of deadzones with the default option -prefix=5, the computational
resources used were cput=17:26:33 and mem=11279300kb (roughly 11GB). Your system
does not seem to have this much memory in the first place. It is unclear,
however, why it crashes when using only ~2GB of memory. A possible cause on the
program side is that it tries to extend a huge string to hold the whole genome.
On the other hand, were you running other programs that are also
memory-consuming?


(In reply to comment #1)
> deadzones can't run to completion in an 8GB RAM system. I monitored RAM usage
> and it reaches ~2GB (25% of my system) and crashes trying to allocate memory.
> I tried the default -p and -p 3.
> 
> $ deadzones -p 3 -s fa -k 36 -o unmappable.mm9.36bp.bed . -v
> 
> [READING SEQUENCE FILES]
> ./chrM.fa    (SEQS: 1)
> ./chr9.fa    (SEQS: 1)
> ./chr4.fa    (SEQS: 1)
> ./chr15.fa    (SEQS: 1)
> ./chrX.fa    (SEQS: 1)
> ./chr2.fa    (SEQS: 1)
> ./chr6.fa    (SEQS: 1)
> ./chrY.fa    (SEQS: 1)
> ./chr10.fa    (SEQS: 1)
> ./chr14.fa    (SEQS: 1)
> ./chr19.fa    (SEQS: 1)
> ./chr12.fa    (SEQS: 1)
> ./chr18.fa    (SEQS: 1)
> ./chr5.fa    (SEQS: 1)
> ./chr11.fa    (SEQS: 1)
> ./chr3.fa    (SEQS: 1)
> ./chr17.fa    (SEQS: 1)
> ./chr13.fa    (SEQS: 1)
> ./chr16.fa    (SEQS: 1)
> ERROR: could not allocate memory
> 
> System info: rseg was compiled and run on Ubuntu 9.04 64-bit (kernel
> 2.6.28-11-generic)

Noboru Jo Sakabe

May 2, 2011, 1:52:32 PM
to rseg-s...@googlegroups.com
Hi Song Qiang, I couldn't find the 36bp deadzones for mm9 (I only see 27bp), which is why I decided to generate them myself.
I apologize that I didn't see the memory requirement, but in any case, deadzones was the only program running that required a lot of memory, and it seemed to die at 2GB.