CbsModel or writeRegions error

Emanuel Gonçalves

Aug 31, 2017, 4:26:29 AM
to aroma.affymetrix
Hi all,

I'm processing a large set of SNP6 samples (1020) and running CBS segmentation afterwards. It's a very lengthy process, and the annoying part is that it systematically crashes on the last sample; see the code and output below:

# Run CBS paired segmentation with pooled normal samples
cbs <- CbsModel(cesN, cesN1.ref)
fit(cbs, verbose=verbose)
print(cbs)

# Export CBS regions
pathname <- writeRegions(cbs, verbose=verbose)


20170831 06:57:51| Pathname: cbsData/gdsc_all,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6/YKG-1,chr25,6c653767ed90d29ad5254a760d8f3952.xdr
20170831 06:57:51| Already done. Skipping.
20170831 06:57:51|Array #1018 ('YKG-1') of 1019 on chromosome 25...done
20170831 07:01:07|Genomic-signal tags:
20170831 07:01:07|Reference tags: 6c653767ed90d29ad5254a760d8f3952
20170831 07:01:08|Array #1019 ('YMB-1-E') of 1019 on chromosome 1...
20170831 07:01:08| Pathname: cbsData/gdsc_all,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6/YMB-1-E,chr01,6c653767ed90d29ad5254a760d8f3952.xdr
20170831 07:01:08| Already done. Skipping.
Error: empty (zero-byte) input file
Execution halted

At first I thought it was a problem with the sample, but that is not the case, as I could process it separately. In fact, at this stage all the samples seem to be processed. Also strange is that the script works and exports the regions for subsets of the dataset.

Any suggestions on how to export these would be much appreciated.

Thank you,

P.S: I'm using the multithreaded option of aroma

# - Setup configurations
# options(mc.cores = 10)
print(future::availableCores())

# Multithreaded
future::plan('multiprocess')

# Increase ram
setOption(aromaSettings, 'memory/ram', 600.0)

# Logs
log <- verbose <- Arguments$getVerbose(-4, timestamp=TRUE)

# Reduce decimal places to minimize space
options(digits=4)


Henrik Bengtsson

Aug 31, 2017, 1:26:36 PM
to aroma-affymetrix
Hi,

does the error occur when you run:

fit(cbs, verbose=verbose)

or did that complete successfully and you get the error while running:

pathname <- writeRegions(cbs, verbose=verbose)

If you run interactively, what does traceback() output if called
immediately after the error occurs?

Also, see if

fit(cbs, arrays = 1019:1020, chromosomes = c(1, 25), verbose = verbose)

or alternatively,

pathname <- writeRegions(cbs, arrays = 1019:1020, chromosomes = c(1, 25), verbose=verbose)

gives the same error. Then you can get to traceback() a bit sooner.

/Henrik

PS. I assume that you already know that rerunning the commands will
not redo the actual analysis; already processed samples will be
skipped - though with 1020 samples it might still take some time.
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.

Emanuel Gonçalves

Aug 31, 2017, 5:42:18 PM
to aroma.affymetrix
Hi Henrik,

> does the error occur when you run:
>
>   fit(cbs, verbose=verbose)

Yes, because I never got the "print(cbs)" output.

> or did that complete successfully and you get the error while running:
>
>   pathname <- writeRegions(cbs, verbose=verbose)
>
> If you run interactively, what does traceback() output if called
> immediately after the error occurs?

Unfortunately, I can't run this interactively because, even though the samples are already preprocessed, it takes almost a day to load everything.

> Also, see if
>
>   fit(cbs, arrays = 1019:1020, chromosomes = c(1, 25), verbose = verbose)
>
> or alternatively,
>
>   pathname <- writeRegions(cbs, arrays = 1019:1020, chromosomes = c(1, 25), verbose=verbose)
>
> gives the same error.

I changed the code accordingly:

fit(cbs, arrays=1019:1020, chromosomes=c(1, 25), verbose=verbose)

It ran without throwing any error (I tried before the same with a subset of ~100 samples and it also worked). 

It seems to be a problem specific to running all 1020 samples together. I've wrapped the call in a try/catch with traceback().

# Run CBS paired segmentation with pooled normal samples
cbs <- CbsModel(cesN, cesN1.ref)
out <- tryCatch(
  {
    fit(cbs, arrays=1019:1020, chromosomes=c(1, 25), verbose=verbose)
  },
  # Handlers are called with the condition object, so they must accept one argument
  error=function(e) {
    message('Try/catch: error')
    traceback()
  },
  warning=function(w) {
    message('Try/catch: warning')
    traceback()
  },
  finally={
    message('Try/catch: finally')
    traceback()
  }
)
print(cbs)
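
A note on the try/catch approach: by the time a tryCatch() handler runs, the call stack of the failing code has already been unwound, so traceback() called there does not describe the error just caught. The following is a minimal sketch of one alternative for batch runs, using base R's withCallingHandlers() and sys.calls() to record the stack while the failing frames still exist. The helper name run_with_trace is hypothetical and not part of aroma.affymetrix; the stop() call merely simulates the error from the log.

```r
# Hypothetical helper (not part of aroma.affymetrix): evaluate an expression
# and, if it signals an error, record the call stack at the point of failure.
run_with_trace <- function(expr) {
  trace <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # Runs before the stack is unwound, so the failing frames are visible
        trace <<- sys.calls()
      }
    ),
    # Return the condition object instead of halting the script
    error = function(e) e
  )
  list(result = result, trace = trace)
}

# Simulated failure standing in for fit(cbs, verbose=verbose)
res <- run_with_trace(stop("empty (zero-byte) input file"))
if (inherits(res$result, "error")) {
  message("Error: ", conditionMessage(res$result))
  print(res$trace)
}
```

In the actual script, the simulated stop() would be replaced by the fit() call, and the printed calls would end up in the batch log.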

I started the script with all the samples (without "arrays=1019:1020") as before.

What surprised me is that the script continued to the writeRegions call and it started to export all the samples.

pathname <- writeRegions(cbs, verbose=verbose)

> # Export CBS regions
> pathname <- writeRegions(cbs, verbose=verbose)
20170831 21:53:10|Array #1 ('201T') of 1019...
20170831 21:53:10| Extracting regions from all fits...
20170831 21:53:10|  Obtaining CN model fits (or fit if missing)...
20170831 21:57:15|  Obtaining CN model fits (or fit if missing)...done
20170831 21:57:15|  Extracting regions for chromosome #1...
20170831 21:57:15|  Extracting regions for chromosome #1...done
20170831 21:57:15|  Extracting regions for chromosome #2...
20170831 21:57:15|  Extracting regions for chromosome #2...done
20170831 21:57:15|  Extracting regions for chromosome #3...
20170831 21:57:15|  Extracting regions for chromosome #3...done

Does it mean it will export all the samples processed so far? Can I export everything without re-calling the fit function from CbsModel?

Thanks a lot, 

Henrik Bengtsson

Aug 31, 2017, 6:07:37 PM
to aroma-affymetrix
Hmm... assuming you're on a *nix-like system, see if there are any
empty (zero-size) RDS files in:

ls -la cbsData/gdsc_all,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6/*.rds

The original error message suggests that there could exist such a file.
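
The same check can be sketched from within R, which also works on non-*nix systems. This is only an illustration using base R's list.files() and file.size(); the helper name find_empty_files and its pattern argument are hypothetical, and the demo below runs against a temporary directory rather than the real cbsData/ folder.

```r
# Hypothetical helper: list files in 'path' matching 'pattern' that are
# zero bytes in size.
find_empty_files <- function(path, pattern = "[.](rds|xdr)$") {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  sizes <- file.size(files)
  files[!is.na(sizes) & sizes == 0]
}

# Demo on a throwaway directory with one empty and one non-empty file
demo_dir <- tempfile("cbsdemo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "empty.xdr"))
saveRDS(1:3, file.path(demo_dir, "ok.rds"))
empty <- find_empty_files(demo_dir)
print(basename(empty))
```

For the data in this thread, the path argument would be the cbsData/.../GenomeWideSNP_6/ directory shown in the ls command.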

Also, you wrote "it systematically crashes on the last sample"; did
you run it multiple times, and did the exact same error occur at the
exact same place? The reason why I'm asking these questions is to rule out
certain parts of the code. But a traceback() would be by far the
most helpful information.

BTW, you're mentioning 1020 samples, but the output says 1019 - probably
not important but better to make sure we're on the same page.

/Henrik

Emanuel Gonçalves

Aug 31, 2017, 6:52:57 PM
to aroma.affymetrix
> Hmm... assuming you're on a *nix-like system, see if there are any
> empty (zero-size) RDS files in:
>
> ls -la cbsData/gdsc_all,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6/*.rds
>
> The original error message suggests that there could exist such a file.

Indeed, I was wondering about that. I only have *.xdr files, and none of them is 0 bytes. Also, there are 25 files per sample.

> Also, you wrote "it systematically crashes on the last sample"; did
> you run it multiple times, and did the exact same error occur at the
> exact same place?

Yes, I tried three times and always got the same error.

> BTW, you're mentioning 1020 samples, but the output says 1019 - probably
> not important but better to make sure we're on the same page.

In fact, it's 1020 samples in total, but I excluded one sample when I got the first error (it was the very last sample). Then I re-ran a subset of the last 20-30 samples containing the problematic ones, but I got no error and the regions were exported correctly.

> The reason why I'm asking these questions is to rule out
> certain parts of the code. But a traceback() would be by far the
> most helpful information.

I only got this error when running all the samples together in the CbsModel. Unfortunately, because it takes so long to load all the samples through CbsModel, it's hard to debug. I'm now running the script with a try/catch.

Is there an alternative way to export the regions without running the CbsModel again? I have the previous script running (where only the last sample was run in CbsModel) and it seems to be exporting all the processed samples.

Thanks a bunch,

Henrik Bengtsson

Sep 1, 2017, 7:11:39 PM
to aroma-affymetrix
I've been trying to figure out where the error could show up and I have
some "poor" guesses. To narrow it down, please install
the developer's version of R.utils, which I just updated:

source('http://callr.org/install#HenrikBengtsson/R.utils@develop')

If the error occurs anywhere related to the aroma framework or some of
my underlying packages, the error message should then be much more
informative - in particular, it'll show which file is the problematic one.

Unfortunately, the above would require you to rerun your script (but
it should still skip already processed files). If it is the case that
one of the RDS files is corrupt, then a shortcut could be to do:

path <- "cbsData/gdsc_all,ACC,ra,-XY,BPN,-XY,AVG,A+B,FLN,-XY,paired/GenomeWideSNP_6/"
# 'pattern' is a regular expression, not a glob, so no leading "*" is needed
files <- dir(path = path, pattern = "[.]rds$", full.names = TRUE)
for (file in files) {
  message("Reading file: ", file)
  res <- R.utils::loadObject(file)
}

That should identify the problematic file, if one exists.
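
A loop like this stops at the first unreadable file. As a rough sketch of a variant that keeps going and collects every failing file, the hypothetical helper check_files below wraps each read in tryCatch(). To stay self-contained it defaults to base R's readRDS() as the reader, which is an assumption; for the files in this thread one would presumably pass reader = R.utils::loadObject instead.

```r
# Hypothetical helper: try to read every file and return the ones that fail.
# 'reader' is any function taking a pathname; readRDS is only a stand-in.
check_files <- function(files, reader = readRDS) {
  failed <- character(0)
  for (file in files) {
    ok <- tryCatch({
      reader(file)
      TRUE
    }, error = function(e) {
      message("Failed to read ", file, ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) failed <- c(failed, file)
  }
  failed
}

# Demo on a throwaway directory: one valid file and one zero-byte file
demo_dir <- tempfile("rdsdemo")
dir.create(demo_dir)
saveRDS(list(a = 1), file.path(demo_dir, "good.rds"))
file.create(file.path(demo_dir, "empty.rds"))
bad <- check_files(dir(demo_dir, pattern = "[.]rds$", full.names = TRUE))
print(basename(bad))
```

Collecting all failures in one pass avoids having to rerun the scan after each problematic file is found.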

Without understanding what the problem is and/or having access to all
your *.rds files, it's hard to say whether it's safe to export the segments
using writeRegions(), e.g. if there's one problematic file,
then it could be that you'll be missing all segments for that
particular sample and chromosome.

Hope this helps

Henrik
