troubles in doing PCR fragment length normalization AROMA3.0

thomas.g...@merckgroup.com

unread,

Jul 13, 2016, 7:19:18 AM7/13/16

to aroma.affymetrix

Hello,

I have used the standard pipeline for SNP6 CEL files described in the CRMAv2 vignette. I did run the pipeline before several times, without any issue.

Now, the fragment length normalization step failed with an error:

Error in gzfile(file, mode) : cannot open the connection

I am not clear where this error comes from.

Any support or idea is welcome.

Best regards,

Thomas

My sessionInfo:

> sessionInfo()

R version 3.2.1 (2015-06-18)

Platform: x86_64-unknown-linux-gnu (64-bit)

locale:

[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8

[7] LC_PAPER=en_US.UTF-8 LC_NAME=C

[9] LC_ADDRESS=C LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] aroma.light_3.0.0 aroma.affymetrix_3.0.0 aroma.core_3.0.0

[4] R.devices_2.14.0 R.filesets_2.10.0 R.utils_2.3.0

[7] R.oo_1.20.0 affxparser_1.42.0 R.methodsS3_1.7.1

loaded via a namespace (and not attached):

[1] matrixStats_0.50.2 codetools_0.2-14 listenv_0.6.0 future_0.14.0

[5] digest_0.6.9 R.huge_0.9.0 PSCBS_0.61.0 tools_3.2.1

[9] R.cache_0.12.0 parallel_3.2.1 base64enc_0.1-3 aroma.apd_0.6.0

[13] R.rsp_0.30.0 globals_0.6.1 DNAcopy_1.44.0

The full output of the process(fln, verbose=verbose) function:

> cesN <- process(fln, verbose=verbose)

20160713 12:50:05|Normalizing set for PCR fragment-length effects...

20160713 12:51:07| Identifying SNP and CN units...

types

1 2 5

621 934968 945826

20160713 12:51:09| subsetToUpdate:

int [1:1880794] 622 623 624 625 626 627 628 629 630 631 ...

20160713 12:51:09| Identifying SNP and CN units...done

20160713 12:51:09| Retrieving SNP information annotations...

UflSnpInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ufl

File size: 7.18 MiB (7526422 bytes)

RAM: 0.00 MB

Chip type: GenomeWideSNP_6,Full

Number of enzymes: 2

20160713 12:51:10| Retrieving SNP information annotations...done

20160713 12:51:10| Identifying the subset used to fit normalization function(s)...

20160713 12:51:10| Identifying units that are SNP and CN probes...

UflSnpInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ufl

File size: 7.18 MiB (7526422 bytes)

RAM: 0.00 MB

Chip type: GenomeWideSNP_6,Full

Number of enzymes: 2

20160713 12:51:10| Identifying SNPs and CN probes...

types

1 2 5

621 934968 945826

20160713 12:51:11| units:

int [1:1880794] 622 623 624 625 626 627 628 629 630 631 ...

20160713 12:51:11| Identifying SNPs and CN probes...done

20160713 12:51:11| Identify subset of units from genome information...

20160713 12:51:11| subsetToFit: -XY

UgpGenomeInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ugp

File size: 8.97 MiB (9407882 bytes)

RAM: 0.00 MB

Chip type: GenomeWideSNP_6,Full

20160713 12:51:18| Units to exclude:

int [1:96544] 61101 61102 61103 61104 61105 61106 61107 61108 61109 61110 ...

20160713 12:51:19| Units to include:

int [1:1784871] 1 2 3 4 5 6 7 8 9 10 ...

20160713 12:51:19| Identify subset of units from genome information...done

20160713 12:51:19| Reading fragment lengths...

20160713 12:51:19| Reading and filtering fragment lengths...

20160713 12:51:19| Reading fragment lengths...

UflSnpInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ufl

File size: 7.18 MiB (7526422 bytes)

RAM: 0.00 MB

Chip type: GenomeWideSNP_6,Full

Number of enzymes: 2

20160713 12:51:26| Summary of non-filtered fragment lengths:

int [1:1880794, 1:2] 395 4056 633 831 970 1508 4025 4557 3779 420 ...

V1 V2

Min. : 7.0 Min. : 7

1st Qu.: 430.0 1st Qu.: 628

Median : 676.0 Median : 1078

Mean : 915.2 Mean : 1410

3rd Qu.: 970.0 3rd Qu.: 1730

Max. :32767.0 Max. :32767

NA's :2279 NA's :447301

20160713 12:51:27| Reading fragment lengths...done

20160713 12:51:27| Filtering fragment lengths...

20160713 12:51:27| Filtering fragment lengths...done

20160713 12:51:27| Reading and filtering fragment lengths...done

20160713 12:51:27| Reading fragment lengths...done

20160713 12:51:28| Identifying units that are SNP and CN probes...done

int [1:1784250] 622 623 624 625 626 627 628 629 630 631 ...

20160713 12:51:28| Identifying the subset used to fit normalization function(s)...done

20160713 12:51:28| Shift: 0

20160713 12:51:28| onMissing: median

20160713 12:51:28| Array #1 of 1213 ('ARLES_p_NCLE_DNAAffy2_S_GenomeWideSNP_6_A01_256002')...

20160713 12:51:28| Reading and filtering fragment lengths...

20160713 12:51:28| Reading fragment lengths...

UflSnpInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ufl

File size: 7.18 MiB (7526422 bytes)

RAM: 14.36 MB

Chip type: GenomeWideSNP_6,Full

Number of enzymes: 2

20160713 12:51:33| Summary of non-filtered fragment lengths:

int [1:1880794, 1:2] 395 4056 633 831 970 1508 4025 4557 3779 420 ...

V1 V2

Min. : 7.0 Min. : 7

1st Qu.: 430.0 1st Qu.: 628

Median : 676.0 Median : 1078

Mean : 915.2 Mean : 1410

3rd Qu.: 970.0 3rd Qu.: 1730

Max. :32767.0 Max. :32767

NA's :2279 NA's :447301

20160713 12:51:37| Reading fragment lengths...done

20160713 12:51:37| Filtering fragment lengths...

20160713 12:51:37| Filtering fragment lengths...done

20160713 12:51:37| Reading and filtering fragment lengths...done

V1 V2

Min. : 7.0 Min. : 7

1st Qu.: 430.0 1st Qu.: 628

Median : 676.0 Median : 1078

Mean : 915.2 Mean : 1410

3rd Qu.: 970.0 3rd Qu.: 1730

Max. :32767.0 Max. :32767

NA's :2279 NA's :447301

int [1:1784250] 1 2 3 4 5 6 7 8 9 10 ...

UflSnpInformation:

Name: GenomeWideSNP_6

Tags: Full,na33,hg19,dbSNP137,HB20140118

Full name: GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118

Pathname: annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na33,hg19,dbSNP137,HB20140118.ufl

File size: 7.18 MiB (7526422 bytes)

RAM: 14.36 MB

Chip type: GenomeWideSNP_6,Full

Number of enzymes: 2

20160713 12:51:40| Setting up predefined target functions...

20160713 12:51:40| Target type: zero

20160713 12:51:40| Setting up predefined target functions...done

20160713 12:51:40| Getting cell matrix map...

'UnitGroupCellMatrixMap' int [1:1880794, 1:2] 622 624 626 628 630 632 634 636 638 640 ...

20160713 12:51:44| Getting cell matrix map...done

Error in gzfile(file, mode) : cannot open the connection

In addition: Warning message:

In gzfile(file, mode) :

cannot open compressed file '/scratch/tmp/RtmpwSx59C/libloc_178_ccc298d560fce69b.rds', probable reason 'No such file or directory'

20160713 12:51:45| Array #1 of 1213 ('ARLES_p_NCLE_DNAAffy2_S_GenomeWideSNP_6_A01_256002')...done

20160713 12:51:45|Normalizing set for PCR fragment-length effects...done

Henrik Bengtsson

unread,

Jul 13, 2016, 11:45:22 AM7/13/16

to aroma-affymetrix

That looks like a hiccup in the file system or something. Did you try again?

Also, if the error remains, please report what traceback() outputs (as
the first command after you get the error).

/Henrik

> --
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-af...@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
> ---
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to aroma-affymetr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Thomas Grombacher

unread,

Jul 14, 2016, 3:34:14 AM7/14/16

to aroma-af...@googlegroups.com

Dear Henrik,
Yes this seems obvious. I tried again, but the error remained. Also the TEMP directory for writing is world rwx and has plenty of space left.
However the directory where the gzip file should be located does not exist: '/scratch/tmp/RtmpwSx59C

I have manually generated the '/scratch/tmp/RtmpwSx59C' directory. With that the 'process(fln, verbose=verbose)' runs now. So it seems that the generation/selection of the TMP directory for writing/reading is causing the error.

Here is the output from traceback()

> traceback()
19: gzfile(file, mode)
18: saveRDS(list(base = base, value = ret0), dest)
17: installed.packages()
16: findBasePkgs()
15: match(x, table, nomatch = 0L)
14: pkgs %in% findBasePkgs()
13: isBasePkgs(environmentName(where[[name]]))
12: cleanup.Globals(globals)
11: cleanup(globals)
10: getGlobalsAndPackages(expr, envir = envir, tweak = tweak, resolve = resolve,
persistent = persistent)
9: exportGlobals(expr, envir = envir, target = NULL, tweak = tweakExpression,
resolve = TRUE)
8: evaluator(expr, envir = envir, substitute = FALSE, ...)
7: (function (expr, envir = parent.frame(), substitute = TRUE, evaluator = plan(),
...)
{
if (substitute)
expr <- substitute(expr)
if (!is.function(evaluator)) {
stop("Argument 'evaluator' must be a function: ", typeof(evaluator))
}
future <- evaluator(expr, envir = envir, substitute = FALSE,
...)
if (!inherits(future, "Future")) {
stop("Argument 'evaluator' specifies a function that does not return a Future object: ",
paste(sQuote(class(future)), collapse = ", "))
}
future
})({
verbose && enter(verbose, "Getting theta estimates")
theta <- extractTheta(ce, units = cellMatrixMap, drop = FALSE,
verbose = less(verbose, 5))
verbose && str(verbose, theta)
verbose && summary(verbose, theta)
verbose && exit(verbose)
verbose && enter(verbose, "Calculating total signals")
if (ncol(theta) > 1) {
y <- theta
y[is.na(y)] <- 0
for (cc in 2:ncol(y)) {
y[, 1] <- y[, 1] + y[, cc]
}
y <- y[, 1, drop = TRUE]
}
else {
y <- theta[, 1, drop = TRUE]
}
verbose && cat(verbose, "Total thetas:")
verbose && str(verbose, y)
verbose && exit(verbose)
verbose && enter(verbose, "Normalizing log2 signals")
if (shift != 0)
y <- y + shift
y <- log2(y)
verbose && cat(verbose, "Log2 signals:")
verbose && str(verbose, y)
yN <- .normalizeFragmentLength(y, fragmentLengths = fl, targetFcns = targetFcns,
subsetToFit = subset, onMissing = onMissing, ...)
verbose && cat(verbose, "Normalized log2 signals:")
verbose && summary(verbose, yN)
rho <- y - yN
y <- yN <- NULL
rho <- 2^rho
verbose && cat(verbose, "Normalization scale factors:")
verbose && summary(verbose, rho)
stopifnot(length(rho) == nrow(theta))
ok <- which(is.finite(rho))
verbose && str(verbose, ok)
theta[ok, ] <- theta[ok, ]/rho[ok]
ok <- rho <- NULL
verbose && cat(verbose, "Normalized thetas:")
verbose && str(verbose, theta)
verbose && summary(verbose, theta)
verbose && exit(verbose)
verbose && enter(verbose, "Creating CEL file for results, if missing")
isFile <- isFile(pathname)
pathnameT <- pushTemporaryFile(pathname, isFile = isFile,
verbose = verbose)
ceN <- createFrom(ce, filename = pathnameT, path = NULL,
verbose = less(verbose))
verbose && exit(verbose)
if (inherits(ce, "SnpChipEffectFile")) {
ceN$mergeStrands <- ce$mergeStrands
if (inherits(ce, "CnChipEffectFile")) {
ceN$combineAlleles <- ce$combineAlleles
}
}
setCdf(ceN, cdf)
verbose && enter(verbose, "Storing normalized signals")
ok <- which(is.finite(cellMatrixMap))
cells <- cellMatrixMap[ok]
data <- theta[ok]
ok <- theta <- NULL
verbose2 <- -as.integer(verbose) - 5
pathnameN <- getPathname(ceN)
.updateCel(pathnameN, indices = cells, intensities = data,
verbose = verbose2)
cells <- data <- ceN <- NULL
verbose && exit(verbose)
popTemporaryFile(pathnameT, verbose = verbose)
dfZ <- getChecksumFile(pathname)
gc <- gc()
verbose && print(verbose, gc)
pathname
}, envir = <environment>)
6: do.call(future::future, args = future.args, envir = assign.env)
5: futureAssign(name, expr, envir = envir, assign.env = assign.env,
substitute = FALSE)
4: futureAssignInternal(target, expr, envir = envir, substitute = FALSE)
3: res[[kk]] %<=% {
verbose && enter(verbose, "Getting theta estimates")
theta <- extractTheta(ce, units = cellMatrixMap, drop = FALSE,
verbose = less(verbose, 5))
verbose && str(verbose, theta)
verbose && summary(verbose, theta)
verbose && exit(verbose)
verbose && enter(verbose, "Calculating total signals")
if (ncol(theta) > 1) {
y <- theta
y[is.na(y)] <- 0
for (cc in 2:ncol(y)) {
y[, 1] <- y[, 1] + y[, cc]
}
y <- y[, 1, drop = TRUE]
}
else {
y <- theta[, 1, drop = TRUE]
}
verbose && cat(verbose, "Total thetas:")
verbose && str(verbose, y)
verbose && exit(verbose)
verbose && enter(verbose, "Normalizing log2 signals")
if (shift != 0)
y <- y + shift
y <- log2(y)
verbose && cat(verbose, "Log2 signals:")
verbose && str(verbose, y)
yN <- .normalizeFragmentLength(y, fragmentLengths = fl, targetFcns = targetFcns,
subsetToFit = subset, onMissing = onMissing, ...)
verbose && cat(verbose, "Normalized log2 signals:")
verbose && summary(verbose, yN)
rho <- y - yN
y <- yN <- NULL
rho <- 2^rho
verbose && cat(verbose, "Normalization scale factors:")
verbose && summary(verbose, rho)
stopifnot(length(rho) == nrow(theta))
ok <- which(is.finite(rho))
verbose && str(verbose, ok)
theta[ok, ] <- theta[ok, ]/rho[ok]
ok <- rho <- NULL
verbose && cat(verbose, "Normalized thetas:")
verbose && str(verbose, theta)
verbose && summary(verbose, theta)
verbose && exit(verbose)
verbose && enter(verbose, "Creating CEL file for results, if missing")
isFile <- isFile(pathname)
pathnameT <- pushTemporaryFile(pathname, isFile = isFile,
verbose = verbose)
ceN <- createFrom(ce, filename = pathnameT, path = NULL,
verbose = less(verbose))
verbose && exit(verbose)
if (inherits(ce, "SnpChipEffectFile")) {
ceN$mergeStrands <- ce$mergeStrands
if (inherits(ce, "CnChipEffectFile")) {
ceN$combineAlleles <- ce$combineAlleles
}
}
setCdf(ceN, cdf)
verbose && enter(verbose, "Storing normalized signals")
ok <- which(is.finite(cellMatrixMap))
cells <- cellMatrixMap[ok]
data <- theta[ok]
ok <- theta <- NULL
verbose2 <- -as.integer(verbose) - 5
pathnameN <- getPathname(ceN)
.updateCel(pathnameN, indices = cells, intensities = data,
verbose = verbose2)
cells <- data <- ceN <- NULL
verbose && exit(verbose)
popTemporaryFile(pathnameT, verbose = verbose)
dfZ <- getChecksumFile(pathname)
gc <- gc()
verbose && print(verbose, gc)
pathname
}
2: process.FragmentLengthNormalization(fln, verbose = verbose)
1: process(fln, verbose = verbose)

-----Ursprüngliche Nachricht-----
Von: aroma-af...@googlegroups.com [mailto:aroma-af...@googlegroups.com] Im Auftrag von Henrik Bengtsson
Gesendet: Mittwoch, 13. Juli 2016 17:45
An: aroma-affymetrix <aroma-af...@googlegroups.com>
Betreff: Re: [aroma.affymetrix] troubles in doing PCR fragment length normalization AROMA3.0

You received this message because you are subscribed to a topic in the Google Groups "aroma.affymetrix" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/aroma-affymetrix/GYM95TmkdZw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to aroma-affymetr...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.

Henrik Bengtsson

unread,

Aug 7, 2016, 7:22:18 AM8/7/16

to aroma-affymetrix

Hi, and sorry for the delay.

1. Yes, it could certainly be that the problem was introduced
recently. More precisely, in aroma.affymetrix 3.0.0 (January 2016), I
added support for parallel processing via futures. Following your
traceback, the error is related but not due to this new code. Thus,
if you say things worked with aroma.affymetrix (< 3.0.0) but now you
get this error, your observation is valid.

2. For troubleshooting, are you using any of the parallel processing
features of aroma.affymetrix, cf.
http://www.aroma-project.org/howtos/parallel_processing/? If you
don't know or you don't set anything explicitly, the default behavior
is sequential processing (as in the past).

3. For troubleshooting, are you running this on a local machine and /
or are you on a shared file system (e.g. NFS)? The fact that you have
a /scratch/ suggests to me that you're on a shared file system and
possibly even on a compute cluster. Any details on this will help
understand the problem.

4. I don't think it should matter, but it could be worth updating to
the most recent version of R (3.3.1), since R 3.2.1 is rather old.

5. Now to the error itself: It's not fully clear to me why this
occurs. Your answers above will help me rule out a few things. I
don't think it's an issue with the aroma.affymetrix package, the
future package, or any of the dependent packages. The error occurs
within R and the base package itself; it's when
base::installed.packages()
[https://github.com/wch/r-source/blob/tags/R-3-2-1/src/library/utils/R/packages.R#L590-L604]
is called. My best guess this far is that there's some kind of delay
when R creates tempdir() internally and it actually appears as
available to R. We've seen this type of things on NFS *across*
compute nodes in the past and this is a "feature" of NFS and has
little to do with R itself.

/Henrik

Reply all

Reply to author

Forward