Error when reading idat files into minfi

ainashch

Feb 9, 2017, 3:49:43 PM

to Epigenomics forum

Hello,

I am having issues with loading my data into minfi (1.20.2).

Has anyone seen this error before? I am not sure why minfi is not recognizing some of my idat files as EPIC?

> RGSet <- read.metharray.exp(base = getwd(), targets = sheet, extended = FALSE,recursive = FALSE, verbose = FALSE, force = TRUE)

[read.metharray] Trying to parse IDAT files from different arrays.
  Inferred Array sizes and types:
                    array                          size
201114530004_R01C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R02C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R03C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R04C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R05C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R06C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R07C01 "IlluminaHumanMethylationEPIC" "1052641"
201114530004_R08C01 "IlluminaHumanMethylationEPIC" "1052641"
201236480014_R01C01 "Unknown"                      "1051943"
201236480014_R02C01 "Unknown"                      "1051943"
201236480014_R03C01 "Unknown"                      "1051943"
201236480014_R04C01 "Unknown"                      "1051943"
201236480014_R05C01 "Unknown"                      "1051943"
201236480014_R06C01 "Unknown"                      "1051943"
Error in read.metharray(files, extended = extended, verbose = verbose, :
  [read.metharray] Trying to parse different IDAT files, of different size and type.

Thanks,

Ainash

Tyler Gorrie-Stone

Feb 9, 2017, 5:09:24 PM

to Epigenomics forum

Hi there, you seem to be reading in the data correctly (with the correct argument, `force = TRUE`). However, it appears that the IDATs on 201236480014 were read using a different set of dmaps(?), or that set of IDATs is very odd!

This was an issue in minfi (https://github.com/kasperdanielhansen/minfi/issues/67), but it was patched months ago when the different dmaps were identified. So this appears to be an error in how read.metharray handles the problematic IDATs. Specifically, read.metharray only guesses array types using:

(nProbes >= 1052000 && nProbes <= 1053000) or (nProbes >= 1032000 && nProbes <= 1033000), while your problematic IDATs, with 1051943 probes, are just shy of the 1052000 boundary.

As a consequence, downstream in the function, when minfi checks whether all your arrays are the same type, it returns the given error, even though they are the correct arrays.
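To make the failure concrete, here is a minimal sketch of a probe-count check that uses only the two windows quoted above. The helper name `guess_array_by_count` is hypothetical and this is not minfi's actual source; it only illustrates why 1051943 probes end up labelled "Unknown".

```r
# Hypothetical helper (NOT minfi's real code): classify an IDAT by probe count
# using only the two windows quoted in this thread.
guess_array_by_count <- function(n_probes) {
  is_epic <- (n_probes >= 1052000 & n_probes <= 1053000) |
    (n_probes >= 1032000 & n_probes <= 1033000)
  ifelse(is_epic, "IlluminaHumanMethylationEPIC", "Unknown")
}

# Probe counts taken from the error output in the original post.
sizes <- c("201114530004_R01C01" = 1052641, "201236480014_R01C01" = 1051943)
guessed <- guess_array_by_count(sizes)
print(guessed)

# More than one distinct guess is the situation described above as
# triggering the "different size and type" error.
mixed_types <- length(unique(guessed)) > 1
print(mixed_types)
```

The second slide misses the lower bound by only 57 probes, which is why widening the window or switching to another identifier both came up as possible fixes.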

I believe that wateRmelon's IDAT reader (readEPIC) will be able to read in the EPIC arrays, as it does not use such strict guidelines to guess the microarray design and instead uses the chip-type information stored in the IDAT to discern it. As a result, though, you will be on a methylumi workflow rather than a minfi one.

Perhaps it would be worthwhile for Kasper/minfi to change the .guessArrayTypes function to either relax the boundaries used to guess EPIC arrays or guess the array type using ChipType from the IDAT instead.

Hopefully minfi will get patched soon.

Tyler.

Kasper Daniel Hansen

Feb 9, 2017, 6:31:55 PM

to epigenom...@googlegroups.com

Indeed, this is caused by the additional check on number of probes in minfi.

Tyler, are you sure that a ChipType such as "BeadChip 8x5" uniquely identifies, say, EPIC arrays across IDAT files, including expression and genotyping products? I'm not sure about that, which is why I went for the number of probes.

Best,

Kasper


Tyler Gorrie-Stone

Feb 10, 2017, 9:21:56 AM

to Epigenomics forum

In the context of DNA methylation, the 27k, 450k and EPIC arrays (between the three) can be identified uniquely by ChipType.
I do agree that any products that share a ChipType will be wrongly classified, but this would depend on (1) how readIDAT is being used in a given pipeline and (2) whether the user attempts to read in two different products simultaneously.

Wouldn't gradually relaxing the number of probes used to estimate different EPIC dmaps also lead to array ambiguity, as other products (if any) could have similar probe counts?

Considering that users (in this context) are calling read.metharray - I think it would be safe to assume that they are providing methylation arrays to be processed and the array-ambiguity wouldn't necessarily be a problem. Unless Illumina launches another DNA methylation array that uses "BeadChip 8x5"!

I think your method is more appropriate but requires more maintenance if/when other EPIC dmaps are released.
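The ChipType idea can be sketched as a simple lookup. The strings below are the ChipType values that illuminaio::readIDAT reports for these three arrays elsewhere in this thread; the helper `guess_array_by_chip_type` is hypothetical (it is not part of minfi or wateRmelon), and the mapping is only safe under the assumption that every input really is a methylation array.

```r
# ChipType values for the three methylation arrays, as reported in this thread.
methylation_chip_types <- c(
  "BeadChip 12x1" = "27k",
  "BeadChip 12x8" = "450k",
  "BeadChip 8x5"  = "EPIC"
)

# Hypothetical helper: look up the array by ChipType, falling back to "Unknown".
guess_array_by_chip_type <- function(chip_type) {
  guess <- unname(methylation_chip_types[chip_type])
  guess[is.na(guess)] <- "Unknown"
  guess
}

print(guess_array_by_chip_type(c("BeadChip 8x5", "BeadChip 12x8", "BeadChip 4x10")))
```

Unlike a probe-count window, this lookup does not need updating when a new EPIC dmap shifts the probe count slightly, but it does rely on the caller only passing methylation IDATs.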

Kind Regards,
Tyler


ainashch

Feb 10, 2017, 12:31:45 PM

to Epigenomics forum

Thanks Tyler and Kasper for responding to my question!

I tried readEPIC in wateRmelon and it worked.

I actually just got these data recently. Everything was run on the EPIC array last week (we used 2 kits), so I am not sure why the number of probes is different.

Do you know if there is a way to convert the object I get from wateRmelon into a minfi-compatible one for downstream analyses? I have been using minfi and ChAMP for my analyses and have little experience with wateRmelon.

Thanks,

Ainash

Kasper Daniel Hansen

Feb 10, 2017, 1:14:06 PM

to epigenom...@googlegroups.com

I'll patch minfi over the weekend.

Best,

Kasper

(Sent from my phone.)

ainashch

Feb 10, 2017, 1:46:17 PM

to Epigenomics forum

Thanks, Kasper!

Best,

Ainash

Tim Triche, Jr.

Feb 10, 2017, 1:51:44 PM

to Epigenomics forum

Interesting. I hacked up illuminaio::readIDAT() a long time ago to handle gzipped IDATs (since that's how GEO stores them, and the easiest way to merge-preprocess a big set of files with SNPs is incremental, as you've noticed) and didn't really think about this much. I've seen some weird chips on both 450k and EPIC, but I just confirmed that the files should be distinguishable from other platforms:

R> library(illuminaio)
R> IDATs <- list.files(pattern = "idat")
R> names(IDATs) <- sapply(strsplit(IDATs, "_"), `[`, 1)
R> show(IDATs)
EPIC
"EPIC_200514040098_R04C01_Grn.idat.gz"
HM27
"HM27_5333963003_A_Grn.idat.gz"
HM450
"HM450_6057825116_R02C02_Grn.idat.gz"
HT12
"HT12_9345724096_A_Grn.idat"
HumanOMNI25
"HumanOMNI25_5790902030_R02C01_Grn_cancase2_3.idat.gz"
R> processed <- lapply(IDATs, readIDAT)
R> sapply(processed, `[[`, "ChipType")
          EPIC            HM27           HM450            HT12     HumanOMNI25
"BeadChip 8x5" "BeadChip 12x1" "BeadChip 12x8" "BeadChip 12x1" "BeadChip 4x10"
R> unlist(sapply(processed, `[[`, "nSNPsRead") )
       EPIC        HM27       HM450 HumanOMNI25
    1052641       55300      622399     2624666

It would be nice to look into the wonky IDATs so that illuminaio::readIDAT can give minfi some hints on which chip is present.
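One caveat is visible in the output above: HM27 and HT12 both report "BeadChip 12x1", so ChipType on its own does not separate every product. A small sketch that groups the platforms by ChipType makes this explicit; the vector is copied from the session above and the variable names are illustrative only.

```r
# ChipType per platform, copied from the readIDAT session in this thread.
chip_types <- c(
  EPIC = "BeadChip 8x5", HM27 = "BeadChip 12x1", HM450 = "BeadChip 12x8",
  HT12 = "BeadChip 12x1", HumanOMNI25 = "BeadChip 4x10"
)

# Group platform labels by ChipType and keep only the ChipTypes that are
# shared by more than one platform.
platforms_by_chip <- split(names(chip_types), chip_types)
shared <- platforms_by_chip[lengths(platforms_by_chip) > 1]
print(shared)
```

In this small sample the only collision is between a methylation array and an expression array, so any hint based on ChipType would probably need a second field (for example the probe count) to be unambiguous across products.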

Illumina's array formats are a PITA and always have been.


Tim Triche, Jr.

Feb 10, 2017, 1:52:58 PM

to Epigenomics forum

Kasper, while you're at it, can you re-enable gzip support? ;-)
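For anyone stuck with .idat.gz files from GEO in the meantime, one possible workaround is to decompress each file to a temporary path before handing it to whichever reader you use. This is a minimal base-R sketch; `gunzip_to_temp` is a hypothetical helper, not a function from illuminaio or minfi, and the payload below is made-up bytes rather than a real IDAT.

```r
# Hypothetical helper: decompress a gzipped file to a temporary .idat path
# and return that path. Uses base R connections only.
gunzip_to_temp <- function(gz_path, chunk_size = 1e6) {
  out_path <- tempfile(fileext = ".idat")
  con_in <- gzfile(gz_path, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(out_path, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break  # end of compressed stream
    writeBin(chunk, con_out)
  }
  out_path
}

# Demonstration with placeholder bytes (NOT a real IDAT file).
payload <- c(charToRaw("IDAT"), as.raw(1:16))
gz_path <- tempfile(fileext = ".idat.gz")
con <- gzfile(gz_path, open = "wb")
writeBin(payload, con)
close(con)

plain_path <- gunzip_to_temp(gz_path)
restored <- readBin(plain_path, what = "raw", n = length(payload) + 10L)
print(identical(restored, payload))
```

The returned path can then be passed to an IDAT reader as if the file had never been compressed; the temporary copies are removed when the R session ends.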

