RnBeads - Sentrix ID not recognized

Peter Mcerlean

Mar 20, 2020, 12:15:59 PM

to Epigenomics forum

Dear RnBeads team,

Thanks for developing such an easy to use package!

I've been using the data juggler to analyze some EPIC data. When I upload my data (.csv), two decimal places seem to be added to the values in the Sentrix_ID column (image below).

This does not seem to affect the majority of the analysis. However, when I look for per-sample methylation or try to identify arrays in the MDS/PCA plots, those options aren't available.

I compiled my annotation sheet in Excel and thought that might be the issue, but I've tried loads of different formats/encodings of the .txt or .csv and it doesn't seem to matter.

Any help appreciated!

Best,

Peter

Screen Shot 2020-03-20 at 3.13.53 PM.png

Michael Scherer

Mar 23, 2020, 4:25:05 AM

to Epigenomics forum

Hi Peter,

Thanks for using RnBeads! We have never encountered such an issue, so I don't exactly know what the problem might be. Could you check your input sample annotation sheet using a simple Text Editor such as Notepad and send the output here? We know that Excel sometimes creates problems with the Sentrix IDs, since it treats them as numeric values rather than categorical variables.

Best,

Michael

Peter Mcerlean

Mar 23, 2020, 7:12:37 AM

to Epigenomics forum

Hi Michael,

Thanks for the response.

In TextEditor they look good, either as .txt or .csv. I've tried re-saving them from TextEditor to see if that makes a difference, but it doesn't.

A truncated version of my sample sheet is attached.

Best,

Peter

Sample Annotation.csv

Kasper Daniel Hansen

Mar 23, 2020, 8:47:41 AM

to Epigenomics forum

Guess: when you read it into R, it gets recognized as a numeric (i.e. a floating-point number), whereas it should be a character. This issue probably needs to be addressed when you read the sample annotation file into R.
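To illustrate the guess above, here is a minimal base-R sketch. The sample sheet contents and the Sentrix ID values are made up for the example, and it uses plain `read.csv` rather than RnBeads' own import code, so whether the data juggler exposes an equivalent setting is an assumption I have not checked.

```r
# Hypothetical, made-up annotation sheet; the IDs are not real slides.
csv_text <- "Sample_ID,Sentrix_ID,Sentrix_Position
S1,201234560001,R01C01
S2,201234560001,R02C01"

# Default import: R guesses the column type, and a 12-digit ID is too
# large for an integer, so it is stored as a floating-point number.
default <- read.csv(text = csv_text)
class(default$Sentrix_ID)

# Forcing the column to character keeps the ID as text.
fixed <- read.csv(text = csv_text,
                  colClasses = c(Sentrix_ID = "character"))
class(fixed$Sentrix_ID)
```

Columns not named in `colClasses` are still type-guessed as usual, so only the ID column is affected.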

Best,

Kasper

Michael Scherer

Mar 24, 2020, 3:50:10 AM

to Epigenomics forum

Hi Peter,

Indeed, your sample annotation sheet looks fine. I think I can also offer you a solution. As Kasper points out, the Sentrix_ID column is recognized as a numeric/float value instead of a factor/character. To overcome this issue, you can enclose the values of that column in quotation marks, so that they are recognized as character values. You can, for instance, use the Excel export function to add these quotes for some of the columns.
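If Excel is not convenient, the quoting can also be done from R. The following is only a sketch with made-up sample names and IDs. It converts the ID column to text and writes a CSV in which character columns are wrapped in double quotes. Note that quoting alone is not guaranteed to keep the column as text in every reader, so the result is worth checking after import.

```r
# Hypothetical annotation table; the Sentrix IDs are made up.
ann <- data.frame(
  Sample_ID        = c("S1", "S2"),
  Sentrix_ID       = c(201234560001, 201234560001),
  Sentrix_Position = c("R01C01", "R02C01"),
  stringsAsFactors = FALSE
)

# Convert the numeric IDs to plain digit strings (no decimals, no exponent).
ann$Sentrix_ID <- sprintf("%.0f", ann$Sentrix_ID)

# quote = TRUE surrounds character columns (and the header) with double quotes.
out_file <- tempfile(fileext = ".csv")
write.csv(ann, out_file, row.names = FALSE, quote = TRUE)

# Inspect the raw text that was written.
written <- readLines(out_file)
print(written)
```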

I hope that helps,

Michael

Peter Mcerlean

Apr 3, 2020, 12:43:27 PM

to Epigenomics forum

Hi Michael,

So I played around a bit and found that adding a ( ' ) to the first cell alone was sufficient for the annotation to load correctly (go figure).

Naturally, however, I then had to add the same ( ' ) to all the other idat files from that slide so that they were recognized and the analysis could start running.

Even after this, though, the Sentrix_IDs still seem to be recognized for only some of the parameters (e.g. PCA yes, mean sample methylation no).

Any ideas?

Examples and analysis options file attached.

Best,

Peter

Sample annotation_redo.png

PCA_Sentrix_ID.png

PCA_Sentrix Postion.png

analysis_options.RData

Michael Scherer

Apr 5, 2020, 4:59:12 AM

to Epigenomics forum

Hi Peter,

I am happy that at least parts of the analysis are correctly configured. Indeed, the issue that you report might be an internal issue, and we will have a look at it. We will let you know as soon as we figure it out.