Importing Matrices into R

menon...@gmail.com

Sep 20, 2016, 9:11:36 AM
to deepTools
Does anyone know how to import the matrices generated by computeMatrix into R?
I have tried unzipping the .gz output file and importing it into R using read.csv and read.delim, but I do not know what to make of the resulting R objects.
Also, is there a description of the header for these matrices somewhere? I could not find it.
I basically need to compare the scores between two samples to see if they are significantly different.
Thanks
Debu

Friederike Dündar

Sep 20, 2016, 9:47:15 AM
to menon...@gmail.com, deepTools
Hi Debu,

a couple of questions to clarify:

* Do you use the matrix that is generated via the optional --outFileNameMatrix parameter? That file contains exactly the values that underlie the heatmap generated with plotHeatmap, i.e., each row represents the bins for one line in the BED file. If you have a look at the file in a simple text editor or via the command line (e.g. via head), you will see that the header contains some information about the genome bin size that was used etc.

* Do you use computeMatrix reference-point or computeMatrix scale-regions? This will have an impact on how to interpret the values you see in the file.

* Do you really want to compare all the values of the heatmap or would you rather compare the median or mean enrichments per region per sample?
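The header mentioned in the first point can also be inspected from within R, without unzipping the file first. The following is a minimal sketch under stated assumptions: it writes a tiny toy file whose name, header fields and values are all invented for illustration, laid out as this thread describes (one header line, then six region columns followed by the bin values), and reads back only the first line.

```r
# Toy stand-in for a computeMatrix output file; all contents are invented.
toy_file <- file.path(tempdir(), "toy.mat.gz")
con <- gzfile(toy_file, "w")
writeLines(c(
  "@{\"bin size\":10,\"sample_labels\":[\"A\",\"B\"]}",
  paste("chr1", 100, 140, "region1", 0, "+", 1, 2, 3, 4, sep = "\t"),
  paste("chr1", 300, 340, "region2", 0, "-", 2, 2, 5, 7, sep = "\t")
), con)
close(con)

# Read only the first line; gzfile() decompresses transparently.
con <- gzfile(toy_file, "r")
header_line <- readLines(con, n = 1)
close(con)

# Strip the leading "@" (assumed here) to leave the parameter string.
header_text <- sub("^@", "", header_line)
print(header_text)
```

On a real file, replacing `toy_file` with the path to the computeMatrix output should show the parameters that were used for that run.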

Best,

Friederike


--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

menon...@gmail.com

Sep 20, 2016, 10:44:05 AM
to deepTools, menon...@gmail.com
Hi Friederike,

1. I used the output from --outFileName (.gz). I will take a look at the other output format you mentioned.
2. I ran computeMatrix in reference-point mode.
3. You are correct, I would rather compare the median or the mean enrichment between samples.
I am kind of new to this stuff, so do you have any suggestions on how I might go about extracting the relevant information from the matrices to perform a statistical test? I was thinking of a Fisher's exact test or a simple Student's t-test on the mean or median enrichment.
Appreciate the help.
Best,
Debu.

Friederike Dündar

Sep 20, 2016, 1:05:13 PM
to menon...@gmail.com, deepTools
Hi Debu,

the default output (via --outFileName) will not help you; do go for the other options.
The table here: http://deeptools.readthedocs.io/en/latest/content/tools/plotProfile.html may also help you understand the types of different output that can be generated.

The issue with comparing the median enrichment between samples is the location: most likely, your signal will not be in the same position in every region. computeMatrix and plotHeatmap/plotProfile are mostly meant for visualizing the types of signal distributions that you can see for different regions (which you supply in the BED file). However, if you're more interested in summarizing the enrichment per region, you may want to look into bigWigSummary from the Kent/UCSC utilities, which can be downloaded from the UCSC Genome Browser web site (see the instructions here: http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/userApps/README).
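Once there is one summary value per region per sample, whichever tool produced it, the comparison asked about earlier in the thread could be sketched in R roughly as follows. The numbers are invented, and the choice of paired tests is an assumption that both samples were summarized over the same regions in the same order. Fisher's exact test operates on tables of counts, so it is probably not the natural choice for continuous enrichment values.

```r
# Invented per-region mean enrichments for two samples,
# measured over the same six regions in the same order.
sample_a <- c(2.1, 3.4, 1.8, 5.0, 4.2, 2.9)
sample_b <- c(2.8, 3.9, 2.7, 5.1, 5.0, 3.3)

# Paired t-test: assumes the per-region differences are roughly normal.
t_res <- t.test(sample_a, sample_b, paired = TRUE)

# Wilcoxon signed-rank test: a rank-based alternative with weaker assumptions.
w_res <- wilcox.test(sample_a, sample_b, paired = TRUE)

print(t_res$p.value)
print(w_res$p.value)
```

With real data the two vectors would hold hundreds or thousands of regions, and a look at the distribution of the differences (for example with `hist()`) would help decide which of the two tests is more appropriate.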

Does that help?

Best,

Friederike

menon...@gmail.com

Sep 20, 2016, 1:24:11 PM
to deepTools, menon...@gmail.com
Thanks for the suggestions, Friederike. I will look into the bigWigSummary utility.

Devon Ryan

Sep 20, 2016, 3:31:48 PM
to menon...@gmail.com, deepTools
Given a file named "output.mat.gz", in R:

d <- read.delim("output.mat.gz", header = FALSE, skip = 1)

You'll want to remove the first 6 columns (the region information) and
convert the remainder to a matrix with "as.matrix()".
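
Put together, the steps above might look like the following. This is a sketch, not a tested recipe for real deepTools output: the toy file, its header and all values are invented so that the snippet runs on its own, and the `na.strings` argument is an addition that assumes missing bins are written as "nan" (which R would otherwise read as text, turning the column into character).

```r
# Toy stand-in for "output.mat.gz"; every value below is invented.
toy_file <- file.path(tempdir(), "toy_output.mat.gz")
con <- gzfile(toy_file, "w")
writeLines(c(
  "@{\"bin size\":10}",
  paste("chr1", 100, 140, "region1", 0, "+", 1, 2, 3, 4, sep = "\t"),
  paste("chr1", 300, 340, "region2", 0, "-", 2, "nan", 5, 8, sep = "\t")
), con)
close(con)

# Skip the header line and read the rest as tab-separated values.
# na.strings is not part of the call in the thread (see the note above).
d <- read.delim(toy_file, header = FALSE, skip = 1,
                na.strings = c("NA", "nan"))

regions <- d[, 1:6]           # the six region-information columns
m <- as.matrix(d[, -(1:6)])   # one row per region, one column per bin

# One summary value per region, ignoring missing bins.
region_means <- rowMeans(m, na.rm = TRUE)
print(region_means)
```

If the matrix was computed from more than one sample, the bin columns of the samples presumably sit side by side in `m`, so the columns would need to be split per sample before computing the row means; the header line should indicate where each sample's columns end.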

Devon
--
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com

menon...@gmail.com

Sep 20, 2016, 4:33:54 PM
to deepTools, menon...@gmail.com

Thanks Devon
