Question about using a custom Affy exon array CDF

jing ma

Mar 31, 2009, 3:25:53 PM
to aroma-af...@googlegroups.com
Dear Mark,

When doing background subtraction for the Affy exon array, do you think it is sufficient to use the CORE CDF (I usually use HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf)?

Thanks,
Jing



On Mon, Mar 16, 2009 at 5:35 PM, Mark Robinson <mrob...@wehi.edu.au> wrote:

Hi Jing.

See below.

On 17/03/2009, at 2:54 AM, jing wrote:

> To whom it may concern,
>
> I'm analyzing some Affy human exon array data and hope to generate
> similar plots as seen in the supplementary figures in the Purdom 2008
> Bioinformatics paper. To do so, I need to get the normalized probe
> intensities and residuals.
>
> I've already followed the steps described in the Human Exon Array
> Analysis vignette and get the following:
>
> (1) ...
>    csN <- process(qn, verbose=verbose)
>
> (2) ...
>     res<-getResidualSet(plmTr)
>
> I tried the function "extractDataFrame(...,addNames=TRUE)" hoping to
> get the data plus column labels for my samples but it didn't work.  Is
> there any easy way to extract these two sets of data in matrix format
> similar to the FIRMA score matrix with probe ID and column labels?


That's right.  extractDataFrame() is typically used for some kind of
summarized data.  For example, you can use extractDataFrame() for
pulling out FIRMA scores (summarized at the probeset level), or RMA
summarized data (summarized at probeset or gene level).

To pull out the raw/normalized data and the residuals, you can use
extractMatrix() or readUnits().  I prefer the former.  It's probably
best to look at this thread:

http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/46d609076d9580fb

Look for the commands after the "# starting from PLM ..." line.
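
For readers without access to that thread, the extractMatrix() route can be sketched roughly as below. This is a minimal sketch and not a copy of the commands in the linked thread. It assumes that `csN` and `plmTr` are the objects created in the exon array vignette, that the residual set is obtained with calculateResiduals() as in that vignette, and that the unit indices in `units` are placeholders to be replaced with your own.

```r
library(aroma.affymetrix)

# Assumed to exist already (from the exon array vignette):
#   csN   - the normalized AffymetrixCelSet, i.e. csN <- process(qn, ...)
#   plmTr - the fitted ExonRmaPlm
cdf <- getCdf(csN)

# Hypothetical choice of units (transcript clusters); replace with your own.
units <- 1:10

# Map every probe (cell) to its unit and group (exon).
# retNames=TRUE returns unit/group names rather than indices.
ugcMap <- getUnitGroupCellMap(cdf, units=units, retNames=TRUE)

# Probe-level normalized intensities: one row per cell, one column per array.
y <- extractMatrix(csN, cells=ugcMap$cell)
colnames(y) <- getNames(csN)

# Probe-level residuals from the fitted PLM, extracted the same way.
rs <- calculateResiduals(plmTr)
r <- extractMatrix(rs, cells=ugcMap$cell)
colnames(r) <- getNames(rs)

# Attach unit/group/cell labels so that each row can be identified.
yLabelled <- data.frame(unit=ugcMap$unit, group=ugcMap$group,
                        cell=ugcMap$cell, log2(y), check.names=FALSE)
```

The rows of `y` and `r` follow the order of `ugcMap$cell`, so the same labels apply to both matrices.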

I've been meaning to put up a page giving a summary of these commands,
including how to use exon array data with GenomeGraphs.  Hopefully I
can find some time shortly to do that.

Hope that helps.
Mark




> Thanks,
>
> Jing
>
> Jing Ma
> Hartwell Center for Bioinformatics & Biotechnology
> St. Jude Children's Research Hospital
>
> >

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.rob...@garvan.org.au
e: mrob...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------







Mark Robinson

Mar 31, 2009, 6:01:20 PM
to aroma-af...@googlegroups.com
Hi Jing.

To be honest, I haven't explored this in any great detail. I know
that Elizabeth sometimes used the bigger CDFs for the BG correction/
normalization steps and then switched to the 'core' CDF for fitting
the PLMs. I'd expect only subtle changes in the 'normexp' BG
adjustment since it would be fitted on >1M probes with either CDF, but
as I mentioned, I have not studied it.

If all of your downstream analysis is focussed on the 'core' CDF, then
it is probably sufficient to use the 'core' CDF for BG adjustment.
This is what I do for a lot of my work, at least.
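
As a rough illustration of that setup, here is a minimal sketch following the steps of the exon array vignette, with the 'core' CDF used from background correction through to PLM fitting. The data set name "myExonData" is hypothetical, and the CDF tags are taken from the file name mentioned in the question. RmaBackgroundCorrection is used here as the vignette does; treat it as an assumption if your pipeline uses a different background method.

```r
library(aroma.affymetrix)

chipType <- "HuEx-1_0-st-v2"

# The 'core' custom CDF; the tags correspond to the file name
# HuEx-1_0-st-v2,coreR3,A20071112,EP.cdf
cdf <- AffymetrixCdfFile$byChipType(chipType, tags="coreR3,A20071112,EP")

# "myExonData" is a hypothetical data set name under rawData/
cs <- AffymetrixCelSet$byName("myExonData", cdf=cdf)

# Background correction, restricted to the probes in the 'core' CDF
bc <- RmaBackgroundCorrection(cs, tag="coreR3")
csBC <- process(bc)

# Quantile normalization of the PM probes
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn)

# Probe-level model at the transcript level, fitted on the same 'core' CDF
plmTr <- ExonRmaPlm(csN, mergeGroups=TRUE)
fit(plmTr)
```

Because the same CDF is attached throughout, every step sees only the 'core' probes and no CDF switch is needed before fitting.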

Cheers,
Mark

jing ma

Mar 31, 2009, 10:08:35 PM
to aroma-af...@googlegroups.com
Hi Mark,

Thank you very much for the quick reply!  It's very helpful.  My downstream analysis does indeed focus on exons from the 'core' design.

I asked the question because I analyzed some Affy SNP 6.0 array data and found a recurrent heterozygous deletion in one known gene. The deletion covers a few internal exons of the transcript.  Since Affy exon arrays were also run on the same samples, I also did a FIRMA analysis.  For this gene, I would expect to see a large positive residual after fitting, but often didn't see it.  I just wanted to first rule out the possibility that my array preprocessing was insufficient.

Best regards,
Jing

Elizabeth Purdom

Apr 21, 2009, 1:45:41 PM
to aroma-af...@googlegroups.com
Hi Jing,
I would actually recommend using the smaller CDF *if* there is no
downstream need for the remaining probes to be normalized. Generally it
doesn't seem to matter, but probes at background are not always
sensitive to certain types of differences between arrays compared to
those with signal and thus you can wind up with normalized data that
looks very different when you look at just the smaller set of probes
(and even with just core probes, you always have enough low-expressing
genes to be representative of background signal as well). Ideally you
could use this normalization on a smaller set and back normalize for the
other probes, but I don't think this is implemented in aroma.affymetrix.
Best,
Elizabeth