Accessing individual probe intensities

Maria Rodrigo-Domingo

unread,

Jul 6, 2011, 8:59:28 AM7/6/11

to aroma.affymetrix

Hi all,

I am using aroma.affymetrix for an analysis on Human Exon Array data
and would need to access the individual probe intensities mapped to
probe_id after background correction and normalization. Is there an
easy way to do this? Below you can see what I have tried so far.

With the notation on http://www.aroma-project.org/node/37 I am looking
at object csN - data at probe-level after BC and QN

> affyBatch <- extractAffyBatch(csN)
# I get an error message about package "huex10stv2cdf" not being
available from Bioconductor.
> probes_matrix <- extractMatrix(csN, cells=NULL, field=c("intensities", "stdvs", "pixels"), drop=FALSE, verbose=verbose) # I do not get any information about which probe the intensity belongs to
> intensities <- getIntensities(csN)
# no probe or array information
> unitIntensities <- getUnitIntensities(csN, verbose=verbose)
# I get transcript and probeset information and the individual
intensities of the 4 probes in each probeset, but numbered 1 to 4, so
I still cannot map them to their target sequence.

I have now run out of ideas so I would really appreciate any
suggestions. Thank you very much in advance for your help.

Best regards,
Maria Rodrigo

PhD student in Biostatistics
Department of Haematology, Medical Center Aalborg Hospital Science and
Innovation Center AHSIC Denmark

Henrik Bengtsson

unread,

Jul 9, 2011, 7:40:57 PM7/9/11

to aroma-affymetrix

Hi Maria.

On Wed, Jul 6, 2011 at 5:59 AM, Maria Rodrigo-Domingo <mro...@gmail.com> wrote:
> Hi all,
>
> I am using aroma.affymetrix for an analysis on Human Exon Array data
> and would need to access the individual probe intensities mapped to
> probe_id after background correction and normalization. Is there an
> easy way to do this? Below you can see what I have tried so far.
>
> With the notation on http://www.aroma-project.org/node/37 I am looking
> at object csN - data at probe-level after BC and QN
>
>> affyBatch <- extractAffyBatch(csN)
> # I get an error message about package "huex10stv2cdf" not being
> available from Bioconductor.

Yes, that approach, which tries to extract a "Bioconductor" AffyBatch
object, is basically just a convenient wrapper around ReadAffy() of
the affy package, and that is what needs a 'huex10stv2cdf' package.

>> probes_matrix <- extractMatrix(csN, cells=NULL, field=c("intensities", "stdvs", "pixels"), drop=FALSE, verbose=verbose) # I do not get any information about which probe the intensity belongs to

It might be that you've missed that *probes* (aka *cells* as we prefer
to call them) do *not* have annotations in Affymetrix arrays. It is
only *probesets* (aka *unit groups*) that have annotations.
Probesets, which are defined by the CDF you choose, specifies which
sets of probes should be grouped together.

For example, if you know you are interested in probes 1040-1054, you
can extra their probe intensities across all arrays in the
AffymetrixCelSet ('csN') as you do:

> cells <- 1040:1054;
> Y <- extractMatrix(csN, cells=cells, field=c("intensities", "stdvs", "pixels"), drop=FALSE, verbose=verbose);
> str(Y)
num [1:15, 1:6] 37 32 96 214 706 29 39 218 39 792 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "huex_wta_cerebellum_A" "huex_wta_cerebellum_B" ...

Note how there are column names, which corresponds to the array/sample
names, but there are no row names simply because probes don't have
names. Probes are specified by the (x,y) coordinates on the array and
there is a one-to-one mapping (x,y) <-> probe index. You can read
more about this in the help of the affxparser package:

> help("2. Cell coordinates and cell indices", package="affxparser")

>> intensities <- getIntensities(csN)
> # no probe or array information

Same reason as above. (Also, I recommend to use extractMatrix()
instead of getIntensities()).

>> unitIntensities <- getUnitIntensities(csN, verbose=verbose)
> # I get transcript and probeset information and the individual
> intensities of the 4 probes in each probeset, but numbered 1 to 4, so

No we're getting closer. The getUnitIntensities() returns a list
structure containing probe intensities. The list structure reflects
the structure of the CDF, i.e. which cells belongs to which unit
groups. Note how the items in the list have names. These are the so
called "unit names". Each item in turn consists of a sublist, which
corresponds to the units groups. They also have names which are the
so called "group names".

> I still cannot map them to their target sequence.

Q. So, does this mean that you are interested in finding the genomic
location of each probe? Is that the kind of annotation you are
looking for when you ask for "probe_id"? Note that when they designed
the custom CDFs for FIRMA, they did map the 25-mer probes to the
genome by their sequences, cf. subpages via
http://aroma-project.org/chipTypes/HuEx-1_0-st-v2

Q. Could it be that you are interested in transcript or exon
summaries? See how-to page 'Extract probeset summaries (chip
effects) as a data frame'
[http://www.aroma-project.org/howtos/extractDataFrame] and the example
for 'HuEx-1_0-st-v2' - is that what you're looking for?

The answer will depend on what you are really after.

/Henrik

>
> I have now run out of ideas so I would really appreciate any
> suggestions. Thank you very much in advance for your help.
>
> Best regards,
> Maria Rodrigo
>
> PhD student in Biostatistics
> Department of Haematology, Medical Center Aalborg Hospital Science and
> Innovation Center AHSIC Denmark
>

> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-af...@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

Maria Rodrigo-Domingo

unread,

Jul 11, 2011, 2:05:41 AM7/11/11

to aroma.affymetrix, martin....@rn.dk

Hi Henrik,

Thanks a lot for your reply.

I am aware that probes/cells are not annotated just like probesets or
transcript clusters, so I just didn't explain my problem properly. I
have already done the analysis at the exon and transcript level using
your ExonRmaPlm() with mergeGroups=TRUE or FALSE for each case so that
part of the analysis, the one most people are interested in, is fine.

I am interested in mapping intensities to sequences of nucleotides: I
have downloaded Affymetrix's library files and in one of them one can
find the target sequence corresponding to each "probe identity/index"
- from 1 to 6.5 million - that can be uniquely obtained using the
(x,y) coordinates of the probe on the cdf file. The reason is that I
would like to compare the performance of Exon arrays to NGS and that's
why the individual probes are so important for me. So my question is
whether it is possible to perform an analysis at the "sequence level".

Yet another question, maybe too general, I have also noticed that in
some cases the 4 probes interrogating the same exon (Probe Selection
Region) have extremely different intensities. Do you think it is a
consequence of the GC content of the individual probes or would you
blame it on cross-hybridazation? Would you somehow correct the probe
intensities by their GC content?

Again, thank you very much for your answer and for providing this
discussion group that gets direct support. I find it extremely useful.

Best regards,
María

On Jul 10, 1:40 am, Henrik Bengtsson <henrik.bengts...@aroma-
project.org> wrote:
> Hi Maria.

>
> On Wed, Jul 6, 2011 at 5:59 AM, Maria Rodrigo-Domingo <mrod...@gmail.com> wrote:
> > Hi all,
>
> > I am using aroma.affymetrix for an analysis on Human Exon Array data

> > and would need to access the individual probe intensities mappeda to

> > probe_id after background correction and normalization. Is there an
> > easy way to do this? Below you can see what I have tried so far.
>

> > With the notation onhttp://www.aroma-project.org/node/37I am looking

> genome by their sequences, cf. subpages viahttp://aroma-project.org/chipTypes/HuEx-1_0-st-v2

>
> Q. Could it be that you are interested in transcript or exon
> summaries? See how-to page 'Extract probeset summaries (chip
> effects) as a data frame'
> [http://www.aroma-project.org/howtos/extractDataFrame] and the example
> for 'HuEx-1_0-st-v2' - is that what you're looking for?
>
> The answer will depend on what you are really after.
>
> /Henrik
>
>
>
>
>
>
>
>
>
> > I have now run out of ideas so I would really appreciate any
> > suggestions. Thank you very much in advance for your help.
>
> > Best regards,
> > Maria Rodrigo
>
> > PhD student in Biostatistics
> > Department of Haematology, Medical Center Aalborg Hospital Science and
> > Innovation Center AHSIC Denmark
>
> > --
> > When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.
>

> > You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with websitehttp://www.aroma-project.org/.

Henrik Bengtsson

unread,

Jul 11, 2011, 8:38:24 PM7/11/11

to aroma-affymetrix, martin.boegsted

Hi.

On Sun, Jul 10, 2011 at 11:05 PM, Maria Rodrigo-Domingo
<mro...@gmail.com> wrote:
> Hi Henrik,
>
> Thanks a lot for your reply.
>
> I am aware that probes/cells are not annotated just like probesets or
> transcript clusters, so I just didn't explain my problem properly. I
> have already done the analysis at the exon and transcript level using
> your ExonRmaPlm() with mergeGroups=TRUE or FALSE for each case so that
> part of the analysis, the one most people are interested in, is fine.
>
> I am interested in mapping intensities to sequences of nucleotides: I
> have downloaded Affymetrix's library files and in one of them one can
> find the target sequence corresponding to each "probe identity/index"
> - from 1 to 6.5 million - that can be uniquely obtained using the
> (x,y) coordinates of the probe on the cdf file.

I'm not 100% sure I know what your asking for, but maybe the following
helps. When doing:

Y <- extractMatrix(csN, cells=cells, field=c("intensities", "stdvs",
"pixels"), drop=FALSE, verbose=verbose);

the returned Y is a JxI matrix, where J is the number of cells
requested (==length(cells)) and I is the number of arrays
(==nbrOfArrays(csN)). Argument 'cells' is an index vector specifying
the cells for which the intensities should be retrieved. If NULL,
it's the same as cells <- 1:nbrOfArrays(csN). You can specify any
subset or order you'd like - the rows of Y will be ordered
accordingly.

So, with the above you will know the cell index for each row in Y.
From the cell index you can obtain the one-to-one (x,y) cell
coordinate (see my previous message). With (x,y) it sounds like you
can retrieve the target sequence you'd like from Affymetrix or other
annotation data sources - is that correct? If this is not enough, I'm
not sure what kind of mapping you're looking for.

If you are interested in the 25-mer *probe* sequences, they are
readily available in the Aroma Cell Sequence (ACS) file:

http://aroma-project.org/chipTypes/HuEx-1_0-st-v2

For example:

> cdf <- getCdf(csN);
> acs <- getAromaCellSequenceFile(cdf);
> acs
AromaCellSequenceFile:
Name: HuEx-1_0-st-v2
Tags: HB20080710
Full name: HuEx-1_0-st-v2,HB20080710
Pathname: annotationData/chipTypes/HuEx-1_0-st-v2/HuEx-1_0-st-v2,HB20080710.acs
File size: 162.50 MB (170394012 bytes)
RAM: 0.00 MB
Number of data rows: 6553600
File format: v1
Dimensions: 6553600x26
Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw,
raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw
Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
, 1, 1, 1, 1, 1, 1, 1, 1
Footer: <createdOn>20080710 22:05:44 PDT</createdOn><platform>Affymetrix</platfo
rm><chipType>HuEx-1_0-st-v2</chipType><srcFile><filename>HuEx-1_0-st-v2.probe.ta
b</filename><filesize>553438704</filesize><checksum>1ff5d5bd59bd7345ecbab1973097
247e</checksum></srcFile>
Chip type: HuEx-1_0-st-v2
Platform: Affymetrix

> cells <- c(1024, 10902:10905);
> readSequences(acs, cells=cells)
[1] "CCTGCTGGTATCCTGGGAACCAGGT" "ATTATTTGAATGGAGATGGTTACAT"
[3] "TTTGTGGTTTAGAATGTCAGCCCAT" "TAGTGGTTGCCATAAGTCTTACCAT"
[5] NA

As you see, all sequences are not known, but several are. You can
also retrieve the probe sequence data in other ways, e.g.

> readSequenceMatrix(acs, cells=cells)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "C" "C" "T" "G" "C" "T" "G" "G" "T" "A"
[2,] "A" "T" "T" "A" "T" "T" "T" "G" "A" "A"
[3,] "T" "T" "T" "G" "T" "G" "G" "T" "T" "T"
[4,] "T" "A" "G" "T" "G" "G" "T" "T" "G" "C"
[5,] NA NA NA NA NA NA NA NA NA NA
[,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
[1,] "T" "C" "C" "T" "G" "G" "G" "A" "A"
[2,] "T" "G" "G" "A" "G" "A" "T" "G" "G"
[3,] "A" "G" "A" "A" "T" "G" "T" "C" "A"
[4,] "C" "A" "T" "A" "A" "G" "T" "C" "T"
[5,] NA NA NA NA NA NA NA NA NA
[,20] [,21] [,22] [,23] [,24] [,25]
[1,] "C" "C" "A" "G" "G" "T"
[2,] "T" "T" "A" "C" "A" "T"
[3,] "G" "C" "C" "C" "A" "T"
[4,] "T" "A" "C" "C" "A" "T"
[5,] NA NA NA NA NA NA
attr(,"map")
<NA> A C G T
00 01 02 03 04

> The reason is that I
> would like to compare the performance of Exon arrays to NGS and that's
> why the individual probes are so important for me. So my question is
> whether it is possible to perform an analysis at the "sequence level".

To " perform an analysis at the "sequence level" " is rather vague -
it sounds like a research project in itself. To compare arrays and
NGS estimates, why not compare at the summaries transcript of exon
levels? (Also, make sure to search the literature for studies
comparing exon arrays with NGS).

>
> Yet another question, maybe too general, I have also noticed that in
> some cases the 4 probes interrogating the same exon (Probe Selection
> Region) have extremely different intensities. Do you think it is a
> consequence of the GC content of the individual probes or would you
> blame it on cross-hybridazation? Would you somehow correct the probe
> intensities by their GC content?

I'm sure there is no simple answer, but if you look at the early
Affymetrix gene-expression papers (dChip, RMA, ...) you'll find that
so called "probe affinities" are substantial. There has been heaps of
papers trying to model the probe affinities from probe sequences and
similar. I leave it at this and for someone else to comment on it.

/Henrik

> You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.

Maria Rodrigo-Domingo

unread,

Jul 12, 2011, 8:58:05 AM7/12/11

to aroma.affymetrix

Dear Henrik,

Thank you very much for your time and for your answers, they have been
extremely useful.

We are looking at different possibilities for the comparison between
NGS and the exon array, also at the gene and exon-levels as you
propose, so we would like to have as many options as possible.

Regarding the cross-hybridazation and the GC content, I know there are
models for differential splicing that work on probe intensities rather
than on exon intensities - the ANOSVA model, for example - so maybe I
can find something useful there.

Once again, thanks a lot.

Best regards,
María

On Jul 12, 2:38 am, Henrik Bengtsson <henrik.bengts...@aroma-