FIRMA and limma

Ana R. Grosso

unread,

Jul 23, 2008, 12:39:18 PM7/23/08

to aroma.affymetrix

Hi

First of all, I would like to congratulate you all for the excellent
work that you have done with aroma.affymetrix package and FIRMA
method.
I am comparing two groups with replicates using Affy Exon arrays.
I read your recent paper describing FIRMA and I want to apply your
method. I have some questions and I would like to know if I am doing
the correct analysis.

After the normalization I did the summarization:
>setCdf(dsQNorm, cdfEnsb)
>plmGene <- ExonRmaPlm(dsQNorm, mergeGroups=TRUE,tag=c("*","ensb"))
>fit(plmGene, verbose=verbose)
>rs<-calculateResiduals(plmGene,verbose=verbose)
>rs<-getResidualSet(plmGene)
>cesGene <- getChipEffectSet(plmGene)

1) To find the genes differentially expressed I extracted the chip
effects (and applied the log2) and proceed to the limma analysis. Just
to confirm, the chipEffects values correspond to expression levels and
the result of limma analysis will be the usual log2 fold-changes?
Do you recommend the filters that you applied in FIRMA paper for
detecting splicing events (minimum probes number and minimum
expression values) also for detecting differentially expressed genes?
How can I obtain the number of probes per PSR?

2) To find the splicing events I did:
>firma <- FirmaModel(plmGene,tag=c("*","ensb"))
>fit(firma,verbose=verbose,force=T)
>fst<-getFirmaScores(firma)
>fstDF<-extractDataFrame(fst,addNames=TRUE)

After this can I apply the same filters as described in 1) and proceed
to limma with fstDF object?
What it will be the fold-change value for a specific exon inclusion/
exclusion event, residuals or FIRMA scores?

Thank you in advance
Ana Grosso

Mark Robinson

unread,

Jul 24, 2008, 4:06:21 AM7/24/08

to aroma-af...@googlegroups.com

Hi Ana.

Some comments below.

On 24/07/2008, at 2:39 AM, Ana R. Grosso wrote:

>
> Hi
>
> First of all, I would like to congratulate you all for the excellent
> work that you have done with aroma.affymetrix package and FIRMA
> method.
> I am comparing two groups with replicates using Affy Exon arrays.
> I read your recent paper describing FIRMA and I want to apply your
> method. I have some questions and I would like to know if I am doing
> the correct analysis.
>
> After the normalization I did the summarization:
>> setCdf(dsQNorm, cdfEnsb)
>> plmGene <- ExonRmaPlm(dsQNorm, mergeGroups=TRUE,tag=c("*","ensb"))
>> fit(plmGene, verbose=verbose)
>> rs<-calculateResiduals(plmGene,verbose=verbose)
>> rs<-getResidualSet(plmGene)
>> cesGene <- getChipEffectSet(plmGene)
>
> 1) To find the genes differentially expressed I extracted the chip
> effects (and applied the log2) and proceed to the limma analysis. Just
> to confirm, the chipEffects values correspond to expression levels and
> the result of limma analysis will be the usual log2 fold-changes?

Yes, in fact if you run RMA background correction + quantile
normalization + ExonRmaPlm (mergeGroups=TRUE), you'll get practically
identical values to the defaults from 'fitPLM' in the 'affyPLM'
package. Therefore, limma on the chip effects from aroma.affymetrix
will be identical to limma on a 'exprs' extraction from a PLMset object.

> Do you recommend the filters that you applied in FIRMA paper for
> detecting splicing events (minimum probes number and minimum
> expression values) also for detecting differentially expressed genes?
> How can I obtain the number of probes per PSR?

If you are using the 'core' set or Ensembl set of probes, I'd say you
shouldn't need to filter out probes for expression analysis. Most of
the bad behaving probes would be downweighted in the RMA robust model
fit.

Somebody might know a quicker way, but to get the number of probes per
PSR, I would do something like:

# read in data for 1 file (here, i've just read the first 10 units)
x<-readCelUnits(getPathname(getFile(csN,
1)),cdf=getPathname(cdf),units=1:10)

# unwrap the list structure
np<-unlist(lapply(x,FUN=function(uu) { lapply(uu,FUN=function(u)
length(u$intensities)) }))

# to match up to extractDataFrame (again, i've read just 10 units)
exFirma <- extractDataFrame(fs,addNames=TRUE,units=1:10)

> length(np)
[1] 140
> dim(exFirma)
[1] 140 38

extractDataFrame and my calls to 'lapply' will unwrap the lists in the
same order, so you can just add this as another column to the
extracted data frame.

I would "proceed with caution" with FIRMA scores based on only 1 or 2
probes.

> 2) To find the splicing events I did:
>> firma <- FirmaModel(plmGene,tag=c("*","ensb"))
>> fit(firma,verbose=verbose,force=T)
>> fst<-getFirmaScores(firma)
>> fstDF<-extractDataFrame(fst,addNames=TRUE)
>
> After this can I apply the same filters as described in 1) and proceed
> to limma with fstDF object?
> What it will be the fold-change value for a specific exon inclusion/
> exclusion event, residuals or FIRMA scores?

Well, I don't have a good gauge on what fold change is indicative of
exon inclusion/exclusion, but I have done limma analysis on the FIRMA
scores and selected some of the top hits. What I typically do is plot
the probe level data or FIRMA scores in the context of the genome for
some of the high scoring regions. 'Supplementary Data 2' on the web
shows some pretty convincing examples. Elizabeth created these with
the 'GenomeGraphs' package, which is worth checking out.

Hope that helps.

Cheers,
Mark

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.rob...@garvan.org.au
e: mrob...@wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------

Elizabeth Purdom

unread,

Jul 24, 2008, 9:14:13 AM7/24/08

to aroma-af...@googlegroups.com

Mark Robinson wrote:
>

>> Do you recommend the filters that you applied in FIRMA paper for
>> detecting splicing events (minimum probes number and minimum
>> expression values) also for detecting differentially expressed genes?
>> How can I obtain the number of probes per PSR?
>
> If you are using the 'core' set or Ensembl set of probes, I'd say you
> shouldn't need to filter out probes for expression analysis. Most of
> the bad behaving probes would be downweighted in the RMA robust model
> fit.
>

I would agree that in general, the filtering is not so necessary *for
these reduced set of probes*; more important is probably the number of
probesets (and probes) per gene. I would have a lot of doubt about 1 or
2 probesets being able to find subtle differences because of the level
of noise they would have, but it would just depend on the gene. The main
reason for the filtering for the alternative splicing is that it is very
difficult to tell with only four probes differential splicing events
from differences in non-expressed probesets due solely to background noise.

> Somebody might know a quicker way, but to get the number of probes per
> PSR, I would do something like:
>

There's actually a function in affyxparser package that gets the number
of entries per group (which is the number of probes per probeset for our
cdfs) call readCdfNbrOfCellsPerUnitGroup; it returns it in a list fashion
units<-1:10
x<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdf),unit=units)

If you apply it to the default cdf distributed by Affy, where each
probeset is also a unit (no grouping into genes) then you can get
specific probesets more easily (rather than units/genes). For example,
if you have firma scores under fsDF (and if you chose add.names=TRUE)

mypbsets<-fsDF[1:10,"groupName"] #the first ten probesets, but could do all
affyUnits<-indexOf(cdfAffy,names=mypbsets) #gets their unit number in
the affy cdf, which I've called cdfAffy
nProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdfAffy),unit=mypbsets)
nProbesPerExonVector<-unlist(nProbesPerExon)

You could do the same thing, I guess, with the ensembl or other cdfs by
first doing
myunits<-unique(fsDF[1:10,"unit"])
allnProbesPerExon<-readCdfNbrOfCellsPerUnitGroup(getPathname(cdfEnsb),unit=myunits),recursive=FALSE)
nProbesPerExon<-unlist(allnProbesPerExon,recursive=FALSE)
nProbesPerExonVector<-unlist(nProbesPerExon)

but I haven't checked that.

I personally am very wary about unlisting complicated lists and matching
things up. I generally worry that I might have the wrong cdf here or
there and things won't match and I won't know. I used unlist above to
get a vector of number of probes, but if you don't you'll realize that
there only a single number per element of the list so it feels
comparatively 'safe' to me. I also like to make everything match up
names as a double check, rather than just the order.

Best,
Elizabeth

Ana R. Grosso

unread,

Jul 29, 2008, 11:52:23 AM7/29/08

to aroma.affymetrix

Hi

Thank you very much for your suggestions.
I made all the analysis but I am only detecting alternative splicing
events in one direction (meaning all skipped events).
All my FIRMA score are positive. Is this a problem of logarithms? Are
the FIRMA scores and residuals plotted in 'Supplementary Data 2' in
log2 scale?

Thank you in advance
Ana

Mark Robinson

unread,

Jul 29, 2008, 6:17:30 PM7/29/08

to aroma-af...@googlegroups.com

Hi Ana.

Yes, 'extractDataFrame' will pull out the 2^{FIRMA score} (since this
is how they need to be stored in the CEL files) .... so yes, take log2
before limma, plotting, etc.!

Noted. We will add this to the documentation.

Mark

------------------------------

Reply all

Reply to author

Forward