On 05/12/2007, Elizabeth Purdom <epu...@gmail.com> wrote:
> Hi,
> I have a question about manually giving your own tags in the analysis. What
> I would *like* to do is at different points in my analysis, use a different
> cdf, and note this by putting a tag (I'm working on exon arrays, so I have
> many I can choose from!). But this seems to overwrite the default tag rather
> than append to it. For example, for BG correction I do this:
> csBGTissue<-RmaBackgroundCorrection(csTissue,tags="main")
> #indicating I used cdf with all 'main' design probesets
>
> This appears to give me the folder "Affy_tissue_comm,main", not
> "Affy_tissue_comm,RBC,main", which I would prefer. I'm assuming there's not
> a way to do this, except manually change the folder name or tag, then
> redefine csBGTissue using AffymetrixCelSet$fromFile (I might like
> "appendTag" as well as 'tag' option on commands, but that seems a big
> change...). So my question is to double check how this trickles down -- I
> haven't run the continuing analyses yet and I'd like to know.
There is a concept I call the "asterisk tag" allowing you to do:
csBGTissue <- RmaBackgroundCorrection(csTissue, tags=c("*", "main"))
print(getFullName(csBGTissue))
## [1] "Affymetrix-HeartBrain,RBC,main"
Using tags="*,main" should also work. For most Transforms and Models
there is an internal getAsteriskTag() method that is used to generate
the asterisk tag(s) given the class, parameters etc. I went through most
classes and made them use this, but it was not a complete scan;
for instance, I avoided touching the exon classes since I knew you guys
were working on those. Previously, a "*" tag was (as is still the case in
ExonRmaPlm) identified and set/replaced in the constructor function,
whereas you don't want to parse it until you actually query the
tags, e.g. via getTags() or otherwise. I can do a second scan through
all classes and update those; if so, make sure to commit what you
have and let me know when you're done.
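To make the asterisk-tag idea concrete, here is a minimal sketch in plain R. It is not the package's actual code: expandAsteriskTag() and fullName() are hypothetical helpers, and "RBC" is simply passed in as the default tag of the step. It only shows how "*" could be replaced by the default tag(s) at the time the tags are queried.

```r
# Hypothetical helpers, NOT part of aroma.affymetrix: expand "*" in the
# user-supplied tags when the tags are queried, not in the constructor.
expandAsteriskTag <- function(tags, asteriskTag) {
  # Accept both c("*", "main") and the comma-separated form "*,main"
  tags <- unlist(strsplit(tags, split = ",", fixed = TRUE))
  # Replace each "*" with the default tag(s) of this step
  unlist(lapply(tags, function(tag) if (tag == "*") asteriskTag else tag))
}

fullName <- function(inputFullName, tags, asteriskTag) {
  paste(c(inputFullName, expandAsteriskTag(tags, asteriskTag)), collapse = ",")
}

print(fullName("Affymetrix-HeartBrain", c("*", "main"), asteriskTag = "RBC"))
## [1] "Affymetrix-HeartBrain,RBC,main"
print(fullName("Affymetrix-HeartBrain", "*,main", asteriskTag = "RBC"))
## [1] "Affymetrix-HeartBrain,RBC,main"
print(fullName("Affymetrix-HeartBrain", "main", asteriskTag = "RBC"))
## [1] "Affymetrix-HeartBrain,main"
```

The last call mimics what you observed: without "*", the custom tag replaces the default one instead of being appended to it.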
> 1) When I quantile normalize, I assume I will get a folder
> "Affy_tissue_comm,RBC,main,QN", assuming I fix the tag for my background
> correction
Yes, every step is just appending tags.
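As a rough illustration of "just appending" (plain R, with a made-up helper appendTags(); the tag values are the ones from your example, assuming the background correction was run with the asterisk tag):

```r
# Hypothetical illustration, NOT package code: each step appends its tags
# to the full name of its input data set.
appendTags <- function(fullName, tags) {
  paste(c(fullName, tags), collapse = ",")
}

fullName <- "Affy_tissue_comm"
stepTags <- list(
  c("RBC", "main"),  # background correction with tags = "*,main"
  "QN"               # quantile normalization with its default tag
)
for (tags in stepTags) {
  fullName <- appendTags(fullName, tags)
  print(fullName)
}
## [1] "Affy_tissue_comm,RBC,main"
## [1] "Affy_tissue_comm,RBC,main,QN"
```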
>
> 2) Then when I do my plm call, I'm going to switch to the 'core' cdf (a
> subset of the probes in the 'main' cdf, so I'm assuming this won't create
> any problems). So if I didn't add any further tags in my call, I'd get a
> folder "Affy_tissue_comm,RBC,main,QN,RMA,merged" (because
> I'm doing ExonRmaPlm). The following call,
> ExonRmaPlm(csNTissue, mergeGroups=TRUE,tag="core")
> will give me a folder "Affy_tissue_comm,core", which is not what I want, so
> I need to do
> ExonRmaPlm(csNTissue,
> mergeGroups=TRUE,tag="RBC,main,QN,RMA,merged,core")
As described above, the idea is to do:
plm <- ExonRmaPlm(csNTissue, mergeGroups=TRUE, tags="*,core")
print(getFullName(plm))
## [1] "Affymetrix-HeartBrain,RBC,main,QN,RMA,merged,core"
However, the current implementation puts "merged" at the end (try):
## [1] "Affymetrix-HeartBrain,RBC,main,QN,RMA,core,merged"
This is going to be corrected when I do the update mentioned above. Let me know.
>
> 3) Then the FIRMA function seems to do something odd by default,
> comparatively, though I have not rerun it since last spring, so please
> correct me if this isn't current somebody, but it takes forever to run, so
> I'd like to fix it upfront rather than run it and then correct it.
I don't really see why FirmaModel should be slower than in the
spring, but there might be updates and confounded effects causing
this. Mark, have you noticed a slowdown?
However, one thing I have noticed but haven't troubleshot yet is
that with R v2.6.0 (or R v2.6.1), on both Linux and Windows,
when some calls return to the R prompt, some warnings take
awfully long to display (generate), e.g. if I have six arrays and get
six warnings that some NAs were generated in a log transform, each of
them takes a very long time to display (and Ctrl-C does not work). My
best guess is that this is something with R v2.6.x and not aroma.affymetrix,
but I might be wrong. I haven't done thorough tests.
> It makes
> a folder 'modelFirmaModel' at the level of 'plmData' and 'rawData', etc.
> Then under that it makes the folder "Affy_tissue_comm,FIRMA", removing my
> previous collection of tags.
The 'modelFirmaModel', called a "root path", is specified in the
getRootPath() method, so you need to update getRootPath.FirmaModel().
In order to "pass down" tags from previous steps, you have to let
getTags() of FirmaModel take care of this. The current
implementation seems to only return the tags for FirmaModel, cf.
this$.tags. I can update this to do the default, i.e. basically the
effect of "*,<new tags>". Let me know.
> And then the files are called
> 'xyz,FIRMAscores.cel'. That seems like an awful lot of 'FIRMA'. Mark has
> already a beta function to rewrite a bit of the firma functions to allow
> other options. I think it's a good time to update this as well. e.g. a folder
> 'scoresData' (so there might be other scores for other chips, etc. that
> could go here), then at the next level, keep all the previous tags as well
> as give a tag 'FIRMA', then under that save all the .cel files for
> whichever of the possible options, with informative filename tags, e.g.
> 'xyz,medResScores.cel' and 'xyz,UQWtScores.cel' in the same folder.
> Feedback?
This is a design issue and I've been thinking about making a similar
move for segmentation data. Currently aroma.affymetrix provides two
segmentation methods via the classes GladModel and, more recently, CbsModel.
For historical reasons, GLAD estimates are stored in gladData/ and
CBS estimates in cbsData/, e.g.
gladData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN/
cbsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN/
However, note I do *not* add tags GLAD and CBS. Since more
segmentation methods will come around, I've been thinking of doing the
following instead:
cnsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,GLAD/
cnsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,CBS/
where 'cns' is short for "copy-number segmentation" (or something
similar). However, I haven't made the move yet, because first of all
I want to be sure it is a good one before breaking people's existing
folders (although manual renaming is not that hard, after which existing
results will be automatically found). On a more philosophical level
there is a design decision that has to be made, namely, should the
above two subdirectories hold identical file types or not? In the
current setup, aroma.affymetrix knows that all files found under
gladData/ are of a certain kind, and the same for cbsData/. If instead
cnsData/ is used, the file format has to be inferred from the tags,
and then you are starting to add restrictions and information in the tags.
It is important to think through the consequences of such a move.
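To illustrate what "inferring from the tags" would mean in practice, here is a small sketch in plain R. inferMethod() is a hypothetical helper and not something that exists in aroma.affymetrix; it assumes the method tag is one of a known set.

```r
# Hypothetical sketch, NOT package code: with a shared root such as cnsData/,
# the segmentation method is no longer given by the root path and has to be
# looked up among the tags of the data set directory.
inferMethod <- function(path, known = c("GLAD", "CBS")) {
  # The first comma-separated part is the data set name, the rest are tags
  tags <- strsplit(basename(path), split = ",", fixed = TRUE)[[1]][-1]
  method <- intersect(known, tags)
  if (length(method) != 1L) {
    stop("Cannot infer a unique segmentation method from tags: ", basename(path))
  }
  method
}

print(inferMethod("cnsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,GLAD"))
## [1] "GLAD"
print(inferMethod("cnsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,CBS"))
## [1] "CBS"
```

Note the restriction this introduces: a user could no longer use "GLAD" or "CBS" as a free-form tag for any other purpose, and a directory without exactly one such tag becomes ambiguous.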
Before making the big move, one could keep the current root paths, but
add tags, e.g.
gladData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,GLAD/
cbsData/HapMap270,100K,CEU,ACC,-XY,+300,RMA,A+B,FLN,CBS/
So at this stage I'm not sure if it is "safe" to use the generic name
scoresData/ for everything. I am happy to discuss it though (either
online or in person).
But I agree, it is better to make the big changes now before too many
people get affected.
>
> By the way, what if I was not working with cdfs that are subsets of each
> other? Are the cells not contained in the cdf just left blank for the
> various processes, or are the original values copied over?
Are you asking about outputted CEL files?
For most (all?) cases where the output CEL file is of the same
dimension (nrow*ncol) as the input CEL file, the method createFrom()
of AffymetrixCelFile is used. Two of the arguments it takes are
'methods=c("copy", "create")' and 'clear=FALSE' (default values).
Basically, the first one allows you to either do a file copy of the
existing CEL file or create/build one from scratch. If copying,
clear=TRUE will afterwards go and blank all CEL values (set them to
zero). If "creating", the created CEL file is already blank, so
clear=FALSE will then read the data from the input file and write it
to the created file. Thus, effectively, when clear=TRUE, the new CEL
file will be blank and when clear=FALSE the new CEL file will contain the
same values as the input file. FYI, the createFrom() call is "atomic",
that is, if something fails or it is interrupted during the call, there
will be no output file. This is why it is safe to hit Ctrl-C almost
anytime.
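For the curious, one common way to get this kind of "atomic" behaviour is to work on a temporary file and rename it only at the very end. The sketch below is plain R on an arbitrary binary file. It is an assumption about the general technique, not a description of how createFrom() is implemented; createAtomically() is a hypothetical name, and "blanking" here simply zeroes every byte, whereas a real CEL file has a header that must be preserved.

```r
# Hypothetical sketch, NOT the createFrom() implementation: build the output
# under a temporary name and rename it to its final name only on success.
createAtomically <- function(src, dest, clear = FALSE) {
  tmp <- paste0(dest, ".tmp")
  # On error or interrupt, remove the partial file so no output is left behind
  on.exit(if (file.exists(tmp)) file.remove(tmp), add = TRUE)
  stopifnot(file.copy(src, tmp, overwrite = TRUE))
  if (clear) {
    # Simplified "blanking": overwrite the copy with zero bytes of equal length
    writeBin(raw(file.size(tmp)), tmp)
  }
  # The final name only appears once the file is complete
  stopifnot(file.rename(tmp, dest))
  invisible(dest)
}

src <- tempfile()
writeBin(as.raw(1:10), src)

copied <- createAtomically(src, tempfile(), clear = FALSE)
blanked <- createAtomically(src, tempfile(), clear = TRUE)
print(readBin(copied, what = "raw", n = 10))
print(readBin(blanked, what = "raw", n = 10))
```

With clear=FALSE the new file holds the same bytes as the input, with clear=TRUE it holds only zeros, and in neither case is a leftover ".tmp" file kept.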
In the case where chip-effect CEL files are created, they will always
be blank, because they are not created from an input CEL file but from
the CDF.
Hope this helps
Henrik
>
> Thanks,
> Elizabeth