On Wed, Jun 4, 2008 at 1:42 AM, cstratowa
<Christian...@vie.boehringer-ingelheim.com> wrote:
>
> Dear Henrik
>
> We are trying to run GLAD on different cluster nodes using a perl
> script which distributes an R-script to different cluster nodes. As
> mentioned in an earlier discussion thread, we are facing the problem
> of data overwrite when using ChromosomeExplorer, see:
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/2d3a7c10d512c539#
>
> I did follow your advice to use tags:
>> ce <- ChromosomeExplorer(glad, tags="*,foo")
>
> and this seemed to solve the problem, e.g. the following directories
> were created:
> ./reports/TestBatch/ACC,-XY,QN,RMA,A+B,FLN,-XY,1
> ./reports/TestBatch/ACC,-XY,QN,RMA,A+B,FLN,-XY,2
> ./reports/TestBatch/ACC,-XY,QN,RMA,A+B,FLN,-XY,3
> ./reports/TestBatch/ACC,-XY,QN,RMA,A+B,FLN,-XY,5
Ok.
>
>
> However, when running GLAD for six chips on six cluster nodes (one for
> each chip), four of the tasks (1,2,3,5) actually completed, but the
> others had some file/directory conflicts - see below. The order the
> subtasks finished was arbitrary (e.g. subtask 1 was one of the last
> ones to complete).
>
> Tasks 4 and 6 failed probably due to the same problem:
> 4: Exception: Could not create file path: reports/includes/images
> 6: Exception: Could not create file path: reports/includes/js/ChromosomeExplorer4
>
> Do you have any idea why I get this message?
The *Explorer classes write common files to reports/includes/ on each "run".
These files are always the same. The clash occurs because your hosts try to
create/write the same directories/files at the same time (or while the
file system has them locked). In this case, I don't think you have
to worry.
> Why do only 2 of 6 tasks have this problem?
By pure chance. Has nothing to do with aroma.affymetrix.
> Do you have any idea how to solve this problem?
Don't think it is a problem. Just try the *Explorer and see if it works.
Cheers
Henrik
OK, now I see what your problem is. So, when you call process() on
the ChromosomeExplorer, a race-condition exception might occur while
creating the reports/includes/ files, causing process() to
interrupt/quit before it calls fit() on the GladModel and plots the
results. Since the setup of reports/includes/ is "orthogonal" to the
latter, one could let the Explorer ignore these kinds of errors.
I've updated the Explorer class with an alpha-version method
setParallelSafe(). After installing the patch:
library(aroma.affymetrix);
downloadPackagePatch("aroma.affymetrix");
call setParallelSafe(ce, TRUE) before calling process() on each of the
machines. That will make the Explorer ignore "non-important"
exceptions that may occur due to parallel race conditions.
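To illustrate what tolerating such a race can look like, here is a minimal
base-R sketch. It is not the code behind setParallelSafe(); the helper name
mkdirsSafely() and its arguments are hypothetical. The idea is that a failed
directory creation is harmless if the directory exists afterwards, because
another host simply got there first.

```r
# Hypothetical helper (not part of aroma.affymetrix): create a directory
# and treat "another process created it first" as success.
mkdirsSafely <- function(path, retries = 3L, wait = 0.5) {
  for (k in seq_len(retries)) {
    # dir.create() returns FALSE if the directory already exists or could
    # not be created; its warning is silenced and the outcome is checked
    # explicitly below.
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
    if (dir.exists(path)) return(invisible(TRUE))
    Sys.sleep(wait)  # give a locked or slow file system a moment
  }
  stop("Could not create file path: ", path)
}

# Usage: calling it twice (or from two hosts) is fine.
root <- tempfile("reports")
mkdirsSafely(file.path(root, "includes", "images"))
mkdirsSafely(file.path(root, "includes", "images"))
```

This only covers directory creation; files written into those directories
by several hosts at once would need their own handling.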
Tech details: Note that this is very much a quick-and-dirty alpha
implementation; a much better solution would be to use a mutex
mechanism for handling race conditions, but that is currently not
available in R.
Let us know if this helps
Henrik
On Thu, Jun 5, 2008 at 7:36 AM,
<christian...@boehringer-ingelheim.com> wrote:
> Dear Henrik
>
> First, I am trying to reply in this way, since today it is for some reason
> impossible for me to sign in to the aroma.affymetrix google group: I have
> cleared browser cache and cookies, reset the password, etc. as suggested by
> google help, but nothing worked.
>
> I would appreciate if you could reply to the following thread even though I
> could not put my question there:
> http://groups.google.com/group/aroma-affymetrix/browse_thread/thread/1ce7b4621731e457#
> Thank you in advance
> Christian
>
> So here is my reply:
>
> I must admit that I do NOT understand your answer.
> As you see from the output, tasks 4 and 6 throw an exception, which means
> that GLAD did NOT compute the copy numbers for these two tasks, at least no
> data for the corresponding chips were stored in the subdirectory of
> "gladData" and the same is true for the "reports" subdirectory. Since no data
> for these two chips were computed, opening *Explorer does not solve the
> problem.
>
> Best regards
> Christian
The patch I provided was only for the very basic race conditions in
ChromosomeExplorer, but not in preceding models/algorithms, e.g.
GladModel, AvgCnPlm etc.
The whole parallelizing of aroma.affymetrix (and other computations) is
complicated, and the only safe solution to deal with race conditions
like the ones you are reporting is to work with a semaphore/mutex
mechanism. Such a solution is not available in R, and until it is
available it does not make sense to put much effort into working around
it in aroma.affymetrix. I don't think the R language itself will provide
this feature for a long time. Instead I think "we" have to create a
"mutex" package that provides the necessary features. The challenge
is to make such a package work on all platforms and, more
importantly, be bulletproof.
FYI, I've looked into options like using RSQLite to provide the mutex
functionality, but unfortunately it turns out that SQLite itself is
not parallel safe and depends heavily on the underlying OS. It seems
that the easiest solution is to have one dedicated server handing out
semaphores via a TCP/IP connection (opening and closing connections
are race-condition safe to the best of my knowledge). However, such a
solution is not immediately transparent to aroma.affymetrix, so I've
put off going down that path. One can also work with lock files as
semaphores, but they are not bulletproof and there are cases where
two processes can lock the same file; it is better than nothing but
not safe.
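For reference, the lock-file idea can be sketched in base R as below. This
is only an illustration under assumptions, not something aroma.affymetrix
provides: the function names are hypothetical, and the sketch uses a lock
directory rather than a lock file because dir.create() reports whether it
was the call that created the directory. The caveats above still apply: on
network file systems the creation step may not be reliably exclusive, and a
process that dies while holding the lock leaves a stale lock behind.

```r
# Hypothetical helpers, base R only. The lock is "held" while the lock
# directory exists; dir.create() returns FALSE if it is already there.
acquireLock <- function(lockDir, timeout = 60, poll = 0.2) {
  deadline <- Sys.time() + timeout
  repeat {
    if (dir.create(lockDir, showWarnings = FALSE)) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)  # gave up waiting
    Sys.sleep(poll)
  }
}

releaseLock <- function(lockDir) {
  unlink(lockDir, recursive = TRUE)
  invisible(!dir.exists(lockDir))
}

withLock <- function(lockDir, expr, ...) {
  if (!acquireLock(lockDir, ...)) {
    stop("Timed out waiting for lock: ", lockDir)
  }
  on.exit(releaseLock(lockDir))  # release even if expr throws an error
  expr  # lazily evaluated here, i.e. while the lock is held
}

# Usage: only one process at a time runs the critical section.
lockDir <- file.path(tempdir(), "includes.lock")
result <- withLock(lockDir, {
  # ... set up shared files here ...
  42
})
```

The timeout keeps a host from waiting forever on a stale lock, but it does
not make the scheme safe; it only turns a hang into an error.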
To summarize, don't expect parallel computation in aroma.affymetrix
until there exists a well-tested "mutex" package in R.
Cheers
Henrik
On Mon, Jun 16, 2008 at 7:07 AM, cstratowa
<Christian...@vie.boehringer-ingelheim.com> wrote:
>
> Dear Henrik
>
> This is sad, thus my current trial is to create directories for each
> chip and create symlinks to directories rawData, annotationData and
> later to plmData.
> However, I am not sure if a symlink to plmData will help, or if I will
> get the following error again:
>
> Exception: Failed to rename temporary file:
> plmData/TestPerl2,ACC,-XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Sty/.average-intensities-median-mad,def1a76532f24e3899d90fcec50fa3ac.CEL.tmp
> -> plmData/TestPerl2,ACC,-XY,QN,RMA,A+B,FLN,-XY/Mapping250K_Sty/.average-intensities-median-mad,def1a76532f24e3899d90fcec50fa3ac.CEL
>
> Can you explain what this error means? When and why is a temporary
> file created?
> Does this ".average-intensities-median-mad,xxx.CEL" file contain the
> average intensities of the normalized CEL-files?
> Is there a way to create this ".average-intensities-median-mad,xxx.CEL"
> file with a defined method after normalization, so that
> GLAD "knows" that this average file exists already and does not try
> to create this file itself?
Yes, that file contains the average of the CEL files in a data set.
The temporary file is created, and exists, while getAverageFile() of
AffymetrixCelSet calls createFrom() of AffymetrixCelFile. In your case,
the different hosts happen to be running createFrom() at the same time.
The createFrom() method creates a valid CEL file either by copying a
template CEL file or by creating it from scratch. It also takes an
argument 'clear', which will make sure the CEL file has all zeros if
clear=TRUE. If a template CEL file is used, clear=TRUE has to "update"
the CEL file with all zeros; if it is created from scratch, this is not
needed. However, since createFrom() can be a two-step process, working
with a temporary file is the only way to "guarantee" that you get what
you requested (in case of interrupts etc.).
Ideally, getAverageFile() of AffymetrixCelSet would also work with a
temporary file first and, when complete, rename it to the final name.
That would make the method more "atomic", i.e. either complete or fail.
Right now, I think it is possible to interrupt getAverageFile() and
get an incomplete file. Yet another thing on the todo list.
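The write-to-temporary-then-rename pattern can be sketched in base R as
follows. This is an illustration only, not how createFrom() or
getAverageFile() are implemented; writeAtomically() and its 'writer'
argument are hypothetical names. The temporary name includes the host name
and process ID, an assumption made here so that two hosts do not pick the
same temporary file.

```r
# Hypothetical helper: write to a temporary file in the same directory,
# then rename it to the final name once the write has completed.
writeAtomically <- function(pathname, writer) {
  # Temporary name made unique per host and process (assumption: the
  # host name and PID together are enough to avoid clashes).
  pathnameT <- sprintf("%s.%s.%d.tmp", pathname,
                       Sys.info()[["nodename"]], Sys.getpid())
  # Clean up the temporary file if the writer fails or is interrupted.
  on.exit(if (file.exists(pathnameT)) file.remove(pathnameT))
  writer(pathnameT)
  if (!file.rename(pathnameT, pathname)) {
    stop("Failed to rename temporary file: ", pathnameT, " -> ", pathname)
  }
  invisible(pathname)
}

# Usage: readers see either no file or the complete file.
out <- file.path(tempdir(), "average.txt")
writeAtomically(out, function(f) writeLines(c("1.5", "2.5"), f))
readLines(out)
```

If several hosts compute the same result, the last rename wins; since the
contents are identical, that should be harmless. Whether the rename itself
is atomic depends on the underlying file system.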
/Henrik