I recently moved from physics to biology and I am trying to learn
as much as I can, so any input, advice, suggestions, opinions, or
experience is welcome and greatly appreciated.
Right now I am on a project to build a new DNA sequencing system with
the Solexa / Genome Analyzer from Illumina. In preparation for the
machine's arrival this September, we have to figure out which IT
infrastructure would serve the Solexa most efficiently and
economically. Since we do not have much space in the lab,
we are trying to keep everything as compact as possible. My questions relate
to the downstream analysis. There are two choices: buying a powerful
computer (quad-core chips, some 128 GB of memory, etc.) or using the
supercomputer and Linux cluster (a lot of memory and nodes, unlimited
storage with a Gb connection) that we have here at the institute. My
questions are:
1. What disadvantages and inconveniences might we face if we do not
have a stand-alone computer and use the Linux cluster instead?
2. What are the advantages of buying a computer instead of using the
cluster?
Since I am new to the field, I would appreciate any advice you have.
Thank you all in advance,
D.
We utilize our school's cluster for analysis.
Perks
- lots of CPUs
- cheaper
- redundant systems
- someone else administers systems
Cons
- no large memory machines; the largest we have are 32 GB
- not in total control over system; others may trump our jobs
**************************
Bioinformatics Specialist
Research Technology
Support Facility
S20-A Plant Biology Lab
Michigan State University
East Lansing, MI 48824
Ph: (517) 355-6759 x102
Fax:(517) 355-6758
**************************
> From: Andrew Gagne <aga...@gmail.com>
> Reply-To: Solexa User Group <sol...@googlegroups.com>
> Date: Sun, 14 Jun 2009 15:17:33 -0400
> To: Solexa User Group <sol...@googlegroups.com>
> Subject: Re: solexa IT infrastructure advice
>
If you want to use a cluster, you have to have SGE installed and configured with an appropriate parallel environment, and you have to run your pipeline jobs using qmake. The documentation merely mentions that this is an option for running the pipeline on a cluster; it offers no guidance on how to do it.
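As a rough illustration of the qmake route described above, here is a minimal Python sketch that assembles a qmake command line of the general form `qmake -cwd -v PATH -pe <pe> <slots> -- <make arguments>`. The helper name `build_qmake_command` is hypothetical, and the parallel environment name `make`, the slot range, and the `all` target are assumptions for illustration only; the real values depend on how your site configured SGE and on the Makefile the pipeline generates.

```python
import shlex


def build_qmake_command(pe_name, min_slots, max_slots, make_args=()):
    """Build an SGE qmake command line as a list of arguments.

    pe_name   -- name of the SGE parallel environment (site-specific; assumed)
    min_slots -- smallest number of slots to request
    max_slots -- largest number of slots to request
    make_args -- arguments passed through to make after the "--" separator
    """
    if min_slots < 1 or max_slots < min_slots:
        raise ValueError("need 1 <= min_slots <= max_slots")

    # A single number requests a fixed slot count; "a-b" requests a range.
    if min_slots == max_slots:
        slots = str(min_slots)
    else:
        slots = f"{min_slots}-{max_slots}"

    # -cwd: run in the current directory; -v PATH: export PATH to the jobs.
    cmd = ["qmake", "-cwd", "-v", "PATH", "-pe", pe_name, slots, "--"]
    cmd.extend(make_args)
    return cmd


if __name__ == "__main__":
    # Assumed values: a parallel environment called "make" and a target "all".
    command = build_qmake_command("make", 2, 10, ["all"])
    # Print the command for inspection instead of running it.
    print(shlex.join(command))
```

The sketch only prints the command so it can be checked against your cluster's setup before anything is submitted; you would run it from the directory that contains the pipeline's Makefile.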
People have used LSF instead of SGE (e.g. WashU); a script for this used to be available from David Dooling:
http://genome.wustl.edu/pub/software/lsgmake-gap/
--
David Dooling
LSF scales much better than SGE. If your cluster needs to handle tens
of thousands of jobs simultaneously, SGE or PBS (and its descendants)
will choke.
--
David Dooling
Thanks for the information. I must admit, it was several years ago
that we evaluated SGE (and several other batch schedulers). At the
time, LSF was the only one that could meet our needs. Given that LSF
is pricey, it is good to hear SGE has improved. Presently, we are
looking at Condor as a replacement for LSF.
http://www.cs.wisc.edu/condor/