Macintosh Sequence Aligner/Editor?

Robert Rumpf

Sep 16, 1992, 8:52:16 PM

We are looking for a Macintosh program which allows one to enter, edit, and
align (partial auto and manual) multiple sequences. We have seen the public
domain program SeqApp, but it is still rather buggy. Does anyone know of a
commercial product which handles these (and perhaps other) tasks?

Bob

Bruce Roe

Sep 17, 1992, 8:09:00 AM

In article <1992Sep17.0...@magnus.acs.ohio-state.edu>, rru...@magnus.acs.ohio-state.edu (Robert Rumpf) writes...

Hi,
Here is a copy of an excellent review of Sequence Analysis
Programs recently posted to bionet.software by Harry Mangalam. Since
maybe some of you don't read the bionet.software news group, here's
a copy.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\ Bruce A. Roe Dept. Chemistry and Biochemistry /
/ BR...@aardvark.ucs.uoknor.edu University of Oklahoma \
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

---------------------------- cut here --------------------------------
Subj: Reviews of Seq Analysis Progs (longish)
From: mang...@SALK-SC2.SDSC.EDU
Newsgroups: bionet.software
Date: 8 Sep 92 23:18:28 GMT

Harry Mangalam Vox:(619) 453-4100, x250
Dept of Biocomputing Fax:(619) 552-1546
The Salk Institute mang...@salk-sc2.sdsc.edu
10010 N Torrey Pines Rd mang...@salk-sgi.sdsc.edu
La Jolla CA 92037 mang...@salk.bitnet
Greetings, Netlandos,

In response to the recent spate of advisories, queries, and warnings
about sequence analysis programs (SAPs), I thought I'd muddy the metaphoric
waters further by throwing my own $.02 into the ring.

I work in a mostly Mac environment hence the emphasis is on Mac
software. A few PC packages are included mostly because they are useful
enough to warrant putting up with the hassle of transferring and modifying
files. I have recently begun to work in an X window situation and
therefore a few X Window programs are also included. Notably absent from
this review is Steve Smith's Genetic Data Environment (GDE), but I will try
to get to it Real Soon Now. It is, for those who don't know, an
extensible X window app for molecular biology whose server program
(client, in the bizarro world of X) runs on Sun SPARC hardware (and now
on SGI, thanks to Anthony Persechini). But enough of GDE for now.

If you're looking for a program that will do everything from restriction
maps to multiple alignments, will make only a polite dent in your wallet,
is well-debugged, takes advantage of the latest network services, co-exists
peacefully with the rest of your applications and is easy to use, you will
have to search beyond the narrow confines of this planet.
Many of the packages mentioned below are commercially produced and thus
have to make a profit. Because of the restricted market for sales (as
opposed to a general purpose package like a spreadsheet or word processor),
they must be relatively expensive and many address the all-too-common
practice of software piracy by requiring the presence of a hardware lock.
Those that are freeware obviously cannot support the level of debugging and
support of a commercial program; however, some are of surprisingly high
quality.

The following notes are not meant as an exhaustive, objective review.
They are my opinions (and a few measurements) based on having used the
programs or the demo versions and are obviously biased by my approach to
various problems - what I may dismiss without a thought may be a critical
determinant for someone else. And certainly, don't take my word for it -
I've (fitfully) tried to include the email addresses of people who have
taken opposing views. Also, while I have made a reasonable attempt to be
accurate, features, updates, and prices change so often in this field that
like many reviews, this one is sliding into obsolescence as you read it.
All prices are approximate (and sometimes negotiable, especially at the end
of a fiscal period).
Consider this a work in progress - as time allows, I plan to post
reviews of other packages not covered here and increase the detail of the
reviews, but for now a quick overview will have to do. I invite comments,
corrections, and flames and will certainly post explanations, expansions,
apologies, and retractions if they are warranted.

Additional Sources of Information on SAPs:
You can also search the archived biosci postings for additional
information by WAIS and gopher. One gopher path is:

Title: BioSci-Bionet-News.src
Host: fly.bio.indiana.edu
Path: waissrc:/Other-Bio-Gophers-Etc/Wide-Area-Info-Servers/
BioSci-Bionet-News.src
Type: Query

There are additional reviews on sequence analysis software, notably by
Peter Markiewicz, available from the Bio-archives. His review (titled
pm-macinmolbio.txt on the Indiana archives) has a very good introduction
and covers more ground than this review. I highly recommend it.
Dan Jacobson (da...@welchgate.welch.jhu.edu) recently posted a more
extensive review of public domain primer/oligo analysis programs, including
chunks from their documentation. You can access this via the gopher
mentioned above.

Disclaimer:
The views expressed below are my own. I have not received any
considerations, monetary or otherwise, from any of the entities mentioned
here. I have acted as an uncompensated beta tester for the "Sequencher"
program and for a module of DNASTAR, and a friend (Lisa Caballero) wrote much
of the guts for IBI's AssemblyLign (a competitor to Sequencher, incidentally).

In most cases of freeware, I have included the author's email address;
these should be used sparingly - you should first try the appropriate
archive, read the included documentation, and only as a last resort or to
report a bug, contact the author - let them keep working to keep bringing
us these programs.
At the end of the text is a table that compares a (very) few of the
execution times for some of the programs that do approximately the same
things.

Opening Diatribe:
Crippled demos are a lousy idea - DNASTAR, whatever you might think
about their corporate leadership or programs, has implemented the correct
introductory strategy - a 60 day free trial of the full,
everything-enabled, working program; after 60 days, the program
irreversibly suicides. It is a rare demo that gives you a good feel for
the program when you can't save your work, or print, or import your own
data. There are some exceptions (see the blurb for Gene Construction Kit
below), but in general, crippled demos are not worth the floppies they rode
in on. Rather, see if you can get the company to give you a 30 or 60 day
trial period.

High Quality Freeware/Shareware:

The Don Gilbert Collection:
{I nominate Don Gilbert for the BioGNUdos Prize (apologies to Dan Jacobson
for the nested pun), awarded annually to the author of the most useful free
software for the Biological Sciences.}
Just about everything I have ever tried by Don Gilbert
(gilb...@sunflower.bio.indiana.edu - also keeper of the Indiana Archives)
has been exceptionally useful. This includes:

- DottyPlot, a diagonal comparison program that plots identities or
similarities between 2 easily input sequences. You can magnify the area of
interest and save the output for further evaluation or as a PICT file for
inclusion into a graphic. DNA Strider 1.2 now provides a very similar
comparison, but to my surprise, DottyPlot is faster by quite a bit. Using
proteins of 789 aas (M11969) and 1127 aas (M69238), I measured the
following times on a MacIIci:

Settings               DottyPlot    DNA Strider 1.2
Window:15, Match:7       7.5"           27"
Window:9,  Match:4       9.5"           32"
Window:7,  Match:3      13.5"           18"
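
For readers unfamiliar with the Window/Match settings in the timings above, here is a minimal Python sketch of a windowed diagonal comparison. It is an illustration only and is not the code used by DottyPlot or DNA Strider; the function name `dot_plot` and its parameters are made up for this example, and it scores identities only (no similarity matrix).

```python
def dot_plot(seq_a, seq_b, window=15, match=7):
    """Return the set of (i, j) window start positions at which at least
    `match` of the `window` aligned residues are identical.

    Hypothetical helper for illustration; a naive O(len_a * len_b * window) scan.
    """
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identical = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if identical >= match:
                hits.add((i, j))
    return hits


# Two short, made-up peptide fragments differing at one position.
a = "MKTAYIAKQRQISFVKSHFSRQ"
b = "MKTAYIAKQRNISFVKSHFSRQ"
diagonal = sorted(h for h in dot_plot(a, b, window=7, match=6) if h[0] == h[1])
print(len(diagonal), "window starts on the main diagonal")
```

Plotting each hit as a dot at (i, j) gives the familiar picture in which shared regions show up as diagonal runs. A smaller window with a lower match threshold is more sensitive but noisier.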

- READSEQ, a sequence format converter that is unsurpassed in flexibility
and portability.

- GopherApp, a Mac version of the U of Minn gopher protocol that allows you
to attach the computer resources of the Internet to your Mac as sort of an
extended hard disk. Not surprisingly, DG's BioGopher hole at Indiana is
one of the best, with his gopher access to Genbank beating most of the CD
ROMS on our local network ("but", he whined, "when are you going to
implement Boolean searches?"). This is one of those programs that is so
useful that it is worth buying a Mac (and ethernet card) for.

[Another aside - assuming you are within reach of an ethernet backbone,
the single most cost-effective piece of equipment you can buy for your Mac
or PC is an ethernet card. For ~$200, you get (almost) instantaneous
access to terabytes of helpfully sorted information, reasonably supported
software, BBSs, e-mail, etc.]

- loopDloop, a visual RNA secondary structure editor, sort of like Canvas
specifically for RNA. It takes as input the output from the Zuker RNA
folding programs and helps you turn it into quite pretty figures.

- SeqApp, an Internet aware, extensible, multiple sequence editor and
analysis package. This is what sequence analysis packages of the future
will look like if they want to sell. It is still in 'alpha' testing, and
as such, is still rough around the edges, but it is definitely the shape of
things to come. From within this program, you can send and receive mail via
a POP mailer, send off sequences for FASTA, BLAST, GRAIL, and GeneID
searches, retrieve Genbank sequences, initiate gopher sessions, and
interconvert sequence formats, as well as use a number of the usual sequence
analysis functions. And if the function you want is not included, you can
add your own (clustalv, a multiple sequence alignment program, is included
as an example). There is also an almost-hypertextual help system and
possibly the most responsible gripe reporter available - instant mail to the
author from within the program.
A warning - because of its neonatal state, it's not yet ready for those
who need to be spoon fed; as DG says himself - "expect it to fail in many
ways." However, if you are reasonably Mac-fluent and could use some of the
tools that _do_ work, I highly recommend it.

Other Useful Freeware/Shareware:

Primer Analysis Programs:

- Amplify by Bill Engels (WREn...@wisc.macc.edu). A native Mac
application to do oligo/primer analysis. Not quite as glitzy or as
full-featured as Oligo (National Biosciences Inc, 800 747 4362, for those of you
with HHMI funding), but then it doesn't cost $800 ($640 nonprofit) either.
It will, given oligos, search a target sequence for near matches and
graphically display the results of using the various primers. It will test
the oligos for matching sequence and examine them for internal repeats but
will not search for the best primers to use given a target sequence and a
set of conditions as will Primer and OSP (see below).
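
To make "search a target sequence for near matches" concrete, here is a minimal Python sketch of a mismatch-tolerant primer scan. It is an assumption-laden illustration, not Amplify's algorithm: the names `find_near_matches` and `reverse_complement` are invented here, every mismatch counts equally, and only plain A/C/G/T is handled.

```python
def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_near_matches(primer, target, max_mismatches=2):
    """Return (position, mismatches) for every site on the given strand where
    `primer` lines up with `target` with at most `max_mismatches` mismatches.

    Simple Hamming-distance scan; hypothetical helper for illustration.
    """
    primer = primer.upper()
    target = target.upper()
    sites = []
    for pos in range(len(target) - len(primer) + 1):
        site = target[pos:pos + len(primer)]
        mismatches = sum(1 for p, t in zip(primer, site) if p != t)
        if mismatches <= max_mismatches:
            sites.append((pos, mismatches))
    return sites


target = "TTACGTTTACCT"
print(find_near_matches("ACGT", target, max_mismatches=1))
# Scan the other strand by searching with the reverse complement of the primer.
print(find_near_matches(reverse_complement("ACGT"), target, max_mismatches=1))
```

A real primer tool would also have to consider where the mismatches fall (those near the 3' end matter most for priming) and the annealing conditions, which this sketch ignores.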

Two other freeware primer/oligo programs spring to mind, both of them
more capable than Amplify, but both harder to use.

- Primer (by Stephen E. Lincoln, Mark J. Daly, and Eric S. Lander;
pri...@genome.wi.edu, FAX 617-258-6505) was one of the first of these types
of programs and has been ported to just about every platform that you'd
ever find (although the DOS version suffers from the memory limitations of
that purported OS and no one yet has posted a DJGPP-compiled or otherwise
'DOS-extended' version that would break the normal DOS memory limitations).
It is also one of the most capable, containing just about every feature
that you'd want in an oligo program, from testing oligos to looking for
them, under a very large number of conditions. It is, for portability, a
command line-driven program that requires that you know how to use an
external editor to make up or alter data files.

- OSP was originally an X window app whose authors (LaDeana Hillier,
l...@elegans.wustl.edu and Phil Green (p...@genome.wustl.edu); FAX license
requests to (314) 362-2985 c/o Paula; DJ reminded me - it's free, but you
have to sign a licensing agreement to get it) graciously eviscerated it and
stuffed the guts into a text-window Mac app for the rest of us. The X-win
version is much nicer (and allows you to access TED, their ABI trace
editor), but both will get the job done. OSPX is closer to Primer in
abilities, but closer to Amplify in ease of use. It is a very nice piece
of work.
A Suggestion: If you have an ethernet connection and a Sun SPARC machine
available to you (almost everyone does, whether they know it or not), you
can get an X-window emulator for your Mac or PC for ~$400, run native-mode
OSP and still save $200 compared to buying Oligo.

Other programs:

- Speakquencer (not to be confused with Sequencher, see below) by Christian
Fritze (fri @midway.uchicago.edu). This utility allows you to add
variable-speed voice readback to any program. This function has been added
to most commercial programs, but if you want to go the PD route, it's a
very nice utility to have.

- NCSA Gelreader/ContigAsm is a gel reading program somewhat similar in
idea to Helix (see below), which has been a work in progress for the last 2
or 3 years. It, like all the NCSA programs (Image, PALedit, Telnet,
Datascope, and many others), is free, of surprisingly high quality, and can
be obtained by FTP to zaphod.ncsa.uiuc.edu.
In short, what this combination intends to be (someday) is a gel image
analysis and sequence assembly program rolled into one. As of now, the
Gelreader part of it is useful for analyzing gels to a point (it can read
in gels that have been digitized into a TIFF format and can do video
densitometry and fragment sizing), and the ContigAsm part of it can be used
to assemble restriction maps into simple physical contigs.

- COMAP, a program by Kay Hoffman (KHOF...@cipvax.biolan.Uni-Koeln.DE),
and available from most bio-archives, provides functionality somewhat
similar to the above-mentioned GelReader/ContigAsm for DOS machines.

- MACAW - from Greg Schuler at NCBI (sch...@ncbi.nlm.nih.gov; FTP to
ncbi.nlm.nih.gov, in /pub/macaw) is the only strictly molecular biology
program of which I know that runs under MS Windows (which should give you
an idea of how easy it is to program for Windows). It is an exceptionally
nice bit of work, allowing you to look for matching blocks of homology in a
restricted set of protein or nucleic acid sequences. You can load up to
about 16 sequences easily and up to at least 21 sequences if you're
prepared to wait up to 30 min. (!) - a bug in the file handling routine of
1.03 that may have been fixed in 1.05. You can graphically pick the
sequences (or subsequences) that you want to compare and after the block
searching (start with a very high cutoff and work low!), you can lock
segments of homology together as you deem fit. You can view the homologies
as sequence or as graphic blocks, with or without color accents. The
program will also let you print out the graphics, but you may have to
expend some energy massaging your printer to make the output reflect the
screen.

Commercial Programs:

- DNA Strider - The only for-pay DNA analysis program that I can
unhesitatingly recommend. For $200, you can't buy more program. It can't
draw pretty pictures by reading Genbank Feature tables (but if it's already
in Genbank, most scientists I know aren't interested in the sequence per
se, but what can be done with it, and if you modify it, then the
annotations don't coincide, so you still have to draw the picture yourself
anyway). Strider can output its graphics in PICT form, so that they import
smoothly into drawing programs like Canvas. It doesn't pretend to predict
secondary or (God forbid) tertiary structure. It does not support color. It
doesn't speak to you in multiple voices. What it does do, though, it does
incredibly fast. I've only seen ONE program that comes close to Strider for
speed in restriction mapping, and the screen output, while perhaps a little
spare (in the best Edward Tufte tradition) is _useful_! Its interface is
simple and smooth and easy to do real work with.
It will do some limited protein analysis, such as hydrophobicity, and
acid/base prediction, and the latest version (1.2) includes the ability to
do some primer analysis and Diagonal comparisons (much like DottyPlot).
Also, when you paste sequence into a Strider window, it intelligently
strips spaces, numbers, etc., allowing you to add sequences from nonStrider
formats relatively easily. Strider does not allow you to analyze more than
32.5K bases at a time, a nasty fault in my view, and like most of the
commercial packages, it doesn't allow you to add your own routines.
DNA Strider is available only from its author Christian Marck at the
following address:

Dr. Christian Marck
Service de Biochimie et de Genetique Moleculaire
Bat. 142 Centre d'Etudes de Saclay
91191 GIF-SUR-YVETTE CEDEX FRANCE
fax: (33 1) 69 08 47 12

(warning: he has been known to be slow in responding to correspondence not
containing cheques in the amount of US$200)
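
The paste-cleaning behaviour described for Strider above is easy to approximate. The following Python sketch shows one way to strip spaces, numbers, and punctuation from pasted text; it is a guess at the general idea, not Strider's actual logic, and the function name and default alphabet (IUPAC nucleotide codes) are assumptions made for this example.

```python
def clean_pasted_sequence(text, alphabet="ACGTUNRYKMSWBDHV"):
    """Keep only letters found in `alphabet`; drop digits, whitespace,
    punctuation, and line breaks. Output is upper case.

    Hypothetical helper for illustration only.
    """
    allowed = set(alphabet.upper())
    return "".join(ch for ch in text.upper() if ch in allowed)


pasted = """
        1 gatcctccat atacaacggt
       21 atctccacct
"""
print(clean_pasted_sequence(pasted))
```

Note that this naive filter would also keep any letters from header or annotation lines that happen to be valid codes, so it is only safe on the sequence block itself.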

- MacVector and Geneworks:
The two most commonly mentioned packages in this forum are MacVector
from IBI/Kodak (800 243 2555, 203 786 5600, fax 203 624 3143) and Geneworks
from Intelligenetics (800 876 9994, 415 962 7300, fax 962 7302). Both cost
on the order of $3000 per machine, enforced by the use of a hardware lock.
From personal experience and watching the BBSs, IBI has historically been
more willing to make deals on multiple purchases and/or selling additional
locks so that installing the software on additional machines doesn't have
to cost the full amount (Geneworks will sell you one additional lock per
package for $500). For example, the local area was offered a deal whereby
academic users could buy MacVector for $1500 and then could buy an
additional lock for another $200. (This deal has since expired, but
certainly ask your local rep if he has a similar deal.)
Both have their strengths and weaknesses. MacVector has been around
longest and therefore _should_ be the most stable. From my experience and
others', this is a questionable assumption - see recent threads for
examples.
Since this is not an intensive examination of each program and since
these are the most popular of the SAPs, I will leave it to others (or at
least to another day) to go into niggling, nitpicking, picayune detail
about their blemishes. Instead, a brief overview, mostly as to value.
Both come with almost everything but the kitchen sink; Geneworks
includes a skeletal sequence assembly program for which MacVector makes you
pay extra. I assume that because of the care (and time....;^)...) that it
has taken IBI since it began being advertised, the IBI AssemblyLign will be
quite a nice piece of work, but I believe that it has still not been
released.
Geneworks, according to promotional material, was designed to take
advantage of the latest techniques in object-oriented programming - there
has been a recent thread about this topic on the bio-soft group. Well,
there's good and bad in that. It seems to take the approach that you
should always be looking at the 'gestalt' of the analysis, and to this end,
all the views of a sequence are linked; modify one and the rest change to
reflect that change. That's fine, except that sometimes you don't care
about the other 5 views on-screen, and you don't want to wait the extra XX
seconds it takes for them all to update. A personal preference.
MacVector takes the approach (or did - our version is 3.5, about a year
old) that it shouldn't do anything unless you specifically tell it exactly
what to do. Want a restriction map? Fine, fill in the (admittedly)
extensive selection menu as to HOW you want the restriction map presented.
As mentioned previously, I am not at all happy about its stability. While
my view of MacVector is on the cool side, a vote of support has been
recently forwarded by st...@jeeves.ucsd.edu (Blaine Stine), so for the
benefits of MacVector, contact him (her?).

- MacDNASIS/PROSIS aka MacDNAsis Pro is from Hitachi Software (sold locally
through NOVEX 800 456 6839). I just tested their latest demo (1.01) and
frankly, it's a little schizophrenic. It's cheaper than the rest (~$1500),
not hardware copy-protected, and in fact the reps at the demo said that
they considered that limited copying (within one lab) was within the
license agreement, altho I'm not sure that assessment is the official legal
line.
What bothered me most about DNASIS was that its menu system was not very
intuitive - it very much looked like the product of programmers who had
limited input from a bench scientist, with features grouped algorithmically
rather than by use. They have also taken broad liberties with the Mac
interface, which steepens the learning curve quite a bit.
The other glaring fault was that restriction digests are horrifically
slow, fully 2 _orders_of_magnitude_ slower than Strider or DNASTAR (see the
performance table at end). It will take sequences larger than 32.5K, but
you wouldn't want to analyze them.
Its protein analysis routines were also not particularly quick nor
complete and were scattered around the menus haphazardly. It will do
sequence alignments, but will not allow you to set very many parameters and
the ones it does allow you to change are not well explained by the
otherwise quite good help system.
By contrast, its CD ROM database searching routines are surprisingly
well thought-out and relatively fast, subjectively quite a bit faster than
others I've tried, including ENTREZ. The interface between the CD searcher
and the rest of the program is a little cumbersome, but if you're not close
to an Internet connection, it's one of the better CD searchers around. It
is still slower than gopher, but it has the ability to do boolean searches
on specified fields on multiple databases.
It also comes with a sequence assembly program, but again, it isn't
particularly well thought-out or full featured, although it does support
the use of a standard electromagnetic digitizer (as opposed to some programs
which require that you buy the company-modified digitizer; the PC version
of DNASTAR was one, IBI was another). Watching
others use it, it appeared that it was not very intuitive either.
There were some unexpected pleasures - DNAsis includes quite a nice
'Plasmid Artist'-like drawing program and it claims to be able to do Zuker
RNA folding analyses (the rep said that it could do 2000 bases overnight on
a Quadra - isn't that a little fast?) MacDNAsis also includes a primitive
primer analysis tool.
All of the above criticisms made to the reps were answered with "yeah,
we took care of that in Ver 2.0". I managed to take a peek at it and it
does look a bit better, but not tremendously compelling. They also said
that they were working on bringing their PC version (at present, almost
unusable) up to speed by re-writing it for Windows NT but I wouldn't hold
my breath.
In short, I don't consider it a particularly good deal. Hitachi makes a
terrific plunge router (I have one); it is almost as good a cloning tool as
MacDNAsis.

- MacMolly Tetra Ver 1.0 by Soft Gene (030-8326342, fax 030-8219764) is a
rewrite and rename of one of the first Mac SAPs. I'm surprised that it
hasn't gotten more air time than it has (although
John.H...@lambada.oit.unc.edu recently posted a short, relatively
positive note on it).
If I remember the first version correctly, they have merged some of the
original's features, but it still comes as multiple programs (like DNASTAR
Mac), but as far as I know, you cannot buy them individually. Would that
they had spiffed up the code as much as the packaging. In the central
module "Analyze", you can still only import a very restricted number of
formats, and their restriction mapper is positively Devonian for a recent
Mac application. While enzyme selection is the _best_ I've seen; easy to
use and very flexible (select and deselect by 6, 5, and 4 cutters,
extensions, from all, asymmetric, or convertible sites, heat tolerant or
inactivated enzymes, sensitive or insensitive to dam or dcm methylation,
salt sensitivity, single strand or star activity!), and it does support a
digitizer, it is extraordinarily slow and its output is basically a TEXT
WINDOW (Aack!). It does have basic oligo/primer tools included, but again,
it is on the slow side. There is also no on-line help.
The Complign module does a reasonable job of multiple alignments but not
really any faster than clustalv available with Don Gilbert's SeqApp (see
timing chart at end).
In short, Tetra looks like a beta test of what might turn out to be a
reasonably good SAP, but it's really not up to snuff compared to the other
programs available.

- DNASTAR Mac (608 258 7420) is the Mac rewrite of the popular (maybe
_popular_ is the wrong word - maybe widespread is better) program for the
PC. In keeping with its PC past, instead of a huge, monolithic program,
there are a number of smaller ones, each with a different focus: EditSeq,
Mapdraw, Protean, XRay, Seqman, Geneman, etc. An advantage of this is that
you can buy them separately. If you don't want the sequence assembly
module (which is functional; better than Geneworks, but lackluster compared
to Sequencher, for example) it's an easy way to save $800, or if you use
gopher or other network solutions to access sequence databases, you can
save another $800 by clipping Geneman (not to mention the cost of the
CDROM), or leave out Align because it will only align 2 sequences at
present. This last problem is being addressed by DNASTAR and they were
quite willing to let us try out a "pre-alpha" version of their multiple
alignment package. For a "pre-alpha", it is surprisingly well put together
and full-featured; the main point being that it does in fact exist and
their protestations of it being released 'Real Soon Now' have some basis in
fact.
Compared to its PC parent, this version is infinitely smoother, much
more intuitive, and its on-line help is complete, if organized in a
peculiar manner (it's not at all alphabetical, it seems to be only slightly
context-sensitive, and it lacks hypertext links). It is also not as
capable as the parent; the PC version, like the GCG suite to which it is
somehow related, could (eventually, with enough aggro) do just about
anything. The Mac version is somewhat more restricted in scope, but much
better integrated.
With all these reasons not to buy it, why bother with it at all?
Because the restriction mapper is _quite_ flexible and FAST, as fast as or
faster than DNA Strider for most things. One part about it that I don't
like is that you have to use the separate sequence editor module to
import/export sequences. EditSeq does this reasonably well and covers most
popular formats, although not as many as DG's READSEQ (see above) does.
DNASTAR's protein analysis module (Protean) is _quite_ spectacular and the
rest of the modules are acceptably good.
Maybe most importantly to a group of researchers, it is the only SAP
available with a token check-out licensing scheme. This drives down the
price for the full setup (from ~$3,000 per user to $1700 per user for 5,
getting cheaper the more you buy) _AND_ it's not tied to particular
machines - the 5 people using it can be anywhere your network extends. It
is because of this last feature in particular that we're evaluating it with
a view to installing it as an Institute service - the rest of you molbio
software companies, take note!
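
For those who have not met a token check-out (floating) license before, the idea can be sketched in a few lines of Python. This is a generic illustration and makes no claim about how DNASTAR actually implements its scheme; the class `TokenPool` and its methods are hypothetical names invented here, and a real license server would also need networking, timeouts, and crash recovery.

```python
import threading


class TokenPool:
    """A fixed number of license tokens shared by any number of users.

    Hypothetical sketch: a user may run the program only while holding a token.
    """

    def __init__(self, tokens):
        self._free = tokens
        self._holders = set()
        self._lock = threading.Lock()

    def check_out(self, user):
        """Give `user` a token if one is free; return True on success."""
        with self._lock:
            if user in self._holders:
                return True  # already holds a token
            if self._free == 0:
                return False  # all tokens are in use
            self._free -= 1
            self._holders.add(user)
            return True

    def check_in(self, user):
        """Return the token held by `user`, if any, to the pool."""
        with self._lock:
            if user in self._holders:
                self._holders.remove(user)
                self._free += 1


pool = TokenPool(tokens=5)
print(pool.check_out("lab-mac-3"))
```

The point of the design is that the limit is on simultaneous users, not on which machines have the software installed.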
An additional point, at least obliquely related to this topic: It was
one of the principals of DNASTAR (Fred Blattner) who recently instigated
the brouhaha with NCBI, alleging that a government organization had no
business in the development of biotech software or information services and
should therefore have its budget cut severely (Science 257:156-7, 1992).
Those of you who have used and appreciated NCBI's services may feel that
this was not a welcome turn of events and may change your feelings toward
DNASTAR.

- Helix (~$2000, including a hand scanner; less if you already have a
compatible scanner) is a new sequencing product from, of all people,
General Atomics (800 424-3549). It's a good idea - use a handscanner to
scan in the lanes that you want read, then have the software interpret it
for you. Why buy a gigantically expensive 11x17" scanner when you usually
only want part of the film read anyway? They claim (see the August 1992
Biotechniques Vol 13 (2), p207) that it will do a scan's worth of reading
in under 2 min.
I have tried it and it does work - sort of. It requires system 7 and a
MacII+FPU with 10 (!) MB of memory (but you can get away (slowly) with
using virtual memory); it also uses a hardware lock.
The program is surprisingly well-designed for a Ver. 1.0 release.
Installing the hardware and software is easy and it has a nice interface.
The problem is that it just doesn't work well enough to warrant spending
the money. The problems are: a) it's difficult to get the scans to enter
completely straight and if the scans are warped, you cannot get the lane
guides to track the lanes correctly. Unlike NCSA's GelReader, which uses a
'comb' aid to set the lanes, you can use an expandable box to include the
lanes you want and then _individually_ set the lanes within this box, so
that if the lanes are at an angle, you can correct for it - a nice touch;
however, you cannot correct for warped lanes. b) Most problematic of all,
it doesn't determine sequence very well, at least from the default settings
and I fed it some pretty nice gels (which it inhaled with surprising ease
via the handscanner and then compressed on the fly to about 1.5 - 2 MB per
full length scan of 32 lanes). Like some of the more upscale automated
scanners, it then shows the digitized image on the screen, supplements it
with the traces of the densitometry, much like an ABI trace output and then
interprets it. For the amount of computation involved, it runs
surprisingly fast, perhaps _too_ fast, because the base calls that it makes
are very inaccurate. I could excuse some of the confusion in difficult
areas, but it ignores sequence that is clear and unambiguous.
Granted, it does allow you to oversee its calls (and the interface for
doing so is nicely thought out, with a window for the image, another for the
textual sequence, and links between the two such that if you touch a called
base in the image, the corresponding base in the text window is highlighted
and vice versa), but the sequence itself is so riddled with errors that the
whole point of the software is bypassed.
In short, this version is not quite there, but I would imagine that with
a little more work on the interpretation algorithm, it will be a very nice
product. I just heard from the programmers today (8-27-92) who said that
an error had been introduced while trying to speed the interpretation
algorithm and a new version will be in my hands shortly - more as it
becomes available...

- Sequencher is (a silly name for) an exceptional sequence assembly program
from GeneCodes (313 769 7249; $1200 w/ hardware lock, site/multiple copy
discounts). It is specifically for gel assembly, not for general purpose
analysis, although it does include a very nice (though not blazingly fast)
restriction mapper. It is one of the easiest, fastest interfaces for what
was once a horrible job. From my experience with the demo on some truly
awful sequence: I threw the sequence at the program and it threw the contig
back at me in about the same time as it takes for the GCG GelAssembly to
load. Of course, it also allows assisted manual manipulation of the
sequence as well. It also allows you to view the contig diagrammatically or
as sequence in a very nicely designed, scalable window (sort of like the
"pretty" output from the GCG programs). It also does something that few
other programs will do - take tracefile output from an ABI autosequencer
and allow you to view it for verification while assembling the contig (it
will also compress the tracefiles automatically on exit). Of course, if
you use film, it also supports a (standard interface) electromagnetic
digitizer. One of the strengths of the program is that the parent company
is small, aggressive, and very willing to listen to suggestions. Their
response time with fixes or enhancements is nothing short of miraculous.
After sending me the demo, a rep from Gene Codes (the president, as it
turned out) called me to see how I liked it. I had some suggestions for the
interface and some questions; within 2 weeks I had another version
incorporating those suggestions. The same has been the case for another lab
here. I was very impressed.

- Gene Construction Kit by Textco (vox/fax: 603 643 1471) is a relatively new
entry but it is an extremely interesting cloning tool. All of the
previously mentioned programs allow you to 'clone electronically', but GCK
is optimized for it. It remembers the history of the sequences you use to
clone, what enzymes you used to clip the sequences, what you did to the
ends; whatever you create, you have a history. For any lab that does a
significant amount of cloning, it is easily worth the money. Besides
being a great idea, it is also relatively easy to use. The last program
that attempted this trick was a piece of work from Intelligenetics called
Strategene (A great idea, if only they hadn't made it for _only_ a Xerox
workstation and demanded about $50,000 for it). Despite my usual rage at
demos, there is an excellent demo-tutorial of GCK available at the usual
archives (ftp.bio.indiana.edu, in the IUBIO Software+Data/molbio/mac folder
by gopher).
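
To make the "history" idea concrete, here is a minimal Python sketch of a
record that keeps track of where a construct came from. It is only an
illustration of the concept described above, not GCK's actual data model or
file format; the class name, field names, and sequences are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Construct:
    """One sequence plus a record of how it was made (all fields hypothetical)."""
    name: str
    sequence: str
    parents: List["Construct"] = field(default_factory=list)  # sequences it was built from
    enzymes: List[str] = field(default_factory=list)          # enzymes used to cut the parents
    end_treatment: Optional[str] = None                       # e.g. "blunted", "ligated as cut"

    def history(self, depth: int = 0) -> List[str]:
        """Return the lineage as indented lines, newest construct first."""
        step = self.name
        if self.enzymes:
            step += " [cut with " + ", ".join(self.enzymes) + "]"
        if self.end_treatment:
            step += " [ends: " + self.end_treatment + "]"
        lines = ["  " * depth + step]
        for parent in self.parents:
            lines.extend(parent.history(depth + 1))
        return lines


# Made-up sequences, for illustration only.
vector = Construct("vector", "GAATTCAAAA")
insert = Construct("insert", "GAATTCGGGG")
clone = Construct("clone1", "GAATTCGGGGAAAA", parents=[vector, insert],
                  enzymes=["EcoRI"], end_treatment="ligated as cut")
print("\n".join(clone.history()))
```

Because each construct points at its parents, the full lineage can be
recovered at any time by walking back through the links.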

Timing Charts:

The following times were obtained using a MacIIfx (40 MHz 68030, 68881 fpu,
8 MB RAM, 507 MB Fujitsu HD, System 6.0.5, 19" E-machines color monitor).
Times are also listed for DNA Strider using a MacSE (8 MHz 68000,
no fpu, 2.5 MB RAM, 40 MB CMS HD (Seagate mechanism), System 6.0.5,
built-in B+W monitor) and a MacIIci (25 MHz 68030+68881, 8 MB RAM, 105 MB HD
(Conner LPS mechanism)). The times were measured to the nearest half second
using Douglas Adams' instrument of societal dissolution, the digital watch.
The sequence was 32.5 kb of lambda, the largest amount of sequence that Strider can
handle in one window. The other programs can, unless noted, handle more
sequence, although with varying facility. Square brackets indicate an
explanatory note or additional information listed at bottom.

Program            Time to   Restriction    Restriction     6 Frame
                   Load      Map - Text     Map - Graphic   ORF Map
(times in seconds unless a note says otherwise)

DNA Strider 1.2    2.5[1]    3.5/8.5[2]     1               1
  (Mac SE)         19.2      42/1'43"       5.5             8
  (MacIIci)        8         9.5/17         1.5             3.5

MacVector 3.5      6         45/32[3]       29              4[4]

Geneworks 2.0      14        14/27[5]       18.5            4.5
  (Demo)

DNAsis 1.0         5         10 Minutes[6]  [7]             38[8]
  (Demo)

MacMolly Tetra     3.5       10.5[10]       [10]            12.5
  Analyze [9]

Sequencher         10        [11]           3[11]           n/a
  (Demo)

SeqApp             5         42[12]         [12]            [12]

DNASTAR Mac        10        4.5[13]        16[14]          []

[1] At startup, Strider reads in and recalculates the hash table for its
Restriction Enzyme library, an unnecessary waste of time, especially for
slower Macs. This should really be made optional.
[2] Times are for "Restriction Map"/"Complete Restriction Map"; the latter
not only displays the textual restriction map, but all the enzyme sites
sorted by number and location of cuts. (170 enzymes)
[3] After 45", the program stopped with the error "too many sites"; after
<10 sites were selected, it completed in 32". (163 enzymes)
[4] MacVector can find ORFs by defining an ORF in a number of ways
including Fickett's method. I used the default ORF, which looks for ORFs
beginning with an ATG and ending with a stop, with a minimum translation of
25 aas.
[5] 14" for all 100 enzymes; 27" for those that cut <=2 times (the
default).
[6] DNAsis could not import the text file of sequence and when I tried to
cut and paste the sequence into a "new" window, it reported a number of
invalid residues which it then stripped out, leaving the sequence about 400
nt short (there were no ambiguous nt in the sequence), so I pasted in the
missing # of nt from the same sequence. I did not wait for the restriction
to complete; the program throws up a "remaining" thermometer and when it
hit 15%, I stopped it and did the math. DNAsis can show either actual site
cleavage or use the 5' end of the recognition sequence (as Strider does);
it takes the same time either way. (247 enzymes; a contributing factor)
[7] I could not determine how to make DNAsis give me a graphic map without
modifying the enzyme file.
[8] DNAsis not only gives you an ORF map, but also includes an ORF position
table.
[9] Tetra comes in multiple modules; Analyze is the module that performs
the restriction and translation functions.
[10] The restriction enzyme selection process is quite flexible, allowing
you to select or deselect by # of bases in the recognition site or by
overhang (but not by number of times of cutting). After 10.5 minutes, the
first part of the screen appeared then an irreversible memory error
occurred. After rebooting, I selected only 6 cutters and after 1 minute
the list had only gotten through the enzymes starting with "A", so I
canceled the digest; the typical Mac 'cancel' ("command" + ".") works
smoothly and does not kill the application. (324 enzymes).
[11] The Sequencher demo comes with only 21 enzymes entered in its
database, a strange omission, but Ver 1.0 comes with 59.
[12] No doubt a function of its 'alpha' test status, SeqApp has some
quirks. Despite its incorporation of READSEQ, it was not able to smoothly
import the 32.5 kb sequence file and when I tried to copy/paste it into a
'new' window, the sequence could be pasted, but it was not 'intelligently
pasted' as it would have been in Strider - i.e. uneven lines, incomplete
numbering, and most bizarrely, no sequence showing up in the multiple
sequence window. The sequence name was there and when selected again, the
sequence would appear in the editing window (still unevenly formatted, but
with the correct numbering), but I could not get it to show up in the
multiple sequence window. After 42", the text window appeared showing the
enzyme cut sites, but I was unable to make the complete text sequence
window appear. SeqApp does not support (yet?) graphic maps, and the translation is
supposed to show up underneath the text sequence, but doesn't. (189
enzymes)
[13] I was astonished at how fast DNASTAR was able to produce the initial
map, including the sequence, complement, all restriction sites (146
enzymes), and 3 letter translations in all 6 frames, any and all of which
can be modified or removed using a single easy-to-understand menu. It,
like a number of other programs, has screen sensing so that it will expand
the window to full size initially. This may be annoying, because it
obscures the rest of your screen.
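
For readers who want to see what the default ORF definition in note [4]
amounts to (start at ATG, end at a stop, at least 25 aas), here is a minimal
Python sketch of a six-frame scan. It is an illustration only, not
MacVector's code: the function names are made up, the stop codons assumed
are those of the standard genetic code, and minus-strand coordinates are
reported relative to the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}              # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_aa=25):
    """Yield (start, end) for ORFs in one frame; end is exclusive and includes the stop."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                       # first ATG opens the ORF
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_aa:  # codons translated, counting the ATG
                yield start, i + 3
            start = None                    # look for the next ATG after this stop


def six_frame_orfs(seq, min_aa=25):
    """Return (strand, frame, start, end) for ORFs in all six frames."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_aa):
                hits.append((strand, frame, start, end))
    return hits


# Made-up test sequence: ATG, 30 alanine codons, then a stop.
print(six_frame_orfs("ATG" + "GCT" * 30 + "TAA"))
```

An ORF that runs off the end of the sequence without a stop is not reported
here; whether a given program reports such open-ended frames is something to
check when comparing ORF maps.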

Multiple Alignments:
For this exercise, I used 5 POU domain proteins ranging in size from 235
to 451 aas. Because of the various implementations of the homology search,
I was not able to do exact comparisons, but I tried to use similar
parameters when possible. Times are in min. (') and sec (").

Time (min=', sec=")

DNASTAR: 40"[1]
(MegAlign)

Geneworks: 2'43"[2]

MacMolly Tetra: 2'47"[3]
(Complign)

MacDNAsis Pro: 175'(yes, minutes)[4]

ClustalV: 33"[5]

MACAW: 17"[6]
(386/387 PC)

[1] This is not quite fair because this is not a released version, but it
gives an idea of how the finished product will perform. MegAlign
flatters Geneworks quite a bit in terms of its presentation and format,
but manages to improve on the speed considerably. It also introduces a
number of improvements on the interface. DNASTAR and Geneworks are by far
the easiest to set up and use. Their use of color for shading differs
slightly but both are helpful in determining conserved sequences.

[2] See the comments above. Geneworks gives you more access to the
controlling parameters and to the coloring scheme.

[3] Don't these guys use their own software? Don't they check out the
competition? The menu system for Complign is among the most
ill-thought-out that I've ever seen. They have managed to bury a
reasonably fast implementation under a serpentine, recursive, oddly
phrased, and unnecessary set of menus and procedures so complex that it
made me want to scream. Let's see - where to start? First, you can't set up
a multiple homology. For reasons of their own, they give you the choice of
comparing 1 protein to either one or multiple others. It then takes 3 sets
of menus to start the alignment, the output of which is displayed on
precisely overlapping sets of text windows. Finally, the alignment itself
proceeds dynamically, being modified onscreen as you watch. This may be
eye-catching the first time, but it is a tremendous time sink if all you
want done is the alignment. It looks like they didn't take out the
debugging statements. Finally, while I admittedly did not take the time to
delve into these programs down to the last combination of options, the alignment
that Complign finally presented to me was completely wrong. I suspect that
the gap value it presents as a parameter does not mean the same thing as in
the other programs.

[4] Again, MacDNAsis finishes a distant last, somewhat perplexing because
of the generally positive things in the press about Hitachi Software.
Again, I missed the end because of impatience; after 8% (14 minutes), I
killed it.

[5] I included ClustalV because many of the commercial programs are
implementations of it and you can get an idea of how much speed you gain
(or lose) by spending a great deal of money. ClustalV is fastest among Mac
programs (using the MacII+FPU version supplied as an additional module to
Don Gilbert's SeqApp; it runs as a stand-alone, text-window application)
but its output is a non-proportional font text file. Don't expect
whizbang graphics and autoshading of identities, but if you're doing
exploratory alignments, it is certainly worth having if only because it's
very fast. For final figure production, the text is easily importable into
Canvas, so it can be shaded to your heart's content.

[6] MACAW 1.05, running under Windows 3.1 on a 25 MHz 386/387 with 8MB
extended memory. MACAW is included in this report because it is free and
relatively easy to use; it can import 16 sequences with ease, 6 more uneasily, and is
quite fast. It does not do an autoalignment like the others, but instead
highlights local blocks of homology above a user-defined level. For this
reason, I like it more than the autoalign programs for exploratory work.
Also, because it chooses only local blocks, you can detect homologies that
might go undetected if you forced a global alignment. For example, if
there was an EGF homology at the Nterm of one protein and an HLH motif at
the Cterm and this was reversed in another protein, you would only pick
up the stronger of the two homologies with the general multiple alignment
programs. After you have examined the local similarities, you can link the
ones you think are significant with a 'link' command that automatically
introduces gaps and enhances the visibility of the blocks.
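
The point in note [6] about local blocks versus a forced global alignment
can be shown with a small Python sketch. This is not MACAW's algorithm
(the sketch only looks for ungapped runs of identical residues on each
diagonal, with a user-set minimum length); the function name and the two
"motif" strings are invented placeholders, not real EGF or HLH sequences.

```python
def local_blocks(a, b, min_len=5):
    """Return (start_a, start_b, length) for each maximal ungapped run of
    identical residues that is at least min_len long."""
    blocks = []
    # A diagonal is a fixed offset d = j - i between positions in a and b.
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    blocks.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:                  # run that reaches the end of a sequence
            blocks.append((i - run, j - run, run))
    return blocks


# Two made-up proteins sharing two motifs in opposite order.
motif1 = "CNGGCKHF"     # placeholder, stands in for the Nterm homology
motif2 = "RRLQELLPS"    # placeholder, stands in for the Cterm motif
a = "MK" + motif1 + "AAAA" + motif2
b = "GS" + motif2 + "TTTTT" + motif1 + "W"
for start_a, start_b, length in sorted(local_blocks(a, b)):
    print(start_a, start_b, a[start_a:start_a + length])
```

Both shared blocks are reported even though they sit on different diagonals;
a single global alignment of a against b could line up at most one of them.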

Let me know if this has been useful or what it needs to become so.
Please address correspondence to the 'salk-sc2' address and if you want to
send me mail, please prefix the Subject line with HJM!

Cheers,

Harry
