[Genome] looking for some guidance...FW: chain/net/axt

Miller, Jonathan

Jun 19, 2006, 6:26:55 PM
Hi,

it may be that some or most of these questions represent unreasonable requests...

basically I'm looking for

(i) a sufficiently detailed description of how (for example) the mm6/vsSelf/axtNet files were
created that I could reproduce them myself starting, if not from the mm6 genome sequence
(perhaps too complicated to describe?) then perhaps at least starting from the .chain files?

(ii) and I'd like to generate .axt files from .chain files, for some of the alignments (and there
are a number) for which .axt files are not available at hgdownload - but in fact the .chain
and/or .net files are available.

the most relevant archived mailing list item that I've been able to find so far: http://www.soe.ucsc.edu/pipermail/genome/2005-October/008710.html
seems to have some significant omissions; even if I am overlooking something
in that file, there remains an issue for the self-alignments (vsSelf) of how and at what
stage the exact (diagonal) matches are discarded. (In my current attempt, I can
generate the .net files from the .chain files, but netToAxt crashes, yielding an
empty .axt file.)

many thanks and best wishes

Jonathan Miller
Assistant Professor
Biochemistry
Baylor College of Medicine

________________________________

From: Miller, Jonathan
Sent: Mon 6/19/2006 10:58 AM
To: Donna Karolchik
Subject: RE: chain/net/axt


Donna,

[ presumably you'll forward to the appropriate individual if not yourself... ]

in fact there seem to be some omissions from http://www.soe.ucsc.edu/pipermail/genome/2005-October/008710.html
(chainNet should evidently follow chainPreNet, and in mm6/vsSelf the readme mentions "netChainSubset," to name a couple).

any chance that I could obtain from you a full script together with intermediate output for, e.g.
generating goldenPath/mm6/vsSelf/axtNet/* from the .chain files? (even better, from the genome sequence itself,
but that is not essential).

command1 file1 file2 ..
ls -l file1 file2...
command2 .....

etc.

once I have that I can probably produce anything else I need myself...

many thanks

jm

________________________________

From: Miller, Jonathan
Sent: Sat 6/17/2006 9:48 PM
To: Donna Karolchik
Subject: RE: chain/net/axt


Hi Donna,

OK, I've been partly straightened out by http://www.soe.ucsc.edu/pipermail/genome/2005-October/008710.html

so that I can probably figure out how to reconstruct the .axt files from the .chain files.

a. the only unresolved question I have is about the "vsSelf" comparisons. Presumably the
very best match (e.g. to itself) is discarded at some stage?

b. would you have a transcript of the blastz commands you execute to produce the
".lav" files for one of these self-alignments?

c. apparently .axt files can be made directly from the .lav files, although the
protocol described in the archived message above suggests that the .axt files
posted in the "axtNet" folders are in fact created through a more circuitous
route in order to add various labels. Do I understand correctly that the .axt
files created by lavToAxt would in fact contain the same sequence alignments as
the .axt files found in the "axtNet" folders?

d. would any other "vsSelf" comparisons have been carried out that could be posted to hgdownload?
all I can find are hg17, mm6, and rn4.

many thanks and best wishes

Jonathan Miller
Assistant Professor
Biochemistry and HGSC
Baylor College of Medicine

________________________________

From: Miller, Jonathan
Sent: Thu 6/15/2006 8:11 PM
To: Donna Karolchik
Subject: chain/net/axt


hi Donna,

this was terrific, thanks very much.

now I have another question:

in a number of cases, particularly the "vsSelf" alignments,
there exist only "chain" files, and no net or axt files.

I have found an email on the web that seems to indicate
how to construct the axt files from the chain and net files:
http://www.cse.ucsc.edu/pipermail/genome/2006-March/010064.html

however, if I am starting with only "chain" files, this email doesn't
quite indicate what I need to do to obtain net files (but from there
hopefully I can continue along the lines of the email to produce
axt files).

more or less, there are a fair number of alignments in goldenPath
for which there are only "chain" files or only "chain" and "net" files;
very often the "Self" alignment files are among this class.
I should like to get "axt" files for many of these, so most likely I
will have to produce them myself starting from what you already
have available. Would be grateful for instructions
on how to generate "net" files that are as clear as the ones in the
email above to create "axt" files...

many thanks and best wishes

jm

Donna Karolchik

Jun 20, 2006, 11:07:01 AM
hi Jonathan,

We don't have a document that thoroughly describes the chains/nets process down
to the finest detail. However, we have a fairly good step-wise description in
our make*.doc build documentation files in kent/src/hg/makeDb/ (in the Genome
Browser source tree). These files -- whose names correspond to the various
assemblies (e.g. makeHg18.doc) -- show the actual sequence of programs that are
run to produce each genome assembly. The README.txt and chain*.html files that
accompany the assemblies also supply supplemental information. It will
definitely take some persistence if you plan to reproduce this process at your
site.

Here are some comments from one of our engineers that may help you get started:

The code that drops alignments along the diagonal is in our lavToAxt program
(kent/src/hg/mouseStuff/lavToAxt/lavToAxt.c), which is called by
kent/src/utils/blastz-run-ucsc during the big cluster blastz run, before the
chaining stage.
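
To make the idea of "dropping alignments along the diagonal" concrete, here is a minimal Python sketch. It is not the lavToAxt code and makes no claim about how lavToAxt implements the filter. It assumes that "on the diagonal" means a plus-strand block whose target and query are the same sequence with identical start and end coordinates. The Block record and both function names are hypothetical.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One aligned block of a self-alignment (hypothetical record layout)."""
    t_name: str    # target sequence name, e.g. "chr1"
    t_start: int   # target start
    t_end: int     # target end
    q_name: str    # query sequence name
    q_start: int   # query start
    q_end: int     # query end
    q_strand: str  # "+" or "-"


def is_on_diagonal(b: Block) -> bool:
    # Assumption: a trivial self-match is the same sequence, plus strand,
    # aligned to exactly the same coordinates.
    return (
        b.t_name == b.q_name
        and b.q_strand == "+"
        and b.t_start == b.q_start
        and b.t_end == b.q_end
    )


def drop_diagonal(blocks: List[Block]) -> List[Block]:
    """Keep only blocks that are not trivial self-matches."""
    return [b for b in blocks if not is_on_diagonal(b)]


if __name__ == "__main__":
    blocks = [
        Block("chr1", 100, 200, "chr1", 100, 200, "+"),  # trivial: dropped
        Block("chr1", 100, 200, "chr1", 5000, 5100, "+"),  # duplication: kept
        Block("chr1", 100, 200, "chr2", 100, 200, "+"),  # other chrom: kept
    ]
    for b in drop_diagonal(blocks):
        print(b)
```

Off-diagonal matches within the same chromosome are kept, since those are the duplications a self-alignment is meant to find.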

We don't usually offer self chain axt files for download because they are quite
large.

We have a chainToAxt program (kent/src/hg/mouseStuff/chainToAxt/chainToAxt.c),
but it is not hooked into the top-level "make". It seems to require the
sequences in nib format.
Generating .net files from chains may cause some chains or parts of chains to be
lost because the netter is a kind of filter; therefore, making a net and then
running netToAxt might not be quite what you want. Nevertheless, netToAxt
shouldn't crash. If you can send the exact command line that results in the
crash, we can try to reproduce the error locally and help you out.

-Donna
-----------------------------------
Donna Karolchik
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
> _______________________________________________
> Genome maillist - Genome at soe.ucsc.edu
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>

Miller, Jonathan

Jun 20, 2006, 3:00:44 PM
Dear Donna,

Thanks very much for your suggestions.

Do I understand correctly that one suitable route to creating the .axt files
would be to run blastz and then apply lavToAxt? Do I lose something
by not chaining first and applying chainToAxt?

I will persist, but I won't contact you again until I've worked fairly hard at it.

many thanks and best wishes

jm

________________________________

Angie Hinrichs

Jun 21, 2006, 11:24:28 AM
Hi Jonathan,

Before chaining, the alignments are shorter and more fragmentary
because, well, they haven't been chained together yet. :) The
chaining stage includes a minimum score filter so it does discard some
of the smallest fragments that can't be chained to anything. If you
would benefit from working with longer alignments (e.g. if you are
looking for large duplicated regions), then you would lose something
by skipping the chaining. However, if you are only going to break up
the alignments at gaps into gapless blocks, then you would not lose
anything by using unchained blastz alignments.
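
As an illustration of "breaking up the alignments at gaps into gapless blocks", here is a minimal Python sketch. It assumes the alignment is given as two equal-length strings with "-" marking gaps, as in the sequence lines of an .axt record. The function name and the coordinate convention (offsets counted from the supplied start positions) are assumptions for the example, not part of any UCSC tool.

```python
from typing import List, Tuple


def gapless_blocks(
    t_seq: str, q_seq: str, t_start: int = 0, q_start: int = 0
) -> List[Tuple[int, int, int]]:
    """Split a gapped pairwise alignment into (t_pos, q_pos, length) blocks."""
    if len(t_seq) != len(q_seq):
        raise ValueError("aligned sequences must have the same length")
    blocks: List[Tuple[int, int, int]] = []
    t_pos, q_pos = t_start, q_start
    run_t = run_q = run_len = 0
    for t_ch, q_ch in zip(t_seq, q_seq):
        t_gap = t_ch == "-"
        q_gap = q_ch == "-"
        if not t_gap and not q_gap:
            if run_len == 0:
                run_t, run_q = t_pos, q_pos  # a new gapless run starts here
            run_len += 1
        elif run_len:
            blocks.append((run_t, run_q, run_len))  # a gap ends the run
            run_len = 0
        # Advance each sequence position only where it has a base.
        if not t_gap:
            t_pos += 1
        if not q_gap:
            q_pos += 1
    if run_len:
        blocks.append((run_t, run_q, run_len))
    return blocks


if __name__ == "__main__":
    print(gapless_blocks("ACGT--ACG", "ACG-TTACG"))
```

Blocks produced this way from unchained blastz alignments and from chained alignments have the same form; the difference is only in which fragments survive the chaining score filter.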

It should be pretty easy to run chainToAxt. However, it is definitely
non-trivial to run the blastz process. We run it on a compute cluster
with several hundred processors because the total compute time is well
in excess of a thousand CPU hours. We split the genome into 10Mbase
chunks for the cluster run, and while the average per-job time was 7
minutes for mm6 self, the longest job time was 40 hours. Because our
process uses a compute cluster, several layers of scripts are
involved, and those scripts assume a certain directory structure and
lots of locally installed auxiliary programs (some of ours, some from
Penn State). It's possible with quite a bit of effort to translate
the whole process to another computing environment, but if you can use
our chain downloads, it will save you a lot of time.

If you still would like to run blastz locally, then the "BLASTZ SELF"
section of kent/src/hg/makeDb/makeMm6.doc and the main script,
kent/src/utils/doBlastzChainNet.pl, are the places to start.

Hope that helps,

Angie
--
angie at soe.ucsc.edu
Software Developer, UCSC CBSE / Genome Bioinformatics Group