UCSC application "chainAntiRepeat"

Oh, Dong Ha (NIH/NLM/NCBI) [C]

Sep 2, 2022, 2:58:48 PM
to gen...@soe.ucsc.edu, Kodali, Vamsi (NIH/NLM/NCBI) [E], Murphy, Terence (NIH/NLM/NCBI) [E]

Hi,

We noticed that the UCSC utility page (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/ ) included a tool named “chainAntiRepeat.”  

 

The help page looks like this:

 

$ chainAntiRepeat
chainAntiRepeat - Get rid of chains that are primarily the results of repeats and degenerate DNA
usage:
   chainAntiRepeat tNibDir qNibDir inChain outChain
options:
   -minScore=N - minimum score (after repeat stuff) to pass
   -noCheckScore=N - score that will pass without checks (speed tweak)

 

Questions:

  • Could you tell us in a bit more detail what this application does? Does it score alignment chains based on how likely a genomic region is to be covered by alignments derived from repeats?
  • I can convert .fasta to .nib format with faToNib, included in the blat package; the faToNib command has options to soft- or hard-mask the sequences. Does chainAntiRepeat need input files (in .nib format) that are masked?
  • We are interested in removing alignments derived from shared repeats, say, those falling within q or t genomic regions that are covered by alignments more than N times (N can be 5, 10, or another user-defined value). Can chainAntiRepeat do that, or is there any UCSC (or other) application that could be helpful?

 

Thanks a lot!

Dong-Ha

___
Dong-Ha Oh PhD (he/him)

NCBI contractor
Personal web: https://ohdongha.github.io/

 

Luis Nassar

Sep 21, 2022, 7:27:52 PM
to Oh, Dong Ha (NIH/NLM/NCBI) [C], gen...@soe.ucsc.edu, Kodali, Vamsi (NIH/NLM/NCBI) [E], Murphy, Terence (NIH/NLM/NCBI) [E]

Hello, Dong-Ha.

Thank you for your interest in the Genome Browser and its tools.

I'll address your questions below:

Could you tell us in a bit more detail what this application does? Does it score alignment chains based on how likely a genomic region is to be covered by alignments derived from repeats?

chainAntiRepeat removes entire chains, such as those made by the axtChain utility, based on their score. It does not produce any new scores itself.
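
To make "removing entire chains based on their score" concrete, here is a minimal Python sketch that drops whole chains whose header score falls below a threshold. It assumes the standard chain header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`). The function name `filter_chains_by_score` and the demo data are hypothetical, and this is not a reimplementation of chainAntiRepeat, which also looks at repeats and degenerate DNA before applying its score cutoff.

```python
from typing import Iterable, Iterator


def filter_chains_by_score(lines: Iterable[str], min_score: float) -> Iterator[str]:
    """Yield the lines of every chain whose header score is >= min_score.

    Illustration only: the score is read straight from the chain header,
    with none of the repeat or degeneracy checks chainAntiRepeat performs.
    Lines before the first chain header (e.g. comments) are dropped.
    """
    keep = False
    for line in lines:
        if line.startswith("chain "):
            # Field 1 of the header line is the chain score.
            keep = float(line.split()[1]) >= min_score
        if keep:
            yield line


# Hypothetical two-chain input; only the first chain passes a cutoff of 1000.
demo_chain_lines = [
    "chain 5000 chr1 1000 + 0 100 chrA 900 + 0 100 1\n",
    "100\n",
    "\n",
    "chain 200 chr1 1000 + 200 250 chrB 800 - 10 60 2\n",
    "50\n",
    "\n",
]
kept_lines = list(filter_chains_by_score(demo_chain_lines, 1000))
```

A chain is kept or dropped as a unit, header and alignment blocks together, which matches the "entire chains" behaviour described above.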

I can convert .fasta to .nib format with faToNib, included in the blat package; the faToNib command has options to soft- or hard-mask the sequences. Does chainAntiRepeat need input files (in .nib format) that are masked?

You can use either a 2bit or nib file, and you would want them to be masked.

We are interested in removing alignments derived from shared repeats, say, those falling within q or t genomic regions covered with alignments by more than N times (N can be 5, 10, or other user-defined value). Will chainAntiRepeat be able to do that, or is there any UCSC (or other) application that could be helpful?

Unfortunately, chainAntiRepeat has no concept of depth or coverage at any region; it considers only the score. Because the score can be influenced by several different factors, there is no clear way to associate it with coverage either.

One option that may accomplish what you are seeking is the pslUnpile utility:

pslUnpile - Removes huge piles of alignments from sorted
psl files (due to unmasked repeats presumably).
usage:
   pslUnpile in.psl out.psl
options:
   -query - removes piles on query side
   -target - removes piles on target side (default)
   -nohead - suppress psl header in output.
   -maxPile - maximum number of hits allowed in a pile before filtering
              (default 100)
   -minPile - min number of hits allowed in a pile before filtering
              (default 1)
in.psl should be sorted by query or target as appropriate

You would have to convert your chain files to psl, but then the utility allows you to throw out alignment 'piles' based on min/max hits.
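
To illustrate the kind of depth-based filtering asked about in the question (dropping alignments that touch a region covered more than N times), here is a minimal Python sketch. It is not pslUnpile's algorithm, which is not described in this thread. The `Aln` record, the function names, and the demo coordinates are all hypothetical; coordinates are assumed to be 0-based and half-open.

```python
from collections import defaultdict
from typing import NamedTuple


class Aln(NamedTuple):
    name: str
    seq: str    # target (or query) sequence name
    start: int  # 0-based, inclusive
    end: int    # exclusive


def deep_regions(alns, max_depth):
    """Return {seq: [(start, end), ...]} where coverage depth exceeds max_depth."""
    events = defaultdict(list)
    for a in alns:
        events[a.seq].append((a.start, 1))
        events[a.seq].append((a.end, -1))
    regions = defaultdict(list)
    for seq, evs in events.items():
        # Tuples sort so that, at equal positions, ends (-1) come before
        # starts (+1), which is what half-open intervals require.
        evs.sort()
        depth = 0
        region_start = None
        for pos, delta in evs:
            depth += delta
            if depth > max_depth and region_start is None:
                region_start = pos
            elif depth <= max_depth and region_start is not None:
                regions[seq].append((region_start, pos))
                region_start = None
    return regions


def drop_piled(alns, max_depth):
    """Keep only alignments that do not overlap any over-covered region."""
    regions = deep_regions(alns, max_depth)
    return [
        a for a in alns
        if not any(a.start < e and s < a.end for s, e in regions.get(a.seq, []))
    ]


# Hypothetical demo: three alignments pile up on chr1 between 80 and 100.
demo_alns = [
    Aln("a", "chr1", 0, 100),
    Aln("b", "chr1", 50, 150),
    Aln("c", "chr1", 80, 120),
    Aln("d", "chr1", 500, 600),
    Aln("e", "chr2", 80, 120),
]
kept_alns = drop_piled(demo_alns, max_depth=2)
```

With `max_depth=2`, the three chr1 alignments that overlap the pile are removed and the two isolated ones are kept. Running the same function on query-side coordinates would give the query-side equivalent.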

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

