Question about using BLAT

SeungJ...@mdc-berlin.de

Sep 29, 2017, 11:07:11 AM
to Cath Tyner, gen...@soe.ucsc.edu
Hi Cath,

Do you mind if I ask you about using command-line BLAT rather than the UCSC Genome Browser?

I am currently using BLAT for alignment, and it is taking much longer than I expected (more than a week).

Below is my command line:

blat -stepSize=5  -tileSize=8 -minScore=20 -minIdentity=50 -fine -q=rna -oneOff=1 -dots=1000 hg38.fa blat_test.fasta blat_test.psl

hg38.fa is the reference genome.
blat_test.fasta is the query file that I would like to align (which is 1000 long reads with 29 sequences).

So my question is whether the argument order in my command is right:
blat [option] [ref] [query] [output]

If it is right, can you tell me why the alignment takes so long?

I know BLAT is a fast alignment tool and I would like to use it for my analysis.

Please let me know if there is something wrong with my command.

Thanks in advance.

Best regards,

Seung Kim



From: Cath Tyner [ca...@ucsc.edu]
Sent: Thursday, August 3, 2017, 7:34 PM
To: Kim, Seung Joon
Cc: gen...@soe.ucsc.edu
Subject: Re: Fwd: [genome] Question about using UCSC genome browser

Hello Seung Kim,

Thank you for contacting the UCSC Genome Browser support team. Below are some options for visualizing your psl results in the browser.

Web-based BLAT: Directly add as custom track

1. We recently released an update to the browser that lets you save web-based BLAT search results as a custom track, so there is no need to use any utilities. This update applies only to the web-based BLAT tool, not the command-line BLAT utility.

Command-line BLAT utility

2. Option 3, below, is recommended, but I wanted to note that uploading a psl file as a custom track is also supported in the browser. You can simply load the URL of your file into custom tracks. For example, in the "Paste URLs or data" box of the Custom Tracks tool, you can paste in "http://hgwdev.cse.ucsc.edu/~cath/temp/smallExample.psl" to see psl results in the browser, as seen in this session.

3. psl to bigPsl

The best way to visualize psl results in the UCSC Genome Browser as a custom track is to first compress your psl file into a binary, indexed bigPsl file for faster performance. While it is possible to convert a psl to a bed file, it's not recommended because you will lose information in the conversion process.

You can follow these detailed instructions to convert psl to bigPsl and then load the bigPsl as a custom track in the browser:

Instructions for converting from psl to bigPsl and uploading a bigPsl custom track

You'll be using several utilities from the utilities directory (pslToBigPsl, bedToBigBed, and fetchChromSizes) to convert your psl file.

A summary of the steps, which are detailed in the instructions, is:

1. Download utilities (pslToBigPsl, bedToBigBed, and fetchChromSizes)
2. Use pslToBigPsl to convert your psl file to a sorted bigPsl.txt file.
3. Use 'fetchChromSizes' to get the chrom.sizes files for whichever assembly you need (e.g., get the file hg38.chrom.sizes). 

Note: Using the utility "fetchChromSizes" is not necessary if you instead wish to 1) simply download the needed file from the downloads directory (e.g., here is the hg38.chrom.sizes file for hg38), OR 2) you can simply provide the url to the file in your command. 

4. Follow this link to the needed bigPsl.as file and save the file to the directory you are working in.

You should now have 3 files: 1) bigPsl.as, 2) bigPsl.txt, and 3) hg38.chrom.sizes. The bedToBigBed utility needs all three of these files to create the output bigPsl.bb file.

5. Use the bedToBigBed utility to create the final bigPsl.bb file, which is the compressed binary version of your psl file.
6. Move the final bigPsl.bb file to a byte-range supported web-accessible server (such as Cyverse).
7. Paste in the url to your bigPsl.bb file in Custom Tracks and load the file into the browser.

Please carefully read the bigPsl instructions page, where you will find detailed examples that you can copy.
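
As a rough sketch of steps 2, 3, and 5, the helper below assembles the shell commands for the conversion. It is written in Python only so the command strings can be checked without the UCSC utilities installed. The function name `bigpsl_commands` and the input name `my.psl` are hypothetical; the output names follow the steps above. It assumes bigPsl.as has already been saved to the working directory (step 4), and the `sort -k1,1 -k2,2n` step reflects the requirement that bedToBigBed input be sorted by chromosome and start position.

```python
import shlex


def bigpsl_commands(psl, assembly="hg38", txt="bigPsl.txt", out_bb="bigPsl.bb"):
    """Return the shell commands for psl -> bigPsl, in the order they run."""
    sizes = f"{assembly}.chrom.sizes"
    q = shlex.quote  # protect file names that contain spaces
    return [
        # Step 2: convert to bigPsl text, sorted by chromosome then start
        f"pslToBigPsl {q(psl)} stdout | sort -k1,1 -k2,2n > {q(txt)}",
        # Step 3: fetch the chromosome sizes for the chosen assembly
        f"fetchChromSizes {q(assembly)} > {q(sizes)}",
        # Step 5: build the indexed binary file (needs bigPsl.as from step 4)
        f"bedToBigBed -as=bigPsl.as -type=bed12+13 -tab {q(txt)} {q(sizes)} {q(out_bb)}",
    ]


# "my.psl" is a placeholder for your own BLAT output file.
for command in bigpsl_commands("my.psl"):
    print(command)
```

Each printed line can be pasted into a shell once the three utilities are on your PATH.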

If you have problems or questions, please feel free to reply to this forum!
Thank you for contacting the UCSC Genome Browser support team.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe...@soe.ucsc.edu
  * Unsubscribe: Email genome-announce+unsubscri...@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute


On Thu, Aug 3, 2017 at 1:17 AM, SeungJ...@mdc-berlin.de <SeungJ...@mdc-berlin.de> wrote:
Dear Ann

Hi, I am writing to you because Brian is on vacation.
Can I ask you about using psl2bed and bedToBigBed?

I ran an alignment using BLAT and the output is in psl format.
I want to convert it to BED format to see the result in the UCSC browser, so I first used psl2bed to make a BED file and then followed the website below to make a bb file:
http://blog.naver.com/jinp7/221064735045
However, when I run bedToBigBed, the message below comes out:

bedToBigBed: Relink `/gnu/store/88wvqp60hbrdvbp0xsqad5c6njjfshcw-libpng-1.6.28/lib/libpng16.so.16' with `/gnu/store/ybpgv1v7606xw7mafda66w10hiynpiw2-glibc-2.25/lib/libpthread.so.0' for IFUNC symbol `longjmp'
pass1 - making usageList (452 chroms): 14763 millis
Error line 1 of sorted_ccs1a.bed: thickStart after thickEnd

Can you explain the error message?
Also, can you advise me on a better way to get from a psl file to a bb file?

Thank you in advance and I look forward to your reply.

Best wishes,

Seung Kim








From: Kim, Seung Joon
Sent: Wednesday, August 2, 2017, 2:20 PM
To: Brian Lee
Cc: gen...@soe.ucsc.edu
Subject: RE: [genome] Question about using UCSC genome browser

Thanks, now it works well!! Thanks for the advice

Seung Kim



From: Brian Lee [bria...@soe.ucsc.edu]
Sent: Tuesday, August 1, 2017, 8:36 PM
To: Kim, Seung Joon
Cc: gen...@soe.ucsc.edu
Subject: Re: [genome] Question about using UCSC genome browser

Dear Seung Kim,

Thank you for using the UCSC Genome Browser and your question about using bigBed data in the Browser.

Unlike the BED files that you have been successfully uploading to the Browser, bigBeds must instead reside at an internet-accessible location, reachable over ftp/http, which you point the Browser to with a "bigDataUrl" setting.

How bigBeds work is that only small segments of the files are passed over the internet, so even if your bigBed is several gigabytes in size, only the small section you are currently visualizing in the Browser is sent over the internet.

To do these internet transfers of only small segments of the files, the location that is hosting the bigBed must allow byte range requests. Most free services like DropBox do not enable this kind of access, as it could also result in people hosting pirated video on their free servers.
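
One way to check whether a host honors byte-range requests is to ask for only the first few bytes of a file and see whether the server answers with HTTP status 206 (Partial Content). The sketch below uses only the Python standard library. The function names are hypothetical, and a 206 answer is a good sign rather than a guarantee that the Browser will be able to read the file.

```python
import urllib.request


def range_request(url, first=0, last=99):
    """Build a GET request asking only for bytes first..last of the file."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def is_partial(status, content_range):
    """A server honoring Range answers 206 and sends a Content-Range header."""
    return status == 206 and content_range is not None


def supports_byte_ranges(url, timeout=10):
    """Send the range request and report whether the server honored it."""
    with urllib.request.urlopen(range_request(url), timeout=timeout) as resp:
        return is_partial(resp.status, resp.headers.get("Content-Range"))
```

Calling `supports_byte_ranges(...)` with the URL of your own hosted file performs the actual network check. A server that ignores the Range header typically answers 200 and starts sending the whole file, in which case the function returns False.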

A free science based file sharing service does exist called CyVerse, sort of like the DropBox of scientific data. By uploading your bigBeds to their file server, you can create "Send to Browser" or "View in Browser" links that allow the file to have byte-range requests enabled.

Please see this previously answered question about using CyVerse:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/_Ws2jxNfJV4/0c5aGb_PAQAJ

Once you have a file at CyVerse (or any location with byte-range access, your institution may provide a location as well), you can paste the URL to your track on the Custom Track page.

For example, here is a bigBed at our local server that could be pasted in: http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb

And here is the same file hosted at CyVerse: http://de.cyverse.org/anon-files/iplant/home/brianlee/bigBedExample.bb

Please note you can add additional parameters to your custom track lines, such as defining a name beyond the file name and giving a description and other attributes. See this link for more information: http://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK

track type=bigBed name="CV bigBed" description="A bigBed file at Cyverse" color=0,255,255 visibility=full bigDataUrl="http://de.cyverse.org/anon-files/iplant/home/brianlee/bigBedExample.bb"
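
If you build track lines for several files, a small helper can keep the quoting consistent. This is only a sketch in Python: the function name `track_line` is hypothetical, and it simply reproduces the layout of the example line above (values containing spaces are quoted, and bigDataUrl goes last).

```python
def track_line(big_data_url, track_type="bigBed", **settings):
    """Build a custom track line from keyword settings."""
    parts = [f"track type={track_type}"]
    for key, value in settings.items():
        text = str(value)
        # Quote values that contain spaces, as in name="CV bigBed"
        parts.append(f'{key}="{text}"' if " " in text else f"{key}={text}")
    parts.append(f'bigDataUrl="{big_data_url}"')
    return " ".join(parts)


print(track_line(
    "http://de.cyverse.org/anon-files/iplant/home/brianlee/bigBedExample.bb",
    name="CV bigBed",
    description="A bigBed file at Cyverse",
    color="0,255,255",
    visibility="full",
))
```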

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute

On Tue, Aug 1, 2017 at 5:08 AM, SeungJ...@mdc-berlin.de <SeungJ...@mdc-berlin.de> wrote:
>
> Dear UCSC Genome browser,
>
> Hi, I am trying to use the UCSC Genome Browser and I have some questions about using it.
>
> I have finished making a bigBed file with bedToBigBed and I am trying to view it.
>
> When I went to Custom Tracks and uploaded my .bb file, I got this message:
>
> Error File 'ccs1.bb' - It appears that you are directly uploading binary data of type bigBed. Custom tracks of this type require the files to be accessible by public http/https/ftp, and file URLs must be passed as the bigDataUrl setting on a "track" line. See bigBed custom track documentation for more information and examples.
>
> Can you explain how I can make my data accessible by http or ftp?
> Although I read the explanation, I do not really understand it. Some discussions mention that I have to
> upload the data file to Dropbox and use it from there, but I am not sure this is the right way.
> Is there any way to access the data directly from a server?
>
> Also, when I upload my BED file, it works well and takes me directly to the Genome Browser.
> Can you tell me what the difference between them is, and whether it is possible to analyze my data with a BED file instead of a BB file?
>
> I am stuck at this step and I really want to use UCSC for analyzing my data.
> Thanks in advance and I look forward to your reply.
>
> Best regards,
>
> Seung Kim
>
>
>
>  
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
> To post to this group, send email to gen...@soe.ucsc.edu.
> Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
> To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/B2AF25D96D54DE4E89697F8383F4136B31267D08%40DAGONE.mdc-berlin.net.
> For more options, visit https://groups.google.com/a/soe.ucsc.edu/d/optout.

Matthew Speir

Oct 4, 2017, 2:09:34 PM
to SeungJ...@mdc-berlin.de, Cath Tyner, gen...@soe.ucsc.edu
Hi Seung Kim,

Thank you for your question about the command-line BLAT program.

There are a number of things that could be contributing to the slowness of your BLAT run. The two options for BLAT that are the most expensive time-wise are "-tileSize=8" and "-oneOff=1". You should use the defaults for these parameters, "-tileSize=11" and "-oneOff=0".
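
For reference, BLAT's usage is `blat database query output.psl`, so the argument order in the original command (reference, then query, then output) is correct. As a small illustration of the advice above, the Python sketch below takes that command and drops the two costly options so that BLAT falls back to its defaults. The helper name `drop_slow_options` is hypothetical.

```python
import shlex

# The command from the original question.
ORIGINAL = (
    "blat -stepSize=5 -tileSize=8 -minScore=20 -minIdentity=50 -fine "
    "-q=rna -oneOff=1 -dots=1000 hg38.fa blat_test.fasta blat_test.psl"
)


def drop_slow_options(command, slow=("-tileSize=", "-oneOff=")):
    """Remove options whose prefix is listed in `slow`, keeping the rest."""
    kept = [arg for arg in shlex.split(command) if not arg.startswith(slow)]
    return " ".join(kept)


print(drop_slow_options(ORIGINAL))
```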

Your query could also be the root of the issue as queries that are repeats or are part of repetitive areas will slow BLAT down. If you are attempting to search for repetitive sequences, we cannot recommend using BLAT.

When choosing your BLAT settings, there will always be a trade-off between sensitivity and speed, as some settings that increase sensitivity make the BLAT process exponentially slower. You will need to find the right balance of sensitivity and speed for your application; it may be that you don't need to use those settings for maximum sensitivity.

Lastly, if you have the ability to parallelize the process somewhat, that should help increase the speed. For example, if you have a designated compute cluster or even a single server with a fair number of CPUs and a good amount of memory, you could separate the genome into individual chromosomes, run BLAT commands simultaneously for all of them, and then concatenate all of the output PSLs into a single file. Using an "ooc" file will help with tile-filtering consistency across all chromosomes. You can read about creating an "ooc" file in this answer to a previous mailing list question here: https://groups.google.com/a/soe.ucsc.edu/forum/#!search/ooc/genome/S6RY8Cx6eVM/JNCjWMUpBHUJ.
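
A minimal Python sketch of the per-chromosome approach, under several assumptions: the genome has already been split into one FASTA file per chromosome (the names chr1.fa and chr2.fa are hypothetical), an ooc file named 11.ooc has been built for the default tile size, and the output directory already exists. It passes BLAT's `-noHead` option so that each output PSL has no header and the parts can simply be concatenated. `-q=rna` is carried over from the original command.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_blat_commands(chrom_fastas, query, out_dir, ooc=None):
    """Return one BLAT argument list per chromosome FASTA file."""
    commands = []
    for fasta in chrom_fastas:
        out_psl = Path(out_dir) / (Path(fasta).stem + ".psl")
        cmd = ["blat", "-q=rna", "-noHead"]  # -noHead: no PSL header lines
        if ooc is not None:
            cmd.append(f"-ooc={ooc}")  # precomputed overused-tile file
        cmd += [str(fasta), str(query), str(out_psl)]
        commands.append(cmd)
    return commands


def run_all(commands, workers=4):
    """Run the commands concurrently; raise if any BLAT process fails."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that a failure surfaces here
        list(pool.map(lambda c: subprocess.run(c, check=True), commands))


def merge_psl(part_paths, merged_path):
    """Concatenate header-less PSL files into a single file."""
    with open(merged_path, "w") as merged:
        for part in part_paths:
            with open(part) as handle:
                merged.writelines(handle)


# Print the commands only; call run_all(...) to actually execute them.
for cmd in build_blat_commands(["chr1.fa", "chr2.fa"], "blat_test.fasta",
                               "psl_parts", ooc="11.ooc"):
    print(" ".join(cmd))
```

Threads are enough here because each worker only waits on an external BLAT process. Keep `workers` in line with the available CPUs and memory, since every BLAT process builds its own index of the chromosome it is given.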

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.


Matthew Speir
UCSC Genome Bioinformatics Group

SeungJ...@mdc-berlin.de

Oct 5, 2017, 11:35:37 AM
to Matthew Speir, Cath Tyner, gen...@soe.ucsc.edu
Dear Matthew,

Thank you very much. It was a great help to me in using BLAT.

Best regards,

Seung Kim



From: Matthew Speir [msp...@soe.ucsc.edu]
Sent: Wednesday, October 4, 2017, 8:09 PM
To: Kim, Seung Joon; Cath Tyner
Cc: gen...@soe.ucsc.edu
Subject: Re: [genome] Question about using BLAT

SeungJ...@mdc-berlin.de

Dec 28, 2017, 3:03:40 PM
to gen...@soe.ucsc.edu


Dear UCSC,

Thanks for the help with BLAT and the UCSC Genome Browser.

I have a question about your instructions.

I need 3 files for the bedToBigBed utility: 1) bigPsl.as, 2) bigPsl.txt, and 3) hg38.chrom.sizes.

I have the txt and chrom.sizes files, but I have no idea how to create the bigPsl.as file. There are not many explanations on the Internet.

My file is 11.psl so my command is:

bedToBigBed -as=bigPsl.as 11.psl

Nothing was created.

Please let me know what to do with this file.

Thanks in advance and happy holidays.

Best,

Seung Kim




From: Kim, Seung Joon
Sent: Friday, September 29, 2017, 10:55 AM
To: Cath Tyner
Cc: gen...@soe.ucsc.edu
Subject: Question about using BLAT

Christopher Lee

Dec 28, 2017, 3:04:10 PM
to SeungJ...@mdc-berlin.de, gen...@soe.ucsc.edu
Hi Seung,

Thank you for your question about creating a bigPsl file. The process is
described on the bigPsl help page:
http://genome.ucsc.edu/goldenPath/help/bigPsl

The example bigPsl.as file can be found here:
http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.as

The correct command for making a bigPsl file is as follows:
bedToBigBed -as=bigPsl.as -type=bed12+13 -tab bigPsl.txt chrom.sizes bigPsl.bb

Here, chrom.sizes is the chromosome sizes file for your assembly of
interest; it does not necessarily need to be for hg38.
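
The `-type=bed12+13` setting tells bedToBigBed to expect 12 standard BED columns plus 13 extra ones, 25 tab-separated columns in total, which matches the fields defined in bigPsl.as. A raw .psl file has 21 columns, so a file such as 11.psl needs to go through pslToBigPsl first and cannot be passed to bedToBigBed directly. Below is a minimal Python sketch for checking a file before running bedToBigBed; the function names are hypothetical.

```python
def column_counts(path):
    """Return the set of tab-separated column counts found in a file."""
    counts = set()
    with open(path) as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                counts.add(len(line.rstrip("\n").split("\t")))
    return counts


def looks_like_bigpsl_txt(path):
    """True if every row has the 25 columns implied by -type=bed12+13."""
    return column_counts(path) == {12 + 13}
```

If `looks_like_bigpsl_txt("bigPsl.txt")` returns False, `column_counts` shows which row widths are present, which usually points to a header line or to a file that was not produced by pslToBigPsl.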

Please let us know if you have any further questions!

Christopher Lee
UCSC Genomics Institute

Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining

Thank you again for your inquiry and using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.
