faToTwoBit


Yeroslaviz, Assa

Dec 9, 2016, 11:42:56 AM
to gen...@soe.ucsc.edu
Hi,

I'm having trouble working with the faToTwoBit tool.
I would like to do a BLAT search against nr.
I have downloaded the complete nr from the NCBI site and am now trying to convert it into 2bit format.

Unfortunately, I keep getting this error message:

Error in faToTwoBit, index overflow at WP_013989450.1. The 2bit format does not support indexes larger than 4Gb,
please split up into smaller files.

Is there something one can do about it?
Or do I really need to cut my nr into 20 pieces?

thanks in advance 

Assa

P.S. 
I am working on an ubuntu server with GNU/Linux 4.4.0-47-generic x86_64 (Ubuntu 16.04.1 LTS)

——

Assa Yeroslaviz, PhD
Application service, Bioinformatics group 
Max Planck Institute for Biochemistry
Am Klopferspitz 18, 82152 Martinsried
Germany
Tel:     +49 89 8578 2427
Email: yeros...@biochem.mpg.de

Hiram Clawson

Dec 9, 2016, 12:13:29 PM
to Yeroslaviz, Assa, gen...@soe.ucsc.edu
Good Morning Assa:

Yes, you will need to break up your NR sequence into manageable chunks
that will fit into a 2bit file.

You would need to do this anyway for your analysis, because even if you
could get all the sequence into one file, it would be too large to run an
efficient blat against.

Partition both your query and your target sequence into chunks of a
reasonable size and number, so that a blat of one target set against one
query set finishes in a reasonable run time. Run all possible
query-to-target combinations on a compute cluster, then filter the psl
results by your desired match criteria.

--Hiram
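
A minimal sketch of the partitioning described above, assuming plain FASTA input. The function names `split_fasta` and `blat_jobs`, the chunk file naming, and the residue budget per chunk are hypothetical and not part of the UCSC tools; choose the budget to suit your own cluster. The generated command lines follow the basic `blat database query output.psl` form and leave out any options (sequence type, filtering) you may need.

```python
import itertools
from pathlib import Path


def split_fasta(src, out_dir, max_residues, prefix="chunk"):
    """Split FASTA file `src` into chunk files of at most `max_residues` residues.

    Records are never split; a record larger than the budget gets a file of its own.
    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None        # currently open chunk file
    used = 0          # residues already written to the current chunk
    record = []       # lines of the record being read
    record_len = 0    # residues in the record being read

    def flush():
        """Write the buffered record, opening a new chunk if it would not fit."""
        nonlocal out, used, record, record_len
        if not record:
            return
        if out is None or (used > 0 and used + record_len > max_residues):
            if out is not None:
                out.close()
            path = out_dir / f"{prefix}_{len(paths):04d}.fa"
            paths.append(path)
            out = open(path, "w")
            used = 0
        out.writelines(record)
        used += record_len
        record = []
        record_len = 0

    with open(src) as handle:
        for line in handle:
            if line.startswith(">"):
                flush()              # previous record is complete
                record = [line]
            elif record:             # ignore anything before the first header
                record.append(line)
                record_len += len(line.strip())
    flush()
    if out is not None:
        out.close()
    return paths


def blat_jobs(targets, queries, psl_dir):
    """Return one blat command line per (target chunk, query chunk) pair."""
    psl_dir = Path(psl_dir)
    jobs = []
    for target, query in itertools.product(targets, queries):
        psl = psl_dir / f"{Path(target).stem}_vs_{Path(query).stem}.psl"
        jobs.append(f"blat {target} {query} {psl}")
    return jobs
```

The job list can be written to a file, one command per line, and handed to whatever scheduler the cluster uses; the resulting psl files are then filtered in a separate pass.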