How do I index a GTF file for pysam?


Alex Rogozhnikov

Aug 19, 2020, 1:55:12 AM
to Pysam User group
I tried to open a GTF file with pysam, but an index seems to be a hard requirement for doing anything (even a plain fetch):
OSError: index `.../Homo_sapiens.GRCh38.96.gtf.tbi` not found

I've tried pysam.index (that's for BAM, I know, but I had to try!):
/Homo_sapiens.GRCh38.96.gtf" is in a format that cannot be usefully indexed\n'

I've also tried tabix, which doesn't work either. Super helpful error:
tbx_index_build failed: Homo_sapiens.GRCh38.96.gtf

If there is a simple way to get the transcript-to-genome mapping in Python, please let me know.

Thank you!

Alex Rogozhnikov

Aug 19, 2020, 3:40:01 PM
to pysam-us...@googlegroups.com
OK, here is the approach that worked for me:

0. Install tabix with apt-get.

1. Sort the file by chromosome and start position, since tabix needs sorted input (there are some Perl/Python utilities for this, which I didn't check):

(grep -v "Parent=" Homo_sapiens.GRCh38.96.gtf|sort -k1,1 -k4,4n -k5,5n;grep "Parent=" Homo_sapiens.GRCh38.96.gtf|sort -k1,1 -k4,4n -k5,5n)| sort -k1,1 -k4,4n -s > Homo_sapiens.GRCh38.96.sorted.gtf
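For anyone who prefers to stay in Python, the sort step can be approximated with the standard library. This is a minimal sketch, not one of the utilities mentioned above: it assumes a tab-separated GTF that fits in memory, keeps `#` header lines at the top, and orders records by chromosome name (plain string order, like `sort -k1,1` in the C locale), then numerically by start and end. The function name `sort_gtf` is hypothetical.

```python
def sort_gtf(in_path, out_path):
    """Sort a GTF by (chromosome, start, end) so that tabix can index it.

    Hypothetical helper: '#' header lines are written first in their
    original order; all records are held in memory while sorting.
    """
    headers, records = [], []
    with open(in_path) as fh:
        for line in fh:
            if not line.endswith("\n"):
                # a final line without a newline must not be glued to the next one
                line += "\n"
            if line.startswith("#"):
                headers.append(line)
            elif line.strip():
                fields = line.split("\t")
                # GTF columns: 1 = seqname, 4 = start, 5 = end (1-based)
                key = (fields[0], int(fields[3]), int(fields[4]))
                records.append((key, line))
    # list.sort is stable, so records with equal keys keep their input order
    records.sort(key=lambda item: item[0])
    with open(out_path, "w") as out:
        out.writelines(headers)
        out.writelines(line for _, line in records)
```

For example, `sort_gtf("Homo_sapiens.GRCh38.96.gtf", "Homo_sapiens.GRCh38.96.sorted.gtf")` should give a file ready for the bgzip step. Since the whole file is held in memory, expect substantial RAM use for a full human annotation.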

2. Compress with bgzip (tabix can only index bgzip-compressed files):

bgzip -c Homo_sapiens.GRCh38.96.sorted.gtf > Homo_sapiens.GRCh38.96.sorted.gtf.gz

3. Create the index with tabix:

tabix Homo_sapiens.GRCh38.96.sorted.gtf.gz


Alex Rogozhnikov

Aug 29, 2020, 3:41:24 AM
to Pysam User group
An even better solution is to use the pyensembl package, which handles downloading and parsing and has a nice Python interface.
That way you won't run into this problem at all.