Transcriptome Generation inquiry


Alex Cameron Harvey

Nov 23, 2022, 12:20:44 PM
to gen...@soe.ucsc.edu

Hello,

 

I am attempting to generate a transcriptome FASTA file from a human reference genome and a human gene annotation (GTF) file using gffread, so that I can do quasi-mapping with salmon. I am running into an issue and want to check that I'm using correct and compatible files. I've used this command:

 

gffread -w genome/transcripts_homosapien.fa -g genome/hg38.fa annotations/hg38_UCSCgenes.gtf

 

Which returns:

 

Warning: couldn't find fasta record for 'chr1_KN196472v1_fix'!

Error: no genomic sequence available (check -g option!).

 

It seems to generate the transcripts FASTA file, but I am concerned it's not trustworthy because of the error message. The reference genome file I've used is hg38.fa.gz (Dec. 2013), and the gene annotation file was selected with these settings: Mammal, Human, Dec. 2013 (GRCh38/hg38), Genes and Gene Predictions, NCBI RefSeq, UCSC RefSeq (refGene), in GTF output format (screenshot attached).

 

I've run the same command format with C. elegans files (ce10.fa.gz and RefSeq Genes, refGene), which seemed to work fine and did not return that error message.

 

Is there anything notable about the human annotation or genome file from NCBI that would cause this?

 

Please let me know if there’s any other information I can provide to help. I appreciate any clarification you might be able to provide!

 

Best,

 

Alex Harvey

PhD Student

Department of Molecular Biology & Genetics

Aarhus University

Email: alha...@mbg.au.dk

Screenshot 2022-11-23 at 4.10.08 PM.pdf

Jairo Navarro Gonzalez

Nov 23, 2022, 5:49:15 PM
to Alex Cameron Harvey, gen...@soe.ucsc.edu

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

There have been patches to the hg38 genome assembly since its initial release, and more information about these patches can be found in the following blog post:

https://genome-blog.soe.ucsc.edu/blog/2019/02/22/patches/

You can find a FASTA file with the latest patch sequences on the following downloads page:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/latest/
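
If you would like to confirm that the two files are compatible before rerunning gffread, one option is to compare the sequence names in the GTF against the FASTA headers. Below is a minimal Python sketch of that check. It is not a UCSC tool: the function names are hypothetical, and it assumes a tab-separated GTF with the sequence name in column 1 and FASTA headers whose first word is the sequence name. Names such as chr1_KN196472v1_fix that appear in the annotation but not in the genome file would likely explain the gffread warning.

```python
import gzip


def _open_text(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect sequence names from column 1 of a GTF, skipping comments."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_fasta(gtf_path, fasta_path):
    """Return sorted GTF sequence names that have no record in the FASTA."""
    return sorted(gtf_names(gtf_path) - fasta_names(fasta_path))
```

With the paths from your command, this could be called as `missing_from_fasta("annotations/hg38_UCSCgenes.gtf", "genome/hg38.fa")`. An empty list would suggest that every annotated sequence has a matching genome record.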

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro

UCSC Genome Browser



Alex Cameron Harvey

Nov 28, 2022, 1:16:37 PM
to Jairo Navarro Gonzalez, gen...@soe.ucsc.edu
Hi Jairo,

Thanks for all the info! I believe this helps clarify things. I'll be sure to reach out with more questions if not. Thanks again.

Best,
Alex


