Hello,
I am attempting to generate a transcriptome FASTA file from a human reference genome and a human gene annotation (GTF) file using gffread, so that I can do quasi-mapping with salmon. However, I am running into an issue, and I want to check that I’m using correct and compatible files. I’ve used this command:
gffread -w genome/transcripts_homosapien.fa -g genome/hg38.fa annotations/hg38_UCSCgenes.gtf
This returns:
Warning: couldn't find fasta record for 'chr1_KN196472v1_fix'!
Error: no genomic sequence available (check -g option!).
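One way to see which annotation records this warning affects is to compare the sequence names in column 1 of the GTF with the record names in the genome FASTA. The sketch below assumes bash with standard awk, grep, sort and comm. The function name `missing_seqnames` and the toy files are hypothetical, for illustration only, and it assumes that a FASTA record name is the first word of its header line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# missing_seqnames GTF FASTA  (hypothetical helper, not part of gffread)
# Prints each sequence name found in column 1 of the GTF that has no record
# in the FASTA. A FASTA record name is taken to be the first word of its
# header line, without the leading ">".
missing_seqnames() {
  local gtf="$1" fasta="$2"
  LC_ALL=C comm -23 \
    <(awk '!/^#/ && NF > 0 { print $1 }' "$gtf" | LC_ALL=C sort -u) \
    <(grep '^>' "$fasta" | awk '{ print substr($1, 2) }' | LC_ALL=C sort -u)
}

# Toy demo: one annotation on chr1 (present in the FASTA) and one on a
# fix-patch sequence that the FASTA lacks.
demo_dir=$(mktemp -d)
printf '>chr1\nACGTACGT\n>chr2 toy record\nGGGGCCCC\n' > "$demo_dir/genome.fa"
printf 'chr1\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "$demo_dir/genes.gtf"
printf 'chr1_KN196472v1_fix\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n' >> "$demo_dir/genes.gtf"
missing_seqnames "$demo_dir/genes.gtf" "$demo_dir/genome.fa"   # prints chr1_KN196472v1_fix
rm -r "$demo_dir"
```

Run against the real files (e.g. `missing_seqnames annotations/hg38_UCSCgenes.gtf genome/hg38.fa`), an empty result would mean every GTF sequence has a matching FASTA record. A non-empty result lists the sequences, such as the `_fix` patch named in the warning, whose transcripts gffread presumably could not extract.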
It seems to generate the transcripts FASTA file, but I am concerned that the output is not trustworthy because of the error message. The reference genome file I’ve used is hg38.fa.gz from Dec. 2013, and the gene annotation file I’m using is Mammal, Human, Dec. 2013 GRCh38/hg38, Genes and Gene Predictions, NCBI RefSeq, UCSC RefSeq (refGene), in GTF output format (screenshot attached).
I’ve run the same command with C. elegans files (ce10.fa.gz and RefSeq Genes, refGene), which seemed to work fine and did not return that error message.
Is there anything notable about the human annotation or genome files from NCBI that would cause this?
Please let me know if there’s any other information I can provide to help. I appreciate any clarification you might be able to provide!
Best,
Alex Harvey
PhD Student
Department of Molecular Biology & Genetics
Aarhus University
Email: alha...@mbg.au.dk
Hello,
Thank you for using the UCSC Genome Browser and sending your inquiry.
There have been patches to the hg38 genome assembly since its initial release, and more information about these patches can be found in the following blog post:
You can find a FASTA file with the latest patch sequences from the following downloads page:
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/latest/
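To gauge how much of an annotation sits on patch sequences, GTF records can be tallied by the kind of sequence they are on. The sketch below is a hypothetical helper (`seqname_classes`), not a UCSC tool; it assumes bash and awk, and it assumes UCSC's hg38 name suffixes as we understand them (listed in the comments).

```shell
#!/usr/bin/env bash
set -euo pipefail

# seqname_classes GTF  (hypothetical helper)
# Tallies GTF records by the kind of sequence they sit on, using UCSC hg38
# name suffixes as we understand them:
#   *_fix    fix patches            *_alt    alternate haplotypes / novel patches
#   *_random unlocalized scaffolds  chrUn_*  unplaced scaffolds
# Everything else (chr1..chr22, chrX, chrY, chrM, ...) is counted as "main".
seqname_classes() {
  awk -F '\t' '
    /^#/ || NF == 0 { next }          # skip comments and blank lines
    {
      if      ($1 ~ /_fix$/)    cls = "fix"
      else if ($1 ~ /_alt$/)    cls = "alt"
      else if ($1 ~ /_random$/) cls = "random"
      else if ($1 ~ /^chrUn_/)  cls = "unplaced"
      else                      cls = "main"
      count[cls]++
    }
    END { for (c in count) print c "\t" count[c] }
  ' "$1" | LC_ALL=C sort
}

# Toy demo with one record on chr1 and one on a fix patch.
demo=$(mktemp)
printf 'chr1\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n' > "$demo"
printf 'chr1_KN196472v1_fix\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n' >> "$demo"
seqname_classes "$demo"   # prints "fix 1" then "main 1", tab-separated
rm "$demo"
```

Against the real annotation (e.g. `seqname_classes annotations/hg38_UCSCgenes.gtf`), non-zero `fix` or `alt` counts would point to records whose sequences also need to be present in the genome FASTA passed to `gffread -g`, for example by using the FASTA from the latest downloads page.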
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
On Nov 23, 2022, at 11:49 PM, Jairo Navarro Gonzalez <jnav...@ucsc.edu> wrote: