>gnl|UMD3.1|GK000010.2 Chromosome 10 AC_000167.1
GTGATAGCCACGTGATAAATGCATGATCATTTGCATGATCAGTGCATGGGCAGTCAGGTGATCAGTGTAT...
I think the way to make both files compatible is to place the chromosome number at the beginning of each header in the reference, so that it looks like this:
>10 gnl|UMD3.1|GK000010.2 Chromosome 10 AC_000167.1
GTGATAGCCACGTGATAAATGCATGATCATTTGCATGATCAGTGCATGGGCAGTCAGGTGATCAGTGTAT...
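A minimal sketch of that header rewrite in Python. It assumes every chromosome header contains the word "Chromosome" followed by its label (as in the example above). The function names are hypothetical. Most indexers and samtools faidx use the first whitespace-delimited word of a header as the sequence name, so prepending the number makes "10" the name they see. Headers without a "Chromosome N" token, such as unplaced scaffolds, pass through unchanged.

```python
import re

# Matches "... Chromosome 10 ..." and captures the label ("10", "X", ...).
# Assumption: chromosome headers follow the UMD3.1 pattern shown above.
CHROM_RE = re.compile(r"\bChromosome\s+(\S+)")


def rename_header(header: str) -> str:
    """Prepend the chromosome label to a FASTA header line, if one is found."""
    match = CHROM_RE.search(header)
    if not match:
        return header  # e.g. unplaced scaffolds: left as-is
    return ">" + match.group(1) + " " + header[1:]


def rename_fasta_lines(lines):
    """Yield FASTA lines, rewriting only the '>' header lines."""
    for line in lines:
        if line.startswith(">"):
            yield rename_header(line.rstrip("\n")) + "\n"
        else:
            yield line

# Usage (hypothetical file names):
# with open("umd3.1.fa") as src, open("umd3.1.renamed.fa", "w") as dst:
#     dst.writelines(rename_fasta_lines(src))
```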
I have posted a similar question on Biostars that specifically addresses the problem of reordering the header:
However, I now think the solution is not that simple, because the genome also contains unplaced genomic scaffolds, whose headers look like this:
>gnl|UMD3.1|GJ060418.1 GPS_000344858.1 NW_003101163.1
Here, of course, there is no chromosome ID to match against the entries of the GTF file.
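One possible fallback, sketched in Python: name each unplaced scaffold after the accession in the third "|"-separated field of its gnl identifier (GJ060418.1 in the example above). This rests on an assumption that should be checked against the first column of the GTF: that the annotation refers to unplaced scaffolds by that GenBank accession. The function names are hypothetical.

```python
import re

# Captures the chromosome label from "... Chromosome 10 ...".
CHROM_RE = re.compile(r"\bChromosome\s+(\S+)")


def sequence_name(header: str) -> str:
    """Choose the name a header should get in the renamed FASTA.

    Chromosomes -> their label (e.g. "10").
    Anything else -> the accession from "gnl|UMD3.1|<accession>"
    (e.g. "GJ060418.1"). ASSUMPTION: the GTF uses that accession for scaffolds.
    """
    match = CHROM_RE.search(header)
    if match:
        return match.group(1)
    words = header.lstrip(">").split()
    first_word = words[0] if words else ""
    fields = first_word.split("|")          # ["gnl", "UMD3.1", "GJ060418.1"]
    return fields[2] if len(fields) >= 3 else first_word


def rename_header(header: str) -> str:
    """Prepend the chosen name so that it becomes the first word of the header."""
    return ">" + sequence_name(header) + " " + header.lstrip(">")
```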
My questions are the following:
- How can I make UMD3.1 compatible with its corresponding annotation (Bos_taurus.UMD3.1.87.gtf)?
- Has anyone indexed the Bos taurus genome?
Would the solution be to convert the GFF3 file into GTF with Cufflinks? That seems too complicated, since a GTF file is already available.
- Is there another GTF file that will work?
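Whichever GTF is used, one way to see whether it fits the reference is to compare the sequence names in its first column with the first word of each FASTA header. A minimal sketch in Python, assuming an uncompressed GTF; the function names and file paths are hypothetical.

```python
def fasta_names(lines):
    """Sequence names: the first whitespace-delimited word of each '>' header."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_seqnames(lines):
    """Values of column 1 (seqname) in a GTF, skipping '#' and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched(gtf_lines, fasta_lines):
    """GTF seqnames with no FASTA sequence of the same name, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))

# Usage (hypothetical paths):
# with open("Bos_taurus.UMD3.1.87.gtf") as gtf, open("umd3.1.renamed.fa") as fa:
#     print(unmatched(gtf, fa))
```

An empty list means every annotated sequence has a matching reference sequence; anything listed shows exactly which names still need renaming.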
Thank you in advance