Dear Andrea,
Thank you for using the UCSC Genome Browser and for your question about rmsk annotations that include GRCh38.p13.
The short answer is that you can find the most recent patch 13 annotation data in the rmsk track here: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/rmsk.txt.gz
This file has the annotations for patch sequences. For instance, if you load this session, you will see an example: http://genome.ucsc.edu/s/brianlee/chr19_ML143376v1_fix
The session shows chr19_ML143376v1_fix, the sequence https://www.ncbi.nlm.nih.gov/nuccore/ML143376.1 found in GCA_000001405.28_GRCh38.p13_genomic.fna.gz, highlighted in the middle, with annotations for both genes and the Repeating Elements by RepeatMasker highlighted as well.
Although the file you referenced, rmskOutCurrent.txt.gz (2018-10-28 03:34, 162M), is larger, the newer rmsk.txt.gz file (2021-09-03 14:58, 147M) has 640 sequence names reflecting new patches. Your question has prompted us to create a work ticket to update the rmskOutCurrent file.
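If you would like to check the sequence names in either file yourself, the sketch below counts the distinct values in the genoName column. It assumes genoName is the sixth tab-separated column in the file, as it is in the rmsk table. The function name count_seq_names is only illustrative and is not part of any UCSC tool.

```shell
# Count the distinct sequence names in a gzipped rmsk-style file.
# Assumption: genoName is the 6th tab-separated column.
count_seq_names() {
    gunzip -c "$1" | cut -f 6 | sort -u | wc -l | tr -d ' '
}
```

For example, `count_seq_names rmsk.txt.gz` prints a single number for that file.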
To get annotations for the whole genome, we recommend extracting the new patch sequence annotations from the rmsk file referenced above and adding them to the existing rmskOutCurrent file, which gives combined and relatively up-to-date annotations. This requires filtering out rows from rmsk whose genoName (chrom) is also found in rmskOutCurrent.
Here are some steps that could be used:
    gunzip -c rmskOutCurrent.txt.gz | cut -f 6 | uniq > seqsInRmskOutCurrent.txt
    gunzip -c rmskOutCurrent.txt.gz > rmskOutCurrentCombined.txt
    gunzip -c rmsk.txt.gz | grep -vFwf seqsInRmskOutCurrent.txt >> rmskOutCurrentCombined.txt
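As a possible variant of these steps, the merge can compare the genoName column directly instead of matching sequence names anywhere in a row. This is only a sketch: it assumes genoName is the sixth tab-separated column in both files, and the function name combine_rmsk and the temporary file name are hypothetical.

```shell
# Sketch: append rmsk rows for sequences that are absent from rmskOutCurrent.
# Assumption: genoName is the 6th tab-separated column in both files.
combine_rmsk() {
    current_gz=$1          # e.g. rmskOutCurrent.txt.gz
    rmsk_gz=$2             # e.g. rmsk.txt.gz
    out=$3                 # combined, uncompressed output file
    names="$out.seqnames"  # temporary list of names already covered

    # Collect each sequence name present in the older file.
    gunzip -c "$current_gz" | cut -f 6 | sort -u > "$names"

    # Start the combined file with every row of the older file.
    gunzip -c "$current_gz" > "$out"

    # Read the name list first, then print only those rmsk rows
    # whose genoName is not in that list.
    gunzip -c "$rmsk_gz" |
        awk -F'\t' 'FILENAME == ARGV[1] { seen[$1] = 1; next }
                    !($6 in seen)' "$names" - >> "$out"

    rm -f "$names"
}
```

It could be called as `combine_rmsk rmskOutCurrent.txt.gz rmsk.txt.gz rmskOutCurrentCombined.txt`. Matching on the column alone avoids dropping a row just because a sequence name happens to appear in one of its other fields.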
If you ever wish to see the processes used to build these files, we document our run steps inside our source tree and can provide a link to this information if it would be helpful. However, we think you may mainly be interested in the RepeatMasker and library versions used, which are summarized below. For rmskOutCurrent, and for sequences added to rmsk in patch releases through p12, these versions of RepeatMasker and its libraries were used:
For sequences added to rmsk in patch release p13, these versions were used:
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please send them to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum to help others find answers to similar questions. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu, a private internal list for our support team.
All the best,