How to generate ribosome BED file


Yue Zhao

Dec 7, 2020, 10:34:58 PM
to rseqc-discuss
Hello,

I am working on mouse (GRCm39) and would like to get the ribosome BED file from the UCSC Table Browser for RSeQC. You have mentioned a few times that it can be easily generated from the UCSC Table Browser. However, when I followed this instruction:

Changing the output format from "gtf" to "bed" in the above instruction gave me a 6-column BED file, which doesn't work with RSeQC:

chr1    7846020 7846076 LSU-rRNA_Hsa    291     +

chr1    15712400        15712441        LSU-rRNA_Hsa    252     +

chr1    24143181        24143258        5S      242     -

Downloading the ribosome GTF file directly and converting it to BED with gtf2bed from the BEDOPS toolkit gave me a 10-column BED file, which didn't work either:

chr1    7846020 7846076 LSU-rRNA_Hsa    291.000000      +       mm39_rmsk       exon    .       gene_id "LSU-rRNA_Hsa"; transcript_id "LSU-rRNA_Hsa";

chr1    15712400        15712441        LSU-rRNA_Hsa    252.000000      +       mm39_rmsk       exon    .       gene_id "LSU-rRNA_Hsa"; transcript_id "LSU-rR

chr1    24143181        24143258        5S      242.000000      -       mm39_rmsk       exon    .       gene_id "5S"; transcript_id "5S";


Could you please advise me how to get the ribosome BED file from UCSC table browser?

Thank you very much!

Yue

Yue Zhao

Dec 7, 2020, 10:53:04 PM
to rseqc-discuss
Hi,

I think I figured it out after sending the previous email.
The original 6-column BED file from the UCSC Table Browser actually works after adding columns 7 to 12:

column 7 = column 2
column 8 = column 3
column 9 = 0
column 10 = 1
column 11 = column 3 - column 2
column 12 = 0
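
For reference, the column rules above can be sketched as a small shell function. This is a minimal sketch, assuming the input is a tab-separated 6-column BED read from standard input; `bed6_to_bed12` is a hypothetical helper name, not part of RSeQC or the UCSC tools. In BED12 terms the added columns are thickStart, thickEnd, itemRgb, blockCount, blockSizes, and blockStarts, so each record becomes a single-block entry.

```shell
# Sketch: extend a tab-separated 6-column BED (stdin) to 12 columns.
# bed6_to_bed12 is a hypothetical helper name (assumption, not an RSeQC tool).
bed6_to_bed12() {
    awk 'BEGIN { FS = OFS = "\t" }
         NF >= 6 {
             # columns 1-6 are kept unchanged; then
             # 7 = start, 8 = end, 9 = 0, 10 = 1, 11 = end - start, 12 = 0
             print $1, $2, $3, $4, $5, $6, $2, $3, 0, 1, $3 - $2, 0
         }'
}

# Example with one record from the first post:
printf 'chr1\t7846020\t7846076\tLSU-rRNA_Hsa\t291\t+\n' | bed6_to_bed12
```

To convert a whole file, something like `bed6_to_bed12 < rRNA_6col.bed > rRNA_12col.bed` should work, where both file names are placeholders.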

Thanks,
Yue

payal banerjee

Jun 21, 2022, 3:42:24 PM
to rseqc-discuss
Hello, 

I am facing the same problem. I downloaded the mm39 GTF file from GENCODE, extracted just the rRNA entries, and converted the result from GTF to BED. The output is not compatible with RSeQC.
Could you please explain in detail how to download the mm39 rRNA BED file from the UCSC Table Browser, or how to create a BED file that works with RSeQC? Or could you show an example of the first lines of the BED file before and after the transformation?

Thanks,
Payal

Liguo Wang

Jun 21, 2022, 5:03:02 PM
to rseqc-...@googlegroups.com
This is what I did.

Step-1
I downloaded the mouse (mm39) GENCODE BED file from the UCSC Table Browser and saved it as "GRCm39_GENCODE_VM27.bed". In this file, the 4th column contains transcript IDs, so the only thing left to find out is which transcripts are rRNA genes.

Step-2
I downloaded the GTF file from the GENCODE website (https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M29/gencode.vM29.annotation.gtf.gz) and uncompressed it.

Step-3
I used the command below to extract all the rRNA transcript IDs from this GTF file and saved them as "rRNA.ID":
    cat gencode.vM29.annotation.gtf | awk '$3 == "transcript"' | grep "rRNA" | awk '{print $12}' | perl -p -e 's/[";]//g'  >rRNA.ID
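
The positional `$12` in the command above relies on transcript_id being the second attribute on each GTF line. A slightly more defensive variant, sketched below, looks the attribute up by name and filters on the transcript_type attribute rather than grepping the whole line. It assumes a GENCODE-style GTF with tab-separated columns and a `transcript_type` attribute; `rrna_transcript_ids` is a hypothetical helper name. Note that this filter is narrower than `grep "rRNA"`, which matches "rRNA" anywhere in the line.

```shell
# Sketch: print transcript IDs of rRNA transcripts from a GENCODE-style GTF (stdin).
# rrna_transcript_ids is a hypothetical helper name (assumption).
rrna_transcript_ids() {
    awk -F '\t' '
        # keep transcript records whose transcript_type value contains "rRNA"
        $3 == "transcript" && $9 ~ /transcript_type "[^"]*rRNA[^"]*"/ {
            # find transcript_id "VALUE" wherever it sits in column 9
            if (match($9, /transcript_id "[^"]+"/)) {
                # skip the 15 characters of: transcript_id "  and drop the closing quote
                print substr($9, RSTART + 15, RLENGTH - 16)
            }
        }'
}

# Possible usage with the file from Step-2 (after uncompressing it):
#   rrna_transcript_ids < gencode.vM29.annotation.gtf > rRNA.ID
```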

Step-4
Extract the rows whose transcript IDs are listed in rRNA.ID from "GRCm39_GENCODE_VM27.bed" and save them as "GRCm39_GENCODE_VM27_rRNAs.bed", using the command below:
   cat GRCm39_GENCODE_VM27.bed | grep -f rRNA.ID > GRCm39_GENCODE_VM27_rRNAs.bed
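
`grep -f` treats every ID as a pattern that may match anywhere in a line, and an ID whose version suffix differs between the two releases (the M29 GTF versus the VM27 BED here) will generally not match at all. The sketch below instead compares column 4 exactly, after stripping a trailing `.N` version suffix from both sides. It assumes a tab-separated BED whose 4th column holds transcript IDs, as described in Step-1; `filter_bed_by_ids` is a hypothetical helper name.

```shell
# Sketch: keep BED rows (stdin) whose column 4 matches an ID listed in file "$1",
# ignoring any trailing ".N" version suffix on either side.
# filter_bed_by_ids is a hypothetical helper name (assumption).
filter_bed_by_ids() {
    awk -F '\t' '
        # first input (the ID list): remember each ID without its version
        NR == FNR { id = $1; sub(/\.[0-9]+$/, "", id); keep[id] = 1; next }
        # second input (the BED on stdin): print rows whose column 4 is listed
        { id = $4; sub(/\.[0-9]+$/, "", id); if (id in keep) print }
    ' "$1" -
}

# Possible usage with the files from the steps above:
#   filter_bed_by_ids rRNA.ID < GRCm39_GENCODE_VM27.bed > GRCm39_GENCODE_VM27_rRNAs.bed
```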

"GRCm39_GENCODE_VM27.bed" and "GRCm39_GENCODE_VM27_rRNAs.bed" are attached. This is just a demonstration: you will notice that the GTF file is version 29, while the BED file is version 27.

-Liguo

GRCm39_GENCODE_VM27.zip

payal banerjee

Jun 22, 2022, 12:43:19 PM
to rseqc-discuss
Thank you very much. This BED file works. Please post it on the RSeQC website. I am sure many will benefit.

Payal
