Question about bigBedToBed output


Jie Song

Nov 4, 2025, 6:02:54 PM
to UCSC Genome Browser Public Support
Hello!

I'm converting a .bb file to .bed with the bigBedToBed tool, but the resulting BED file contains unexpected content, like this:

  chr1 3206472 3215682 ENSMUSG00000051951[2,2,3] 0 - 3215682 3215682 0,0,0 2 845,2244, 0,6966, HAVANA ENSMUSG00000051951.5 Xkr4 ENSMUSG00000051951[2,2,3] Xkr4[2,2,3] 1.09 Max score(s) in <br>hippocampus_14d: 1.06<br>hippocampus_2mo: 1.09 <table> <tr><td>adrenal_10d</td><td>0.0</td></tr> <tr><td>adrenal_14d</td><td>0.0</td></tr> <tr><td>adrenal_18-20mo</td><td>0.0</td></tr> <tr><td>adrenal_25d</td><td>0.0</td></tr> <tr><td>adrenal_2mo</td><td>0.0</td></tr> <tr><td>adrenal_36d</td><td>0.0</td></tr> <tr><td>adrenal_4d</td><td>0.0</td></tr> <tr><td>adrenal_gland</td><td>0.39</td></tr> <tr><td>c2c12_myoblast (cell line)</td><td>0.0</td></tr> <tr><td>c2c12_myotube (cell line)</td><td>0.0</td></tr> <tr><td>cortex</td><td>0.05</td></tr> <tr><td>cortex_14d</td><td>0.0</td></tr> <tr><td>cortex_18-20mo</td><td>0.62</td></tr> <tr><td>cortex_2mo</td><td>0.66</td></tr> <tr><td>f1219 (cell line)</td><td>0.0</td></tr> <tr><td>forelimb_e11</td><td>0.0</td></tr> <tr><td>forelimb_e13</td><td>0.0</td></tr> <tr><td>gastroc_10d</td><td>0.0</td></tr> <tr><td>gastroc_14d</td><td>0.0</td></tr> <tr><td>gastroc_18-20mo</td><td>0.0</td></tr> <tr><td>gastroc_25d</td><td>0.0</td></tr> <tr><td>gastroc_2mo</td><td>0.0</td></tr> <tr><td>gastroc_36d</td><td>0.0</td></tr> <tr><td>gastroc_4d</td><td>0.0</td></tr> <tr><td>heart_14d</td><td>0.0</td></tr> <tr><td>heart_18-20mo</td><td>0.0</td></tr> <tr><td>heart_2mo</td><td>0.0</td></tr> <tr><td>hippocampus</td><td>0.44</td></tr> <tr><td>hippocampus_10d</td><td>0.0</td></tr> <tr><td>hippocampus_14d</td><td>1.06</td></tr> <tr><td>hippocampus_18-20mo</td><td>0.33</td></tr> <tr><td>hippocampus_2mo</td><td>1.09</td></tr></table>

  
The gene ID is ENSMUSG00000051951[2,2,3] instead of ENSMUSG00000051951, and there is some HTML content at the end. The original bb file is https://hgdownload.soe.ucsc.edu/gbdb/mm10/encode4/encode4LongRnaTranscripts.bb

What should I do to get a clean BED file?

Thanks for your time. Looking forward to your reply.

Jie Song

Matthew Speir

Nov 10, 2025, 8:16:21 PM
to Jie Song, UCSC Genome Browser Public Support
Hello, Jie.

Thank you for your question about extracting data from a bigBed file.

The associated track description page contains details about the track data, including an explanation of what the numbers in the item names represent: https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm10&g=encode4LongRnaTranscripts. You can click the "Data schema/format description and download" link on that page to see a full description of the fields in this file.

A bigBed file can contain any number of extra fields beyond those in a standard BED file. The file you are looking at has 8 extra fields, one of which contains an HTML table that is displayed on the item details page when you click an item in the Genome Browser.
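A quick way to see how many columns bigBedToBed actually emits, before deciding which ones to keep, is to count the tab-separated fields on the first line of its output. The helper name `count_fields` below is hypothetical; this is a sketch that assumes the output is tab-delimited, as BED output normally is.

```shell
#!/bin/sh
# count_fields (hypothetical helper): print the number of tab-separated
# fields on the first line of standard input. Fields may contain spaces,
# so only tabs are treated as separators.
count_fields() {
  head -n 1 | awk -F '\t' '{ print NF }'
}

# Example usage (requires bigBedToBed and network access):
#   bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/mm10/encode4/encode4LongRnaTranscripts.bb stdout | count_fields
```

For this file one would expect 20: the 12 standard BED fields plus the 8 extra fields. `bigBedInfo -as` on the same file prints the autoSql schema, which lists the field names.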

To get a BED file containing just the 12 standard fields, you can use the Unix cut command to keep the first 12 fields and drop the last 8:

bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/mm10/encode4/encode4LongRnaTranscripts.bb stdout | cut -f1-12 > encode4LongRnaTranscripts.bed

If you also want to strip the extra details (such as "[2,2,3]") from the item names, you could add a sed command with a regex pattern that removes them.
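One possible version of that cleanup combines the cut command shown above with a `sed -E` substitution. It assumes the bracketed suffix always consists of digits and commas, as in `ENSMUSG00000051951[2,2,3]`; the helper name `clean_bed` is hypothetical. Because `cut -f1-12` runs first, and none of the other standard BED columns contain square brackets, the first match on each line is the one in the name column (column 4).

```shell
#!/bin/sh
# clean_bed (hypothetical helper): read bigBedToBed output on stdin and write
# a 12-column BED to stdout with the bracketed suffix removed from item names.
clean_bed() {
  # 1. Keep only the 12 standard BED fields (drops the 8 extra fields,
  #    including the HTML table).
  # 2. Remove the first "[digits,commas]" group on each line; after step 1
  #    the only such group is in the name column (column 4).
  cut -f1-12 | sed -E 's/\[[0-9,]+\]//'
}

# Example usage (requires bigBedToBed and network access):
#   bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/mm10/encode4/encode4LongRnaTranscripts.bb stdout \
#     | clean_bed > encode4LongRnaTranscripts.bed
```

Keep in mind that the bracketed numbers appear to distinguish different transcripts of the same gene (see the track description page), so after stripping them several rows may share the same name.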

Even if you don't change the item names, once you've removed the extra fields you should be able to run bedToGenePred and then genePredToGtf to create a GTF file.
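A minimal sketch of that two-step conversion, wrapped in a hypothetical `bed12_to_gtf` function. It assumes `bedToGenePred` and `genePredToGtf` from the UCSC command-line utilities are on your PATH. `genePredToGtf` is given `file` as its database argument so that it reads the genePred from a file rather than from a database table.

```shell
#!/bin/sh
# bed12_to_gtf (hypothetical helper): convert a 12-column BED file to GTF via
# an intermediate genePred file. Usage: bed12_to_gtf input.bed output.gtf
bed12_to_gtf() {
  if [ "$#" -ne 2 ]; then
    echo "usage: bed12_to_gtf input.bed output.gtf" >&2
    return 2
  fi
  # Temporary genePred file, removed before returning.
  b2g_tmp=$(mktemp "${TMPDIR:-/tmp}/bed12_to_gtf.XXXXXX") || return 1
  # BED12 -> genePred, then genePred -> GTF ("file" = read a file, not a DB table).
  bedToGenePred "$1" "$b2g_tmp" && genePredToGtf file "$b2g_tmp" "$2"
  b2g_status=$?
  rm -f "$b2g_tmp"
  return "$b2g_status"
}

# Example usage, after creating the 12-column BED file:
#   bed12_to_gtf encode4LongRnaTranscripts.bed encode4LongRnaTranscripts.gtf
```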

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.


---

Matthew Speir

UCSC Genome Browser, User Support

