Filter Bed tracks on additional columns like 'score'

Kevin Gillinder
Sep 12, 2016, 11:18:53 AM
to genome...@soe.ucsc.edu
Hi,

I’d like to add custom BED tracks with extra fields (BED 6+7) to our browser mirror, and be able to filter on those fields the same way you can filter on score. Sadly, I’ve looked everywhere but can’t seem to find examples of *how* to do this. The best example I can find of what I’m trying to achieve is the “Transcription Factor Binding Sites by ChIP-seq from ENCODE/PSU” track, where the “Peaks” view has options to filter by:
Minimum Signal value: (0 to 18241)
Minimum P-Value (-log10): (0 to 300)
Minimum Q-Value (-log10): (0 to 300)

I’ve tried using bigBed files. Am I right in suspecting they won’t work for this feature?

I’ve also tried loading standard BED tracks into a MySQL table, which works, but the extra fields defined in my autoSql file do not appear as column names, and the only filter option offered is on score. So something isn’t working right...

Any help is appreciated.

Kevin

Christopher Lee
Sep 12, 2016, 7:17:10 PM
to Kevin Gillinder, genome...@soe.ucsc.edu

Hi Kevin,

Thank you for your question about filtering on BED fields other than score. This feature is currently not supported for BED or bigBed files, although the feature is supported for the broadPeak file format, where you can filter on the signalValue, pValue and qValue fields:
http://genome.ucsc.edu/FAQ/FAQformat.html#format13

Here are the relevant trackDb lines from the example broadPeak track you noted, "Transcription Factor Binding Sites by ChIP-seq from ENCODE/PSU". You can modify these example stanzas to suit your data:

#####Peak Tracks###########
track wgEncodePsuTfbsViewPeaks
shortLabel Peaks
view Peaks
visibility pack
#viewUi on
subTrack wgEncodePsuTfbs
signalFilter 0
signalFilterLimits 0:18241
pValueFilter 0
pValueFilterLimits 0:300
qValueFilter 0
qValueFilterLimits 0:300

track wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk
parent wgEncodePsuTfbsViewPeaks
shortLabel G1E-ER4 CTCF
longLabel G1E-ER4+E2 CTCF TFBS ChIP-seq Peaks from ENCODE/PSU
subGroups view=Peaks age=E0 factor=CTCF cellType=G1E_ER4_E2 control=INPUT sex=M strain=S129
type broadPeak
color 153,38,0
# subId=4793 dateSubmitted=2011-08-19

Here the wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk track is the actual broadPeak table, and it is a subtrack of wgEncodePsuTfbsViewPeaks (which itself is a subtrack of wgEncodePsuTfbs, whose trackDb stanza is not shown).

If you format your data as a broadPeak track, putting the fields you would like to filter on into the signalValue, pValue, or qValue columns, and add a corresponding trackDb stanza that defines the filters, your tracks should filter appropriately.
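As a minimal sketch of that reformatting step (not a UCSC tool), the Python below turns tab-separated BED 6+N rows into broadPeak rows. It keeps the first six columns and copies three chosen extra columns into signalValue, pValue and qValue. The function name and the column indices are hypothetical, so pick the indices that match your own BED 6+7 layout. Extra columns that are not chosen are dropped, because broadPeak has nowhere to put them.

```python
from typing import Iterable, Iterator, Sequence


def bed6plus_to_broadpeak(lines: Iterable[str],
                          extra_cols: Sequence[int]) -> Iterator[str]:
    """Yield broadPeak (BED6+3) lines from tab-separated BED6+N lines.

    extra_cols holds three 0-based column indices, in the order
    signalValue, pValue, qValue. Columns that are not chosen are dropped.
    """
    if len(extra_cols) != 3:
        raise ValueError("need exactly three columns: signalValue, pValue, qValue")
    for line in lines:
        # Skip blank lines, comments and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}: {line!r}")
        picked = [fields[i] for i in extra_cols]
        for value in picked:
            float(value)  # raise ValueError early if a chosen column is not numeric
        yield "\t".join(fields[:6] + picked)
```

For example, passing `(6, 8, 9)` as `extra_cols` would move the 7th, 9th and 10th columns of each row into the three broadPeak value fields.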

Another option is to pre-process your files. In our example, the wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk file could be processed three times to make three BEDs, like so:
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk signalValue 0 - 18241
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk pValue 0 - 300
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk qValue 0 - 300

where the score field of each BED is the data that would be in the signalValue, pValue or qValue field.

You could then group these three tracks into a composite track to represent each of the 3 "filtering" options.
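Here is a minimal Python sketch of that splitting step, assuming whitespace-separated broadPeak input. Standard BED defines score as an integer from 0 to 1000, so a value such as a signalValue of 18241 does not fit unchanged. For that reason the sketch multiplies each value by an optional per-column scale factor, rounds it, and clamps it to 0..1000. The scaling, the function names and the output file naming are all assumptions to adapt, not part of any UCSC tool.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# 0-based positions of the three broadPeak (BED6+3) value columns.
BROADPEAK_VALUES = {"signalValue": 6, "pValue": 7, "qValue": 8}


def to_bed_score(value: float, scale: float = 1.0) -> int:
    """Scale, round to the nearest integer and clamp into the BED score range 0..1000."""
    return max(0, min(1000, round(value * scale)))


def split_broadpeak(lines: Iterable[str],
                    scales: Optional[Dict[str, float]] = None) -> Dict[str, List[str]]:
    """Return one list of BED6 lines per broadPeak value column.

    Each output line keeps chrom, chromStart, chromEnd, name and strand,
    and replaces score with the scaled, rounded, clamped column value.
    """
    scales = scales or {}
    out: Dict[str, List[str]] = {name: [] for name in BROADPEAK_VALUES}
    for line in lines:
        f = line.split()
        if len(f) != 9:
            continue  # skip anything that is not a nine-column row
        for name, col in BROADPEAK_VALUES.items():
            score = to_bed_score(float(f[col]), scales.get(name, 1.0))
            out[name].append("\t".join(f[:4] + [str(score), f[5]]))
    return out


def write_beds(split: Dict[str, List[str]], out_dir: Path, stem: str) -> List[Path]:
    """Write <stem>.<column>.bed files (hypothetical naming) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in split.items():
        path = out_dir / f"{stem}.{name}.bed"
        path.write_text("".join(row + "\n" for row in rows))
        paths.append(path)
    return paths
```

Each of the three resulting BED files would then be loaded as its own table and listed as a subtrack of the composite.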

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser mirror site discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome-mirror+unsubscribe@soe.ucsc.edu.


Kevin Gillinder
Sep 13, 2016, 12:08:46 PM
to Christopher Lee, genome...@soe.ucsc.edu

Hi Christopher,

Thank you for the very helpful response. It makes much more sense now!

I do have one remaining request, though. Can you give an example of loading custom BED files into the database? I am using hgLoadBed with the -as=/path/to/autoSql.as and -type=bed6+7 options, but the generated tables still use the standard BED column names. The autoSql file works fine with bedToBigBed, but I’d like to have the tracks in the database rather than as *.bb files, as I assume this will load faster?

Help is appreciated.

Kevin

Brian Lee
Sep 16, 2016, 2:21:16 PM
to Kevin Gillinder, Christopher Lee, genome...@soe.ucsc.edu

Dear Kevin,

Thank you for using the UCSC Genome Browser and your question about an example of loading custom bed files into the database with hgLoadBed.

If you wish to have the extra filtering feature that Christopher shared (signalFilterLimits and pValueFilter), it appears you will need to have your data formatted in the broadPeak format. In our source tree libraries there are some example .as and .sql files for different file types, including broadPeak.sql, that you can use:

wget "http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob_plain;f=src/hg/lib/encode/broadPeak.sql;hb=HEAD" -O broadPeak.sql

With the broadPeak.sql file you can use the -sqlTable and -renameSqlTable options to load BED6+3 data with a command like this:

hgLoadBed desiredDb desiredTableName bed6plus3.data -sqlTable=broadPeak.sql -renameSqlTable
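A quick check of the input before running that load can save a round trip. The Python below is a sketch of a hypothetical helper, not part of the kent source tree. It verifies that each data line has the nine broadPeak columns (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue). It also checks that the coordinates and score are integers, that the last three fields are numeric, and that strand is +, - or the "." placeholder. By default it splits on any whitespace. That default is an assumption, so pass a different separator if your name column contains spaces.

```python
from typing import Iterable, List, Optional

VALID_STRANDS = {"+", "-", "."}


def check_broadpeak(lines: Iterable[str], sep: Optional[str] = None) -> List[str]:
    """Return a list of problems found in broadPeak (BED6+3) data lines.

    Fields are split on any whitespace by default; pass sep to split on a
    specific delimiter such as a tab.
    """
    problems: List[str] = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored
        f = line.rstrip("\n").split(sep)
        if len(f) != 9:
            problems.append(f"line {n}: expected 9 fields, found {len(f)}")
            continue
        try:
            start, end = int(f[1]), int(f[2])
            int(f[4])  # score
            for value in f[6:9]:  # signalValue, pValue, qValue
                float(value)
        except ValueError as exc:
            problems.append(f"line {n}: non-numeric value ({exc})")
            continue
        if start < 0:
            problems.append(f"line {n}: negative chromStart {start}")
        elif start > end:
            problems.append(f"line {n}: chromStart {start} is after chromEnd {end}")
        if f[5] not in VALID_STRANDS:
            problems.append(f"line {n}: strand must be +, - or ., got {f[5]!r}")
    return problems
```

An empty list means every line passed these checks. hgLoadBed may still apply its own validation.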

Here is some example input bed6plus3.data from the mouse mm9 assembly that uses the broadPeak format:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -ANe 'select * from wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk limit 2;' mm9 | cut -f 2-
chr1 4132626 4133056 . 180 . 14.39 13.762 0.0004
chr1 4322426 4323112 . 273 . 28.42 46.306 0.0002
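To make the filter behavior concrete, here is a small Python sketch that parses rows like the two above and applies minimum thresholds. This reflects how we read the "Minimum Signal value / P-Value / Q-Value" labels in the track settings. Two points are assumptions rather than documented browser behavior. First, the sketch treats thresholds as inclusive. Second, it treats a 10-field row as one that still carries the leading bin column, which `cut -f 2-` removes.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float
    q_value: float


def parse_peak(line: str) -> Peak:
    """Parse one whitespace-separated broadPeak row."""
    f = line.split()
    if len(f) == 10:
        f = f[1:]  # assume a leading bin column, as in a dump without `cut -f 2-`
    if len(f) != 9:
        raise ValueError(f"expected 9 broadPeak fields, got {len(f)}")
    return Peak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                float(f[6]), float(f[7]), float(f[8]))


def passes(peak: Peak, min_signal: float = 0.0,
           min_p: float = 0.0, min_q: float = 0.0) -> bool:
    """True if the peak meets every minimum threshold (inclusive)."""
    return (peak.signal_value >= min_signal
            and peak.p_value >= min_p
            and peak.q_value >= min_q)


rows = [
    "chr1 4132626 4133056 . 180 . 14.39 13.762 0.0004",
    "chr1 4322426 4323112 . 273 . 28.42 46.306 0.0002",
]
peaks = [parse_peak(r) for r in rows]
print([p.start for p in peaks if passes(p, min_signal=20)])  # [4322426]
print([p.start for p in peaks if passes(p, min_p=14)])       # [4322426]
```

With every threshold left at 0, both rows pass.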

Here are some example trackDb edits to enable this track in a database:

track desiredTableName
shortLabel testBPeak
longLabel Test loading broadPeak track
type broadPeak
color 153,38,0
signalFilter 0
signalFilterLimits 0:18241
pValueFilter 0
pValueFilterLimits 0:300
qValueFilter 0
qValueFilterLimits 0:300
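The limits in this stanza (0:18241 for signal, 0:300 for the p- and q-values) set the filter ranges shown in the track settings. You may want to derive such lines from your own table. A hypothetical Python helper could use 0 as the lower limit and each column's maximum, rounded up, as the upper limit. That choice is an assumption: you may prefer fixed ranges like the ones above, or a different lower limit if a column uses negative placeholder values.

```python
import math
from typing import Iterable, List

# trackDb filter setting and the 0-based broadPeak column it applies to.
FILTER_COLUMNS = (("signalFilter", 6), ("pValueFilter", 7), ("qValueFilter", 8))


def filter_settings(lines: Iterable[str]) -> List[str]:
    """Build trackDb filter lines whose upper limit is the column maximum, rounded up."""
    rows = [line.split() for line in lines if line.strip()]
    settings: List[str] = []
    for setting, col in FILTER_COLUMNS:
        top = max((float(r[col]) for r in rows), default=0.0)
        settings.append(f"{setting} 0")
        settings.append(f"{setting}Limits 0:{math.ceil(top)}")
    return settings
```

Applied to the two example rows from the mm9 query, this would give signalFilterLimits 0:29, pValueFilterLimits 0:47 and qValueFilterLimits 0:1.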

Unfortunately, if you add more columns of data and move away from the defined broadPeak format, the code is not designed to interpret the additional columns. We have created a feature ticket to add support for filtering on extra bigBed fields, as our engineers agree such filtering options would be very useful. At this time it is not clear when this feature will be released, but your email has been added to the ticket.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute