Hi Kevin,
Thank you for your question about filtering on BED fields other than
score. This feature is currently not supported for BED or bigBed files,
although the feature is supported for the broadPeak file format, where
you can filter on the signalValue, pValue and qValue fields:
http://genome.ucsc.edu/FAQ/FAQformat.html#format13
Here are the relevant trackDb lines from the example broadPeak track
you noted, "Transcription Factor Binding Sites by ChIP-seq from
ENCODE/PSU". You can modify these example stanzas according to your
data:
#####Peak Tracks###########
track wgEncodePsuTfbsViewPeaks
shortLabel Peaks
view Peaks
visibility pack
#viewUi on
subTrack wgEncodePsuTfbs
signalFilter 0
signalFilterLimits 0:18241
pValueFilter 0
pValueFilterLimits 0:300
qValueFilter 0
qValueFilterLimits 0:300

track wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk
parent wgEncodePsuTfbsViewPeaks
shortLabel G1E-ER4 CTCF
longLabel G1E-ER4+E2 CTCF TFBS ChIP-seq Peaks from ENCODE/PSU
subGroups view=Peaks age=E0 factor=CTCF cellType=G1E_ER4_E2 control=INPUT sex=M strain=S129
type broadPeak
color 153,38,0
# subId=4793 dateSubmitted=2011-08-19
Here the wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk track is the actual broadPeak table, and it is a subtrack of wgEncodePsuTfbsViewPeaks (which itself is a subtrack of wgEncodePsuTfbs, whose trackDb stanza is not shown).
If you format your data as a broadPeak type track, with the fields you would like to filter on as any of the signalValue, pValue, or qValue fields, and you have a corresponding trackDb statement that defines the filters, your tracks should filter appropriately.
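As a rough sketch of that reformatting step, the awk wrapper below rearranges a tab-separated BED6+ file into the nine-column broadPeak layout. The function name to_broadpeak and its three column-number arguments are hypothetical and not part of any UCSC tool. It assumes the first six columns are standard BED fields and that the three chosen columns hold numeric values.

```shell
# Hypothetical helper (not a UCSC utility): reorder a tab-separated BED6+
# file so that three chosen extra columns land in the broadPeak
# signalValue, pValue and qValue positions (columns 7, 8 and 9).
# Usage: to_broadpeak input.bed SIGNAL_COL PVALUE_COL QVALUE_COL > out.broadPeak
to_broadpeak() {
  awk -v s="$2" -v p="$3" -v q="$4" 'BEGIN { FS = OFS = "\t" }
    # Skip track/browser header lines and comment lines, if present.
    /^(track|browser|#)/ { next }
    # Keep the six standard BED columns, then append the chosen columns.
    { print $1, $2, $3, $4, $5, $6, $s, $p, $q }' "$1"
}
```

For example, "to_broadpeak myPeaks.bed 9 7 8 > myPeaks.broadPeak" would place column 9 in the signalValue position and columns 7 and 8 in the pValue and qValue positions (myPeaks.bed is a placeholder name).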
Another option is to pre-filter your files. In our example, the
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk file could be split three
times to make three BEDs, like so:
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk signalValue 0 - 18241
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk pValue 0 - 300
wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk qValue 0 - 300
where the score field of each BED is the data that would be in the signalValue, pValue or qValue field.
You could then group these three tracks into a composite track to represent each of the 3 "filtering" options.
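One way this split might be scripted is sketched below. The function name split_broadpeak and the output file names are hypothetical. Because the BED score field is expected to be an integer from 0 to 1000, the sketch rounds each value and clamps it to that range. That choice is an assumption, and a different scaling may suit your data better.

```shell
# Hypothetical helper: write three BED6 files from one broadPeak file, each
# using a different broadPeak value column as the BED score.
# Usage: split_broadpeak input.broadPeak outputPrefix
split_broadpeak() {
  awk -v prefix="$2" 'BEGIN { FS = OFS = "\t" }
    # Round to the nearest integer and clamp to 0..1000 (an assumption made
    # here so that the result is a valid BED score).
    function clamp(v) {
      v = int(v + 0.5)
      if (v < 0) v = 0
      if (v > 1000) v = 1000
      return v
    }
    {
      print $1, $2, $3, $4, clamp($7), $6 > (prefix ".signalValue.bed")
      print $1, $2, $3, $4, clamp($8), $6 > (prefix ".pValue.bed")
      print $1, $2, $3, $4, clamp($9), $6 > (prefix ".qValue.bed")
    }' "$1"
}
```

Each output file keeps the original coordinates, name and strand, so the three files differ only in the score column.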
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser mirror site discussion list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome-mirror+unsubscribe@soe.ucsc.edu.
Hi Christopher,
Thank you for the very helpful response. It makes much more sense now!
Though I do have one remaining request. Can you give an example of loading custom BED files into the database? I am using hgLoadBed with the -as=/path/to/autoSql.as and -type=bed6+7 options, but the generated tables still use the standard BED column headers. The autoSql file works fine with bedToBigBed, but I'd like to have the tracks within the database rather than as *.bb files, as I assume this will load faster.
Help is appreciated.
Kevin
Dear Kevin,
Thank you for using the UCSC Genome Browser and your question about an example of loading custom bed files into the database with hgLoadBed.
If you wish to have the extra filtering feature that Christopher shared (signalFilterLimits and pValueFilter), it appears you will need to have your data formatted in the broadPeak format. Our source tree libraries include example .as and .sql files for different file types, including a broadPeak.sql that you can use:
wget "http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob_plain;f=src/hg/lib/encode/broadPeak.sql;hb=HEAD" -O broadPeak.sql
With the broadPeak.sql file you can use the -sqlTable and -renameSqlTable options to load BED6+3 data with a command like this:
hgLoadBed desiredDb desiredTableName bed6plus3.data -sqlTable=broadPeak.sql -renameSqlTable
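Before running a load like the one above, it may be worth checking that the input really has the broadPeak shape. The check below is only a sketch: check_broadpeak is a hypothetical function, not an hgLoadBed option, and the rules it applies (nine tab-separated fields, chromStart less than chromEnd, an integer score from 0 to 1000) are a simplified reading of the format.

```shell
# Hypothetical pre-load check (not part of hgLoadBed): report lines that do
# not look like broadPeak rows. Returns non-zero if any line fails.
# Usage: check_broadpeak bed6plus3.data
check_broadpeak() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
    # A broadPeak row has exactly nine tab-separated fields.
    NF != 9 { printf "line %d: expected 9 fields, found %d\n", NR, NF; bad = 1; next }
    # Coordinates must be non-negative integers with start below end.
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d: bad coordinates %s-%s\n", NR, $2, $3; bad = 1 }
    # The score column must be an integer from 0 to 1000.
    $5 !~ /^[0-9]+$/ || $5 + 0 > 1000 {
      printf "line %d: score %s is not an integer in 0..1000\n", NR, $5; bad = 1 }
    END { exit bad }' "$1"
}
```

A call such as "check_broadpeak bed6plus3.data && hgLoadBed ..." would then run the load only when every line passes.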
Here is some example input bed6plus3.data from the mouse mm9 assembly that uses the broadPeak format:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -ANe 'select * from wgEncodePsuTfbsG1eer4e2CtcfME0S129InputPk limit 2;' mm9 | cut -f 2-
chr1 4132626 4133056 . 180 . 14.39 13.762 0.0004
chr1 4322426 4323112 . 273 . 28.42 46.306 0.0002
A matching trackDb stanza for the loaded table, with the filters defined, could look like this:
track desiredTableName
shortLabel testBPeak
longLabel Test loading broadPeak track
type broadPeak
color 153,38,0
signalFilter 0
signalFilterLimits 0:18241
pValueFilter 0
pValueFilterLimits 0:300
qValueFilter 0
qValueFilterLimits 0:300
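The limit values in a stanza like this depend on your data. As one possible starting point, the sketch below prints the three FilterLimits lines using the maximum of each value column, rounded up. The function name filter_limits is hypothetical, and sizing the limits to the data maximum is an assumption rather than a documented recommendation.

```shell
# Hypothetical helper: suggest FilterLimits lines from a broadPeak file by
# taking the maximum of columns 7-9 (signalValue, pValue, qValue).
# Usage: filter_limits bed6plus3.data
filter_limits() {
  awk 'BEGIN { FS = "\t" }
    # Round a non-negative value up to the next integer.
    function ceil(v) { return (v == int(v)) ? v : int(v) + 1 }
    # Track the largest value seen in each of the three columns.
    { for (i = 7; i <= 9; i++) if ($i + 0 > max[i]) max[i] = $i + 0 }
    END {
      printf "signalFilterLimits 0:%d\n", ceil(max[7])
      printf "pValueFilterLimits 0:%d\n", ceil(max[8])
      printf "qValueFilterLimits 0:%d\n", ceil(max[9])
    }' "$1"
}
```

The printed lines can be pasted into the stanza in place of the example limits.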
Unfortunately, if you add more columns of data and move away from the defined broadPeak format, the code is not designed to interpret the additional columns. We have created a feature ticket to add support for filtering on extra bigBed fields, as our engineers agree such filtering options would be very useful. At this time it is not clear when the new feature will be released, but your email has been added to the ticket.
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute