Brian,
I tried the bigBed approach, but seem to have run into a 255 char limit on the size of the “details” column, which is a pretty small limit when dealing with HTML fragments.
Is there a way around this (perhaps a TEXT column type :) ), or do I just need to trim down the data I’m presenting? That seems too draconian; in particular, I’d have to drop the flanking sequence on the probe.
Regards,
Curtis
[curtish@cheaha immuno_beadchip]$ bedToBigBed -tab -as=immuno_beadchip_11419691_b.hg19.detail.as -type=bed8+2 -extraIndex=illuminaID,probeDetailsHTML immuno_beadchip_11419691_b.hg19merge.sort.detail.bed hg19.chrom.sizes immuno_beadchip_11419691_b.hg19merge.detail.bb
pass1 - making usageList (25 chroms): 2480 millis
Error line 1 of immuno_beadchip_11419691_b.hg19merge.sort.detail.bed: expecting length (1453) of string (<table border=1><tr><td>IlmnID</td><td>imm_1_898835-1_T_F_1767696649</td></tr><tr><td>Name</td><td>imm_1_898835</td></tr><tr><td>IlmnStrand</td><td>TOP</td></tr><tr><td>SNP</td><td>[A/C]</td></tr><tr><td>AddressA_ID</td><td>0038649493</td></tr><tr><td>AlleleA_ProbeSeq</td><td>CACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC</td></tr><tr><td>AddressB_ID</td><td></td></tr><tr><td>AlleleB_ProbeSeq</td><td></td></tr><tr><td>GenomeBuild</td><td>36</td></tr><tr><td>Chr</td><td>1</td></tr><tr><td>MapInfo</td><td>898835</td></tr><tr><td>Ploidy</td><td>diploid</td></tr><tr><td>Species</td><td>Homo sapiens</td></tr><tr><td>Source</td><td>immunochip</td></tr><tr><td>SourceVersion</td><td>1</td></tr><tr><td>SourceStrand</td><td>TOP</td></tr><tr><td>SourceSeq</td><td>TGCCCCTGACCACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC[A/C]CCCGCCACCTCCCACCGCAGGGTCACAGATGTCCGGGGCCTGGAGGAGGTCAGGCCCCTG</td></tr><tr><td>TopGenomicSeq</td><td>TGCCCCTGACCACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC[A/C]CCCGCCACCTCCCACCGCAGGGTCACAGATGTCCGGGGCCTGGAGGAGGTCAGGCCCCTG</td></tr><tr><td>BeadSetID</td><td>285</td></tr><tr><td>Name</td><td>1kg_1_92921299</td></tr><tr><td>Chr</td><td>1</td></tr><tr><td>Coordinate</td><td>92921299</td></tr><tr><td>GeneSymbol</td><td>EVI5</td></tr><tr><td>GeneLocation</td><td>INTRON</td></tr><tr><td>ExonLocation</td><td>NA</td></tr><tr><td>CodingStatus</td><td>NA</td></tr><tr><td>AminoAcid1|AminoAcid2</td><td>NA</td></tr></table>) not to exceed 255 in field probeDetailsHTML
[curtish@cheaha immuno_beadchip]$
[curtish@cheaha immuno_beadchip]$ cat immuno_beadchip_11419691_b*as
table iChiphg19
"Illumina Human Immno BeadChip probes, mapped to hg19 with Illumina IDs and HTML details"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position of feature on chromosome"
uint chromEnd; "End position of feature on chromosome"
string name; "Name of gene"
uint score; "Score"
char[1] strand; "+ or - for strand"
uint thickStart; "Coding region start"
uint thickEnd; "Coding region end"
string illuminaID; "IlluminaID"
string probeDetailsHTML; "Probe details, formated as an HTML table"
)
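A quick way to see which columns trip the 255-character limit before running bedToBigBed is to scan the tab-separated BED file for over-long fields. The following is only a minimal sketch: the function name `overlong_columns` is hypothetical, and the limit of 255 is taken from the error message above rather than from the bedToBigBed source. Later in this thread the affected field is switched from `string` to `lstring` in the .as file; the sketch only identifies which columns would need that change.

```python
STRING_LIMIT = 255  # limit quoted in the bedToBigBed error message above


def overlong_columns(bed_text, limit=STRING_LIMIT):
    """Return {zero_based_column: longest_length} for columns exceeding limit."""
    worst = {}
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        for idx, field in enumerate(line.split("\t")):
            # remember the longest offending value seen in each column
            if len(field) > limit and len(field) > worst.get(idx, 0):
                worst[idx] = len(field)
    return worst


if __name__ == "__main__":
    import sys

    # usage: python check_lengths.py some.detail.bed
    with open(sys.argv[1]) as handle:
        for col, length in sorted(overlong_columns(handle.read()).items()):
            print(f"column {col}: longest value is {length} characters")
```

Column numbers are zero-based, so with the .as file above the probeDetailsHTML field is column 9.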
From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 5:00 PM
To: Curtis Hendrickson (Campus)
Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)
Hi Curtis,
Yes, the bedToBigBed utility, which I forgot to mention is obtainable from here, http://hgdownload.cse.ucsc.edu/admin/exe/, needs the .as for any non-standard BED formats.
After you download and make it executable, you can run bedToBigBed alone to see the usage statements, note the "-as=fields.as" option:
$bedToBigBed
bedToBigBed v. 2.6 - Convert bed file to bigBed. (BigBed version: 4)
usage:
bedToBigBed in.bed chrom.sizes out.bb
Where in.bed is in one of the ascii bed formats, but not including track lines
and chrom.sizes is two column: <chromosome name> <size in bases>
and out.bb is the output indexed big bed file.
Use the script: fetchChromSizes to obtain the actual chrom.sizes information
from UCSC, please do not make up a chrom sizes from your own information.
The in.bed file must be sorted by chromosome,start,
to sort a bed file, use the unix sort command:
sort -k1,1 -k2,2n unsorted.bed > sorted.bed
options:
-type=bedN[+[P]] :
N is between 3 and 15,
optional (+) if extra "bedPlus" fields,
optional P specifies the number of extra fields. Not required, but preferred.
Examples: -type=bed6 or -type=bed6+ or -type=bed6+3
(see http://genome.ucsc.edu/FAQ/FAQformat.html#format1)
-as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition
of each field in a row in AutoSql format here.
-blockSize=N - Number of items to bundle in r-tree. Default 256
-itemsPerSlot=N - Number of data points bundled at lowest level. Default 512
-unc - If set, do not use compression.
-tab - If set, expect fields to be tab separated, normally
expects white space separator.
-extraIndex=fieldList - If set, make an index on each field in a comma separated list
extraIndex=name and extraIndex=name,id are commonly used.
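The sorting requirement in the usage text can be checked before conversion. The sketch below is an assumption about what "sorted by chromosome,start" needs in practice: it verifies that each chromosome's lines are contiguous and that start positions never decrease within a chromosome. It does not check the order of chromosome names, and the function name `first_unsorted_line` is hypothetical.

```python
def first_unsorted_line(bed_lines):
    """Return the 1-based number of the first out-of-order line, or None."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for lineno, line in enumerate(bed_lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                return lineno  # chromosome reappears after another one
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        elif start < prev_start:
            return lineno  # start position went backwards
        else:
            prev_start = start
    return None


if __name__ == "__main__":
    import sys

    # usage: python check_sorted.py sorted.bed
    with open(sys.argv[1]) as handle:
        bad = first_unsorted_line(handle)
    print("looks sorted" if bad is None else f"first unsorted line: {bad}")
```

If the check fails, the `sort -k1,1 -k2,2n` command from the usage text is the fix.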
Also, here's a link to the hg19.chrom.sizes: http://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes
All the best,
Brian
On Tue, Mar 11, 2014 at 2:46 PM, Curtis Hendrickson (Campus) <cur...@uab.edu> wrote:
In other words, the .as file is required, even if you’re “coloring inside the lines” of bedDetail?
Thanks
Curtis
From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 4:42 PM
To: Curtis Hendrickson (Campus)
Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)
Hi Curtis,
You can just send the following customTrack lines to host the created bigBed versus adding the three additional hub text files:
browser position chr7:127471196-127500000
track type=bigBed visibility=3 bigDataUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hg18/bedDetail.bb
You should be able to use any combination needed. If you look under "Example 3" in the page referenced, you'll see these lines:
BigBed files can store extra fields in addition to the predefined BED fields. If you add extra fields to your bigBed file, you must include a .as (AutoSQL) format file describing the fields. See this paper for information on AutoSQL.
All the best,
Brian
On Tue, Mar 11, 2014 at 2:28 PM, Curtis Hendrickson (Campus) <cur...@uab.edu> wrote:
Brian,
Great. Thanks for the work around.
Do I need to go the distance to create a TrackHub, or would it be sufficient to just host the bigbed (.bb) file on my public HTTP site?
Also, when making bedDetail into .bb, must type be -type=bed9+2 or can I put essentially any appropriate number in place of the 9? Mine are bed8+2, essentially.
Regards,
Curtis
From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 3:45 PM
To: Curtis Hendrickson (Campus)
Cc: genom...@soe.ucsc.edu
Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)
Dear Curtis,
There is a solution that uses the hub feature that encompasses many additional benefits.
Hubs are remotely hosted binary files, plus three text files, that the browser accesses quickly; they also let you ensure the longevity of your data compared to custom tracks. Read more about hubs here in this archived mailing list question: https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!searchin/genome/trackDb.txt$20genomes.txt$20brianlee/genome/mUH5GtWXWaw/AVrt7hGy7dcJ
Hubs can use bigBed files, which can be created from bedDetail BED files. For example, a file that contained entries such as the following:
chr7 54028 73584 uc003sii.2 0 - 54028 54028 255,0,0 . AL137655
...
Can be displayed in the browser through a bigBed hub so that it would appear as:
Name of gene: uc003sii.2
Score: 0
+ or - for strand: -
Coding region start: 54028
Coding region end: 54028
Green on + strand, Red on - strand: 255,0,0
Gene Symbol: .
SWISS-PROT protein Accession number: AL137655
As defined by a custom ".as file" as seen here for this example: http://genome.ucsc.edu/goldenPath/help/examples/bedExample2.as
Here is an example hub URL you can paste into our Hub page, http://genome.ucsc.edu/cgi-bin/hgHubConnect, and use as a working example to copy if you wish to pursue this route: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hub.txt
Please note that it is displaying data on the hg18 assembly and only on chr7. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hubUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hub.txt&position=chr7%3A1-158821424
To obtain the bigBed from the bedDetail BED, you would follow the Example 3 steps found here: http://genome.ucsc.edu/goldenPath/help/bigBed.html
bedToBigBed -as=bedExample2.as -type=bed9+2 -extraIndex=name,geneSymbol bedExample2.bed hg18.chrom.sizes myBigBed2.bb
You could use the provided trackDb.txt to display the resulting file if renamed to bedDetail.bb:
track bigBed1
bigDataUrl bedDetail.bb
shortLabel bigBed bedDetail example
longLabel This is a bigBed created from a bedDetail file using the .as file option outlined here:http://genome.ucsc.edu/goldenPath/help/bigBed.html
type bigBed
visibility full
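The trackDb.txt stanza above is one of the three hub text files; the other two are hub.txt and genomes.txt. The sketch below generates all three. It assumes the usual layout of hub.txt and genomes.txt at the top level with trackDb.txt in a per-assembly subdirectory, which matches the example URLs in this thread. The hub name, labels, and email address passed in are placeholders, and the function names are hypothetical.

```python
from pathlib import Path


def hub_files(hub_name, genome, email, track_db_stanza):
    """Return {relative_path: text} for the three hub text files."""
    hub_txt = (
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} bigBed hub\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": track_db_stanza,
    }


def write_hub(root, files):
    """Write the generated files under root, creating subdirectories."""
    for rel, text in files.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

The .bb file itself would then be copied next to trackDb.txt, since the stanza's bigDataUrl is a relative path.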
There are many advantages to pursuing the option of remotely hosting your files through a hub and it is highly recommended.
All the best,
Brian Lee
UCSC Bioinformatics Group
On Tue, Mar 11, 2014 at 12:57 PM, Brian Lee <bria...@soe.ucsc.edu> wrote:
Dear Curtis,
Thank you for using the UCSC Genome Browser and bringing our attention to this bug in our bedDetail custom tracks.
We have filed a ticket to correct this error, but are unable to predict an exact date for when the fix will be available. Please know that we will contact you as soon as there is a resolution.
Thank you again for using and helping improve the UCSC Genome Browser by reporting this bug. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
[… trimmed …]
--
Brian,
Looks good on the test server.
How can I make sure I’m on the appropriate announcement list?
Also, will I need to download a new version of bedToBigBed? After changing “string” => “lstring” in my .as file, the current bedToBigBed (2014-03-05 version) gives me
# compile details.bed + .as --> big bed
bedToBigBed -tab -as=immuno_beadchip_11419691_b.hg19.detail.as -type=bed8+2 -extraIndex=illuminaID,probeDetailsHTML out/immuno_beadchip_11419691_b.hg19merge.sort.detail.bed hg19.chrom.sizes out/immuno_beadchip_11419691_b.hg19merge.sort.detail.bb
Sorry for now can only index string fields.
Hi Curtis,
Thank you for trying the bigBed approach. The bedDetail custom track changes will be out after next Tuesday. An announcement list does not exist for software updates; they regularly occur every three weeks. However, when there are big new features we do send out announcements to genome-...@soe.ucsc.edu, and you can subscribe via a link from the bottom of this page: http://genome.ucsc.edu/contacts.html If you are curious to track the software changes, you can view this page: http://genecats.cse.ucsc.edu/builds/versions.html
There will not be a new bedToBigBed. The -extraIndex option is only for strings used as identifiers, so the error you are getting is correct when you try to build an index on the lstring. However, there is an approach that will work, but first note that searching on bigBeds will only work via the Track Hub approach with the necessary trackDb.txt definitions, not as custom tracks.
I set up an example hub you can use as a model: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/searchHubExample/hub.txt
Here are the steps you would likely take to build a bigBed searchable on the illuminaIDs and then link to your text about each.
1) Run your bedToBigBed with only the one index -extraIndex=illuminaID (leaving out the long probeDetailsHTML)
2) Place that bb file in hub and in the trackDb.txt add the line "searchIndex illuminaID".
You can see where I put "searchIndex name" in my example when I created an extraIndex on the name field: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/searchHubExample/hg19/trackDb.txt
Before proceeding, be sure this is working properly: you should now be able to search your hub's bigBed with your illuminaIDs.
3) Now you can create a new text file that will have your illuminaID followed by text, in my example I call it infoSJs.txt. You might have something like illumIDs.txt looking like the following:
imm_1_898835-1_T_F_1767696649 Name imm_1_898835 IlmnStrand TOP SNP [A/C] AddressA_ID 0038649493 ....
id2 Text.......
id3 Text.....
(I suggest removing <html> tags by doing something like "cat original.illumIDs.txt | sed 's/<[^>]\+>/ /g' >illumIDs.txt")
4). With this text file you can run the ixIxx program, ixIxx illumIDs.txt illum.ix illum.ixx
5). To your bigBed's trackDb.txt stanza you would then add "searchTrix illum.ix"
6). Then, when browsing the hub you should be able to search for items like AddressA_ID "0038649493" and it will bring up the position of the related identifier, imm_1_898835-1_T_F_1767696649, as defined in the illumIDs.txt file.
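Step 3 above can also be done directly from the detail BED file instead of with sed. The sketch below strips HTML tags from the details column and emits one line per item, with the identifier followed by the remaining words, which is the layout described for illumIDs.txt. The column positions (8 for illuminaID, 9 for probeDetailsHTML, zero-based) are taken from the .as file earlier in this thread; the function names are hypothetical, and the output file would still be passed to ixIxx as in step 4.

```python
import re

TAG = re.compile(r"<[^>]+>")  # same pattern as the sed expression in step 3


def ixixx_line(illumina_id, details_html):
    """Return 'id word word ...' with HTML tags replaced by spaces."""
    words = TAG.sub(" ", details_html).split()
    return " ".join([illumina_id] + words)


def ixixx_lines(bed_lines, id_col=8, html_col=9):
    """Yield one ixIxx input line per tab-separated BED line."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > max(id_col, html_col):
            yield ixixx_line(fields[id_col], fields[html_col])


if __name__ == "__main__":
    import sys

    # usage: python make_ix_input.py some.detail.bed > illumIDs.txt
    with open(sys.argv[1]) as handle:
        for out_line in ixixx_lines(handle):
            print(out_line)
```

Splitting on whitespace after removing tags also collapses the runs of spaces that the sed version leaves behind.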
I put some notes in the hg19/trackDb.txt example above that you can read and follow to recreate that example if you want to try that first.
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group
Brian,
Thanks for the thorough response.
Apologies if I tried to searchIndex the HTML – you’re correct that I only want to display it.
Indexing of the IlluminaID’s is nice, but secondary to displaying the detailed information about the probe in the details page.
I’ll go through the details of your example next week.