FW: [genome-www] display of bedDetail description is broken (truncated to 1 character)


Curtis Hendrickson (Campus)

Mar 18, 2014, 11:50:39 AM
to gen...@soe.ucsc.edu

Brian,

 

I tried the bigBed approach, but seem to have run into a 255 char limit on the size of the “details” column, which is a pretty small limit when dealing with HTML fragments.

Is there a way around this (perhaps a TEXT column type :-) ), or do I just need to trim down the data I’m presenting? That seems too draconian; in particular, I’d have to drop the flanking sequence on the probe.

 

Regards,

Curtis

 

 

[curtish@cheaha immuno_beadchip]$ bedToBigBed -tab -as=immuno_beadchip_11419691_b.hg19.detail.as -type=bed8+2 -extraIndex=illuminaID,probeDetailsHTML immuno_beadchip_11419691_b.hg19merge.sort.detail.bed hg19.chrom.sizes immuno_beadchip_11419691_b.hg19merge.detail.bb

pass1 - making usageList (25 chroms): 2480 millis

Error line 1 of immuno_beadchip_11419691_b.hg19merge.sort.detail.bed: expecting length (1453) of string (<table border=1><tr><td>IlmnID</td><td>imm_1_898835-1_T_F_1767696649</td></tr><tr><td>Name</td><td>imm_1_898835</td></tr><tr><td>IlmnStrand</td><td>TOP</td></tr><tr><td>SNP</td><td>[A/C]</td></tr><tr><td>AddressA_ID</td><td>0038649493</td></tr><tr><td>AlleleA_ProbeSeq</td><td>CACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC</td></tr><tr><td>AddressB_ID</td><td></td></tr><tr><td>AlleleB_ProbeSeq</td><td></td></tr><tr><td>GenomeBuild</td><td>36</td></tr><tr><td>Chr</td><td>1</td></tr><tr><td>MapInfo</td><td>898835</td></tr><tr><td>Ploidy</td><td>diploid</td></tr><tr><td>Species</td><td>Homo sapiens</td></tr><tr><td>Source</td><td>immunochip</td></tr><tr><td>SourceVersion</td><td>1</td></tr><tr><td>SourceStrand</td><td>TOP</td></tr><tr><td>SourceSeq</td><td>TGCCCCTGACCACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC[A/C]CCCGCCACCTCCCACCGCAGGGTCACAGATGTCCGGGGCCTGGAGGAGGTCAGGCCCCTG</td></tr><tr><td>TopGenomicSeq</td><td>TGCCCCTGACCACACTTCGGAAACATCACACTCGCCCCTCTATGCCGACCCCTACACACC[A/C]CCCGCCACCTCCCACCGCAGGGTCACAGATGTCCGGGGCCTGGAGGAGGTCAGGCCCCTG</td></tr><tr><td>BeadSetID</td><td>285</td></tr><tr><td>Name</td><td>1kg_1_92921299</td></tr><tr><td>Chr</td><td>1</td></tr><tr><td>Coordinate</td><td>92921299</td></tr><tr><td>GeneSymbol</td><td>EVI5</td></tr><tr><td>GeneLocation</td><td>INTRON</td></tr><tr><td>ExonLocation</td><td>NA</td></tr><tr><td>CodingStatus</td><td>NA</td></tr><tr><td>AminoAcid1|AminoAcid2</td><td>NA</td></tr></table>) not to exceed 255 in field probeDetailsHTML

[curtish@cheaha immuno_beadchip]$

[curtish@cheaha immuno_beadchip]$ cat immuno_beadchip_11419691_b*as

table iChiphg19

"Illumina Human Immno BeadChip probes, mapped to hg19 with Illumina IDs and HTML details"

(

string  chrom;          "Reference sequence chromosome or scaffold"

uint    chromStart;     "Start position of feature on chromosome"

uint    chromEnd;       "End position of feature on chromosome"

string  name;           "Name of gene"

uint    score;          "Score"

char[1] strand;         "+ or - for strand"

uint    thickStart;     "Coding region start"

uint    thickEnd;       "Coding region end"

string  illuminaID;     "IlluminaID"

string  probeDetailsHTML;               "Probe details, formated as an HTML table"

)
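
To see how far over the 255-character limit the data runs before converting, one can measure the longest value in the details column. The following is a minimal sketch, assuming a tab-separated BED file with the HTML in column 10 (as in the bed8+2 layout above). The helper name max_field_len is hypothetical and not part of the UCSC tools.

```shell
# Print the length of the longest value found in one column of a
# tab-separated file.
#   $1: path to the file
#   $2: 1-based column number
max_field_len() {
  awk -F'\t' -v col="$2" '
    { n = length($col); if (n > max) max = n }
    END { print max + 0 }
  ' "$1"
}

# Example call (file name taken from the command above):
# max_field_len immuno_beadchip_11419691_b.hg19merge.sort.detail.bed 10
```

A result above 255 for a column declared as string would be expected to trigger the error shown above.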

 

From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 5:00 PM


To: Curtis Hendrickson (Campus)
Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)

 

Hi Curtis,

Yes, the bedToBigBed utility, which I forgot to mention is obtainable from here, http://hgdownload.cse.ucsc.edu/admin/exe/, needs the .as for any non-standard BED formats.

After you download it and make it executable, you can run bedToBigBed with no arguments to see the usage statement; note the "-as=fields.as" option:

$ bedToBigBed
bedToBigBed v. 2.6 - Convert bed file to bigBed. (BigBed version: 4)
usage:
   bedToBigBed in.bed chrom.sizes out.bb
Where in.bed is in one of the ascii bed formats, but not including track lines
and chrom.sizes is two column: <chromosome name> <size in bases>
and out.bb is the output indexed big bed file.
Use the script: fetchChromSizes to obtain the actual chrom.sizes information
from UCSC, please do not make up a chrom sizes from your own information.
The in.bed file must be sorted by chromosome,start,
  to sort a bed file, use the unix sort command:
     sort -k1,1 -k2,2n unsorted.bed > sorted.bed
 
options:
   -type=bedN[+[P]] : 
                      N is between 3 and 15, 
                      optional (+) if extra "bedPlus" fields, 
                      optional P specifies the number of extra fields. Not required, but preferred.
                      Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 
                      (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1)
   -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition
                   of each field in a row in AutoSql format here.
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 512
   -unc - If set, do not use compression.
   -tab - If set, expect fields to be tab separated, normally
           expects white space separator.
   -extraIndex=fieldList - If set, make an index on each field in a comma separated list
           extraIndex=name and extraIndex=name,id are commonly used.

Also, here's a link to the hg19.chrom.sizes: http://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes

All the best,
Brian

 

On Tue, Mar 11, 2014 at 2:46 PM, Curtis Hendrickson (Campus) <cur...@uab.edu> wrote:

In other words, the .as file is required, even if you’re “coloring inside the lines” of bedDetail?

 

Thanks

Curtis

 

 

From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 4:42 PM
To: Curtis Hendrickson (Campus)


Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)

 

Hi Curtis,

You can simply load the following custom track lines to point at the hosted bigBed, rather than adding the three additional hub text files:

browser position chr7:127471196-127500000
track type=bigBed visibility=3 bigDataUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hg18/bedDetail.bb

You should be able to use any combination you need. If you look under the "Example 3" section referenced earlier, you'll see these lines:

BigBed files can store extra fields in addition to the predefined BED fields. If you add extra fields to your bigBed file, you must include a .as (AutoSQL) format file describing the fields. See this paper for information on AutoSQL.

All the best,
Brian

 

On Tue, Mar 11, 2014 at 2:28 PM, Curtis Hendrickson (Campus) <cur...@uab.edu> wrote:

Brian,

 

Great. Thanks for the workaround.

Do I need to go the distance to create a Track Hub, or would it be sufficient to just host the bigBed (.bb) file on my public HTTP site?

 

Also, when converting bedDetail to .bb, must the type be -type=bed9+2, or can I put essentially any appropriate number in place of the 9? Mine are essentially bed8+2.

 

Regards,

Curtis

 

From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, March 11, 2014 3:45 PM
To: Curtis Hendrickson (Campus)
Cc: genom...@soe.ucsc.edu
Subject: Re: [genome-www] display of bedDetail description is broken (truncated to 1 character)

 

Dear Curtis,

There is a solution that uses the hub feature that encompasses many additional benefits.

A hub is a set of remotely hosted binary files, plus three text files, that the browser can access quickly. Compared to custom tracks, a hub also lets you ensure the longevity of your data. Read more about hubs in this archived mailing list question: https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!searchin/genome/trackDb.txt$20genomes.txt$20brianlee/genome/mUH5GtWXWaw/AVrt7hGy7dcJ

Hubs can use bigBed files, which can be created from bedDetail BED files. For example, a file that contained entries such as the following:
chr7 54028 73584 uc003sii.2 0 - 54028 54028 255,0,0 . AL137655
...

Can be displayed in the browser through a bigBed hub so that it would appear as:

Name of gene:    uc003sii.2
Score:    0
+ or - for strand:    -
Coding region start:    54028
Coding region end:    54028
Green on + strand, Red on - strand:    255,0,0
Gene Symbol:    .
SWISS-PROT protein Accession number:    AL137655

As defined by a custom ".as file" as seen here for this example: http://genome.ucsc.edu/goldenPath/help/examples/bedExample2.as

Here is an example hub URL you can paste into our Hub page, http://genome.ucsc.edu/cgi-bin/hgHubConnect, and use as a working example to copy if you wish to pursue this route: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hub.txt

Please note that it is displaying data on the hg18 assembly and only on chr7. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hubUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/hubBedDetail/hub.txt&position=chr7%3A1-158821424

To obtain the bigBed from the bedDetail BED, you would follow the Example 3 steps found here: http://genome.ucsc.edu/goldenPath/help/bigBed.html

bedToBigBed -as=bedExample2.as -type=bed9+2 -extraIndex=name,geneSymbol bedExample2.bed hg18.chrom.sizes myBigBed2.bb

You could use the provided trackDb.txt to display the resulting file if renamed to bedDetail.bb:

track bigBed1
bigDataUrl bedDetail.bb
shortLabel bigBed bedDetail example
longLabel This is a bigBed created from a bedDetail file using the .as file option outlined here:http://genome.ucsc.edu/goldenPath/help/bigBed.html
type bigBed
visibility full
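
The three text files a hub needs are hub.txt, genomes.txt, and trackDb.txt. A minimal sketch of a directory layout that would hold the stanza above follows. The hub name, labels, and e-mail address are placeholders, and the function name make_hub_skeleton is hypothetical. The bigBed file itself (bedDetail.bb) is assumed to be copied next to trackDb.txt separately.

```shell
# Create a skeleton hub directory for one hg18 bigBed track.
#   $1: directory to create the hub in
# All labels and the e-mail address below are placeholders.
make_hub_skeleton() {
  hub_dir="$1"
  mkdir -p "$hub_dir/hg18"

  # hub.txt: top-level description, points at genomes.txt
  cat > "$hub_dir/hub.txt" <<'EOF'
hub bedDetailHub
shortLabel bedDetail hub
longLabel Example hub hosting a bigBed built from a bedDetail file
genomesFile genomes.txt
email someone@example.edu
EOF

  # genomes.txt: one stanza per assembly, points at that assembly's trackDb.txt
  cat > "$hub_dir/genomes.txt" <<'EOF'
genome hg18
trackDb hg18/trackDb.txt
EOF

  # trackDb.txt: the track stanza; bigDataUrl is relative to this file
  cat > "$hub_dir/hg18/trackDb.txt" <<'EOF'
track bigBed1
bigDataUrl bedDetail.bb
shortLabel bigBed bedDetail example
longLabel This is a bigBed created from a bedDetail file using the .as file option
type bigBed
visibility full
EOF
}

# Example call:
# make_hub_skeleton ./myHub
```

The resulting directory would then be placed on a public web server, and the URL of hub.txt pasted into the Hub page.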

There are many advantages to pursuing the option of remotely hosting your files through a hub and it is highly recommended.

All the best,

Brian Lee
UCSC Bioinformatics Group

 

On Tue, Mar 11, 2014 at 12:57 PM, Brian Lee <bria...@soe.ucsc.edu> wrote:

Dear Curtis,

 

Thank you for using the UCSC Genome Browser and bringing our attention to this bug in our bedDetail custom tracks.

 

We have filed a ticket to correct this error, but are unable to predict an exact date for when the fix will be available.  Please know that we will contact you as soon as there is a resolution. 

 

Thank you again for using the UCSC Genome Browser and helping improve it by reporting this bug. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

 

All the best,

 

Brian Lee

 [… trimmed …]

Brian Lee

Mar 18, 2014, 3:55:54 PM
to Curtis Hendrickson (Campus), gen...@soe.ucsc.edu
Dear Curtis,

Thank you for your message sharing your troubleshooting of creating the bigBed files and hitting a 255 char limit when using string for your definition field in the .as file.

The browser uses a special SQL type, longblob, for long stretches of text; we use it especially for our HTML pages. The bedDetail error that you discovered and helped us address has now been fixed on our development server and should be out on the public server soon with our next software release. You can see an example of your desired track working by clicking this link: http://genome-test.soe.ucsc.edu/cgi-bin/hgTracks?hgt.customText=http://hgwdev.cse.ucsc.edu/~brianlee/customTracks/bedDetailMLQ12873

Once you have loaded the link to genome-test, select the Table Browser link from under the Tools drop-down menu.  Then make the selection "Group: Custom Tracks" and "track: HbVar" and "table: ct_HbVar_####".  Then click the "Describe table schema" button.  You will see that the description line has longblob as its MySQL type, which is what captures lengthy strings of text.

Our engineers note that a field declared as string becomes varchar(255) in the browser. If you change the type on that line of your .as definition from string to lstring, it will be translated into our longblob in the browser:

 lstring probeDetailsHTML; "Probe details, formated as an HTML table"
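
The change to the .as file can be made by hand or scripted. A minimal sketch, assuming the field is named probeDetailsHTML as in the file shown earlier in the thread; to_lstring is a hypothetical helper name, and it writes the edited definition to standard output rather than modifying the file in place.

```shell
# Print an autoSql (.as) file with the probeDetailsHTML field's type
# changed from string to lstring. All other lines pass through unchanged.
#   $1: path to the .as file
to_lstring() {
  sed -E 's/^([[:space:]]*)string([[:space:]]+probeDetailsHTML;)/\1lstring\2/' "$1"
}

# Example call (output file name is a placeholder):
# to_lstring immuno_beadchip_11419691_b.hg19.detail.as > detail.lstring.as
```

Only the named field is touched, so short identifier fields such as illuminaID keep the string type.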

Thank you for all your efforts and using the browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group


--


Curtis Hendrickson (Campus)

Mar 18, 2014, 5:15:30 PM
to Brian Lee, gen...@soe.ucsc.edu

Brian,

 

Looks good on the test server.

How can I make sure I’m on the appropriate announcement list?

 

Also, will I need to download a new version of bedToBigBed? After changing “string” => “lstring” in my .as file, the current bedToBigBed (2014-03-05 version) gives me

 

# compile details.bed + .as --> big bed

bedToBigBed -tab -as=immuno_beadchip_11419691_b.hg19.detail.as -type=bed8+2 -extraIndex=illuminaID,probeDetailsHTML out/immuno_beadchip_11419691_b.hg19merge.sort.detail.bed hg19.chrom.sizes out/immuno_beadchip_11419691_b.hg19merge.sort.detail.bb

Sorry for now can only index string fields.

Brian Lee

Mar 21, 2014, 4:02:24 PM
to Curtis Hendrickson (Campus), gen...@soe.ucsc.edu

Hi Curtis,

Thank you for trying the bigBed approach. The bedDetail custom track changes will be out after next Tuesday. There is no announcement list for software updates, which occur regularly every three weeks. However, when there are big new features we do send out announcements to genome-...@soe.ucsc.edu; you can subscribe via a link at the bottom of this page: http://genome.ucsc.edu/contacts.html If you are curious to track the software changes, you can view this page: http://genecats.cse.ucsc.edu/builds/versions.html

There will not be a new bedToBigBed. The -extraIndex option is only for strings used as identifiers, so the error you are getting is correct when you try to build an index on the lstring. However, there is an approach that will work. First, note that searching on bigBeds only works via the Track Hub approach with the necessary trackDb.txt definitions, not as custom tracks.
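
One way to check in advance which fields could be indexed is to list the fields declared with the plain string type in the .as file. This is a minimal sketch, under the assumption (suggested by the "can only index string fields" message) that only fields declared as string qualify; indexable_fields is a hypothetical helper name, not a UCSC tool.

```shell
# List the names of fields declared as plain "string" in an autoSql (.as) file.
# Fields declared as lstring, uint, char[1], etc. are skipped.
#   $1: path to the .as file
indexable_fields() {
  awk '$1 == "string" {
    name = $2
    sub(/;.*/, "", name)   # drop the trailing semicolon
    print name
  }' "$1"
}

# Example call:
# indexable_fields immuno_beadchip_11419691_b.hg19.detail.as
```

Any field name missing from that list would be left out of the comma-separated list passed to the index option.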

I set up an example hub you can use as a model: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/searchHubExample/hub.txt

Here are the steps you would likely take to build a bigBed searchable on the illuminaIDs and then link to your text about each.

1) Run your bedToBigBed with only the one index -extraIndex=illuminaID (leaving out the long probeDetailsHTML)
2) Place that bb file in the hub, and in the trackDb.txt add the line "searchIndex illuminaID".
You can see where I put "searchIndex name" in my example when I created an extraIndex on the name field: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/searchHubExample/hg19/trackDb.txt
Before proceeding, be sure this is working properly: you should now be able to search your hub's bigBed with your illuminaIDs.

3) Now you can create a new text file that has each illuminaID followed by its text. In my example I call it infoSJs.txt. You might have something like illumIDs.txt looking like the following:


imm_1_898835-1_T_F_1767696649    Name  imm_1_898835    IlmnStrand  TOP    SNP  [A/C]    AddressA_ID  0038649493  ....
id2 Text.......
id3 Text.....

(I suggest removing HTML tags by doing something like "sed 's/<[^>]\+>/ /g' original.illumIDs.txt > illumIDs.txt".)

4) With this text file you can run the ixIxx program: ixIxx illumIDs.txt illum.ix illum.ixx
5) To your bigBed's trackDb.txt stanza you would then add "searchTrix illum.ix"
6) Then, when browsing the hub you should be able to search for items like AddressA_ID "0038649493" and it will bring up the position of the related identifier, imm_1_898835-1_T_F_1767696649, as defined in the illumIDs.txt file.
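
Step 3 above can be sketched as a small script that builds the ixIxx input directly from the BED file. This is a minimal sketch, assuming a tab-separated BED with the illuminaID in column 9 and the HTML table in column 10 (the bed8+2 layout from earlier in the thread). The function name make_trix_input is hypothetical, and the commands in the trailing comments are simply those named in steps 1 and 4.

```shell
# Build the text file that ixIxx reads: one line per item, the identifier
# first, followed by the searchable text with HTML tags removed.
#   $1: tab-separated BED file (illuminaID in column 9, HTML in column 10)
make_trix_input() {
  awk -F'\t' '{
    text = $10
    gsub(/<[^>]+>/, " ", text)   # replace every HTML tag with a space
    gsub(/  +/, " ", text)       # collapse runs of spaces
    sub(/^ /, "", text)          # trim leading space
    sub(/ $/, "", text)          # trim trailing space
    print $9 " " text
  }' "$1"
}

# Possible usage, following the numbered steps (file names are placeholders):
#   bedToBigBed -tab -as=detail.as -type=bed8+2 -extraIndex=illuminaID \
#       detail.bed hg19.chrom.sizes detail.bb          # step 1
#   make_trix_input detail.bed > illumIDs.txt           # step 3
#   ixIxx illumIDs.txt illum.ix illum.ixx               # step 4
```

Doing the tag stripping in awk avoids depending on the GNU-specific `\+` in the sed expression, at the cost of a slightly longer script.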

I put some notes in the hg19/trackDb.txt example above that you can read and follow to recreate that example if you want to try that first.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

Curtis Hendrickson (Campus)

Mar 21, 2014, 4:08:36 PM
to Brian Lee, gen...@soe.ucsc.edu

Brian,

 

Thanks for the thorough response.

Apologies if I tried to searchIndex the HTML; you’re correct that I only want to display it.

Indexing of the IlluminaID’s is nice, but secondary to displaying the detailed information about the probe in the details page.

 

I’ll go through the details of your example next week.
