access to tables

13 views
Skip to first unread message

Eydoux, Patrice

unread,
Oct 19, 2023, 4:46:41 PM10/19/23
to gen...@soe.ucsc.edu

Hello,


Any reason that I can no longer access some tables with an SQL query, both with coding and direct access? It happened to me with the mane and dgvGold tables.


The reasons why I would prefer to use SQL queries is that it is faster and more flexible.


Thank you very much for your beautiful work.


Best regards,

Patrice

Gerardo Perez

unread,
Oct 29, 2023, 6:33:21 PM10/29/23
to Eydoux, Patrice, gen...@soe.ucsc.edu

Hello, Patrice.

Thank you for your interest in the UCSC Genome Browser and for your question about access to tables.

To better assist you, could you provide more details on what you are trying to do? Both MANE and dgvGold data are offered as bigBeds. We are increasingly moving new data into bigBed format, and we do offer tools such as bigBedToBed, which can be used to obtain only features within a given range without even downloading the entire file

$ bigBedToBed -chrom=chr6 -start=0 -end=1000000 http://hgdownload.soe.ucsc.edu/gbdb/hg38/mane/mane.bb stdout

chr6    292486    351355    ENST00000419235.7    0    +    292539    348951    8,81,156    7    74,34,83,50,75,172,2587,    0,12141,19393,42627,53367,55616,56282,    ENST00000419235.7    cmpl    cmpl    0,0,1,0,2,2,0,    mRNA    ENSG00000112679.15    DUSP22    protein_coding    ENSP00000397459.2    NM_001286555.3    GeneID:56940    NP_001273484.1    MANE Select
chr6    391751    411443    ENST00000380956.9    0    +    393152    407598    8,81,156    9    58,271,187,89,145,108,354,113,3989,    0,1346,3069,4095,5356,7076,9672,13266,15703,    ENST00000380956.9    cmpl    cmpl    -1,0,0,1,0,1,1,1,0,    mRNA    ENSG00000137265.16    IRF4    protein_coding    ENSP00000370343.4    NM_002460.4    GeneID:3662    NP_002451.2    MANE Select
chr6    485153    693139    ENST00000230449.9    0    -    486670    637818    8,81,156    28    1611,60,62,123,56,142,117,67,62,60,81,62,122,158,66,125,126,119,103,82,146,81,125,114,127,177,161,121,    0,3825,5971,12213,14491,47315,64021,68700,70073,70800,71330,77630,78879,79391,79710,87366,91603,107315,112867,113706,113926,124944,132557,134276,144681,147787,152547,207865,    ENST00000230449.9    cmpl    cmpl    2,2,0,0,1,0,0,2,0,0,0,1,2,0,0,1,1,2,1,0,1,1,2,2,1,1,0,-1,mRNA    ENSG00000112685.14    EXOC2    protein_coding    ENSP00000230449.4    NM_018303.6    GeneID:55770    NP_060773.3    MANE Select
chr6    655938    657100    ENST00000380907.3    0    -    656107    656944    8,81,156    1    1162,    0,    ENST00000380907.3    cmpl    cmpl    0,    mRNA    ENSG00000188996.5    HUS1B    protein_coding    ENSP00000370293.2    NM_148959.4    GeneID:135458    NP_683762.2    MANE Select

$ bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/hg38/dgv/dgvMerged.bb -chrom=chr6 -start=0 -end=100000 stdout

chr6    60000    775274    esv2759401    0    +    60000    60000    139,69,19    gain+loss    Redon et al 2006    17122850    BAC aCGH, SNP array    esv2757151,esv2758034    270    198    19            NA06985,NA06991,NA06993,NA06994,NA07000,NA07019,NA07034,NA07048,NA07056,NA07345,NA07348,NA10830,NA10831,NA10838,NA10839,NA10846,NA10847,NA10854,NA10855,NA10856,NA10857,NA10859,NA10860,NA10863,NA11829,NA11830,NA11831,NA11832,NA11839,NA11840,NA11993,NA11995,NA12003,NA12004,NA12005,NA12006,NA12044,NA12056,NA12057,NA12144,NA12146,NA12154,NA12156,NA12234,NA12236,NA12239,NA12248,NA12249,NA12264,NA12707,NA12751,NA12752,NA12753,NA12760,NA12761,NA12762,NA12801,NA12802,NA12813,NA12814,NA12815,NA12864,NA12865,NA12873,NA12874,NA12878,NA18500,NA18501,NA18502,NA18503,NA18504,NA18505,NA18508,NA18515,NA18516,NA18517,NA18521,NA18522,NA18523,NA18524,NA18532,NA18537,NA18542,NA18545,NA18547,NA18550,NA18552,NA18558,NA18562,NA18563,NA18566,NA18572,NA18573,NA18582,NA18592,NA18593,NA18594,NA18603,NA18608,NA18609,NA18611,NA18612,NA18620,NA18621,NA18623,NA18624,NA18632,NA18635,NA18637,NA18852,NA18853,NA18854,NA18855,NA18856,NA18857,NA18858,NA18859,NA18860,NA18861,NA18862,NA18863,NA18870,NA18871,NA18872,NA18912,NA18913,NA18914,NA18940,NA18944,NA18945,NA18947,NA18949,NA18951,NA18952,NA18953,NA18956,NA18959,NA18961,NA18965,NA18968,NA18969,NA18972,NA18973,NA18974,NA18975,NA18976,NA18978,NA18980,NA18981,NA18991,NA18994,NA18995,NA18997,NA18998,NA18999,NA19003,NA19005,NA19007,NA19092,NA19093,NA19094,NA19098,NA19099,NA19100,NA19101,NA19102,NA19103,NA19116,NA19119,NA19120,NA19127,NA19128,NA19129,NA19130,NA19131,NA19132,NA19137,NA19138,NA19139,NA19143,NA19144,NA19145,NA19153,NA19154,NA19159,NA19160,NA19161,NA19172,NA19192,NA19193,NA19194,NA19200,NA19201,NA19202,NA19203,NA19204,NA19205,NA19207,NA19208,NA19209,NA19210,NA19211,NA19221,NA19222,NA19223,NA19238,NA19239,NA19240    715274
chr6    60000    653718    nsv428134    0    +    60000    60000    0,0,200    gain    Perry et al 2008b    18775914    BAC aCGH, FISH, PCR            nssv450589,nssv450603,nssv450588,nssv450586,nssv450578,nssv450596,nssv450597,nssv450599,nssv450602,nssv450585,nssv450600,nssv450582,nssv450594,nssv450579,nssv450595,nssv450601,nssv450584,nssv450591,nssv450593,nssv450580,nssv450590,nssv450581,nssv450583,nssv450592    62    24    0            HGDP00449,HGDP00450,HGDP00460,HGDP00463,HGDP00467,HGDP00471,HGDP00474,HGDP00476,HGDP00984,HGDP00986,HGDP01087,HGDP01088,HGDP01089,HGDP01093,HGDP01094,NA18498,NA18916,NA19096,NA19108,NA19113,NA19147,NA19181,NA19189,NA19257    593718
chr6    60000    126418    nsv10796    0    +    60000    60000    139,69,19    gain+loss    Perry et al 2008    18304495    Oligo aCGH        nssv16470,nssv13892,nssv14496,nssv13150,nssv12829,nssv14374,nssv15673,nssv14158,nssv15703,nssv15831,nssv15807,nssv13965,nssv13884,nssv15001,nssv13880,nssv15861    31    11    3            NA07029,NA10839,NA10863,NA12802,NA18504,NA18537,NA18572,NA18853,NA18860,NA18975,NA19132,NA19144,NA19221,NA19240    66418
chr6    60000    85500    nsv1151792    0    +    60000    60000    0,0,200    duplication    John et al 2014    26484159    Sequencing            nssv3998974    1    1    0            KWB1    25500
chr6    60000    79325    nsv4121945    0    +    60000    60000    0,0,200    duplication    gnomAD Structural Variants    32461652    Sequencing        nssv15977875    10847    1    0                19325
chr6    60030    155228    nsv820764    0    +    60030    60030    200,0,0    deletion    Ju et al 2010    20802225    Sequencing            nssv1420566    1    0    1            NA10851    95198
chr6    60030    149517    esv26003    0    +    60030    60030    0,0,200    gain    Conrad et al 2009    19812545    Oligo aCGH            esv20531,esv19428,esv13218    40    31    0            NA06985,NA07037,NA11894,NA11993,NA11995,NA12006,NA12156,NA12239,NA12287,NA12749,NA12776,NA12828,NA12878,NA15510,NA18502,NA18505,NA18508,NA18511,NA18523,NA18858,NA18861,NA18907,NA18909,NA18916,NA19099,NA19114,NA19129,NA19147,NA19190,NA19225,NA19240    89487
chr6    60198    63400    nsv1076592    0    +    60198    60198    0,0,200    duplication    Thareja et al 2015    25765185    Sequencing            nssv3770858    1    1    0            KWP1    3202
chr6    61999    102888    nsv969341    0    +    61999    61999    0,0,200    duplication    Sudmant et al 2013    23825009    Oligo aCGH, Sequencing        nssv2390986,nssv2390943,nssv2390983,nssv2391001,nssv2390989,nssv2390951,nssv2390984,nssv2390948,nssv2390981,nssv2390997,nssv2390974,nssv2390959,nssv2390979,nssv2390144,nssv2390969,nssv2390945,nssv2390972,nssv2390999,nssv2390991,nssv2390956,nssv2390992,nssv2390966,nssv2390976,nssv2390950,nssv2390978,nssv2390949,nssv2390990,nssv2390958,nssv2390973,nssv2390960,nssv2390994,nssv2390961,nssv2390970,nssv2390944,nssv2390977,nssv2390133,nssv2390993,nssv2390142,nssv2390985,nssv2390954,nssv2390982,nssv2390141,nssv2390968,nssv2390139,nssv2390988,nssv2390138,nssv2390987,nssv2390995,nssv2390971,nssv2390965,nssv2390980,nssv2390942,nssv2390975,nssv2390140,nssv2390143,nssv2391003,nssv2390955,nssv2390946,nssv2390998,nssv2390136,nssv2390963,nssv2390962,nssv2391004,nssv2390953,nssv2390964,nssv2390940,nssv2390996,nssv2390147,nssv2390134,nssv2391000,nssv2390941,nssv2390146,nssv2390137,nssv2390947,nssv2390967,nssv2390957,nssv2390145,nssv2390952,nssv2390135,nssv2391002    97    10    0            HGDP00456,HGDP00521,HGDP00542,HGDP00665,HGDP00778,HGDP00927,HGDP00998,HGDP01029,HGDP01284,HGDP01307    40889
chr6    62051    63550    dgv3436e59    0    +    62051    62051    0,0,200    duplication    1000 Genomes Consortium Pilot Project    20981092    Digital array, Oligo aCGH, PCR, Sequencing            esv3398664,esv3427645,esv3363772    185    3    0            NA19238,NA19239,NA19240    1499
chr6    62099    69700    nsv1139856    0    +    62099    62099    0,0,200    duplication    Alsmadi et al 2014    24896259    Sequencing            nssv3978192    2    1    0            KWS2    7601
chr6    66999    111000    nsv4125160    0    +    66999    66999    200,0,0    deletion    gnomAD Structural Variants    32461652    Sequencing        nssv15907097    10847    0    1                44001
chr6    67300    82100    nsv3386669    0    +    67300    67300    0,0,200    duplication    Audano et al 2019    30661756    Sequencing            nssv14729633,nssv14724650,nssv14723515    14    3    0            HG00733,NA12878,NA19240    14800
chr6    68051    70850    esv3348565    0    +    68051    68051    0,0,200    duplication    1000 Genomes Consortium Pilot Project    20981092    Digital array, Oligo aCGH, PCR, Sequencing            essv8695032    185    1    0            NA19240    2799
chr6    69199    72000    nsv1128922    0    +    69199    69199    0,0,200    duplication    Alsmadi et al 2014    24896259    Sequencing            nssv3961592    2    1    0            KWS1    2801
chr6    70844    202392    dgv1068e199    0    +    70844    70844    200,0,0    deletion    1000 Genomes Consortium Phase 1    23128226    Merging, Oligo aCGH, PCR, Sequencing, SNP array            esv2671556,esv2678528,esv2663786    1151    0    226            HG01515,HG01516,HG01518,HG01519,HG01521,HG01522,NA18526,NA18530,NA18532,NA18534,NA18535,NA18536,NA18537,NA18538,NA18539,NA18541,NA18542,NA18543,NA18544,NA18545,NA18546,NA18547,NA18548,NA18549,NA18552,NA18553,NA18555,NA18557,NA18558,NA18559,NA18560,NA18561,NA18562,NA18563,NA18564,NA18565,NA18566,NA18567,NA18570,NA18571,NA18572,NA18573,NA18574,NA18576,NA18577,NA18579,NA18582,NA18592,NA18593,NA18595,NA18597,NA18599,NA18602,NA18603,NA18605,NA18606,NA18608,NA18609,NA18610,NA18611,NA18612,NA18613,NA18614,NA18615,NA18616,NA18617,NA18618,NA18620,NA18621,NA18622,NA18623,NA18624,NA18626,NA18627,NA18628,NA18630,NA18631,NA18632,NA18633,NA18634,NA18635,NA18636,NA18637,NA18638,NA18940,NA18941,NA18942,NA18943,NA18944,NA18945,NA18947,NA18948,NA18949,NA18950,NA18951,NA18952,NA18953,NA18956,NA18959,NA18960,NA18961,NA18963,NA18964,NA18965,NA18971,NA18973,NA18980,NA18982,NA18983,NA18984,NA18985,NA18986,NA18987,NA18988,NA18989,NA18990,NA18999,NA19000,NA19003,NA19004,NA19005,NA19007,NA19009,NA19010,NA19012,NA19055,NA19056,NA19057,NA19058,NA19059,NA19060,NA19062,NA19063,NA19064,NA19065,NA19066,NA19067,NA19068,NA19070,NA19072,NA19074,NA19075,NA19076,NA19077,NA19078,NA19079,NA19080,NA19081,NA19083,NA19084,NA19087,NA19088,NA20503,NA20504,NA20505,NA20506,NA20507,NA20508,NA20509,NA20510,NA20512,NA20515,NA20516,NA20517,NA20518,NA20519,NA20520,NA20522,NA20524,NA20525,NA20527,NA20528,NA20529,NA20531,NA20532,NA20534,NA20535,NA20538,NA20539,NA20541,NA20542,NA20543,NA20544,NA20582,NA20585,NA20586,NA20588,NA20589,NA20752,NA20753,NA20754,NA20756,NA20757,NA20758,NA20759,NA20760,NA20761,NA20765,NA20766,NA20768,NA20769,NA20770,NA20771,NA20773,NA20778,NA20783,NA20785,NA20790,NA20792,NA20795,NA20796,NA20798,NA20799,NA20800,NA20801,NA20802,NA20804,NA20807,NA20810,NA20811,NA20814,NA20815,NA20816,NA20818,NA20819,NA20828    131548
chr6    75558    76650    nsv3383057    0    +    75558    75558    0,0,0    line1 deletion    Audano et al 2019    30661756    Sequencing            nssv14776456    14    0    1            HG02059    1092
chr6    76851    77650    esv3328489    0    +    76851    76851    0,0,200    duplication    1000 Genomes Consortium Pilot Project    20981092    Digital array, Oligo aCGH, PCR, Sequencing            essv8695193    185    1    0            NA19239    799
chr6    86298    118800    nsv1073944    0    +    86298    86298    200,0,0    deletion    Thareja et al 2015    25765185    Sequencing            nssv3763139    1    0    1            KWP1    32502
chr6    86699    98400    nsv4117238    0    +    86699    86699    200,0,0    deletion    gnomAD Structural Variants    32461652    Sequencing        nssv15907098    10847    0    1                11701
chr6    88549    88626    nsv3379713    0    +    88549    88549    0,0,0    line1 deletion    Audano et al 2019    30661756    Sequencing            nssv14786231    14    0    1            HG02059    77
chr6    88646    88721    nsv3383240    0    +    88646    88646    200,0,0    deletion    Audano et al 2019    30661756    Sequencing            nssv14714856    14    0    1            HG02059    75
chr6    88851    90150    dgv3437e59    0    +    88851    88851    0,0,200    duplication    1000 Genomes Consortium Pilot Project    20981092    Digital array, Oligo aCGH, PCR, Sequencing            esv3388723,esv3386561    185    2    0            NA19239,NA19240    1299
chr6    89140    89678    nsv3389545    0    +    89140    89140    200,0,0    deletion    Audano et al 2019    30661756    Sequencing            nssv14717010    14    0    1            HG02059    538
chr6    89515    89593    nsv3383561    0    +    89515    89515    200,0,0    deletion    Audano et al 2019    30661756    Sequencing            nssv14723320    14    0    1            NA12878    78
chr6    89969    90124    nsv3394774    0    +    89969    89969    200,0,0    deletion    Audano et al 2019    30661756    Sequencing            nssv14713042    14    0    1            HG02059    155
chr6    91799    93900    nsv1119065    0    +    91799    91799    200,0,0    deletion    Alsmadi et al 2014    24896259    Sequencing            nssv3981864    2    0    1            KWS1    2101
chr6    95399    96100    nsv1126738    0    +    95399    95399    200,0,0    deletion    Alsmadi et al 2014    24896259    Sequencing            nssv3985006    2    0    1            KWS1    701
chr6    98699    110100    nsv1145084    0    +    98699    98699    200,0,0    deletion    Alsmadi et al 2014    24896259    Sequencing            nssv3981704    2    0    1            KWS1    11401

You can download the bigBedToBed utility from the downloads page, https://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads. You can then find bigBedToBed under the directory that matches your operating system. For example, here is the direct link for Linux: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed

You can also access the data from these tracks using the Table Browser, and the API:
https://api.genome.ucsc.edu/getData/track?genome=hg38;track=mane;chrom=chr6;start=0;end=1000000

https://api.genome.ucsc.edu/getData/track?genome=hg38;track=dgvGold;chrom=chr6;start=0;end=100000

I hope this is helpful. Please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/7b79a4f6738e4a289ad423d266442730%40mail.ubc.ca.

Eydoux, Patrice

unread,
Oct 31, 2023, 7:51:18 PM10/31/23
to gen...@soe.ucsc.edu

Hello Gerardo et al.
It seems that you guys are moving away from the sql database… 
The problem is that using the api is about 2x slower. I suppose that it is because I am using a small number of fields. Is there a way to filter the data? For example, in the dgvGold table, I would need chrom, chromStart, chromEnd, thickStart, thickEnd, dgvID. Any way we could do this, without having to filter AFTER retrieving the data? That would be more efficient…
I am trying to use the UCSC data to analyze cnvs which are sometimes very large. Filtering can save a lot of time. 
E.g. retrieving the isoforms of a gene in a region can be quite time-consuming… that can still be done with SQL. Last time I timed, I got 61ms with sql and 153.

Thank you. 

Patrice 

Begin forwarded message:

From: Gerardo Perez <gpe...@ucsc.edu>
Date: October 29, 2023 at 18:33:22 EDT
To: "Eydoux, Patrice" <pey...@mail.ubc.ca>
Cc: gen...@soe.ucsc.edu
Subject: Re: [genome] access to tables


[CAUTION: Non-UBC Email]

Matthew Speir

unread,
Nov 7, 2023, 11:05:01 AM11/7/23
to Eydoux, Patrice, gen...@soe.ucsc.edu
Hello, Patrice.

Thanks for your input.

Can you provide more details on the speed? How does it differ from the speed of accessing the MySQL server? Is the speed difference consistent over time?

Additionally, does bigBedToBed not work for you? How does that speed compare to MySQL and the API?


If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---

Matthew Speir

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.



Reply all
Reply to author
Forward
0 new messages