Mirroring hg19


Stuart Meacham

Aug 15, 2013, 8:38:28 AM
to genome...@soe.ucsc.edu
Hi,

I want to set up a mirror of hg19 but an rsync dry run suggests over 5TB
of space required and I only have about 4TB space available. Is there
anything I can miss out of the /gbdb/hg19/ directory without overly
impacting functionality? I appreciate missing out 1TB of data is likely
to have some impact but was hoping some data could be identified which
was perhaps 'less core' than the rest . . .

Cheers

Stuart

Matthew Speir

Aug 15, 2013, 4:33:59 PM
to Stuart Meacham, genome...@soe.ucsc.edu

Hello Stuart,

Thank you for your question about mirroring the UCSC Genome Browser. You can do a minimal browser installation instead of the full mirror you described. We have a wiki page, http://genomewiki.ucsc.edu/index.php/Minimal_Browser_Installation, that describes the process needed to set up one of these minimal installations. According to the wiki page, the bare minimum of files needed from /gbdb/ are the 2bit or the nib files for the chosen assembly.
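
As a rough illustration of that minimum, the sketch below fetches only the sequence file into /gbdb rather than the whole directory. The server name and the file name hg19.2bit are assumptions that do not come from this thread, so check them against the wiki page before relying on them. The command is printed first and only runs when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: fetch only the hg19 sequence file instead of all of /gbdb/hg19/.
# ASSUMPTIONS: the rsync server name and the file name hg19.2bit are
# illustrative and are not taken from the thread.
SRC="rsync://hgdownload.soe.ucsc.edu/gbdb/hg19/hg19.2bit"
DEST="/gbdb/hg19/"

CMD="rsync -avzP $SRC $DEST"

# Print the command so it can be inspected before anything is transferred.
echo "$CMD"

# The transfer itself only happens when RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$DEST" && $CMD
fi
```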

I hope this information is helpful. If you have any more questions, please respond to genome...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Matthew Speir

Aug 16, 2013, 1:58:40 PM
to Stuart Meacham, genome...@soe.ucsc.edu
Hello Stuart,

You have a few options to further reduce the amount of data needed for your own minimal installation. The ENCODE tables, as well as the Net and Chain tables, take up a sizable amount of space in hg19. These tables can be excluded using the following rsync options:

--exclude=/encode* --exclude=/wgEncode* --exclude=/net* --exclude=/chain*
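
For illustration, here is one way those options might be combined into a dry run that reports the remaining transfer size. This is only a sketch: the source URL and the destination directory are assumptions and are not taken from this thread. The command is printed first and only runs when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: dry-run an hg19 table sync while skipping ENCODE, Net and Chain tables.
# ASSUMPTIONS: SRC and DEST are illustrative; adjust them for your own mirror.
SRC="rsync://hgdownload.soe.ucsc.edu/mysql/hg19/"
DEST="/var/lib/mysql/hg19/"

# Exclude patterns exactly as given in the message above.
EXCLUDES="--exclude=/encode* --exclude=/wgEncode* --exclude=/net* --exclude=/chain*"

# --dry-run transfers nothing; --stats prints the total size that would be copied.
CMD="rsync -avz --dry-run --stats $EXCLUDES $SRC $DEST"

# Print the command so it can be inspected first.
echo "$CMD"

if [ "${RUN:-0}" = "1" ]; then
    # set -f stops the local shell from expanding the * in the patterns,
    # so rsync receives them unchanged.
    set -f
    $CMD
    set +f
fi
```

Comparing the "Total file size" line of the --stats output against the available disk space shows whether the reduced set fits before any data is copied.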

Another option is to set up an Assembly Hub in the UCSC Genome Browser. Assembly Hubs are a new feature that allow you to host your genome and tracks locally, and then display these within the genome browser. The Assembly Hubs wiki page, http://genomewiki.ucsc.edu/index.php/Assembly_Hubs, contains a great set of instructions on how to set up a hub.


I hope this information is helpful. If you have any more questions, please respond to genome...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


Stuart Meacham

Aug 20, 2013, 7:33:25 AM
to Matthew Speir, genome...@soe.ucsc.edu
Hi Matthew,

Thanks very much for taking the time to respond. I believe that I have
followed the instructions and I get the gateway page without problems. I
have chowned the db files to mysql.mysql and the /gbdb files to
www-data.www-data. When I go to the page
http://myserver/cgi-bin/hgGateway I get a minimal interface with no
apparent errors. But for every gene I search for I get: "Sorry, couldn't
locate <gene_name> in genome database."

There are no Mysql errors in the logs
hguser has the appropriate permissions

Is there a mysql command I can use on the server to see if I should be
finding my genes and it is merely a setup error or something more serious?

Cheers

Stuart

Hiram Clawson

Aug 20, 2013, 11:26:35 AM
to Stuart Meacham, Matthew Speir, genome...@soe.ucsc.edu

Good Morning Stuart:

What tables do you have in your database ? Do you have gene tables ?
Is this one of UCSC's genome databases, or a custom database of
your own ?

--Hiram

Hiram Clawson

Aug 21, 2013, 11:25:12 AM
to Stuart Meacham, Matthew Speir, genome...@soe.ucsc.edu
Good Morning Stuart:

Can you clarify what you are trying to do there ?
What you have in your minimal set of tables is merely
a minimal set of tables in order to prove that your
local installation can function. There won't be
any data displayed in your genome browser because
there are no data tables. You will need to install
other tables in order to get data displayed. It depends
upon what function you want to see. One simple method
to load the browser so it will do almost everything is
to install all tables except anything related to Encode.
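
One quick way to confirm that gene tables are what is missing is to compare the table listing against a few gene table names. The sketch below assumes knownGene, refGene and kgXref as example names; they are picked for illustration and are not a complete list of what gene search needs.

```shell
#!/bin/sh
# Report which gene-related tables are absent from a table listing read on stdin.
# ASSUMPTION: knownGene, refGene and kgXref are example gene table names,
# chosen for illustration only.
missing_gene_tables() {
    listing=$(cat)
    for t in knownGene refGene kgXref; do
        # grep -x matches whole lines only, so "gap" would not match "gapFoo".
        printf '%s\n' "$listing" | grep -qx "$t" || echo "$t"
    done
}

# Against a live server (connection details as used elsewhere in this thread):
#   mysql -uroot -p -P 3307 -h 127.0.0.1 hg19 -N -B -e 'show tables' | missing_gene_tables

# Offline demonstration with the seven tables of the minimal installation:
printf '%s\n' chromInfo cytoBandIdeo gap gold grp hgFindSpec trackDb | missing_gene_tables
```

With the seven-table minimal set as input, all three example names are reported as missing, which is consistent with gene searches finding nothing.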

--Hiram

On 8/21/13 8:13 AM, Stuart Meacham wrote:
> Hey,
>
> No custom tables yet, output of:
>
> mysql -uroot -p -P 3307 -h 127.0.0.1 hg19 -e 'show tables'
>
> is:
>
> +----------------+
> | Tables_in_hg19 |
> +----------------+
> | chromInfo |
> | cytoBandIdeo |
> | gap |
> | gold |
> | grp |
> | hgFindSpec |
> | trackDb |
> +----------------+
>
> all populated with rsync and the hguser user has all the necessary permissions.
>
> Thanks for any help/pointers
>
> Stuart

Stuart Meacham

Aug 21, 2013, 11:13:06 AM
to Hiram Clawson, Matthew Speir, genome...@soe.ucsc.edu
Hey,

No custom tables yet, output of:

mysql -uroot -p -P 3307 -h 127.0.0.1 hg19 -e 'show tables'

is:

+----------------+
| Tables_in_hg19 |
+----------------+
| chromInfo |
| cytoBandIdeo |
| gap |
| gold |
| grp |
| hgFindSpec |
| trackDb |
+----------------+

all populated with rsync and the hguser user has all the necessary
permissions.

Thanks for any help/pointers

Stuart


