Installing GBiB on a local server


Lina Hultin Rosenberg

Jun 26, 2015, 10:42:05 AM
to gen...@soe.ucsc.edu
Dear Sir/Madam,

We are setting up a database platform that will handle the human genetic and clinical data we are generating in a project. The database will run on a local server, and members of the project will access it through a web interface using two-factor authentication. We are also hoping to install GBiB locally on this server to allow the users to look at selected genes or variants identified in the project. The idea is that the user will generate the track files in the right format (VCF, BED, etc.) from our data in the database and then launch the browser.
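
As a rough illustration of the track-generation step described above, the sketch below writes a minimal BED custom track from a list of variants. The variant tuples, the track name and the `variants_to_bed` helper are hypothetical and not part of any existing software. The only format facts assumed are that BED uses 0-based, half-open coordinates while VCF-style positions are 1-based.

```python
def variants_to_bed(variants, track_name="project_variants"):
    """Return BED custom-track text for (chrom, pos, ref, label) tuples.

    pos is a 1-based, VCF-style position; BED wants 0-based, half-open
    intervals, so start = pos - 1 and end = start + len(ref).
    """
    lines = [f'track name="{track_name}" description="Variants from the project database"']
    for chrom, pos, ref, label in variants:
        start = pos - 1
        end = start + len(ref)
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Made-up example variants, purely for illustration.
example = [("chr1", 1000, "A", "var1"), ("chr2", 500, "ACG", "var2")]
print(variants_to_bed(example), end="")
```

The resulting text could be saved to a file and loaded as a custom track.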

The homepage says that, by default, GBiB can only be accessed from the machine on which it is installed. Can we let members of the project (who have logged in to the database) use GBiB without opening up the browser to everyone who happens to know the IP address?

A second question: we are thinking of mirroring some of the tracks locally on the server to save time. Can I find information somewhere on the size of the data so I can take this into account when planning the size of the server? We are interested in the following tracks:

1. Conservation (phyloP score, Multiz alignment)
2. ENCODE regulation (TF ChIP-seq data, H3K27ac, H3K4me1, H3K4me3, DNase I hypersensitivity)
3. Common SNPs (dbSNP)
4. UCSC Genes


We greatly appreciate any input in this matter!

Best regards, 
Lina Rosenberg

-------- 
Lina Hultin Rosenberg, PhD
Bioinformatician
Department of Medical Biochemistry and Microbiology
Uppsala University, BMC D11:3
e-mail: lina.hulti...@imbim.uu.se
phone: +46-18-471 4525
web: www.imbim.uu.se

Matthew Speir

Jun 26, 2015, 7:26:57 PM
to Lina Hultin Rosenberg, gen...@soe.ucsc.edu
Hi Lina,

Thank you for your questions about GBiB. We are looking into these authentication issues and will get back to you soon. You can see the download sizes of the tracks you are interested in here: http://genome-test.soe.ucsc.edu/cgi-bin/hgMirror. The total size of the download is listed next to each track name. Tracks are labeled with the names used in the UCSC Genome Browser.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


Lina Hultin Rosenberg

Jun 29, 2015, 12:10:41 PM
to Matthew Speir, gen...@soe.ucsc.edu
Hi Matthew,

Thank you for your reply! I am looking forward to hearing from you regarding the authentication.

Best regards, Lina


Jules Kerssemakers

Jun 30, 2015, 11:58:43 AM
to lina.hulti...@imbim.uu.se, gen...@soe.ucsc.edu
Dear Lina,

We have built a similar set-up to what you describe here https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/sBm43E2ZRB0.
We use this at the DKFZ Heidelberg for some of our internal data. I hope our experiences can be of use to you!

We decided against modifying the GBiB image to bolt/duct-tape authentication on to it.
Instead we isolated the GBiB image from the rest of our intranet with our firewall, and only allow traffic to the UCSC MySQL server, the update server, and an internal authenticating proxy.

This means we have an unmodified GBiB image, which greatly simplifies UCSC's auto-update procedure and saves us the brittleness of a locally modified VM that could break with every update from UCSC.
Our network admins also (understandably) wouldn't allow an 'external' VM access to our central LDAP server with all our passwords.

The authenticating proxy is a simple, purpose-built VM that runs on the trusted base-image we use for all DKFZ services.
On top of that is a standard apache install with mod_proxy.
This Apache instance is configured with a "ProxyPass" directive [1] set up to forward to the GBiB after you enter your DKFZ password.
(Template config below at [2].)
To prevent everyone from the cleaner to the Director from accessing this protected data, we further limit access with a special user group.
Only a few researchers are members of this group.

The proxy-vm also mounts a network drive with the trackhub.txt on it, and allows access from the GBiB.
We can then add this trackhub in the GBiB browser by adding "gbib-proxy.intranet.url/TrackHub/hub.txt" under "My Hubs".
Again, our careful network admins didn't want to give 'external' VMs direct NFS access to our network drive hosts.
By having the trackhub on a network drive, we can host the hundreds of gigabytes of datafiles on our central disk-farm, and simultaneously allow the actual users to configure the trackhub to their liking.
(They have write-access to this trackhub-drive from their local PCs, the gbib-proxy has read-only access.)
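
To make the trackhub layout a bit more concrete, here is a minimal Python sketch of how a user might assemble such a hub directory with symlinks to data stored elsewhere. This is not our actual tooling: the `build_trackhub` helper, the hub and track names, the e-mail address and the bigWig-only assumption are all made up for illustration. The three files it writes (hub.txt, genomes.txt, trackDb.txt) and their keys follow the standard UCSC track hub layout.

```python
import os
import tempfile


def build_trackhub(hub_dir, genome, tracks, email):
    """Create hub.txt, genomes.txt and <genome>/trackDb.txt under hub_dir.

    tracks maps a track name to the path of an existing bigWig file. Each file
    is symlinked into the hub, so the data itself stays where it already is.
    """
    genome_dir = os.path.join(hub_dir, genome)
    os.makedirs(genome_dir, exist_ok=True)

    with open(os.path.join(hub_dir, "hub.txt"), "w") as fh:
        fh.write("hub internalHub\n"
                 "shortLabel Internal hub\n"
                 "longLabel Internal project tracks\n"
                 "genomesFile genomes.txt\n"
                 f"email {email}\n")

    with open(os.path.join(hub_dir, "genomes.txt"), "w") as fh:
        fh.write(f"genome {genome}\ntrackDb {genome}/trackDb.txt\n")

    stanzas = []
    for name, source in tracks.items():
        link = os.path.join(genome_dir, name + ".bw")
        if not os.path.lexists(link):
            os.symlink(source, link)  # the web server must be allowed to follow symlinks
        # bigDataUrl is given relative to the location of trackDb.txt
        stanzas.append(f"track {name}\nbigDataUrl {name}.bw\n"
                       f"shortLabel {name}\nlongLabel {name}\ntype bigWig\n")

    with open(os.path.join(genome_dir, "trackDb.txt"), "w") as fh:
        fh.write("\n".join(stanzas))
    return hub_dir


if __name__ == "__main__":
    # Demo in a throwaway directory with an empty placeholder data file.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = os.path.join(tmp, "sample1.bw")
        open(data_file, "wb").close()
        hub = build_trackhub(os.path.join(tmp, "TrackHub"), "hg19",
                             {"sample1": data_file}, "admin@example.org")
        print(sorted(os.listdir(hub)))
```

Because the hub contains only small text files and symlinks, users can rearrange it freely without touching the large data files.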


If you have any further questions, don't hesitate to ask!

Kind regards,
Jules Kerssemakers,
eilslabs Data Support Group @ DKFZ Heidelberg



[1] https://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass

[2] Apache config. This forwards all requests to "gbib-proxy.intranet.url" to "gbib-vm.intranet.url" after you enter your password.
"gbib-vm" is the unmodified GBiB image from the UCSC
"gbib-proxy" is our purpose-built proxy image, running linux + apache + mod_proxy + mod_ldap.

<VirtualHost *>
    ProxyTimeout 3600

    # Allow gbib-vm access to the network drive with the trackhub.
    # /Trackhub is an NFS mount on this machine, pointing to our network-disk host.
    # Settings specific to gbib-proxy.intranet.url/TrackHub/
    <Location /Trackhub>
        # Our network disk has the data in a different layout; the users build a
        # "trackhub" subdir with the UCSC layout, containing symlinks to the actual data.
        Options +FollowSymLinks
        ProxyPass !
        # Allow access to this trackhub from the UCSC VM ...
        Allow from <GBiB-ip>
        # ... and from nowhere else ...
        Deny from all
        Satisfy any
    </Location>

    # Configure the proxy so that visiting gbib-proxy.intranet.url asks for a
    # password, then transparently forwards to gbib-vm.
    # General settings
    <Location />

        # Settings to access the central LDAP password authentication server
        AuthLDAPBindDN "<YOUR LDAP SERVER>"
        AuthLDAPBindPassword YOUR-LDAP-PASSWORD
        AuthLDAPURL YOUR-LDAP-URL
        AuthLDAPGroupAttributeIsDN on
        # Basic = insecure (leaks passwords to spies) unless you use HTTPS or intranet only.
        AuthType Basic
        AuthBasicProvider ldap
        AuthName "your LDAP account (member of GBiB-group)"

        # Don't allow all employees access, only those with special GBiB permissions
        Require ldap-group <CN=...DN=... name for GBiB group>

        # Proxy settings. Users won't realise that "gbib-vm.intranet.url" exists,
        # they only see "gbib-proxy.intranet.url"
        ProxyPass http://gbib-vm.intranet.url/
        ProxyPassReverse http://gbib-vm.intranet.url/
        Substitute 's!gbib-vm!gbib-proxy!n'
        Substitute 's!gbib%2Dvm!gbib%2Dproxy!n'

        # Speed: allow compression for some of the text files sent by the browser to save bandwidth
        AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/html
        AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/css
        AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/x-javascript
    </Location>
</VirtualHost>
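
The two Substitute rules in the config above rewrite the backend host name inside response bodies, so that links generated by the GBiB point back at the proxy. The `n` flag makes mod_substitute treat the pattern as a fixed string rather than a regular expression, and the second rule catches the URL-encoded form of the hyphen. The Python sketch below only imitates that behaviour to show the effect. The `rewrite_body` helper and the sample HTML are made up for illustration, and the real filter runs inside Apache.

```python
def rewrite_body(body, backend="gbib-vm", frontend="gbib-proxy"):
    """Imitate the two fixed-string Substitute rules on a response body."""
    # Rule 1: plain host name, e.g. in href attributes
    body = body.replace(backend, frontend)
    # Rule 2: URL-encoded host name ("-" written as %2D), e.g. inside query strings
    encoded_backend = backend.replace("-", "%2D")
    encoded_frontend = frontend.replace("-", "%2D")
    return body.replace(encoded_backend, encoded_frontend)


sample = ('<a href="http://gbib-vm.intranet.url/cgi-bin/hgTracks">browser</a> '
          'next=http%3A%2F%2Fgbib%2Dvm.intranet.url')
print(rewrite_body(sample))
```

Without the second rule, encoded URLs embedded in pages would presumably still point at the backend host, which clients cannot reach through the firewall.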


Lina Hultin Rosenberg

Jun 30, 2015, 4:29:28 PM
to Jules Kerssemakers, gen...@soe.ucsc.edu
Dear Jules,

Thank you so much for sharing this information with us, it will for sure be very useful when we are setting it up. We are in the planning phase still and do not know the exact details of the IT infrastructure yet but we will start setting everything up during the fall. Possibly, we will get back to you with some questions then if you don’t mind.

Thank you again and have a nice summer!

Best regards, Lina

Jules Kerssemakers

Jul 1, 2015, 12:20:20 PM
to Lina Hultin Rosenberg, gen...@soe.ucsc.edu
Dear Lina,

You're welcome!

As for IT infrastructure: our GBiB runs in the central DKFZ VM
environment (I think we have VMware vSphere?).
Apparently (so our VM admin tells us) it is possible to convert
between many VM formats.
So he converted the VirtualBox image to something that VMware will accept.

I imagine that Uppsala University already has such a central VM-host
infrastructure; now would be a good time to start having coffee with
their admins :-)

~Jules



Lina Hultin Rosenberg

Aug 23, 2016, 11:50:28 AM
to Jules Kerssemakers, gen...@soe.ucsc.edu, Jonas Söderberg
Dear Jules and UCSC forum,

It was about a year ago that I contacted the UCSC forum to ask for guidance to set up a local UCSC browser on our internal server, to be used with sensitive human sequencing data. I got a detailed description from you Jules on a similar setup, and I very much appreciated your help.

Just to refresh your memory, we are running a genomics analysis platform called BC Genome (developed by BC Platforms) from a local server. The server stores all our data (BAM files, VCF, clinical data files, etc.), which can be accessed through the BC Genome interface; this lets us search, view, filter and analyse the genomic data. BC Genome also has an option to generate BED files that can be viewed in the UCSC Genome Browser. Since we are handling sensitive patient data, we wanted to make a local installation of the UCSC browser.

We have gotten help from our local bioinformatics support with the installation and setup of GBiB (following the instructions at https://genome.ucsc.edu/goldenpath/help/gbib.html as well as your instructions below, Jules), and the IT team at BC Platforms has made the necessary changes to link the software to the local GBiB instead of the public UCSC server. The UCSC link in the software now takes us to the local UCSC main page. The analysis output file in the UCSC format can be imported under My Data/Custom Tracks. However, all tests in our BC|Genome installation give the following error:

"The gateway did not receive a timely response from the upstream server or application."

The same tests worked fine with the public UCSC browser.

BC Platforms does not know the reason for this, neither does the local bioinformatics support nor the IT personnel administrating our server. So I am turning to you to see if you have any input on this.

Thank you very much!


Best regards, Lina



Jules Kerssemakers

Aug 24, 2016, 10:47:30 AM
to Lina Hultin Rosenberg, gen...@soe.ucsc.edu, Jonas Söderberg
Dear Lina, Dear Forum,

I'm glad my instructions back then were of use to you!
As a side note: We have since duplicated the installation here for a
second project, to great satisfaction of the users.

As for your technical problem of tests failing, I can only make some
educated guesses (below).
We have seen nothing similar, but then again we haven't interfaced any
software with the GBiB, only 'dumb' data folders.

As I described back then, the data folders are mounted via NFS to our
authenticating proxy, which then serves them up via apache as simple
http-navigable folders.
(Just as you would use an owned server to serve your data/trackhub to
the public UCSC browser, except on the intranet)
To prevent un-authorised users on the intranet from navigating those
trackhubs, the proxy is set to only allow access to logged-in users and
the GBiB's IP-address.

My educated guess is that, compared to the official UCSC browser, the
big difference is the proxy in-between, so that is the likely cause of
trouble.
Is the BC software talking directly to the GBiB, or is it talking to the
proxy-which-forwards-to-the-GBiB?
Also, and for that we'll need input from the UCSC, maybe the BC software
uses interfaces that were excluded (for size/"lightness") from the GBiB
version of the browser?

As for the "no timely response" error you are quoting:
I have no experience with "BC Genome", and it is unclear to me whether the
error message appears on a BC page or on the UCSC pages.
I'm probably of zero help in debugging this, but the UCSC people can
probably use a more detailed description of the symptoms.
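
One way to collect such symptoms is to time the same request once directly against the GBiB and once through the proxy. The wording of the quoted error matches the stock text of Apache httpd's 504 Gateway Timeout page, which would suggest that some proxy in the chain gave up waiting for its backend, although other software may reuse the phrase. The Python sketch below is only a diagnostic aid under those assumptions. The `probe` and `interpret` helpers are hypothetical, and the host names are the placeholders from the config template, not real addresses.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=60):
    """Fetch url once; return (HTTP status or None, seconds elapsed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code   # e.g. 401 from the proxy, 504 on a gateway timeout
    except OSError:
        status = None       # connection refused, DNS failure, client-side timeout
    return status, time.monotonic() - start


def interpret(direct_status, proxied_status):
    """Turn the two status codes into a rough hint about where to look."""
    if direct_status == 200 and proxied_status == 504:
        return "GBiB answers directly; the proxy gives up waiting (check ProxyTimeout)"
    if direct_status != 200:
        return "GBiB itself is slow or unreachable from this host"
    if proxied_status == 401:
        return "proxy asks for credentials; the client is not authenticating"
    return "no obvious difference between direct and proxied access"


# Example use, run from a host that is allowed to reach both machines:
#   direct, t_direct = probe("http://gbib-vm.intranet.url/cgi-bin/hgTracks")
#   proxied, t_proxy = probe("http://gbib-proxy.intranet.url/cgi-bin/hgTracks")
#   print(direct, t_direct, proxied, t_proxy, interpret(direct, proxied))
```

The elapsed times are worth reporting alongside the status codes, since a failure that always occurs after the same number of seconds usually points at a configured timeout.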

I hope you can get it solved!
~Jules

Lina Hultin Rosenberg

Aug 25, 2016, 12:18:49 PM
to Jules Kerssemakers, gen...@soe.ucsc.edu, Jonas Söderberg
Dear Jules, 

Thank you so much for your answer and the effort to make at least some educated guesses. I understand it is very difficult to debug with so little information.

I have forwarded your response to BC Platforms support and our IT personnel to see if it can at least guide them in the right direction.

Thanks again!

Best regards, Lina