Hello -
We have been maintaining a partial mirror of the UCSC Genome Browser for a number of years, including only a few organisms (human, mouse, yeast) and excluding much of the wgEncode data that takes so much disk space.
We are now in a position to implement a full UCSC Genome Browser mirror, and have a solution in mind for the file server portion. This dedicated file server would have about 100 TB of available storage for goldenPath and gbdb files, and would be NFS-mounted on the application/DB server over 10 Gigabit Ethernet.
However, I am looking for some advice regarding the application/DB server. This machine would run the MySQL DB for the mirror, so it would need fast local disk with capacity for all the standard UCSC DB tables (~6 TB last time I checked) as well as DB tables for non-public tracks created by us. The server would also host multiple GenBrowse web applications, each with a different view of the underlying database (via local trackDb configurations), so that labs can browse their own data tracks.
Can you suggest specifications for the kind of app/DB server I've outlined? And could you provide specs on the machines you currently use for the standard UCSC Genome Browser?