I hope you are having as good a New Year as possible given the current circumstances. We resume our public SCiMMA talks on Tuesday 2nd February at 3pm Eastern/noon Pacific, with a presentation by Mario Juric on "Science Platforms: Enabling Scalable Research in the Big-Dataset Era". Connection details are at the end of this message and the talk abstract is below:
Research in astronomy is undergoing a major paradigm shift, being transformed by the advent of large, automated sky surveys into a data-rich field where PB-sized spatio-temporal datasets are becoming common. At the same time, streaming is becoming commonplace (e.g., to transmit alerts and coordinate follow-up), as is the need to combine and share data and analyses. This presents a challenge to a typical astronomer: how can a domain scientist with little experience in data management or distributed computing take advantage of this data-rich environment? One solution to this problem is scalable, cloud-based "science platforms" -- computing platforms combined with rich gateways exposing server-side code editing, management, execution and result visualization capabilities (usually through Jupyter).
In this talk, I'll discuss the desiderata for a successful science platform, research concepts, present work, and a few solutions we developed and deployed within DiRAC, motivated by the need for ZTF data analysis. I'll also demonstrate some recent research on making science platforms fully scalable and cost-effective, with scaling and live migration for Jupyter notebooks. These developments promise to make arbitrarily large datasets and streams -- residing in cloud-based data lakes -- accessible to CI non-experts, and easy to combine and analyze collaboratively. These systems have the potential to allow the science community to take advantage of the next generation of experiments and datasets.
Adam Brazier, for SCiMMA