Guest: Erin LeDell
Title: Scalable Machine Learning with H2O & Systems Approach to Algorithm Development
Abstract: The focus of this presentation is the scalable and distributed machine learning platform, H2O. The multi-node distributed algorithms (GLM, Random Forest, GBM, DNNs, etc) can train on datasets which are larger than RAM (of a single machine), and H2O integrates with other 'big data' systems, Hadoop and Spark. H2O is engineered for production use cases with a focus on fast training and prediction speeds. The second part of the talk will discuss a systems approach to developing novel machine learning algorithms such as H2O AutoML. Unlike well-defined ML algorithms (e.g. GBM), an 'AutoML' algorithm is an automated process which aims to train the best model (or ensemble) in a specified amount of time. I will discuss our methodology for experimentation and validation of new strategies or changes to the algorithm, using a benchmark-driven systems approach.
Bio: Erin LeDell is the Chief Machine Learning Scientist at H2O.ai. Her research focuses on automatic machine learning, ensemble machine learning and statistical computing. Before joining H2O.ai, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security, the founder of DataScientific, Inc. She received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley.