Please see below for a thesis project in the area of meta-learning and automated machine learning.
Peter
Massive Scale Mythbusting of Data Science Rules of Thumb with OpenML
Data science practitioners rely on common rules of thumb, but large-scale studies backing up these rules are often lacking. Examples include 'feature selection is more important than algorithm selection' and 'nonlinear models are significantly better than linear models'.
In this project you will pick one or two of these myths and investigate them using OpenML, a large online repository of datasets and machine learning experiment results. For example, you will compare a pipeline A with a variant A' (e.g., with and without feature selection) and run both across hundreds of datasets and a large number of algorithms. This produces a meta-level dataset of results that we can then mine to validate the rule of thumb, or to perform actual meta-level knowledge discovery. Typically this will not yield a simple answer as to whether the rule of thumb holds, but rather a description of when it holds, for example for certain types of dataset/algorithm combinations.
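To make the meta-level analysis step concrete, here is a minimal sketch in plain Python. In practice the results table would be produced by running pipelines on OpenML tasks (e.g., via the `openml` Python package and scikit-learn); the numbers, dataset names, and the high/low-dimensionality split below are purely illustrative, not real experimental results.

```python
# Hypothetical meta-level results: one row per (dataset, algorithm) pair,
# recording test accuracy for pipeline A (with feature selection) and
# the variant A' (without). All values are made up for illustration.
results = [
    # (dataset, n_features, algorithm, acc_with_fs, acc_without_fs)
    ("credit-g",  20, "tree", 0.74, 0.71),
    ("credit-g",  20, "svm",  0.76, 0.75),
    ("madelon",  500, "tree", 0.78, 0.62),
    ("madelon",  500, "svm",  0.61, 0.55),
    ("iris",       4, "tree", 0.94, 0.95),
    ("iris",       4, "svm",  0.96, 0.97),
]

def win_rate(rows):
    """Fraction of runs where feature selection beats no feature selection."""
    wins = sum(1 for _, _, _, with_fs, without_fs in rows if with_fs > without_fs)
    return wins / len(rows)

# Overall answer: does the rule of thumb hold on average?
overall = win_rate(results)

# Meta-level breakdown: does it hold mainly for high-dimensional datasets?
high_dim = [r for r in results if r[1] > 100]
low_dim = [r for r in results if r[1] <= 100]

print(f"overall win rate:  {overall:.2f}")
print(f"high-dimensional:  {win_rate(high_dim):.2f}")
print(f"low-dimensional:   {win_rate(low_dim):.2f}")
```

The breakdown by dataset properties is exactly the kind of "when does it hold" conclusion the project aims for: in this toy example the rule looks strong on high-dimensional data and mixed elsewhere.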