The impact of Auto-Sklearn’s Learning Settings: Meta-learning, Ensembling, Time Budget, and Search Space Size
2021
DARLI-AP 2021: 5th International workshop on Data Analytics solutions for Real-LIfe Applications
Hassan Eldeeb, Oleh Matsuk, Mohamed Maher, Abdelrhman Eldallal, Sherif Sakr
PDF: http://ceur-ws.org/Vol-2841/DARLI-AP_11.pdf
Publisher: CEUR-WS.org (EDBT/ICDT 2021 Workshops, Vol-2841)
With the booming demand for machine learning (ML) applications, it is widely recognized that the number of knowledgeable data scientists cannot scale with the growing data volumes and application needs of our digital world. Therefore, several automated machine learning (AutoML) frameworks have been developed to fill the gap in human expertise by automating most of the process of building an ML pipeline. In this paper, we present a micro-level analysis of the AutoML process by empirically evaluating and analyzing the impact of several learning settings and parameters, i.e., meta-learning, ensembling, time budget, and size of the search space, on performance. In particular, we focus on Auto-Sklearn, the state-of-the-art AutoML framework. Our study reveals that no single configuration of these design decisions achieves the best performance across all conditions and datasets. However, some design parameters, such as the use of ensemble models, consistently yield a statistically significant improvement in performance. Others are only conditionally effective; for example, meta-learning adds a statistically significant improvement only under small time budgets.
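The four settings studied in the paper correspond directly to constructor arguments of Auto-Sklearn's estimator. The sketch below is a minimal illustration of that mapping, assuming the classic auto-sklearn 0.x API (parameter names such as `initial_configurations_via_metalearning`, `ensemble_size`, `time_left_for_this_task`, and `include_estimators` come from that API and may differ in newer releases); the dataset and the chosen values are arbitrary examples, not the configurations evaluated in the paper's experiments.

```python
# Minimal sketch (not the paper's experimental setup): how the four studied
# settings map onto auto-sklearn 0.x constructor arguments.
from autosklearn.classification import AutoSklearnClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoSklearnClassifier(
    time_left_for_this_task=600,                 # time budget (seconds) for the whole search
    per_run_time_limit=60,                       # cap on each single pipeline evaluation
    initial_configurations_via_metalearning=25,  # meta-learning warm start (0 disables it)
    ensemble_size=50,                            # post-hoc ensembling (1 keeps only the best model)
    include_estimators=["random_forest", "gradient_boosting", "sgd"],  # restricts the search space
)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```

Note that more recent auto-sklearn releases express the search-space restriction through an `include` dictionary rather than `include_estimators`; the knobs themselves (time budget, meta-learning warm start, ensemble construction, and component whitelist) remain the same conceptually.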