Data Selection to Train Machine Learning Models and Forecast Bitcoin Prices: Depth vs. Width
This study analyzes how the selection of training data affects machine learning (ML) models that forecast Bitcoin prices. Specifically, we train elastic net regularization models on two datasets with almost the same total number of observations. One dataset emphasizes years of observations (depth) over the number of variables, while the other emphasizes the number of variables (width) over years of data. Our results suggest that the dataset with longer historical time series and fewer variables yields a lower forecasting error than the dataset with shorter time series and more variables. These results may help practitioners identify data selection strategies for training ML-based forecasting models.
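The depth-versus-width comparison can be illustrated with a small, self-contained sketch. It assumes a glmnet-style elastic net objective, (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2), fitted by cyclic coordinate descent. It also assumes a chronological train/test split and uses purely synthetic data. The function names (`elastic_net_cd`, `make_synthetic`, `evaluate`), the hyperparameters, and the dataset shapes are hypothetical and are not the study's actual data, variables, or settings. The errors the sketch prints reflect only the synthetic data and say nothing about Bitcoin prices.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator that implements the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Elastic net via cyclic coordinate descent (assumed glmnet-style objective).
    Returns (intercept, coefficients); features should be roughly standardized."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered features
    beta = [0.0] * p
    resid = [v - y_mean for v in y]  # residual yc - Xc @ beta, starting at beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new_bj
    intercept = y_mean - sum(m * b for m, b in zip(x_means, beta))
    return intercept, beta

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def make_synthetic(n_rows, n_cols, seed=0):
    """Hypothetical stand-in dataset: Gaussian features, sparse linear target plus noise."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0, 1) if j < 3 else 0.0 for j in range(n_cols)]
    X = [[rng.gauss(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.5) for row in X]
    return X, y

def evaluate(n_rows, n_cols, train_frac=0.8, seed=0, **kw):
    """Train on the earliest rows and test on the latest ones (no shuffling)."""
    X, y = make_synthetic(n_rows, n_cols, seed)
    cut = int(n_rows * train_frac)
    b0, beta = elastic_net_cd(X[:cut], y[:cut], **kw)
    preds = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X[cut:]]
    return rmse(y[cut:], preds)

if __name__ == "__main__":
    depth = (1000, 4)   # more rows (longer history), fewer variables
    width = (200, 20)   # fewer rows (shorter history), more variables
    assert depth[0] * depth[1] == width[0] * width[1]  # same total observations
    print("depth RMSE:", evaluate(*depth))
    print("width RMSE:", evaluate(*width))
```

Holding the product of rows and columns fixed is what makes the two datasets comparable in size. The split is chronological rather than random so that, as in forecasting, the model is never trained on observations that come after the ones it is tested on.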