Review and Examination of Input Feature Preparation Methods and Machine Learning Models for Turbulence Modeling
Model extrapolation to unseen flows is one of the biggest challenges facing data-driven turbulence modeling, especially for models with high-dimensional inputs that involve many flow features. In this study we review previous efforts on data-driven Reynolds-Averaged Navier-Stokes (RANS) turbulence modeling and model extrapolation, with a main focus on popular methods from the field of transfer learning. Several potential metrics to measure the dissimilarity between training flows and testing flows are examined. Different Machine Learning (ML) models are compared to understand how the capacity or complexity of a model affects its behavior in the face of dataset shift. Data preprocessing schemes that are robust to covariate shift, such as normalization, transformation, and importance-reweighted likelihood, are studied to understand whether it is possible to find projections of the data that attenuate the differences between the training and test distributions while preserving predictability. Three metrics are proposed to assess the dissimilarity between training and testing datasets. To attenuate this dissimilarity, a distribution-matching framework is used to align the statistics of the two distributions. These modifications also improve the accuracy of the regression tasks in forecasting under-represented extreme values of the target variable. These findings are useful for future ML-based turbulence models to evaluate their predictability and provide guidance for systematically generating diversified high-fidelity simulation databases.
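The covariate-shift correction mentioned above can be illustrated with a minimal sketch: training samples are reweighted by an estimate of the density ratio w(x) = p_test(x)/p_train(x) before fitting a regressor, so that the fit is tilted toward the region the test flow occupies. The 1-D Gaussian density estimates, the synthetic data, and the linear model below are illustrative assumptions for exposition, not the specific scheme of the paper.

```python
# Sketch of importance-reweighted regression under covariate shift.
# Assumption: both feature distributions are approximated by fitted
# 1-D Gaussians; the paper's actual features and models are not used here.
import numpy as np

rng = np.random.default_rng(0)

# Training inputs centered at 0, test inputs shifted to 1 (covariate shift).
x_train = rng.normal(0.0, 1.0, 500)
x_test = rng.normal(1.0, 1.0, 500)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=x_train.size)


def gaussian_pdf(x, mu, sigma):
    """Density of a fitted 1-D Gaussian, used as a simple density estimator."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# Estimate the importance weights w(x) = p_test(x) / p_train(x).
w = gaussian_pdf(x_train, x_test.mean(), x_test.std()) / gaussian_pdf(
    x_train, x_train.mean(), x_train.std()
)

# Weighted linear least squares: features [1, x], per-sample weights w.
X = np.column_stack([np.ones_like(x_train), x_train])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
print(coef)  # intercept and slope, biased toward the test-covered region
```

Downweighting training points that are unlikely under the test distribution is one concrete way the reviewed preprocessing schemes attenuate dataset shift while preserving predictability.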