Could normalization improve robustness of abdominal MRI radiomic features?
Radiomics-based systems could improve the management of oncological patients by supporting cancer diagnosis, treatment planning, and response assessment. However, one of their main limitations is the poor generalizability and reproducibility of results when they are applied to images acquired in different hospitals with different scanners. Normalization has been introduced to mitigate this issue, and two main approaches have been proposed: one rescales the image intensities (image normalization), the other rescales the feature distributions of each center (feature normalization). The aim of this study is to evaluate how different image and feature normalization methods affect the robustness of 93 radiomic features extracted from a multicenter, multi-scanner abdominal Magnetic Resonance Imaging (MRI) dataset. To this end, 88 rectal MRIs were retrospectively collected from 3 institutions (4 scanners), and for each patient, six 3D regions of interest on the obturator muscle were considered. The methods applied were min-max, 1st-99th percentile, and 3-Sigma normalization, z-score standardization, mean centering, histogram normalization, Nyul-Udupa, and ComBat harmonization. The Mann-Whitney U-test was applied to assess feature repeatability between scanners by comparing the feature values obtained with each normalization method, as well as with no normalization. Most image normalization methods reduced the overall variability of the intensity distributions, but they worsened feature robustness or affected it unpredictably. The only exception was the z-score, which provided a slight improvement by increasing the number of statistically similar features from 9/93 to 10/93. Conversely, feature normalization methods reduced the overall variability across scanners; in particular, 3-Sigma, z-score, and ComBat increased the number of similar features (79/93).
Our results show that none of the image normalization methods substantially increased the number of statistically similar features.
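To make the comparison concrete, the following is a minimal Python sketch of three of the rescaling schemes named above and of a two-sided Mann-Whitney U-test between two scanners. It is not the study's pipeline. The function names, the exact 3-Sigma definition (clipping to mean ± 3 standard deviations and mapping to [0, 1]), the 0.05 threshold, and the simulated feature values are all assumptions made for illustration. The test is hand-written with a normal approximation so that the sketch needs only the standard library; in practice one would more likely call `scipy.stats.mannwhitneyu`.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def min_max(values):
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def three_sigma(values):
    """Assumed definition: clip to mean +/- 3 SD, then map that range to [0, 1]."""
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i                   # size of this group of tied values
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Simulated values of one feature on two scanners (illustrative data only).
random.seed(0)
scanner_a = [random.gauss(100, 10) for _ in range(30)]
scanner_b = [random.gauss(130, 20) for _ in range(30)]

# Feature normalization here means rescaling each scanner's values separately.
p_raw = mann_whitney_p(scanner_a, scanner_b)
p_norm = mann_whitney_p(z_score(scanner_a), z_score(scanner_b))

ALPHA = 0.05  # assumed threshold for calling a feature "statistically similar"
print(f"raw: p={p_raw:.3g}, similar={p_raw >= ALPHA}")
print(f"per-scanner z-score: p={p_norm:.3g}, similar={p_norm >= ALPHA}")
```

Repeating the same comparison for each of the 93 features and each scanner pair, and counting the features whose p-value stays above the threshold, would give a similarity count analogous to the 9/93, 10/93, and 79/93 figures reported above.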