Modelling Stand Variables of Beech Coppice Forest Using Spectral Sentinel-2A Data and the Machine Learning Approach
Background and Purpose: Coppice forests have a particular socio-economic and ecological role in forestry and environmental management. Their production sustainability and spatial stability are imperative for the forestry sector as well as for local and global communities. Recently, forest inventory data integrated with remotely sensed data and analysed with non-parametric statistical methods have enabled more detailed insight into forest structural characteristics. The aim of this research was to estimate forest attributes of beech coppice forest stands in the Sarajevo Canton through the integration of inventory and Sentinel-2A satellite data using machine learning methods.
Materials and Methods: Basal area, mean stand diameter, growing stock and total volume were determined from a forest inventory designed for representative stands of coppice forests. Spectral data were collected from the bands of a Sentinel-2A satellite image, vegetation indices (difference, normalized difference and ratio vegetation index) and biophysical variables (fraction of absorbed photosynthetically active radiation, leaf area index, fraction of vegetation cover, leaf chlorophyll content and canopy water content). Two machine learning methods, the rule-based M5 model tree (M5P) and random forest (RF), were used for forest attribute estimation. Predictor subsets were selected with a wrapper approach using the M5P and RF learning schemes. Models were developed on training data subsets (402 sample plots) and evaluated on validation data subsets (207 sample plots). Model performance was evaluated by the root mean squared error expressed as a percentage of the mean value (rRMSE) and by the square of the correlation coefficient between the observed and estimated stand variables.
Results and Conclusions: Predictor subset selection resulted in a varied number of predictors across forest attributes and methods, with more predictors selected for RF (between 8 and 11). Spectral biophysical variables dominated the subsets. RF produced smaller errors than M5P on the training sets for all attributes, while both methods delivered very high errors on the validation sets (rRMSE of 50% or higher). The lowest rRMSE, 50%, was obtained for stand basal area. The observed variability explained by the M5P and RF models in the training subsets was about 30% and 95%, respectively; in the validation subsets these values were much lower (below 12%), though still significant. Differences between the sample and modelled means of the forest attributes were not significant, while the modelled variability was significantly lower for all forest attributes (p<0.01). Additional information appears to be needed to increase prediction accuracy: stand information (management classes, site class, soil type, canopy closure and others), a new sampling strategy and new spectral products could be integrated and examined in further, more complex modelling of forest attributes.
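The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that rRMSE is the root mean squared error divided by the mean of the observed values and multiplied by 100, and that the second measure is the squared Pearson correlation between observed and estimated values. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def rrmse_percent(observed, predicted):
    """RMSE expressed as a percentage of the observed mean (assumed definition)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_observed = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_observed


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


if __name__ == "__main__":
    # Illustrative basal-area-like values only; not data from the study.
    observed = [18.0, 25.0, 31.0, 12.0, 22.0]
    predicted = [20.0, 23.0, 26.0, 17.0, 21.0]
    print(f"rRMSE: {rrmse_percent(observed, predicted):.1f}%")
    print(f"r^2:   {squared_correlation(observed, predicted):.2f}")
```

Computing both measures separately on the training and validation subsets makes a gap such as the one reported for RF (about 95% explained variability in training, below 12% in validation) directly visible.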