Computer vision‐based recognition and distinction of Arabidopsis thaliana ecotypes using supervised deep learning models
Image‐based plant phenotyping has diverse applications, ranging from providing quantitative traits for genetic breeding to enhancing management practices for indoor and outdoor production systems. Misidentification of cell lines or ecotypes/varieties is a major problem across all biological research disciplines. With the 1000 Arabidopsis Genome Project facilitating the use of various ecotypes, it is crucial to verify ecotype identity in discovery‐based genetic screens involving hundreds of ecotypes. To address this issue, an RGB image analysis pipeline was established for the accurate recognition of different Arabidopsis thaliana ecotypes. Three aspects of the pipeline proved most crucial for accurately capturing traits and training deep learning models: (i) assessment of data complexity using spatial‐temporal features of the RGB spectrum and data entropy, the latter defined as the variability within the dataset; (ii) data redefinition in instances of high data complexity; and (iii) data partitioning based on extracted morphological similarity among ecotype replicates. The pipeline includes several supervised deep learning models integrated into an auto‐optimization subsystem. Extensive hyperparameter tuning was performed to identify the best‐performing models for single‐image and image‐sequence ecotype classification. The pipeline was evaluated on two external datasets to demonstrate that it is robust to differences in how images are collected. For cases of extreme variability, a graphical user interface is provided to prepare such images for input into the pipeline. Using datasets from a variety of sources, the pipeline can automatically verify ecotypes in large‐scale studies and extract traits for further analysis and correlation as needed.
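As an illustration of aspect (i), the following is a minimal sketch of one way the variability within an RGB dataset could be quantified. It assumes that Shannon entropy of the per-channel intensity histograms is an acceptable proxy for "data entropy"; the abstract does not state the formula used in the pipeline, and the function names `channel_entropy` and `dataset_entropy_spread` are hypothetical.

```python
import numpy as np


def channel_entropy(image: np.ndarray, bins: int = 256) -> np.ndarray:
    """Shannon entropy (in bits) of each channel of an 8-bit H x W x C image.

    Assumption: histogram entropy stands in for the paper's "data entropy";
    the pipeline's actual measure is not given in the abstract.
    """
    entropies = []
    for c in range(image.shape[-1]):
        # Histogram of intensities 0..255 for this channel.
        counts, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        p = counts / counts.sum()
        p = p[p > 0]  # skip empty bins, since 0 * log(0) is taken as 0
        entropies.append(abs(float(-(p * np.log2(p)).sum())))
    return np.array(entropies)


def dataset_entropy_spread(images):
    """Mean and standard deviation of per-channel entropy across a dataset.

    A large standard deviation suggests heterogeneous images, i.e. a dataset
    that may be a candidate for redefinition before training.
    """
    per_image = np.stack([channel_entropy(img) for img in images])
    return per_image.mean(axis=0), per_image.std(axis=0)


if __name__ == "__main__":
    # Synthetic stand-ins for plant images (not real data).
    rng = np.random.default_rng(0)
    uniform = np.zeros((64, 64, 3), dtype=np.uint8)
    noisy = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    mean, std = dataset_entropy_spread([uniform, noisy])
    print("mean entropy per channel:", mean)
    print("std of entropy per channel:", std)
```

Under this assumption, a uniformly coloured image has zero entropy in every channel, whereas an image split evenly between two intensities has exactly one bit per channel. The spread of these values across a dataset gives a rough, collection-independent indication of how variable the images are.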