Application of neural networks to the prediction of a phenotypic trait of Pacific lampreys based on single nucleotide polymorphism (SNP) genetic markers
The relationship between single nucleotide polymorphisms (SNPs) and phenotypes is noisy and cryptic. Complex traits are shaped by a large number of genetic factors and are also influenced by the environment. This makes artificial neural networks (ANNs), as universal approximators of complex functions, a promising tool for modeling that relationship. In this study, we compared different ANN architectures and input parameters for predicting the adult length of Pacific lampreys, which is the primary indicator of their total migratory distance. Feedforward and simple recurrent architectures were compared across different ranges of input parameters and different hidden-layer sizes. The highest-performing ANN achieved an accuracy of 67.5% in discriminating between long and short specimens, with a sensitivity of 62.16% and a specificity of 70.73%. Our results imply that a feedforward ANN architecture with a single hidden neuron is sufficient for this specimen classification problem. Nonetheless, while ANNs are useful for approximating functions with unknown relationships in SNP data, additional work is needed to ensure that the chosen SNP markers lie in functional regions associated with the examined trait, because non-specific markers introduce noise into the dataset.
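To make the "single hidden neuron" architecture and the reported metrics concrete, here is a minimal sketch. It is not the authors' implementation. The abstract does not specify the input encoding, activation functions, training procedure, or which class counts as positive, so the sketch makes several assumptions. Genotypes use additive 0/1/2 coding, both neurons use sigmoid activations, and the network is trained by per-sample gradient descent on cross-entropy loss. The label "long" = 1 is treated as the positive class. All class and function names (`SingleHiddenNeuronNet`, `classification_metrics`) are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SingleHiddenNeuronNet:
    """Hypothetical feedforward net: n SNP inputs -> 1 hidden sigmoid neuron -> 1 sigmoid output.

    Assumes genotypes are coded additively as 0, 1 or 2 copies of the minor allele.
    """

    def __init__(self, n_inputs: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.b_hidden = 0.0
        self.w_out = rng.uniform(-0.5, 0.5)
        self.b_out = 0.0

    def forward(self, x):
        # Hidden activation h, then output probability y of the "long" class.
        h = sigmoid(sum(w * xi for w, xi in zip(self.w_hidden, x)) + self.b_hidden)
        y = sigmoid(self.w_out * h + self.b_out)
        return h, y

    def train(self, X, labels, lr: float = 0.1, epochs: int = 200):
        # Per-sample gradient descent on binary cross-entropy (an assumed training scheme).
        for _ in range(epochs):
            for x, t in zip(X, labels):
                h, y = self.forward(x)
                d_out = y - t  # dLoss/dz_out for sigmoid output + cross-entropy
                d_hidden = d_out * self.w_out * h * (1.0 - h)  # uses w_out before its update
                self.w_out -= lr * d_out * h
                self.b_out -= lr * d_out
                for i, xi in enumerate(x):
                    self.w_hidden[i] -= lr * d_hidden * xi
                self.b_hidden -= lr * d_hidden

    def predict(self, x, threshold: float = 0.5) -> int:
        return int(self.forward(x)[1] >= threshold)


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against classes absent from the evaluation set.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy, made-up genotypes (rows = specimens, columns = SNPs); not data from the study.
    X = [[0, 1, 2], [2, 0, 1], [1, 2, 0], [0, 0, 1], [2, 2, 2], [0, 2, 0]]
    y = [0, 1, 1, 0, 1, 0]
    net = SingleHiddenNeuronNet(n_inputs=3)
    net.train(X, y)
    preds = [net.predict(x) for x in X]
    print(classification_metrics(y, preds))
```

Sensitivity is the fraction of truly long specimens classified as long, TP / (TP + FN). Specificity is the fraction of truly short specimens classified as short, TN / (TN + FP). Accuracy is the overall fraction classified correctly. Which of these corresponds to the figures reported above depends on the positive-class convention used in the study.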