Mathematical Formalization of Vectorization and Ensemble Learning Algorithms for Talent Detection in Educational Systems
The development of reliable Decision Support Systems (DSS) for talent identification requires a rigorous analytical framework capable of processing high-dimensional educational data. This paper presents the mathematical formulation of the machine learning pipeline used to classify student potential, focusing on the algebraic structure of the data representation and the optimization of the predictive algorithms. We formally define the mapping of text-valued categorical attributes into sparse binary vector spaces via One-Hot Encoding and analyze its effect on the dimensionality of the feature space. The study details the training dynamics of the classification models, specifically the minimization of the Gini Impurity splitting criterion in Decision Trees and the stochastic sampling and aggregation mechanisms within Random Forest ensembles. Furthermore, to address class imbalance, we formally define the performance metrics, including the F1-score (the harmonic mean of precision and recall) and Global Top-K Accuracy (the arithmetic mean of indicator functions over the evaluated samples). By establishing these mathematical foundations, the paper shows how formal optimization relates directly to the discriminative power and stability of AI-driven educational assessments.
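The quantities named in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not specify an implementation, so everything below is an assumption: the function names (`one_hot`, `gini_impurity`, `split_impurity`, `majority_vote`, `f1_score`, `top_k_accuracy`) and the example labels are hypothetical, and hard majority voting is shown as one common Random Forest aggregation rule, not necessarily the one used in the paper. Bootstrap sampling and tree construction are omitted.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each categorical value to a binary indicator vector.

    A column with m distinct categories becomes m binary columns,
    so One-Hot Encoding expands the feature dimensionality.
    """
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    vectors = []
    for v in values:
        row = [0] * len(vocab)
        row[index[v]] = 1  # exactly one active coordinate per sample
        vectors.append(row)
    return vocab, vectors


def gini_impurity(labels: Sequence[str]) -> float:
    """G(S) = 1 - sum_k p_k^2, with p_k the fraction of class k in S."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(left: Sequence[str], right: Sequence[str]) -> float:
    """Size-weighted Gini impurity of a binary split (the cost a tree minimizes)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def majority_vote(predictions: Sequence[str]) -> str:
    """Hard-vote aggregation of per-tree predictions (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(y_true: Sequence[str], scores: Sequence[Dict[str, float]], k: int) -> float:
    """Mean of the indicator 1[y_i is among the k highest-scored classes]."""
    hits = 0
    for label, s in zip(y_true, scores):
        top_k = sorted(s, key=s.get, reverse=True)[:k]
        hits += int(label in top_k)
    return hits / len(y_true)


if __name__ == "__main__":
    vocab, X = one_hot(["math", "art", "math", "music"])
    print(vocab, X[0])  # ['art', 'math', 'music'] [0, 1, 0]
    print(gini_impurity(["gifted", "gifted", "regular", "regular"]))  # 0.5
    print(split_impurity(["gifted", "gifted"], ["regular", "regular"]))  # 0.0
```

A pure split drives the weighted impurity to zero, which is why a tree greedily selects the split with the lowest `split_impurity`. The F1-score penalizes a classifier that ignores a rare "gifted" class, whereas plain accuracy would not.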