Application of KNN and Decision Tree Classification Algorithms in the Prediction of Education Success from the Edu720 Platform
Data mining is the process of discovering knowledge in a given body of data. It relies on machine learning methods such as regression, classification, and clustering. This study analyzes data from Edu720, a real production system for the internal education of company employees that is used by numerous companies in Bosnia and Herzegovina and the surrounding region. The data set went through extensive preprocessing, including data cleaning and data transformation, so that it could be used in a range of classification tasks. The main goal of the study is to predict the success of an education course that a company wants to set up for its employees. The attributes used by the classification methods were the number of questions in the course, the average number of words per question, the number of employees, and the duration of the educational video resource in seconds. The output class represents the level of success of a given course. K-nearest neighbors and decision tree algorithms were used for classification, and classification accuracy was estimated with the holdout method. The effect of a more sophisticated data set partitioning method based on K-means clustering is also presented.
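As an illustration of the evaluation setup described above, the following is a minimal sketch of K-nearest neighbors classification assessed with the holdout method, written in plain Python. It is not the study's implementation. The toy rows, the labeling rule, the 70/30 split, the choice k = 3, and the min-max scaling step are all assumptions made for this example; only the four attribute types (number of questions, average words per question, number of employees, video duration in seconds) come from the text. The decision tree classifier and the K-means-based partitioning are not shown.

```python
import math
import random
from collections import Counter


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (train, test) parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scaler(train_features):
    """Return a function that rescales features using training ranges only.

    Assumption of this sketch: scaling is applied so that attributes with
    large units (seconds) do not dominate the Euclidean distance.
    """
    columns = list(zip(*train_features))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return lambda x: [(v - lo) / s for v, lo, s in zip(x, lows, spans)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_toy_rows(n=60, seed=1):
    """Generate synthetic rows; these are NOT Edu720 data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        questions = rng.randint(3, 30)      # number of questions
        words = rng.uniform(5, 40)          # average words per question
        employees = rng.randint(10, 500)    # number of employees
        seconds = rng.randint(60, 1800)     # video duration in seconds
        # Arbitrary rule, invented only so the toy data has some structure.
        label = "high" if questions <= 15 and seconds <= 900 else "low"
        rows.append(([questions, words, employees, seconds], label))
    return rows


rows = make_toy_rows()
train, test = holdout_split(rows, test_fraction=0.3)
scale = min_max_scaler([x for x, _ in train])
train_scaled = [(scale(x), y) for x, y in train]
test_scaled = [(scale(x), y) for x, y in test]
holdout_accuracy = accuracy(train_scaled, test_scaled, k=3)
print("holdout accuracy on toy data:", holdout_accuracy)
```

The scaler is fitted on the training part only, so no information from the held-out rows leaks into the model. The printed accuracy reflects the synthetic data and says nothing about results on the Edu720 data set.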