Automated Identification of Security Requirements: A Machine Learning Approach
Characterizing security requirements early helps system designers integrate security aspects into the initial architectural design. However, distinguishing security-related requirements from other functional and non-functional requirements can be tedious and error-prone. Machine learning techniques have proven successful in addressing this issue. In this paper, we report an empirical study that evaluates the performance of 22 supervised machine learning classification algorithms and two deep learning approaches in classifying security requirements, using the publicly available SecReq dataset. More specifically, we focus on the robustness of these techniques with respect to the overhead of the pre-processing step. Results show that a long short-term memory (LSTM) network achieved the best accuracy (84%) among the deep learning approaches, while Boosted Ensemble achieved the highest accuracy (80%) among the supervised machine learning algorithms.