Novel Method for Realistic Evaluation of ML Models in Software Bug Assignments

In an article published in the journal Scientific Reports, researchers examined techniques for building train-and-test datasets used to evaluate machine learning (ML) models for software bug assignment. The study proposed a novel method that respects the real-world conditions under which applications are used and reflects plausible scenarios of bug report handling.

Study: Novel Method for Realistic Evaluation of ML Models in Software Bug Assignments. Image credit: Alexander56891/Shutterstock

Background

ML is a branch of artificial intelligence (AI) that enables computers to learn from data and make predictions or decisions. ML models can be trained on various types of data, such as text, images, audio, or numerical values, and can perform tasks such as classification, regression, clustering, or recommendation. They can be evaluated using different metrics, such as accuracy, precision, recall, or F1-score, depending on the task and the data. However, the values these metrics take depend on how the data is split into the training and testing sets used to build and evaluate the ML model.
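As a minimal illustration of how such metrics are computed on a held-out test set, the Python sketch below trains a classifier on synthetic data with scikit-learn; the dataset, model, and averaging choices are assumptions made for demonstration, not details from the study.

    # Minimal sketch: metric values depend on the train/test split.
    # Synthetic data stands in for bug-report features and team labels.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred, average="macro"))
    print("recall   :", recall_score(y_test, y_pred, average="macro"))
    print("f1-score :", f1_score(y_test, y_pred, average="macro"))

Re-running the same code with a different random_state for the split will generally change all four numbers, which is exactly the sensitivity the study scrutinizes.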

Software bug reports are documents that describe unexpected or faulty behavior of a software system, such as a malfunction, a security vulnerability, or a performance issue. They are usually created by testers or users and must be assigned to the appropriate group of professionals for further analysis and resolution.

ML techniques such as classification, clustering, or ranking can automate or assist the bug assignment process by learning from historical data and predicting the best group for a new bug report. However, evaluating such techniques requires careful consideration of the temporal aspects of the data, such as the dates on which bug reports are filed and resolved, as well as the introduction of new features, error codes, or configuration parameters, all of which may change over time and affect the features and labels of the data.
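To make the idea concrete, the sketch below shows how a bug-assignment classifier could learn from historical report text; the example reports, team labels, and choice of model are invented for illustration and are not taken from the paper.

    # Sketch: assigning bug reports to teams via text classification.
    # Reports and team labels are hypothetical, not from the study.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    historical_reports = [
        "App crashes when opening the settings page",
        "Login token expires immediately after sign-in",
        "Dashboard takes over a minute to load",
        "Buffer overflow in the packet parser",
    ]
    assigned_teams = ["ui", "auth", "performance", "security"]

    # Weight terms by TF-IDF, then fit a simple probabilistic classifier.
    triage_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    triage_model.fit(historical_reports, assigned_teams)

    # Suggest a team for a newly filed report.
    new_report = ["Password reset link expires before it can be used"]
    print(triage_model.predict(new_report))

With only four training examples the prediction is obviously unreliable; the point is the workflow: the model sees only past reports and their assignments, then labels unseen ones.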

About the Research

This paper examines the limitations of existing methods for splitting the data into training and testing sets, such as random split with shuffling, cross-validation, split by creation date, time-based split by reporting date, and time-based split by solving date. These methods do not jointly account for both the reporting and resolving dates of the bug reports, which are essential for reflecting real-world scenarios and the production conditions of the applications, and this omission affects the accuracy and reliability of the evaluation results.

To address this gap, the study introduces a novel method for splitting the data using single or multiple division points based on the reporting and solving dates. This method ensures that the train set contains only cases resolved before the date of prediction, while the test set comprises only cases reported after that date, i.e., the date when the ML model would assign a new software bug report. In this way, the method mimics the real use cases of the applications and avoids the unrealistic use of future data for training or testing.
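A minimal sketch of this splitting rule follows, assuming each bug report carries reporting and solving timestamps; the column names, data, and prediction date are illustrative assumptions rather than details from the paper.

    # Sketch of a time-aware split: train only on reports already solved
    # before the prediction date; test only on reports filed after it.
    # Column names and dates are hypothetical.
    import pandas as pd

    def time_aware_split(reports: pd.DataFrame, prediction_date: pd.Timestamp):
        train = reports[reports["solved_at"] < prediction_date]
        test = reports[reports["reported_at"] >= prediction_date]
        return train, test

    reports = pd.DataFrame({
        "reported_at": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-04-01"]),
        "solved_at":   pd.to_datetime(["2023-01-20", "2023-03-15", "2023-04-20"]),
        "team":        ["ui", "auth", "security"],
    })
    train, test = time_aware_split(reports, pd.Timestamp("2023-03-01"))
    print(len(train), "train reports;", len(test), "test reports")

Note that a report filed before the prediction date but still unsolved on that date lands in neither set, which appears consistent with the method's intent: its label would not yet have been available for training.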

The authors illustrate the novel method with an example diagram and compare it with the existing methods using a real-world dataset of bug reports from a telecom company. They show that the novel method produces significantly lower accuracy, precision, and recall than the existing methods, but that it yields more realistic and reliable estimates of how a model will perform when trained on historical data and tested on new data.

Moreover, their method can avoid the problems of data leakage, overfitting, or underfitting, which may occur with the state-of-the-art methods. Standard evaluation metrics, namely accuracy, precision, recall, and F1-score, are used to assess the performance of the novel method.

The experimental protocol for the bug-assignment task compared four evaluation methods: random split with shuffling (20% held out for testing), cross-validation (5 folds), split by date of reporting (8 months for training, 2 months for testing), and the novel split by date of solving (8 months for training, 2 months for testing). The four setups are sketched side by side below.
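In the hedged sketch that follows, the synthetic data, column names, and cutoff date are assumptions chosen only to make the comparison concrete; the actual study used a telecom company's bug-report dataset.

    # Sketch of the four evaluation setups; all data and dates are
    # illustrative assumptions, not values from the study.
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = make_classification(n_samples=300, random_state=0)

    # 1) Random split with shuffling, 20% held out for testing.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=0)

    # 2) Five-fold cross-validation.
    cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

    # For the time-based setups, assume each report row carries timestamps.
    reports = pd.DataFrame({
        "reported_at": pd.to_datetime(["2023-01-05", "2023-06-10", "2023-10-01"]),
        "solved_at":   pd.to_datetime(["2023-02-01", "2023-09-15", "2023-10-20"]),
    })
    cutoff = pd.Timestamp("2023-09-01")  # assumed 8-month/2-month boundary

    # 3) Split by date of reporting.
    train_by_report = reports[reports["reported_at"] < cutoff]
    test_by_report = reports[reports["reported_at"] >= cutoff]

    # 4) Novel method: train only on reports already solved before the
    #    cutoff; test on reports filed after it.
    train_novel = reports[reports["solved_at"] < cutoff]
    test_novel = reports[reports["reported_at"] >= cutoff]

Setups 1 and 2 let future information leak into training; setups 3 and 4 do not, and setup 4 additionally guarantees that every training label was actually known by the cutoff.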

Research Findings

The outcomes show that the previously used evaluation methods are not appropriate, as they do not reflect real use cases and may overestimate model performance, because the test set can contain software bug reports that are similar to those in the train set. The authors conducted experiments using the novel method of building train-and-test datasets based on the time dependencies between the reporting and solving dates of the bugs.

They showed that the new method produces significantly different and more realistic results than the standard methods in real use scenarios, where software bug reports constantly change and evolve. Moreover, they analyzed the impact of different factors on the performance of their method, such as the size of the train set, the duration of the test set, the number of classes, and the distribution of classes.

The present research has potential applications in fields such as telecommunications, software quality prediction, software defect prediction, and software maintenance prediction, where software bug reports are common and must be handled carefully. By providing a more realistic and reliable evaluation technique for ML models, it can improve the automation of the bug-report assignment process and help practitioners develop and maintain bug-free software applications and systems.

Conclusion

In summary, the paper presents a novel, time-dependent method for building train-and-test evaluation datasets that ensures the test set contains only bug reports created after the latest solving date of the bug reports in the train set. The authors compared the proposed method with state-of-the-art methods and showed that the results are significantly different and more realistic.

Therefore, the proposed method is more appropriate and reliable for evaluating ML models for software bug report assignment. The authors acknowledged some challenges with their method, such as the difficulty of choosing the optimal date of prediction, the possibility of data imbalance or sparsity, and the need to update the ML model frequently. They suggested future directions for research, such as using more complex and dynamic ways of selecting the division points and applying the method to other domains and tasks involving temporal data.


Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.
