In an article published in the journal Scientific Reports, researchers proposed a traffic-prediction model that uses deep learning to identify large flows (elephant flows) and prevent network congestion in software-defined networks (SDN).
The paper focuses mainly on approaches to enhancing quality of service (QoS) for real-time applications such as video streaming and voice over internet protocol (VoIP) calls, which generate large data flows and require high bandwidth and low latency.
Background
SDN is a network architecture that separates the control plane from the data plane, enabling centralized and programmable control of network resources and behavior. This network architecture can potentially improve the QoS for various real-time applications.
However, one challenge in SDN is handling elephant flows, which are large and long-lived. If they are not managed appropriately, elephant flows can cause congestion that degrades network performance and user experience. It is therefore essential to detect and predict elephant flows in advance and allocate optimal routes for them, mitigating potential issues and keeping network performance consistently high.
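The article does not say how optimal routes would be allocated. As a minimal, hypothetical illustration only (not the researchers' method), the sketch below greedily places each flow predicted to be an elephant on the candidate path whose utilization would be lowest after adding it, while mice flows stay on a default path. The `Path` class, the `place_flow` helper, and the capacities in Mbit/s are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical candidate route with capacity and current load in Mbit/s."""
    name: str
    capacity_mbps: float
    load_mbps: float = 0.0


def place_flow(paths: list[Path], rate_mbps: float, is_elephant: bool) -> Path:
    """Assign a flow to a path (hypothetical greedy policy).

    Elephant flows go to the path with the lowest projected utilization
    (load after adding the flow divided by capacity); mice flows take
    the first, default path. The chosen path's load is updated in place.
    """
    if is_elephant:
        chosen = min(
            paths,
            key=lambda p: (p.load_mbps + rate_mbps) / p.capacity_mbps,
        )
    else:
        chosen = paths[0]
    chosen.load_mbps += rate_mbps
    return chosen
```

With paths A (100), B (100), and C (40), a 60 Mbit/s elephant lands on A, a following 50 Mbit/s elephant lands on B because A would then be over capacity, and a 1 Mbit/s mouse flow stays on A. A greedy rule like this is cheap enough to run per flow in a controller, at the cost of ignoring future arrivals.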
About the Research
The researchers designed a traffic-prediction model that identifies elephant flows in SDN so that congestion can be prevented before it occurs. The model uses deep learning algorithms to learn from historical traffic data and predict the probability of elephant flows in real time. They used SHapley Additive exPlanations (SHAP), an explainable artificial intelligence (XAI) technique, to explain the model's predictions and to identify the features that most influence them. Their SDN dataset contains 104,345 rows and 23 columns (attributes) describing traffic flows, such as packet size, byte count, protocol, and duration. The dataset was generated with the Mininet emulator and a Ryu controller.
Methodology
The researchers proposed a three-step methodology to develop and evaluate the traffic-prediction model. First, they clustered the traffic data into elephant and mice flows using an unsupervised algorithm from H2O, an open-source machine learning platform, which automatically labels traffic flows based on their characteristics. Second, they removed anomalies from the data using a deep autoencoder neural network, which learns to reconstruct its input with minimal error so that records it reconstructs poorly can be flagged as anomalous. Finally, they trained and tested the model with three supervised algorithms: distributed random forest (DRF), gradient boosting machine (GBM), and eXtreme Gradient Boosting (XGBoost).
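H2O provides several unsupervised algorithms, and the article does not specify which one produced the elephant/mice labels. As a stand-in for the first step, the sketch below runs k-means with k = 2 on the logarithm of each flow's byte count; the function name `two_means_labels`, the log transform, and the use of byte count as the only feature are assumptions for illustration.

```python
import math


def two_means_labels(byte_counts: list[int], iterations: int = 50) -> list[str]:
    """Label flows 'elephant' or 'mice' with 1-D k-means (k = 2).

    Hypothetical stand-in for the paper's H2O-based unsupervised labeling.
    Clustering is done on log10(bytes + 1) so that a few huge flows do not
    dominate the distances.
    """
    xs = [math.log10(b + 1) for b in byte_counts]
    lo, hi = min(xs), max(xs)  # initialise the two centroids at the extremes
    for _ in range(iterations):
        low_cluster = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        high_cluster = [x for x in xs if abs(x - lo) > abs(x - hi)]
        new_lo = sum(low_cluster) / len(low_cluster) if low_cluster else lo
        new_hi = sum(high_cluster) / len(high_cluster) if high_cluster else hi
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving: converged
        lo, hi = new_lo, new_hi
    # Flows closer to the high centroid are labeled elephants.
    return ["elephant" if abs(x - lo) > abs(x - hi) else "mice" for x in xs]
```

For example, byte counts of 500, 800, 1,200, 50,000,000, 80,000,000, and 900 are labeled mice, mice, mice, elephant, elephant, mice. Real flow data would typically use several features (duration, packet count, rate), which this one-feature sketch deliberately omits.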
Metrics such as accuracy, precision, and loss were used to evaluate the models. The predictions were then explained using SHAP, which quantifies each feature's contribution to a prediction and visualizes feature importance and effects.
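The article does not say which loss function was reported. The sketch below assumes binary log loss (a common choice for probabilistic classifiers) and shows how accuracy, precision, and log loss could be computed from elephant-flow probabilities; the `evaluate` and `clip` helpers and the 0.5 decision threshold are illustrative assumptions.

```python
import math


def clip(p: float, eps: float = 1e-15) -> float:
    """Keep a probability strictly inside (0, 1) so log() is always defined."""
    return min(max(p, eps), 1.0 - eps)


def evaluate(y_true: list[int], p_pred: list[float],
             threshold: float = 0.5) -> dict[str, float]:
    """Accuracy, precision and binary log loss (assumed loss function).

    y_true: 1 = elephant flow, 0 = mice flow.
    p_pred: predicted probability that each flow is an elephant.
    """
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for t, h in pairs if t == 1 and h == 1)  # true positives
    fp = sum(1 for t, h in pairs if t == 0 and h == 1)  # false positives
    correct = sum(1 for t, h in pairs if t == h)
    n = len(y_true)
    log_loss = -sum(
        t * math.log(clip(p)) + (1 - t) * math.log(1.0 - clip(p))
        for t, p in zip(y_true, p_pred)
    ) / n
    return {
        "accuracy": correct / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "log_loss": log_loss,
    }
```

Accuracy and precision depend only on the thresholded labels, whereas log loss also penalizes confident wrong probabilities, which is why a model can reach 100% accuracy and still report a small nonzero loss.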
Research Findings
The findings show that the clustering model labeled the traffic data as elephant and mice flows with 39.11% accuracy, which is close to the manual label assignment. Additionally, the autoencoder model separated normal traffic flows from abnormal ones (that is, removed anomalies) using a threshold value of 0.091.
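Autoencoder-based anomaly removal usually compares each record with its reconstruction and drops records whose error exceeds a threshold. The sketch below assumes, without confirmation from the article, that 0.091 is a threshold on per-row mean squared reconstruction error. The reconstructions are passed in directly rather than produced by a trained autoencoder, and the helper names are hypothetical.

```python
def reconstruction_mse(row: list[float], reconstructed: list[float]) -> float:
    """Mean squared error between a feature row and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(row, reconstructed)) / len(row)


def split_by_reconstruction_error(
    rows: list[list[float]],
    reconstructions: list[list[float]],
    threshold: float = 0.091,  # assumed to be a reconstruction-MSE threshold
) -> tuple[list[list[float]], list[list[float]]]:
    """Split rows into (normal, anomalous) by reconstruction error.

    In practice `reconstructions` would be the output of an autoencoder
    trained on the traffic features; here they are supplied by the caller.
    """
    normal: list[list[float]] = []
    anomalous: list[list[float]] = []
    for row, rec in zip(rows, reconstructions):
        if reconstruction_mse(row, rec) <= threshold:
            normal.append(row)
        else:
            anomalous.append(row)
    return normal, anomalous
```

Because an autoencoder trained mostly on typical flows reconstructs them well, unusual records stand out with large errors; only the normal rows would then be passed to the supervised classifiers.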
The researchers reported the accuracy and loss of every implemented model on both the training and validation datasets. The DRF model was the best performer, achieving 100% accuracy with a loss of 0.00000408. SHAP helped explain how the models make predictions and which features most strongly affect the outcomes. Protocol, source address, destination address, packet count, and byte count had the highest impact on predicting elephant flows, and the researchers visualized the positive and negative contributions of each feature using force plots and summary plots.
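SHAP's force and summary plots are built from per-feature Shapley values. To show what those values mean, the sketch below computes them exactly by enumerating every coalition of features, replacing absent features with a single baseline point. The toy scoring function `toy_score` and its three features are hypothetical; the SHAP library itself uses faster estimators, such as TreeSHAP for tree ensembles like DRF, GBM, and XGBoost, and usually a background dataset rather than one baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by brute force over all feature coalitions.

    Features outside a coalition take their baseline value. This needs
    O(2^n) model calls, so it only suits a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_score(z: Sequence[float]) -> float:
    """Hypothetical elephant-flow score: two linear terms plus an interaction."""
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]
```

For `toy_score` at x = [4, 3, 1] with an all-zero baseline, the values are [4, 6, 2]: the interaction term's contribution of 4 is split equally between features 0 and 2, and the three values sum to f(x) - f(baseline) = 12, which is the additivity a force plot displays.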
Applications
This research has potential applications in various domains, including electricity, communications, and informatics. Specifically, the proposed model can be used to predict and explain traffic patterns in settings such as sensor networks, cellular networks, vehicular networks, and multimedia networks, and in applications such as VoIP, video streaming, online gaming, and the Internet of Things (IoT). It can also help ensure that QoS requirements for latency, bandwidth, and packet loss are met.
Conclusion
In conclusion, the paper shows that a traffic-prediction model combining deep learning and XAI techniques can identify elephant flows with high accuracy and low loss, helping to prevent network congestion in SDN. It also shows how SHAP can provide detailed explanations of the model's predictions and of feature importance.
Based on these findings, the traffic-prediction and explanation model can be integrated into an SDN controller or switch for real-time, adaptive traffic management. Overall, the proposed model performs well, but its performance could be improved further with more advanced deep learning algorithms or more data.