In a paper published in the journal Digital, researchers introduced a web-based malware detection system built on deep learning. The primary goal was to create a robust model for classifying malware in executable files. The approach relies on static analysis and applies a one-dimensional convolutional neural network (1D-CNN) to portable executable (PE) files. The trained model was integrated into a user-friendly web interface, and the reported experiments supported the effectiveness of the approach in identifying malware within executable files.
Background
Deep learning has gained increasing prominence in recent years, emerging as a central theme across various domains. This trend has led to the development of more extensive and sophisticated models, particularly in detection and classification. These deep-learning techniques are introduced to disentangle complex data patterns and enable classification algorithms to work with learned representations instead of raw data. Traditional classification algorithms encounter performance limitations as the dimensionality of data grows. Thus, focusing on high-level features aids in the successful execution of classification tasks.
Previous research has demonstrated the effectiveness of machine learning (ML) in malware detection, with a recent shift towards deep learning. Traditional signature-based methods have limitations, prompting the adoption of ML and deep learning techniques. Various supervised ML algorithms have been used successfully for malware detection, but deep learning stands out for its feature extraction capabilities.
Additionally, deep learning models deployed in web-based applications have shown versatility, with applications in touchless workspace access control, disaster management through social media analysis, and facial recognition-based attendance systems. These examples highlight their broader utility across domains.
Proposed Method
The proposed method employs a 1D-CNN, chosen for its compatibility with the structural characteristics of PE files. The 1D-CNN captures localized features within the PE dataset, which makes it a practical choice for malware detection. The architecture differs from traditional neural networks in its convolutional layers, which excel at capturing spatial features through convolution operations. These layers play a vital role in modeling complex data patterns, making them especially relevant for the malware classification task.
Each filter corresponds to specific axes in PE files, enabling the extraction of pertinent information. The network consists of a 1D-CNN layer for feature extraction and a fully connected layer for classification. To further enhance feature extraction, an alternative configuration of the 1D-convolutional layer in the proposed model is also explored.
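The pairing of a convolutional layer with a fully connected layer can be illustrated with a small forward pass. The following is a minimal sketch in plain Python, not the authors' model: the function names, the single hand-picked filter, the ReLU and sigmoid activations, and the toy feature vector are all assumptions made for illustration.

```python
import math
from typing import List


def conv1d(signal: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """Slide one filter over the input ("valid" mode, stride 1).

    This is the cross-correlation that deep-learning libraries call convolution.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def sigmoid(x: float) -> float:
    """Squash a score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features: List[float], kernel: List[float], conv_bias: float,
            dense_weights: List[float], dense_bias: float) -> float:
    """Convolutional feature extraction followed by one fully connected output unit."""
    feature_map = relu(conv1d(features, kernel, conv_bias))
    score = sum(w * a for w, a in zip(dense_weights, feature_map)) + dense_bias
    return sigmoid(score)


# Hypothetical, already-normalized header features and hand-picked weights.
features = [0.0, 0.5, 1.0, 0.5, 0.0]
kernel = [1.0, 0.0, -1.0]
feature_map = conv1d(features, kernel)
probability = forward(features, kernel, 0.0, [0.0, 0.0, 2.0], -2.0)
```

In a trained network the filter and dense weights are learned from data, and a layer typically holds many filters rather than one. The sketch only shows how a filter responds to local patterns in neighbouring input positions before the fully connected layer turns the resulting feature map into a single score.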
Experimental Analysis
This section presents the experimental results and discussion. The datasets were normalized to ensure uniformity, and established techniques were evaluated to provide baseline performance. Accuracy is the evaluation metric selected for the study. The technical aspects of the implementation, including the tools used and the specific characteristics of the datasets, are also detailed. The core experimental outcomes are then presented, reporting the accuracy of the developed models. This analysis is the foundation for the discussion and conclusions drawn from the study's findings.
Datasets and Baseline Methods Evaluation
The study employed various PE header datasets, including the CLaMP dataset, the Benign and Malicious PE Files dataset, and MalwareDataSet. Data normalization was applied to ensure consistent scaling. Baseline methods were grouped into ML and deep-learning approaches. The ML methods included Decision Tree (DT) and Support Vector Machine (SVM), while the deep-learning methods encompassed architectures such as a Two-Layer Deep Neural Network (2L-DNN), a Four-Layer Deep Neural Network (4L-DNN), a Seven-Layer Deep Neural Network (7L-DNN), and a One-Dimensional Convolutional Neural Network with Long Short-Term Memory (1D-CNN with LSTM). All models were trained and tested on these datasets and evaluated by accuracy. The 4L-DNN model outperformed the others, with an accuracy of 98.85% on the Benign and Malicious PE Files dataset and MalwareDataSet and 98.37% on the CLaMP dataset.
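The two preprocessing and evaluation steps mentioned above, normalization and accuracy, can be sketched briefly. The summary does not say which normalization scheme the authors used, so min-max scaling is an assumption here; the function names and the toy values are likewise hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column to the range [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy header features on very different scales (hypothetical values).
raw_rows = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
scaled_rows = min_max_normalize(raw_rows)

# 1 = malicious, 0 = benign; three of four predictions match.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

In practice the minimum and maximum of each column are usually computed on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.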
Web-Based System Implementation
A web-based framework was selected for the malware detection system to improve accessibility for Windows users and address their specific cybersecurity needs. This approach bridges the gap between advanced malware detection technology and user requirements. Because of its superior accuracy, the chosen deep-learning model, MODEL1, was deployed in the web-based system. The implementation involved saving the model in the h5 (HDF5) format and building the web interface with HTML, CSS, and JavaScript. The back end was developed using Flask, with the pefile library used for feature extraction. Users can upload a PE file and obtain an immediate classification result.
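To give a sense of what header-based feature extraction involves, the sketch below reads a few COFF header fields from the bytes of an uploaded file. It deliberately uses only Python's standard struct module in place of the pefile library named in the paper, so that it runs without extra dependencies. The function names and the synthetic header are hypothetical, and the fields shown are not necessarily the features the authors fed to their model.

```python
import struct


def parse_pe_header(data: bytes) -> dict:
    """Read a few COFF header fields from the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 24:
        raise ValueError("not a PE file: truncated header")
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, sections, timestamp, _symbol_table, _symbol_count,
     optional_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": optional_size,
        "characteristics": characteristics,
    }


def build_minimal_header(machine: int = 0x14C, sections: int = 3) -> bytes:
    """Build a synthetic DOS + COFF header for demonstration (not a runnable program)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature directly after the DOS header
    coff = struct.pack("<HHIIIHH", machine, sections, 0, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + b"PE\x00\x00" + coff


header_fields = parse_pe_header(build_minimal_header())
```

In a deployment like the one described, a Flask upload handler would pass the uploaded bytes to an extraction step of this kind, normalize the resulting values the same way as during training, and hand them to the saved model for prediction.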
Comparison with Existing Web-Based Systems
A comparison with existing web-based systems reveals that the proposed web-based malware detection system is a pioneering approach in the field. While other systems primarily focus on applications such as facial recognition and disaster management, the proposed system is the first of its kind designed specifically for malware detection. Unlike some existing systems that rely solely on deep learning, this approach evaluates a variety of ML and deep-learning models, which contributes to its robustness and effectiveness in malware detection.
Conclusion
In summary, this paper introduces an effective deep-learning approach for classifying malicious software based on PE files using a 1D-CNN. While the results demonstrate the method's effectiveness, limitations exist. The ever-evolving nature of malware and the reliance on header-based features may pose challenges. Future research will focus on incorporating diverse malicious data to improve detection and classify malware into distinct types. This will provide deeper insights for security professionals and system administrators to enhance countermeasures.
Journal reference:
- Alqahtani, A., Azzony, S., Alsharafi, L., & Alaseri, M. (2023). Web-Based Malware Detection System Using Convolutional Neural Network. Digital, 3(3), 273–285. https://doi.org/10.3390/digital3030017, https://www.mdpi.com/2673-6470/3/3/17