In a paper published in the journal Information, researchers addressed the challenge of gathering data on customer browsing activities in physical retail stores. They proposed integrating Radio-Frequency Identification (RFID) technology into store shelves and analyzing the resulting data with machine learning models. They tracked product movement with RFID tags, collected customer behavior data from the received signal strength (RSS) of the tags, and classified shopping activities with a machine learning pipeline that combined Isolation Forest (iForest) outlier detection, Adaptive Synthetic Sampling (ADASYN) data balancing, and a Multilayer Perceptron (MLP) classifier.
The results showed significant improvements in accuracy, precision, specificity, recall, and F1-score compared with other models. The integrated model was also demonstrated in a web-based application that gives store managers insights into customer preferences to guide product placement, promotions, and recommendations.
Background
In physical retail, understanding customer behavior is more challenging than in online stores. RFID technology uses wireless communication to identify and track items fitted with RFID tags, and these readings can reveal how customers interact with products. RFID is widely used across industries because it provides real-time data and improves efficiency. In physical stores, combining RFID data with machine learning models such as the MLP helps analyze customer behavior effectively. To improve predictive performance, the researchers tackled two data problems, outliers and imbalanced datasets, using iForest and ADASYN, respectively.
Related Work
Previous work has combined RFID technology and machine learning to analyze customer behavior in physical stores. Retailers have applied RFID technology to build intelligent trolley systems and to track customer shopping paths, and machine learning models have used RSS data to classify shopping activities. Researchers have also used techniques such as iForest for outlier detection and ADASYN for data balancing to improve model accuracy. MLP models have shown promise in many domains, but outliers and imbalanced data can degrade their performance, which makes preprocessing necessary.
Proposed Method
This study aimed to identify customer shopping behavior patterns from RFID readings using machine learning, focusing specifically on RSS data. The dataset was first preprocessed to remove inconsistent entries and handle missing values. The iForest method then removed anomalies, and the ADASYN technique generated synthetic instances of the minority class to address class imbalance. An MLP performed the prediction, and the model's effectiveness was assessed with stratified 10-fold cross-validation.
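The cleaning and outlier-removal steps can be sketched in Python with scikit-learn's IsolationForest. This is a minimal illustration, not the authors' code: dropping incomplete rows stands in for their unspecified missing-value handling, and the helper name clean_and_filter and the contamination value are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def clean_and_filter(X, y, contamination=0.05, seed=0):
    """Drop incomplete rows, then remove rows that Isolation Forest flags as outliers.

    Hypothetical helper: `contamination` (the expected share of outliers)
    is an assumed value, not one reported in the study.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Step 1: discard rows with any missing (NaN) feature value.
    complete = ~np.isnan(X).any(axis=1)
    X, y = X[complete], y[complete]
    # Step 2: fit_predict labels inliers +1 and outliers -1; keep the inliers.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    inlier = forest.fit_predict(X) == 1
    return X[inlier], y[inlier]
```

The forest is fitted on features only, matching its role as an unsupervised detector, and the labels are filtered with the same mask so that X and y stay aligned.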
Finally, the trained model was integrated into a web-based application to make it accessible to end users. The study's dataset consisted of RFID readings collected in a controlled laboratory setting that simulated customer interactions with products in a retail store. Two customer behaviors were labeled: "no action" (the customer shows no interest in the product) and "browsing" (the customer examines the product). Time-domain features were extracted from the RSS data to capture the information needed to distinguish these behaviors, and the dataset was then transformed into input features and output labels for machine learning.
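The summary does not list which time-domain features were extracted. As an illustration only, a window of RSS readings from one tag could be summarized with simple statistics like the ones below; the function time_domain_features and its feature names are hypothetical.

```python
from statistics import mean, pstdev


def time_domain_features(rss):
    """Summarize one window of RSS readings (dBm) from a single tag.

    Illustrative feature set only; the study's exact features are not
    given in this summary.
    """
    if len(rss) < 2:
        raise ValueError("need at least two readings per window")
    diffs = [b - a for a, b in zip(rss, rss[1:])]
    return {
        "mean": mean(rss),
        "std": pstdev(rss),  # population standard deviation of the window
        "min": min(rss),
        "max": max(rss),
        "range": max(rss) - min(rss),
        # Mean absolute change between consecutive readings: a tag that is
        # picked up and moved should fluctuate more than one left on the shelf.
        "mean_abs_diff": mean(abs(d) for d in diffs),
    }
```

For a hypothetical tag left on the shelf, [-55, -55, -56, -55, -55], the range is 1 dBm and mean_abs_diff is 0.5; for a hypothetical handled tag, [-55, -62, -48, -66, -51], they rise to 18 and 13.5. That kind of contrast is what a classifier can exploit.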
Two techniques improved the dataset and model performance. First, iForest outlier detection identified and removed anomalies from the data. Next, ADASYN addressed class imbalance by generating synthetic samples for the minority class. The resulting balanced dataset was used for training and evaluation.
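ADASYN differs from plain oversampling because it creates more synthetic samples around minority points whose neighborhoods are dominated by the majority class, which are the harder-to-learn regions. The NumPy sketch below implements the basic algorithm for a binary problem; in practice a library implementation such as the ADASYN class in imbalanced-learn would normally be used. The function name, k = 5, and the uniform fallback when no minority point has majority neighbors are choices of this sketch, not details from the study.

```python
import numpy as np


def adasyn(X, y, minority_label, k=5, beta=1.0, seed=0):
    """Minimal binary ADASYN: returns X, y with synthetic minority rows appended."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    X_min = X[is_min]
    n_min, n_maj = int(is_min.sum()), int((~is_min).sum())
    G = int(round((n_maj - n_min) * beta))  # total synthetic samples to create
    if G <= 0 or n_min < 2:
        return X, y
    # k nearest neighbours of each minority point among ALL points (itself excluded).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), np.flatnonzero(is_min)] = np.inf
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    # r_i: share of majority neighbours, i.e. how "hard" each minority point is.
    r = (~is_min[nn_all]).sum(axis=1) / k
    if r.sum() == 0:  # sketch's choice: spread samples evenly if no point is hard
        r = np.ones(n_min)
    g = np.rint(r / r.sum() * G).astype(int)  # samples to generate per point
    # Neighbours within the minority class, used as interpolation partners.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]
    synth = []
    for i, gi in enumerate(g):
        for _ in range(gi):
            partner = X_min[rng.choice(nn_min[i])]
            lam = rng.random()  # random point on the segment x_i -> partner
            synth.append(X_min[i] + lam * (partner - X_min[i]))
    if not synth:
        return X, y
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(len(synth), minority_label, dtype=y.dtype)])
    return X_new, y_new
```

Because each g_i is rounded, the final class counts are close to, but not always exactly, balanced.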
The MLP was chosen for customer behavior prediction because it can model non-linear relationships in tabular data. For comparison, the study also evaluated several alternative machine learning models. All models were evaluated with stratified 10-fold cross-validation, and their effectiveness was measured with metrics derived from the counts of true positives, true negatives, false positives, and false negatives. Together, these steps aimed to reveal customer shopping behavior patterns and, through the web-based application, to make the results accessible to end users.
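Stratified 10-fold cross-validation of an MLP can be expressed with scikit-learn as sketched below. The hidden-layer sizes, iteration limit, and synthetic make_classification data are assumptions for illustration, since the study's architecture and hyperparameters are not given in this summary; outlier removal and oversampling are left out to keep the example short. Specificity has no built-in scorer, so it is computed as the recall of the negative class.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Specificity is the recall of the negative class (label 0).
SCORING = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "f1": "f1",
    "roc_auc": "roc_auc",
}


def evaluate_mlp(X, y, seed=0):
    """Mean scores of a scaled MLP over stratified 10-fold cross-validation."""
    model = make_pipeline(
        StandardScaler(),  # MLPs train more reliably on standardized inputs
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=seed),
    )
    # Each fold preserves the class proportions of the full dataset.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
    return {name: scores[f"test_{name}"].mean() for name in SCORING}


if __name__ == "__main__":
    # Synthetic stand-in for the RSS feature table.
    X, y = make_classification(n_samples=400, n_features=6, random_state=0)
    for metric, value in evaluate_mlp(X, y).items():
        print(f"{metric}: {value:.3f}")
```

Wrapping the scaler and the MLP in one pipeline means the scaler is refitted on each training fold, so no statistics from the held-out fold leak into training.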
Results and Discussion
The study evaluated how well the proposed model distinguished customer shopping behaviors using RFID sensor data and supervised machine learning. The model significantly improved accuracy, precision, specificity, recall, and F1-score over the alternative models. It also handled class-imbalanced data well, achieving an Area Under the Curve (AUC) of 0.98 in Receiver Operating Characteristic (ROC) analysis.
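Each of the reported metrics follows from the four confusion-matrix counts, and ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes both in plain Python; the counts and scores in the usage note are made up for illustration and are not the study's results.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With tp=45, tn=40, fp=10, fn=5, this gives accuracy 0.85, precision about 0.818, recall 0.9, specificity 0.8, and F1 about 0.857. roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]) returns 0.75 because three of the four positive-negative pairs are ranked correctly. The pairwise loop is quadratic, which is fine for illustration but not for large datasets.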
The study also examined how incorporating iForest outlier detection and ADASYN data balancing affected model accuracy. The results showed notable improvements, although the authors stress that synthetic data generation methods need thorough validation to confirm their effectiveness and avoid introducing bias. The study further highlights its contributions through a comparative analysis with previous RFID-based research on customer behavior detection.
The paper also discusses the practical use of a web-based system for predicting customer shopping behavior. The application offers insights for store optimization and a better customer experience, while the authors acknowledge the challenges of web-based application development, such as security, scalability, compatibility, and managing user feedback.
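The summary does not say in what order the authors applied ADASYN and cross-validation. One well-known way synthetic oversampling can bias validation is generating synthetic rows before the data are split, so that copies of training points land in test folds. The sketch below counts such leaked rows under both orderings. It uses simple duplication (random_oversample, a hypothetical helper) as a stand-in for ADASYN because exact copies are easy to detect; ADASYN's interpolated samples would leak as near-copies instead.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def random_oversample(X, y, rng):
    """Duplicate minority rows until both classes have the same size (ADASYN stand-in)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=counts.max() - counts.min(), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])


def count_leaked_rows(X_train, X_test):
    """Number of test rows that also appear verbatim in the training set."""
    seen = {row.tobytes() for row in X_train}
    return sum(row.tobytes() in seen for row in X_test)


def leakage(X, y, oversample_first, seed=0):
    """Total leaked rows across stratified 10-fold CV for one oversampling order."""
    rng = np.random.default_rng(seed)
    if oversample_first:  # biased: synthetic rows exist before the split
        X, y = random_oversample(X, y, rng)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    leaked = 0
    for train, test in cv.split(X, y):
        X_tr, y_tr = X[train], y[train]
        if not oversample_first:  # unbiased: oversample the training fold only
            X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
        leaked += count_leaked_rows(X_tr, X[test])
    return leaked
```

With, for example, 90 majority rows and 10 minority rows of continuous random features, oversampling first leaks many duplicated minority rows into the test folds, whereas oversampling only the training folds leaks none, so the held-out scores are computed on genuinely unseen samples.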
Conclusion
In summary, this study combines RFID technology and machine learning, specifically an MLP model with iForest and ADASYN, to analyze customer behavior effectively. By extracting time-domain features from RFID data, the model accurately distinguishes between browsing and disinterest and outperforms alternative models. Integrated into a web-based system, it supports managerial decisions such as store layout optimization and helps improve the shopping experience. Future research should explore more complex real-world scenarios with larger datasets and a wider range of techniques.