Detecting Retail Crime with AI: A Game-Changing Strategy

In a paper published in the journal Scientific Reports, researchers proposed a machine learning strategy to identify and classify organized retail crime (ORC) listings on a well-known online marketplace. ORC has become a significant challenge for retailers and consumers, particularly with the surge of online commerce and digital platforms, and detecting and responding to it swiftly is crucial to mitigating its impact. Using supervised learning, the method achieves a recall of 0.97 on the holdout set and 0.94 on the testing dataset with a refined set of 45 features selected from the original 58.

Study: Detecting Retail Crime with AI: A Game-Changing Strategy. Image credit: Rawpixel.com/Shutterstock

Background

As internet commerce has expanded and online activity surged during the COVID-19 pandemic, cybercrime and fraud have become more prevalent, posing severe economic and security challenges. Detecting and responding to such threats is imperative, but traditional prevention methods are not foolproof, and existing detection approaches have shown limitations.

The rapid growth of e-commerce platforms like Yahoo and eBay has been accompanied by a surge in online fraud cases, presenting a substantial challenge. Categorized by the Internet Fraud Complaint Center (IFCC) into various types, including non-delivery of goods, product misrepresentation, and multiple bidding, online fraud has spurred research into diverse detection strategies. Feedback anomaly detection methods, data mining schemes, and trust management solutions have been explored.

Addressing the issue of skewed data distribution, an imbalance between fraudulent and legitimate instances, researchers have employed data-level and algorithmic approaches. Data-level rebalancing involves techniques like undersampling and oversampling, with Synthetic Minority Oversampling Technique (SMOTE) emerging as a superior oversampling method. Algorithmic solutions, such as cost-sensitive learning, aim to manage class imbalance, with data-level methods generally outperforming algorithm-level strategies.

The proposed method tackles the issue by presenting a machine-learning solution to identify and combat organized retail crime (ORC) in online marketplaces. Through supervised learning and advanced methods, the approach achieves high recall scores on holdout and testing datasets.

Proposed method

The framework includes four experiments to identify the optimal organized retail fraud detection model. In the first design, named individual classifiers, numeric features are extracted and preprocessed, and seven classifiers are trained without class imbalance resolution techniques. Grid search with stratified k-fold cross-validation is used for hyperparameter tuning.
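The stratified splitting mentioned above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function name `stratified_kfold`, the toy labels, and the choice of five folds are all assumptions made for the example. It shows only the splitting step, in which every fold keeps roughly the same share of fraudulent cases as the full dataset.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Toy labels (assumed): 20 legitimate sellers (0) and 5 fraudulent ones (1).
labels = [0] * 20 + [1] * 5
splits = list(stratified_kfold(labels, k=5))
for train, test in splits:
    # Each line shows the fold size and the number of fraud cases it holds.
    print(len(test), sum(labels[i] for i in test))  # prints "5 1" for every fold
```

In a grid search, each candidate hyperparameter setting would be scored on every such split and the setting with the best average score kept.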

Using the same data, an ensemble is constructed by stacking the seven classifiers. This approach, called stacked generalization, combines predictions from these models via a meta-model trained on out-of-fold predictions from k-fold cross-validation of the base models. The framework then addresses the class imbalance between fraudulent and non-fraudulent cases in fraud data. This phase yields the best combination of class rebalancing technique and classifier for the context, elaborated further in the detailed class imbalance resolution approach.
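A toy sketch may help make the out-of-fold idea concrete. Everything here is an assumption for illustration: the base learner `fit_threshold` (a single-feature cutoff), the lookup-table meta-model, the interleaved two-fold split, and the ten made-up sellers. The paper stacks seven real classifiers; this sketch stacks two trivial ones to show the mechanics only.

```python
from collections import Counter, defaultdict

def fit_threshold(X, y, feature):
    """Toy base learner: flag a row when one feature exceeds the midpoint of the class means."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x[feature] > cut)

def fit_stack(X, y, features, k=2):
    """Train base learners and a meta-model on their out-of-fold predictions."""
    oof = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        models = [fit_threshold([X[i] for i in train], [y[i] for i in train], f)
                  for f in features]
        # Predict only the held-out rows, so no model scores data it was fit on.
        for i in range(fold, len(X), k):
            oof[i] = tuple(m(X[i]) for m in models)
    # Meta-model: majority label observed for each pattern of base predictions.
    votes = defaultdict(Counter)
    for pattern, t in zip(oof, y):
        votes[pattern][t] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all rows for use at prediction time.
    final = [fit_threshold(X, y, f) for f in features]

    def predict(x):
        return table.get(tuple(m(x) for m in final), default)

    return predict, oof

# Made-up sellers: feature 0 separates the classes well, feature 1 is noisy.
X = [(1.0, 2.0), (8.0, 3.0), (2.0, 1.0), (1.5, 3.0), (9.0, 4.0),
     (2.5, 2.5), (1.0, 1.5), (7.5, 2.0), (2.0, 3.5), (8.5, 5.0)]
y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
predict, oof = fit_stack(X, y, features=[0, 1])
print(predict((7.0, 1.0)), predict((2.0, 4.0)))
```

Because the meta-model only ever sees predictions made on held-out rows, it can learn which base learners to trust (here, the one built on feature 0) without being misled by their fit to the training data.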

The methodology covers the classifiers used, experimental configurations, and data preprocessing steps. Historical data from a prominent worldwide online marketplace is used to detect ORC instances, focusing on 3,606 high-volume sellers based in the US. The dataset encompasses numeric, categorical, and text data types, with text features having a limited impact. The preprocessing stage addresses duplicates, missing data, and outliers. Feature engineering entails generating predictive attributes through encoding, dummy columns, and new features derived from titles and descriptions. Established and new classifiers are incorporated, guided by expert insights from ORC professionals with experience in fraud detection and mitigation.

Handling an "unbalanced data problem," denoting a skewed distribution of data between classes, is crucial because many machine learning algorithms perform poorly in such scenarios. As a solution, adaptations of the Synthetic Minority Oversampling Technique (SMOTE) are applied in this research. SMOTE generates synthetic instances for the minority class, unlike conventional oversampling, which duplicates existing instances. This synthesis relies on Euclidean distances between nearest neighbors and follows these steps: (1) compute the difference between the feature vector and one of its nearest neighbors; (2) multiply this difference by a random value between 0 and 1 and add it to the feature vector.
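The two SMOTE steps can be sketched as follows. This is a minimal illustration of the basic interpolation, not the specific SMOTE adaptations used in the study; the function name `smote_sample`, the neighbor count `k=2`, and the toy minority points are assumptions.

```python
import math
import random

def smote_sample(minority, k, rng):
    """Create one synthetic point by interpolating toward a random near neighbor."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # The k nearest minority-class neighbors by Euclidean distance.
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # random value in [0, 1)
    # Step 1: difference (n - b); step 2: scale it by gap and add it to the base.
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))

# Toy minority-class feature vectors (assumed values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
rng = random.Random(42)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(6)]
for point in synthetic:
    print(point)
```

Each synthetic point lies on the line segment between a real minority instance and one of its neighbors, so the new data stays inside the region the minority class already occupies.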

Experimental Analysis

In the context of imbalanced data, the evaluation employs repeated stratified k-fold cross-validation, highlighting Gaussian Naive Bayes' high recall but lower accuracy and true positive predictions. Tree-based models, particularly the tuned random forest, emerge with the best F1 score after hyperparameter tuning. Transitioning to out-of-sample data, classifiers experience performance degradation due to evolving fraud behavior, with tree-based models maintaining their superiority.
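To see how a classifier can combine high recall with low accuracy, consider a small worked example. The labels and predictions below are invented for illustration and are not results from the paper; the helper name `scores` is likewise an assumption.

```python
def scores(y_true, y_pred):
    """Compute recall, precision, F1, and accuracy for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(pairs)
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}

# Invented example: 2 fraudulent sellers out of 10, and a model that flags 6.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
result = scores(y_true, y_pred)
print(result)
```

The model catches both fraudulent sellers, so recall is perfect, but four false alarms pull precision, F1, and accuracy down. This is why recall alone does not settle which model is best.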

Shifting to data balancing, data-level techniques like Random Oversampling (ROS) outperform algorithm-level techniques, and the balanced random forest algorithm excels in optimizing recall. The framework highlights the importance of feature selection, preprocessing, and class imbalance resolution, underscoring the necessity for regular retraining in a dynamic fraud detection landscape. It attains a leading recall score of 97.5% on in-sample data and 94.9% on out-of-sample data, compared to 92.8% and 81.9%, respectively.
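Random oversampling is simple enough to sketch in a few lines. The function name `random_oversample` and the toy data are assumptions for illustration; the sketch duplicates randomly chosen minority rows until every class matches the size of the largest one. As a general practice, resampling of this kind is applied to training data only, so that duplicated rows do not leak into evaluation data.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Toy training data (assumed): 8 legitimate rows and 2 fraudulent rows.
X = [(float(i), float(i % 3)) for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Unlike SMOTE, this adds no new feature vectors: every extra row is an exact copy of an existing minority instance.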

Conclusion and future work

E-commerce platforms such as eBay and the digital marketplace operated by Meta face ongoing cybersecurity challenges due to organized retail crime (ORC). Detecting fraudulent activities in this context is increasingly complex, given the abundance of user data and transactions. The research presents an advanced fraud detection approach that utilizes supervised machine learning, surpassing traditional rule-based and unsupervised methods in terms of accuracy and effectiveness.

The comprehensive framework integrates expert-derived feature discovery, customized data processing, imbalanced learning, careful feature and model selection, precise hyperparameter tuning, and business-relevant performance metrics to achieve superior results. The limitations of single-stage trials are addressed, setting the approach apart. While primarily utilizing numeric and categorical features, future research could investigate the efficacy of multimodal features to enhance ORC detection performance.

Journal reference:
  • Mutemi, A., & Bacao, F. (2023). A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces. Scientific Reports, 13:1, 12499. DOI: 10.1038/s41598-023-38304-5, https://www.nature.com/articles/s41598-023-38304-5.

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2023, August 21). Detecting Retail Crime with AI: A Game-Changing Strategy. AZoAi. Retrieved on December 26, 2024 from https://www.azoai.com/news/20230806/Detecting-Retail-Crime-with-AI-A-Game-Changing-Strategy.aspx.

  • MLA

    Chandrasekar, Silpaja. "Detecting Retail Crime with AI: A Game-Changing Strategy". AZoAi. 26 December 2024. <https://www.azoai.com/news/20230806/Detecting-Retail-Crime-with-AI-A-Game-Changing-Strategy.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Detecting Retail Crime with AI: A Game-Changing Strategy". AZoAi. https://www.azoai.com/news/20230806/Detecting-Retail-Crime-with-AI-A-Game-Changing-Strategy.aspx. (accessed December 26, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2023. Detecting Retail Crime with AI: A Game-Changing Strategy. AZoAi, viewed 26 December 2024, https://www.azoai.com/news/20230806/Detecting-Retail-Crime-with-AI-A-Game-Changing-Strategy.aspx.
