In a paper published in the journal npj Science of Food, researchers addressed the challenge of ensuring food safety in complex supply chains that are vulnerable to internal and external factors, including food fraud. They demonstrated the potential of Artificial Intelligence (AI) and Federated Learning (FL) to predict and prevent food fraud while preserving data privacy.
Using a data-driven Bayesian Network (BN) model, the researchers built a framework that integrated data from various sources. The framework gives food supply chain stakeholders a tool for improving decisions on food fraud control while keeping their data confidential.
Background
Many studies have emphasized the significance of AI, particularly BN models, for food supply chains. Such models can handle extensive data from various sources, such as drones, mobile devices, and the Internet of Things (IoT), and can help ensure food safety and counter fraud in these complex systems.
However, these studies have also shed light on the substantial challenges related to data sharing within the food industry. These challenges stem from the diverse interests and sensitivities among stakeholders in the food supply chain. Recognizing these obstacles, researchers explored the potential of FL as a solution.
FL allows data to remain under the ownership of individual stakeholders: algorithms are sent to the data and return only model parameters, not the underlying records. This approach can automate negotiation processes and address the data-sharing issues hindering the food industry's progress. The limited adoption of FL in this sector underscores the pressing need to establish clear data-sharing guidelines and accessible ontologies to facilitate advancements in the field.
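The idea that only parameters leave a stakeholder's premises can be illustrated with a minimal Python sketch. It is not the platform or algorithm used in the paper; the station names, the records, and the functions `local_summary` and `aggregate` are all hypothetical and exist only to show the flow of information.

```python
from collections import Counter

# Hypothetical records held privately at each station: (product, fraud_type).
# These rows are invented for illustration and never leave the station.
STATION_DATA = {
    "station_a": [("milk", "dilution"), ("honey", "substitution"), ("milk", "dilution")],
    "station_b": [("honey", "substitution"), ("olive oil", "mislabelling")],
}


def local_summary(records):
    """Runs at a station: returns counts per fraud type, not the rows."""
    return Counter(fraud_type for _product, fraud_type in records)


def aggregate(summaries):
    """Runs at the coordinator: merges the count tables sent by the stations."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# The coordinator only ever sees the per-station summaries.
summaries = [local_summary(rows) for rows in STATION_DATA.values()]
global_counts = aggregate(summaries)
print(dict(global_counts))
```

In this sketch the coordinator learns how often each fraud type occurs overall, but it cannot tell which product or record a given count came from.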
Proposed method
Federated Architecture and Data Sources: In this paper, the researchers employed the vantage6 platform for FL, which was hosted at Wageningen University & Research. This platform ensured data security and collaboration among various stakeholders in the food supply chain and allowed only specific algorithms to run. The data came from two major sources, the EU RASFF and US EMA databases, and were distributed among three data stations, each situated in a distinct Dutch city. The datasets covered food fraud types, product categories, years, and origin countries, and their metadata were accessible via an internally hosted FAIR Data Point. Different data formats were used, with ontologies ensuring semantic interoperability.
Federated BN Model and Experiments: The researchers implemented a BN model in a federated environment using the R package "bnlearn". The model was trained on various fraud-related characteristics. Two experiments were conducted: one comparing individual BN models at each data station to a combined model trained on aggregated data and another examining differences between federated and non-federated approaches. The former aimed to evaluate model performance, while the latter assessed the impact of federated settings on BN construction.
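To make the notion of fitting a BN in a federated setting more concrete, the sketch below estimates one conditional probability table from counts pooled across stations. It is written in Python rather than with "bnlearn", and it assumes a deliberately tiny two-node network (product category pointing to fraud type). The counts, the category names, and the smoothing parameter `alpha` are illustrative assumptions and do not reflect the paper's actual structure or data.

```python
from collections import Counter

# Hypothetical count tables, as if already returned by three stations.
# Keys are (product_category, fraud_type); values are numbers of cases.
STATION_COUNTS = [
    Counter({("dairy", "dilution"): 6, ("dairy", "mislabelling"): 2}),
    Counter({("dairy", "dilution"): 1, ("honey", "substitution"): 5}),
    Counter({("honey", "substitution"): 3, ("honey", "mislabelling"): 3}),
]


def pooled_counts(tables):
    """Sum the per-station count tables into one table."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total


def conditional_table(counts, alpha=1.0):
    """Estimate P(fraud_type | product_category) with additive smoothing."""
    fraud_types = sorted({fraud for _product, fraud in counts})
    products = sorted({product for product, _fraud in counts})
    cpt = {}
    for product in products:
        row_total = sum(counts[(product, fraud)] for fraud in fraud_types)
        denom = row_total + alpha * len(fraud_types)
        cpt[product] = {
            fraud: (counts[(product, fraud)] + alpha) / denom
            for fraud in fraud_types
        }
    return cpt


cpt = conditional_table(pooled_counts(STATION_COUNTS))
most_likely = {product: max(row, key=row.get) for product, row in cpt.items()}
print(most_likely)
```

Because the table is built from summed counts, a station that has never seen a given fraud type still benefits from the cases observed elsewhere, which is the kind of knowledge sharing the federated experiments set out to examine.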
Experimental results
Demonstrating FL for Food Fraud Detection: This study exemplified the application of FL to address data-sharing challenges in food fraud detection. The available dataset was divided into three incomplete subsets. Each of these subsets varied in the number of food fraud cases, years, and types of fraud. This division aimed to simulate real-world scenarios where supply chain actors often grapple with limited and imbalanced data.
These subsets were hosted across three data stations located in different Dutch cities. For instance, STATION-1 contained data from 2008 to 2013 covering two types of food fraud, while the other stations included additional fraud types and more recent data. The researchers aimed to showcase how FL could resolve data-sharing issues and benefit decision-makers by sharing knowledge without exposing the source data.
Impactful Experiments and Insights: The FL infrastructure facilitated the development and validation of a BN model, which served as a demonstration of knowledge sharing that upholds data privacy. Two distinct experiments were conducted to emphasize the significance of FL. In the first experiment, individual BN models were created, trained, and tested at each data station and compared with a combined BN model. The results indicated high accuracy for STATION-1 and STATION-2 but lower accuracy for STATION-3. The combined BN model, however, demonstrated improved sensitivity for STATION-2.
In the second experiment, a BN was constructed on the total dataset without FL so that it could be compared with the federated setting. The findings revealed that the FL infrastructure maintained comparable model performance and accommodated data imbalances. This research underscores the potential of FL for secure and privacy-preserving knowledge sharing in domains where data privacy is a concern.
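The two metrics mentioned in the first experiment, accuracy and sensitivity, can be computed from true and predicted labels as in the following sketch. The labels are invented for illustration and are not results from the study. The function names are hypothetical, and sensitivity is treated per class as the share of true cases of that class that were predicted correctly.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive):
    """True positives divided by all true cases of the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        return None  # the class does not occur in y_true
    return tp / (tp + fn)


# Invented labels for five test cases at a hypothetical station.
y_true = ["dilution", "dilution", "substitution", "substitution", "dilution"]
y_pred = ["dilution", "substitution", "substitution", "substitution", "dilution"]

print(accuracy(y_true, y_pred))
print(sensitivity(y_true, y_pred, "dilution"))
```

A model can score well on accuracy while missing many cases of a rare fraud type, which is why sensitivity is reported alongside accuracy when the subsets are imbalanced.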
Conclusion
In conclusion, this study showcased the potential of FL as a robust solution to data-sharing challenges in food fraud detection within supply chains. FL supported decision-making and maintained model performance by sharing model knowledge across partitioned datasets rather than the data themselves, which kept sensitive information secure.
The experiments highlighted the advantages of FL in maintaining data privacy, even with imbalanced datasets. This research contributes to a better understanding of the applicability of FL in domains where data privacy is paramount, paving the way for its broader adoption. FL promises secure and privacy-preserving knowledge sharing, fostering collaboration and trust among supply chain stakeholders. This could, in turn, improve resource utilization and reduce costs in food safety monitoring.