Federated learning, a concept first introduced by Google in 2016, is a privacy-preserving, collaborative, and decentralized machine learning technique that overcomes the challenges of data sensitivity and data silos. In this scheme, several devices collaboratively learn a machine learning model under a central server's supervision without sharing their private data. This article discusses the applications and challenges of federated learning.
Need for Federated Learning
The rapid development of machine learning, deep learning, artificial intelligence, and smart production in recent years has given rise to data silos and data-governance issues. Data governance has become a significant concern with the promulgation of regulations such as the General Data Protection Regulation (GDPR), which aims to protect users' data security and personal privacy by making them the absolute owners of their data.
Specifically, the GDPR requires organizations, institutions, and operators to state user agreements clearly and prohibits inducing or deceiving users into waiving their privacy rights. Operators are also barred from using user data to train models without permission. Federated learning can tackle the issue of data silos while ensuring data privacy: in this machine learning scheme, multiple clients are coordinated by one or more central servers in a decentralized setting.
Overview of Federated Learning
Federated learning is a setup that allows multiple clients to collaborate under a central aggregator's coordination to solve machine learning problems. It is based on local computation and model transmission, which decreases the systemic costs and privacy risks inherent in conventional centralized machine learning methods.
Each client's raw data remains stored locally and is never migrated or exchanged. Instead, every device trains on its local data and uploads only the resulting model to the server for aggregation; the server then sends the aggregated model update back to the participants to achieve the shared learning goal.
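The local-train, aggregate, and broadcast loop described above can be sketched as a minimal FedAvg-style round. This is a hypothetical numpy illustration, not any framework's actual API: local training is stubbed out as a few gradient steps on a synthetic linear-regression shard per client.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Stub for local training: a few gradient-descent steps
    on a least-squares objective using only this client's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_weights, client_data):
    """One round: each client trains locally, then the server
    averages the returned models weighted by local data size."""
    updates, sizes = [], []
    for X, y in client_data:
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    # Weighted average: clients with more data contribute more.
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Each client holds a private shard; the server never sees X or y.
clients = []
for n in (30, 50, 20):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # converges near [2., -1.]
```

Note that only model parameters cross the network in each round; the raw arrays `X` and `y` stay with their owners.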
Federated learning is related to distributed learning; conventional distributed systems comprise distributed computation and distributed storage, and the model-update scheme first proposed for federated learning on Android clients resembled distributed computation to some extent.
Specifically, federated learning as originally proposed by Google is an encrypted distributed machine learning technology. The original concept was later extended to cover all decentralized, privacy-preserving collaborative machine learning techniques.
Thus, federated learning can tackle both horizontally partitioned data based on samples and vertically partitioned data based on features in a collaborative learning setting. Federated learning can also be extended to incorporate cross-organizational enterprises into a federal framework to realize regional and cross-platform co-creation value on the premise of data privacy protection.
The TensorFlow Federated (TFF) framework and the Federated AI Technology Enabler (FATE) are the mainstream open-source frameworks for federated learning. TFF is the first self-contained, production-level framework, targeted primarily at mobile devices. It integrates Secure Aggregation to address privacy concerns and FedAvg for model updates.
Similarly, FATE is the first open-source industrial-level framework designed to serve a cross-organizational architecture. This WeBank-created framework ensures adequate privacy for clients through secure multi-party computation and homomorphic encryption.
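The intuition behind Secure Aggregation can be illustrated with a toy pairwise-masking scheme. This is a deliberately simplified sketch, not the real protocol, which additionally handles key agreement, client dropout, and finite-field arithmetic: each pair of clients shares a random mask that one adds and the other subtracts, so individual uploads look random while the server-side sum is exact.

```python
import numpy as np

def masked_updates(updates, rng):
    """Each pair of clients (i, j) agrees on a random mask; client i
    adds it and client j subtracts it. Individual masked updates look
    random, but all masks cancel exactly in the server-side sum."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

rng = np.random.default_rng(1)
updates = [np.array([1.0, 2.0]), np.array([3.0, 0.5]), np.array([-1.0, 1.5])]
masked = masked_updates(updates, rng)

# The server only sees masked vectors, yet their sum equals the true sum.
print(np.round(sum(masked), 6))   # [3. 4.]
print(np.round(sum(updates), 6))  # [3. 4.]
```

The server thus learns the aggregate update it needs for FedAvg without ever observing any single client's contribution in the clear.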
Federated Learning Types
Horizontal Federated Learning: This is used when every device holds a dataset with the same feature space but different sample instances. Google Keyboard uses this type of learning, as the participating mobile phones hold different training data with the same features.
Vertical Federated Learning: This type is employed when every device holds a dataset with different features but the same sample instances. For instance, two organizations holding data about the same group of individuals, but with different feature sets, can use vertical federated learning to develop a shared machine learning model.
Federated Transfer Learning: This is analogous to conventional transfer learning, in which a pre-trained model is adapted to a new feature space or task. For instance, vertical federated learning can be extended with federated transfer learning when the learning must cover sample instances that do not exist in all collaborating organizations.
Cross-device Federated Learning: This is employed in scenarios involving numerous participating devices, such as mobile phones. Incentive design and client selection are notable techniques required to make this type of federated learning practical.
Cross-silo Federated Learning: This is used when the participating clients are few in number and available for all training rounds. The training data can be partitioned horizontally or vertically, and the participants are typically organizations.
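The horizontal/vertical distinction above can be made concrete with a small tabular example. The data and party names here are hypothetical; numpy arrays stand in for each party's private table, and the "full" table exists only for illustration and would never actually be assembled.

```python
import numpy as np

# A full (never actually assembled) table: rows = users, cols = features.
# Columns: [age, income, credit_score]
full = np.array([
    [25, 40_000, 700],
    [37, 85_000, 650],
    [52, 60_000, 720],
    [41, 95_000, 690],
])

# Horizontal FL: same feature space, different users per party
# (e.g., two banks operating in different regions).
bank_a_rows = full[:2]   # users 0-1, all three features
bank_b_rows = full[2:]   # users 2-3, all three features

# Vertical FL: same users, different features per party
# (e.g., a bank and a retailer with overlapping customers).
bank_cols = full[:, :2]      # all users: age, income
retailer_cols = full[:, 2:]  # all users: credit_score

print(bank_a_rows.shape, bank_b_rows.shape)  # (2, 3) (2, 3)
print(bank_cols.shape, retailer_cols.shape)  # (4, 2) (4, 1)
```

In the horizontal case the parties split the rows (samples); in the vertical case they split the columns (features) while sharing the same row identities.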
Federated Learning Challenges
Although federated learning has several benefits, it also faces many challenges, broadly divided into security challenges and training-related challenges. The key training-related challenges are the heterogeneity of the training data, the heterogeneity of the participating devices, and the communication overhead incurred over multiple training iterations.
Security challenges primarily involve privacy threats from adversaries such as malicious users with only black-box access to the model or malicious clients among the local devices. Although private data never leaves the device in federated learning, a curious observer or an adversary can still infer whether a particular data point was used to train the local models, a so-called membership inference attack.
Federated Learning Applications
Mobile edge computing (MEC) faces a high risk of information leakage, which can be addressed by combining MEC with federated learning. An In-Edge AI framework has been developed that combines federated learning based on deep reinforcement learning with the MEC system to address this risk and optimize resource allocation.
Similarly, a privacy-aware service placement scheme was developed using federated learning on MEC to provide high-quality service by caching the desired services on edge servers near users.
Smart homes are an important application area of the Internet of Things (IoT). In a smart home architecture, devices upload data about users' preferences to cloud servers to learn those preferences better, which can lead to data breaches. A sufficiently secure federated architecture was therefore proposed to build joint models, and a federated multitask learning framework was developed for smart home IoT to automatically learn user behavior patterns and effectively detect physical hazards.
Moreover, a data fusion approach based on federated learning was presented for robot imitation learning in robot networking. This technique can also be leveraged in self-driving cars to anticipate emergencies and generate guidance models.
A novel environmental monitoring framework based on federated region learning was designed for monitoring data that cannot conveniently be exchanged, so that data dispersed across different sensors could be used to improve collaborative model performance. Federated learning was also applied to visual inspection tasks to address the lack of defective samples for defect detection in production, while offering privacy guarantees for manufacturers.
Federated learning is also effective for image detection and representation, and for detecting malicious attacks in communication systems composed of unmanned aerial vehicles (UAVs), since UAV characteristics such as unreliable communication conditions and unbalanced data distributions resemble the challenges federated learning is designed to handle.
A federated energy demand prediction method was designed for different electric vehicle charging stations to prevent energy congestion during transmission. Similarly, federated learning was leveraged across transactions held by various banks to efficiently detect credit card fraud.
Electronic health records were used to determine whether a patient with heart disease should be hospitalized, based on a federated learning algorithm known as cluster primal-dual splitting. The prediction task is accomplished either by the hospitals holding the medical data or by health monitoring devices, without information leakage.
Health records were also used to identify similar patients scattered across various hospitals using a federated patient hashing framework, without sharing patient-level information. Similarly, a loss-based adaptive boosting federated averaging algorithm was applied to drug usage data from the MIMIC-III database to predict patient mortality rates.
Model-contrastive federated learning (MOON) is a simple and effective federated learning framework that performs contrastive learning at the model level. It significantly outperforms other state-of-the-art federated learning algorithms on several image classification tasks.
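The model-level contrastive loss from the cited Model-Contrastive Federated Learning paper can be sketched as follows. This is a simplified numpy rendering under stated assumptions: z, z_glob, and z_prev denote the representations of the same input under the current local model, the global model, and the previous local model, and tau is a temperature hyperparameter.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two representation vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def model_contrastive_loss(z, z_glob, z_prev, tau=0.5):
    """Pull the local representation toward the global model's
    representation (positive pair) and push it away from the
    previous local model's representation (negative pair)."""
    pos = np.exp(cosine_sim(z, z_glob) / tau)
    neg = np.exp(cosine_sim(z, z_prev) / tau)
    return -np.log(pos / (pos + neg))

# Toy representations of the same input under the three models.
z      = np.array([1.0, 0.2, 0.0])
z_glob = np.array([0.9, 0.3, 0.1])   # close to z  -> small loss
z_prev = np.array([-0.5, 1.0, 0.8])  # far from z  -> little penalty
print(round(float(model_contrastive_loss(z, z_glob, z_prev)), 3))
```

During local training this term is added to the usual supervised loss, discouraging each client's model from drifting away from the global model under heterogeneous data.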
To summarize, federated learning allows machine learning models to be trained without sharing private data, making it useful for applications such as smart homes, healthcare, and image classification while ensuring data privacy.
References and Further Reading
Li, L., Fan, Y., Tse, M., & Lin, K. (2020). A review of applications in federated learning. Computers & Industrial Engineering, 149, 106854. https://doi.org/10.1016/j.cie.2020.106854
Mammen, P. M. (2021). Federated Learning: Opportunities and Challenges. ArXiv. https://doi.org/10.48550/arXiv.2101.05428
Zhang, C., Xie, Y., Bai, H., Yu, B., Li, W., & Gao, Y. (2021). A survey on federated learning. Knowledge-Based Systems, 216, 106775. https://doi.org/10.1016/j.knosys.2021.106775
Li, Q., He, B., & Song, D. (2021). Model-Contrastive Federated Learning. ArXiv. https://doi.org/10.48550/arXiv.2103.16257