In a recent submission to the arXiv* server, researchers explored distributed learning for Internet of Things (IoT) services with the integration of artificial intelligence (AI) and applications in emerging networks.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Background
The emergence of novel services and applications, including the Metaverse, autonomous vehicles, and satellite networks, has triggered a substantial demand for incorporating AI techniques into both the IoT and the upcoming sixth-generation (6G) wireless networks. This demand stems from the rapid growth in IoT connections, advancements in IoT computing hardware, and the remarkable success of AI in various domains, spanning military applications, computer vision, smart healthcare, and mobile and communication networking. The influx of IoT data, coupled with the growing computing and communication capabilities of IoT devices, necessitates AI techniques that can harness these resources efficiently for 6G IoT.
Sharing raw data between mobile users and IoT devices via central cloud storage for learning purposes carries significant drawbacks, such as high data transmission costs, power consumption, and privacy concerns. To address these issues, there is a growing shift towards distributed learning in 6G IoT networks. Unlike centralized AI, distributed learning leverages distributed computing resources across the network, spanning massive numbers of IoT devices, edge servers, and cloud resources.
Machine learning and distributed learning
Machine learning (ML) is a subset of AI that enables computers to make accurate predictions from data. The ML process consists of two phases: training and inference. During training, a model is built from input data, which can be labeled or unlabeled. In conventional AI, training data is collected centrally in a cloud with powerful resources. In the inference phase, live data is fed into the trained model to produce outputs such as identifying abnormal IoT devices or detecting attacks.
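To make the training/inference split concrete, the following minimal sketch trains a simple anomaly detector on synthetic IoT telemetry and then runs inference on new readings. The library choice, data, and thresholds are illustrative assumptions, not the setup used in the paper.

```python
# Minimal sketch of the two ML phases: training on historical IoT telemetry,
# then inference on live readings. All data here is synthetic and illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# --- Training phase: build a model from (unlabeled) historical telemetry ---
normal_telemetry = rng.normal(loc=[20.0, 0.5], scale=[2.0, 0.1], size=(500, 2))
model = IsolationForest(random_state=0).fit(normal_telemetry)

# --- Inference phase: feed live readings into the trained model ---
live_readings = np.array([[21.0, 0.52],    # looks like normal behavior
                          [55.0, 3.00]])   # looks abnormal
flags = model.predict(live_readings)       # +1 = normal, -1 = abnormal
print(flags)
```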
Federated Learning (FL) and Split Learning (SL): FL is a privacy-preserving ML technique that allows multiple devices to collaboratively train a model without sharing raw data. It involves the initialization of a global model, local training by users, and model aggregation on the edge server. SL is an effective approach when users are reluctant to share their raw data due to privacy concerns. In SL, a deep neural network is divided into parts, with some layers trained by clients using their data and the remaining layers trained on the server.
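The FL workflow described above (global model initialization, local training, server-side aggregation) can be sketched in a few lines. The linear model, synthetic client data, and hyperparameters below are assumptions for illustration only, not the paper's setup.

```python
# Minimal federated averaging (FedAvg-style) sketch in NumPy: clients train
# locally on private data; only model weights are exchanged with the server.
import numpy as np

rng = np.random.default_rng(1)
NUM_CLIENTS, ROUNDS, LOCAL_STEPS, LR = 4, 20, 5, 0.1

# Each client holds its own private dataset (never shared with the server).
client_data = []
for _ in range(NUM_CLIENTS):
    X = rng.normal(size=(50, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
    client_data.append((X, y))

global_w = np.zeros(3)                        # server initializes the global model
for _ in range(ROUNDS):
    local_models = []
    for X, y in client_data:                  # local training on each device
        w = global_w.copy()
        for _ in range(LOCAL_STEPS):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= LR * grad
        local_models.append(w)
    global_w = np.mean(local_models, axis=0)  # server aggregates the updates

print("learned weights:", np.round(global_w, 3))
```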
Multi-Agent Reinforcement Learning (MARL): MARL addresses decision-making by multiple agents acting in a shared environment and offers advantages such as parallel computation and experience sharing among agents.
Distributed Inference: Distributed inference is the phase in which a trained ML model is deployed to make predictions on new data. It can be classified into cooperative inference, knowledge inference, model inference, and result inference. Cooperative inference partitions a large neural network among multiple devices so that each device executes only part of the model. Knowledge inference, model inference, and result inference require IoT devices to fully train deep models, potentially leading to high computational workloads.
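As a rough illustration of how cooperative inference partitions a model, the sketch below splits a small feed-forward network between an IoT device and a server; the architecture and random weights are placeholders, not a trained model from the paper.

```python
# Minimal sketch of cooperative (split) inference: the network is cut at an
# intermediate layer, the device computes the early layers, and only the
# intermediate activation is transmitted to the server. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
relu = lambda x: np.maximum(x, 0)

# Device-side layers (e.g., on a constrained IoT sensor).
W1, W2 = rng.normal(size=(16, 8)), rng.normal(size=(8, 4))
# Server-side layers (e.g., on an edge server).
W3, W4 = rng.normal(size=(4, 4)), rng.normal(size=(4, 2))

def device_forward(x):
    """Run the front part of the model on the device."""
    return relu(relu(x @ W1) @ W2)

def server_forward(activation):
    """Run the remaining layers on the server and return the prediction."""
    return relu(activation @ W3) @ W4

x = rng.normal(size=(1, 16))     # raw sensor input stays on the device
smashed = device_forward(x)      # only this small activation is transmitted
prediction = server_forward(smashed)
print(prediction.shape)          # (1, 2)
```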
Distributed learning for IoT services
In practical offloading applications, conventional cloud computing requires transmitting and aggregating large volumes of data at centralized data centers, incurring high communication costs and delays. Edge computing emerges as a promising alternative for offloading tasks from numerous IoT devices, thereby reducing data transmission and latency. Nonetheless, both edge and cloud computing grapple with privacy concerns because raw data is transmitted and processed centrally. Here, FL offers a solution: distributed devices collectively train a shared model without transmitting raw data to the cloud or edge server, minimizing communication costs and enhancing user privacy.
For real-time applications with dynamic workloads, computational offloading decisions become intricate and call for careful resource management. FL, integrated with multi-agent deep reinforcement learning (DRL), can orchestrate local training and optimize resource allocation, facilitating better offloading decisions.
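As a rough sketch of this FL-plus-multi-agent-RL pattern, the toy example below uses tabular Q-learning rather than deep RL for brevity: each device learns an offloading policy locally, and the agents periodically average their Q-tables in an FL-style step. The environment, states, costs, and hyperparameters are invented for illustration and are not taken from the paper.

```python
# Toy sketch: each device runs an independent Q-learning agent that decides
# whether to process a task locally (action 0) or offload it to the edge
# (action 1); agents periodically average their Q-tables, FL-style.
import numpy as np

rng = np.random.default_rng(3)
NUM_AGENTS, STATES, ACTIONS = 3, 4, 2       # state = discretized local queue length
EPISODES, SYNC_EVERY, ALPHA, GAMMA, EPS = 300, 25, 0.2, 0.9, 0.1

def step(state, action):
    """Toy environment: offloading empties the queue but costs bandwidth;
    local processing is cheap when the queue is short, slow when it is long."""
    if action == 1:                          # offload to edge
        return 0, -1.0                       # fixed transmission cost
    cost = -0.2 * state                      # local delay grows with queue length
    next_state = min(state + int(rng.integers(0, 2)), STATES - 1)
    return next_state, cost

q_tables = [np.zeros((STATES, ACTIONS)) for _ in range(NUM_AGENTS)]
states = [0] * NUM_AGENTS
for episode in range(1, EPISODES + 1):
    for i in range(NUM_AGENTS):              # agents act and learn in parallel
        s = states[i]
        a = int(rng.integers(ACTIONS)) if rng.random() < EPS else int(np.argmax(q_tables[i][s]))
        s_next, reward = step(s, a)
        target = reward + GAMMA * np.max(q_tables[i][s_next])
        q_tables[i][s, a] += ALPHA * (target - q_tables[i][s, a])
        states[i] = s_next
    if episode % SYNC_EVERY == 0:            # FL-style aggregation of local policies
        avg = np.mean(q_tables, axis=0)
        q_tables = [avg.copy() for _ in range(NUM_AGENTS)]

print("shared policy (best action per queue state):", np.argmax(q_tables[0], axis=1))
```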
To ensure sustainable learning procedures and cost-effective system operation, the long-term costs of energy consumption and local computation at the agents must also be considered. Traditional machine learning methods can likewise prove advantageous for IoT-based localization services, offering robustness, scalability, accuracy, and reduced complexity.
Nevertheless, these approaches require distributed clients to transmit large volumes of raw data to the server for model training, incurring high communication costs and privacy vulnerabilities. Here, FL serves as a transformative technology that alleviates communication burdens and bolsters client data privacy in IoT-driven localization services.
Applications and future directions
Distributed learning has numerous advantages when applied to IoT-enabled domains such as healthcare, smart grids, autonomous vehicles, aerial networks, and smart industry. Firstly, it facilitates the training of models with extensive datasets that exceed the capacity of a single device. Secondly, it enables training models across geographically dispersed devices while preserving data privacy and reducing latency, thereby improving machine learning model performance. Thirdly, it can be harnessed for training models on devices with limited computational resources, leading to energy conservation.
Distributed learning holds great promise for enhancing IoT services and applications in future 6G networks, but it also faces significant challenges in the realms of security, privacy, communication efficiency, resource allocation, Metaverse integration, and standardization. These challenges necessitate ongoing research and collaborative efforts to fully realize the potential of distributed learning in the evolving landscape of IoT and 6G networks.
Conclusion
In summary, researchers explored the role of AI in advancing from 5G to 6G IoT, managing vast amounts of data, and addressing privacy concerns through distributed learning. The study begins with an overview of AI and distributed learning methods and then examines their application to emerging IoT services in 6G networks. Because challenges persist in security, privacy, communication efficiency, resource allocation, and standardization, the authors encourage continued research in these areas.