In an article recently submitted to the arXiv* preprint server, researchers introduced Unlink to Unlearn (UtU), a novel approach that simplifies edge unlearning in graph neural networks (GNNs).
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Privacy risk in GNNs
GNNs have achieved significant success across a wide range of tasks. However, they also carry a risk of privacy leakage: their training data often consists of sensitive personal information, which can be implicitly memorized within model parameters.
Recent legislation has granted individuals the right to be forgotten to address these privacy concerns. This right allows individuals to request that service providers remove their private information from online platforms. The concept of machine unlearning has emerged in response: it enables specific data to be removed from a trained model quickly and efficiently, instead of retraining a new model from scratch.
Machine unlearning is also effective for fixing models affected by outdated, noisy, or poisoned training data. Unlearning in GNNs is crucial for enforcing the right to be forgotten, as it allows service providers to comply with data owners'/users' requests to selectively remove data from trained GNNs.
The problem of over-forgetting
Edge unlearning is a critical unlearning scheme in graphs due to its crucial role in practical applications such as protecting edge privacy in social networks. Recently, state-of-the-art approaches such as GNNDelete have achieved strong performance in edge unlearning by eliminating the influence of specific edges.
However, over-forgetting is a major limitation of these approaches. It occurs when the unlearning algorithm inadvertently removes more information than the specified data, substantially reducing prediction accuracy on the remaining edges. As a result, the unlearned model's performance on the retained data deteriorates significantly compared to a model retrained from scratch on that data. GNNDelete's loss functions are the key source of the over-forgetting phenomenon.
The proposed approach
In this study, the researchers developed UtU, a novel method that performs unlearning solely by unlinking the forget edges from the original graph structure. It was derived by simplifying GNNDelete to address the over-forgetting issue, after the researchers' investigation revealed design deficiencies in GNNDelete's loss functions.
The first loss function, designed to eliminate the influence of the forgotten edges, used an unsuitable optimization objective and was the primary contributor to the over-forgetting phenomenon. The second, designed to mitigate over-forgetting, failed to prevent the decline in performance on the retained edges.
The researchers therefore removed both loss functions from GNNDelete, yielding UtU. The resulting method eliminates the need for complex parameter optimization, significantly reducing computational overhead. UtU removes the influence of the forgotten edges by modifying the input graph structure, which blocks the corresponding message-passing paths in the GNN during inference.
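At its core, this reduces to deleting the forget edges from the graph's edge list before inference. The sketch below is a minimal illustration of that idea, not the authors' implementation; the edge-list representation and function names are assumptions:

```python
def unlink(edges, forget_edges):
    """Return the edge list with the forget edges removed,
    treating (u, v) and (v, u) as the same undirected edge."""
    forget = {frozenset(e) for e in forget_edges}
    return [e for e in edges if frozenset(e) not in forget]

# A small ring graph; a request arrives to forget the edge between 1 and 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
remaining = unlink(edges, forget_edges=[(2, 1)])
# (1, 2) is removed even though the request listed it as (2, 1).
print(remaining)  # [(0, 1), (2, 3), (3, 0)]
```

Because the GNN's message passing only follows edges present in the input graph, running inference on `remaining` means the forgotten edge can no longer contribute to any node's representation.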
Experimental evaluation
Experiments were performed on four real-world datasets: the collaboration networks OGB-collab and CS, and the citation networks PubMed and CoraFull. The data was split into 90% for training, 5% for testing, and 5% for validation. Two-layer GNNs, including GIN, GCN, and GAT, were employed as GNN backbones.
Additionally, GNNDelete, GIF, gradient ascent, and retraining from scratch were used as baseline methods. A GNNDelete variant obtained by removing only the deleted edge consistency (DEC) loss, designated GNNDelete-NI (neighborhood influence), was also included for comparison.
All models were first trained on the link prediction task, after which edge unlearning was performed. The forgotten edges were randomly selected from the training set, and their proportion was varied to evaluate algorithm performance under different scenarios.
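As a rough sketch of this setup (the helper name, seed, and edge representation are illustrative assumptions, not details from the paper), selecting a forget set at a given proportion of the training edges might look like:

```python
import random

def sample_forget_edges(train_edges, ratio, seed=0):
    """Randomly select a `ratio` fraction of the training edges to forget."""
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    k = int(len(train_edges) * ratio)
    return rng.sample(train_edges, k)

# 100 training edges, forgetting 5% of them.
train_edges = [(i, i + 1) for i in range(100)]
forget = sample_forget_edges(train_edges, ratio=0.05)
print(len(forget))  # 5
```

Sweeping `ratio` over several values then exercises the unlearning algorithm under small and large forget sets, as in the experiments described above.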
ROC-AUC was used as the metric for the downstream link prediction task. Unlearning effectiveness was evaluated by comparing the unlearned model with the retrained model using the Jensen-Shannon (JS) divergence of their output distributions and the ROC-AUC of a membership inference (MI) attack. The degree of over-forgetting across models was also compared.
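For reference, the JS divergence between two discrete output distributions can be computed as follows. This is the standard formula rather than code from the paper; with base-2 logarithms the value is bounded in [0, 1], so 0 means the unlearned and retrained models produce identical output distributions:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    (base-2 logs, so the result lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    def kl(a, b):
        # KL divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical distributions)
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint distributions)
```

A low JS divergence between the unlearned and retrained models' outputs indicates that unlearning has closely approximated retraining from scratch.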
Significance of this study
Experimental results demonstrated that UtU's unlearning efficacy, output distribution, and downstream-task performance aligned closely with those of the retrained model, which is considered the gold standard of unlearning.
UtU delivered privacy protection on par with that of a retrained model at near-zero computational overhead while preserving high accuracy on downstream tasks. Specifically, UtU maintained over 97.3% of the retrained model's privacy protection capability and over 99.8% of its link prediction accuracy.
The proposed method remained unaffected by over-forgetting regardless of the forget set's size. Moreover, UtU's predictions on the retained set were highly consistent with those of the retrained model. In summary, the findings of this study demonstrate the effectiveness of UtU as a practical, lightweight edge unlearning solution with constant computational cost.
Journal reference:
- Preliminary scientific report.
Tan, J., Sun, F., Qiu, R., Su, D., & Shen, H. (2024). Unlink to Unlearn: Simplifying Edge Unlearning in GNNs. arXiv. https://doi.org/10.48550/arXiv.2402.10695, https://arxiv.org/abs/2402.10695