In an article recently posted to the Meta Research website, researchers explored temporal graph neural networks (TGNNs) and their effectiveness in applications such as fraud detection and content recommendation. TGNNs faced challenges from noise in dynamic graphs, which degraded their accuracy.
To combat this, researchers proposed temporal adaptive sampling for enhanced representation in graph neural networks (TASER), an adaptive sampling method that improved accuracy, efficiency, and scalability. TASER dynamically adjusted minibatch and temporal neighbor selection, utilizing graphics processing unit (GPU)-based tools to alleviate bottlenecks. Evaluation across popular datasets showed that TASER outperformed baselines in mean reciprocal rank (MRR) while also delivering a training-time speedup.
Related Work
Past work has focused on addressing the challenges posed by noise in dynamic graphs, which can significantly impair the accuracy of TGNNs. These challenges include deprecated links and skewed neighborhood distributions.
Traditional denoising techniques, such as edge dropping and human-defined heuristics, have been proposed but often require extensive tuning and may not adequately capture the diverse noise patterns in different nodes at different timestamps. Adaptive sampling methods have emerged as a promising solution, leveraging dynamic graph information, training dynamics, and task performance to learn personalized neighborhood sampling probability distributions.
Dynamic Graph Enhancement
Researchers introduce TASER, a temporal adaptive sampling method tailored for TGNNs. TASER addresses the challenges of noisy dynamic graphs by dynamically selecting high-quality training samples and informative supporting neighbors.
TASER utilizes an importance score mechanism to adaptively select training samples, enhancing the robustness of TGNNs to noise. Additionally, TASER employs a bi-level neighbor sampling approach, integrating GPU-based temporal neighbor finding and adaptive neighbor sampling to ensure the selection of informative supporting neighbors. The temporal adaptive mini-batch selection method prioritizes selecting training samples based on their importance scores, which dynamically update during forward propagation using dynamic model predictions.
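The mini-batch selection described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the exponential-moving-average update, the use of sigmoid confidence as the importance score, and the momentum value are all assumptions for the sketch.

```python
import torch

def update_importance(scores, logits, momentum=0.9):
    # Importance tracks the model's confidence on each training edge,
    # refreshed during forward propagation. The EMA form and momentum
    # value are illustrative assumptions, not the paper's exact rule.
    confidence = torch.sigmoid(logits)
    return momentum * scores + (1 - momentum) * confidence

def sample_batch(scores, batch_size):
    # Draw a minibatch with probability proportional to importance,
    # favoring confidently predicted (likely clean) training samples.
    probs = scores / scores.sum()
    return torch.multinomial(probs, batch_size, replacement=False)
```

In this sketch, noisy samples that the model consistently mispredicts receive low importance and are drawn less often, which is the intuition behind confidence-based denoising of the training set.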
TASER effectively reduces noise in the training samples by considering the confidence level of the model's predictions. Adaptive neighbor sampling focuses on selecting a fixed-size set of informative supporting neighbors from the pre-sampled neighborhood. This process involves encoding neighbors' temporal, frequency, and identity information to generate a customized neighborhood importance distribution, improving the quality of sampled neighbors.
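A minimal sketch of the adaptive neighbor sampler follows. The MLP scorer, feature dimensions, and the way temporal, frequency, and identity information are concatenated are all assumptions made for illustration; the paper's encoder may differ.

```python
import torch
import torch.nn as nn

class AdaptiveNeighborSampler(nn.Module):
    """Toy sketch: score each pre-sampled candidate neighbor from its
    time gap, access frequency, and identity embedding, then sample a
    fixed-size set from the resulting per-node distribution."""

    def __init__(self, id_dim=8, hidden=16):
        super().__init__()
        # +2 for the scalar time-gap and frequency features
        self.mlp = nn.Sequential(
            nn.Linear(id_dim + 2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, id_emb, time_gap, freq, k):
        # id_emb: [N, C, id_dim]; time_gap, freq: [N, C] for N target
        # nodes with C pre-sampled candidate neighbors each
        feats = torch.cat(
            [id_emb, time_gap.unsqueeze(-1), freq.unsqueeze(-1)], dim=-1
        )
        logits = self.mlp(feats).squeeze(-1)   # [N, C] neighbor scores
        probs = torch.softmax(logits, dim=-1)  # customized distribution
        return torch.multinomial(probs, k)     # k neighbors per node
```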
TASER leverages a GPU-based approach utilizing the T-CSR data structure and a block-centric parallel sampling design to expedite neighbor finding on dynamic graphs. This design efficiently identifies candidate neighbors for each target node by utilizing GPU resources and balancing the workload across different blocks. Moreover, TASER introduces GPU feature caching to mitigate the CPU-GPU feature slicing overhead.
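The core lookup behind a timestamped CSR layout can be illustrated in plain Python. This is a CPU sketch of the idea only, assuming each node's incident edges are stored contiguously and sorted by timestamp; the real system performs this in parallel on the GPU.

```python
import bisect

def find_temporal_neighbors(indptr, indices, timestamps, node, t_query, k):
    # T-CSR-style lookup: a node's edges occupy indptr[node]:indptr[node+1]
    # in timestamp order, so a binary search finds every edge before t_query.
    start, end = indptr[node], indptr[node + 1]
    cut = bisect.bisect_left(timestamps, t_query, start, end)
    # Return the k most recent neighbors that existed before t_query.
    return indices[max(start, cut - k):cut]
```

Because the per-node edge lists are sorted, each query costs only a binary search plus a slice, which is what makes a block-parallel GPU version of this lookup efficient.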
TASER optimizes memory usage and accelerates training without compromising cache hit rates by dynamically caching edge features based on access frequency. TASER offers a comprehensive strategy for addressing noise in dynamic graphs, improving TGNN accuracy and efficiency through adaptive sampling, GPU-based neighbor finding, and feature caching. These techniques collectively enhance the performance of TGNNs on real-world dynamic graph data.
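The frequency-based caching idea can be sketched with a small in-memory cache. The eviction rule and bookkeeping here are illustrative assumptions; in the real system the resident entries would live in GPU memory and misses would slice features from CPU memory.

```python
from collections import Counter

class FrequencyCache:
    """Toy sketch of a frequency-based feature cache: keep the features
    of the most frequently accessed edges resident, evicting colder
    entries as access counts shift."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()  # access frequency per edge id
        self.store = {}          # resident features (GPU-side in practice)

    def get(self, edge_id, fetch):
        self.counts[edge_id] += 1
        if edge_id in self.store:
            return self.store[edge_id]      # cache hit: no CPU-GPU transfer
        feat = fetch(edge_id)               # miss: slice from host memory
        if len(self.store) < self.capacity:
            self.store[edge_id] = feat
        else:
            # Evict the least frequently used resident entry,
            # but only if the new entry is accessed more often.
            victim = min(self.store, key=lambda e: self.counts[e])
            if self.counts[victim] < self.counts[edge_id]:
                del self.store[victim]
                self.store[edge_id] = feat
        return feat
```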
Dynamic Graph Experiment Evaluation
Researchers conducted experiments to evaluate TASER's performance across five dynamic graph datasets: Wikipedia, Reddit, Flights, MovieLens, and the global database of events, language, and tone (GDELT). These datasets vary in size and characteristics, ranging from bipartite graphs without node features to large-scale knowledge graphs containing both node and edge features. The prediction tasks include user posting activity, flight schedules, and news events. The datasets were split chronologically into training, validation, and test sets to simulate real-world scenarios, with the latest one million edges used for training and evaluation.
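The chronological split described above can be expressed compactly. The 70/15/15 fractions below are an illustrative assumption, not necessarily the splits used in the paper.

```python
def chronological_split(edges, train_frac=0.7, val_frac=0.15):
    # edges are assumed sorted by timestamp; splitting by position in
    # time order ensures the model never trains on future interactions
    n = len(edges)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return edges[:i], edges[i:j], edges[j:]
```

Splitting by time rather than at random matters for dynamic graphs: a random split would leak future edges into training and inflate link-prediction accuracy.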
Researchers implemented TASER on two state-of-the-art TGNN models: TGAT and GraphMixer. TGAT utilizes a 2-layer attention-based temporal aggregator, while GraphMixer employs a single-layer multilayer perceptron (MLP)-Mixer temporal aggregator. TASER enhances these models by dynamically selecting high-quality supporting neighbors without introducing additional input. To ensure a fair comparison, the number of supporting neighbors remains constant across all models.
The researchers set the hyperparameters for the TGNN models using the default parameters utilized in the TGL framework. These include learning rate, batch size, training epochs, and the number of supporting neighbors per node. For neighbor finding, researchers configured adaptive neighbor sampling with a fixed budget, and evaluation metrics included MRR for transductive temporal link prediction.
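The MRR metric for temporal link prediction can be computed as below. The ranking of one positive edge against K sampled negative candidates is a standard formulation; the exact negative-sampling scheme used in the evaluation is not specified here and is assumed for illustration.

```python
import torch

def mean_reciprocal_rank(pos_scores, neg_scores):
    # pos_scores: [B] score of the true edge for each query;
    # neg_scores: [B, K] scores of K negative candidate edges.
    # The positive edge's rank is 1 plus the number of negatives
    # scored above it; MRR averages the reciprocal ranks.
    rank = 1 + (neg_scores > pos_scores.unsqueeze(1)).sum(dim=1)
    return (1.0 / rank.float()).mean().item()
```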
Researchers implemented TASER using Python 3.11, PyTorch 2.0.1, deep graph library (DGL) 1.1, and compute unified device architecture (CUDA) 12.2. Researchers conducted experiments on a machine equipped with dual 96-core AMD EPYC 9654 central processing units (CPUs) and a single NVIDIA RTX 6000 Ada GPU. TASER significantly improved accuracy and runtime efficiency compared to baseline TGNN models.
The experiments highlighted TASER's effectiveness in denoising training samples and supporting neighbors, leading to substantial accuracy gains. Additionally, TASER's optimizations, such as GPU neighbor finding and feature caching, contributed to notable speedups in training time, enhancing the scalability of TGNN models for large-scale dynamic graph data.
Conclusion
In conclusion, researchers proposed TASER, a novel temporal neighbor sampling method for efficient representation learning on dynamic graphs. By employing adaptive sampling techniques, including temporal adaptive mini-batch selection and temporal adaptive neighbor sampling, TASER enabled TGNNs to handle noise in dynamic graphs effectively.
Despite the increased neighborhood traversal, TASER addressed runtime bottlenecks by introducing two system optimizations: an efficient GPU neighbor finder and a dynamic GPU cache. Across two state-of-the-art TGNNs and five real-world datasets, TASER achieved an average accuracy improvement of 3.1% and a substantial training-time speedup on a single GPU.