In a paper published in the journal Big Data and Cognitive Computing, researchers introduced the Laplacian correlation graph (LOG) for stock trend prediction, explicitly modeling correlations between stock price changes as edges in a graph. By incorporating the LOG into machine learning (ML) models such as graph attention networks (GATs) through a dedicated loss term, they enabled these models to leverage price correlations among stocks effectively.
Experimental results showed significant improvements in predictive performance across various metrics, with the LOG consistently enhancing five base ML models. Backtesting revealed superior returns and information ratios, underscoring the method's practical value for real-world investment decisions.
Background
Previous research in stock price prediction includes statistical methods like simple moving averages and more sophisticated models such as the autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroskedasticity (GARCH). Artificial intelligence (AI) techniques, such as decision trees and support vector machines (SVMs), have also been employed successfully, with ensemble methods often outperforming single classifiers.
Deep learning methods like multi-layer perceptrons (MLPs) and recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks, have shown promise in capturing long-term dependencies. However, these methods typically overlook interdependencies between stocks, prompting the recent exploration of graph neural networks (GNNs) to improve forecasting by considering cross-stock correlations.
Framework Overview: Correlation & Laplacian
The authors build their framework on two fundamental concepts: the correlation matrix and the graph Laplacian. They first introduce Pearson's correlation coefficient to measure the correlation between each pair of stocks, forming the correlation matrix. They then define the adjacency and degree matrices of a graph, from which the Laplacian matrix is derived. This matrix is the basis for constructing the LOG, where stocks are represented as nodes and correlations as edges.
The authors then detail the construction of the LOG, with correlation coefficients serving as edge weights. After considering several weighting schemes, they opt to use the correlation coefficients directly, without transformation, so that more strongly correlated stocks receive higher edge weights. They also introduce a modified weight matrix that is symmetric and consistent with graph-theoretic conventions, which facilitates formulating the graph Laplacian.
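The construction described above can be sketched in a few lines of numpy. This is an illustrative reconstruction under stated assumptions, not the authors' code; the synthetic return matrix and variable names are invented for the example:

```python
import numpy as np

# Synthetic daily returns for 4 stocks over 60 trading days (toy data).
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(60, 4))

# Pearson correlation matrix between stock pairs (stocks as columns).
C = np.corrcoef(returns, rowvar=False)

# Use the correlation coefficients directly as edge weights, as the paper
# describes; zero the diagonal so there are no self-loops. np.corrcoef
# already returns a symmetric matrix.
W = C.copy()
np.fill_diagonal(W, 0.0)

# Graph Laplacian of the LOG: L = D - W, where D is the diagonal degree matrix.
D = np.diag(W.sum(axis=1))
L = D - W
```

The resulting Laplacian is symmetric and its rows sum to zero, the two properties the loss term below relies on.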
Finally, the training loss is designed with two components: an estimation-accuracy term and a correlation-preservation term. A base model, such as an LSTM, estimates stock returns, with a mean squared error (MSE) loss measuring accuracy. The LOG then enters the loss function through a correlation penalty term that uses the Laplacian matrix to measure the smoothness of the predicted signal over the graph. Standard optimization algorithms iteratively update the neural network parameters to minimize the total loss, refining the model's predictive capabilities.
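A minimal numpy sketch of this two-part loss follows. The function name and the penalty weight `lam` are assumptions for illustration; the paper's actual implementation and hyperparameters may differ:

```python
import numpy as np

def log_loss(y_hat, y_true, L, lam=0.1):
    """Total loss = MSE + lam * Laplacian smoothness penalty (sketch)."""
    mse = np.mean((y_hat - y_true) ** 2)
    # For L = D - W, y^T L y = 0.5 * sum_ij W_ij * (y_i - y_j)^2: the
    # penalty is small when correlated (connected) stocks receive
    # similar predicted returns.
    penalty = float(y_hat @ L @ y_hat)
    return mse + lam * penalty

# Tiny demo: a 3-stock path graph (stock 0 -- stock 1 -- stock 2).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L_demo = np.diag(A.sum(axis=1)) - A
y = np.array([0.01, 0.02, 0.03])
total = log_loss(y, y, L_demo)  # perfect fit: only the penalty remains
```

With a perfect prediction the MSE term vanishes, so the remaining loss is purely the smoothness penalty on the graph; a constant prediction across all stocks would make the penalty vanish as well.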
Experimental Validation
The authors validated their proposed method on real-world data. The experimental design rested on two components: datasets and data processing. For datasets, they used two prominent stock pools in the Chinese market, the CSI100 and CSI300 indices, which represent significant segments of the A-shares market.
These datasets supported evaluation of the proposed method across various market conditions. The authors also used the Alpha158 stock features from the Qlib platform, derived from fundamental components of stock data, ensuring a comprehensive assessment of their method's effectiveness. Careful pre-processing prepared the datasets for training and ensured data integrity and compatibility with the proposed method.
Pre-processing included normalizing the original data so that each stock starts from a standardized initial price and computing the 158 features from fundamental stock components. Further refinement involved filling in missing values and applying cross-sectional rank normalization to make features comparable across all stocks.
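Cross-sectional rank normalization can be sketched as follows. This is an illustrative implementation, not Qlib's; the function name and toy input are assumptions:

```python
import numpy as np

def cs_rank_normalize(day_features):
    """Rank-normalize features across stocks for a single day (sketch).

    day_features: array of shape (num_stocks, num_features). Each value is
    replaced by its rank among all stocks that day, scaled to [0, 1], so
    features become comparable across stocks regardless of raw scale.
    """
    n, m = day_features.shape
    ranks = np.empty((n, m), dtype=float)
    for j in range(m):
        order = day_features[:, j].argsort()
        ranks[order, j] = np.arange(n)
    return ranks / (n - 1)

# Three stocks, two features on one day: the smallest value in each
# column maps to 0.0 and the largest to 1.0.
normalized = cs_rank_normalize(np.array([[3.0, 10.0],
                                         [1.0, 30.0],
                                         [2.0, 20.0]]))
```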
With the data prepared, the authors integrated their proposed LOG module into five base models in their experimental setup: MLP, gated recurrent unit (GRU), LSTM, GAT, and Transformer.
Evaluation covered a comprehensive set of metrics, including the information coefficient (IC), rank IC, and long-position cumulative return, while accounting for transaction fees and trading limitations. The authors repeated experiments over multiple runs and recorded average values alongside standard deviations, providing insight into the method's practical applicability and performance in real-world investment scenarios.
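The two correlation-based metrics can be computed as follows. This is a sketch for exposition, not the paper's evaluation code, which likely relies on Qlib's built-in implementations:

```python
import numpy as np

def information_coefficient(pred, realized):
    """Daily IC: Pearson correlation between predicted and realized returns."""
    return float(np.corrcoef(pred, realized)[0, 1])

def rank_ic(pred, realized):
    """Rank IC: correlation of the cross-sectional ranks
    (equivalent to Spearman correlation when there are no ties)."""
    pred_ranks = pred.argsort().argsort()
    realized_ranks = realized.argsort().argsort()
    return information_coefficient(pred_ranks, realized_ranks)

# Toy cross-section of 4 stocks on one day (values are made up).
pred = np.array([0.02, -0.01, 0.03, 0.00])
realized = np.array([0.015, -0.020, 0.010, 0.005])
ic = information_coefficient(pred, realized)
ric = rank_ic(pred, realized)
```

In practice the IC and rank IC are computed per day and averaged over the test period; rank IC is more robust to outliers because it depends only on orderings.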
Conclusion
To sum up, the proposed LOG framework significantly improved stock-return prediction by directly modeling the correlations between stocks. Integrated with various base models, it consistently enhanced performance across multiple evaluation metrics, promising higher returns and reduced risk in real investment scenarios.
While these findings highlighted the framework's utility and versatility, future work could explore using alternative pricing metrics, extending experiments to other financial markets, testing on additional models, and addressing potential limitations related to correlation coefficient calculation. Overall, the LOG framework presented a valuable addition to financial modeling tools, offering enhanced portfolio management strategies for practitioners and researchers alike.