In an article published in the journal Economies, a researcher compared the performance of stochastic processes (the Vasicek, geometric Brownian motion (GBM), and Cox–Ingersoll–Ross (CIR) models) and machine learning (ML) algorithms (random forest, K-nearest neighbors (KNN), and support vector machine (SVM)) in predicting three sector stock indices: financial (XLF), technology (XLK), and healthcare (XLV).
Results indicated that stochastic processes, particularly the CIR model, outperformed the ML algorithms, although the latter offered more flexibility. The author highlighted the need to optimize ML hyperparameters to improve their predictive performance.
Background
Modeling and predicting asset prices is crucial for financial market participants because it informs investment decisions aimed at maximizing returns and minimizing risk. Traditionally, stochastic models such as GBM, the Vasicek model, and the CIR model have been used to capture price dynamics, but they struggle with non-linearities and changing volatility. With advances in computing, ML techniques such as random forest, SVM, and KNN have emerged that can model complex, non-linear relationships in financial data.
However, these approaches also have limitations and require careful evaluation. This study aimed to bridge the gap between the two families of methods by comparing traditional stochastic models with ML algorithms on ten years of real data from the XLF, XLK, and XLV indices. By analyzing the advantages and disadvantages of each approach, the study identified opportunities to combine them, potentially improving forecast accuracy and offering practical insights for financial professionals.
Stochastic Processes and ML Algorithms
Stochastic processes are fundamental to modeling financial asset prices and interest rates, capturing exponential growth, market volatility, and mean-reversion tendencies. GBM is commonly used for asset price fluctuations, while the Vasicek and CIR models were developed for interest rates. Unlike Vasicek, CIR scales its volatility with the square root of the current level, which keeps rates non-negative.
In addition to stochastic models, ML algorithms such as random forest, KNN, and SVM are increasingly used for financial prediction. Random forests are robust and handle large datasets well but can be computationally intensive. KNN is simple to implement but struggles with high-dimensional data. SVMs are effective in high-dimensional spaces but can be complex to train. The author applied both families of methods to the same forecasting task, highlighting their respective advantages and limitations.
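For reference, the three models are usually written as the stochastic differential equations dS = μS dt + σS dW (GBM), dr = a(b − r) dt + σ dW (Vasicek), and dr = a(b − r) dt + σ√r dW (CIR), where a is the speed of mean reversion, b the long-run level, σ the volatility, and W a Brownian motion. The Python sketch below simulates each one with a simple Euler-Maruyama scheme. It is not the paper's code: the function names, the placeholder parameter values, and the "full truncation" treatment of negative CIR values are illustrative assumptions.

```python
import math
import random


def simulate(step, x0, n_steps, dt, seed=0):
    """Generic Euler-Maruyama loop: x[t+1] = step(x[t], dt, dW)."""
    rng = random.Random(seed)  # seeded so paths are reproducible
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        path.append(step(path[-1], dt, dw))
    return path


# Parameter defaults below are placeholders, not values from the study.
def gbm_step(s, dt, dw, mu=0.05, sigma=0.2):
    # dS = mu*S dt + sigma*S dW
    return s + mu * s * dt + sigma * s * dw


def vasicek_step(r, dt, dw, a=1.5, b=0.03, sigma=0.01):
    # dr = a(b - r) dt + sigma dW: mean-reverting, noise size independent of r
    return r + a * (b - r) * dt + sigma * dw


def cir_step(r, dt, dw, a=1.5, b=0.03, sigma=0.05):
    # dr = a(b - r) dt + sigma*sqrt(r) dW. "Full truncation" uses max(r, 0)
    # in the drift and diffusion so the square root is always defined.
    r_pos = max(r, 0.0)
    return r + a * (b - r_pos) * dt + sigma * math.sqrt(r_pos) * dw


if __name__ == "__main__":
    dt = 1 / 252  # one trading day, measured in years
    prices = simulate(gbm_step, 100.0, 252, dt)
    rates = simulate(cir_step, 0.03, 252, dt)
    print(prices[-1], rates[-1])
```

The Vasicek step adds noise of constant size, so the simulated rate can dip below zero, whereas the CIR noise shrinks as r approaches zero.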
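To make the point that KNN is simple to implement concrete, here is a minimal K-nearest-neighbors classifier in plain Python. It is a sketch for illustration only: the two-feature inputs, the labels, and the choice of Euclidean distance with a majority vote are assumptions, since the source does not describe the library, distance metric, or k the paper used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest labels; Counter keeps first-seen order,
    # so a tie goes to the label whose member is closest to the query.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical (OpenClose, HighLow) features with buy (1) / sell (-1) labels.
    train_x = [(0.5, 1.2), (0.4, 1.0), (-0.6, 1.5), (-0.5, 1.3)]
    train_y = [1, 1, -1, -1]
    print(knn_predict(train_x, train_y, (0.45, 1.1), k=3))  # → 1
```

Because every prediction scans all training points, the cost grows with the dataset, and distances become less informative as the number of features grows, which is the high-dimensional weakness noted above.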
Methodology for Analyzing Financial Markets
Daily historical data for the XLF, XLK, and XLV trackers, downloaded from Yahoo Finance for the ten years from March 2014 to March 2024, formed the dataset. Key variables included the open, high, low, close, and adjusted close prices and trading volume. Derived variables such as OpenClose, HighLow, and DiffVolume were also computed to capture market dynamics.
The target variable, Y, encoded the trading signal as 1 (buy) or -1 (sell) according to the return on the next day's adjusted closing price. Exploratory data analysis (EDA) on the XLF index, with the data divided into buy and sell sets, revealed patterns in the variables. The stochastic models predicted daily returns and generated buy/sell signals, with their parameters estimated from historical data. For the ML algorithms, the data were divided into training and test sets; the models were trained on the training data and evaluated on the test set using precision, recall, and F1 score.
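The source names the derived variables but not their formulas. The sketch below assumes the common definitions OpenClose = open − close, HighLow = high − low, and DiffVolume = today's volume minus yesterday's; treat these definitions, and the Bar record type, as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One trading day of price and volume data (field names are illustrative)."""
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: int


def derive_features(bars):
    """Return one dict of derived features per day, starting from the second day.

    ASSUMED definitions (not confirmed by the source):
      OpenClose  = open - close
      HighLow    = high - low
      DiffVolume = volume today - volume yesterday
    """
    features = []
    for prev, cur in zip(bars, bars[1:]):
        features.append({
            "OpenClose": cur.open - cur.close,
            "HighLow": cur.high - cur.low,
            "DiffVolume": cur.volume - prev.volume,
        })
    return features
```

The first day is dropped because DiffVolume needs a previous day to compare against.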
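The labeling rule and the train/test split can be sketched as follows. The 80/20 ratio, the chronological (unshuffled) split, and mapping a zero return to -1 are assumptions made for illustration; the source does not specify them. A chronological split is used here because shuffling daily prices would let a model train on days that come after the days it is tested on.

```python
def make_labels(adj_close):
    """Label day t as 1 (buy) if the next day's return is positive, else -1 (sell).

    The last day has no next-day price, so it receives no label.
    ASSUMPTION: a zero return is treated as a sell (-1).
    """
    labels = []
    for today, tomorrow in zip(adj_close, adj_close[1:]):
        ret = tomorrow / today - 1.0  # simple next-day return
        labels.append(1 if ret > 0 else -1)
    return labels


def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


if __name__ == "__main__":
    y = make_labels([100.0, 101.0, 100.5, 100.5])
    train, test = chronological_split(list(range(10)))
    print(y, train, test)
```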
Evaluation of Models
The author evaluated how effectively the stochastic processes and ML algorithms predicted stock index movements across the three sectors. The stochastic models achieved high metric values, with the Vasicek and CIR models outperforming GBM, a result attributed to their mean-reversion properties. The CIR model performed best, achieving metric values of 99% across all sectors.
The ML algorithms, in contrast, reached roughly 70% accuracy, with SVM slightly ahead. Importantly, the stochastic models predicted the index value itself, whereas the ML algorithms predicted the direction of the trend. The stochastic models' high scores were attributed to their use of the current day's real value when making each prediction.
The author suggested a hybrid approach that combines both methods to improve prediction quality. Stochastic processes were recommended when the number of values to be predicted stays within the model's memory, whereas ML algorithms were considered better suited to longer-term predictions. Future research could explore recursive methods that feed predicted values back in to make further predictions, and could optimize the ML algorithms through hyperparameter tuning.
The study's findings offer valuable insights for traders, emphasizing that models should be chosen strategically according to the specific prediction need and market dynamics. Further analysis across additional sectors could reveal sector-specific differences in model performance.
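The mean-reversion parameters credited for the Vasicek and CIR models' advantage have to be estimated from historical data. The source does not say how the paper did this; one common approach, sketched below for Vasicek, is ordinary least squares on the discretized equation r[t+1] = α + β·r[t] + ε, from which a = (1 − β)/dt and b = α/(1 − β), with σ recovered from the residual spread. The function name fit_vasicek_ols is hypothetical.

```python
import math


def fit_vasicek_ols(series, dt):
    """Estimate (a, b, sigma) of dr = a(b - r)dt + sigma dW by OLS.

    Uses the Euler discretization r[t+1] = alpha + beta*r[t] + e[t], where
    beta = 1 - a*dt and alpha = a*b*dt. Assumes beta < 1, i.e. the series
    is mean-reverting; otherwise the mapping back to (a, b) is undefined.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx                 # OLS slope
    alpha = my - beta * mx           # OLS intercept
    a = (1.0 - beta) / dt            # speed of mean reversion
    b = alpha / (1.0 - beta)         # long-run mean level
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation, rescaled from one step of length dt to unit time.
    sigma = math.sqrt(sum(e * e for e in resid) / (n - 2)) / math.sqrt(dt)
    return a, b, sigma
```

For CIR, a common variant divides both sides of the discretized equation by √r[t] before regressing, because its noise term scales with √r[t].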
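Because the stochastic models output an index value rather than a direction, their forecasts have to be converted into buy/sell signals before precision, recall, and F1 can be compared with those of the ML classifiers. One plausible mapping, assumed here and not confirmed by the source, is to buy when the forecast for tomorrow exceeds today's actual value. The helper names below are hypothetical, and the metrics are computed for the buy class only.

```python
def value_to_signal(predicted_next, actual_today):
    """Map a predicted price to 1 (buy) if above today's actual price, else -1.

    ASSUMPTION: ties map to -1 (sell).
    """
    return 1 if predicted_next > actual_today else -1


def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the `positive` class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(pairs)}
```

For example, with actual prices [100, 101, 100, 102, 103] and forecasts [100.8, 101.2, 101.5, 101.9] made on each of the first four days, the signals are [1, 1, 1, -1] against true moves [1, -1, 1, 1], giving precision, recall, and F1 of 2/3 and an accuracy of 0.5.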
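The difference between anchoring each forecast on the observed value and the recursive approach proposed for future work can be sketched with the noise-free Euler step of a Vasicek-type model. This is an illustrative assumption, not the paper's procedure, and it does not model what the paper calls the model's memory. In the recursive version each prediction becomes the next input, so the forecast path drifts smoothly toward the long-run level b and stops reacting to new observations.

```python
def one_step_forecasts(actual, a, b, dt):
    """Forecast day t+1 from the observed value on day t, mirroring the
    study's use of the current day's real value."""
    return [r + a * (b - r) * dt for r in actual[:-1]]


def recursive_forecasts(start, horizon, a, b, dt):
    """Forecast `horizon` steps ahead, feeding each prediction back in as the next input."""
    preds, r = [], start
    for _ in range(horizon):
        r = r + a * (b - r) * dt  # expected Euler step with the noise term set to zero
        preds.append(r)
    return preds


if __name__ == "__main__":
    print(one_step_forecasts([0.10, 0.09, 0.20], a=2.0, b=0.05, dt=0.1))
    print(recursive_forecasts(0.10, 3, a=2.0, b=0.05, dt=0.1))
```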
Conclusion
In conclusion, the author demonstrated that stochastic processes, particularly the CIR model, outperformed ML algorithms in predicting stock index movements when using the current day's real value. The high metric values observed for the stochastic models underscore their effectiveness in financial forecasting.
However, the flexibility of ML algorithms and their potential for optimization through hyperparameter tuning remained valuable. A hybrid approach, leveraging both stochastic and ML methods, offered a promising path for enhanced prediction accuracy. Future research should explore recursive prediction methods and sector-specific analyses to further refine these models and their applications in financial markets.
Journal reference:
- Bouasabah, M. (2024). A Performance Analysis of Stochastic Processes and Machine Learning Algorithms in Stock Market Prediction. Economies, 12(8), 194. DOI: 10.3390/economies12080194, https://www.mdpi.com/2227-7099/12/8/194