Unleashing the Power of High-Frequency Financial Data: A Novel Methodology for AI-Driven Intraday Trend Forecasting

By Dr. Sampath Lonka. Reviewed by Susha Cheriyedath, M.Sc. Jul 14, 2023

In a paper published in the journal Engineering Applications of Artificial Intelligence, researchers proposed a novel feature engineering methodology for high-frequency financial data based on time series segmentation. The approach extracts and analyzes variables within intraday trends and supports the forecasting of response variables with artificial intelligence (AI) models. Specifically, the methodology estimates the volatility, duration, and direction of future intraday trends using extreme gradient boosting (XGBoost) for multiclass classification.

*Study: Unleashing the Power of High-Frequency Financial Data: A Novel Methodology for AI-Driven Intraday Trend Forecasting. Image Credit: Kei studio / Shutterstock*

Background

The growing importance of AI in finance has led to the widespread use of machine learning techniques for extracting knowledge from large financial datasets. However, high-frequency data arrive at irregular intervals and involve multiple variables, so they call for AI methods that do not assume specific data distributions. The combination of high-frequency data analysis and AI-based forecasting has attracted significant interest from both the scientific community and the private sector, with a focus on accurate predictions and higher returns.

Extracting relevant features from financial market data is crucial for machine learning techniques to be effective. Previous research has primarily concentrated on fixed sampling schemes and horizons, overlooking the need to group volatility values according to intraday trend movements. To address this gap, the authors introduce a new problem and develop a methodology to tackle it.

Related work

Existing research on applying AI to high-frequency financial data has focused on feature engineering for data subsets of fixed composition. This study instead constructs data subsets of variable composition, which poses a new problem in AI for high-frequency financial data, and proposes a methodology for forecasting intraday volatility and directional movements. Previous studies have used machine learning models such as gradient boosting, random forest, support vector machines (SVM), and artificial neural networks for volatility forecasting. Directional forecasting, on the other hand, has been approached using long short-term memory (LSTM) networks, SVM, and other types of neural networks. The methodology presented in this work differs from these in that it forecasts direction over variable-length intraday trends rather than fixed horizons.

Methodology

The proposed methodology extracts features from intraday trends and from the limit order book states within these trends, using a multistage feature engineering approach. The first step partitions the transaction time series into segments of variable length, based on the irregular durations of intraday trends. The second step synchronizes the order book states with trade times to obtain the variables associated with each order book state. Finally, multiple transformations are applied to the set of variables within each segment to derive the features that serve as input to the AI models.
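The first two steps can be illustrated with a minimal sketch. The article does not state the segmentation rule used in the paper, so the reversal threshold below is an assumption chosen for illustration: a trend is closed at its running extreme once the price moves against it by at least `threshold` (relative). The function names `segment_trends` and `sync_book_to_trades` are hypothetical, and the order book states are treated as opaque values.

```python
from bisect import bisect_right


def segment_trends(prices, threshold):
    """Split a trade price series into variable-length trend segments.

    Assumed rule (not taken from the paper): a trend ends at its running
    extreme once the price reverses from that extreme by >= threshold.
    Returns (start_index, end_index, direction) tuples, direction = +1 / -1.
    """
    segments = []
    start, ext, direction = 0, 0, 0  # direction 0 means "not yet known"
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            change = p / prices[start] - 1
            if abs(change) >= threshold:
                direction = 1 if change > 0 else -1
                ext = i
        elif direction * (p - prices[ext]) > 0:
            ext = i  # the trend reaches a new extreme
        elif direction * (prices[ext] - p) / prices[ext] >= threshold:
            # Reversal confirmed: close the segment at the extreme.
            segments.append((start, ext, direction))
            start, ext, direction = ext, i, -direction
    if direction != 0:
        # The final, still-open trend is closed at the last trade.
        segments.append((start, len(prices) - 1, direction))
    return segments


def sync_book_to_trades(trade_times, book_times, book_states):
    """For each trade, pick the latest order book state at or before it.

    book_times must be sorted ascending; trades that precede every book
    update get None.
    """
    synced = []
    for t in trade_times:
        k = bisect_right(book_times, t) - 1
        synced.append(book_states[k] if k >= 0 else None)
    return synced


prices = [100, 101, 102, 103, 102, 101, 100, 101, 102, 103, 104]
print(segment_trends(prices, threshold=0.015))
print(sync_book_to_trades([1.0, 2.5, 4.0], [0.5, 2.0, 3.0, 4.0], ["a", "b", "c", "d"]))
```

Because segments end wherever a reversal is confirmed, their lengths vary with market activity, which is the property the methodology relies on. The as-of lookup keeps each trade paired only with book information that was available at the time of the trade.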

Experimentation and application

The experiments use a dataset of trades and buy/sell orders for 20 assets listed on the Brazilian stock exchange (B3). The dataset spans 206 trading days from July 2, 2018, to May 6, 2019, and is cleaned to remove errors and inconsistencies. The developed methodology is applied to extract features for the three response variables: duration, volatility, and direction.

The application frames the forecasting tasks as classification problems for the AI models. After the trade series is segmented and features are extracted, each segment is assigned a class label based on the response variable. Embedding is then performed to build the set of samples from the extracted feature vectors: each sample is structured with lagged variables and a specific step. The XGBoost algorithm is chosen for modeling due to its speed and efficiency. Model performance is evaluated using confusion matrices, the kappa statistic, and the F1-score.
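The labeling, embedding, and evaluation steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the cutoff-based labeling rule, the function names, and the choice of macro-averaged F1 are all hypothetical, and the classifier itself is omitted (the article names XGBoost, but any multiclass model could consume `X` and `y`).

```python
from bisect import bisect_right


def label_by_cutoffs(values, cutoffs):
    """Map each response value to a class index using ascending cutoffs
    (assumed labeling rule; the article does not give the actual one)."""
    return [bisect_right(cutoffs, v) for v in values]


def embed(features, labels, n_lags, step=1):
    """Build samples from per-segment feature vectors.

    Each row of X concatenates the features of n_lags consecutive segments;
    y is the label of the segment that follows them. `step` is the stride
    between consecutive samples.
    """
    X, y = [], []
    for t in range(n_lags, len(features), step):
        window = features[t - n_lags:t]
        X.append([v for vec in window for v in vec])
        y.append(labels[t])
    return X, y


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def kappa(m):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = sum(map(sum, m))
    k = len(m)
    p_obs = sum(m[i][i] for i in range(k)) / n
    p_exp = sum(sum(m[i]) * sum(row[i] for row in m) for i in range(k)) / n ** 2
    return 0.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def macro_f1(m):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for i in range(len(m)):
        tp = m[i][i]
        fp = sum(row[i] for row in m) - tp
        fn = sum(m[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


features = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
labels = label_by_cutoffs([0.1, 0.5, 0.9, 0.5, 0.1], cutoffs=[0.3, 0.7])
X, y = embed(features, labels, n_lags=2)
print(X, y)
```

With two lags, each sample holds the features of two consecutive segments and is labeled with the class of the next segment, so the model only ever sees information that precedes the trend it is asked to classify.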

Results

The performance metrics of the models trained to forecast volatility, duration, and direction are analyzed. The best results are obtained for volatility, followed by duration and direction. The analysis of variable importance identifies the variables that matter most for each response variable. For volatility, variables related to volatility per unit of time and total squared log returns per second are the most important. Duration is influenced by variables such as interval duration and the durations between trades. For direction, interval duration, return per second, squared log return per second, and value per second are the most critical variables. Overall, trade series variables are more important than those from the limit order book.
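To make the variables named above concrete, the following sketch computes plausible versions of them for a single segment of trades. The exact definitions used in the paper are not given in this article, so the formulas, the dictionary keys, and the function name `segment_features` are assumptions for illustration only.

```python
import math


def segment_features(times, prices, volumes):
    """Per-segment trade features (assumed definitions).

    times are trade timestamps in seconds (ascending), with at least two
    trades and a strictly positive segment duration.
    """
    duration = times[-1] - times[0]
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    gaps = [b - a for a, b in zip(times, times[1:])]
    traded_value = sum(p * v for p, v in zip(prices, volumes))
    return {
        "interval_duration": duration,
        "mean_trade_gap": sum(gaps) / len(gaps),
        "return_per_second": math.log(prices[-1] / prices[0]) / duration,
        "sq_log_return_per_second": sum(r * r for r in log_returns) / duration,
        "value_per_second": traded_value / duration,
    }


f = segment_features(
    times=[0.0, 1.0, 4.0],
    prices=[100.0, 101.0, 102.0],
    volumes=[10, 20, 30],
)
print(f)
```

Dividing by the segment duration puts segments of different lengths on a common per-second scale, which matters here because the segments are not sampled on a fixed grid.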

Conclusion

In conclusion, the study developed a feature engineering methodology that extracts features from high-frequency intervals and predicts three response variables: volatility, duration, and direction. The methodology combines time series segmentation with order book data. The best performance is observed in volatility forecasting, followed by duration and direction.

The analysis of variable importance shows that trade variables have a greater impact than variables from the limit order book. The methodology can be applied to other high-frequency time series problems, although larger volumes of observations raise scalability concerns and require balancing dimensionality reduction against information loss.

Journal reference:

Mantilla, P., & Dormido-Canto, S. (2023). A novel feature engineering approach for high-frequency financial data. Engineering Applications of Artificial Intelligence, 125, 106705. DOI: https://doi.org/10.1016/j.engappai.2023.106705, https://www.sciencedirect.com/science/article/abs/pii/S0952197623008898

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.
