Predicting Salicylic Acid Solubility Using Machine Learning

In a paper published in the journal Digital Chemical Engineering, researchers used machine learning (ML) to predict the solubility of salicylic acid (SA) in 13 solvents. They applied six algorithms to 217 samples described by 15 input variables (13 solvents, temperature, and pressure): neural network, linear regression, logistic regression, decision tree (DT), random forest (RF), and k-nearest neighbors (KNN). The RF algorithm achieved the lowest total error and KNN the highest, highlighting the potential of ML for accurate solubility prediction.

Study: Predicting Salicylic Acid Solubility Using Machine Learning. Image Credit: Arpon Pongkasetkam/Shutterstock

Related Work

Previous research on SA, a natural phenolic compound used extensively in treating skin disorders, highlights its historical and medicinal importance. Known for its exfoliating and comedolytic properties, SA is used to treat conditions such as acne and photodamage.

As the primary metabolite of aspirin, salicylic acid also contributes to aspirin's anti-inflammatory and cancer-preventive effects. Determining SA solubility in various solvents is crucial, but measuring it experimentally is costly and time-consuming. Thermodynamic methods have been used as an alternative, but they involve complex calculations and struggle with large data sets.

Predicting SA Solubility

The study employed six ML algorithms to predict SA solubility in various solvents: DT, KNN, linear regression, logistic regression, RF, and neural network. For each model, the input data and target labels were assigned to variables X and T. The DT model was trained with MATLAB's `fitrtree` function using default parameters. The KNN model was trained with the `fitcknn` function using three neighbors. The linear regression model was trained with the `fitlm` function.
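MATLAB's `fitcknn` is a classifier. For a continuous target such as mole-fraction solubility, a common alternative is to predict the average target of the k nearest training samples. The stdlib-only sketch below does that with k = 3 and Euclidean distance. It is not the authors' code: the function name and the toy data are hypothetical, and neighbor averaging is a swapped-in regression variant of KNN.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest
    training samples (Euclidean distance). k=3 mirrors the study's setting."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training sample's distance to the query with its target, nearest first.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return sum(y for _, y in ranked[:k]) / k

# Hypothetical toy data: temperature (K) -> solubility (mole fraction).
X = [[293.15], [298.15], [303.15], [308.15], [313.15]]
y = [0.010, 0.012, 0.015, 0.018, 0.022]
print(knn_predict(X, y, [300.0]))
```

With a single temperature feature, the three nearest samples to 300 K are those at 298.15, 303.15, and 293.15 K, so the prediction is the mean of their three solubilities. With real data, the features would normally be scaled first, because temperature in kelvin would otherwise dominate the distance.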

Logistic regression was implemented with the `fitglm` function, specifying a binomial distribution and a logit link function. The RF model was trained with the `TreeBagger` function using 90 trees. Finally, the neural network was built with 10 hidden neurons. Its data were split into training, testing, and validation sets (60%, 30%, and 10%, respectively), and it was trained with the `train` function using the default activation functions.
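The idea behind a random forest like the study's 90-tree `TreeBagger` model can be shown in a few lines. Each tree is fit to a bootstrap sample of the rows and a random subset of the input variables, and the forest averages the trees' predictions. This is a simplified, stdlib-only sketch, not the MATLAB implementation. To keep it short, each tree is a depth-1 "stump". Considering one third of the features per tree is an assumed default, and all names are hypothetical.

```python
import random

def fit_stump(xs, ys, feature_ids):
    """Fit a depth-1 regression tree on the candidate features.
    Returns (feature, threshold, left_mean, right_mean) for the split with the
    lowest sum of squared errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= threshold]
            right = [y for x, y in zip(xs, ys) if x[f] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        # No valid split (e.g. one distinct value): predict the mean everywhere.
        mean = sum(ys) / len(ys)
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]

def fit_forest(xs, ys, n_trees=90, n_features=None, seed=0):
    """Bagging plus random feature subsets: each tree sees a bootstrap sample
    of the rows and a random subset of the input variables."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    m = n_features or max(1, d // 3)  # features considered per tree (assumed default)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        feats = rng.sample(range(d), m)
        forest.append(fit_stump([xs[i] for i in rows], [ys[i] for i in rows], feats))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees in the forest."""
    preds = [lm if x[f] <= t else rm for f, t, lm, rm in forest]
    return sum(preds) / len(preds)
```

Averaging many trees, each trained on a different resample, reduces the variance of a single tree. This is the usual explanation for why a forest tends to beat one decision tree. Production random forests grow much deeper trees than these stumps.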

Each algorithm's performance was assessed by the total error between predicted and experimental solubility values. The experimental data were gathered from reliable sources and covered both single-component and multi-component solvent systems.
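The article reports one "total error" per algorithm but does not say how it is computed. The sketch below assumes, for illustration only, that it is the sum of squared differences between predicted and experimental solubilities; the paper may use a different metric. A mean squared error helper is included because the two are often confused.

```python
def total_error(predicted, observed):
    """Sum of squared differences between predicted and experimental values.
    ASSUMPTION: SSE stands in for the article's undefined 'total error'."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def mean_squared_error(predicted, observed):
    """SSE divided by the number of samples."""
    return total_error(predicted, observed) / len(observed)
```

Because solubility here is a mole fraction, usually well below 1, squared errors are very small numbers. That is consistent with the magnitude of the reported totals, though the data alone cannot confirm which metric was used.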

Special attention was given to temperature variations during data collection. The details of input and output variables, including the solvents, number of samples, temperature ranges, and solubility ranges for each solvent, were summarized in a comprehensive table.

The solvents studied included water, methanol, ethanol, ethyl acetate, PEG 300, 1,4-dioxane, and 1-propanol. Data for solvents such as ethanol and water spanned multiple temperatures, and solubility was expressed as a mole fraction. The independent variables were the 13 solvents, temperature, and pressure.

These variables served as inputs to the ML models, and the dependent variable was the solubility of SA predicted from them. The resulting computational framework offers an alternative to costly solubility measurements, with potential value for pharmaceutical applications.
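To make the 15 input variables concrete, here is one way a sample might be turned into a feature vector. It is a hypothetical encoding. It assumes each of the 13 solvent variables holds that solvent's fraction in a possibly mixed solvent, with a pure solvent encoded as 1.0. The article names only seven solvents, so the other six are placeholders. The temperature and pressure units are also assumptions.

```python
# Seven solvents named in the article, plus six hypothetical placeholders (13 total).
SOLVENTS = [
    "water", "methanol", "ethanol", "ethyl acetate", "PEG 300",
    "1,4-dioxane", "1-propanol",
] + [f"solvent_{i}" for i in range(8, 14)]

def encode_sample(composition, temperature, pressure):
    """Build a 15-element input vector: 13 solvent fractions, then T and P.

    `composition` maps solvent name -> fraction, e.g. {"ethanol": 1.0} for a
    pure solvent. Units (K for temperature, MPa for pressure) are assumed.
    """
    unknown = set(composition) - set(SOLVENTS)
    if unknown:
        raise KeyError(f"unknown solvent(s): {sorted(unknown)}")
    # Solvents absent from the mixture contribute a fraction of 0.0.
    return [composition.get(s, 0.0) for s in SOLVENTS] + [temperature, pressure]

pure = encode_sample({"ethanol": 1.0}, 298.15, 0.1)
mixed = encode_sample({"water": 0.5, "ethanol": 0.5}, 303.15, 0.1)
```

A fraction-based encoding lets single-component and multi-component systems share one input layout, which is why it is used here. A pure one-hot encoding would not represent mixtures.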

SA Solubility Prediction

The experimental dataset comprised 217 samples in solvents such as methanol, water, ethanol, ethyl acetate, and PEG 300. ML methods offer greater flexibility than traditional thermodynamic methods, adapting to diverse problems and data sets. Because some algorithms can return negative solubility values, which are physically meaningless, a command was added during implementation to ensure all predicted values were positive.
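The article does not say how the positivity command worked. Below is a minimal sketch of one plausible approach: clamp any non-positive prediction to a small positive floor. The floor value and the function name are assumptions, and taking the absolute value would be another option.

```python
def enforce_positive(predictions, floor=1e-12):
    """Replace non-positive predicted solubilities with a small positive floor.
    ASSUMPTION: clamping is one way to implement the article's positivity step."""
    return [p if p > 0 else floor for p in predictions]

print(enforce_positive([0.01, -0.002, 0.0]))
```

Clamping leaves valid predictions untouched and changes only the physically impossible ones. The absolute-value option would instead turn a large negative error into a large positive prediction.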

The team evaluated each algorithm by its total error. The neural network achieved a total error of 0.0096964, and its high correlation coefficient (R) values for the training, testing, and validation sets indicated effective model performance.

The linear regression model achieved a total error of 0.015122, and logistic regression a total error of 0.020409. KNN had the highest total error, 0.024768, although its prediction quality was still acceptable. The DT algorithm performed considerably better, with a total error of 0.0066577. The RF algorithm achieved the lowest total error, 0.00016835, demonstrating its superior predictive capability.
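The reported total errors can be compared directly. This short script only reorganizes the six figures given above: it ranks the algorithms and computes how many times larger the worst error is than the best.

```python
# Total errors as reported in the article for each algorithm.
REPORTED_TOTAL_ERROR = {
    "random forest": 0.00016835,
    "decision tree": 0.0066577,
    "neural network": 0.0096964,
    "linear regression": 0.015122,
    "logistic regression": 0.020409,
    "k-nearest neighbors": 0.024768,
}

def rank_by_error(errors):
    """Return (name, error) pairs sorted from lowest to highest error."""
    return sorted(errors.items(), key=lambda item: item[1])

ranking = rank_by_error(REPORTED_TOTAL_ERROR)
best, worst = ranking[0], ranking[-1]
for name, err in ranking:
    print(f"{name:20s} {err}")
print(f"{worst[0]} / {best[0]} error ratio: {worst[1] / best[1]:.0f}")
```

The ranking runs RF, DT, neural network, linear regression, logistic regression, KNN. KNN's total error is roughly 147 times RF's, which shows how wide the spread between the best and worst models was.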

Overall, the ML approach to predicting SA solubility from solvents, temperature, and pressure proved effective. Across the extensive experimental dataset, all six algorithms performed well, with RF showing the highest accuracy and the best agreement with experimental results. The results underscore the potential of ML to support pharmaceutical research and development through accurate solubility prediction.

Conclusion

In summary, solubility is crucial in drug development because it affects absorption and clinical response. This study modeled the solubility of salicylic acid across 13 solvents under varying temperature and pressure conditions. Given the extensive experimental data, six ML algorithms were employed: linear regression, logistic regression, neural network, DT, RF, and KNN.

Overall, all algorithms predicted solubility well, with RF yielding the best results. These findings underscore the value of ML for improving the crystallization process in salicylic acid production in the pharmaceutical industry.


Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, June 04). Predicting Salicylic Acid Solubility Using Machine Learning. AZoAi. Retrieved on November 12, 2024 from https://www.azoai.com/news/20240604/Predicting-Salicylic-Acid-Solubility-Using-Machine-Learning.aspx.

  • MLA

    Chandrasekar, Silpaja. "Predicting Salicylic Acid Solubility Using Machine Learning". AZoAi. 12 November 2024. <https://www.azoai.com/news/20240604/Predicting-Salicylic-Acid-Solubility-Using-Machine-Learning.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Predicting Salicylic Acid Solubility Using Machine Learning". AZoAi. https://www.azoai.com/news/20240604/Predicting-Salicylic-Acid-Solubility-Using-Machine-Learning.aspx. (accessed November 12, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Predicting Salicylic Acid Solubility Using Machine Learning. AZoAi, viewed 12 November 2024, https://www.azoai.com/news/20240604/Predicting-Salicylic-Acid-Solubility-Using-Machine-Learning.aspx.
