Machine Learning-based System for Analyzing Sexual Harassment in Middle Eastern Literature


By Dr. Silpaja Chandrasekar, Ph.D.
Reviewed by Susha Cheriyedath, M.Sc.
Apr 17 2024

In a paper published in the journal Humanities and Social Sciences Communications, scholars examined the growing incidence of sexual harassment in Middle Eastern nations by using a machine learning (ML) system to analyze how it is portrayed in literature set in the region. The system extracted instances of sexual harassment from a dataset of 12 Anglophone novels set in the Middle East.

*Flow chart of text pre-processing. Image Credit: https://www.nature.com/articles/s41599-024-02908-7*

The researchers combined ML text classification, lexicon-based sentiment and emotion analysis, and deep learning architectures such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks. With these tools, they categorized harassment occurrences as physical or non-physical and characterized their emotional tone. Their findings underscore how pervasive the problem is in the region and stress the urgent need for thorough research and intervention.

Related Work

Past work has extensively explored the rising prevalence of sexual harassment in Middle Eastern countries through both empirical and theoretical approaches. Analyzing sexual harassment in literary texts by hand, however, is difficult because of human cognitive limitations and potential reader bias. Scholars have therefore turned to computational methods, including hybrid strategies that combine manual annotation with computational tools. Text analysis based on natural language processing (NLP) has emerged as a promising way to examine large textual datasets thoroughly.

Sexual Harassment Models: Overview

This study introduces two models for analyzing sexual harassment in text. The first, an ML model, classifies types of sexual harassment; the second uses deep learning to classify sentiment and emotion. The researchers began with a comprehensive text preprocessing phase and then proceeded to data preparation, modeling, evaluation, and visualization.

The study's framework gives an overview of the whole process and can serve as a reference for future research on sexual harassment classification. The data source comprises 12 Anglophone novels set in the Middle East. The researchers prepared these texts through format conversion, sentence tokenization, contraction expansion, part-of-speech tagging, word tokenization, lowercase conversion, stop-word removal, and lemmatization.
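The preprocessing steps listed above can be sketched in plain Python. This is a simplified illustration, not the authors' code: the contraction map and stop-word list are tiny hypothetical stand-ins, and part-of-speech tagging and lemmatization are omitted (a fuller pipeline would typically rely on NLTK's tokenizers, `pos_tag`, stop-word corpus, and `WordNetLemmatizer`).

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "won't": "will not",
                "it's": "it is", "i'm": "i am", "he's": "he is", "she's": "she is"}
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "was", "were", "do",
              "i", "it", "he", "she", "to", "of", "and", "in", "on", "at"}
CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def expand_contractions(sentence: str) -> str:
    """Replace known contractions (case-insensitively) with their expansions."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], sentence)


def preprocess(text: str) -> list[list[str]]:
    """Return one list of cleaned tokens per sentence."""
    cleaned = []
    for sentence in split_sentences(text):
        sentence = expand_contractions(sentence)
        tokens = re.findall(r"[A-Za-z]+", sentence)           # word tokenization
        tokens = [t.lower() for t in tokens]                  # lowercase conversion
        tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
        cleaned.append(tokens)
    return cleaned
```

For example, `preprocess("She won't answer. He's at the door!")` yields `[["will", "not", "answer"], ["door"]]`. Expanding contractions before tokenizing keeps forms such as "won't" from being split into meaningless fragments.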

The cleaned text was then summarized in terms of sentence, word, and vocabulary counts. Next, the study applied rule-based detection to identify sentences containing words related to sexual harassment. Manual interpretation was essential to determine which of these sentences actually depicted sexual harassment. The confirmed sentences were then labeled by type of harassment, yielding physical and non-physical instances; their distribution underscores the prevalence of sexual harassment in the analyzed literature.
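A minimal sketch of the rule-based detection step follows, assuming a hypothetical seed list of harassment-related terms (the study's actual keyword list is not reproduced in this summary). It only flags candidate sentences. As in the study, each match still needs manual reading, because a word such as "touched" can also appear in an innocent context.

```python
import re

# Hypothetical seed lexicon for illustration; inflected forms must be listed
# explicitly because matching is on whole words.
HARASSMENT_TERMS = ["grope", "groped", "touched", "harass", "harassed",
                    "stare", "stared", "catcall"]


def find_candidates(sentences, terms=HARASSMENT_TERMS):
    """Return (index, sentence) pairs containing any term as a whole word.

    Matches are candidates only; a human annotator must confirm each one
    and label it as physical or non-physical harassment.
    """
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    return [(i, s) for i, s in enumerate(sentences) if pattern.search(s)]
```

Given the sentences "A man groped her in the crowded bus.", "She touched the old book gently.", and "The market was quiet.", the function flags the first two, and only the first survives manual review.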

For the harassment-type classifier, a dataset of instances labeled as 'physical' or 'non-physical' was prepared. The team used term frequency-inverse document frequency (TF-IDF) to turn sentences into feature vectors and principal component analysis (PCA) to reduce their dimensionality. They then trained six ML algorithms to build text classification models, tuning hyperparameters with grid search cross-validation (GridSearchCV).
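TF-IDF weights each term by how often it occurs in a sentence, discounted by how many sentences contain it. The sketch below computes it from scratch using the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization. Whether the study used these exact settings is an assumption, and the PCA and GridSearchCV steps are not shown; they would typically be delegated to scikit-learn.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized docs.

    tf is the raw count; idf uses smoothing so no term gets zero weight.
    """
    n = len(docs)
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by 0
        vectors.append([v / norm for v in vec])
    return vocab, vectors
```

A term that occurs in fewer sentences receives a larger idf, so it contributes more to distinguishing one sentence from another.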

In the sentiment and emotion analysis phase, lexicon-based techniques were used to explore and classify the sentiment underlying instances of sexual harassment. A pre-trained sentiment analyzer from the Natural Language Toolkit (NLTK) and the text2emotion Python package were used for sentiment and emotion classification, respectively.
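Lexicon-based sentiment analysis scores a sentence by looking up each word's valence in a dictionary. NLTK's pre-trained analyzer is most likely VADER, called as `SentimentIntensityAnalyzer().polarity_scores(text)` after downloading the `vader_lexicon` resource. The toy version below assumes a hypothetical six-word lexicon and skips VADER's rules for negation, intensifiers, and punctuation. It does keep VADER's normalization of the summed score, x / sqrt(x² + 15), and the commonly used ±0.05 thresholds for labeling.

```python
import math

# Hypothetical valences on a -4..+4 scale; VADER's real lexicon has
# several thousand entries.
LEXICON = {"afraid": -2.0, "hurt": -2.4, "scream": -1.9,
           "safe": 1.9, "kind": 2.4, "smile": 1.5}


def compound_score(tokens, alpha=15.0):
    """Sum word valences and squash the total into (-1, 1)."""
    total = sum(LEXICON.get(t, 0.0) for t in tokens)
    return total / math.sqrt(total * total + alpha)


def sentiment_label(tokens, threshold=0.05):
    """Map the compound score to positive, negative, or neutral."""
    score = compound_score(tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

Because the normalization never reaches ±1, long sentences with many negative words approach -1 without exceeding it.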

Finally, the researchers built a deep learning ensemble model based on an LSTM-GRU architecture to improve sentiment analysis and emotion detection. The paper details the model's architecture and training parameters to support reproducibility. Together, these methods provide a comprehensive picture of sexual harassment instances in Anglophone literature and offer insights for future research and intervention.
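The summary does not specify how the LSTM and GRU layers are stacked, so the sketch below illustrates only the gating inside a single GRU layer, written in plain Python for transparency. The weight names (`W_*`, `U_*`, `b_*`) are hypothetical labels. Libraries also differ in where they apply the reset gate and in which side of the interpolation the update gate `z` weights. In practice one would use a framework's built-in `LSTM` and `GRU` layers rather than code like this.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gate(W, U, b, x, h, act):
    """Apply act(W x + U h + b) element-wise."""
    return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h, p):
    """One GRU time step; returns the new hidden state."""
    z = gate(p["W_z"], p["U_z"], p["b_z"], x, h, sigmoid)   # update gate
    r = gate(p["W_r"], p["U_r"], p["b_r"], x, h, sigmoid)   # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]             # reset before U_h
    h_tilde = gate(p["W_h"], p["U_h"], p["b_h"], x, reset_h, math.tanh)
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]


def run_gru(sequence, hidden_size, p):
    """Run the GRU over a sequence of input vectors; return the final state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, p)
    return h
```

The update gate decides how much of the old state to keep, which lets the network carry information across long sentences without the vanishing-gradient problems of a plain RNN.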

Text Analysis Insights

In the text classification stage, the study aimed to distinguish forms of sexual harassment within the corpus, separating physical from non-physical instances, and built two distinct text classification models for these objectives.

Using a dataset extracted from Anglophone novels set in the Middle East, the researchers applied six ML algorithms to build these models; logistic regression performed best, with an accuracy of 75.8%. The relatively small training set of 70 sentences was a limitation, and the authors suggest increasing the sample size and diversifying training data sources in future work.
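Logistic regression, the best performer here, models the probability that a sentence belongs to one class as a sigmoid of a weighted sum of its features, and accuracy is simply the fraction of correct predictions. A minimal gradient-descent sketch follows. It assumes a 1 = physical / 0 = non-physical encoding and dense feature vectors such as TF-IDF or PCA outputs, and it is not the study's configuration; a library implementation such as scikit-learn's `LogisticRegression` is the usual choice.

```python
import math


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted prob - label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the linear score is non-negative (prob >= 0.5)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On an evaluation set, an accuracy of 0.758 would mean that about three in four sentences were assigned the correct harassment type.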

In the sentiment and emotion analysis, the researchers used lexicon-based techniques to explore the emotional nuances of sexual harassment instances. The study found that most instances carried negative sentiment, with physical harassment eliciting a more intensely negative response than non-physical forms.
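Comparing how negative the two harassment types are comes down to grouping sentence-level scores by label. The sketch below assumes (label, compound score) pairs such as those a VADER-style analyzer produces. Any records passed to it in an example are placeholders, not the study's data.

```python
from collections import Counter, defaultdict
from statistics import mean


def mean_score_by_label(records):
    """Average sentiment score per harassment type from (label, score) pairs."""
    groups = defaultdict(list)
    for label, score in records:
        groups[label].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


def negative_share(records, threshold=-0.05):
    """Fraction of sentences per label whose score is at or below threshold."""
    totals, negatives = Counter(), Counter()
    for label, score in records:
        totals[label] += 1
        negatives[label] += score <= threshold   # True counts as 1
    return {label: negatives[label] / totals[label] for label in totals}
```

A more negative mean for "physical" than for "non-physical" is the kind of pattern the study reports, while the negative share shows how widespread negative sentiment is within each type.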

Likewise, fear and surprise were the predominant emotions, particularly in cases of physical harassment. These insights shed light on the emotional dimensions of sexual harassment and underscore the need for nuanced understanding and intervention strategies. The LSTM-GRU deep learning model classified sentiment and emotion with accuracies of 84.5% and 80.8%, respectively, on a dataset of around 60,000 labeled sentences.
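Lexicon-based emotion detection works like sentiment scoring but maps words to emotion categories. text2emotion reports proportions for five emotions (Happy, Angry, Surprise, Sad, Fear) through its `get_emotion` function. The sketch below imitates that kind of output with a hypothetical nine-word lexicon and simple counting, so it misses any emotion expressed without lexicon words.

```python
from collections import Counter

# Hypothetical word-to-emotion lexicon for illustration only.
EMOTION_LEXICON = {
    "afraid": "Fear", "terrified": "Fear", "trembled": "Fear",
    "shocked": "Surprise", "suddenly": "Surprise",
    "cried": "Sad", "alone": "Sad",
    "furious": "Angry", "laughed": "Happy",
}
EMOTIONS = ("Happy", "Angry", "Surprise", "Sad", "Fear")


def emotion_profile(tokens):
    """Share of emotion-bearing tokens per category (all zeros if none match)."""
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}


def dominant_emotion(tokens):
    """Return the highest-scoring emotion, or None if no lexicon word occurs."""
    profile = emotion_profile(tokens)
    best = max(profile, key=profile.get)
    return best if profile[best] > 0 else None
```

For the tokens `["she", "trembled", "afraid", "suddenly"]`, two of the three emotion words map to Fear and one to Surprise, so Fear is dominant.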

Despite challenges like potential bias in data labeling, the study demonstrates the efficacy of deep learning in nuanced sentiment analysis. Overall, it underscores the importance of robust classification models and the need for context-aware training to enhance performance across diverse cultural contexts and social interactions.

Conclusion

To sum up, this study developed a framework for identifying and analyzing sexual harassment instances in Middle Eastern Anglophone literature. Utilizing logistic regression and LSTM-GRU recurrent neural network (RNN) models, it achieved 75.8% and 84.5% accuracy, respectively, in classifying harassment types and sentiments.

However, the framework may be subject to bias and requires manual validation. Enhancements such as incorporating boosting techniques and emotion analysis and expanding the training datasets could improve its robustness and accuracy, enabling deeper insight into the contexts and tones of sexual harassment in textual data.

Journal reference:

Low, H. Q., Keikhosrokiani, P., & Pourya Asl, M. (2024). Decoding violence against women: analyzing harassment in Middle Eastern literature with machine learning and sentiment analysis. Humanities and Social Sciences Communications, 11:1, 1–18. https://doi.org/10.1057/s41599-024-02908-7, https://www.nature.com/articles/s41599-024-02908-7


Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Chandrasekar, Silpaja. (2024, April 17). Machine Learning-based System for Analyzing Sexual Harassment in Middle Eastern Literature. AZoAi. Retrieved on December 20, 2025 from https://www.azoai.com/news/20240417/Machine-Learning-based-System-for-Analyzing-Sexual-Harassment-in-Middle-Eastern-Literature.aspx.
MLA
Chandrasekar, Silpaja. "Machine Learning-based System for Analyzing Sexual Harassment in Middle Eastern Literature". AZoAi. 20 December 2025. <https://www.azoai.com/news/20240417/Machine-Learning-based-System-for-Analyzing-Sexual-Harassment-in-Middle-Eastern-Literature.aspx>.
Chicago
Chandrasekar, Silpaja. "Machine Learning-based System for Analyzing Sexual Harassment in Middle Eastern Literature". AZoAi. https://www.azoai.com/news/20240417/Machine-Learning-based-System-for-Analyzing-Sexual-Harassment-in-Middle-Eastern-Literature.aspx. (accessed December 20, 2025).
Harvard
Chandrasekar, Silpaja. 2024. Machine Learning-based System for Analyzing Sexual Harassment in Middle Eastern Literature. AZoAi, viewed 20 December 2025, https://www.azoai.com/news/20240417/Machine-Learning-based-System-for-Analyzing-Sexual-Harassment-in-Middle-Eastern-Literature.aspx.
