Dual-Language Sentiment Analysis: A Comprehensive Evaluating SVM, Logistic Regression, XGBoost, and Decision Tree Using TF-IDF On Arabic and English Dataset

Authors

  • Hawraa Ali Taher, Department of Computer Science, Faculty of Education for Girls, University of Kufa, Iraq

DOI:

https://doi.org/10.31185/wjps.549

Keywords:

Sentiment Analysis, TF-IDF, Arabic Language, English Language, Machine Learning Algorithms

Abstract

Sentiment analysis (SA) is a growing area of study that spans several disciplines, including machine learning, data mining, and natural language processing. It focuses on automatically extracting the opinions expressed in a given text. Because of its broad applications, sentiment analysis has been studied extensively, particularly for English texts, whereas other languages such as Arabic have received less attention. Arabic presents several difficulties, such as its rich morphology and the difficulty of tracing words back to their roots. In this work, a framework is used to analyze Arabic comments and categorize them as positive or negative. The research analyzes comments made by users of the social networking site Twitter, with the aim of evaluating tweets and opinions about the reputation of entities such as universities, companies, and mobile devices. It does this using classification and machine learning, which are among the fundamental tasks of data mining within the larger process of knowledge discovery. The work lets a user obtain other users' evaluations, as expressed in their tweets and comments on the site, immediately and automatically. The opinions are then loaded and evaluated using the Decision Tree Classifier (DTC), XGBoost, Logistic Regression (LR), and Support Vector Machine (SVM) algorithms with Term Frequency-Inverse Document Frequency (TF-IDF).
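The TF-IDF weighting named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, tokenizer, or preprocessing used, so everything below is an assumption for illustration: the function name `tfidf`, the whitespace tokenizer, the unsmoothed weight tf × log(N / df), and the toy English sentences. Arabic text would likely need normalization and stemming before this step.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (toy, unsmoothed TF-IDF)."""
    # Whitespace tokenization is an assumption; real text needs proper preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example comments, not taken from the paper's dataset.
docs = [
    "the service was great",
    "the service was terrible",
    "great phone great price",
]
vectors = tfidf(docs)
```

In this sketch a term that appears in every document gets weight zero, while a term confined to one document, such as "terrible", is weighted more heavily than a common one like "the". Weights of this kind are what a classifier such as SVM or Logistic Regression would receive as input features.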

Published

2024-12-30

Section

Computer

How to Cite

Ali Taher, H. (2024). Dual-Language Sentiment Analysis: A Comprehensive Evaluating SVM, Logistic Regression, XGBoost, and Decision Tree Using TF-IDF On Arabic and English Dataset. Wasit Journal for Pure Sciences, 3(4), 59-69. https://doi.org/10.31185/wjps.549