International Journal on Data Science and Technology
Volume 6, Issue 1, March 2020, Pages: 23-36
Received: Dec. 8, 2019;
Accepted: Dec. 26, 2019;
Published: Jan. 8, 2020
Rawan Almutlaq, Computer and Information Sciences College, King Saud University, Riyadh, Saudi Arabia
Alaaeldin Hafez, Computer and Information Sciences College, King Saud University, Riyadh, Saudi Arabia
The internet has a considerable effect on social relations and connections among people. Social networking platforms have become a major medium for establishing relations and connections among people all over the world. People, organizations, and companies use these platforms to communicate and interact with their communities and audiences. These platforms have made it easy to share information, create content, and connect with others online; however, online interaction has also created many problems. Malicious content can easily be shared and propagated to a wider audience than traditional sharing methods allow. Detection mechanisms, which flag inappropriate content and malicious behavior, are a growing area of research. Such a mechanism is needed to analyze the abusive messages posted on the Twitter account of King Saud University (KSU). Text mining is one approach that can be used to detect malicious or abusive messages: its techniques provide the means to classify messages as malicious or non-malicious. In addition, sentiment analysis is used to identify user tendencies, trends, and opinions by classifying a text as positive, negative, or neutral. In this paper, we provide a literature review of the current techniques. The study also addresses the detection of malicious messages by identifying the behavior of malicious and abusive messages. Based on this review, our focus is on the analysis of Arabic and English tweets on KSU's Twitter account. First, data was collected from Twitter. This was followed by a preprocessing phase. Then a corpus was produced, and a machine-learning-based approach was applied using the Naive Bayes and Random Forest classifier algorithms.
Subsequently, the study compared the accuracy and performance of the Naive Bayes classifier with those of the Random Forest classifier in analyzing Arabic and English texts. Both Arabic and English tweets were analyzed to help ensure accurate results.
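The classification step described in the abstract can be illustrated with a minimal sketch. It covers only the Naive Bayes side (a multinomial model with Laplace smoothing) and uses the Python standard library alone. The training messages, the labels, and the names `tokenize` and `NaiveBayes` are hypothetical and chosen for illustration; this is not the authors' corpus, preprocessing, or implementation, and the Random Forest comparison is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, drop URLs and @mentions, and split into word tokens.

    \\w is Unicode-aware in Python 3, so Arabic words are kept as tokens.
    """
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"\w+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # messages per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (not taken from the study).
train = [
    ("you are a stupid idiot", "malicious"),
    ("I hate you, loser", "malicious"),
    ("shut up you idiot", "malicious"),
    ("thank you for the great lecture", "non-malicious"),
    ("registration opens next week", "non-malicious"),
    ("congratulations to the graduates", "non-malicious"),
]
texts, labels = zip(*train)
model = NaiveBayes().fit(texts, labels)

for message in ["what an idiot", "great lecture, thank you"]:
    print(message, "->", model.predict(message))
```

A real evaluation would need a labeled corpus far larger than this toy set, a held-out test split, and language-specific preprocessing for Arabic before accuracy figures for the two classifiers could be compared.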
Detection Mechanism for Malicious Messages on KSU Student Social Network, International Journal on Data Science and Technology. Vol. 6, No. 1, 2020, pp. 23-36.