Khulna University

An Empirical Study to Detect Cyberbullying with TF-IDF and Machine Learning Algorithms

Authors:- Shagoto Rahman, Kamrul Hasan Talukder, Sabia Khatun Mithila
Category:- Journal; Year:- 2021
Discipline:- Computer Science & Engineering Discipline
School:- Science, Engineering & Technology School

Abstract

Advances in technology have enabled the internet and the social media platforms that have grown up within it. Alongside their benefits, these platforms have a dark side, cyberbullying, defined as abusive, personal attacks made during interactions on social platforms. Its consequences for victims are harrowing. One near-term countermeasure is to detect text that involves cyberbullying. TF-IDF (term frequency-inverse document frequency) has long been a strong method for computing textual features and helps resolve the ambiguity of words that occur across domains. Supervised machine learning algorithms, in turn, perform well at classifying textual data. With that in mind, this study combines TF-IDF features with five machine learning algorithms, namely SVM (Support Vector Machine), Logistic Regression, Decision Tree, Random Forest and MNB (Multinomial Naïve Bayes), to classify cyberbullying in text data. Two approaches are evaluated: a baseline approach that proceeds with the imbalanced data, and a SMOTE (Synthetic Minority Over-sampling Technique) approach that balances the data. In this empirical study, Random Forest with the SMOTE approach achieved an accuracy of 89% with an F1-score of 89%.
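
The two feature-side steps named in the abstract, TF-IDF weighting and SMOTE-style balancing, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy sentences, the labels, the helper names `tfidf` and `smote_like`, and the smoothed IDF formula are all assumptions made for illustration, and the classifier stage (for example Random Forest) is left out because it would normally come from a machine learning library.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) of TF-IDF weights for tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed IDF variant (smoothed): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency is the count normalized by document length.
        matrix.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, matrix


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy corpus: 1 = cyberbullying (minority), 0 = not.
docs = [
    "you are so stupid".split(),
    "nobody likes you loser".split(),
    "see you at lunch".split(),
    "great game today".split(),
    "thanks for the help".split(),
]
labels = [1, 1, 0, 0, 0]

vocab, X = tfidf(docs)
minority = [x for x, y in zip(X, labels) if y == 1]
X_new = smote_like(minority, n_new=1)

# Balanced training set: three samples per class.
X_balanced = X + X_new
y_balanced = labels + [1] * len(X_new)
```

In this sketch a rare word such as "stupid" receives a higher weight than a common word such as "you" within the same sentence, which is the property that makes TF-IDF useful as a feature map. Oversampling would typically be applied to the training split only, so that synthetic points do not leak into evaluation.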