An Empirical Study to Detect Cyberbullying with TF-IDF and Machine Learning Algorithms
Category:- Journal; Year:- 2021
Discipline:- Computer Science & Engineering Discipline
School:- Science, Engineering & Technology School
Abstract
Advances in technology have enabled capabilities such as the internet and the
social media platforms that have evolved within it. Alongside these benefits
there is a dark side, cyberbullying, defined as abusive and personal attacks
in social interactions on various social platforms. The aftermath of such
attacks is harrowing and tormenting. One immediate remedy is to detect text
that involves
cyberbullying. TF-IDF (term frequency-inverse document frequency) has long been
an effective way to compute textual features and to reduce the ambiguity of
cross-domain words. Moreover, supervised machine learning algorithms have shown
strong performance in classifying textual data. With that in mind, a
combination of machine learning
algorithms, namely SVM (Support Vector Machine), Logistic Regression, Decision
Tree, Random Forest and MNB (Multinomial Naïve Bayes), is proposed with
TF-IDF features to classify cyberbullying in text data. Two approaches are
developed: a baseline approach that proceeds with the imbalanced data and a
SMOTE (Synthetic Minority Over-sampling Technique) approach that balances the
data. Random Forest with the SMOTE approach achieved an accuracy of 89% and an
F1-score of 89% in this empirical study.
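
The two building blocks named in the abstract, TF-IDF features and SMOTE balancing, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed idf variant, the toy sentences, and the function names `tfidf` and `smote_like` are all assumptions made for illustration, and the classifiers themselves (SVM, Random Forest, and so on) are omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows) where rows[i][j] is tf * idf of term j in doc i."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed idf, one common variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority point and one of its k nearest minority neighbours.
    Needs at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy, made-up data: label 1 marks the minority (bullying) class.
docs = [
    "see you at lunch today",
    "you played well in the match",
    "thanks for helping you are kind",
    "you are a worthless loser",
    "nobody likes you loser",
]
labels = [0, 0, 0, 1, 1]

vocab, rows = tfidf(docs)
minority = [r for r, y in zip(rows, labels) if y == 1]
n_missing = labels.count(0) - labels.count(1)
synthetic = smote_like(minority, n_missing)

# Balanced training set: original rows plus synthetic minority rows.
balanced_rows = rows + synthetic
balanced_labels = labels + [1] * len(synthetic)
```

In practice one would more likely rely on library implementations, for example scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE`, and oversample only the training split so that synthetic points never leak into the evaluation data.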