Khulna University

An Empirical Study of Machine Learning-based Bangla News Classification Methods

Authors:- Shagoto Rahman, Sabia Khatun Mithila, Aysha Akther, Kazi Masudul Alam
Category:- Journal; Year:- 2021
Discipline:- Computer Science & Engineering Discipline
School:- Science, Engineering & Technology School

Abstract

Technological advancement has made enormous volumes of digitized text available. However, vast amounts of unorganized text are of little use on their own. Text classification can offer quick solutions for mining knowledge from such text. Although recent advances in NLP have supported these categorization tasks, their availability for Bangla text remains a concern. Bangla is spoken globally, which points to an extensive body of Bangla text in use across various sectors. TF-IDF has proven effective for computing text feature maps, as it expresses the significance of words within a corpus, and supervised machine learning algorithms have performed remarkably well in text classification. With this in mind, an empirical study has been conducted to categorize Bangla news articles using TF-IDF features with five machine learning algorithms: SVM (Support Vector Machine), LR (Logistic Regression), DT (Decision Tree), RF (Random Forest), and KNN (K-Nearest Neighbors). Two approaches have been adopted for this purpose: a baseline approach on the unbalanced data and a SMOTE (Synthetic Minority Oversampling Technique) approach. The best performance was achieved by Random Forest under the SMOTE approach, with an accuracy of 95% and an F1-score of 95%.
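The pipeline summarized above (TF-IDF features, SMOTE oversampling of the smaller classes, then a classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the TF-IDF variant (tf = count / document length, idf = log(N / df)), the neighbour count k = 5, the helper names `tf_idf`, `to_dense`, and `smote`, and the toy tokenized corpus and labels are all hypothetical.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: textbook TF-IDF, tf = count / document length and
    idf = log(N / df). The exact variant used in the study is not stated.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def to_dense(vectors, vocab):
    """Turn sparse {term: weight} dicts into fixed-order feature rows."""
    return [[vec.get(term, 0.0) for term in vocab] for vec in vectors]


def smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the largest class size.

    Each synthetic row lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours (Euclidean).
    Classes with fewer than two samples cannot be interpolated and are
    left unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest neighbours of `base` within its class, excluding itself.
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy corpus (already tokenized) with an unbalanced label set.
docs = [
    ["match", "goal", "team"],
    ["team", "win", "match"],
    ["goal", "team", "cup"],
    ["vote", "election", "party"],
    ["party", "minister", "vote"],
]
labels = ["sports", "sports", "sports", "politics", "politics"]

vocab = sorted({term for doc in docs for term in doc})
X = to_dense(tf_idf(docs), vocab)
X_bal, y_bal = smote(X, labels, k=5)
print(Counter(labels), "->", Counter(y_bal))
```

In practice these steps are usually delegated to libraries, for example scikit-learn's `TfidfVectorizer` (which by default uses a smoothed idf and L2 row normalization, so its weights differ from the sketch) and imbalanced-learn's `SMOTE().fit_resample(X, y)`, with the balanced rows then passed to a classifier such as scikit-learn's `RandomForestClassifier`. Oversampling is normally applied to the training split only, so that synthetic rows do not leak into the evaluation data.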