TY - CONF
T1 - Abusive content detection in transliterated Bengali-English social media corpus
AU - Sazzed, Salim
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - Abusive text detection in low-resource languages such as Bengali is a challenging task due to the inadequacy of resources and tools. The ubiquity of transliterated Bengali comments in social media makes the task even more involved, as monolingual approaches cannot capture them. Unfortunately, no transliterated Bengali corpus is publicly available yet for abusive content analysis. Therefore, in this paper, we introduce an annotated corpus of 3000 transliterated Bengali comments categorized into two classes, abusive and non-abusive, with 1500 comments in each. For baseline evaluations, we employ several supervised machine learning (ML) and deep learning-based classifiers. We find that the support vector machine (SVM) classifier shows the highest efficacy for identifying abusive content. We make the annotated corpus publicly available to researchers to aid abusive content detection in Bengali social media data.
UR - https://www.scopus.com/pages/publications/85119440378
U2 - 10.26615/978-954-452-056-4_016
DO - 10.26615/978-954-452-056-4_016
M3 - Conference article
AN - SCOPUS:85119440378
T3 - Computational Approaches to Linguistic Code-Switching, CALCS 2021 - Proceedings of the 5th Workshop
SP - 125
EP - 130
BT - Computational Approaches to Linguistic Code-Switching, CALCS 2021 - Proceedings of the 5th Workshop
A2 - Solorio, Thamar
A2 - Chen, Shuguang
A2 - Black, Alan W.
A2 - Diab, Mona
A2 - Sitaram, Sunayana
A2 - Soto, Victor
A2 - Yilmaz, Emre
A2 - Srinivasan, Anirudh
PB - Association for Computational Linguistics (ACL)
T2 - 5th Workshop on Computational Approaches to Linguistic Code-Switching, CALCS 2021
Y2 - 11 June 2021
ER -