Abusive content detection in transliterated Bengali-English social media corpus

  • Old Dominion University

Research output: Contribution to book or proceeding › Conference article › peer-review

40 Scopus citations

Abstract

Abusive text detection in low-resource languages such as Bengali is a challenging task due to the inadequacy of resources and tools. The ubiquity of transliterated Bengali comments in social media makes the task even more involved, as monolingual approaches cannot capture them. Unfortunately, no transliterated Bengali corpus is publicly available yet for abusive content analysis. Therefore, in this paper, we introduce an annotated corpus of 3000 transliterated Bengali comments categorized into two classes, abusive and non-abusive, with 1500 comments in each. For baseline evaluations, we employ several supervised machine learning (ML) and deep learning-based classifiers. We find that the support vector machine (SVM) classifier shows the highest efficacy for identifying abusive content. We make the annotated corpus publicly available to researchers to aid abusive content detection in Bengali social media data.
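
The abstract names an SVM as the strongest baseline but does not state the features, hyperparameters, or toolkit used. The following is only a minimal sketch of what a linear SVM baseline over transliterated text could look like, under several assumptions: character trigram counts as features, a Pegasos-style stochastic subgradient solver swapped in for whatever SVM implementation the authors used, and a handful of invented toy strings that are not taken from the corpus. The names `char_ngrams`, `train_linear_svm`, and `predict` are hypothetical helpers introduced here for illustration.

```python
import random
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string.

    Character n-grams are an assumed feature choice; they tolerate the
    spelling variation that is typical of transliterated text.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) with Pegasos-style updates.

    samples: list of sparse feature dicts; labels: +1 (abusive) or -1.
    Returns the learned weights as a dict keyed by feature.
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            x, y = samples[i], labels[i]
            margin = y * sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Regularization step: shrink every existing weight.
            shrink = 1.0 - eta * lam
            for f in weights:
                weights[f] *= shrink
            # Hinge-loss step: only margin violators move the weights.
            if margin < 1.0:
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights

def predict(weights, text):
    """Return +1 (abusive) or -1 (non-abusive) for a raw comment string."""
    score = sum(weights.get(f, 0.0) * v for f, v in char_ngrams(text).items())
    return 1 if score > 0 else -1

# Invented toy strings for illustration only; NOT drawn from the corpus.
TEXTS = ["tui boka", "boka chele", "tui kharap",
         "besh bhalo", "onek dhonnobad", "bhalo laglo"]
LABELS = [1, 1, 1, -1, -1, -1]

weights = train_linear_svm([char_ngrams(t) for t in TEXTS], LABELS)
predictions = [predict(weights, t) for t in TEXTS]
```

On the real corpus, the balanced 1500/1500 split described above would make plain accuracy a reasonable first metric, though a held-out test set or cross-validation would be needed for any meaningful comparison with the reported baselines.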

Original language: English
Title of host publication: Computational Approaches to Linguistic Code-Switching, CALCS 2021 - Proceedings of the 5th Workshop
Editors: Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
Publisher: Association for Computational Linguistics (ACL)
Pages: 125-130
Number of pages: 6
ISBN (Electronic): 9781954085459
State: Published - 2021
Externally published: Yes
Event: 5th Workshop on Computational Approaches to Linguistic Code-Switching, CALCS 2021 - Virtual, Online, Mexico
Duration: Jun 11 2021 → …

Publication series

Name: Computational Approaches to Linguistic Code-Switching, CALCS 2021 - Proceedings of the 5th Workshop

Conference

Conference: 5th Workshop on Computational Approaches to Linguistic Code-Switching, CALCS 2021
Country/Territory: Mexico
City: Virtual, Online
Period: 06/11/21 → …

Scopus Subject Areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications
