
A Lexicon for Profane and Obscene Text Identification in Bengali

  • Old Dominion University

Research output: Contribution to book or proceeding, Conference article, peer-review

6 Scopus citations

Abstract

Bengali is a low-resource language that lacks tools and resources for detecting profane and obscene textual content. Until now, no lexicon has existed for detecting obscenity in Bengali social media text. This study introduces a Bengali obscene lexicon of over 200 Bengali terms that can be considered filthy, slang, profane, or obscene. A semiautomatic methodology for developing the lexicon is presented that leverages an obscene corpus, word embeddings, and part-of-speech (POS) taggers. The developed lexicon achieves a coverage of around 0.85 for obscene and profane content detection on an evaluation dataset. The experimental results suggest that the lexicon is effective at identifying obscenity in Bengali social media content.
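The abstract names the ingredients of the method (an obscene corpus, word embeddings, POS taggers) but not how they are combined. Purely as an illustration, here is a minimal sketch of one common way such a pipeline could work: expand a seed word list with its nearest neighbours in an embedding space, then measure coverage on labelled texts. Everything here is an assumption rather than the authors' method: the function names `expand_lexicon` and `coverage`, the 0.8 similarity threshold, the coverage definition (share of obscene-labelled texts containing at least one lexicon term), and the toy placeholder tokens and vectors. The POS-tagging step is omitted.

```python
import math
from typing import Dict, Iterable, List, Set

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def expand_lexicon(seeds: Iterable[str],
                   embeddings: Dict[str, Vector],
                   threshold: float = 0.8) -> Set[str]:
    """Hypothetical expansion step: add every vocabulary word whose cosine
    similarity to at least one seed is >= threshold. In a semiautomatic
    pipeline these candidates would presumably be reviewed by hand."""
    seed_list = list(seeds)
    lexicon = set(seed_list)
    # Seeds missing from the embedding vocabulary cannot contribute neighbours.
    seed_vecs = [embeddings[s] for s in seed_list if s in embeddings]
    for word, vec in embeddings.items():
        if word in lexicon:
            continue
        if any(cosine(vec, sv) >= threshold for sv in seed_vecs):
            lexicon.add(word)
    return lexicon


def coverage(lexicon: Set[str], obscene_texts: Iterable[List[str]]) -> float:
    """Assumed coverage metric: fraction of tokenized obscene-labelled texts
    that contain at least one lexicon term."""
    texts = list(obscene_texts)
    if not texts:
        return 0.0
    hits = sum(1 for tokens in texts if any(t in lexicon for t in tokens))
    return hits / len(texts)


if __name__ == "__main__":
    # Toy 2-d embeddings with placeholder tokens (not real Bengali words).
    emb = {"seed_a": [1.0, 0.0], "near_a": [0.9, 0.1], "neutral": [0.0, 1.0]}
    lex = expand_lexicon(["seed_a"], emb, threshold=0.8)
    print(sorted(lex))  # ['near_a', 'seed_a']
    texts = [["near_a", "x"], ["neutral"], ["seed_a"], ["y"]]
    print(coverage(lex, texts))  # 0.5
```

With real data, the embeddings would come from a model trained on a Bengali corpus, and the threshold would need tuning: a lower value finds more candidate terms but also more false positives for manual review to discard.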

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP 2021
Subtitle of host publication: Deep Learning for Natural Language Processing Methods and Applications - Proceedings
Editors: Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Publisher: Incoma Ltd
Pages: 1289-1296
Number of pages: 8
ISBN (Electronic): 9789544520724
State: Published - 2021
Externally published: Yes
Event: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021 - Virtual, Online
Duration: Sep 1, 2021 to Sep 3, 2021

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
ISSN (Print): 1313-8502

Conference

Conference: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
City: Virtual, Online
Period: 09/1/21 to 09/3/21

Scopus Subject Areas

  • Software
  • Computer Science Applications
  • Artificial Intelligence
  • Electrical and Electronic Engineering
