TY - GEN
T1 - A Lexicon for Profane and Obscene Text Identification in Bengali
AU - Sazzed, Salim
N1 - Publisher Copyright: © 2021 Incoma Ltd. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Bengali is a low-resource language that lacks tools and resources for profane and obscene textual content detection. Until now, no lexicon has existed for detecting obscenity in Bengali social media text. This study introduces a Bengali obscene lexicon consisting of over 200 Bengali terms that can be considered filthy, slang, profane, or obscene. A semiautomatic methodology is presented for developing the obscene lexicon that leverages an obscene corpus, word embeddings, and part-of-speech (POS) taggers. The developed lexicon achieves coverage of around 0.85 for obscene and profane content detection in an evaluation dataset. The experimental results suggest that the developed lexicon is effective at identifying obscenity in Bengali social media content.
AB - Bengali is a low-resource language that lacks tools and resources for profane and obscene textual content detection. Until now, no lexicon has existed for detecting obscenity in Bengali social media text. This study introduces a Bengali obscene lexicon consisting of over 200 Bengali terms that can be considered filthy, slang, profane, or obscene. A semiautomatic methodology is presented for developing the obscene lexicon that leverages an obscene corpus, word embeddings, and part-of-speech (POS) taggers. The developed lexicon achieves coverage of around 0.85 for obscene and profane content detection in an evaluation dataset. The experimental results suggest that the developed lexicon is effective at identifying obscenity in Bengali social media content.
UR - https://www.scopus.com/pages/publications/85123633852
U2 - 10.26615/978-954-452-072-4_145
DO - 10.26615/978-954-452-072-4_145
M3 - Conference article
AN - SCOPUS:85123633852
T3 - International Conference Recent Advances in Natural Language Processing, RANLP
SP - 1289
EP - 1296
BT - International Conference Recent Advances in Natural Language Processing, RANLP 2021
A2 - Angelova, Galia
A2 - Kunilovskaya, Maria
A2 - Mitkov, Ruslan
A2 - Nikolova-Koleva, Ivelina
PB - Incoma Ltd
T2 - International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
Y2 - 1 September 2021 through 3 September 2021
ER -