Attention-Weighted Integrated Gradients for Target-Aware Cyberbullying Detection

Date:

[Paper] [Presentation] [GitHub]

Abstract

Cyberbullying detection is a challenging task, given the complex nature of the problem and the limited NLP literature addressing it. Typical sentiment analysis models are not robust to a simple attack: appending positive-sentiment text to negative-sentiment (cyberbullying) text. For instance, the text “@SConsul is a terrible person and should be imprisoned. Today is a beautiful day and the weather is amazing.” can escape being classified as cyberbullying: although the sentiment of the first sentence is negative, the model classifies the overall sentiment as positive because the positive sentiment of the second sentence outweighs it. To tackle such issues, we propose Attention-Weighted Integrated Gradients (AWIG) for Target-Aware Cyberbullying Detection using the twitter-roBERTa-base-sentiment-latest model, where the sentiment of the text is computed with respect to an aspect-target token (“@SConsul” here) rather than over the text as a whole, for improved performance. The code is available on GitHub: https://github.com/sharanramjee/cyberbullying-awig
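The idea can be illustrated with a small, self-contained sketch. It is not the implementation from the repository: the lexicon values, the gated-sigmoid classifier, the distance-based `toy_attention` weights, and the way attributions and attention are combined are all assumptions made for illustration, standing in for twitter-roBERTa-base-sentiment-latest and its real attention maps. The sketch computes integrated gradients for a toy classifier, then weights each token's attribution by a toy attention weight from the target token.

```python
import math

# Hypothetical toy lexicon; the values are illustrative, not learned.
LEXICON = {"terrible": -1.5, "imprisoned": -1.0, "beautiful": 1.5, "amazing": 1.5}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tokenize(text):
    # Whitespace split, trailing punctuation stripped, lowercased.
    return [t.strip(".,!?").lower() for t in text.split()]


def token_scores(tokens):
    return [LEXICON.get(t, 0.0) for t in tokens]


def model(gates, scores):
    """Toy classifier (assumption): P(positive) = sigmoid(sum_i gate_i * score_i)."""
    return sigmoid(sum(g * s for g, s in zip(gates, scores)))


def integrated_gradients(scores, steps=50):
    """Integrated gradients from an all-zero baseline (every token switched
    off) to the all-ones input, using a midpoint Riemann sum."""
    total = sum(scores)
    avg_slope = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        p = sigmoid(alpha * total)
        avg_slope += p * (1.0 - p) / steps  # derivative of sigmoid on the path
    # d model / d gate_i = score_i * sigmoid'(.), and (input - baseline) = 1.
    return [s * avg_slope for s in scores]


def toy_attention(n_tokens, target_idx, tau=4.0):
    """Stand-in for attention from the target token (assumption): weights
    decay with distance from the target and are normalized to sum to 1."""
    raw = [math.exp(-abs(i - target_idx) / tau) for i in range(n_tokens)]
    norm = sum(raw)
    return [r / norm for r in raw]


def target_aware_score(text, target):
    """Attention-weighted sum of per-token attributions (illustrative)."""
    tokens = tokenize(text)
    scores = token_scores(tokens)
    attributions = integrated_gradients(scores)
    attention = toy_attention(len(tokens), tokens.index(target.lower()))
    return sum(a * g for a, g in zip(attention, attributions))


TEXT = ("@SConsul is a terrible person and should be imprisoned. "
        "Today is a beautiful day and the weather is amazing.")

tokens = tokenize(TEXT)
scores = token_scores(tokens)
p_positive = model([1.0] * len(tokens), scores)  # whole-text sentiment
awig = target_aware_score(TEXT, "@SConsul")      # sentiment toward the target

print(f"P(positive) for the whole text: {p_positive:.3f}")
print(f"Target-aware score for @SConsul: {awig:.4f}")
```

In this toy setup the whole-text probability lands above 0.5 because the appended positive words outweigh the negative ones, while the target-aware score is negative because the tokens nearest the target dominate the weighted sum. With a real transformer, the attributions would be taken over input embeddings and the weights from the model's attention heads.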