Hate Speech Detection using Deep Learning and Hybrid Features
Keywords:
Hybrid Features, Deep Learning, Hate Speech, Cyber-bullying, Deep Learning Features
Abstract
Automating the detection of hate speech and inappropriate text on social media and other internet platforms has attracted growing interest in recent years and has become a valuable research topic for both industry and academia. Applications increasingly need to identify disruptive content, analyze sentiment, and detect cyber-bullying, flames, threats, and hatred directed at individuals or at particular communities or groups. Text classification is a very challenging task because of the nature and complexity of language, especially context, micro words, emojis, typographical errors, and hidden sarcasm.
We collected tweets and classified them into three categories: sexism, racism, and none. In the proposed work, we combine features learned by deep learning methods with basic features, such as word n-grams and tweet-specific syntactic features, to form a hybrid feature set. We also improve the preprocessing steps to reduce the number of missing embeddings and to enlarge the vocabulary for efficient feature learning. We experimented with different neural networks for feature learning. Our work provides the hybrid features and preprocessing techniques required for efficient classification of the standard dataset of 16k annotated tweets related to hate speech. The best model combines an LSTM (Long Short-Term Memory) network trained on random embeddings for deep feature extraction with a Logistic Regression classifier over the hybrid features. It outperforms the state-of-the-art methods reported in the literature, with a substantial improvement in F1 score.
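As a rough illustration of how such a hybrid feature set might be assembled, the following minimal Python sketch concatenates word n-gram counts, a few tweet-specific syntactic counts, and a vector of deep features. It is a sketch under stated assumptions, not the authors' implementation: the function names, the particular syntactic counts, and the `deep_features` placeholder (standing in for the output of an LSTM feature extractor) are all hypothetical, and the Logistic Regression classifier that would consume the vector is not shown.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def tokenize(tweet):
    """Lowercase, drop URLs, and split into tokens; hashtags and mentions stay intact."""
    text = URL_RE.sub(" ", tweet.lower())
    return re.findall(r"[#@]?\w+", text)

def word_ngrams(tokens, n_max=2):
    """Count word n-grams for n = 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def syntactic_features(tweet):
    """Assumed, illustrative tweet-specific counts (not the feature set from the paper)."""
    return [
        len(re.findall(r"#\w+", tweet)),   # hashtags
        len(re.findall(r"@\w+", tweet)),   # user mentions
        len(URL_RE.findall(tweet)),        # URLs
        tweet.count("!"),                  # exclamation marks
        sum(1 for w in tweet.split() if len(w) > 1 and w.isupper()),  # all-caps words
    ]

def hybrid_vector(tweet, vocab, deep_features):
    """Concatenate n-gram counts over a fixed vocab, syntactic counts, and deep features.

    `deep_features` is a placeholder for a vector produced by a trained
    neural network (for example, an LSTM); it is passed in, not computed here.
    """
    counts = word_ngrams(tokenize(tweet))
    ngram_part = [float(counts[g]) for g in vocab]
    syntactic_part = [float(x) for x in syntactic_features(tweet)]
    return ngram_part + syntactic_part + [float(x) for x in deep_features]

# Hypothetical usage: the vocabulary and deep feature values are made up.
example = hybrid_vector(
    "@user This is NOT ok! #stop http://t.co/x",
    vocab=["not ok", "this", "hate"],
    deep_features=[0.25, -0.5],
)
```

Each part of the vector keeps a fixed length across tweets (the vocabulary size, the number of syntactic counts, and the deep feature dimension), so the concatenated vectors can be stacked into a matrix for any standard classifier.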
Copyright (c) 2021 Iberamia & The Authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Open Access publishing.
Inteligencia Artificial (Ed. IBERAMIA)
ISSN: 1988-3064 (on line).