Novel Approach for Generating Hybrid Features Set to Effectively Identify Hate Speech

Authors

  • Shruthi P Department Of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India
  • Dr. Anil Kumar K M Department Of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India

DOI:

https://doi.org/10.4114/intartif.vol23iss66pp97-111

Keywords:

Hybrid Features, Deep Learning, Hate Speech, Cyber-bullying, Deep Learning Features

Abstract

Automating hate speech or inappropriate text detection in social media and other internet platforms is
gaining a lot of interest and becoming a valuable research topic for both industry and academia in recent years. It
is more important for applications to identify the disruptive contents, understand sentiment analysis, identify cyber
bullying, detect flames, threats, hatred towards people or particular communities or groups etc. Text classification
is a very challenging task due to the nature and complexities with languages, especially its context, micro words,
emojis, typo error and sarcasm present in the text. In this paper, we have proposed a model with a novel approach
for generating hybrid features for an effective feature representation to classify hate speech. We have combined
features learned from deep learning methods with the semantic features like word n-grams and tweets specific
syntactic features to form hybrid feature sets. We have also improvised preprocessing steps to reduce the number
of missing embeddings to increase the vocabulary for efficient feature learning. We have experimented with the
various neural networks for feature learning and machine learning models with hybrid features for classification.
Our work delivers hybrid features and appropriate preprocessing techniques for an efficient classification of the
standard dataset of 16k annotated hate speech tweets. The combination of Long Short Term Memory (LSTM)
trained on Random Embeddings for deep learning features extraction and Logistic Regression (LR) as a classifier
with the hybrid features is found to be the best model and it outperforms the state of the art reported in the
literature.

Downloads

Download data is not yet available.

Author Biographies

Shruthi P, Department Of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India

         Shruthi P is currently pursuing her M.Tech (Master of Technology) in Computer Engineering, Department of  Computer Science & Engineering, JSS Science and Technology University, Mysuru, Karnataka, India.  Prior to this, she had worked as a senior software developer  in  Schneider Electric R&D, Bengaluru, India for 5+ years. She has mainly worked on developing web applications using web technologies like ASP.NET MVC, Angular,  Nodejs, HTML5 etc. Her research interest includes Web Mining, Text Mining, Sentiment Analysis. She has worked on Machine Learning and Deep Learning Techniques using python for Sentiment (hate or abusive) Analysis, Text Classification  under the guidance of her professor  Dr. Anil Kumar K M during her tenure in MTech.

Dr. Anil Kumar K M, Department Of Computer Science and Engineering, JSS Science and Technology University, Mysuru, Karnataka, India

        

       Dr. Anil Kumar K.M is currently working as Associate Professor, Department of Computer Science & Engineering, JSS Science and Technology University, Mysuru, Karnataka, India. He did his post doc from Deakin University under Professor Jemal Abawajy and Ph.D. from University of Mysore under the supervision of Prof. Suresha, Chairman, DOS in Computer Science. He has teaching experience of 20 years and research experience of  12 years. His research interest includes Text mining, Sentiment Analysis, Data mining, Opinion mining, Web Mining, Data Analytics, Computer Networks, Cyber Security. He has received 5 grants from different Government and Private funding agencies for Research & Development. He has Published nearly 39 Research paper in National and International proceedings.  

Downloads

Published

2020-12-28

How to Cite

Shruthi P, & Dr. Anil Kumar K M. (2020). Novel Approach for Generating Hybrid Features Set to Effectively Identify Hate Speech. Inteligencia Artificial, 23(66), 97–111. https://doi.org/10.4114/intartif.vol23iss66pp97-111