Hate Comment Detection On Twitter Using Long Short Term Memory (LSTM) With Genetic Algorithm (GA)

Authors

  • Dea Alfatihah Nindya Erlani Fakultas Informatika, Telkom University, Indonesia
  • Erwin Budi Setiawan Fakultas Informatika, Telkom University, Indonesia

DOI:

https://doi.org/10.59188/eduvest.v4i11.1758

Keywords:

Hate speech, Twitter, Long Short Term Memory (LSTM), Genetic Algorithm (GA), TF-IDF, GloVe, Detection

Abstract

Twitter is currently one of the most popular social media platforms. This study explores the use of Long Short Term Memory (LSTM) optimized with a Genetic Algorithm (GA) to detect hate speech in Indonesian-language Twitter data. We use TF-IDF and GloVe feature extraction to produce effective word vector representations. The study also introduces feature expansion and similarity corpus construction to improve the performance of the LSTM classification model. The model is evaluated with a confusion matrix, from which accuracy, precision, recall, and F1 score are computed. The results show that the LSTM model with TF-IDF and GloVe feature extraction achieves the best performance, with an accuracy of up to 92.91%. The optimal configuration combines Unigram + Bigram + Trigram features, a maximum of 10,000 features, and a GloVe corpus with Top 20 similarity. In addition, parameter optimization with the genetic algorithm improves both accuracy and F1 score. The resulting LSTM model classifies test data with high accuracy, which could support the detection and handling of hate speech on social media and improve the ability of models to identify and understand Indonesian-language text.
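
As a minimal sketch of how the four reported metrics follow from a confusion matrix, the Python below computes them for a binary hate speech task. The label lists, the function names, and the `BinaryMetrics` container are hypothetical illustrations and are not taken from the study; they only show the standard definitions of accuracy, precision, recall, and F1 score.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted hate speech, and it is
        elif pred == positive:
            fp += 1  # predicted hate speech, but it is not
        elif truth == positive:
            fn += 1  # missed hate speech
        else:
            tn += 1  # correctly left unflagged
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for illustration: 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, since a model that rarely predicts the hate speech class can still score a high accuracy.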

Published

2024-11-20