Feature Selection and Performance Evaluation of Buzzer Classification Model


Dian Isnaeni Nurul Afra
Radhiyatul Fajri https://orcid.org/0000-0003-2812-1077

Harnum Annisa Prafitia
Ikhwan Arief https://orcid.org/0000-0002-1958-0282

Aprinaldi Jasa Mantau

Keywords

buzzer classification, feature selection, Spearman correlation, Pearson correlation, chi-square test, social media

Abstract

In the rapidly evolving digital age, social media platforms have become a battleground for shaping public opinion. Among these platforms, X has been particularly susceptible to 'buzzers': paid or coordinated actors who manipulate online discussions and influence public sentiment. This manipulation poses significant challenges for users, researchers, and policymakers alike, and calls for robust detection measures and careful feature selection to build accurate classification models. This research applies several feature selection techniques to identify the most influential of the 24 features used in a Support Vector Machine classification model. The study found that selecting 11 key features yields an effective classification model, achieving an F1-score of 87.54 in distinguishing between buzzer and non-buzzer accounts. These results suggest that focusing on the relevant features can improve both the accuracy and the efficiency of buzzer detection models. By providing a more robust and adaptable approach to buzzer detection, this research has the potential to advance social media research and policy. It can help researchers and policymakers devise strategies to mitigate the spread of misinformation and to cultivate trust and integrity on social media platforms, fostering healthier online interactions and discourse.
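
As a rough illustration of the ideas named in the abstract, the sketch below ranks features by the absolute value of their Spearman correlation with the class label, keeps the top k, and computes an F1-score from a set of predictions. It is not the authors' pipeline. The feature names, the toy values, the helper names (ranks, spearman, select_top_k, f1_score), and the choice of k are all assumptions made for this example, and the Support Vector Machine step is left out so that the code runs with the Python standard library alone.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_top_k(features, labels, k):
    """Return the k feature names with the largest |Spearman| against the label."""
    scores = {name: abs(spearman(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def f1_score(y_true, y_pred, positive=1):
    """F1-score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = buzzer, 0 = non-buzzer.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "tweets_per_day": [90, 75, 60, 12, 8, 5],
    "followers_count": [150, 3000, 200, 180, 2500, 220],
    "account_age_days": [30, 45, 400, 900, 1200, 20],
}

selected = select_top_k(features, labels, k=2)
print("Selected features:", selected)

# Hypothetical predictions from some downstream classifier.
predictions = [1, 1, 0, 1, 0, 0]
print("F1-score:", round(f1_score(labels, predictions), 4))
```

In a fuller experiment the selected columns would be passed to a classifier such as an SVM, and the F1-score would be computed on held-out data rather than on a handful of hand-written predictions.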
