Effect of Random Under Sampling and Random Over Sampling Methods on SVM Performance

Agil Dwi Saputra; Deni Arifianto; Reni Umilasari


Published: Aug 1, 2025

Updated: 2025-08-01

Keywords:

sentiment analysis, random under sampling, random over sampling, imbalanced data, support vector machine

Agil Dwi Saputra

Universitas Muhammadiyah Jember

Deni Arifianto

Universitas Muhammadiyah Jember

Reni Umilasari

Universitas Muhammadiyah Jember

Abstract

Imbalanced data is a common challenge in sentiment analysis, because it can bias a classification model toward the majority class and cause it to ignore important information from the minority class. This study evaluates the effect of two resampling methods, Random Under Sampling (RUS) and Random Over Sampling (ROS), on the performance of the Support Vector Machine (SVM) algorithm on imbalanced sentiment data. Data were collected from the social media platform X (Twitter) on the topic of the naturalization of soccer players in Indonesia. The research process includes preprocessing, TF-IDF weighting, and model testing using K-Fold Cross Validation with K = 2, 5, and 10. Evaluation was based on the F1-score, recall, precision, and accuracy metrics. The results show that ROS provides the best performance, especially at K = 10, with an F1-score of 0.80, recall of 0.78, precision of 0.84, and accuracy of 0.85, while RUS shows a smaller performance improvement. These results indicate that selecting an appropriate resampling method can improve the performance of a classification model on imbalanced data.
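
The two resampling methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy tweets, and the labels are hypothetical, and the TF-IDF weighting and SVM training steps are omitted. It only shows how ROS duplicates minority-class samples and how RUS discards majority-class samples until the classes are balanced.

```python
import random
from collections import Counter


def _group_by_class(texts, labels):
    """Collect the texts belonging to each label."""
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    return by_class


def random_over_sample(texts, labels, seed=0):
    """ROS: draw extra samples (with replacement) from smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_under_sample(texts, labels, seed=0):
    """RUS: keep a random subset (without replacement) of larger classes
    so that every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical toy data: 6 positive and 2 negative tweets.
texts = [f"tweet {i}" for i in range(8)]
labels = ["positive"] * 6 + ["negative"] * 2

ros_texts, ros_labels = random_over_sample(texts, labels, seed=1)
rus_texts, rus_labels = random_under_sample(texts, labels, seed=1)

print(Counter(labels))      # class counts before resampling
print(Counter(ros_labels))  # class counts after ROS
print(Counter(rus_labels))  # class counts after RUS
```

In a cross-validation setup, resampling is usually applied to the training portion of each fold only, so that duplicated samples do not leak into the evaluation data. Whether the study followed this practice is not stated in the abstract.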

Issue

Vol. 1 No. 2 (2025): Integration of Automation and Information Systems in Enhancing Organizational Control and Risk Management

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
