Effect of Random Under Sampling and Random Over Sampling Methods on SVM Performance
Abstract
Imbalanced data is a common challenge in sentiment analysis because it can bias the classification model towards the majority class and cause it to ignore important information from the minority class. This study evaluates the effect of two resampling methods, Random Under Sampling (RUS) and Random Over Sampling (ROS), on the performance of the Support Vector Machine (SVM) algorithm in handling imbalanced sentiment data. Data were collected from the social media platform X (Twitter) on the topic of the naturalization of soccer players in Indonesia. The research process includes preprocessing, TF-IDF weighting, and model testing using K-Fold Cross Validation with K = 2, 5, and 10. Evaluation was based on four metrics: F1-score, recall, precision, and accuracy. The results show that the ROS method provides the best performance, especially at K = 10, with an F1-score of 0.80, recall of 0.78, precision of 0.84, and accuracy of 0.85, while RUS shows a smaller performance improvement. These results indicate that selecting an appropriate resampling method can improve the performance of a classification model on imbalanced data.
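To make the two resampling methods concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. The function names (`random_over_sample`, `random_under_sample`), the `seed` parameter, and the toy tweets are all assumptions introduced for illustration. ROS duplicates randomly chosen minority-class samples until every class matches the largest class, while RUS randomly discards majority-class samples until every class matches the smallest class.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(texts, labels):
    """Collect the samples belonging to each class label."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[label].append(text)
    return groups


def random_over_sample(texts, labels, seed=0):
    """ROS sketch: duplicate random samples until all classes match the largest class."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    groups = _group_by_label(texts, labels)
    target = max(len(items) for items in groups.values())
    out_texts, out_labels = [], []
    for label, items in groups.items():
        # Draw duplicates with replacement to fill the gap to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def random_under_sample(texts, labels, seed=0):
    """RUS sketch: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = min(len(items) for items in groups.values())
    out_texts, out_labels = [], []
    for label, items in groups.items():
        # Sample without replacement so no kept sample is repeated.
        for text in rng.sample(items, target):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy imbalanced data (invented for illustration, not from the study).
texts = [f"positive tweet {i}" for i in range(8)] + [f"negative tweet {i}" for i in range(2)]
labels = ["positive"] * 8 + ["negative"] * 2

ros_texts, ros_labels = random_over_sample(texts, labels)
rus_texts, rus_labels = random_under_sample(texts, labels)
ros_counts = Counter(ros_labels)
rus_counts = Counter(rus_labels)
print("ROS class counts:", dict(ros_counts))
print("RUS class counts:", dict(rus_counts))
```

In a cross-validation setting, a common precaution is to resample only the training portion of each fold, so that duplicated samples do not leak into the held-out portion. The abstract does not state whether the study did this.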
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.