THE EFFECT OF IMPERFECTIVE DATA SAMPLING METHOD ON SUPPORT VECTOR MACHINE ACCURACY
Abstract
Sentiment analysis is used to gauge the direction of public opinion, but sentiment data are often imbalanced, with one class dominating the others. This imbalance biases classifiers such as the Support Vector Machine (SVM) toward the majority class, which reduces the model's accuracy and generalizability. This study evaluates the effectiveness of two data balancing techniques, SVM-SMOTE and ADASYN, in improving SVM performance. The data were collected from the social media platform X (Twitter), and the models were tested with K-Fold Cross Validation (K = 2, 5, and 10) using accuracy, precision, recall, and F1-score as evaluation metrics. Without data balancing, the SVM model achieved an average accuracy of only 76.34% and an F1-score of 62.38%, reflecting its weakness in recognizing minority classes. Both balancing methods improved model performance: ADASYN raised the F1-score to 67.94%, while SVM-SMOTE gave the best results, with 82.4% accuracy and a 74.02% F1-score. These findings indicate that SVM-SMOTE is the more effective of the two techniques for handling data imbalance and that it improves sentiment classification performance more evenly across classes.
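The abstract does not describe how the oversampling was implemented. As a rough illustration of what ADASYN does, here is a minimal pure-Python sketch of the standard ADASYN procedure for a binary problem, assuming Euclidean distance, `k` nearest neighbours and a balance parameter `beta`. The function name `adasyn` and the toy data are hypothetical; this is not the authors' code, and in practice a library implementation (for example `ADASYN` in imbalanced-learn) would typically be used.

```python
import math
import random


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Minimal ADASYN sketch for a binary problem (illustrative, not the study's code).

    X: list of numeric feature tuples; y: list of labels; minority: the minority label.
    Returns synthetic feature tuples to be added with label `minority`.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_maj = len(y) - len(min_idx)
    # G: total synthetic samples to generate; beta = 1.0 aims for a fully balanced set
    G = (n_maj - len(min_idx)) * beta
    # nothing to do if already balanced, or too few minority points to interpolate
    if G <= 0 or len(min_idx) < 2:
        return []

    def neighbours(i, pool):
        # indices of the k nearest points in `pool` (Euclidean), excluding i itself
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: fraction of majority points among the k nearest neighbours of each minority point
    r = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)))
        r.append(sum(y[j] != minority for j in nn) / len(nn))
    total = sum(r)
    if total == 0:
        # every minority point is surrounded by minority points: nothing to weight
        return []

    synthetic = []
    for i, r_i in zip(min_idx, r):
        # normalised weight times G; rounding means the total is only approximately G
        g_i = round(r_i / total * G)
        min_nn = neighbours(i, min_idx)  # interpolate only towards minority neighbours
        for _ in range(g_i):
            z = rng.choice(min_nn)
            lam = rng.random()  # position along the segment X[i] -> X[z]
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[z])))
    return synthetic


# Toy data (hypothetical): 8 majority points (label 0), 3 minority points (label 1)
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (0, 2), (1, 2),
     (2, 2), (3, 3), (4, 4)]
y = [0] * 8 + [1] * 3
new_points = adasyn(X, y, minority=1, k=3)
print(len(new_points), new_points[:2])
```

What distinguishes ADASYN from plain SMOTE is the per-sample weight `r_i`: minority points whose neighbourhoods are dominated by the majority class receive more synthetic samples. In a cross-validation experiment such as the one described, oversampling would normally be applied only to the training folds, so that synthetic points do not leak into the test fold.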

This work is licensed under a Creative Commons Attribution 4.0 International License.