Automatic Complaints Categorization Using Random Forest and Gradient Boosting

Muchamad Taufiq Anwar, Anggy Eka Pratiwi, Khadijah Febriana Rukhmanti Udhayana

Abstract


Capturing and responding to complaints from the public is an important part of developing a well-run city or country. This project uses data mining to automate complaint categorization. More than 35,000 complaints from Bangalore, India, were retrieved from the “I Change My City” website (https://www.ichangemycity.com). The complaints were represented in a vector space using Term Frequency–Inverse Document Frequency (TF-IDF), and multi-class text classification was performed using Random Forest (RF) and Gradient Boosting (GB). Results showed that RF and GB perform similarly, with an accuracy of 73% on the 10-class classification task. Results also showed that the model depends heavily on the word usage in the complaint descriptions. Future research directions for improving task performance are also suggested.
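
As an illustration of the TF-IDF vector space mentioned above, the following is a minimal sketch in plain Python. The toy complaints, the whitespace tokenizer, and the unsmoothed weighting tf × ln(N/df) are assumptions made for this example; the abstract does not state which TF-IDF variant or preprocessing the authors used, and library implementations typically apply smoothing and normalization.

```python
import math
from collections import Counter

# Toy complaints invented for illustration (not taken from the paper's dataset).
complaints = [
    "garbage not collected on main road",
    "street light not working near park",
    "pothole on main road near school",
    "garbage dumped near park",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors), one TF-IDF vector per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] / len(doc) * idf[term] for term in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(complaints)

# Show the three highest-weighted terms for each complaint.
for text, vec in zip(complaints, vectors):
    top = sorted(zip(vocab, vec), key=lambda pair: pair[1], reverse=True)[:3]
    print(text, "->", [term for term, _ in top])
```

Terms that occur in few complaints (such as "pothole" here) receive higher weights than terms shared across many complaints, which is consistent with the abstract's observation that the model depends on word usage. Vectors of this kind would then be passed to a classifier such as Random Forest or Gradient Boosting.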


Keywords


Automatic complaints categorization; Multi-class classification; Data mining; Random forest; Gradient boosting

DOI: https://doi.org/10.26877/asset.v3i1.8460

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of Advance Sustainable Science, Engineering and Technology (ASSET), ISSN 2715-4211 (Online, Electronic)