Comparing Optimizer Strategies For Enhancing Emotion Classification In IndoBERT Models

Emotions are one of the human reactions to physical or verbal actions. Every human action is based on emotion, and every opinion expressed in a comment section also contains the author's emotions. This research aims to classify five emotions: Marah, Takut, Senang, Cinta, and Sedih.


Introduction
Emotions are one of the human reactions to physical or verbal actions. Emotions can be shown through facial expressions or actions.[1] Emotions are experiences that humans undergo consciously; they characterize psychological states that are an important part of human nature, such as joy, anger, love, sadness, and fear.[2] Every human action is based on emotions, and every opinion expressed in a comment section also contains the author's emotions. The dissemination of information can influence public opinion, and the emotional content of such opinions, especially the emotional patterns they contain, is an interesting subject for research. Text classification aims to analyze, process, and extract the information contained in text. In emotion classification, a comment is analyzed to extract the information it contains so that the emotion it expresses can be identified.[3] Many companies currently use comments from social media such as Instagram, Twitter, and Facebook.[4] Through these comments, they can understand how customers feel about the company. This kind of opinion processing usually relies on sentiment analysis, which has been applied in various fields, including business, government, and other organizations.[5] Natural Language Processing (NLP) plays an important role in the field of Artificial Intelligence (AI).
Natural Language Processing enables machines to understand, interpret, and respond to human language, thereby facilitating human-computer interaction. NLP is tightly integrated with machine learning (ML) and AI, enabling the development of voice-controlled systems, language translation, sentiment analysis, and a variety of other applications. The combination of NLP, AI, and ML drives the automation of data analysis and changes the way we interact with technology, making advanced tools more accessible and user-friendly.[6] Artificial intelligence technology has developed enormously in recent years, driven by the Internet, big data, the Internet of Things, and massive processing power.[7], [8] Pre-trained language models have advanced the state of the art in many areas of NLP. Many machine learning models are currently being developed for sentiment analysis, including conventional models such as SVM (Support Vector Machine) and NBC (Naïve Bayes Classifier) and, more recently, deep learning models based on neural networks. One development in deep learning is the Transformer, including Bidirectional Encoder Representations from Transformers (BERT).[9], [10], [11] BERT learns word and sentence representations with a Transformer trained at large scale. Choosing a model that suits the dataset is one way to avoid large and heavy computations.[12] There are various BERT-based models, such as IndoBERT, ALBERT, and RoBERTa. Indonesian is the 10th most widely used language, and because of its wide use, NLP practitioners use the IndoBERT model to make use of existing Indonesian language resources.[1] IndoBERT is a Transformer model that adapts BERT; it is trained purely as a masked language model using Hugging Face, following the BERT-base configuration.[13] BERT models are a development of the Transformer that can be adapted to the user's dataset, and using a model that matches the language of the dataset also reduces computational complexity.[4] Optimization is one aspect of improving model performance.[14] This research implements an Indonesian Transformer model, IndoBERT, to compare three optimization methods: Adam (Adaptive Moment Estimation), Nadam (Nesterov-accelerated Adaptive Moment Estimation), and RMSProp (Root Mean Square Propagation). The optimizer plays an important role in model accuracy because it determines how the parameters, and their effective learning rate, are adjusted during training to obtain optimal predictions.

Author | Year | Method | Result or Finding
Bagus [1] | 2022 | Comparison of BERT Uncased and IndoBERT with the Adam optimizer | Emotion classification with BERT and Adam reached 90% accuracy with BERT Uncased and 81% accuracy with IndoBERT.
Hulliyah, Rayya, Bakar [15] | 2022 | IndoBERT with the Adam optimizer to develop a chatbot | Using IndoBERT with the Adam optimizer for emotion classification in a chatbot gave accuracy, F1 score, recall, and precision of 89%, 89%, 89%, and 90% on the training data, and 70%, 71%, 70%, and 72% on the validation data.
Wijaya [16] | | |

Based on this literature study, IndoBERT-Base can be used to classify emotions in Twitter data. The use of IndoBERT for emotion classification is still relatively uncommon, and some studies report that IndoBERT performs worse than BERT-Uncased.[1] Several studies have explored ways to optimize IndoBERT's performance, with findings suggesting that the Adam optimizer can yield improved results.
This research seeks to improve on previous studies in two ways. First, it applies IndoBERT-Base to emotion classification on a dataset of 9,480 tweets with five labels, namely Marah (Angry), Takut (Fear), Senang (Joy), Cinta (Love), and Sedih (Sad); this dataset is larger than the one used in [17]. Second, three different optimizers are compared to identify the most effective one for classification. The primary goal of this study is to determine the best optimizer for emotion classification with the pre-trained IndoBERT-Base model.

Figure 1. Research Methods
The research begins with a dataset search stage, as illustrated in Figure 1. The dataset is then prepared for model training in the preprocessing stage. Next, the dataset is divided into three parts: training data, validation data, and test data. The model is trained on the training data, and its performance is assessed with several evaluation tools, including accuracy curves, loss curves, the confusion matrix, and the classification results. These evaluations provide insight into the effectiveness of the model and its ability to classify and predict correctly.

Dataset
In this research, we use an emotion dataset of public opinions collected from Twitter. It is a multi-class emotion dataset in Indonesian with five labels, namely Marah (Angry), Takut (Fear), Senang (Joy), Cinta (Love), and Sedih (Sad), and it contains a total of 9,480 labeled tweets.

Text Preprocessing
This research applies several preprocessing steps: case folding, which converts all text to lowercase, and the removal of elements that can influence the results, such as tags, URLs, emoji, punctuation, and extra whitespace. Next, the text is tokenized with the tokenizer provided with the BERT model. In addition, null entries are removed.
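The cleaning steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the regular expressions are hypothetical (the paper does not publish its exact rules), and "tags" is assumed to mean @mentions and #hashtags. Tokenization with the IndoBERT tokenizer would follow this step and is not reproduced here.

```python
import re

# Hypothetical patterns for the cleaning steps named in the text.
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")   # assumed meaning of "tags"
NON_WORD = re.compile(r"[^\w\s]")             # punctuation and emoji
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text):
    """Return the cleaned tweet, or None if the input is null or becomes empty."""
    if text is None:
        return None
    text = text.lower()                       # case folding
    text = URL.sub(" ", text)                 # remove URLs
    text = MENTION_OR_HASHTAG.sub(" ", text)  # remove tags
    text = NON_WORD.sub(" ", text)            # remove punctuation and emoji
    text = WHITESPACE.sub(" ", text).strip()  # collapse extra whitespace
    return text or None


def preprocess(rows):
    """Clean (text, label) pairs and drop rows whose text is null or empty."""
    cleaned = []
    for text, label in rows:
        t = clean_tweet(text)
        if t is not None:
            cleaned.append((t, label))
    return cleaned
```

For example, `clean_tweet("Aku SENANG banget!!! 😊 https://t.co/abc @teman")` returns `"aku senang banget"`. Removing hashtags entirely is a design choice; a variant could keep the hashtag word and drop only the `#`.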

Modelling IndoBERT
In this research, classification is carried out with the IndoBERT model, a development of BERT (Bidirectional Encoder Representations from Transformers). It differs from BERT in that it is trained on Indonesian data, while keeping the Transformer-based architecture. The Transformer architecture has two parts, an encoder and a decoder; BERT uses only the encoder.[18]
Figure 2. BERT architecture [18]
The BERT model architecture is a multi-layer bidirectional Transformer encoder, and BERT is built in two stages: pre-training and fine-tuning. In the pre-training stage, the model is trained on unlabeled data with different pre-training tasks. In the fine-tuning stage, the model is first initialized with the pre-trained parameters, and all parameters are then tuned using labeled data.[13] For IndoBERT, a large pre-training dataset known as Indo4B was collected from Indonesian text, consisting of about four billion words and about 250 million sentences. It includes news texts from various local online sources, social media, Wikipedia, online articles, subtitles from video recordings, and parallel data, and it covers formal, informal, and casual Indonesian.[12]
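During fine-tuning, a classification layer sits on top of the pre-trained encoder and is trained together with it on the labeled tweets. The paper does not describe this head, so the sketch below assumes the common setup for BERT sequence classification: a single linear layer over the pooled [CLS] representation, a softmax over the five emotion labels, and a cross-entropy loss. The label order, weights, and 2-dimensional vector are toy values, not IndoBERT parameters.

```python
import math

# Label order is an assumption; the paper does not state an index mapping.
LABELS = ["Marah", "Takut", "Senang", "Cinta", "Sedih"]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vector, weights, bias):
    """Linear head over the [CLS] vector: logits = W h + b, then softmax."""
    logits = [sum(w * h for w, h in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(probs, label_index):
    """Loss minimised during fine-tuning for one labeled example."""
    return -math.log(probs[label_index])


# Toy 2-dimensional [CLS] vector (a BERT-base model uses 768 dimensions).
h = [1.0, 0.0]
W = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0] * 5
probs = classify(h, W, b)
predicted = LABELS[probs.index(max(probs))]   # "Senang"
```

During fine-tuning, the gradient of this loss flows back through the head and into all encoder parameters, which is what "all parameters are tuned using labeled data" refers to.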

Optimizer
Adam (Adaptive Moment Estimation)
Adam is one of the most widely used optimizers for training models today, in both machine learning and deep learning. It is a gradient-based optimization algorithm with an adaptive learning rate: it combines estimates of the first and second moments of the gradient to update the parameters.
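$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{1}$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{2}$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{3}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{4}$$
$$\theta_t = \theta_{t-1} - \mathrm{lr} \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{5}$$
In these formulas, $m_t$ is the momentum (the first-moment estimate of the gradient), $v_t$ is the second-moment estimate (the decaying average of the squared gradient), $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected values, $\beta_1$ and $\beta_2$ are the exponential decay rates of these estimates, $g_t$ is the gradient of the loss function with respect to the parameters, $\theta$ is the parameter, $t$ is the iteration, lr is the learning rate, and $\epsilon$ is a small number that avoids division by zero.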
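The Adam update described above can be written as a short per-parameter loop. This is a minimal pure-Python sketch, not the implementation of any deep learning library; the function name `adam_step` is hypothetical, and the defaults β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the values suggested in the original Adam paper. The experiments in this study use a learning rate of 1e-6 with IndoBERT; the usage lines below use a toy problem instead.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to lists of parameters; t counts steps from 1.

    Returns the updated parameters and the new first/second moment estimates.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first moment, Eq. (1)
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second moment of the gradient
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1 - beta2 ** t)             # bias-corrected second moment
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Usage: minimise f(x) = x^2 (gradient 2x) starting from x = 5.
theta, m, v = [5.0], [0.0], [0.0]
for t in range(1, 501):
    grad = [2 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
```

On the first step the bias corrections make the step size approximately equal to the learning rate, whatever the gradient's magnitude, which is one reason Adam is relatively insensitive to gradient scale.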

RMSProp (Root Mean Square Propagation)
RMSProp is the default optimization algorithm in some neural network models. It was designed to counter the problem of the learning rate decreasing too quickly during training.
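$$E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2$$
$$\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon} \, g_t \tag{6}$$
In these formulas, $\theta_t$ is the parameter at iteration $t$, $g_t$ is the gradient at iteration $t$, $E[g^2]_t$ is the running (exponentially decaying) average of the squared gradient at iteration $t$, $\rho$ is its decay rate, $\eta$ is the learning rate, and $\epsilon$ is a small number that avoids division by zero.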
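A matching pure-Python sketch of the RMSProp update follows. The function name `rmsprop_step` is hypothetical, and the decay rate ρ = 0.9 is an assumed, commonly used value rather than a setting reported in this study.

```python
import math


def rmsprop_step(theta, grad, sq_avg, lr=1e-3, rho=0.9, eps=1e-8):
    """Apply one RMSProp update; sq_avg holds the running average E[g^2]."""
    new_theta, new_sq = [], []
    for p, g, s in zip(theta, grad, sq_avg):
        s = rho * s + (1 - rho) * g * g                       # decaying average of g^2
        new_theta.append(p - lr * g / (math.sqrt(s) + eps))   # scaled step, Eq. (6)
        new_sq.append(s)
    return new_theta, new_sq


# Usage: minimise f(x) = x^2 (gradient 2x) starting from x = 5.
theta, sq = [5.0], [0.0]
for _ in range(2000):
    theta, sq = rmsprop_step(theta, [2 * theta[0]], sq, lr=0.01)
```

Unlike Adam, this form keeps no momentum term and no bias correction, so the very first steps are larger than the learning rate while the running average warms up from zero.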

Nadam (Nesterov-accelerated Adaptive Moment Estimation)
Nadam is a development of Adam that combines the Nesterov Accelerated Gradient (NAG) optimizer with Adam (Adaptive Moment Estimation).
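$$\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1) g_t}{1 - \beta_1^t} \right)$$
In this formula, $\theta_t$ is the parameter at iteration $t$, $\eta$ is the learning rate, $\beta_1$ and $\beta_2$ are the decay factors for the gradient and the squared gradient, respectively, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected averages of the gradient and the squared gradient (computed with $\beta_1$ and $\beta_2$ as in Adam), $g_t$ is the gradient at iteration $t$, and $\epsilon$ is a small value that prevents division by zero.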
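The sketch below implements the simplified Nadam update, in which the bias-corrected momentum is blended with the current gradient through the Nesterov-style term $\beta_1 \hat{m}_t + (1-\beta_1) g_t/(1-\beta_1^t)$. Some library implementations, such as PyTorch's `torch.optim.NAdam`, additionally apply a momentum-decay schedule, so this is an illustration under that simplifying assumption, not a reproduction of the optimizer used in the experiments. The function name `nadam_step` is hypothetical.

```python
import math


def nadam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one simplified Nadam update; t counts steps from 1."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # decaying average of g
        v_i = beta2 * v_i + (1 - beta2) * g * g      # decaying average of g^2
        m_hat = m_i / (1 - beta1 ** t)               # bias corrections
        v_hat = v_i / (1 - beta2 ** t)
        # Nesterov look-ahead: blend corrected momentum with the current gradient.
        look_ahead = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        new_theta.append(p - lr * look_ahead / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Usage: minimise f(x) = x^2 (gradient 2x) starting from x = 5.
theta, m, v = [5.0], [0.0], [0.0]
for t in range(1, 501):
    theta, m, v = nadam_step(theta, [2 * theta[0]], m, v, t, lr=0.1)
```

At the first step (t = 1) with these defaults, the look-ahead term makes the update about 1.9 times the learning rate, compared with about 1.0 times for plain Adam.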

Evaluation
In this research, the model was evaluated with three optimizers: Adam, RMSProp, and Nadam. The optimizers are compared using confusion matrices, whose results are compared and explained to determine the performance of each optimizer in terms of precision, recall, F1-score, and accuracy.
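The metrics listed above can all be derived from a confusion matrix. The helper below is a hypothetical sketch: it computes per-class precision, recall, and F1 plus overall accuracy, and it reports a macro-averaged F1 as an assumption, since the paper does not state how per-class scores were averaged. The matrix in the usage lines contains toy counts, not results from this study.

```python
def metrics_from_confusion(cm, labels):
    """Per-class precision/recall/F1, overall accuracy, and macro-F1.

    cm[i][j] counts samples whose true label is labels[i] and whose
    predicted label is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true label differs
        fn = sum(cm[k]) - tp                         # true k, predicted something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(r["f1"] for r in report.values()) / n
    return accuracy, macro_f1, report


# Toy 3-class matrix (hypothetical counts).
cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 8]]
accuracy, macro_f1, report = metrics_from_confusion(cm, ["Marah", "Takut", "Senang"])
```

Reading columns gives precision (how many predictions of a class were right) and reading rows gives recall (how many true members of a class were found), which is why the two can differ for the same label.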

Dataset
In this research, we use an emotion dataset of public opinions collected from Twitter. The dataset has two columns: the tweet and its label. It is a multi-class emotion dataset in Indonesian with five labels, namely Marah (Angry), Takut (Fear), Senang (Joy), Cinta (Love), and Sedih (Sad), and it contains a total of 9,480 labeled tweets.

Figure 3. Splitting Dataset
This research uses 9,480 texts from the Indonesian Emotion Dataset, divided into five labels: love (1,397), anger (2,231), sadness (2,000), happy (2,292), and fear (1,560). After being combined, the dataset is divided into training, validation, and test sets in a ratio of 8:1:1, giving 7,580 training, 948 validation, and 948 test samples.
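One way to implement an 8:1:1 split is sketched below. The shuffling, the seed, and the rounding rule (10% each for validation and test, rounded down, with the remainder going to training) are assumptions; the paper does not say whether its split was random or stratified by label, and `split_8_1_1` is a hypothetical name.

```python
import random


def split_8_1_1(rows, seed=42):
    """Shuffle rows and split them into train/validation/test (8:1:1)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)      # seeded for reproducibility
    n_val = n_test = len(rows) // 10       # 10% each, rounded down
    n_train = len(rows) - n_val - n_test   # remainder goes to training
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


train, val, test = split_8_1_1(range(9480))
```

For 9,480 rows this sketch yields 7,584 / 948 / 948; the slightly smaller training count reported above may reflect rows dropped as null during preprocessing or a different rounding rule. A stratified split, which keeps the label proportions equal across the three parts, is a common alternative for imbalanced classes such as these.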

Percentage Training Result
Emotion classification experiments were carried out with the IndoBERT model on labeled tweets. The dataset is classified into five labels: 'Love', 'Angry', 'Sad', 'Happy', and 'Fear'. The control variables in training consist of the dataset, the IndoBERT model, a learning rate of 1e-6, and a batch size of 8. Training ran for up to 25 epochs with early stopping and a patience of 5 epochs. In this experiment, the highest accuracy was obtained with the Adam optimizer, whereas the RMSProp optimizer produced more stable results than the other optimizers. In the average confusion matrix results, the optimizers do not differ strikingly, because their results are close to each other.
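The early-stopping rule described above (at most 25 epochs, patience of 5) can be sketched as follows. The monitored quantity is assumed to be validation loss, because the paper does not name it; the function name and the loss values in the usage line are hypothetical.

```python
# Settings reported in the text; the monitored metric is an assumption.
MAX_EPOCHS = 25
PATIENCE = 5


def early_stopping(val_losses, max_epochs=MAX_EPOCHS, patience=PATIENCE):
    """Return (epochs_run, best_epoch) for patience-based early stopping.

    val_losses[i] is the validation loss measured after epoch i + 1.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0   # improvement: reset counter
        else:
            waited += 1
            if waited >= patience:                          # no improvement for `patience` epochs
                break
    return epochs_run, best_epoch


# Hypothetical validation losses: best at epoch 3, training stops after epoch 8.
epochs_run, best_epoch = early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74, 0.73, 0.9, 0.6])
```

Whether the weights from the best epoch are restored after stopping is a separate choice that the paper does not specify.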

Graph Training Result
Figure 4 (c) and (d) show that the results obtained with the RMSProp optimizer are quite good, as the curves look stable. Although the training accuracy obtained with RMSProp is lower than Adam's, the gap between training and validation accuracy is smaller. This suggests that the model generalizes better and may handle new data better, so this optimizer may be more resistant to overfitting. With RMSProp, the default optimizer in the IndoBERT model, the gap between training and validation accuracy is noticeably smaller. This shows that the RMSProp optimizer tends to be more stable across the training, validation, and test data.
Figure 4 (e) and (f) show that with the Nadam optimizer the model reaches the highest accuracy on the training, validation, and test data, but the gap between training and validation accuracy is very large. In this research, the Nadam optimizer, like Adam, shows potential for overfitting, where the model learns too many details from the training data that may not apply to new data.
Overfitting is indicated by the fairly large gap between the training and validation curves. As shown in Table 3, for the Adam and Nadam optimizers the training curve keeps rising while the validation curve stagnates, whereas with RMSProp the training and validation curves stay fairly close. If a model is trained too specifically on the training data, it will likely perform poorly on new data, because it has "memorized" patterns that do not generalize.
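The pattern described above, a training curve that keeps rising while the validation curve stalls, can be checked numerically from per-epoch accuracy histories. The helper below is hypothetical, its window and tolerance are arbitrary choices rather than values from the paper, and the histories in the usage lines are invented for illustration.

```python
def overfitting_signal(train_acc, val_acc, window=5, tol=0.01):
    """Return (final_gap, flag).

    final_gap is training minus validation accuracy at the last epoch.
    flag is True when, over the last `window` epochs, training accuracy
    rose by more than `tol` while validation accuracy rose by at most `tol`.
    """
    if len(train_acc) != len(val_acc) or len(train_acc) <= window:
        raise ValueError("need equal-length histories longer than window")
    final_gap = train_acc[-1] - val_acc[-1]
    train_gain = train_acc[-1] - train_acc[-1 - window]
    val_gain = val_acc[-1] - val_acc[-1 - window]
    return final_gap, train_gain > tol and val_gain <= tol


# Invented histories: the first pair diverges, the second stays close.
diverging = overfitting_signal([0.60, 0.70, 0.80, 0.85, 0.90, 0.93, 0.96, 0.98],
                               [0.60, 0.68, 0.73, 0.73, 0.74, 0.73, 0.73, 0.73])
tracking = overfitting_signal([0.60, 0.65, 0.70, 0.72, 0.74, 0.76, 0.78, 0.80],
                              [0.58, 0.63, 0.68, 0.70, 0.72, 0.74, 0.76, 0.78])
```

The gap alone can be misleading early in training, which is why the sketch also checks whether validation accuracy is still improving.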

Anomaly Result
In this experiment, several factors made some data difficult for the model to classify. Sentences can be interpreted differently depending on how they are processed: a sentence that contains two or more emotions may be assigned a different emotion in each run. In addition, removing punctuation marks during preprocessing changed the output for several sentences, because the punctuation carried part of their meaning.
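A small hypothetical example shows how punctuation removal can erase emotional cues: after cleaning, a tweet ending in a smiling emoticon and one ending in a frowning emoticon become identical, so the model receives the same input for texts that a reader would label differently. The sentences and the cleaning rule are illustrative assumptions, not items from the dataset.

```python
import re

NON_WORD = re.compile(r"[^\w\s]")   # punctuation, emoticon symbols, emoji


def strip_punctuation(text):
    """Lowercase, replace non-word characters with spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", NON_WORD.sub(" ", text.lower())).strip()


# Hypothetical tweets: "I'm fine :)" versus "I'm fine :(".
happy = "Aku baik :)"
sad = "Aku baik :("
happy_clean = strip_punctuation(happy)   # "aku baik"
sad_clean = strip_punctuation(sad)       # "aku baik"
```

Mapping emoticons to words (for example, `:)` to a token such as "senyum") before removing punctuation is one possible mitigation; it is not a step described in this study.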

Conclusion
Based on the results obtained, the right optimizer should be chosen experimentally, according to the performance aspects that matter for the task. This research aims to classify five emotions, Marah, Takut, Senang, Cinta, and Sedih, and to evaluate the performance of three commonly used optimizers: Adam, RMSProp, and Nadam. The data were processed with the IndoBERT model for Indonesian text classification, with the aim of finding the best optimizer for this task. The control variables in training consist of the dataset, the IndoBERT model, a learning rate of 1e-6, and a batch size of 8. Training runs for up to 25 epochs with early stopping and a patience of 5 epochs.
The results show classification accuracies of 90.21% with the Adam optimizer, 82.11% with RMSProp, and 88.61% with Nadam. The Adam optimizer applied to the IndoBERT model yielded the best results, an improvement over previous emotion classification studies with IndoBERT, such as the 81% accuracy reported in [1]. There are still many ways to improve Indonesian text processing, but resources for processing Indonesian text are still lacking and need further development. Emotion classification could also be extended to a multi-label setting, in which each text can be assigned more than one emotion.

Figure 4. Graph Result (a) accuracy (b) loss
In Figure 4 (a) and (b), the training and validation curves for the Adam optimizer show signs of overfitting. With the Adam optimizer, the IndoBERT model obtained quite high accuracy compared with the other optimizers, but there is a fairly large gap between training and validation accuracy. In this research, the Adam optimizer therefore shows potential for overfitting: the model learns details from the training data that may be specific to it and do not carry over to new data.

Table 3. Comparison of Accuracy and Confusion Matrix