Fast and Accurate Indonesian QnA Chatbot Using Bag-of-Words and Deep-Learning for Car Repair Shop Customer Service

A chatbot is software that simulates human conversation through text chat. Building a chatbot is a complex task. Chatbots are expected to be fast and accurate, especially in business settings, because this increases customer satisfaction. However, currently available approaches for Indonesian chatbots achieve only low to medium accuracy and have long response times, partly because they require substantial computational resources. This research aims to build a fast and accurate chatbot using a Bag-of-Words and Deep-Learning approach, applied to customer service at a car repair shop. Sixteen different intents, each with a set of possible queries, were used as the training dataset. The chatbot is built on a text classification task in which the intents are the target classes and the queries are the texts to classify; the chatbot's response is then based on the recognized intent. The deep learning model for text classification was built using Keras, and the chatbot application was built using the Flask framework in Python. Results showed that the model achieved 100% accuracy in predicting users' intents, so the chatbot can give appropriate responses, and the response time is near zero milliseconds. This result implies that developers who aim to build fast and accurate chatbot software can use a combination of bag-of-words and deep-learning approaches. Several suggestions are presented to increase the probability of the chatbot's success when released to the general public.


Introduction
A chatbot is software that simulates human conversation through text chat. It is one of the tasks in Natural Language Processing (NLP), a field of Artificial Intelligence. Chatbots are expected to be fast and accurate [1], and chatbot performance is important for improving customer experience and, therefore, customer satisfaction [2]. However, currently available approaches for Indonesian chatbots achieve only low to medium accuracy and have long response times. The accuracy of a chatbot measures how likely it is to give the correct (matching) answer to the question asked by the user [3], whereas response time is the time the chatbot needs to respond to the user's query [4]. The chatbot in [5] achieved an accuracy of 26-78% using pre-built platforms, namely Artificial Intelligence Markup Language (AIML) and Pandorabots. Another study reached an accuracy of up to 93.75% [6], and a study using an RNN for intent classification reached 81% [7]. A study using the GloVe model and a CNN produced the best accuracy among these, 95.84% [8]. Only one study reports the chatbot's response time, with an average of 3.4 seconds [4]. Chatbots in Bahasa Indonesia have been used for FAQs in a university library during the pandemic [9], as virtual assistants for banking services [10], [11] and hotels [12], [13], and for software requirements elicitation [14], legal information services [15], a digital historian [16], maternal and child health [17], and e-commerce [5].
Approaches to building a chatbot range from traditional methods such as pattern matching [11] and cosine similarity [18] to more modern, neural-network-based approaches such as sequence-to-sequence models [19] and Transformers [20]. However, developing a pattern-matching or rule-based chatbot has been shown to be very difficult [21]. The latter approaches also allow users to enter free text as their queries rather than a fixed pattern [22]. Some authors use pre-built platforms such as RASA [23], Dialogflow [9], and AIML [5], [12]-[14]. This research aims to build a deep-learning-based chatbot from scratch using TensorFlow/Keras.
A deep neural network is a neural network with more than one hidden layer. Its higher complexity allows the model to perform more complex tasks, including language modeling and question-answering tasks such as chatbots. For a computer to process text, the text must undergo pre-processing and feature extraction before being fed into the model. Text pre-processing tasks include tokenization, lemmatization, stemming, stop-word removal, etc. Text feature extraction techniques include Term Frequency-Inverse Document Frequency (TF-IDF) [24], Term Document Matrix (TDM) [25], Bag of Words (BoW) [26], and Word Embeddings [27]. Research showed that, when using BoW, converting uppercase letters to lowercase gave a significant improvement in text classification accuracy [26], whereas stop-word removal may improve accuracy depending on the dataset used [26]. The result of text feature extraction is a matrix that is later used for text classification. The text classification step may use "traditional" Data Mining techniques such as Random Forest and Gradient Boosting [24] or more modern, neural-network-based approaches such as Transformers. Transformers also enable a zero-shot learning approach to text classification, but it is slow and resource-intensive [28].
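One plausible reason lowercasing helps BoW is that it merges case variants of the same word into a single feature. The sketch below illustrates this on four hypothetical Indonesian queries (invented for illustration, not taken from the paper's dataset) using a simple punctuation-stripping tokenizer, which is an assumption rather than the NLTK tokenizer used in this research.

```python
# Hypothetical queries; the mixed capitalisation is deliberate.
queries = [
    "Jam buka bengkel?",          # workshop opening hours?
    "Bengkel buka hari Minggu?",  # is the workshop open on Sunday?
    "BERAPA lama servis mobil?",  # how long does a car service take?
    "Berapa biaya servis?",       # how much does a service cost?
]

def tokenize(text: str, lowercase: bool) -> list[str]:
    """Replace punctuation with spaces, optionally lowercase, then split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    if lowercase:
        cleaned = cleaned.lower()
    return cleaned.split()

def vocabulary(texts: list[str], lowercase: bool) -> list[str]:
    """Sorted list of unique tokens: one BoW feature per entry."""
    return sorted({tok for t in texts for tok in tokenize(t, lowercase)})

raw_vocab = vocabulary(queries, lowercase=False)
lower_vocab = vocabulary(queries, lowercase=True)
print(len(raw_vocab), len(lower_vocab))  # 12 10
```

Without lowercasing, "bengkel"/"Bengkel" and "BERAPA"/"Berapa" become separate features, so a model must learn each variant from its own training examples.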
Customer Service (CS) is an important part of every business, including a car repair shop. Traditionally, CS tasks are handled by human staff. However, when many customers ask only for basic information, the valuable time of human CS staff is spent answering very basic questions. A chatbot can help CS [1] alleviate this burden and reduce the human resource costs of CS [29]. Furthermore, chatbots can avoid human error [29] and provide the latest information about the business [1]. Despite being non-human, chatbots have been shown to increase business agility [30], form positive customer relationships [31], and increase customer satisfaction [2], [32], [33]. This research applies a chatbot to assist the CS of a car repair shop. If customers have further questions, the chatbot directs them to a human CS agent.

Methods
The research methods are shown in Figure 1 and are separated into two phases: the training phase and the implementation phase. In the training phase, a set of preset queries and their corresponding intents is prepared as the training dataset in JSON format. The queries consist of questions such as "What are the opening hours?" and "How long will the repair take?". There are 16 intents in total, each with several possible queries. One special intent handles the case where the user enters an invalid or blank query, or a query the chatbot cannot understand; the chatbot responds to this intent by asking the user to type a valid or better query. The chatbot treats the problem as text classification, with the intents as the target classes and the queries as the texts to classify. The chatbot's response is then based on the recognized intent. An example of the training data is shown in Figure 2.
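The sketch below shows one way such a JSON intents file could be turned into (query, intent) training pairs. The field names `tag`, `patterns`, and `responses`, the intent names, and all texts are hypothetical; the actual schema of the paper's dataset is the one in Figure 2.

```python
import json

# Hypothetical intents file (schema and contents are assumptions).
INTENTS_JSON = """
{
  "intents": [
    {"tag": "opening_hours",
     "patterns": ["Jam berapa bengkel buka?", "Bengkel buka jam berapa?"],
     "responses": ["(contoh) Informasi jam operasional bengkel."]},
    {"tag": "repair_duration",
     "patterns": ["Berapa lama perbaikannya?"],
     "responses": ["(contoh) Informasi perkiraan lama perbaikan."]},
    {"tag": "fallback",
     "patterns": [""],
     "responses": ["(contoh) Mohon ketik pertanyaan yang lebih jelas."]}
  ]
}
"""

def load_training_pairs(raw: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (query, intent) pairs and the ordered list of intent classes."""
    data = json.loads(raw)
    pairs: list[tuple[str, str]] = []
    classes: list[str] = []
    for intent in data["intents"]:
        classes.append(intent["tag"])
        for pattern in intent["patterns"]:
            pairs.append((pattern, intent["tag"]))
    return pairs, classes

pairs, classes = load_training_pairs(INTENTS_JSON)
print(len(pairs), classes)
```

Here the special intent for blank input is modelled, by assumption, as an intent whose only pattern is the empty string; the paper does not specify how its special intent is represented in the JSON file.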
The preset queries undergo text preprocessing and feature extraction steps, which include tokenization, lemmatization/stemming, and bag-of-words creation. Tokenization splits each query into a list of words. Lemmatization and stemming reduce the words to their root forms. Bag of Words builds a corpus (vocabulary) listing all unique words and counts how often each word appears in each query. These steps produce a matrix that is ready for model training. The model consists of one input layer, two hidden layers, and one output layer; the details of the architecture are shown in Table 2. The model is trained using the SGD optimizer with a learning rate of 0.1. The trained model is saved as an H5 file for later use in the chatbot application.
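To make the training step concrete, the following is a NumPy stand-in for the Keras model, not the paper's code: one input layer, two ReLU hidden layers, and a softmax output, trained with plain SGD (no momentum) at learning rate 0.1 for 200 epochs. The hidden sizes (32 and 16), He initialisation, batch size of 1, and the toy BoW matrix are all assumptions; the actual layer details are those in Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # He initialisation, a common choice for ReLU layers.
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TinyIntentMLP:
    """Input -> Dense+ReLU -> Dense+ReLU -> Dense+softmax (sizes are assumptions)."""

    def __init__(self, n_features, n_classes, hidden=(32, 16)):
        self.W1, self.b1 = init_layer(n_features, hidden[0])
        self.W2, self.b2 = init_layer(hidden[0], hidden[1])
        self.W3, self.b3 = init_layer(hidden[1], n_classes)

    def forward(self, x):
        z1 = x @ self.W1 + self.b1
        h1 = relu(z1)
        z2 = h1 @ self.W2 + self.b2
        h2 = relu(z2)
        return z1, h1, z2, h2, softmax(h2 @ self.W3 + self.b3)

    def sgd_step(self, x, y_onehot, lr):
        z1, h1, z2, h2, p = self.forward(x)
        # Backpropagate categorical cross-entropy; all deltas use pre-update weights.
        d3 = p - y_onehot
        d2 = (d3 @ self.W3.T) * (z2 > 0)
        d1 = (d2 @ self.W2.T) * (z1 > 0)
        self.W3 -= lr * np.outer(h2, d3)
        self.b3 -= lr * d3
        self.W2 -= lr * np.outer(h1, d2)
        self.b2 -= lr * d2
        self.W1 -= lr * np.outer(x, d1)
        self.b1 -= lr * d1

    def predict_proba(self, X):
        return self.forward(X)[-1]

# Toy BoW matrix over a hypothetical vocabulary [jam, buka, bengkel, berapa, lama, biaya].
X = np.array([[1, 1, 0, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2])
Y = np.eye(3)[y]  # one-hot targets

model = TinyIntentMLP(X.shape[1], 3)
for epoch in range(200):
    for i in rng.permutation(len(X)):
        model.sgd_step(X[i], Y[i], lr=0.1)

train_acc = float((model.predict_proba(X).argmax(axis=1) == y).mean())
print(f"training accuracy: {train_acc:.0%}")
```

As with the paper's reported figure, this is training accuracy on the data the model was fitted to; it says nothing on its own about accuracy on unseen queries.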
In the deployment phase, the user enters queries in the app, and the chatbot replies to them. Each query undergoes the same text preprocessing and feature extraction as in the training phase before being fed into the model. The model then outputs the predicted intent, and the app gives the appropriate response based on that intent. The confidence threshold for intent prediction is set to 0.25, and the model responds based on the class with the highest confidence score. If an intent has more than one preset response, the app picks one of them at random. If the prediction confidence is low, the app responds that the chatbot does not understand the user's query. For special or further queries, the chatbot directs the customer to human customer service. Both the training and deployment phases are carried out in Python using several libraries: NLTK [34] for tokenization and lemmatization, Sastrawi for Indonesian stop-word removal and stemming, TensorFlow [35] and Keras [36] for the deep learning model, Pickle for saving the model, and Flask [37] as the framework for the web-based chatbot app.
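The response-selection logic described above can be sketched as follows. The 0.25 threshold comes from the text; the intent names, response texts, fallback message, and the choice of a strict "greater than" comparison are assumptions, and `probs` stands in for the model's softmax output.

```python
import random

CONFIDENCE_THRESHOLD = 0.25  # threshold stated in the text

# Hypothetical response table; intents and wording are illustrative only.
RESPONSES = {
    "opening_hours": ["(contoh) Informasi jam operasional bengkel."],
    "facilities": ["(contoh) Informasi fasilitas bengkel.",
                   "(contoh) Jawaban alternatif tentang fasilitas."],
}
CLASSES = list(RESPONSES)  # order must match the model's output units
FALLBACK = "Maaf, saya tidak mengerti pertanyaan Anda. Mohon ketik ulang dengan lebih jelas."

def choose_response(probs, classes, responses=RESPONSES,
                    threshold=CONFIDENCE_THRESHOLD, rng=random):
    """Return (intent, reply); intent is None when no class clears the threshold."""
    # Keep only intents whose confidence is above the threshold.
    candidates = [(p, c) for p, c in zip(probs, classes) if p > threshold]
    if not candidates:
        return None, FALLBACK
    _, best_intent = max(candidates, key=lambda pc: pc[0])
    # Several preset replies for one intent: pick one at random.
    return best_intent, rng.choice(responses[best_intent])

intent, reply = choose_response([0.1, 0.9], CLASSES)
print(intent, reply)
```

Passing a seeded `random.Random` instance as `rng` makes the random reply choice reproducible in tests.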

Results and Discussion
Training for 200 epochs resulted in a model with 100% accuracy, as shown in Figure 3. The training took two minutes. The trained model is saved and then deployed in a web-based Python app, shown in Figure 4 and Figure 5. Users type their queries in the text field and press the "Send" button, and the chatbot responds. For example, when a user asks about facilities, the chatbot responds with information about the facilities available in the car repair shop. The timestamp of each message is displayed in its chat bubble. Figure 5 shows that when users' queries contain abbreviated words and/or typos (writing errors), the chatbot can still understand them and give the appropriate response, as long as other important keywords are present. When all the words in a query are abbreviated and/or misspelled, the chatbot returns a response saying that it cannot understand the query, as shown in Figure 6. Figure 7 shows the Python console with the difference between the time of the user's query and the time of the chatbot's response; the difference is near zero milliseconds. This shows that our results are better in both accuracy and speed than those of the previous research mentioned in the Introduction. This chatbot will help the car repair shop's CS serve customers who want to ask questions about its services, so that CS staff can focus on customers who need further information.
Figure 6. A screenshot of the chatbot app showing a response to a sentence containing only abbreviated words and typos. The chatbot responds by asking the user to input a more detailed or better-typed query.
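The behaviour in Figures 5 and 6 is consistent with how a bag-of-words vector is built: only tokens found in the training vocabulary produce non-zero features. The sketch below uses a hypothetical seven-word vocabulary and a count-based BoW, and omits the stemming and stop-word removal used in this research.

```python
from collections import Counter

# Hypothetical, sorted training vocabulary (one feature per word).
VOCAB = ["ada", "apa", "bengkel", "buka", "fasilitas", "jam", "saja"]

def tokenize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def bag_of_words(text, vocab=VOCAB):
    """Count of each vocabulary word in the text; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

# Slang/abbreviations ("aja", "yg") are dropped, but key words survive.
partial = bag_of_words("Fasilitas apa aja yg ada?")
# Every word abbreviated or misspelled: nothing matches the vocabulary.
garbled = bag_of_words("fslts ap aj yg ad?")
print(partial)  # [1, 1, 0, 0, 1, 0, 0]
print(garbled)  # [0, 0, 0, 0, 0, 0, 0]
```

An all-zero vector gives the classifier no word evidence at all, which fits the observation that such queries end in the "cannot understand" response rather than a specific intent.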

Despite the chatbot's high accuracy and speed, research has shown that a chatbot released to the public must be useful and easy to use to succeed [38], [39]. Research also showed that user satisfaction with a chatbot is influenced by System Quality, Information Quality, and Service Quality [40], whereas the intention to use a chatbot is influenced by Information Quality and personal factors such as age and occupation [40]. Further research may address and implement these insights to help ensure a successful chatbot implementation that is well received by the public. Further research may also use more sophisticated out-of-scope intent detection [20] rather than the single extra intent class used in this research. Latent Dirichlet Allocation (LDA), a text clustering technique [41], might be explored as an approach to, or an aid for, the text classification task [42], [43]. Future research may also add typographical error correction, using a method such as the Schema Matching Technique [44], to make the chatbot more robust to typos. Future research may expand the dataset with more preset queries and intents to extend the chatbot's functionality. The chatbot can be further improved by integrating it into the car repair shop's information system so that it can handle service orders and provide real-time information about the current service queue and the estimated time to complete a service. However, such a system may require safeguards so that it is not abused by irresponsible users.
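As a rough illustration of where typo correction would sit in the pipeline, the sketch below maps each token to its closest vocabulary word before the bag-of-words step. It uses Python's standard `difflib` similarity matching, which is a swapped-in technique and not the Schema Matching Technique of [44]; the vocabulary and the 0.75 cutoff are assumptions.

```python
import difflib

# Hypothetical training vocabulary.
VOCAB = ["ada", "apa", "bengkel", "buka", "fasilitas", "jam", "servis"]

def correct_tokens(tokens, vocab=VOCAB, cutoff=0.75):
    """Replace each token with its closest vocabulary word if similar enough."""
    corrected = []
    for tok in tokens:
        # get_close_matches ranks candidates by SequenceMatcher.ratio().
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else tok)
    return corrected

print(correct_tokens(["fasilitsa", "bengkl", "yg", "jam"]))
# ['fasilitas', 'bengkel', 'yg', 'jam']
```

The cutoff trades recall against false corrections: lower values fix more typos but risk mapping unrelated short words onto vocabulary entries.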

Conclusion
This research aims to build a fast and accurate Indonesian QnA chatbot using a bag-of-words and deep-learning approach applied to a car repair shop. The training phase resulted in a 100% accurate model. Results showed that the chatbot app can provide fast (near-zero milliseconds response time) and highly accurate responses to users' queries. This chatbot can help reduce the cost of hiring human CS staff and reduce human errors. Despite its already strong performance, there are several suggestions to increase its chance of success when released to the general public.

References
A. Nursetyo, E. R. Subhiyakto, and others, "Smart chatbot system for E-commerce assitance