Improving Accuracy of SMS Based FAQ Retrieval System

Anwar D. Shaikh*, Mukul Jain, Mukul Rawat, Rajiv Ratn Shah, and Manoj Kumar
Computer Engineering Department, Delhi Technological University, India
{anwardshaikh,mukuljain.dce,mukulrawat18869, rajivratn}@gmail.com, [email protected]

Abstract. In the present scenario, we are looking for better ways to access information. The Short Messaging Service (SMS) is a popular service that provides information access to people with mobile phones. However, there are several challenges in processing an SMS query automatically: humans tend to use abbreviations and shortcuts in their SMS messages. We refer to these inconsistencies as noise in the SMS. In this paper we present an improved version of an SMS based FAQ retrieval system. We add three main improvements to the previous system: (i) a proximity score, (ii) a length score and (iii) an answer matching system. Our experiments show that the accuracy of our system outperforms that of the current state-of-the-art system. We demonstrate the effectiveness of our approach on several real-life FAQ datasets from different domains (e.g. Agriculture, Banking, Health, Insurance and Telecom).

Keywords: FAQ retrieval, FAQ system, Similarity score, Proximity score, Length score, SMS Query, SMS processing, IDF.

1 Introduction

Due to the increased penetration of the internet, information can now be accessed at any place, at any time, from any device connected to the internet. A huge amount of information is spread over the internet, which requires good information retrieval techniques to make it accessible anytime and anywhere to everyone. Therefore, making information retrieval systems convenient has become an interesting area of research. Nowadays, there are several channels through which users can access information, such as the internet, telephone lines and mobile phones. With the rapid growth in mobile communication, the mobile phone has become a common mode of communication for most people. The number of mobile users is growing at a very fast rate; in India alone, there are around 893 million mobile subscribers1. The popularity of mobile phones is due to their unmatched portability. This encourages businesses and information providers to consider implementing information services

* Corresponding author.
1 http://www.trai.gov.in/annualreport/English_Front_Page.pdf

P. Majumder et al. (Eds.): FIRE 2010 and 2011, LNCS 7536, pp. 142–156, 2013. © Springer-Verlag Berlin Heidelberg 2013


based on mobile phones. SMS information services are one example of mobile based information services. Existing SMS services, such as the service to access CBSE exam results, require the user to type the message in a specific format. For example, to get the result of a particular student in the CBSE examination, the user has to send the message CBSE-HS-XXXX (where XXXX is the roll number of the student)2. Such formats are a constraint for users, who generally find it easier and more intuitive to type a query in a "texting" language (i.e. with abbreviations and shortcuts). Some businesses, such as "ChaCha"3, allow their users to query through SMS without any specific format. These queries, however, are handled by human experts. Although this approach gives users freedom in writing the query, it is not efficient, because the system can only handle a number of queries proportional to the number of human experts on the business side. The approach can be made efficient with a system that handles queries automatically on the business side. Such a system, an SMS based question answering system over an SMS interface, was proposed in [1]. It enabled users to type questions in SMS texting language; such questions might contain short forms, abbreviations, spelling mistakes, phonetic spellings, transliterations, etc. The system handled the noise by formulating SMS query similarity over the FAQ database, which was provided to the system in a pre-processing stage. In this paper we present our approach, based on a proximity score, a length score and an answer matching system. We have implemented this system for English and Hindi, where the language of the SMS and the language of the FAQ are the same. The system was developed as part of an event organized by the Forum for Information Retrieval Evaluation (FIRE). The rest of the paper is organized as follows. Section 2 describes prior work in this area. Section 3 describes our contributions, explaining in detail the changes we made to the original system. Section 4 provides details about our implementation, experiments and results. Finally, we conclude the paper in Section 5.

2 Prior Work

An automated question answering system was designed by the authors of [3]. The authors of [1] proposed an approach named SMS based FAQ retrieval. The proposed system was an SMS based question answering system in which the user is allowed to enter the question in SMS texting language. The system is given an FAQ corpus containing all possible frequently asked questions. Noise in the SMS query is handled by formulating the query similarity over the FAQ database as a combinatorial search problem. The system views the SMS as a sequence of tokens, and each question in the FAQ corpus as a list of terms. The goal is to find the question from the FAQ corpus that best matches the SMS query and return the answer of the selected question as the response to the input query. An SMS string is bound to have

2 SMS service: http://results.icbse.com/cbse-result-class-10/
3 http://www.chacha.com/


misspellings and other distortions, which need to be taken care of while performing the match. In a pre-processing stage, the system builds a domain dictionary and a synonym dictionary containing all the terms present in the FAQ corpus. For each term t in the dictionary and each token si in the SMS query, they defined a similarity measure α(t, si) that measures how closely the term t matches the SMS token si; the term t is called a variant of si if α(t, si) > 0. They defined a weight function ω(t, si) by combining the similarity measure and the inverse document frequency (idf) of t in the corpus. Based on the weight function, they defined a scoring function that assigns a score to each question Q in the corpus, measuring how closely the question matches the SMS string S:

Score(Q) = Σ_i max_{t ∈ Q} ω(t, si)    (1)

where

ω(t, si) = α(t, si) · idf(t)    (2)

3 Our Contribution

Our work is an extension of the system described in [1]. The significant differences between the two systems are described below:

1. The FAQ scoring function is modified to include a Length Score and a Proximity Score.
2. The Similarity Measure is used as described in [1].
3. During pre-processing, the answers of the FAQs are also considered for the creation of the domain dictionary and the synonym dictionary.
4. There are no changes in the process of list creation [1] and candidate set generation [1].
5. If there are several FAQs with a similar score, we find the similarity between the answers of the FAQs and the SMS query to break the tie. Also, if no matching FAQ is found, we try to match the answers with the SMS query to get the result.

In order to increase the accuracy of SMS based FAQ retrieval, we propose enhancements in evaluating the score of an FAQ from the candidate set. We propose that accuracy can be improved by considering the proximity of the SMS query and FAQ tokens, as well as the length of the matched tokens from the SMS query relative to the FAQ question under consideration. We formalize this as:

Score(Q) = W1 · Similarity_Score(Q, S) + W2 · Proximity_Score(Q, S) − W3 · Length_Score(Q, S)    (3)

Where Q is the FAQ question under consideration and S = {s1, s2, …, sn} is the SMS query. W1, W2 and W3 are real valued weights; their values determine the contribution of the Similarity Score, Proximity Score and Length Score to the overall score of the


FAQ question. W1 and W2 are adjusted such that their sum is 1.0 (or 100%); we give more than half of this weight to the Similarity Score. W3 is assigned a comparatively small value, as it reduces the overall score when the lengths of the SMS and the FAQ text differ. To calculate the Similarity Score we employ the methods proposed in [1]. Figure 1 shows the various steps involved in our SMS based FAQ system.
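The weighted combination in equation (3) can be sketched as follows. This is a minimal illustration, assuming the three component scores are computed as in Sections 3.1 and 3.2; the weight values below are illustrative, not the ones tuned by the authors.

```python
# Combined FAQ score of equation (3). The component scores are assumed to be
# computed elsewhere; the weights here are example values only.

W1, W2, W3 = 0.6, 0.4, 0.1  # W1 + W2 = 1.0; W3 is kept comparatively small


def total_score(similarity: float, proximity: float, length: float) -> float:
    """Combine the three component scores as in equation (3)."""
    return W1 * similarity + W2 * proximity - W3 * length


# Example: high similarity, moderate proximity, small length penalty.
print(total_score(0.8, 0.5, 0.2))  # 0.6*0.8 + 0.4*0.5 - 0.1*0.2 = 0.66
```

The Similarity Score dominates because W1 > 0.5, and the Length Score only ever lowers the total, matching the role described above.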

Step-1: Pre-processing of the SMS query.
Step-2: For each token in the SMS, find the ranked list of dictionary variants of the token.
Step-3: Find the candidate set C using the technique in [1].
Step-4: For each Qi in C:
  - Find Similarity_Score using (1).
  - Find Proximity_Score using (4).
  - Find Length_Score using (6).
  - Find the total score using (3).
Step-5: Return the FAQs having the highest score as the result.

Fig. 1. Algorithm for the SMS based FAQ system

The working of the system is depicted by an example in figure 2. After pre-processing of the SMS query, the domain dictionary and synonym dictionary are looked up to find the dictionary terms matching each SMS token, using the similarity measure described in [1], and a list of such terms is maintained; this step is referred to as list creation. Based on the list, the FAQs containing the terms present in the list are retrieved; all such FAQs form the candidate set. For each question in the candidate set, the score of the FAQ is calculated using (3), and the questions with the highest score are returned. In figure 2, C1, C2, C3, C4 and C5 are candidate sets of FAQs for the different dictionary variants of the SMS tokens, and C is the final candidate set derived from C1, C2, C3, C4 and C5. Q1, Q2, …, Qn-1 and Qn are the FAQs from the candidate set C. fun() is the module responsible for calculating the total score using (1), (4), (6) and (3). For example, C contains the following FAQs for the given SMS query:

C = {
  Q1: Which country won the most medals in Athens Olympics?
  Q2: Which is the first country who hosted modern Olympics?
  Q3: Which country will host 2016 Olympics?
  Q4: Which country won most medals in swimming in Olympics?
  Q5: Which country won gold medal in hockey in Beijing Olympics?
  …
}


[Figure 2 illustrates the pipeline on the SMS query "Wch contry fst hostd mdrn olympcs": after pre-processing, each SMS token is mapped to a ranked list of dictionary variants (e.g. "contry" → Country, County, Counter, Count; "fst" → Fast, Fist, Fust, First; "hostd" → Haste, Hosted, Husted; "mdrn" → Mourn, Morden, Modern; "olympcs" → Olympics, Olympus, Olmec). Similarity scores are computed using (1) and pruning is applied as in [1]; a FAQ is selected from the candidate set (here Q2: "Which is the first country who hosted modern Olympics?"), its tokens (first, country, hosted, modern, Olympics) are matched against the SMS tokens, and the FAQs with the highest score above the threshold are returned as the result.]

Fig. 2. Working of SMS based FAQ System

3.1 Proximity Score

To further improve the accuracy of the system, we introduce the concept of proximity search. The working of our proximity search technique is depicted with an example in figures 3 and 4.


Fig. 3. Mapping of SMS tokens with FAQ

The relative position of words in a sentence plays an important role: it allows us to differentiate a sentence from other possible sentences that have the same words in a different order. So while finding the best match, we must consider the proximity of words. In the proposed solution we do not check the proximity of a token against all remaining tokens; we only consider two consecutive words. In the proximity search process, we save the positions of the matched SMS tokens and FAQ tokens (stop words are removed before saving the positions). The Proximity Score is calculated based on the distance between two consecutive tokens in the SMS text and in the FAQ:

Proximity_Score(Q, S) = (matchedToken / totalFAQTokens) · 1 / (1 + distance)    (4)

where totalFAQTokens is the number of tokens in the FAQ and matchedToken is the number of matched SMS tokens in the FAQ, and

distance = (1/n) Σ |difference between adjacent token pairs in SMS and the corresponding pair in FAQ|    (5)

where n is the number of matched adjacent pairs in the SMS. Figure 4 describes the calculation of the Proximity Score with an example SMS and FAQ question. For calculating the distance we take only the absolute value, as we believe that if two tokens swap positions, the meaning of the SMS and the FAQ question is unchanged in most cases. Unlike the Length Score, the Proximity Score is always positive. The algorithm to calculate the Proximity_Score is depicted in figure 5. The input to the function is the positions of the matched tokens in the SMS and the FAQ. The function first calculates the distance using (5), the absolute difference between consecutive SMS and FAQ token positions; the final proximity score is then calculated as per (4).
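The proximity computation above can be sketched in a few lines. This is a reconstruction from the definitions of (4) and (5) in the text, so the exact normalisation used by the authors may differ; token positions are assumed to be 0-based indices after stop word removal.

```python
# Sketch of equations (4)-(5), reconstructed from the definitions in the
# text. sms_pos[i] and faq_pos[i] are the positions of the i-th matched
# token in the SMS and the FAQ question respectively.

def proximity_score(sms_pos, faq_pos, total_faq_tokens):
    matched = len(sms_pos)
    if matched == 0:
        return 0.0
    if matched == 1:
        distance = 0.0  # no adjacent pairs to compare
    else:
        # (5): average absolute difference between the gap of each adjacent
        # matched pair in the SMS and the corresponding gap in the FAQ
        gaps = [abs((sms_pos[i + 1] - sms_pos[i]) - (faq_pos[i + 1] - faq_pos[i]))
                for i in range(matched - 1)]
        distance = sum(gaps) / len(gaps)
    # (4): fraction of FAQ tokens matched, damped by the average distance
    return (matched / total_faq_tokens) * (1.0 / (1.0 + distance))


# Four tokens matched in the same order and spacing: distance = 0,
# score = matched / totalFAQTokens.
print(proximity_score([0, 1, 2, 3], [2, 3, 4, 5], 6))  # 4/6 ≈ 0.667
```

When the matched tokens appear in a different order in the FAQ than in the SMS, the gap differences grow, `distance` rises, and the score is damped, which is exactly how far-apart FAQs are penalised in the text.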


Fig. 4. Calculation of Proximity Score

Fig. 5. Function for Calculating Proximity_Score

3.2 Length Score

We further improve the accuracy of the system by considering the length of the unmatched portion of the FAQ under consideration. The Length Score is defined as follows:

Length_Score(Q, S) = (totalFAQToken − matchedToken) / totalSMSToken    (6)

where totalFAQToken is the total number of tokens in the FAQ question, totalSMSToken is the total number of tokens in the SMS, and matchedToken is the number of SMS tokens matched against tokens of the FAQ question. Since the Length Score is a negative score (i.e. it is subtracted from the overall score), the best Length Score is achieved when all the tokens of the FAQ question are matched by tokens of the SMS query; in this best case the Length Score is zero (i.e. nothing is subtracted from the overall score). For example, in figure 4 we can see that for question Q2 all tokens matched with tokens in the SMS. This is a case of perfect matching, and the Length_Score can be calculated as follows:

totalFAQToken = 5, matchedToken = 5, totalSMSToken = 6
Length_Score = (5 − 5) / 6 = 0 / 6 = 0

Though we use only (6) to calculate the Length_Score in our system, we have identified a drawback of this Length Score when a question has many more tokens than the SMS: for example, if there are 40 tokens in the FAQ and only 5 tokens in the SMS, the result is always negative, even if every SMS token has a match in the FAQ. We think there are two possible solutions to this problem. The first solution is applicable when very few FAQ questions in the FAQ database have a large number of tokens: rewrite the long FAQ question as an FAQ question with fewer tokens. For example:

Original FAQ question: "DTU offers various Tech courses. What are the Internship opportunities for M Tech students at DTU? Do all M Tech students get the Internship offer?"
Corresponding short question: "What are Internship opportunities for M Tech students at DTU?"

The second solution is applicable when there are many long questions in the FAQ database and rewriting them is not possible. In this case, instead of subtracting the Length_Score, we use the modified Length Score (7) and add it to the overall score (8). A transition function could be designed in future work for a smooth transition between (6) and (7), choosing which to use based on the condition stated above.

Length_Score(Q, S) = 1 / (1 + (totalFAQToken − matchedToken) / totalSMSToken)    (7)
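Both Length Score variants can be sketched directly. The formulas below are reconstructions from the worked example and the discussion in the text (the extracted equations are damaged), so the authors' exact expressions may differ.

```python
# Sketch of the two Length Score variants, reconstructed from the worked
# example in Section 3.2; the authors' exact formulas may differ.

def length_score(total_faq, total_sms, matched):
    """Equation (6): penalty for unmatched FAQ tokens, normalised by the
    SMS length; 0 in the best case (all FAQ tokens matched)."""
    return (total_faq - matched) / total_sms


def length_score_modified(total_faq, total_sms, matched):
    """Equation (7): variant for long FAQ questions, added to (rather than
    subtracted from) the overall score; 1 in the best case."""
    return 1.0 / (1.0 + (total_faq - matched) / total_sms)


# Perfect match from the example above: totalFAQToken = 5, totalSMSToken = 6.
print(length_score(5, 6, 5))           # (5 - 5) / 6 = 0.0
print(length_score_modified(5, 6, 5))  # 1 / (1 + 0) = 1.0
# The drawback case: a 40-token FAQ against a fully matched 5-token SMS.
print(length_score(40, 5, 5))          # (40 - 5) / 5 = 7.0, a large penalty
```

The third call reproduces the drawback discussed above: even a perfect match of all five SMS tokens incurs a penalty of 7, which is why the added (rather than subtracted) variant (7) is proposed for long questions.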


Score(Q) = W1 · Similarity_Score(Q, S) + W2 · Proximity_Score(Q, S) + W3 · Length_Score(Q, S)    (8)

where totalFAQToken is the total number of tokens in the FAQ question, totalSMSToken is the total number of tokens in the SMS, and matchedToken is the number of SMS tokens matched against tokens of the FAQ question. In the best case the Length Score is 1, when all the tokens in the FAQ are matched by the tokens in the SMS.

3.3 Matching with Answers

To further improve the accuracy of the system, we introduce the idea that, along with matching the SMS query against the FAQ question, we can also match the SMS against the FAQ answer, because some of the words in the SMS might be present in the FAQ answer but not in the FAQ question. Matching with answers is considered in both of the cases mentioned below:

• There is more than one FAQ question with the closest match to the SMS query.
• No matching FAQ question is found.

Note: in the pre-processing step we also consider the FAQ answers for the creation of the domain dictionary. For example, let there be a question-answer pair in the FAQ database as follows:

FAQ: "What are the different insurance schemes?"
Answer: "LIC, LIC Jivan Saral, LIC Jivan Tarang, LIC Plus, Bajaj Allianz, ICICI Lombard etc. are different insurance schemes."
SMS: "wht r difrnt LIC scems?"

Suppose the word "LIC" is not present in any other FAQ question. Then the earlier technique will not be able to answer this query correctly, but our technique will, because in this case we also search for the token "LIC" in the FAQ answer and obtain the correct result. Also, if more than one FAQ has the same score, we find the similarity between the answers and the SMS query: the FAQ whose answer matches more SMS tokens is considered the best match.
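The tie-breaking step can be sketched as follows. This is a simplified illustration: token matching is reduced to exact set overlap, whereas the real system applies the similarity measure of [1] to handle noisy tokens; the function and variable names are our own.

```python
# Sketch of the answer-matching tie-break: among FAQs with equal question
# scores, prefer the one whose answer shares the most tokens with the SMS.
# Exact token overlap stands in for the similarity measure of [1].

def answer_match_count(sms_tokens, answer_tokens):
    """Number of SMS tokens that also occur in the FAQ answer."""
    return len(set(sms_tokens) & set(answer_tokens))


def break_tie(tied_faqs, sms_tokens):
    """tied_faqs: list of (question, answer_tokens) pairs with equal scores.
    Returns the FAQ whose answer shares the most tokens with the SMS."""
    return max(tied_faqs, key=lambda qa: answer_match_count(sms_tokens, qa[1]))


# The LIC example: "lic" and "schemes" occur only in the second answer.
tied = [
    ("What are the premium payment options?", ["premium", "payment", "options"]),
    ("What are the different insurance schemes?",
     ["lic", "jivan", "saral", "insurance", "schemes"]),
]
best = break_tie(tied, ["different", "lic", "schemes"])
print(best[0])  # What are the different insurance schemes?
```

The same overlap count also serves the fallback case: when no FAQ question matches at all, the answers can be scored against the SMS tokens directly.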

4 Implementation and Experiments

4.1 Implementation

4.1.1 FAQ Pre-processing
As described in [1], we pre-process the FAQ corpus. In pre-processing, a domain dictionary and a synonym dictionary are created from the FAQ corpus; we consider questions as well as answers for the creation of the domain and synonym


dictionaries. All questions and answers in the FAQ corpus are indexed for fast lookup during the FAQ retrieval process. While creating the domain dictionary, stop words are removed from the FAQ, as they are not important and are generally not used by SMS users.

4.1.2 SMS Pre-processing
Stop words are also removed from the SMS input, if present. If there are numbers in the SMS, they are converted into their corresponding string format, and these strings are used for calculating similarity over the token; e.g. 2day is converted to twoday. Occurrences of single characters in the SMS are also removed, because single characters are generally not important in deciding the meaning of the SMS.

4.1.3 Tools Used for Implementation
We used Lucene4 for indexing the tokens of the FAQs. English Wordnet5 was used to find synonyms of English words while creating the synonym dictionary.

4.1.4 Language Specific Changes
Some changes were made to make the system applicable to the Hindi language. We used the Hindi Wordnet6 API 1.2 during the creation of the synonym dictionary. A list of stop words for Hindi was created and used in the pre-processing of the SMS messages and FAQs. The similarity threshold for list creation and the score threshold for selecting the correct FAQ were changed to suit the Hindi language.

4.2 Experiments

4.2.1 SMS Based FAQ Retrieval Task
The experiments were conducted for the tasks organized by the Forum for Information Retrieval Evaluation (FIRE7) in 2011. There were various subtasks of the SMS based FAQ Retrieval Task, of which we participated in Mono-Lingual FAQ Retrieval (same language FAQ retrieval) for English and Hindi. In this subtask the language of the input SMS and of the FAQ corpus is the same, so the goal was to find the best matching questions from the mono-lingual collection of FAQs for a given SMS8.

4.2.2 Dataset
The FAQ and SMS dataset was provided by FIRE. The FAQs were collected from online resources and from the government and private sector. The dataset contained data from

4 http://www.lucene.apache.org
5 http://www.wordnet.princeton.edu
6 http://www.cfilt.iitb.ac.in/wordnet/webhwn
7 http://www.isical.ac.in/~clia/
8 http://www.isical.ac.in/~fire/faq-retrieval/faq-retrieval.html


various domains: Agriculture, Banking, Career, General Knowledge, Health, Insurance, Online railway reservation, Sports, Telecom and Tourism. Table 1 shows the number of FAQs and the in-domain and out-domain SMS queries used in the experiments. SMS queries for which there is a matching question in the FAQ corpus are in-domain SMS queries; the other queries are called out-domain SMS queries.

Table 1. Number of FAQs and SMS Queries

Language | FAQs | In-domain SMS | Out-domain SMS | Total SMS
Hindi    | 1994 | 200           | 124            | 324
English  | 7251 | 728           | 2677           | 3405

4.2.3 FIRE 2011 SMS Based FAQ Retrieval Task Results

Fig. 6. Results of English-monolingual task

13 teams from various universities participated in the English Monolingual task. The results are shown in figure 6; the vertical axis represents the Mean Reciprocal Rank (MRR). The performance of our team is marked in red in the graph. Our result for this task is shown in table 2.

Table 2. English-Monolingual task result

Task                | In-domain Correct | Out-domain Correct | MRR
English-Monolingual | 553 / 704         | 871 / 2701         | 0.830


7 teams participated in the Hindi Monolingual task; the results are shown in figure 7. The performance of our team is marked in red in the graph. Table 3 shows our result in detail.

Table 3. Hindi-Monolingual task result

Task              | In-domain Correct | Out-domain Correct | MRR
Hindi-Monolingual | 198 / 200         | 3 / 124            | 0.99

Fig. 7. Results of Hindi-monolingual task

As can be observed, the out-domain results for both the English and Hindi tasks are very low, because the FAQ score (3) threshold was not properly selected. We therefore repeated this experiment with different thresholds; the improved results are shown in section 4.2.4.

4.2.4 Effect of Various Proposed Techniques on the Result
As explained above, the matching between FAQ and SMS is performed based on three factors: similarity, proximity and length. We conducted experiments to evaluate the correctness of the system based on these three factors in four different combinations. As the similarity score is the basis of the matching process, we consider it in all experiments. In the first experiment only similarity is considered for matching; in the second, similarity along with proximity; in the third, similarity and length; and in the fourth, all three factors are considered. Tables 4 and 5 show the results of the experiments conducted for the Hindi and English languages, respectively. MRR indicates mean reciprocal rank. Questions with a score greater than a particular threshold were considered for matching. The same threshold was used across all experiments; this threshold was different from, and more accurate than, the threshold used in the FIRE 2011 task.

Table 4. Results of Hindi FAQ retrieval experiments

                                | In-domain Correct | Out-domain Correct | MRR
Similarity                      | 197               | 6                  | 0.99005
Similarity & Proximity          | 197               | 22                 | 0.99005
Similarity & Length             | 197               | 97                 | 0.99116
Similarity & Length & Proximity | 198               | 118                | 0.99449
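The MRR values reported in these tables can be computed as follows. This is a minimal sketch of the standard mean reciprocal rank measure, assuming ranked result lists per query are available; the function names are our own.

```python
# Mean Reciprocal Rank: for each query, take the reciprocal of the rank at
# which the correct FAQ appears in the returned list (0 if absent), then
# average over all queries.

def mean_reciprocal_rank(results, gold):
    """results[i]: ranked list of FAQ ids returned for query i;
    gold[i]: id of the correct FAQ for query i."""
    total = 0.0
    for ranked, correct in zip(results, gold):
        if correct in ranked:
            total += 1.0 / (ranked.index(correct) + 1)
    return total / len(gold)


# Two queries: the correct FAQ is ranked first for one, second for the other.
print(mean_reciprocal_rank([["q1", "q2"], ["q3", "q4"]], ["q1", "q4"]))  # 0.75
```

An MRR close to 1.0, as in the Hindi results above, means the correct FAQ is almost always returned at rank one.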

We can see from the experimental results in tables 4 and 5 that the accuracy of our system is much better than the accuracy of the current state-of-the-art system. They also show that we achieve the best accuracy when the similarity, length and proximity scores are all considered in calculating the overall score of an FAQ.

Table 5. Results of English FAQ retrieval experiments

                                | In-domain Correct | Out-domain Correct | MRR
Similarity                      | 519               | 234                | 0.7529
Similarity & Proximity          | 520               | 393                | 0.7568
Similarity & Length             | 538               | 1981               | 0.8877
Similarity & Length & Proximity | 521               | 2281               | 0.9041

These results are shown in the form of graphs in figures 8 and 9.

Fig. 8. Result of Hindi FAQ retrieval task


Fig. 9. Result of English FAQ retrieval task

The results show that the proximity factor does not improve the in-domain result much, but it improves the out-domain result. As per our observations, the reason behind the improvement of the out-domain result is that some FAQs contain tokens that are much more similar to those in the SMS than in other FAQs, but located at positions far different from the expected/original SMS token positions; in such cases, the proximity score tries to eliminate these FAQs from the result. The length factor improved the result to a larger extent than the proximity factor, and after combining the effects of similarity, length and proximity, the results are better than in the previous experiments.

5 Conclusion and Future Work

SMS based question answering systems may become one of the most efficient, convenient and cheapest ways to extract information. The SMS based FAQ retrieval system proposed in [1] is an automatic question answering system that handles the noise in the SMS query by formulating the query similarity over the FAQ database. In this paper, we presented three techniques, (i) a Proximity Score, (ii) a Length Score and (iii) an answer matching system, to improve the accuracy of the SMS based FAQ retrieval system. We have demonstrated through experiments that, after applying our proposed techniques, the accuracy of our system outperforms the accuracy of the current state-of-the-art system.

In future, this system can be extended to FAQ retrieval using spoken queries instead of SMS queries; one such approach is described in [4].

Acknowledgement. We give our sincere thanks to Dr. L. Venkata Subramaniam for his continuous support and encouragement in completing this SMS based question answering system.


References

1. Kothari, G., Negi, S., Faruquie, T.A., Chakaravarthy, V.T., Subramaniam, L.V.: SMS based interface for FAQ retrieval. In: Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Singapore, pp. 852–860 (2009)
2. Contractor, D., Kothari, G., Faruquie, T.A., Subramaniam, L.V., Negi, S.: Handling Noisy Queries in Cross Language FAQ Retrieval. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, October 9-11, pp. 87–96. MIT, Massachusetts (2010)
3. Sneiders, E.: Automated question answering using question templates that cover the conceptual model of the database. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds.) NLDB 2002. LNCS, vol. 2553, pp. 235–239. Springer, Heidelberg (2002)
4. Chang, E., Seide, F., Meng, H.M., Chen, Z., Shi, Y., Li, Y.-C.: A system for spoken query information retrieval on mobile devices. IEEE Transactions on Speech and Audio Processing, 531–541 (November 2002)