Incorporating Phrase-level Sentiment Analysis on Textual Reviews for Personalized Recommendation

Yongfeng Zhang, Min Zhang, Yiqun Liu, Shaoping Ma
State Key Laboratory of Intelligent Technology and Systems
Department of Computer Science and Technology
Tsinghua University, Beijing, 100084, China

[email protected], {z-m,yiqunliu,msp}@tsinghua.edu.cn

ABSTRACT

Previous research on Recommender Systems (RS), especially the continuously popular approach of Collaborative Filtering (CF), has focused mostly on explicit numerical user ratings or implicit (still numerical) feedback. However, the ever-growing availability of textual user reviews has made them an important information resource, in which users express a wealth of explicit product attributes/features and attitudes/sentiments. This information-rich resource opens up new approaches to many important problems that have perplexed the research community for years, such as the cold-start problem, the explanation of recommendations, and the automatic generation of user or item profiles. However, the fundamental importance of textual reviews has gained wide recognition only recently, perhaps mainly because free text is difficult to format, structure, and analyze. In this research, we stress the importance of incorporating textual reviews into recommendation through phrase-level sentiment analysis, and further investigate the role that the texts play in various important recommendation tasks.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Filtering; I.2.7 [Artificial Intelligence]: Natural Language Processing; H.3.5 [Online Information Services]: Web-based services

Keywords

Personalized Recommendation; Collaborative Filtering; Sentiment Analysis; Text Mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WSDM’15, February 2–6, 2015, Shanghai, China.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3317-7/15/02 ...$15.00.
http://dx.doi.org/10.1145/2684822.2697033.

1. INTRODUCTION

The continued growth of Web 2.0 online applications such as e-commerce and social networks has exposed users to the problem of information overload [7]. The difficulty of finding desired items online has in turn contributed to the emergence of Personalized Recommender Systems (PRS) [12], which attempt to make personalized, targeted item recommendations to users across various platforms and devices. Research on personalized recommendation can generally be classified into Content-based [20], Collaborative Filtering (CF)-based [26], and Hybrid approaches [3]. The content-based approach constructs user and item profiles and makes recommendations through carefully designed matching strategies [14], while CF-based approaches learn user preferences automatically from the historical choices of other users [12]. Hybrid recommender systems, on the other hand, combine the advantages of several strategies to make more informed recommendations. The CF-based approach has attracted much attention from the research community, especially since the Netflix Prize competition from 2007 through 2009 [2], because CF approaches, especially those based on Matrix Factorization (MF) techniques [28, 8] applied to user-item numerical rating matrices, achieved notable success in rating prediction [12] and also showed clear advantages in many other recommendation tasks [4]. However, applying CF to numerical ratings runs into difficulties with key problems such as data sparsity [35] and the limited explainability of numerical ratings [29, 32]. These problems give rise to some of the most pressing tasks in the research community, such as cold-start recommendation [36, 13], recommendation explanation [32], and automatic user/item profile generation. The continuous growth of online textual user reviews, an important information resource beyond numerical ratings, points toward new solutions to these issues.
For example, although a user may have given only a single numerical rating to a product on an online shopping website, she usually expresses more detailed opinions on various product features/aspects in the accompanying review text [33]. This is illustrated by the sample review in Figure 1, where the user expresses positive opinions on the features "service" and "phone quality" of the product with the opinion words "excellent" and "perfect", respectively, which together make up the user's overall numerical rating of five stars.

Figure 1: A sample user review of the iPhone 5s product, extracted from Amazon.com

Textual reviews thus contain both product-oriented information (i.e., product features) and user-oriented information (i.e., user opinions), which usually appear in pairs: a user applies an opinion word to express his/her attitude toward a product feature. By conducting phrase-level sentiment analysis [15, 10, 5, 31] on textual reviews, we can extract these feature-opinion word pairs and thus gain more detailed information about the user's overall opinion of a product (the overall rating). This helps us understand item characteristics and user needs along more dimensions, and alleviate the data sparsity problem in cold-start recommendation. The extracted features and opinions also help explain why an item is or is not recommended [32], and help construct user (or item) profiles automatically by estimating preferences toward the features.

In this research, we stress the importance of making fuller use of textual reviews in recommender systems, focusing on phrase-level sentiment analysis of reviews to better address the research problems mentioned above. The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 presents the research topics, our current progress, and our upcoming research plans on each topic; and Section 4 discusses, concludes, and summarizes future directions.
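As a rough illustration of the phrase-level extraction step discussed in this section, the Python sketch below matches (feature, opinion word) pairs from a small hand-written sentiment lexicon against each sentence of a review, and flips the polarity when a simple negation word directly precedes the opinion word. The lexicon entries, the same-sentence co-occurrence rule, and the names `LEXICON` and `extract_pairs` are hypothetical assumptions for illustration; they are not the lexicon construction methods of [27, 10, 5, 15].

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy sentiment lexicon: (feature, opinion word) -> polarity (+1 or -1).
# A real lexicon would be constructed automatically from a review corpus.
LEXICON: Dict[Tuple[str, str], int] = {
    ("service", "excellent"): +1,
    ("phone quality", "perfect"): +1,
    ("screen", "large"): +1,
    ("battery", "short"): -1,
}

# Negation words that flip the polarity of an opinion word placed directly after them.
NEGATION_PATTERN = r"(?:not|never|no)"


def extract_pairs(review: str, lexicon: Dict[Tuple[str, str], int]) -> List[Tuple[str, str, int]]:
    """Return (feature, opinion, polarity) triples whose feature and opinion word
    co-occur in the same sentence (a deliberately naive matching rule)."""
    triples: List[Tuple[str, str, int]] = []
    for sentence in re.split(r"[.!?]+", review.lower()):
        for (feature, opinion), polarity in lexicon.items():
            # Whole-word matching, so "short" does not match inside "shortly".
            has_feature = re.search(rf"\b{re.escape(feature)}\b", sentence)
            has_opinion = re.search(rf"\b{re.escape(opinion)}\b", sentence)
            if has_feature and has_opinion:
                negated = re.search(rf"\b{NEGATION_PATTERN}\s+{re.escape(opinion)}\b", sentence)
                triples.append((feature, opinion, -polarity if negated else polarity))
    return triples
```

For a made-up review in the spirit of Figure 1, `extract_pairs("Excellent service. The phone quality is perfect!", LEXICON)` returns `[("service", "excellent", 1), ("phone quality", "perfect", 1)]`. Same-sentence co-occurrence is chosen only for brevity; in longer sentences that mention several features it could attach an opinion word to the wrong feature.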

2. RELATED WORK

Collaborative Filtering (CF)-based techniques [26] have achieved great success in personalized recommender systems [12] due to their ability to take advantage of the wisdom of crowds, especially in the task of numerical rating prediction. Owing to their remarkable prediction accuracy, Matrix Factorization (MF) [28] approaches have gained great popularity in both the research community and industry. Commonly used matrix factorization algorithms include Singular Value Decomposition (SVD) [1, 24], Non-negative Matrix Factorization (NMF) [9], Probabilistic Matrix Factorization (PMF) [23, 22], and Max-Margin Matrix Factorization (MMMF) [25, 21]. However, the number of ratings made by each user is usually far smaller than the number of products in a typical system, so the user-item rating matrices that CF algorithms tackle are usually very sparse [35]. Figure 2 illustrates this on the Yelp rating dataset [35], showing scattered small communities that correspond to sparse submatrices. Moreover, new users and items are continuously added to online systems, which further worsens the sparsity [34]. All these factors lead to the important cold-start problem in recommender systems, where it is difficult to estimate the preferences of, or make recommendations to, a user who has rated only a few items [36, 13]. Fortunately, the ever-growing availability of textual reviews opens new approaches to alleviating the cold-start problem.

Figure 2: Structures of the Yelp dataset: (a) rating matrix; (b) graph. The left panel shows an illustrative structure of the rating matrix, and the right panel shows the actual structure of the scattered blocks.

The product features and user opinions contained in textual reviews can be extracted, formatted, and summarized through Sentiment Analysis [11, 18] techniques. One of the core tasks in sentiment analysis is to determine the sentiment orientation that users express in reviews, in sentences, or on specific product features, corresponding to review (document)-level [19], sentence-level [30, 17], and phrase-level [31, 15, 5] sentiment analysis. Review- and sentence-level sentiment analysis attempt to label a review or sentence with one of several predefined sentiment polarities, typically positive, negative, and sometimes neutral [11]. Phrase-level sentiment analysis analyzes user sentiment at a finer granularity: it considers the sentiment expressed on specific product features or aspects [6]. One of the most important tasks in phrase-level sentiment analysis is the construction of a Sentiment Lexicon [27, 10, 5, 15], that is, extracting feature-opinion word pairs and their corresponding sentiment polarities from opinion-rich, user-generated free text.

In [16], McAuley et al. leveraged topic modelling to extract hidden topics from user reviews, thereby improving rating prediction accuracy. Textual reviews also help construct intuitive explanations of why one item is recommended over others. In [32], Zhang et al. proposed a feature-level explainable recommendation strategy in which the system persuades a user by pointing to product features that the user cared about in his/her historical reviews.

As an integration of content- and CF-based recommendation strategies, hybrid recommendation techniques [3] have achieved state-of-the-art performance in real-world applications [12]. However, the manual construction of user and item profiles requires a vast amount of domain knowledge, which is expensive and time-consuming [20, 14]. Phrase-level sentiment analysis on textual reviews makes automatic profile construction possible, by analyzing and structuring the reviews of a target user or product.
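The automatic profile construction mentioned above can be sketched as follows, assuming the feature-opinion pairs have already been extracted and reduced to (user, item, feature, polarity) records. In this sketch a user profile is how often the user mentions each feature, an item profile is the average polarity the item receives on each feature, and the hypothetical helpers `match_score` and `explain` combine the two. All of these definitions, names, the explanation template, and the sample records are illustrative assumptions, not the specific models of [32] or of the hybrid systems in [3].

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# One extracted opinion: (user_id, item_id, feature, polarity), polarity in {+1, -1}.
# Assumed to come from a phrase-level sentiment analysis step; not computed here.
Record = Tuple[str, str, str, int]


def user_profiles(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Attention profile: fraction of each user's feature mentions going to each feature."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, _item, feature, _polarity in records:
        counts[user][feature] += 1
    profiles = {}
    for user, feature_counts in counts.items():
        total = sum(feature_counts.values())
        profiles[user] = {f: n / total for f, n in feature_counts.items()}
    return profiles


def item_profiles(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Quality profile: mean polarity an item receives on each feature, in [-1, 1]."""
    totals: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for _user, item, feature, polarity in records:
        entry = totals[item][feature]
        entry[0] += polarity  # running sum of polarities
        entry[1] += 1         # number of mentions
    return {item: {f: s / n for f, (s, n) in feats.items()} for item, feats in totals.items()}


def match_score(user_profile: Dict[str, float], item_profile: Dict[str, float]) -> float:
    """Attention-weighted item quality; features never reviewed for the item count as 0."""
    return sum(w * item_profile.get(f, 0.0) for f, w in user_profile.items())


def explain(user_profile: Dict[str, float], item_profile: Dict[str, float]) -> Optional[str]:
    """Hypothetical template: cite the feature with the largest positive contribution."""
    contributions = {f: w * item_profile.get(f, 0.0) for f, w in user_profile.items()}
    positive = {f: c for f, c in contributions.items() if c > 0}
    if not positive:
        return None
    best = max(positive, key=positive.get)
    return f"Recommended because you often mention {best}, which reviewers rate positively."


# Illustrative records (made-up data).
records: List[Record] = [
    ("u1", "phoneA", "screen", +1),
    ("u1", "phoneB", "screen", -1),
    ("u1", "phoneA", "battery", +1),
    ("u2", "phoneA", "design", +1),
    ("u2", "phoneB", "screen", -1),
]
users, items = user_profiles(records), item_profiles(records)
print(explain(users["u1"], items["phoneA"]))
# prints: Recommended because you often mention screen, which reviewers rate positively.
```

Here user u1 mentions the screen twice and the battery once, so the explanation for phoneA cites the screen; for phoneB, whose screen is rated negatively, `explain` returns `None`. A linear attention-times-quality score is used only because it keeps each feature's contribution, and hence the explanation, easy to read off.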
In this work, we present the promising potential that textual reviews bring to recommender systems, report our current research progress on related topics, and pose some future research directions.

3. RESEARCH TOPICS

3.1 Cold-Start Recommendation

In CF-based recommendation algorithms, one of the most fundamental causes of cold-start is the absence of purchase or rating information for new users or items. Although various CF techniques use carefully designed algorithms to estimate user preferences from a small number of ratings [13, 36], their performance remains limited because, fundamentally, we know little about such a user.
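To make this limitation concrete, consider a regularized latent factor model (an assumption for illustration; this section does not commit to a specific model) in which the item factor vectors q_i are held fixed and a new user u has rated a single item i with rating r_ui. Estimating the user's factor vector p_u by ridge regression with regularization weight λ, setting the gradient to zero gives (q_i q_iᵀ + λI) p_u = r_ui q_i, and since (q_i q_iᵀ + λI) q_i = (‖q_i‖² + λ) q_i, the solution and the resulting prediction for any other item j are:

```latex
\mathbf{p}_u
  = \arg\min_{\mathbf{p}} \; \bigl(r_{ui} - \mathbf{p}^{\top}\mathbf{q}_i\bigr)^2 + \lambda \lVert \mathbf{p} \rVert^2
  = r_{ui}\,\bigl(\mathbf{q}_i\mathbf{q}_i^{\top} + \lambda \mathbf{I}\bigr)^{-1}\mathbf{q}_i
  = \frac{r_{ui}}{\lVert \mathbf{q}_i \rVert^2 + \lambda}\,\mathbf{q}_i ,
\qquad
\hat{r}_{uj} = \mathbf{p}_u^{\top}\mathbf{q}_j
  = r_{ui}\,\frac{\mathbf{q}_i^{\top}\mathbf{q}_j}{\lVert \mathbf{q}_i \rVert^2 + \lambda} .
```

However many latent dimensions the model has, p_u stays on the line spanned by q_i, and every prediction is the single observed rating rescaled by the similarity between item j and the one rated item. This is one concrete sense in which a handful of ratings reveals little about a user.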

3.2 Recommendation Explanation

An important problem of traditional CF-based recommendation algorithms in real-world applications is the difficulty of explaining their results. This is partly because we do not know how a user combined his/her opinions on many aspects into a single, simple numerical rating, and partly because CF algorithms (especially those based on matrix factorization techniques) only estimate these ratings in a latent (unobserved) factor space. These Latent Factor Models (LFM) make the recommendations even harder to explain, even when they achieve satisfactory rating prediction accuracy [29]. However, textual user reviews, as discussed in the previous sections, provide a new information resource for understanding user preferences and specific needs. By extracting the frequently mentioned product features from a user's historical reviews, we can learn which product aspects he/she cares about. Different users may care about different product features when making purchase decisions. For example, one user may choose a mobile phone for its large screen and good graphics performance, while another makes the same choice because of its attractive design, even though both may well give the same numerical rating of five stars. In many similar cases, numerical ratings are insufficient to distinguish the preferences of different users, but textual reviews tell us why a user made such a choice. Preliminary studies on this research topic have been published in our paper [32], which attempts to improve rating prediction accuracy and at the same time construct intuitive recommendation explanations. We will further investigate explanations constructed from textual reviews by considering different explanation forms such as product tags and word clouds, as well as the scrutability, effective

1 http://www.yelp.com/dataset_challenge
