Enhancing Accuracy of Recommender System through Adaptive Similarity Measures Based on Hybrid Features

Deepa Anand and Kamal K. Bharadwaj

School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi, India
[email protected], [email protected]

Abstract. Collaborative Filtering techniques offer recommendations to users by leveraging the preferences of like-minded users. They thus rely heavily on similarity measures to determine proximity between users. However, most of the previously proposed similarity measures are heuristic-based and are not guaranteed to work well under all data environments. We propose a method employing a genetic algorithm to learn user similarity based on comparison of individual hybrid user features. The user similarity is determined for each feature by learning a feature similarity function. The rating for each item is then predicted as an aggregate of estimates garnered from predictors based on each attribute. Our method differs from previous attempts at learning similarity, as the features considered for comparison take into account not only user preferences but also the item contents and user demographic data. The proposed method is shown to outperform existing filtering methods based on user-defined similarity measures.

Keywords: Recommender Systems, Collaborative Filtering, Learning Similarity, Hybrid User Features.
1 Introduction

The need to prune and filter large information spaces and to provide personalized services has led to the emergence of recommender systems (RS). Collaborative Filtering (CF) [1][6] is a recommendation technique that emulates the process of word-of-mouth, where people glean opinions about objects they have not experienced themselves from like-minded friends and acquaintances. CF algorithms offer the advantage of cross-genre recommendations because they do not depend on content descriptions. Breese et al. [5] categorized CF algorithms into memory-based algorithms, which make predictions based on the entire collection of previously rated items, and model-based algorithms, which depend on building a user model for making predictions. Memory-based algorithms are popular due to their ability to incorporate up-to-date preference information when recommending items. The performance of a memory-based CF algorithm depends on the construction of a reliable set of neighbors who contribute towards predicting a user's measure of interest in an object. The estimation of the degree of concurrence between users is thus a crucial step in
the filtering process. There have been several attempts [5][7][12][15] at capturing the elusive notion of user similarity based on a set of commonly preferred items. However, most such measures are defined manually and their performance is often dataset dependent. Since the measures of similarity are used as weights in determining the extent to which a user contributes to the prediction, the idea of learning such weights seems promising, since the weights learnt are adaptive and dataset dependent, and should result in optimal or near-optimal performance.

Machine learning techniques have been employed to learn similarities between entities in different domains such as content-based image retrieval [14] and case-based reasoning systems [16]. Cheung and Tian [9] propose to learn the optimal weights representing user similarities as well as the user bias, which removes the subjectivity from the ratings, in order to minimize a criterion function. Using gradient descent, the weights and user biases are updated and the process iterates till it converges. A bidirectional similarity metric computation technique [8] learns the similarity between users and items simultaneously by employing matrix factorization. A model-based technique [4] derives weights for all user-user/item-item coefficients with the objective of minimizing a quadratic cost function. A related work [11] applies optimization techniques to learn significance weights for items for clustering users according to their preferences.

We propose a different approach to measuring the closeness between the tastes that two users share, by learning their similarity based on comparison of hybrid user attributes. The hybrid user features [2] allow us to construct a user profile by examining the contents of the items that the user has preferred, thus supporting "collaboration via content" [13]. Such content-based features are then pooled with demographic user data to give a compact and hybrid set of features. The use of hybrid features, instead of user ratings, offers the twin advantages of a compact user profile representation and the factoring in of preference, content-description and user demographic information in the similarity estimation. Each attribute is treated as an independent predictor, thus allowing the feature to influence the estimated utility of an item for the active user. A distance-to-similarity function or a similarity table is learnt for each attribute, depending on whether the attribute is numeric or symbolic, respectively. The predictions from each of the feature-level predictors are aggregated to give the final prediction. The experimental results support our ideas and demonstrate that the proposed method is superior to several predefined similarity measures.

The rest of the paper is organized as follows: Section 2 provides an overview of the different predefined similarity measures. A method for learning feature-level similarities is introduced in Section 3. Section 4 presents an experimental evaluation of the proposed scheme and compares it with several user-defined similarity measures. Finally, Section 5 presents conclusions and points out some directions for future research.
2 Similarity Measures

There have been several measures of similarity which gauge how closely the opinions of a user pair match. Such measures generally rely on computing the degree of agreement based on the set of items co-rated by the users.
Pearson and Cosine similarity measures are two of the most popular among them. The cosine similarity measure is defined as follows:

sim(x, y) = \frac{\sum_{i \in S_{xy}} r_{x,i} \, r_{y,i}}{\sqrt{\sum_{i \in S_{xy}} r_{x,i}^2} \sqrt{\sum_{i \in S_{xy}} r_{y,i}^2}} ,    (1)

where S_{xy} is the set of items which users x and y have co-rated and \bar{r}_x is the mean rating of user x. The Pearson correlation coefficient is defined as

sim(x, y) = \frac{\sum_{i \in S_{xy}} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i \in S_{xy}} (r_{x,i} - \bar{r}_x)^2} \sqrt{\sum_{i \in S_{xy}} (r_{y,i} - \bar{r}_y)^2}} .    (2)
Jaccard is another similarity measure, which computes the similarity between users based on the number of items co-rated by them, regardless of the ratings conferred on the items. Candillier et al. [7] introduce several weighted similarity measures for user-based and item-based CF. The proposed method uses Jaccard similarity as a weighting scheme and combines it with other similarity measures, such as the Pearson correlation coefficient, to emphasize the similarity of users who share appreciation of several items with the active user.
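As an illustration only (not the authors' implementation), these measures can be sketched in Python over ratings stored as per-user dictionaries; the function names and data layout are our own, and the user means in the Pearson computation are taken over the co-rated items, one common convention:

import math

# ratings: dict mapping user id -> dict mapping item id -> rating value
def co_rated(ratings, x, y):
    """Items rated by both users x and y (the set S_xy)."""
    return set(ratings[x]) & set(ratings[y])

def cosine_sim(ratings, x, y):
    """Cosine similarity over co-rated items, Eq. (1)."""
    items = co_rated(ratings, x, y)
    if not items:
        return 0.0
    num = sum(ratings[x][i] * ratings[y][i] for i in items)
    den = math.sqrt(sum(ratings[x][i] ** 2 for i in items)) * \
          math.sqrt(sum(ratings[y][i] ** 2 for i in items))
    return num / den if den else 0.0

def pearson_sim(ratings, x, y):
    """Pearson correlation over co-rated items, Eq. (2)."""
    items = co_rated(ratings, x, y)
    if len(items) < 2:
        return 0.0
    mean_x = sum(ratings[x][i] for i in items) / len(items)
    mean_y = sum(ratings[y][i] for i in items) / len(items)
    num = sum((ratings[x][i] - mean_x) * (ratings[y][i] - mean_y) for i in items)
    den = math.sqrt(sum((ratings[x][i] - mean_x) ** 2 for i in items)) * \
          math.sqrt(sum((ratings[y][i] - mean_y) ** 2 for i in items))
    return num / den if den else 0.0

def jaccard_sim(ratings, x, y):
    """Jaccard similarity: overlap of rated items, ignoring rating values."""
    union = set(ratings[x]) | set(ratings[y])
    return len(co_rated(ratings, x, y)) / len(union) if union else 0.0

def weighted_pearson_sim(ratings, x, y):
    """WPCC-style weighting: Pearson emphasised by the Jaccard overlap [7]."""
    return jaccard_sim(ratings, x, y) * pearson_sim(ratings, x, y)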
3 Learning User Similarity

Similarity estimation is one of the important steps in the CF process. In this work, we propose to employ genetic algorithms to estimate the similarity between users, based on the comparison of various hybrid features, for the movie recommendation domain. In the following subsections we briefly discuss the construction of a user profile based on hybrid features, the technique of learning the similarity for each feature, and the proposed recommendation framework.

3.1 Hybrid User Features

To assess the similarity between users we intend to use hybrid user attributes as proposed in [2]. In the movie domain, each movie can belong to one or more genres; in particular, MovieLens defines 18 movie genres. The inclination of a user towards a particular genre can be evaluated by examining the set of movies belonging to that genre which the user has rated highly. The degree to which a user prefers a genre is captured through the Genre Interestingness Measure (GIM) [2]. The GIM values corresponding to the 18 genres are augmented with the available demographic attributes, namely user age, occupation, gender and state, to give a set of 22 hybrid features.
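The exact GIM formula of [2] is not reproduced in this paper; purely as an illustrative placeholder, the sketch below scores each genre by the fraction of a user's highly rated movies that belong to it and appends the demographic fields. The rating threshold, the simplified genre score and all identifiers are our assumptions, not the published definition:

GENRES = 18          # MovieLens genre flags per movie
LIKE_THRESHOLD = 4   # ratings of 4 or 5 treated as "highly rated" (assumption)

def hybrid_profile(user_ratings, movie_genres, demographics):
    """Build a 22-feature hybrid profile: 18 genre-interest scores plus age,
    occupation, gender and state.

    user_ratings : dict item_id -> rating for one user
    movie_genres : dict item_id -> list of genre indices (0..17)
    demographics : dict with keys 'age', 'occupation', 'gender', 'state'
    """
    liked = [i for i, r in user_ratings.items() if r >= LIKE_THRESHOLD]
    counts = [0.0] * GENRES
    for item in liked:
        for g in movie_genres.get(item, []):
            counts[g] += 1.0
    # Simple proxy for genre interest, not the GIM of [2]
    profile = {"GIM_%d" % (g + 1): (counts[g] / len(liked) if liked else 0.0)
               for g in range(GENRES)}
    profile.update({
        "age": demographics["age"],                # numeric attribute
        "occupation": demographics["occupation"],  # symbolic attribute
        "gender": demographics["gender"],          # symbolic attribute
        "state": demographics["state"],            # symbolic attribute
    })
    return profile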
Hence the user profile, which initially consisted of ratings for a large number of movies, is compressed into a compact user profile that incorporates content-based and demographic features as well.

3.2 Learning Similarity Function by Feature Comparison

Traditionally defined similarity measures, such as Pearson and cosine, treat all attributes (item ratings) alike while trying to evaluate user closeness. This may not truly reflect the contribution of each attribute towards the similarity. For example, the attribute GIM_Comedy reveals the degree of interest of a user in the genre "Comedy". It is possible that predictions using this attribute alone give the best performance when all users who have shown interest in the genre are weighted equally, rather than weighting them by their degree of interest in the genre; i.e. the fact that the user "has liked" the genre is more important than "how much" he has liked it. For another genre, say romance, a small difference in the GIM values may imply a large reduction in the actual similarity between two users. Moreover, traditional similarity measures cannot be easily extended when symbolic attributes such as occupation, gender, state, etc. are involved. When symbolic attributes are involved, two users are considered similar only if they have the same value for that attribute, leading to a coarse-grained approach to similarity computation. In real life, however, a person with occupation "teacher" and one with occupation "student" may be quite similar in their tastes.

To overcome these shortcomings we adopt a different approach: we view each feature as a means to compute predictions for the user, learn an optimal similarity function for each attribute, and aggregate the predictions so obtained into the final estimated prediction. The feature set consists of numeric attributes (GIM_1, ..., GIM_18, Age) and symbolic attributes (Occupation, State, Gender). A precise representation of the similarity function depends on the data type. We follow the similarity representation from [16]: tables for symbolic attributes (occupation, state) and vectors for numeric attributes (GIMs, age). For the gender feature, two values are similar only if they are identical, and hence no similarity vector/table needs to be learnt. Note that the definitions of the similarity table as well as the distance-based similarity function are slightly altered so that the similarity values lie in the range [-1, 1]. This is to deter users whose values for the particular attribute are far apart, in terms of preferred items, from contributing to each other's prediction.
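To make these representations concrete, a minimal sketch of evaluating a learnt similarity table (symbolic attributes) and a learnt distance-based similarity vector (numeric attributes), both constrained to [-1, 1]; the data structures, the binning of differences and the -1 value for mismatched genders are our own assumptions:

def table_similarity(table, value_index, a, b):
    """Symbolic attributes (e.g. occupation, state): look up the learnt n x n
    similarity table; each table[i][j] lies in [-1, 1]."""
    return table[value_index[a]][value_index[b]]

def vector_similarity(vector, sample_points, x, y):
    """Numeric attributes (e.g. GIMs, age): map the absolute difference |x - y|
    onto the nearest sampled distance and return the learnt similarity value.
    The similarity vector is non-increasing, so larger differences never yield
    higher similarity."""
    diff = abs(x - y)
    for idx, point in enumerate(sample_points):
        if diff <= point:          # first sampling point covering this difference
            return vector[idx]
    return vector[-1]              # differences beyond the last sample point

def gender_similarity(a, b):
    """Gender: no table/vector is learnt; identical values count as similar
    (returning -1.0 for a mismatch is our assumption)."""
    return 1.0 if a == b else -1.0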
3.3 Learning Similarity Function by Using Genetic Algorithms

Genetic algorithms base their operation on the Darwinian principle of "survival of the fittest" and utilize artificial evolution to obtain enhanced solutions with each iteration. The GA process starts with a population of candidate solutions known as chromosomes or genotypes. Each chromosome in the population has an associated fitness, and these scores are used in a competition to determine which chromosomes are used to form new ones [10]. New individuals are created using the genetics-inspired operators of crossover and mutation. While crossover allows the creation of two new individuals by letting two parent chromosomes exchange meaningful information, mutation is used to maintain the genetic diversity of the population by introducing a completely new member into the population. The process iterates over several generations till a convergence criterion is met. The task of learning an optimal similarity function for each of the attributes can be accomplished by means of genetic algorithms.

Chromosome Representation. An individual I representing a similarity table, used for the similarity function of a symbolic attribute, is a matrix of floating-point numbers in the range [-1, 1] of size n x n, where n is the number of distinct values taken by the attribute and I(a, b) represents the similarity between attribute values 'a' and 'b'. The similarity function for a numeric attribute can be approximated by a similarity vector, which provides similarity values corresponding to a fixed number of distance values. The sampling points are chosen for each attribute by using "dynamic sampling", where an optimal distribution of sampling points is chosen from an interval depending on the number of difference values that fall in that interval, gauged from the training data.

Fitness Function. A fitness function quantifies the optimality of a chromosome and guides the process towards its optimization goal by allowing fitter individuals to breed, thus hopefully improving the quality of individuals over the generations. The fitness of a similarity function represented as a table or a vector needs to be measured by the prediction accuracy offered by the selection and weighting of users according to that similarity function. To evaluate the fitness of chromosomes representing similarity functions, the training data for each user is divided into training and validation sets. The training data is utilized to build the user neighborhood and compute predictions, whereas the validation set is used to learn the optimal similarity function by allowing the GA to search for the best similarity table/vector which leads to the least average prediction error on the validation set. The fitness of a similarity vector/table for an attribute A is obtained by computing the average prediction error on the validation set, where the prediction is performed by constructing the neighborhood for the active user based solely on the similarity of attribute A. The fitness function for an individual I based on attribute A is defined as
fitness_I^A = \frac{1}{|V|} \sum_{i \in V} | r_{a,i} - pr_{a,i}^{I_A} | ,    (3)

where V is the set of all ratings in the validation set, r_{a,i} is the actual rating for item i by the user a, and pr_{a,i}^{I_A} is the predicted score for the active user using the similarity vector/table represented by the individual I based on attribute A.
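A minimal sketch of how such a fitness evaluation might be implemented, assuming a learnt per-attribute similarity function sim_fn (a table or vector as above), a Resnick-style neighborhood predictor and our own helper names; this is illustrative, not the authors' code:

def predict_for_attribute(active, item, profiles, train_ratings, sim_fn, attr, k=30):
    """Predict a rating for (active, item) using neighbours selected and weighted
    solely by the learnt similarity of attribute `attr`. The neighbourhood size k
    and the restriction to positive similarities are assumptions."""
    mean = lambda u: sum(train_ratings[u].values()) / len(train_ratings[u])
    candidates = [(sim_fn(profiles[active][attr], profiles[u][attr]), u)
                  for u in train_ratings
                  if u != active and item in train_ratings[u]]
    neighbours = sorted(candidates, reverse=True)[:k]
    num = sum(s * (train_ratings[u][item] - mean(u)) for s, u in neighbours if s > 0)
    den = sum(abs(s) for s, u in neighbours if s > 0)
    return mean(active) + num / den if den else mean(active)

def fitness(individual, attr, validation, profiles, train_ratings, make_sim_fn):
    """Average absolute prediction error on the validation set (Eq. 3); lower is
    better, so the GA would minimise this value (or maximise its negation)."""
    sim_fn = make_sim_fn(individual)  # wraps a similarity table or vector
    errors = [abs(r - predict_for_attribute(a, i, profiles, train_ratings, sim_fn, attr))
              for (a, i, r) in validation]
    return sum(errors) / len(errors)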
Genetic Operators. To maintain the genetic diversity of the population across generations, it is desirable to generate new individuals from the ones in the current generation. Crossover and mutation are the two most common genetic transformations. Crossover works by letting a pair of chromosomes exchange meaningful information to create two offspring, while mutation involves a random manipulation of a single chromosome to create a new individual. The discussion on genetic operators follows from [16].
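Purely as an illustration of how such operators might look for the similarity vectors used in this work (the variants actually adopted are listed in the next paragraph), a sketch of arithmetic crossover, simple one-point crossover and random-component mutation; the repair step that restores the non-increasing constraint is our own assumption:

import random

def repair(vector):
    """Clamp to [-1, 1] and enforce the non-increasing constraint on similarity
    vectors (this cascading clamp is our assumed repair strategy)."""
    repaired, prev = [], 1.0
    for v in vector:
        v = max(-1.0, min(prev, v))
        repaired.append(v)
        prev = v
    return repaired

def arithmetic_crossover(parent_a, parent_b):
    """Blend two similarity vectors element-wise; lam is drawn once per mating."""
    lam = random.random()
    child1 = [lam * a + (1 - lam) * b for a, b in zip(parent_a, parent_b)]
    child2 = [(1 - lam) * a + lam * b for a, b in zip(parent_a, parent_b)]
    return repair(child1), repair(child2)

def simple_crossover(parent_a, parent_b):
    """One-point crossover: swap the tails of the two vectors."""
    point = random.randint(1, len(parent_a) - 1)
    return repair(parent_a[:point] + parent_b[point:]), \
           repair(parent_b[:point] + parent_a[point:])

def mutate(vector):
    """Randomly perturb a random number of components within [-1, 1]."""
    child = list(vector)
    for _ in range(random.randint(1, len(child))):
        pos = random.randrange(len(child))
        child[pos] = random.uniform(-1.0, 1.0)
    return repair(child)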
In our framework, crossover for similarity vectors is done using either simple crossover or arithmetic crossover, with equal probability of choosing between the two methods. For similarity matrices, arithmetic crossover, row crossover and column crossover are performed, with each method having an equal probability of selection. Components of the vector/matrix are mutated by modifying their values randomly, and the number of components thus modified is also random. Note that each time a new individual of type similarity vector is created, the constraint that its values be non-increasing must hold.

3.4 Proposed Recommendation Framework

The proposed technique of learning similarity at the attribute level can be employed to obtain predictions for an active user at the feature level and to aggregate them to arrive at the final predicted vote for the active user. The dataset is divided into a training set, TR, a validation set, V, and a test set, T. The main steps of the proposed recommender system framework are given below:

Step 1: Compute the GIM values of all users, using the GIM formula of [2], based on the training data set TR.
Step 2: Find the optimal similarity vector/table for each attribute.
Step 3: Predict ratings based on the feature-level similarity functions.

The predicted rating for an item i for the active user u is based on Resnick's prediction formula [15]. The final prediction is obtained by aggregating the predictions based on all attributes. Note that some attributes might not contribute to the prediction, since the neighborhood set based on them might be empty.
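A hedged sketch of Step 3: each attribute produces a prediction via the standard Resnick formula, and the feature-level predictions are then aggregated, here by a simple average over attributes with non-empty neighborhoods (the averaging operator, the neighborhood size k and all identifiers are our assumptions rather than the paper's exact specification):

def resnick_predict(active, item, train_ratings, weights, k=30):
    """Resnick's formula: mean rating of the active user plus the similarity-weighted
    deviation of neighbours' ratings from their own means. `weights` maps other users
    (all assumed present in train_ratings) to similarities in [-1, 1]; only positively
    similar raters of `item` contribute."""
    mean = lambda u: sum(train_ratings[u].values()) / len(train_ratings[u])
    neighbours = sorted(((w, u) for u, w in weights.items()
                         if w > 0 and item in train_ratings[u]), reverse=True)[:k]
    if not neighbours:
        return None  # empty neighborhood: this attribute contributes nothing
    num = sum(w * (train_ratings[u][item] - mean(u)) for w, u in neighbours)
    den = sum(abs(w) for w, u in neighbours)
    return mean(active) + num / den

def predict_final(active, item, train_ratings, attr_weights):
    """Aggregate the feature-level predictions; attr_weights maps each attribute to
    the per-user similarities produced by its learnt similarity function."""
    preds = [resnick_predict(active, item, train_ratings, w)
             for w in attr_weights.values()]
    preds = [p for p in preds if p is not None]
    return sum(preds) / len(preds) if preds else None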
4 Experimentation and Results

To demonstrate the effectiveness of the proposed technique of learning user similarities employing a GA, we conducted experiments on the popular MovieLens dataset. The experiments were conducted with the goal of establishing the superiority of the proposed similarity learning technique over predefined similarity measures.

4.1 Design of Experiments

The MovieLens dataset consists of 100,000 ratings provided by 943 users on 1682 movies. The ratings are discrete, on a scale of 1-5, ranging from 1 ("bad") to 5 ("excellent"). Each user in the dataset has rated at least 20 movies. For our experiments we chose five subsets of the data, containing 100, 200, 300, 400 and 500 users, called ML100, ML200, ML300, ML400 and ML500 respectively. This is to illustrate the effectiveness of the proposed scheme under a varying number of participating users. Each of the datasets was randomly split into 60% training data, 20% validation data and 20% test data. The ratings of the items in the test set are treated as items unseen by the active user, while the ratings in the training set are used for neighborhood construction and for prediction of ratings.
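The random 60/20/20 split described above might be realised as in the following small sketch; splitting at the level of (user, item, rating) tuples, rather than per user, is our assumption:

import random

def split_ratings(rating_tuples, seed=0):
    """Randomly split a dataset of (user, item, rating) tuples into
    60% training, 20% validation and 20% test data."""
    rng = random.Random(seed)
    shuffled = list(rating_tuples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    train = shuffled[: int(0.6 * n)]
    validation = shuffled[int(0.6 * n): int(0.8 * n)]
    test = shuffled[int(0.8 * n):]
    return train, validation, test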
Fig. 1. MAE for ML400 over 30 runs
The ratings in the validation set are used to guide the GA learning process. For each dataset the experiment was run 30 times to eliminate the effect of any bias in the data. The effectiveness of the proposed scheme is compared with the Pearson correlation coefficient (PCC) (Eq. 2), cosine similarity (COS) (Eq. 1) and Weighted Pearson (WPCC) [7].

4.2 Performance Measurements

To compare prediction accuracy we evaluate the various schemes via two metrics, namely Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). While MAE measures the average absolute deviation of the predicted ratings from the actual ratings, RMSE uses a quadratic scoring rule to emphasize large errors. When preferences are binary, i.e. when the task at hand is to guess whether a user will or won't like an item, classification metrics such as precision and recall are used to evaluate the performance of a recommendation algorithm. Precision estimates the proportion of useful recommendations among all items recommended to the user, and recall represents the fraction of useful items selected from among the actually useful items. In addition, the F-Measure is a classification accuracy measure which allows us to consider both recall and precision together by computing their harmonic mean.

4.3 Results

To demonstrate the ability of the proposed method to offer better prediction accuracy we compare its MAE and RMSE with those of PCC, COS and WPCC. The results are presented in Table 1. The MAE and RMSE are computed as the average over 30 runs of the experiment on the different datasets. A lower value of MAE and RMSE corresponds to better performance.
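For reference, the metrics described in Section 4.2 can be computed as in the following illustrative sketch; treating ratings of 4 and above as "useful" for the classification metrics is our assumption, not a threshold stated in the paper:

import math

def mae(actual, predicted):
    """Mean Absolute Error over paired lists of actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error; squaring each deviation emphasizes large errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def precision_recall_f1(actual, predicted, like=4):
    """Binary classification view: an item is 'useful' if its rating >= like."""
    recommended = {i for i, p in enumerate(predicted) if p >= like}
    useful = {i for i, a in enumerate(actual) if a >= like}
    hits = len(recommended & useful)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(useful) if useful else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1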
Table 1. MAE and RMSE comparison of proposed scheme with PCC, WPCC and COS

Dataset   Metric   PCC     WPCC    COS     Proposed
ML100     MAE      0.856   0.827   0.856   0.770
          RMSE     1.07    0.991   1.164   0.924
ML200     MAE      0.862   0.832   0.891   0.803
          RMSE     1.161   1.072   1.309   1.022
ML300     MAE      0.847   0.815   0.866   0.776
          RMSE     1.077   0.988   1.216   0.93
ML400     MAE      0.827   0.788   0.848   0.761
          RMSE     1.101   1.000   1.191   0.93
ML500     MAE      0.831   0.795   0.864   0.760
          RMSE     1.114   1.017   1.259   0.939
Table 2. Comparison of precision, recall and F-Measure of proposed scheme with PCC, WPCC and COS

Dataset   Metric      PCC    WPCC   COS    Proposed
ML100     Precision   63.2   64.6   65.3   68.3
          Recall      78.7   80.5   82.7   86.1
          F-Measure   67.6   68.8   69.8   73.1
ML200     Precision   62.7   63.2   62.6   62.7
          Recall      79.9   82.1   83.1   84.7
          F-Measure   67.7   68.9   68.4   69.5
ML300     Precision   62     62.5   62.1   62.5
          Recall      80.4   82.5   84.1   86.1
          F-Measure   67.6   68.7   68.5   70
ML400     Precision   61.9   62.6   62.4   62.8
          Recall      79.4   82     83.1   85.3
          F-Measure   66.9   68.3   68     69.6
ML500     Precision   61.2   61.6   61.6   61.9
          Recall      79.2   81.4   81.8   85.6
          F-Measure   66.7   67.7   67.1   69.5
As is clear from the results in Table 1, the proposed scheme considerably outperforms the other user-defined similarity measures for all datasets with respect to both MAE and RMSE. This is due to the ability of the proposed technique to adapt the user similarity computation to the dataset. Table 2 presents the performance comparison (in percentage) based on classification accuracy, comparing the precision, recall and F-Measure of each of the different techniques. A higher value of these measures implies better performance. The proposed scheme again outperforms the user-defined similarity measures in terms of precision, recall and F-Measure in almost all cases, the only exception being ML200, where WPCC has a higher precision. The MAE and F-Measure for the different runs of the experiment on ML400 are shown in Figs. 1 and 2, respectively. A total of 30 runs were made for each dataset. For all runs the proposed method performed better than any of the user-defined measures in terms of predictive as well as classification accuracy.
Fig. 2. F-Measure for ML400 over 30 runs
5 Conclusions

In this work we introduced a novel technique for evolving user similarity functions based on a set of hybrid user features. Evolving an individual similarity function for each hybrid feature allows each feature to influence rating prediction independently, thus allowing the integration of predictions based on features whose types or ranges are vastly different. To evaluate our approach we tested it on the highly popular MovieLens dataset. The experiments establish the superiority of our method of learning user similarities over popular methods based on predefined similarity measures. Though the use of a GA increases the time complexity of the proposed method, the learning process can be performed offline in a periodic manner, to adapt to changes in the data over time.

In future work we plan to integrate the current approach of learning feature-wise similarity functions with learning user-wise attribute weights for each of the hybrid attributes, thus quantifying the degree of importance of each feature for the active user. The current framework is specific to the movie domain, and it would be interesting to explore the feasibility of extending it to other domains, e.g. books, jokes, etc. Another important direction for future work would be to incorporate the concepts of trust and reputation [3] to enhance recommendation accuracy.
References

1. Adomavicius, G., Tuzhilin, A.: Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734–749 (2005)
2. Al-Shamri, M.Y.H., Bharadwaj, K.K.: Fuzzy-Genetic Approach to Recommender System Based on a Novel Hybrid User Model. Expert Systems with Applications 35(3), 1386–1399 (2008)
3. Bharadwaj, K.K., Al-Shamri, M.Y.H.: Fuzzy Computational Models for Trust and Reputation Systems. Electronic Commerce Research and Applications 8(1), 37–47 (2009)
4. Bell, R.M., Koren, Y., Volinsky, C.: Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems. In: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 95–108. ACM, New York (2007)
5. Breese, J.S., Heckerman, D., Kadie, C.: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In: 14th Annual Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann, San Francisco (1998)
6. Burke, R.: Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction 12, 331–370 (2002)
7. Candillier, L., Meyer, F., Fessant, F.: Designing Specific Weighted Similarity Measures to Improve Collaborative Filtering Systems. In: Perner, P. (ed.) ICDM 2008. LNCS (LNAI), vol. 5077, pp. 242–255. Springer, Heidelberg (2008)
8. Cao, B., Sun, J., Wu, J., Yang, Q., Chen, Z.: Learning Bidirectional Similarity for Collaborative Filtering. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part I. LNCS (LNAI), vol. 5211, pp. 178–194. Springer, Heidelberg (2008)
9. Cheung, K., Tian, L.F.: Learning User Similarity and Rating Style for Collaborative Recommendation. Information Retrieval 7(3-4), 395–410 (2004)
10. De Jong, K.A.: Learning with Genetic Algorithms: An Overview. Machine Learning 3(2-3), 121–138 (1988)
11. Jin, R., Chai, J.Y., Si, L.: An Automatic Weighting Scheme for Collaborative Filtering. In: 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 337–344. ACM, New York (2004)
12. Ma, H., King, I., Lyu, M.R.: Effective Missing Data Prediction for Collaborative Filtering. In: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39–46. ACM, New York (2007)
13. Pazzani, M.J.: A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review 13(5-6), 393–408 (1999)
14. Torres, R.S., Falcão, A.X., Zhang, B., Fan, W., Fox, E.A., Gonçalves, M.A., Calado, P.: A New Framework to Combine Descriptors for Content-Based Image Retrieval. In: 14th ACM International Conference on Information and Knowledge Management, pp. 335–336. ACM, New York (2005)
15. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In: ACM CSCW 1994 Conference on Computer-Supported Cooperative Work, pp. 175–186. ACM, New York (1994)
16. Stahl, A., Gabel, T.: Using Evolution Programs to Learn Local Similarity Measures. In: Ashley, K.D., Bridge, D.G. (eds.) ICCBR 2003. LNCS, vol. 2689, pp. 537–551. Springer, Heidelberg (2003)