
A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation

Zhimin Chen, Yi Jiang, Yao Zhao
Institute of Information Engineering, Yangzhou University, Yangzhou, China
e-mail: {zmchen, jiangyi, zhaoyao}@yzu.edu.cn
doi: 10.4156/jdcta.vol4.issue9.13

Abstract

Collaborative filtering is one of the most successful techniques used in personalized recommendation systems. However, traditional algorithms focus only on user ratings and consider neither changes in user interest nor the credibility of the rating data, which seriously degrades recommendation quality. To address this problem, this paper presents an improved algorithm. First, during user similarity measurement, each user’s ratings are weighted by a gradual time decay and by a credit assessment. Then the users most similar to the active user are selected as his neighbors. Finally, the active user’s preference for an item is predicted from the average scores of his neighbors. Experimental results show that the algorithm identifies neighbors more accurately and effectively improves recommendation quality.

Keywords: Collaborative Filtering, Similarity Measure, Time Weight, Trust Evaluation

1. Introduction

To address the problem of information overload on the Internet, various recommender systems have been developed to provide personalized services at many large electronic commerce sites. They recommend goods ranked by sales, user interests and user ratings. Currently, collaborative filtering (CF) is one of the most successful technologies for personalized recommendation [1, 2]. The typical algorithm is the user-based nearest neighbor algorithm, which was first used by Goldberg in building the mail filtering system Tapestry [3, 4]. The main idea rests on the assumption that similar users have similar preferences. User similarity is computed from the user ratings to find the neighbors who share the active user’s interests. The active user’s preference for an item is then predicted by combining the neighbors’ scores for the same item. Finally, the top-N items that the active user will most probably like are provided.

However, similarity measurement, the most critical step, considers only the rating scores in traditional collaborative filtering and ignores the shift of user interest over time. Treating user scores from different times equally causes the recommended results to depart from the user’s current information needs. In addition, because similarity is measured from user ratings, their reliability also affects recommendation quality. Untruthful score data produce an inaccurate neighbor set, which reduces prediction accuracy and recommendation performance. Therefore, this paper proposes a new collaborative filtering algorithm based on time weighting and credit evaluation. It reflects changes of user interest in a timely manner and measures the trust degree of user ratings, which effectively improves the performance of the recommendation system.

2. User-based collaborative filtering

2.1. Algorithm description

In a user-based collaborative filtering recommendation system, the user rating data are usually described as a user-item rating matrix $R_{m \times n}$, in which m is the number of users, n is the number of items, and $R_{i,j}$ is the score of item j rated by user i, indicating the user’s degree of preference for the item.


International Journal of Digital Content Technology and its Applications Volume 4, Number 9, December 2010

The most important step in user-based CF is the search for the target user’s neighbors. Usually, similarity computed over the commonly rated items is used to measure how close the interests of two users are. There are three main methods: cosine similarity, Pearson correlation coefficient similarity and modified cosine similarity. Many experiments show that the Pearson correlation coefficient (PCC) represents the similarity of users or items better than the other methods [5, 6, 7]. It is therefore used in this paper and is defined as follows:

$$sim(u, v) = \frac{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (R_{v,i} - \bar{R}_v)^2}} \qquad (1)$$

where $sim(u, v)$ represents the similarity between user u and user v, $I_{uv} = I(u) \cap I(v)$ is the set of items rated by both user u and user v, and $R_{u,i}$ and $R_{v,i}$ are the scores of item i rated by users u and v respectively. $\bar{R}_u$ and $\bar{R}_v$ are the average scores of users u and v over their rated items. Finally, let $N_t$ denote the neighbor set of the current user t. His rating for item i can be predicted as follows:

$$P_{user\_based}(t, i) = \bar{U}_t + \frac{\sum_{j \in N_t} (R_{j,i} - \bar{R}_j) \cdot sim(t, j)}{\sum_{j \in N_t} sim(t, j)} \qquad (2)$$

where $\bar{U}_t$ is the average score of user t over his rated items, $R_{j,i}$ is the score of item i rated by neighbor j, $\bar{R}_j$ is the average score of neighbor j over his rated items, $sim(t, j)$ is the similarity between user t and his neighbor j, and $r_{u,j}$ denotes the score of item j rated by user u.
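Formulas (1) and (2) can be sketched in Python as follows. This is a minimal illustration rather than the authors’ implementation: the dict-of-dicts layout `ratings[user][item]`, the function names, and the fallbacks to 0 for an empty overlap or zero variance are all assumptions made here.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty collection."""
    values = list(values)
    return sum(values) / len(values)


def pearson_sim(ratings, u, v):
    """Formula (1): PCC over the items rated by both u and v."""
    common = set(ratings[u]) & set(ratings[v])      # I_uv
    if not common:
        return 0.0                                  # assumption: no overlap -> 0
    mu_u = mean(ratings[u].values())                # average over all items rated by u
    mu_v = mean(ratings[v].values())
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0                                  # assumption: flat ratings -> 0
    return num / (den_u * den_v)


def predict(ratings, t, item, neighbors):
    """Formula (2): mean-centred, similarity-weighted prediction for user t."""
    num = den = 0.0
    for j in neighbors:
        if item not in ratings[j]:
            continue                                # assumption: skip neighbors without a rating
        s = pearson_sim(ratings, t, j)
        num += (ratings[j][item] - mean(ratings[j].values())) * s
        den += s
    base = mean(ratings[t].values())
    return base if den == 0 else base + num / den
```

Because the denominator of formula (2) is a plain sum of similarities, neighbors with negative similarity can shrink it toward zero; in practice the neighbor set is normally restricted to the most similar users, which avoids this.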

2.2. The problems of existing algorithms

Existing user-based collaborative filtering algorithms calculate user similarity for recommendation from the user-item rating matrix. Although the method is simple, it has two significant problems. First, it considers only the user ratings and ignores the time at which users accessed the items. In a real environment, however, users’ demand for resources changes over time, and their ratings for different items change as their interests change. Existing recommendation systems can hardly detect this change, so the recommended resources deviate considerably from what the user needs. Second, existing systems cannot guarantee the reliability of user rating data. Unfair or non-objective ratings, submitted to sell products or to carry out an attack, seriously affect prediction accuracy and recommendation performance. To solve these problems, this paper improves the existing user-based algorithm by integrating a weight for the user access time and a weight for the reliability of user ratings, which reflects changes of user interest in a timely manner and improves the accuracy of the user reliability evaluation.

3. CF recommendation algorithm based on user interest and credit evaluation

3.1. Time weight based on user interest

In the traditional user-based CF algorithm, the current user’s preference for an item is predicted from the neighbors who have similar interests in the same item, so the search for neighbor users is critical. As a user’s interest changes dynamically over time, the user may rate the same item differently at different times. However, the traditional method treats all user ratings equally when searching for the nearest neighbors, which leads to inaccurate neighbor recognition and poor recommendation quality. Generally, recently rated items play a more important role in predicting the items of current interest to the user, while early ratings contribute relatively little to the final recommendation. It is therefore necessary and reasonable to compare the ratings of the target user and other users made at the same time or within a relatively close period when determining the nearest neighbors. Table 1 is an example of user interest changes.

Table 1. An example of user interest changes

| User | I1 | I2 | I3 | I4 | I5 | Rating interval |
|------|----|----|----|----|----|-----------------|
| U1   | 4  | 5  |    | 4  | 3  | 5               |
| U2   | 3  | 5  | 4  | 4  | 2  | 20              |
| U3   | 4  | 4  | 3  | 2  | 3  | 5               |
| U4   | 3  | 4  | 4  | 4  | 2  | 10              |

The table lists the rating scores of four users for five items together with the corresponding rating time interval, where a greater value indicates an earlier rating time. Measuring similarity from the score values only, the traditional method orders the candidate neighbors of user $U_1$ as $U_2 > U_3 > U_4$. Taking the time factor into account, our method should appropriately reduce the influence of user $U_2$’s interest on user $U_1$’s preference, because their rating times are far apart. The resulting candidate neighbor order $U_3 > U_4 > U_2$ may better reflect the user’s current interest. We therefore introduce a time weight based on the user access time to increase the importance of recent rating data when producing the recommendation.

Inspired by the forgetting rule, we note that people forget memorized content quickly in the short period after learning, and that forgetting becomes slower after a sufficiently long interval. It is a nonlinear process from fast to slow [8, 9]. Therefore, a gradual forgetting strategy is adopted in this paper. By attenuating the importance of user scores at different rates according to their rating time, the contribution of recent rating scores to the prediction is enhanced while the importance of early scores is weakened. Let $t_0$ be the reference starting time and $t_r$ the actual rating time of the user. We define $t = t_r - t_0$ as the time interval of a user rating, $t_{Min} = Min(t_r - t_0)$ as the minimum interval and $t_{Max} = Max(t_r - t_0)$ as the maximum interval. The time-based weighting function is defined as follows:

$$H(t) = m \left( \frac{t - t_{Min}}{t_{Max} - t_{Min}} \right)^2 + 1 - m \qquad (3)$$

where the parameter m reflects the forgetting ability of the function: the greater the value of m, the faster the attenuation. The value of m is set according to how quickly user interest changes in the recommendation system. Rapid change calls for a relatively large value, and slow change for a smaller one.
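The time weight can be sketched in a few lines of Python, assuming formula (3) has the form $H(t) = m \cdot x^2 + 1 - m$ with $x$ the rating time normalized to [0, 1]. The function name, the example times (days since $t_0$) and the fallback for a single time point are hypothetical choices made for illustration.

```python
def time_weight(t, t_min, t_max, m):
    """Formula (3): H(t) = m * ((t - t_min) / (t_max - t_min))**2 + 1 - m."""
    if t_max == t_min:
        return 1.0                      # assumption: a single time point gets full weight
    x = (t - t_min) / (t_max - t_min)   # 0 for the oldest rating, 1 for the newest
    return m * x ** 2 + 1 - m


# Hypothetical rating times, measured in days since the reference time t0.
times = [0, 50, 100]
weights = [time_weight(t, min(times), max(times), m=0.4) for t in times]
```

With m = 0.4 the three weights come out at roughly 0.6, 0.7 and 1.0, so the newest rating keeps full weight and the oldest is reduced to 1 - m. With m = 0 every rating receives weight 1, which matches the unweighted traditional method.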

3.2. User credit evaluation

A necessary prerequisite for an effective traditional CF algorithm is that the scores in the user-item rating matrix are true and reliable, that is, that the participating users are trustworthy. For various reasons, however, this premise is very difficult to guarantee. For example, in order to promote their products, some sellers submit fake and malicious ratings through various illegitimate means so that their products are recommended with priority. In addition, casual and careless rating by users also reduces the objectivity and impartiality of the ratings. Such noisy data seriously affect prediction accuracy and the performance of the recommendation system. It is therefore necessary to assess the users’ credit in advance so as to ensure that the scores are true and reliable. To find the credible users, we introduce the concept of trust to measure the extent of a user’s reliability, mainly from two aspects: evaluation fairness and evaluation accuracy.

Definition 1: evaluation fairness. The evaluation fairness indicator measures how fair a user’s attitude is when rating resource items. We use the mean square error (MSE) to quantify the indicator: the greater the MSE, the fairer the user’s ratings. Denoting the fairness of user u by E(u), it can be described as follows:

$$E(u) = \begin{cases} \dfrac{\sum_{i \in N_u} (R_{u,i} - \bar{R}_u)^2}{|N_u|}, & |N_u| \neq 0 \\ 0, & |N_u| = 0 \end{cases} \qquad (4)$$

where $N_u$ is the set of all items rated by user u and $|N_u|$ is its size, $R_{u,i}$ is the score given by user u to item i in $N_u$, and $\bar{R}_u$ is the average score of user u over all items in $N_u$.

Definition 2: evaluation accuracy. This metric measures the accuracy of a user’s ratings for resources. If a user’s scores are very close to the items’ average scores over all users, the user’s rating accuracy is high. In this way, the contribution of unfair or non-objective ratings from sellers or malicious attackers is reduced, which improves the recommendation accuracy of the system. We define the evaluation accuracy C(u) of user u as follows:

$$C(u) = \begin{cases} \dfrac{\sum_{i \in N_u} (R_{u,i} - \bar{R}_i)}{|N_u|}, & |N_u| \neq 0 \\ 0, & |N_u| = 0 \end{cases} \qquad (5)$$

where $N_u$ is again the set of items rated by user u, $R_{u,i}$ is the score of item i rated by user u, and $\bar{R}_i$ is the average score of item i. Over the whole item space, fairer and more accurate ratings indicate a more trustworthy user. We therefore integrate the two indicators into a composite index that measures the reliability of user ratings. Denoting the final trust degree of a user’s ratings by T(u), it is defined as follows:

$$T(u) = E(u) \times C(u) \qquad (6)$$

For convenience of data processing, we standardize the calculated trust degree and map it proportionally into the interval [0, 1] [10, 11]. In this paper, the most common standardization method is used:

$$T'(u) = \frac{T(u) - Min(T(u))}{Max(T(u)) - Min(T(u))} \qquad (7)$$
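The trust computation of formulas (4) to (7) can be sketched as below. Several points are assumptions rather than confirmed details of the paper: the two indicators are combined by multiplication, C(u) is taken literally as the mean signed deviation from the item averages, and users with identical raw trust all receive 1. The `ratings[user][item]` layout and all function names are hypothetical.

```python
def fairness(user_ratings):
    """Formula (4): mean squared deviation of a user's scores from their own mean."""
    if not user_ratings:
        return 0.0
    mu = sum(user_ratings.values()) / len(user_ratings)
    return sum((r - mu) ** 2 for r in user_ratings.values()) / len(user_ratings)


def accuracy(user_ratings, item_means):
    """Formula (5), read literally: mean signed deviation from the item averages."""
    if not user_ratings:
        return 0.0
    return sum(r - item_means[i] for i, r in user_ratings.items()) / len(user_ratings)


def item_averages(ratings):
    """Average score of every item over all users who rated it."""
    scores = {}
    for user_ratings in ratings.values():
        for i, r in user_ratings.items():
            scores.setdefault(i, []).append(r)
    return {i: sum(rs) / len(rs) for i, rs in scores.items()}


def normalized_trust(ratings):
    """Formulas (6) and (7): T(u) = E(u) * C(u), then min-max scaling to [0, 1]."""
    means = item_averages(ratings)
    raw = {u: fairness(r) * accuracy(r, means) for u, r in ratings.items()}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {u: 1.0 for u in raw}    # assumption: identical trust -> full weight
    return {u: (t - lo) / (hi - lo) for u, t in raw.items()}
```

Min-max scaling always assigns 0 to the least trusted user and 1 to the most trusted one, so the normalized values are relative to the user population at hand rather than absolute.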


The two weighting metrics above have their own advantages. The time-based weight highlights the importance of recent rating data and reflects changes of user interest in a timely manner. The weight based on user credibility identifies the trusted users and safeguards the prediction accuracy of the recommendation system.

3.3. Improved CF algorithm description

To find the neighbors of the target user, the traditional PCC method relies on the items that two users have both rated to calculate the user similarity. This can cause the following problem: even when very few items have been rated by both users, the similarity calculated by the traditional method can still be high simply because those few scores are very close. In fact, this is not reliable, because users with few commonly rated items may not have similar interests. We therefore need to consider not only how similar the ratings of different users are but also the number of commonly rated items. Only in this way can the accuracy of the computed user interest similarity be ensured. We thus introduce a weighting factor to adjust the traditional similarity calculation, defined as follows:

$$S(u) = \frac{Min(k, \gamma)}{\gamma} \qquad (8)$$

where k is the number of items rated by both user u and user v, and γ is a preset threshold for adjusting the user similarity, determined mainly by the sparsity of the user-item rating matrix. Generally, if the matrix is very sparse, the number of items commonly rated by two users is relatively small and the threshold should be set smaller. The formula shows that if k is larger than the threshold γ, there is no need to adjust the similarity. Otherwise the similarity is adjusted according to formula (8). Finally, we combine the three weighting factors above into the final user similarity weight, defined as follows:

$$W = H(t) \times T'(u) \times S(u) \qquad (9)$$
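A short sketch of formulas (8) and (9) follows. It assumes that the three factors in formula (9) are multiplied, which is consistent with S(u) = 1 and H(t) = 1 leaving the similarity unchanged. The function names are hypothetical.

```python
def overlap_factor(k, gamma):
    """Formula (8): S = min(k, gamma) / gamma, where k counts the co-rated items."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return min(k, gamma) / gamma


def combined_weight(h, trust, s):
    """Formula (9), assuming the time, trust and overlap factors are multiplied."""
    return h * trust * s
```

For example, with γ = 15 a pair of users sharing only 3 rated items gets `overlap_factor(3, 15)`, that is 0.2, whereas a pair sharing 20 items gets the neutral factor 1.0.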

We then use the similarity weight to modify the traditional PCC method in formula (1). The improved similarity measure can be described as follows:

$$sim^*(u, v) = W \times \frac{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (R_{v,i} - \bar{R}_v)^2}} \qquad (10)$$
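As a worked illustration of formulas (8) to (10), assume that the three factors in formula (9) are multiplied and take purely hypothetical values, none of which come from the paper’s experiments: two candidate neighbors share $H(t) = 0.7$ and $T'(u) = 0.5$, the threshold is $\gamma = 15$, neighbor a has a PCC of 0.9 over $k = 3$ co-rated items, and neighbor b has a PCC of 0.6 over $k = 20$ co-rated items.

$$
sim^*_a = 0.7 \times 0.5 \times \frac{Min(3, 15)}{15} \times 0.9 = 0.063, \qquad
sim^*_b = 0.7 \times 0.5 \times \frac{Min(20, 15)}{15} \times 0.6 = 0.21
$$

Although neighbor a has the higher raw PCC, its small overlap with the target user pushes it below neighbor b after weighting.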

Applying the improved similarity measure to the traditional CF algorithm, we put forward a new algorithm based on user interest change and credit evaluation. The main steps are as follows.

CF algorithm based on user interest change and trust evaluation.
Input: target user T, user-item rating matrix R, number of neighbors k.
Output: the top-N recommended set for user T.

Step 1. Retrieve from matrix R all the users, all the items and the items rated by target user T, denoted $U_m$, $I_n$ and $I_T$.
Step 2. For each user $u \in U_m$ ($u \neq T$), calculate the similarity to the target user, $sim^*(T, u)$, according to formula (10). Then select the k users with the largest similarity values as the nearest neighbor set of user T, denoted $N_T = \{j_1, j_2, \ldots, j_k\}$.
Step 3. For each item $i \in I_c$ ($I_c = I_n - I_T$), compute the recommendation degree $p_{T,i}$ of item i for the current user T:

$$p_{T,i} = \bar{U}_T + \frac{\sum_{j \in N_T} (R_{j,i} - \bar{R}_j) \cdot sim^*(T, j)}{\sum_{j \in N_T} sim^*(T, j)} \qquad (11)$$

Step 4. Choose the N items with the highest predicted scores as the top-N recommended set for target user T.
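The four steps can be sketched end to end as follows. This is an illustrative reading, not the authors’ code: the `ratings[user][item]` layout, the function names, and the `weight` argument (a precomputed per-user product $H(t) \times T'(u)$) are assumptions, and the sums in formula (11) are restricted here to the neighbors who actually rated the item.

```python
from math import sqrt


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def pcc(ratings, u, v):
    """Formula (1) over the co-rated items; returns (similarity, overlap size)."""
    common = set(ratings[u]) & set(ratings[v])
    mu_u, mu_v = mean(ratings[u].values()), mean(ratings[v].values())
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = (sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
           * sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common)))
    return (num / den if den else 0.0), len(common)


def recommend(ratings, target, weight, k, n, gamma):
    """Steps 1-4; weight[u] is assumed to hold H(t) * T'(u) for user u."""
    # Step 1: candidate items are those the target user has not rated.
    all_items = {i for r in ratings.values() for i in r}
    candidates = all_items - set(ratings[target])

    # Step 2: weighted similarity (formula 10), then the k nearest neighbors.
    sims = {}
    for u in ratings:
        if u == target:
            continue
        base, overlap = pcc(ratings, target, u)
        sims[u] = weight[u] * min(overlap, gamma) / gamma * base
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]

    # Step 3: predicted score of every candidate item (formula 11).
    target_mean = mean(ratings[target].values())
    scores = {}
    for i in candidates:
        raters = [j for j in neighbors if i in ratings[j]]
        den = sum(sims[j] for j in raters)
        if den == 0:
            continue    # assumption: skip items without a usable neighbor rating
        num = sum((ratings[j][i] - mean(ratings[j].values())) * sims[j] for j in raters)
        scores[i] = target_mean + num / den

    # Step 4: the n items with the highest predicted scores.
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Computing every pairwise similarity on each call keeps the sketch short but is quadratic in the number of users; a practical system would cache the similarities.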

4. Experiment and results

4.1. Data set and evaluation metrics

To evaluate the performance of the proposed algorithm, we use the MovieLens collaborative filtering dataset collected by the GroupLens Research Project (http://www.grouplens.org). The dataset contains 943 users and 1682 movies. From it we extract 10000 rating records (on a 1-5 scale) made by 212 users on 986 movies, with each user having rated at least 20 movies; 80% of the records are used as the training set and 20% as the test set. Accuracy is an important indicator for evaluating the performance of a recommendation system. As one of the most commonly used measures, the mean absolute error (MAE) is adopted in this paper to compare the prediction quality of the proposed approach with other collaborative filtering methods. Supposing the top-N predicted rating set for the active user is $p_1, p_2, \ldots, p_N$ and the corresponding actual rating set is $q_1, q_2, \ldots, q_N$, the MAE is defined as follows:

$$MAE = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N} \qquad (12)$$

where N is the number of items recommended to the active user. The lower the MAE, the more accurately the recommendation system predicts user interest.
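Formula (12) translates directly into code. The sketch below assumes the predicted and actual ratings are given as two equally long sequences; the function name is hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Formula (12): average absolute gap between predicted and actual ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating pair is required")
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)
```

For instance, predictions of 4, 3 and 5 against actual ratings of 5, 3 and 3 give absolute errors of 1, 0 and 2, so the MAE is 1.0.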

4.2. Experimental design and analysis of results

In this part, we design three groups of experiments to validate the effectiveness of the proposed CF algorithm. First, we analyze the influence of the parameter γ in S(u) on prediction performance. Because data sparsity differs across E-commerce systems, the recommended results can be optimized by dynamically adjusting the value of γ. Figure 1 shows the recommendation performance under different values.

[Figure 1: MAE (vertical axis, 0.70 to 0.95) against the number of neighbors (horizontal axis, 5 to 30) for γ = 5, 10, 15 and 20]

Figure 1. The performance under different values of γ

The figure shows that, for every number of neighbors (5-30), too small a value of γ may cause the result of formula (8) to equal 1, because the number of items rated by both users is then usually greater than the threshold, while too large a value of γ may produce a low weight that cannot improve the accuracy of recognizing the active user’s neighbors. The correct setting of the threshold is therefore critical. For this dataset, setting γ to 15 gives the best quality. Second, we examine the influence of the parameter m in the time weighting function H(t), which reflects the forgetting speed of user interest, on prediction performance. The values of MAE under different m varying from 0 to 0.8 are shown in Figure 2.

[Figure 2: MAE (vertical axis, 0.70 to 0.95) against the number of neighbors (horizontal axis, 5 to 30) for m = 0, 0.2, 0.4, 0.6 and 0.8]

Figure 2. The performance under different forgetting velocities m

When m is 0, user ratings made at different times are treated equally. This is equivalent to the traditional methods and cannot reflect changes in user interest. When m is assigned other values (0.2 to 0.8), the algorithm reduces the importance of early user ratings at different rates. For the dataset in our experiment, the algorithm achieves better recommendation performance when m is 0.4, that is, it reflects the users’ real interests and satisfies their needs effectively. Finally, we compare the recommendation performance of the proposed collaborative filtering algorithm based on interest change and credit evaluation (ICCEBCF) with that of the traditional user-based collaborative filtering algorithm (UBCF), with γ = 15 and m = 0.4.

Figure 3. Recommendation performance comparison of the two methods

As the figure shows, owing to the introduction of the time weighting factor and the user credit evaluation function, the MAE values of the proposed method are significantly lower than those of the traditional method for every number of neighbors. The combination of the two weighting methods not only captures the user’s recent interest accurately but also judges the objectivity of user ratings, which makes the final recommended results more reasonable and effective.

5. Conclusion

To improve the quality of the recommendation system, this paper proposes a collaborative filtering algorithm based on user interest change and credit evaluation. By integrating a time weight based on a non-linear forgetting function and a user credit evaluation into the user similarity computation, we can find the target user’s neighbors with similar interests more accurately, so that the final recommended results, based on the neighbors’ average ratings, better reflect changes of user interest in a timely manner. Compared with the traditional algorithm, the experimental results show that the proposed method effectively enhances system performance. Future work is to resolve the serious scalability problems caused by the rapidly growing numbers of users and items.

6. Acknowledgment

The work described in this paper is supported by the Natural Science Fund of Jiangsu Province (No. BK2009699).

7. References

[1] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, “Analysis of recommendation algorithms for E-commerce”, In: ACM Conference on Electronic Commerce, pp. 158-167, 2000.
[2] Matthew R. McLaughlin, Jonathan L. Herlocker, “Content-based filtering & collaborative filtering: A collaborative filtering algorithm and evaluation metric that accurately model the user experience”, In Proc. of the 27th ACM SIGIR Conf., pp. 329-336, 2004.
[3] Goldberg D., Nichols D., Oki B. M., et al., “Using collaborative filtering to weave an information Tapestry”, Communications of the ACM, vol. 35, no. 12, pp. 61-70, 1992.
[4] Herlocker J. L., Konstan J. A., Riedl J. T., “Empirical analysis of design choices in neighborhood-based collaborative filtering algorithm”, Information Retrieval, vol. 5, no. 4, pp. 287-310, 2002.
[5] Breese J., Heckerman D., Kadie C., “Empirical analysis of predictive algorithms for collaborative filtering”, In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI’98), pp. 43-52, 1998.
[6] Billsus D., Pazzani M. J., “Learning Collaborative Information Filters”, In Proceedings of ICML ’98, pp. 46-53, 1998.
[7] Zhou Jun Feng, Tang Xian, Guo Jing Feng, “An Optimized Collaborative Filtering Recommendation Algorithm”, Journal of Computer Research and Development, vol. 14, no. 10, pp. 1842-1847, 2004.
[8] Zheng Xian Rong, Cao Xian Bin, “Research on Lineal Gradual Forgetting Collaborative Filtering Algorithm”, Computer Engineering, vol. 33, no. 6, pp. 72-75, 2007.
[9] Wang Lan, Zhai Zheng Jun, “Collaborative filtering algorithm based on time weight”, Journal of Computer Applications, vol. 27, no. 9, pp. 2302-2305, 2007.
[10] G. Karypis, “Evaluation of item-based top-N recommendation algorithms”, In Proc. of CIKM 2001, pp. 247-254, 2001.
[11] Herlocker J. L., Konstan J. A., Terveen L. G., et al., “Evaluating collaborative filtering recommender systems”, ACM Transactions on Information Systems, vol. 22, no. 1, pp. 5-53, 2004.
