Star Quality: Aggregating Reviews to Rank Products and Merchants

Mary McGlohon (CMU/Google), Natalie Glance (Google), Zach Reiter (Google)

[Screenshots: reviews for a product, sorted by average rating; reviews for a merchant]

The problem

- Given reviews aggregated from different sources (Amazon, Epinions, etc.):
  - How to measure “true quality” of a product or merchant?
  - Can we do better than “average number of stars”?
  - How can we tell if we’re doing better?

The challenges

- Different sources have different review scales
  - 0-5 stars, 1-5 stars, 0-100%, A/B/C…
- Different sources have different rating distributions
  - “Rant sites” and “Rave sites”
- Reviews may be plagiarized or irrelevant
  - This happens a lot. [Danescu-Niculescu-Mizil+ 2009]
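The first challenge (different review scales) can be illustrated with a minimal sketch that maps each rating onto a common [0, 1] range. The scale names, the `normalize` function, and especially the letter-grade mapping are assumptions made for illustration; the slides do not say how the scales were reconciled.

```python
from typing import Union

# Hypothetical letter-grade mapping; the slides do not specify one.
LETTER_GRADES = {"A": 1.0, "B": 0.75, "C": 0.5, "D": 0.25, "F": 0.0}


def normalize(rating: Union[float, str], scale: str) -> float:
    """Map a rating onto [0, 1] given the (assumed) name of its source scale."""
    if scale == "stars_0_5":
        return float(rating) / 5.0
    if scale == "stars_1_5":
        return (float(rating) - 1.0) / 4.0
    if scale == "percent":
        return float(rating) / 100.0
    if scale == "letter":
        return LETTER_GRADES[str(rating).upper()]
    raise ValueError(f"unknown scale: {scale}")


print(normalize(3, "stars_1_5"))  # 0.5
print(normalize(5, "stars_0_5"))  # 1.0
```

A linear rescaling like this only aligns the ranges. It does nothing about the second challenge, where sources differ in their rating distributions.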

Outline

- Analyze ratings aggregated from many review sites
- Propose models to determine “true quality”
- Build evaluation framework
- Compare results


Data

- Product Reviews
  - 8M ratings (560K products, 3.8M authors, 230 sources)
- Merchant Reviews
  - 1.5M ratings (17K merchants, 1.1M authors, 19 sources)
- Netflix Prize
  - 100M ratings (17K movies, 480K authors)

Obs 1: People like passing out 5’s

- Product reviews: single-review authors disproportionately more so

[Figure: count of ratings by number of stars]

Obs 2: Authors/sources have biases

- Ratings for the same product will differ widely.
- Authors are consistent across products.
  - (Like everything or hate everything.)
- But… even different authors on the same site rate objects similarly!
  - Not just “rant sites”
  - ReviewCentre.com avg 2.9*, Pricegrabber.com 4.5*

Obs 3: The rated object matters

[Figure: rating distributions for Products, Merchants, and Movies]

- Merchant reviews more “binary”
- Netflix more “normal”

Obs 4: How much an object is rated matters

- Netflix:
  - Movies with O(10^5) ratings (red) were 63% positive (4-5*)
  - Movies with O(10^3) ratings (yellow) were 42% positive (4-5*)


Proposed Models

1. Mean rating for object (baseline)
   - “On average, users gave it 3.8 stars”, q=3.8
2. Median rating for object
   - “The middle rating was 4 stars”, q=4
3. Lower bound on normal confidence interval
   - “95% sure that it’s at least 3.5*”, q=3.5
4. Lower bound on binomial confidence interval
   - “95% sure that at least 60% of users will like it”, q=.6
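Models 1 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the one-sided quantile `Z95 = 1.645`, the use of the Wilson score interval for the binomial bound, and the "4 stars or more counts as positive" threshold (borrowed from the 4-5* convention in Obs 4) are all choices made here, since the slides say only "95% sure".

```python
import math
import statistics

Z95 = 1.645  # one-sided 95% normal quantile (assumed)


def mean_rating(ratings):
    """Model 1: plain average of the star ratings."""
    return statistics.mean(ratings)


def median_rating(ratings):
    """Model 2: middle rating."""
    return statistics.median(ratings)


def normal_lower_bound(ratings, z=Z95):
    """Model 3: mean - z * s / sqrt(n). Needs at least two ratings."""
    n = len(ratings)
    return statistics.mean(ratings) - z * statistics.stdev(ratings) / math.sqrt(n)


def binomial_lower_bound(ratings, positive_at=4, z=Z95):
    """Model 4: Wilson score lower bound on the share of positive ratings."""
    n = len(ratings)
    if n == 0:
        return 0.0
    p = sum(r >= positive_at for r in ratings) / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


ratings = [5, 4, 4, 3, 5, 2, 4]  # made-up ratings for one object
print(mean_rating(ratings), median_rating(ratings))
print(normal_lower_bound(ratings), binomial_lower_bound(ratings))
```

Both lower bounds penalize objects with few ratings: with small n the interval is wide, so the bound sits well below the raw mean or positive share.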

Proposed Models (continued)

5. Average percentile of order statistic
   - “Most websites liked it better than other products”
6. Filtering anonymous reviews, then average
   - “Anonymous people are spammy”
7. Filtering non-prolific authors, then average
   - “N00bs are dumb”
8. Reweighting authors by “reliability”
   - “Account for author bias”
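The two filtering models (6 and 7) amount to dropping rows before averaging. The sketch below assumes a review is a tuple `(object, author, source, stars)` and that an anonymous review is encoded with `author = None`; the sample rows and the `min_reviews` threshold are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# (object, author, source, stars); author None marks an anonymous review (assumed encoding).
reviews = [
    ("Halo 2", "alice", "siteA", 4),
    ("Halo 2", None, "siteA", 1),
    ("Halo 2", "charlie", "siteB", 5),
    ("Nerf MK40", "alice", "siteA", 4),
    ("Nerf MK40", "bob", "siteB", 5),
]


def mean_by_object(rows):
    """Average star rating per object."""
    by_obj = defaultdict(list)
    for obj, _author, _source, stars in rows:
        by_obj[obj].append(stars)
    return {obj: mean(vals) for obj, vals in by_obj.items()}


def drop_anonymous(rows):
    """Model 6: keep only reviews with a named author."""
    return [r for r in rows if r[1] is not None]


def drop_non_prolific(rows, min_reviews=2):
    """Model 7: keep only authors with at least min_reviews reviews.

    Anonymous reviews have no author to count, so they are dropped too.
    """
    counts = Counter(r[1] for r in rows if r[1] is not None)
    return [r for r in rows if r[1] is not None and counts[r[1]] >= min_reviews]


print(mean_by_object(drop_anonymous(reviews)))
print(mean_by_object(drop_non_prolific(reviews)))
```

Either filter shrinks the data, so an object reviewed only by anonymous or one-time authors ends up with no score at all and needs a fallback.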


Evaluation Method

- There’s no “ground truth” for quality.
- Goal: to see how reliably our ranking of “true quality” q^i agrees with user preferences.
- Idea: Hold out a pair of ratings from the same author.
  - Does our ranking of the objects (based on other people’s ratings) agree with theirs?

Toy Example

Object o_i            Author a_j
Robo-raptor           Alice
Nerf MK40             Alice
Splosions™ Chem Set   Alice
Halo 2                Alice
Robo-raptor           Bob
Nerf MK40             Charlie
Halo 2                Charlie
Robo-raptor           Danielle
Halo 2                Danielle
Splosions™ Chem Set   Danielle

[Third column, author rating r(o_i, a_j): star graphics on the slide, not preserved in the text]

Step 1: For every “prolific” author, pick a pair to hold out for the test set.

- “Prolific”: n>2, which holds for Alice and Danielle
- Choose 2 random reviews from each
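The hold-out step can be sketched as follows. The star values in `RATINGS` are assumptions (the slide's star graphics did not survive extraction), and `split_holdout`, `min_reviews`, and `seed` are names introduced here for illustration.

```python
import random
from collections import defaultdict

# (object, author, stars). Star values are assumed, not taken from the slides.
RATINGS = [
    ("Robo-raptor", "Alice", 5),
    ("Nerf MK40", "Alice", 4),
    ("Splosions Chem Set", "Alice", 3),
    ("Halo 2", "Alice", 4),
    ("Robo-raptor", "Bob", 3),
    ("Nerf MK40", "Charlie", 5),
    ("Halo 2", "Charlie", 5),
    ("Robo-raptor", "Danielle", 2),
    ("Halo 2", "Danielle", 3),
    ("Splosions Chem Set", "Danielle", 3),
]


def split_holdout(ratings, min_reviews=3, seed=0):
    """Return (train, test_pairs): one random pair held out per prolific author."""
    rng = random.Random(seed)
    by_author = defaultdict(list)
    for idx, (_obj, author, _stars) in enumerate(ratings):
        by_author[author].append(idx)

    held_out = set()
    test_pairs = []
    for _author, idxs in by_author.items():
        if len(idxs) >= min_reviews:  # "prolific": n > 2 in the toy example
            a, b = rng.sample(idxs, 2)
            held_out.update((a, b))
            test_pairs.append((ratings[a], ratings[b]))

    # Everything that was not held out becomes training data.
    train = [r for i, r in enumerate(ratings) if i not in held_out]
    return train, test_pairs


train, test_pairs = split_holdout(RATINGS)
print(len(train), len(test_pairs))  # 6 2
```

With the toy data only Alice and Danielle have more than two reviews, so two pairs (four ratings) are held out and six ratings remain for training, whichever pairs the random draw selects.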

Step 2: The rest are training data.

Training set

Object o_i            Author a_j
Splosions™ Chem Set   Alice
Halo 2                Alice
Robo-raptor           Bob
Nerf MK40             Charlie
Halo 2                Charlie
Splosions™ Chem Set   Danielle

Step 3: In training data, calculate q^i, rank objects accordingly. Here, q^i is the average rating in training data.

Our ranking

o_i         q^i
Nerf        5.0
Halo        4.5
Splosions   3.0
Robo        3.0
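A minimal sketch of this step, using the mean as the quality score. The star values in `TRAIN` are assumptions chosen so that the averages reproduce the q^i values shown on the slide; the individual ratings themselves were not preserved.

```python
from collections import defaultdict
from statistics import mean

# Training rows (object, author, stars); star values are assumed.
TRAIN = [
    ("Splosions", "Alice", 3),
    ("Halo", "Alice", 4),
    ("Robo", "Bob", 3),
    ("Nerf", "Charlie", 5),
    ("Halo", "Charlie", 5),
    ("Splosions", "Danielle", 3),
]


def quality_scores(train):
    """q for each object: here, the average rating in the training data."""
    by_obj = defaultdict(list)
    for obj, _author, stars in train:
        by_obj[obj].append(stars)
    return {obj: mean(vals) for obj, vals in by_obj.items()}


def rank(scores):
    """Objects sorted by descending q; ties keep their first-seen order."""
    return sorted(scores, key=lambda obj: -scores[obj])


q = quality_scores(TRAIN)
print(rank(q))  # ['Nerf', 'Halo', 'Splosions', 'Robo']
```

Any of the other proposed models could be swapped in by replacing `mean` with a different scoring function over each object's ratings.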

Step 4: Compare our ranking with the ranking in each pair in test data.

Test set

Object o_i    Author a_j
Robo-raptor   Alice
Nerf MK40     Alice
Robo-raptor   Danielle
Halo 2        Danielle

MISCLASSIFICATION: In test set, Alice says Robo outranks Nerf. Our ranking claims Nerf outranks Robo.

CORRECT CLASSIFICATION: In test set, Danielle says Halo outranks Robo. Our ranking claims Halo outranks Robo.
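The pairwise comparison can be sketched as below. The q values in `Q` are the ones from the slide; the held-out star values in `TEST_PAIRS` are assumptions chosen to match the stated preferences (Alice prefers Robo, Danielle prefers Halo). Counting a tie on either side as a disagreement is also an assumption made here.

```python
# q values from the training-set ranking on the slide.
Q = {"Nerf MK40": 5.0, "Halo 2": 4.5, "Splosions Chem Set": 3.0, "Robo-raptor": 3.0}

# Held-out pairs per author: ((object, stars), (object, stars)). Stars are assumed.
TEST_PAIRS = {
    "Alice": (("Robo-raptor", 5), ("Nerf MK40", 4)),
    "Danielle": (("Robo-raptor", 2), ("Halo 2", 3)),
}


def agrees(q, pair):
    """True when the model orders the two objects the same way the author did."""
    (obj_a, stars_a), (obj_b, stars_b) = pair
    author_pref = stars_a - stars_b
    model_pref = q[obj_a] - q[obj_b]
    # Same sign means same ordering; a tie on either side counts as a miss.
    return author_pref * model_pref > 0


def accuracy(q, pairs):
    """Fraction of held-out pairs on which model and author agree."""
    results = [agrees(q, pair) for pair in pairs.values()]
    return sum(results) / len(results)


print(accuracy(Q, TEST_PAIRS))  # 0.5
```

On the toy data this yields one miss (Alice) and one hit (Danielle), matching the two cases walked through above.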

[Figure: accuracy (%) of each model on Products, Merchants, and Netflix]

Nothing significantly outperformed average rating

Potential improvements

- Leveraging other review features
  - Careful selection of sources for reliability and bias
  - Observe longitudinal user behavior
  - Timestamps
- Cleaning data for plagiarism and spam
- Leveraging other data sources
  - Better Business Bureau, etc.

Conclusion

- User behavior in reviews follows interesting patterns.
- Proposed diverse set of ranking systems.
- Devised evaluation methodology.
- Outperforming the average may be more nuanced than we thought.

Experiment

- “Prolific” author = 100 or more reviews
- One pair from each prolific author

Step 1: Calculate qi for each object, rank.

o_i         q_i
Halo        5.0
Nerf        4.5
Halo        3.5
Robo        3.33
Splosions   3.0
