I Know where you tweet

Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations

Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign

User profiling infers users’ essential attributes and is important for many services.

[Figure: a user profile (Richard; Job: Student; Location: Champaign) serves search engines (personalized search), advertisers (targeted advertisement), and many others.]

This paper aims to profile Twitter users’ home locations from both tweets and the following network.

Profiling a user’s home location
Input: the user’s tweets and following network.
Output: the user’s home location (e.g., Location: Champaign).

A user’s home location is defined as the place where most of his activities happen. It is different from a real-time geo position (e.g., a Starbucks on Green Street).

In the context of Twitter, there are two types of data:
- User-centric data (tweets)
- Social network data (following network)

[Figure: example network of accounts: Lady Gaga, TechCrunch, Richard, Cindy, Rob, Jessie.]

The problem is difficult due to the scarce signal challenge.

Tweets: only 6% of messages contain location-related terms!

Following network: only 16% of users have locations on their profiles!

[Figure: example following network. Lady Gaga: New York; TechCrunch: unknown; Rob: unknown; remaining labels: Richard, Cindy, Jessie, Champaign, San Francisco.]

The problem is difficult due to the noisy signal challenge.

Tweets: a user tweets about locations different from his home location.

Following network: a user follows friends who live in locations different from his home location.

[Figure: example following network. Lady Gaga: New York; TechCrunch: unknown; Rob: unknown; remaining labels: Richard, Cindy, Jessie, Champaign, San Francisco.]

We propose a unified and discriminative probabilistic framework.

Scarce Signal Challenge

Noisy Signal Challenge

Unify two types of resources as a Twitter graph.

Model the likelihood of an edge between two nodes via a discriminative influence model.

Profile locations via maximizing the likelihood of observing the graph.

We unify two types of resources as a directed heterogeneous graph.
- We unify two types of resources as nodes on a heterogeneous graph.
- We model it as a directed graph, with each edge going from a tail node to a head node.
- We associate locations with the nodes.
- We aim to infer the locations of unlabeled nodes from the locations of labeled nodes.

[Figure: example graph with nodes u1 to u6 and v1, v2. Labeled nodes carry locations (New York, Beijing, Champaign, San Francisco); unlabeled nodes are marked "?".]
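A minimal sketch of how such a graph could be held in memory. The class names, the `kind` values ("user" for u nodes, "venue" for v nodes), the edge direction convention (follower or tweeting user as tail), and the coordinates are all assumptions made for illustration; the slides do not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # (x, y); illustrative planar coordinates


@dataclass
class Node:
    node_id: str
    kind: str                            # assumed: "user" or "venue"
    location: Optional[Location] = None  # None marks an unlabeled node


@dataclass
class TwitterGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (tail, head)

    def add_node(self, node_id: str, kind: str,
                 location: Optional[Location] = None) -> None:
        self.nodes[node_id] = Node(node_id, kind, location)

    def add_edge(self, tail: str, head: str) -> None:
        # Assumed convention: the tail follows (or tweets about) the head.
        self.edges.append((tail, head))

    def unlabeled(self) -> List[str]:
        # Nodes whose locations the profiling step has to infer.
        return [n.node_id for n in self.nodes.values() if n.location is None]


# Toy instance with made-up coordinates.
graph = TwitterGraph()
graph.add_node("u1", "user")                # unlabeled user
graph.add_node("u2", "user", (10.0, 4.0))   # labeled user
graph.add_node("v1", "venue", (2.0, 1.0))   # labeled venue
graph.add_edge("u1", "u2")                  # u1 follows u2
graph.add_edge("u1", "v1")                  # u1 tweets about v1
```

Keeping users and venues in one node table is what lets a single edge model cover both following and tweeting relationships.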

We observe two key characteristics for the probability of an edge between two nodes.

How likely is it that a tail node nj at L(nj) builds an edge e to a head node ni at L(ni)?

[Figure: Spread of Word "Champaign": count plotted over longitude and latitude.]

Observation 1: The probability decreases as their distance increases.

Observation 2: At the same distance, different head nodes (e.g., Chicago, Champaign) have different probabilities of attracting tail nodes.

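We propose a discriminative influence model to capture the two key characteristics.

Conceptual level: Discriminative Influence Model θni
- Influence probabilities decrease from the center.
- Different nodes have different influence scopes.

Mathematical level: Gaussian Model

$$P(e\langle n_j, n_i\rangle \mid \theta_{n_i}, L(n_i)) = \frac{1}{2\pi\sigma_{n_i}^2} \exp\left(-\frac{(x_{u_i} - x_{u_j})^2 + (y_{u_i} - y_{u_j})^2}{2\sigma_{n_i}^2}\right)$$

Here $\sigma_{n_i}$ is read as the influence scope of head node $n_i$ (the parameter in $\theta_{n_i}$), and $(x_{u_i}, y_{u_i})$, $(x_{u_j}, y_{u_j})$ are the coordinates of the head and tail nodes.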
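The Gaussian model can be sketched in a few lines, assuming the standard isotropic two-dimensional Gaussian density with a per-head-node scope `sigma`. The function name and the planar (x, y) coordinates are illustrative assumptions, not the authors' code.

```python
import math


def edge_probability(tail_loc, head_loc, sigma):
    """Density of an edge from a tail node to a head node.

    Assumes an isotropic 2-D Gaussian centered on the head node, with
    `sigma` standing in for the head node's influence scope.
    """
    dx = head_loc[0] - tail_loc[0]
    dy = head_loc[1] - tail_loc[1]
    squared_distance = dx * dx + dy * dy
    normalizer = 2.0 * math.pi * sigma ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2)) / normalizer


# Observation 1: the value drops as the tail moves away from the head.
near = edge_probability((0.0, 1.0), (0.0, 0.0), sigma=1.0)
far = edge_probability((0.0, 3.0), (0.0, 0.0), sigma=1.0)

# Observation 2: at the same distance, a head node with a wider scope
# attracts distant tail nodes more readily.
far_wide_scope = edge_probability((0.0, 3.0), (0.0, 0.0), sigma=3.0)
```

A single scalar scope per head node is what makes the model discriminative: a widely followed account gets a large scope, a local one gets a small scope.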

A local profiling algorithm profiles the location of a user via the edges from and to his labeled neighbors.
- Simple but efficient
- Closed-form solution
- Influence scope: average distance of a user’s followers
- User location: weighted average of different resources

[Figure: example graph in which an unlabeled node ("?") is profiled from its labeled neighbors (New York, Beijing, Champaign, San Francisco).]
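One way to read the closed-form solution: if the influence scopes of the labeled neighbors are held fixed and the Gaussian edge model is used, maximizing the log-likelihood over the user's location gives an average of neighbor locations weighted by 1/sigma². The sketch below works under that assumption only; the function names are hypothetical and the authors' exact estimator may differ.

```python
import math


def influence_scope(head_loc, follower_locs):
    """Average distance from a head node to its labeled followers."""
    distances = [math.dist(head_loc, loc) for loc in follower_locs]
    return sum(distances) / len(distances)


def local_profile(neighbor_locs, scopes):
    """Closed-form location estimate from labeled neighbors.

    With fixed scopes, the Gaussian log-likelihood is a weighted sum of
    squared distances, so its maximizer is the weighted mean below.
    """
    weights = [1.0 / (s * s) for s in scopes]
    total = sum(weights)
    x = sum(w * loc[0] for w, loc in zip(weights, neighbor_locs)) / total
    y = sum(w * loc[1] for w, loc in zip(weights, neighbor_locs)) / total
    return (x, y)


# A neighbor with a narrow scope pulls the estimate harder than one
# with a wide scope (toy coordinates).
estimate = local_profile([(0.0, 0.0), (10.0, 0.0)], scopes=[1.0, 3.0])
```

Under this reading, an account with a large scope, such as a celebrity followed from everywhere, contributes little to the estimate, which is one way the noisy signal is damped.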

A global algorithm profiles all the users’ locations together via all the edges in the graph.

The local algorithm only uses limited information.

Our global algorithm aims to use all information.
- Complex but accurate
- Iterative algorithm

[Figure: example graph with several unlabeled nodes ("?") among labeled nodes (Beijing, New York, Champaign, San Francisco).]
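A rough sketch of what an iterative global scheme might look like: repeatedly re-estimate each unlabeled node from the current locations of all its neighbors, labeled or not. This is plain coordinate ascent on the Gaussian likelihood with scopes held fixed, chosen here for illustration; the slides do not give the authors' update rules, and all names are hypothetical.

```python
def global_profile(edges, labeled, scopes, unlabeled, iterations=50):
    """Iteratively estimate locations for all unlabeled nodes.

    edges:   list of (tail, head) pairs
    labeled: dict node_id -> (x, y) for nodes with known locations
    scopes:  dict node_id -> sigma for every node used as a head (assumed fixed)
    """
    locs = dict(labeled)
    # Start every unlabeled node at the centroid of the labeled nodes.
    cx = sum(p[0] for p in labeled.values()) / len(labeled)
    cy = sum(p[1] for p in labeled.values()) / len(labeled)
    for node in unlabeled:
        locs[node] = (cx, cy)

    for _ in range(iterations):
        for node in unlabeled:
            wx = wy = wsum = 0.0
            for tail, head in edges:
                if node == tail:
                    other = head
                elif node == head:
                    other = tail
                else:
                    continue
                # Each edge is weighted by the scope of its head node.
                w = 1.0 / scopes[head] ** 2
                wx += w * locs[other][0]
                wy += w * locs[other][1]
                wsum += w
            if wsum > 0.0:
                locs[node] = (wx / wsum, wy / wsum)
    return locs


# Toy chain a - u1 - u2 - b: information from both labeled ends reaches
# the unlabeled nodes in the middle.
result = global_profile(
    edges=[("u1", "a"), ("u1", "u2"), ("u2", "b")],
    labeled={"a": (0.0, 0.0), "b": (10.0, 0.0)},
    scopes={"a": 1.0, "b": 1.0, "u1": 1.0, "u2": 1.0},
    unlabeled=["u1", "u2"],
)
```

In the toy chain, u1 has only one labeled neighbor, so a local estimate would place it exactly at a; the iterative pass also uses u2's evolving estimate.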

We incorporate additional knowledge as constraints for maximizing the likelihood function.

Additional Knowledge: e.g., users only live in cities or towns

Constrained optimization: we maximize the likelihood in each method under constraints.
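As an illustration of the "users only live in cities or towns" constraint, one simple option is to restrict the search to a finite list of candidate places and pick the one with the highest log-likelihood. The sketch assumes the Gaussian edge model with fixed scopes; the candidate list, coordinates, and function names are made up for the example.

```python
import math


def log_likelihood(candidate, neighbor_locs, scopes):
    """Gaussian log-likelihood of a candidate location given neighbors."""
    total = 0.0
    for loc, sigma in zip(neighbor_locs, scopes):
        d2 = (candidate[0] - loc[0]) ** 2 + (candidate[1] - loc[1]) ** 2
        total += -d2 / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)
    return total


def constrained_profile(candidates, neighbor_locs, scopes):
    """Return the candidate place name that maximizes the likelihood."""
    return max(
        candidates,
        key=lambda name: log_likelihood(candidates[name], neighbor_locs, scopes),
    )


# Toy coordinates, not real geography.
cities = {"Champaign": (0.0, 0.0), "Chicago": (2.0, 2.0)}
neighbors = [(0.1, 0.2), (0.3, -0.1), (2.0, 2.1)]
best = constrained_profile(cities, neighbors, scopes=[1.0, 1.0, 1.0])
```

Searching over a candidate list also guarantees the output is a real place rather than an average that falls between cities.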

We compare our method with state-of-the-art methods on a large Twitter corpus.

Data set:
- We crawled a subset of Twitter.
- We used the users who have locations on their profiles.
- There are 139K users, 50 million tweets, and 2 million following relationships.

Methods:
- User-based location profiling
- Content-based location profiling

Our algorithms are better than the baseline methods as we model edges discriminatively.

Our algorithms can take advantage of modeling two different types of resources.

The global profiling algorithm can further improve the local profiling algorithm.

Conclusion and Future Work
- We explore both social network and user-centric data for profiling users’ locations in a unified approach.
- We introduce a discriminative influence model.
- We develop two effective profiling methods and extend the methods via modeling constraints.
- The framework could be further extended to profiling other attributes.

Questions?