Mobile User Movement Prediction Using Bayesian Learning for Neural Networks Presented by: Sherif Akoush American University in Cairo

Map Middle East 2007

Outline
- Overview
- Motivations
- Objectives
- Background Information
- The Proposed System
- Experiments
- Questions

Overview
- Humans tend to act (move) according to patterns.
- Several machine learning techniques can extract patterns from historical data (neural networks, Bayesian belief networks).
- In wireless networks, users must be able to access services while roaming:
  - Location management
  - Handoff management

Motivations
- How predictable are our lives?
- One of the most critical issues in wireless networks is mobility management (location and handoff management).
- Extracting users' movement patterns with a powerful machine learning technique can aid location management.

Objectives
- Apply Bayesian learning for neural networks to movement prediction:
  - A hybrid machine learning model.
  - Used for the first time in movement prediction.
- Build a solid inference model for location management in wireless cellular networks:
  - Efficient (intelligent) location management:
    - Intelligent decisions
    - Better bandwidth utilization
    - Reduced update cost
  - Services prediction:
    - Quality of service
- Apply to other wireless networks (ad hoc, Mobile IP, WiMAX).

Wireless Cellular Networks
[GSM architecture diagram]
- MSC: acts like a normal switching node and, in addition, provides all the functionality needed to handle a mobile subscriber, including registration, authentication, location updating, inter-MSC handovers, and call routing to a roaming subscriber.
- HLR: contains all the administrative information of each subscriber registered in the corresponding GSM network, along with the current location of the subscriber.
- VLR: contains selected administrative information from the HLR, necessary for call control and provision of the subscribed services, for each mobile currently located in the geographical area controlled by the VLR.
- BTS: houses the radio transceivers that define a cell and handles the radio interface protocols with the mobile station.
- BSC: manages the radio interface channels (setup, teardown, frequency hopping, etc.) as well as handovers.
- OMC: provides remote monitoring of network performance and permits remote reconfiguration and fault management, as well as alarm and event monitoring.

[Map of cells covering Bur Dubai and Deira]
Based on movement prediction, check directly the cell in which the user is expected to be.

Neural Networks
- Neural networks are powerful models capable of learning (by example) relationships in complex data sets.
- A neural network may be considered a black box that produces an output (prediction) for a given input.
- How the output depends on the input is controlled by several parameters (weights) that are adjusted during the learning process of the network.
- The learning process adjusts these weights so as to minimize the error between the expected (target) output and the actual output; a minimal sketch follows below.
- We set the model parameters such as the number of hidden nodes, connections between the nodes, initial weight values, learning rates, weight decay, etc.
- After learning, the network should be able to generalize.
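To make the learning loop above concrete, here is a minimal sketch, not the presentation's actual model: a one-hidden-layer network trained by gradient descent with weight decay on a hypothetical toy dataset (numpy only; the 15 hidden units simply echo the configurations used in the experiments later).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs X and targets t (both hypothetical stand-ins).
X = rng.normal(size=(200, 3))
t = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)

# Model parameters chosen up front: 15 hidden units, learning rate, decay.
H, lr, decay = 15, 0.1, 1e-4
W1 = rng.normal(scale=0.5, size=(3, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    # Forward pass: the "black box" mapping input to prediction.
    h = np.tanh(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Backward pass: adjust the weights so as to reduce the error
    # between target t and actual output y (plus weight decay).
    grad_y = (y - t) / len(X)            # gradient w.r.t. output pre-activation
    gW2 = h.T @ grad_y + decay * W2
    grad_h = (grad_y @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ grad_h + decay * W1
    W2 -= lr * gW2; b2 -= lr * grad_y.sum(0)
    W1 -= lr * gW1; b1 -= lr * grad_h.sum(0)
```

Backpropagation like this yields a single trained weight vector, which is exactly the limitation the next slide discusses.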

Problems in Neural Networks
- The choice of model parameters is essentially arbitrary:
  - Trial-and-error and heuristic procedures.
  - No meaningful semantics that we could compare with our beliefs.
- Over-fitting.
- No model comparison.
- A single weight vector.
- No confidence measure for the output.

Bayesian Learning for Neural Networks
- A Bayesian MLP theoretically considers all possible weight solutions and integrates (averages) over them:
  - A solution to the over-fitting and under-fitting problems.
- The prior represents our belief about the weight parameters before seeing the data:
  - Use of hyperparameters.
  - Infer what the free parameters might be, given the data.
- Objective choice of weight decay terms (regularizers).
- Confidence intervals for the results (error bars).
- Objective model comparison.
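A minimal sketch of the averaging idea, assuming weight samples have already been drawn from the posterior (e.g. by MCMC); `forward` is a hypothetical function that applies one sampled network, not part of any toolbox.

```python
import numpy as np

def bayesian_predict(x, weight_samples, forward):
    """Average predictions over K posterior weight samples.

    Instead of one weight vector, the Bayesian network keeps many sampled
    networks; the mean is the prediction and the spread gives error bars.
    """
    preds = np.array([forward(x, w) for w in weight_samples])
    mean = preds.mean(axis=0)   # averaged (integrated) prediction
    std = preds.std(axis=0)     # confidence in the result (error bars)
    return mean, std
```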

The Proposed Model
[System architecture diagrams: network subsystem and mobile host subsystem]

Experiments

Input Data
- Data from the Reality Mining Project at MIT.
- The dataset contains over 500,000 hours (~60 years) of continuous data on daily human behavior.
- The dataset has already been used by researchers in a wide range of fields (including epidemiology, sociology, physics, artificial intelligence, and organizational behavior).
- The dataset includes call logs, Bluetooth devices in proximity, cell tower IDs, application usage, and phone status (such as charging and idle).

Input Data (For Location Prediction)

Cellspan:
Oid     | Starttime            | Endtime              | person_oid | celltower_oid
1097401 | 7/26/2004 8:58:14 PM | 7/26/2004 8:58:34 PM | 29         | 38
1097402 | 7/26/2004 8:58:34 PM | 7/26/2004 8:59:37 PM | 29         | 42
1097403 | 7/26/2004 8:59:37 PM | 7/26/2004 9:00:13 PM | 29         | 40
1097404 | 7/26/2004 9:00:13 PM | 7/26/2004 9:01:34 PM | 29         | 1552

Cellname:
Oid | Name            | person_oid | celltower_oid
643 | ML              | 29         | 3393
644 | Office          | 29         | 19290
647 | Greg's apt      | 29         | 4377
648 | Jon's apartment | 29         | 3442
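As an illustration of how such records could be turned into an ordered cell sequence per user, here is a hedged sketch; the field order and date format mirror the Cellspan table above, but the parsing code is hypothetical, not the project's actual pipeline.

```python
from datetime import datetime

# Hypothetical rows in the Cellspan layout shown above:
# (oid, starttime, endtime, person_oid, celltower_oid)
rows = [
    (1097401, "7/26/2004 8:58:14 PM", "7/26/2004 8:58:34 PM", 29, 38),
    (1097402, "7/26/2004 8:58:34 PM", "7/26/2004 8:59:37 PM", 29, 42),
]

FMT = "%m/%d/%Y %I:%M:%S %p"

def cell_sequence(rows, person):
    """Order one user's records by start time and keep (time, cell) pairs."""
    recs = [r for r in rows if r[3] == person]
    recs.sort(key=lambda r: datetime.strptime(r[1], FMT))
    return [(datetime.strptime(r[1], FMT), r[4]) for r in recs]

print(cell_sequence(rows, 29))  # [(datetime(...), 38), (datetime(...), 42)]
```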

Input Data (For Services Prediction)

Callspan:
Oid    | Starttime       | Endtime         | person_oid | Description   | Direction | Duration (s)
370982 | 8/3/04 7:07 PM  | 8/3/04 7:07 PM  | 29         | Packet Data   | Outgoing  | 0
370986 | 8/3/04 7:07 PM  | 8/3/04 7:07 PM  | 29         | Packet Data   | Outgoing  | 0
370987 | 8/5/04 4:40 PM  | 8/5/04 4:52 PM  | 29         | Voice Call    | Outgoing  | 718
370991 | 8/6/04 10:37 PM | 8/6/04 10:37 PM | 29         | Short Message | Incoming  | 0
370992 | 8/6/04 11:06 PM | 8/6/04 11:08 PM | 29         | Voice Call    | Outgoing  | 149
370993 | 8/6/04 11:53 PM | 8/6/04 11:54 PM | 29         | Voice Call    | Outgoing  | 15
371003 | 8/7/04 2:15 AM  | 8/7/04 2:15 AM  | 29         | Voice Call    | Outgoing  | 5
371004 | 8/7/04 2:18 AM  | 8/7/04 2:20 AM  | 29         | Voice Call    | Outgoing  | 119
371005 | 8/7/04 2:23 AM  | 8/7/04 2:23 AM  | 29         | Short Message | Incoming  | 0
371006 | 8/7/04 4:06 AM  | 8/7/04 4:06 AM  | 29         | Short Message | Outgoing  | 0

Training
- One month of data for training; the following month as the test set.
- Cluster cells with the same semantics.
- 5-cell history.
- One-minute resolution.
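A sketch of how this training setup might be realized, under the assumption that a user's data is already a time-ordered list of (timestamp, cell) pairs, as produced above; `make_examples` and `split_by_month` are hypothetical helpers, not the author's code.

```python
def make_examples(sequence, history=5):
    """Each example: the last `history` visited cells -> the next cell."""
    cells = [c for _, c in sequence]
    return [(tuple(cells[i - history:i]), cells[i])
            for i in range(history, len(cells))]

def split_by_month(sequence, train_month, test_month):
    """One month of data for training, the following month for testing."""
    train = [(t, c) for t, c in sequence if t.month == train_month]
    test = [(t, c) for t, c in sequence if t.month == test_month]
    return make_examples(train), make_examples(test)
```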

Compare Results with Other Neural Network Models

Model               | Hidden nodes | Epochs / No. of samples | Cell history | Prediction accuracy (exact)
Bayes 1             | 15           | 3400                    | 5            | 24%
Bayes 2             | 25           | 2000                    | 5            | 10%
Bayes 3             | 15           | 2000                    | 0            | 12%
Bayes 4             | 15           | 1000                    | 5            | 52%
Bayes 5             | 25           | 600                     | 5            | 47%
Bayes 6             | 15           | 600                     | 0            | 16%
Resilient 1         | 15           | 25000                   | 5            | 0.5%
Resilient 2         | 15           | 250000                  | 5            | 1%
Levenberg-Marquardt | 15           | 25000                   | 5            | 0%
One Step Secant     | 15           | 500                     | 5            | 0%
Elman / RP train    | 15           | 500                     | 0            | 1%

Vary Number of Hidden Nodes

Hidden nodes | Prediction accuracy (exact)
15           | 45%
25           | 47%

No big difference!

Vary Number of Samples

No. of samples | Prediction accuracy
500            | 37%
1000           | 44%
1500           | 46%
2000           | 48%
2500           | 50%
3000           | 51%
3500           | 53%

Results show that prediction accuracy generally increases with the number of samples generated.

Prediction Accuracy for Weekends vs. Weekdays

Mon | Tues | Wed | Thurs | Fri | Sat | Sun
37% | 49%  | 58% | 61%   | 66% | 61% | 39%

Days toward the end of the week are the most predictable, in contrast to the start of the week, where prediction accuracy is low.

Test Prediction Accuracy over a Wider Window

Window    | Prediction accuracy
1st month | 52%
2nd month | 56%
3rd month | 44%

Results show that prediction accuracy remains roughly the same over the three months, which strengthens the idea that humans tend to repeat their behavior.

Page 6 Neighbor Cells
- We examined the effect of paging nearby cells when the system does not find the user in the predicted cell (Bayes 4 model; a sketch of the scheme follows below).

Window    | Prediction accuracy (exact) | Paging 6 neighbor cells
1st month | 52%                         | 69%
2nd month | 56%                         | 70%
3rd month | 44%                         | 60%

Results are very promising: prediction accuracy rises to about 65% on average, compared with checking only the single predicted cell.
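A minimal sketch of this paging fallback, assuming a hexagonal layout where each cell has six neighbors; `neighbors` and `page` are hypothetical stand-ins for the network's topology map and paging primitive, not part of the proposed system's actual code.

```python
def locate(user, predicted_cell, neighbors, page):
    """Page the predicted cell first; on a miss, page its 6 neighbors."""
    if page(user, predicted_cell):          # hit: only 1 cell paged
        return predicted_cell
    for cell in neighbors[predicted_cell]:  # miss: try the 6 adjacent cells
        if page(user, cell):
            return cell
    return None  # fall back to the normal network-wide paging procedure
```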

Services Prediction

Starttime            | Endtime              | Service       | Direction
8/3/2004 7:07:26 PM  | 8/3/2004 7:07:26 PM  | Packet Data   | Outgoing
8/3/2004 7:07:26 PM  | 8/3/2004 7:07:26 PM  | Packet Data   | Outgoing
8/3/2004 4:26:53 PM  | 8/3/2004 4:37:52 PM  | Voice Call    | Outgoing
8/3/2004 7:07:26 PM  | 8/3/2004 7:07:26 PM  | Packet Data   | Outgoing
8/5/2004 4:40:39 PM  | 8/5/2004 4:52:37 PM  | Voice Call    | Outgoing
8/5/2004 9:02:30 PM  | 8/5/2004 9:02:34 PM  | Voice Call    | Outgoing
8/6/2004 9:39:05 PM  | 8/6/2004 9:39:56 PM  | Voice Call    | Outgoing
8/6/2004 10:37:18 PM | 8/6/2004 10:37:18 PM | Short Message | Incoming

Window    | Service prediction accuracy
1st month | 48%
2nd month | 62%
3rd month | 93%

Thank You! Questions?

Finding the Posterior Distribution
- The posterior distribution for the model parameters given the observed data is found by combining the prior distribution with the likelihood for the parameters given the data. This is done using Bayes' Rule:

  Posterior = (Prior x Likelihood) / Evidence

- The denominator is just the required normalizing constant (the evidence of the model) and can often be filled in at the end, if necessary, so we can rewrite:

  P(parameters | data) ∝ P(parameters) P(data | parameters)
  Posterior ∝ Prior x Likelihood

- We make predictions by integrating with respect to the posterior (see the formulas below).
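In LaTeX, with $w$ the network weights and $D$ the observed data, the two statements above read:

```latex
% Bayes' rule for the weights, and the predictive distribution obtained
% by integrating over the posterior:
P(w \mid D) = \frac{P(w)\, P(D \mid w)}{P(D)}
\qquad \text{posterior} \propto \text{prior} \times \text{likelihood}

P(y^{*} \mid x^{*}, D) = \int P(y^{*} \mid x^{*}, w)\, P(w \mid D)\, dw
```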

The Computational Challenge
- A big challenge in making Bayesian modeling work is computing the posterior distribution.
- We have to draw random samples and average over those samples; it is important that these samples represent the true posterior distribution.
- Markov chain Monte Carlo (MCMC) sampling methods are used for this (a minimal sampler sketch follows below).
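A minimal random-walk Metropolis sampler, one of the simplest MCMC methods; this is a generic illustrative sketch, not the MCMCstuff toolbox used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_post, theta0, steps=10_000, scale=0.5):
    """Random-walk Metropolis: a Markov chain whose stationary
    distribution is the posterior defined by `log_post` (unnormalized)."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = []
    for _ in range(steps):
        prop = theta + scale * rng.normal(size=theta.shape)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio); the normalizing
        # constant P(data) cancels, which is why MCMC never needs it.
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples[steps // 2:])  # discard the first half as burn-in

# Example: sample a standard 2-D Gaussian "posterior".
draws = metropolis(lambda th: -0.5 * np.sum(th ** 2), np.zeros(2))
```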

MCMCStuff Toolbox
- The MCMCstuff toolbox is a collection of Matlab functions for Bayesian inference with Markov chain Monte Carlo (MCMC) methods.
- Some of the most computationally critical parts have been coded in C for faster computation.
- It provides different MCMC sampling methods, such as Metropolis-Hastings sampling, hybrid Monte Carlo sampling, Gibbs sampling, and reversible jump Markov chain Monte Carlo sampling.

The Proposed Model
Network Sub-System
- The network subsystem resides at the core of the wireless network.
- A Bayesian neural network is used as the users' movement prediction model.
- Historical information is used to train the model.
- It predicts the next cell where the user is expected to be.
- The same model can also be used to predict, in advance, the services the users are expected to use.

The Proposed Model
Mobile Host Sub-System
- An application installed on every mobile phone.
- Its purpose is to check whether the cell predicted by the network subsystem matches the actual cell.
- If the actual movement of the user agrees with the predicted scenario, nothing is done. Otherwise, the mobile station initiates a location update to the core network, since the predicted cell is not the actual cell. A minimal sketch follows below.
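A sketch of the handset-side check, with `send_location_update` as a hypothetical stand-in for the standard location update procedure; the logic simply mirrors the rule described above.

```python
def on_cell_change(predicted_cell, current_cell, send_location_update):
    """Report a location update only when the prediction misses."""
    if current_cell == predicted_cell:
        return                              # prediction correct: no signaling
    send_location_update(current_cell)      # miss: fall back to normal update
```

The signaling saving comes from the silent branch: every correct prediction is one location update the network never has to process.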

Input Data (Person Characteristics)

Characteristic              | Value
Phonenumber_oid             | 264
survey_Sick_Recently        | No
survey_Position             | Student
survey_Travel               | Often - week/month
Survey_Data_Plan            | Unlimited
survey_Provider             | T-mobile
Survey_Calling_plan         | National
survey_Minutes              | 1000*
survey_Texts                | Very often
Survey_Like_Intros          | Occasionally
survey_ML_Community         | Very close
survey_Neighborhood         | Boston
survey_Hours                | 10am-8pm*
survey_Regular              | Somewhat
Survey_Hangouts             | Restaurant/bar; friends
survey_Predictable_life     | Very
survey_Forget_phone         | Never
survey_Run_out_of_batteries | Rarely once/month
survey_How_often_get_sick   | Occasionally (2-4 times a year)

Experiments
- Compare results from the Bayesian neural network with results from traditional neural networks (backpropagation learning): accuracy, speed, and complexity.
- Vary the number of hidden nodes.
- Vary the number of samples generated by Markov chain Monte Carlo (MCMC) methods.
- Prediction accuracy for weekends vs. weekdays.
- Test prediction accuracy over a wider window.
- Page 6 neighbor cells.
- Services prediction.

Wireless Networks
- Wireless mobile networks are flexible data communication systems that use wireless media to transmit and receive data over the air, minimizing the need for wired connections.
- They provide mobile users with access to services and resources while roaming:
  - Location management
  - Handoff management
- Different types and architectures:
  - Wireless cellular networks (2G, 3G)
  - Ad hoc networks
  - WiMAX
  - Mobile IP

Neural Networks in Movement Prediction

1) Prediction-based location management using multilayer neural networks
- Predicts the future location of a mobile object in wireless networks.
- Model with 3 layers (input-hidden-output); the number of neurons is chosen through experimentation or by trial.
- Example input: {p1, p2, p3} = {(d1, ds1), (d2, ds2), (d3, ds3)} = {(North, 2), (East, 1), (East, 3)}, where d is the direction of movement and ds is the distance traveled.
- The output is the next direction and distance traveled.
- Results: 93% prediction accuracy for uniform movement, 40% to 70% for regular movement, and 2% to 30% for random movement patterns.

2) A User Pattern Learning Strategy for Managing Users' Mobility in UMTS Networks
- Presents a user pattern learning strategy using neural networks to reduce the location update signaling cost by increasing the intelligence of the location procedure in 3G cellular networks.
- An artificial neural network model learns the movement patterns of roaming users.
- The strategy associates with each user a list of cells where the user is likely to be, with a given probability, in each time interval.
- Users are divided into three categories:
  - Users who have a very high probability of being where the system expects them to be.
  - Users who have a certain likelihood of being where the system expects them to be.
  - Users whose position at a given moment is unpredictable.

3) Person Movement Prediction Using Neural Networks
- Location prediction of person movements in an office building; next-location prediction accuracy of 92%.
- The NN model is composed of three layers (input-hidden-output):
  - The number of input neurons depends on the room history length, which is optimally 2.
  - The number of neurons in the hidden layer is N+1, where N is the number of neurons in the input layer.
  - The output layer predicts the next location of the user.
- Prediction was also tested with other models: Bayesian networks, Markov models, hidden Markov models.
  - No model outperforms the others in all prediction cases.
  - Future research direction: use a hybrid model to improve prediction accuracy.

4) The Prediction of Bus Arrival Time Using Automatic Vehicle Location Systems Data
- Develops and applies a model to predict bus arrival time using Automatic Vehicle Location data.
- Identifies the prediction interval of bus arrival time and the probability of a bus being on time.
- Inputs to the neural network: bus arrival time, dwell time, schedule adherence, and traffic congestion.
- The optimal number of neurons in the hidden layer is 15, according to experiments.
- The output of the neural net is the predicted bus arrival time at the following stops.

Bayesian Belief Networks in Movement Prediction

1) Managing Uncertainty: Modeling Users in Location-Tracking Applications
- Presents a user model based on the user's characteristics and preferences to facilitate more precise estimation of the user's location.
- The technique can cut the required bandwidth in half without losing any precision in location estimation compared with standard models.
- Location prediction is done using a Bayesian belief network; the model is tested on predicting the location of taxicabs.
- Variables used in the model:
  - Temporal variables represent when events occur: day-month-year.
  - Spatial variables represent possible user locations: town, highway, building.
  - Environmental variables represent things such as weather conditions, road conditions, and special events.
  - Behavioral variables represent things such as typical speeds, resting patterns, preferred work areas, and common reactions in certain situations.
- The proposed user model takes all these variables and builds a set of causality relationships among them.

2) Opportunity Knocks: a System to Provide Cognitive Assistance with Transportation Services
- The system aids people with cognitive disabilities in using public transportation.
- The proposed system is based on a hierarchical dynamic Bayesian network that models time:
  - The top level estimates the current goal.
  - The middle layer represents segments of a trip and the mode of transportation.
  - The lowest layer estimates the person's location on the street map.

3) Modeling Transportation Routines using Dynamic Mixed Networks
- Describes the application of hybrid dynamic mixed networks to a real-world problem of inferring the car travel activity of individuals.
- A hybrid dynamic mixed network is an extension of a dynamic Bayesian network that includes both discrete and continuous variables in the model.
- The major query in the model is to predict where a traveler is likely to go, and what his or her route to the destination is likely to be, given the current location of the traveler's car.
- Extends the location prediction model with higher-level variables, such as time of day and day of week, affecting the user's goal.

Movement Prediction
- Prediction attempts to form patterns from historical data that permit predicting the next events given the available input data.
- We are creatures of habit:
  - We follow regular routines.
  - We move with a destination in mind.
- Location prediction based on historical movements:
  - Intelligent decisions
  - Better bandwidth utilization
  - Reduced update cost
- Services prediction:
  - Quality of service


Ad-Hoc Networks
- A self-configuring network of mobile routers and associated hosts.
- Distributed architecture:
  - No centralized access point.
  - Nodes may suddenly disappear from, or show up in, the network.
- Dynamic network topology:
  - The routers are free to move randomly and organize themselves arbitrarily.
- Predict the future network topology based on the movement patterns of nodes.

WiMAX (Worldwide Interoperability for Microwave Access)
- Description:
  - A wireless technology that provides high-throughput broadband connections over long distances: up to 50 km at 70 Mbit/s.
- 802.16e (full mobility):
  - Location management
  - Handover management
  - Next-cell prediction

Mobile IP
- Provides an efficient, scalable mechanism for node mobility within the Internet.
- Nodes may change their point of attachment to the Internet without changing their IP address.
- Predict the next point of attachment based on the movement history of the node.

Representing the Prior and Posterior Distributions by Samples
- The complex distributions we often use as priors, or obtain as posteriors, may not be easily represented or understood using formulas.
- A very general technique is to represent a distribution by a sample of many values drawn randomly from it. We can then:
  - Visualize the distribution by viewing these sample values, or low-dimensional projections of them.
  - Make Monte Carlo estimates of probabilities or expectations with respect to the distribution by taking averages over these sample values (see the snippet below).
- Obtaining a sample from the prior is often easy. Obtaining a sample from the posterior is usually more difficult, but this is nevertheless the dominant approach to Bayesian computation.
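A small illustration of such Monte Carlo estimates, using a stand-in sample from a known distribution in place of real posterior draws.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)  # stand-in sample

# Any expectation or probability becomes a plain average over the sample.
e_f = np.mean(samples ** 2)    # Monte Carlo estimate of E[theta^2]
p_pos = np.mean(samples > 0)   # Monte Carlo estimate of P(theta > 0)
```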

Priors
- Objective priors: non-informative priors that attempt to capture ignorance.
- Subjective priors: should capture our beliefs as well as possible; they are subjective but not arbitrary.
- Hierarchical priors: multiple levels of priors.
- Empirical priors: learn some of the parameters of the prior from the data ("Empirical Bayes").

The Challenge of Specifying Models and Priors
- The first challenge in making the Bayesian approach work is to choose a suitable model and prior. This can be especially difficult for the complex, high-dimensional problems that are traditional in machine learning.
  - A suitable model should encompass all the possibilities that are thought to be at all likely. Unrealistically limited forms of functions (e.g., linear) or distributions should be avoided.
  - A suitable prior should avoid giving zero or tiny probability to real possibilities, but should also avoid spreading the probability over all possibilities, however unrealistic.
- Unfortunately, the effort needed to do a good job can easily get out of hand. One strategy is to introduce latent variables into the model and hyperparameters into the prior; both are devices for modeling dependencies in a tractable way.

Bayesian Methodology*
- We formulate our knowledge about the situation probabilistically:
  - We define a model that expresses qualitative aspects of our knowledge (e.g., forms of distributions, independence assumptions). The model will have some unknown parameters.
  - We specify a prior probability distribution for these unknown parameters that expresses our beliefs about which values are more or less likely, before seeing the data.
- We gather data.
- We compute the posterior probability distribution for the parameters, given the observed data.
- We use this posterior distribution to make predictions by averaging over the posterior distribution.

*Slides extracted from: Radford Neal. Tutorial: Bayesian Methods for Machine Learning. Neural Information Processing Systems Conference, 2004.

Inference at a Higher Level: Model Comparison
- So far, we've assumed we were able to start by making a definite choice of model. What if we're unsure which model is right?
- We can compare models based on the marginal likelihood (the evidence) for each model, which is the probability the model assigns to the observed data. This is the normalizing constant in Bayes' Rule that we previously ignored (the denominator in the Bayes formula; see below).
- Here, M1 represents the condition that model M1 is the correct one (which previously we just assumed). Similarly, we can compute P(data | M2) for some other model (which may have a different parameter space).
- We might choose the model that gives higher probability to the data, or average predictions from both models with weights based on their marginal likelihoods, multiplied by any prior preference we have for M1 versus M2.
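Written out, with $\theta$ denoting the parameters of a model, the marginal likelihood and the resulting model weighting are:

```latex
% Marginal likelihood (evidence) of model M_1: the parameters are
% integrated out, so the quantity scores the model class as a whole.
P(D \mid M_1) = \int P(D \mid \theta, M_1)\, P(\theta \mid M_1)\, d\theta

% Posterior model probabilities used for comparison or averaging:
P(M_1 \mid D) \propto P(M_1)\, P(D \mid M_1)
```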

Model Comparison
- Compare model classes, e.g. m and m', using their posterior probabilities given the data:
  - Model classes that are too simple are unlikely to generate the data set.
  - Model classes that are too complex can generate many possible data sets, so again they are unlikely to generate that particular data set at random.
- We can also base predictions on several models rather than just one (weighted by their posterior probabilities).

Hyperparameters
- The priors we give to weights, such as w_ij ~ Gaussian(0, σ_u²), affect the nature of functions drawn from the network prior.
- [Figure: samples of three functions (of one input) drawn using Gaussian priors for networks of 1000 hidden units, with two different values of σ_u.] A larger σ_u produces "wigglier" functions.
- Usually we won't know exactly how wiggly the function should be, so we make σ_u a variable hyperparameter and give it a prior distribution that spans a few orders of magnitude. A small sketch of such prior draws follows below.
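A sketch of drawing functions from this network prior, following the construction described above; the 1/sqrt(n_hidden) scaling of the output weights is an assumption added here so the output variance stays finite as the hidden layer grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior_function(x, n_hidden=1000, sigma_u=1.0):
    """Draw one random function from the prior w ~ Gaussian(0, sigma_u^2):
    a larger sigma_u produces a 'wigglier' function, as the slide notes."""
    u = rng.normal(scale=sigma_u, size=n_hidden)                  # input-to-hidden
    b = rng.normal(scale=sigma_u, size=n_hidden)                  # hidden biases
    v = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=n_hidden)  # hidden-to-output
    return np.tanh(np.outer(x, u) + b) @ v

x = np.linspace(-2, 2, 200)
smooth = sample_prior_function(x, sigma_u=1.0)    # gently varying function
wiggly = sample_prior_function(x, sigma_u=10.0)   # much wigglier function
```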

Markov Chain Monte Carlo Sampling Methods
- In MCMC, samples are generated using a Markov chain that has the desired posterior distribution as its stationary distribution.
- The strategy is to start with arbitrary values θ, let the Markov chain run until it has practically reached convergence, say after T iterations, and use the next k observed values of the chain as an approximate posterior sample A = {θ1, θ2, ..., θk}.
- The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error.
- Sampling methods:
  - Metropolis-Hastings sampling
  - Hybrid Monte Carlo sampling
  - Gibbs sampling
  - Reversible jump Markov chain Monte Carlo sampling
- MCMCStuff toolbox in Matlab.