Mobile User Movement Prediction Using Bayesian Learning for Neural Networks
Presented by: Sherif Akoush, American University in Cairo
Map Middle East 2007
Outline
Overview
Motivations
Objectives
Background Information
The Proposed System
Experiments
Questions
Overview
Humans tend to act (move) according to some patterns. Several machine learning techniques are available that can extract patterns from historical data (Neural Networks, Bayesian Belief Networks). In wireless networks, users must be able to access services while roaming.
Location Management Handoff Management
Motivations
One of the most critical issues in wireless networks is mobility management (location and handoff management). How predictable are our lives? Extracting users' movement patterns with a powerful machine learning technique can aid in location management.
Objectives
Apply Bayesian learning for Neural Networks to movement prediction: a hybrid machine learning model, used for the first time in movement prediction. Build a solid inference model that can be used for location management in wireless cellular networks, enabling efficient (intelligent) location management.
Services prediction
Intelligent Decisions Better Bandwidth Utilization Reducing Update Cost
Quality of Service
Apply for other wireless networks (Ad-Hoc , Mobile IP, WiMax)
Wireless Cellular Networks
MSC acts like a normal switching node, and in addition provides all the functionality needed to handle a mobile subscriber, including registration, authentication, location updating, inter-MSC handovers, and call routing to a roaming subscriber.
HLR contains all the administrative information of each subscriber registered in the corresponding GSM network, along with the current location of the subscriber.
VLR contains selected administrative information from the HLR, necessary for call control and provision of the subscribed services, for each mobile currently located in the geographical area controlled by the VLR.
BTS houses the radio transceivers that define a cell and handles the radio interface protocols with the mobile station. BSC manages the radio interface channels (setup, teardown, frequency hopping, etc.) as well as handovers.
The OMC provides remote monitoring of the network performance and permits remote reconfiguration and fault management activity as well as alarm and event monitoring.
(Map example: a user moving between cells covering Bur Dubai and Deira, Dubai.)
Based on movement prediction, check directly the cell in which the user is expected to be.
Neural Networks
Neural networks are powerful models capable of learning (by example) relationships in complex data sets. A neural network may be considered a black box that produces output (a prediction) for a given input. How the output depends on the input is controlled by several parameters (weights) that are adjusted in the learning process of the network. The learning process adjusts these weights to minimize the error between the expected (target) output and the actual output for each input. We set the model parameters such as number of hidden nodes, connections between the nodes, initial values of weights, learning rates, weight decay, and so on. After learning, the network should be able to generalize.
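To make the weight-adjustment idea concrete, here is a minimal sketch of a one-hidden-layer network trained by gradient descent on a toy problem (the XOR data, layer sizes, and learning rate are all illustrative, not the deck's model):

```python
import numpy as np

# Toy data: learn y = x1 XOR x2 from four examples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(0, 1, (2, 8))    # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))    # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5                         # learning rate

for epoch in range(5000):
    # Forward pass: compute the network's output for the inputs.
    h = np.tanh(X @ W1 + b1)
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    # Backward pass: gradients of the squared error w.r.t. the weights.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    # Adjust the weights to reduce the target-vs-actual error.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # after training, typically close to [0, 1, 1, 0]
```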
Problems in Neural Networks
The choice of model parameters is essentially arbitrary
Trial-and-error and heuristic procedures
No meaningful semantics that we could compare with our beliefs. Over-fitting. No model comparison. A single weight vector. No confidence measure for the output.
Bayesian learning for Neural Networks
Objective choice of the type of weight decay terms or regularizers
A Bayesian MLP theoretically returns all possible solutions and integrates them out (averages over them): a solution to the over-fitting and under-fitting problems. The prior represents our belief about the weight parameters before seeing the data. Use of hyperparameters. Infer what the free parameters might be, given the data. Confidence intervals for the results (error bars). Objective model comparison.
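A sketch of what "averaging over all solutions" means in practice, assuming `weight_samples` holds weight vectors already drawn from the posterior by an MCMC sampler (covered later); the function names are hypothetical:

```python
import numpy as np

def mlp_predict(x, w):
    """Forward pass of a small MLP for one posterior weight sample w."""
    W1, b1, W2, b2 = w
    return np.tanh(x @ W1 + b1) @ W2 + b2

def bayesian_prediction(x, weight_samples):
    """Average the predictions of every sampled network, not just one."""
    preds = np.array([mlp_predict(x, w) for w in weight_samples])
    mean = preds.mean(axis=0)   # the Bayesian (averaged) prediction
    std = preds.std(axis=0)     # spread across samples -> error bars
    return mean, std
```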
The Proposed Model
Experiments
Input Data
Data from the Reality Mining project at MIT. This dataset contains over 500,000 hours (~60 years) of continuous data on daily human behavior. It has already been used by researchers in a wide range of fields (including epidemiology, sociology, physics, artificial intelligence, and organizational behavior). The dataset includes call logs, Bluetooth devices in proximity, cell tower IDs, application usage, and phone status (such as charging and idle).
Input Data (For Location Prediction)

Cellspan
Oid | Starttime | Endtime | person_oid | celltower_oid
1097401 | 7/26/2004 8:58:14 PM | 7/26/2004 8:58:34 PM | 29 | 38
1097402 | 7/26/2004 8:58:34 PM | 7/26/2004 8:59:37 PM | 29 | 42
1097403 | 7/26/2004 8:59:37 PM | 7/26/2004 9:00:13 PM | 29 | 40
1097404 | 7/26/2004 9:00:13 PM | 7/26/2004 9:01:34 PM | 29 | 1552

Cellname
Oid | Name | person_oid | celltower_oid
643 | ML | 29 | 3393
644 | Office | 29 | 19290
647 | Greg's apt | 29 | 4377
648 | Jon's apartment | 29 | 3442
Input Data (For Services Prediction)

Callspan
oid | starttime | endtime | person_oid | description | direction | duration
370982 | 8/3/04 7:07 PM | 8/3/04 7:07 PM | 29 | Packet Data | Outgoing | 0
370986 | 8/3/04 7:07 PM | 8/3/04 7:07 PM | 29 | Packet Data | Outgoing | 0
370987 | 8/5/04 4:40 PM | 8/5/04 4:52 PM | 29 | Voice Call | Outgoing | 718
370991 | 8/6/04 10:37 PM | 8/6/04 10:37 PM | 29 | Short Message | Incoming | 0
370992 | 8/6/04 11:06 PM | 8/6/04 11:08 PM | 29 | Voice Call | Outgoing | 149
370993 | 8/6/04 11:53 PM | 8/6/04 11:54 PM | 29 | Voice Call | Outgoing | 15
371003 | 8/7/04 2:15 AM | 8/7/04 2:15 AM | 29 | Voice Call | Outgoing | 5
371004 | 8/7/04 2:18 AM | 8/7/04 2:20 AM | 29 | Voice Call | Outgoing | 119
371005 | 8/7/04 2:23 AM | 8/7/04 2:23 AM | 29 | Short Message | Incoming | 0
371006 | 8/7/04 4:06 AM | 8/7/04 4:06 AM | 29 | Short Message | Outgoing | 0
Training
One month for training; the following month as the test set. Cluster cells with the same semantics. A history of 5 cells. Minute resolution. (Building such training pairs is sketched below.)
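A sketch, under the stated setup, of how such training pairs might be built: the input is the 5 previous (clustered) cell IDs plus the minute-of-day, and the target is the next cell. Names are illustrative, not from the original experiments:

```python
import numpy as np

def build_examples(cell_sequence, minutes, history=5):
    """Turn a chronological cell-tower log into (features, target) pairs.

    cell_sequence: clustered cell IDs, in time order.
    minutes: minute-of-day of each entry (the minute resolution above).
    history: number of previous cells used as input (5 here).
    """
    X, y = [], []
    for t in range(history, len(cell_sequence)):
        X.append(list(cell_sequence[t - history:t]) + [minutes[t]])
        y.append(cell_sequence[t])   # the next cell is the target
    return np.array(X), np.array(y)

# Split by timestamp: one month for training, the following month as test.
```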
Compare Results With Other Neural Network Models

Model | Hidden nodes | Epochs / No. of Samples | Cell History | Prediction Accuracy (exact)
Bayes 1 | 15 | 3400 | 5 | 24%
Bayes 2 | 25 | 2000 | 5 | 10%
Bayes 3 | 15 | 2000 | 0 | 12%
Bayes 4 | 15 | 1000 | 5 | 52%
Bayes 5 | 25 | 600 | 5 | 47%
Bayes 6 | 15 | 600 | 0 | 16%
Resilient 1 | 15 | 25000 | 5 | 0.5%
Resilient 2 | 15 | 250000 | 5 | 1%
Levenberg-Marquardt | 15 | 25000 | 5 | 0%
One Step Secant | 15 | 500 | 5 | 0%
Elman / RP train | 15 | 500 | 0 | 1%
Vary Number of Hidden Nodes

Hidden nodes | Prediction Accuracy (Exact)
15 | 45%
25 | 47%

No big difference!
Vary Number of Samples

No. of Samples | Prediction Accuracy
500 | 37%
1000 | 44%
1500 | 46%
2000 | 48%
2500 | 50%
3000 | 51%
3500 | 53%
Results show that prediction accuracy generally increases with the number of samples generated.
Prediction Accuracy for Weekends vs. Weekdays

Mon | Tues | Wed | Thurs | Fri | Sat | Sun
37% | 49% | 58% | 61% | 66% | 61% | 39%
Days towards the end of the week are the most predictable, in contrast to the start of the week where prediction accuracy is low.
Test prediction accuracy for a wider window

Window | Prediction
1st Month | 52%
2nd Month | 56%
3rd Month | 44%
Results show that prediction accuracy remains almost the same over the 3 months, which strengthens the idea that humans tend to repeat their behaviors.
Paging 6 Neighbor Cells
We examined the outcome of searching nearby cells when the system does not find the user in the predicted cell.
Bayes 4
Window | Prediction Accuracy (exact) | Paging 6 Neighbor Cells
1st Month | 52% | 69%
2nd Month | 56% | 70%
3rd Month | 44% | 60%
Results are very promising: prediction accuracy rises to about 66% on average when the 6 neighbor cells are also paged, compared with checking only the one predicted cell. (The two-step paging strategy is sketched below.)
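A sketch of the implied two-step lookup; `page` and `neighbors_of` are hypothetical stand-ins for the network's paging primitives, not actual GSM APIs:

```python
def locate_user(predicted_cell, neighbors_of, page):
    """Page the predicted cell first; fall back to its 6 neighbors.

    page(cells) pages a set of cells and returns the cell where the
    user answered, or None if the user was not found there.
    """
    found = page([predicted_cell])
    if found is not None:
        return found                            # exact prediction hit
    return page(neighbors_of(predicted_cell))   # the 6 surrounding cells
```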
Services Prediction

Starttime | Endtime | Service | Direction
8/3/2004 7:07:26 PM | 8/3/2004 7:07:26 PM | Packet Data | Outgoing
8/3/2004 7:07:26 PM | 8/3/2004 7:07:26 PM | Packet Data | Outgoing
8/3/2004 4:26:53 PM | 8/3/2004 4:37:52 PM | Voice Call | Outgoing
8/3/2004 7:07:26 PM | 8/3/2004 7:07:26 PM | Packet Data | Outgoing
8/5/2004 4:40:39 PM | 8/5/2004 4:52:37 PM | Voice Call | Outgoing
8/5/2004 9:02:30 PM | 8/5/2004 9:02:34 PM | Voice Call | Outgoing
8/6/2004 9:39:05 PM | 8/6/2004 9:39:56 PM | Voice Call | Outgoing
8/6/2004 10:37:18 PM | 8/6/2004 10:37:18 PM | Short Message | Incoming

Window | Service Prediction
1st Month | 48%
2nd Month | 62%
3rd Month | 93%
Thank You Questions?
Finding the Posterior Distribution
The posterior distribution for the model parameters given the observed data is found by combining the prior distribution with the likelihood for the parameters given the data. This is done using Bayes' Rule:

P(parameters | data) = P(parameters) × P(data | parameters) / P(data)

Posterior = (Prior × Likelihood) / Evidence
Finding the Posterior Distribution
The denominator is just the required normalizing constant, and can often be filled in at the end if necessary (it is the evidence for the model), so we can write:

P(parameters | data) ∝ P(parameters) × P(data | parameters)

Posterior ∝ Prior × Likelihood
We make predictions by integrating with respect to the posterior:
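Writing θ for the parameters, this predictive integral has the standard form:

$$P(\text{new data} \mid \text{data}) = \int P(\text{new data} \mid \theta)\, P(\theta \mid \text{data})\, d\theta$$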
The Computational Challenge
A big challenge in making Bayesian modeling work is computing the posterior distribution. We have to draw random samples and average over those samples.
It is important that these samples represent the true posterior distribution. Markov chain Monte Carlo (MCMC) sampling methods are used for this.
MCMCStuff Toolbox
The MCMCstuff toolbox is a collection of Matlab functions for Bayesian inference with Markov chain Monte Carlo (MCMC) methods. Some of the most computationally critical parts have been coded in C for faster computation. It provides different sampling methods to implement MCMC, such as Metropolis-Hastings sampling, hybrid Monte Carlo sampling, Gibbs sampling, and reversible jump Markov chain Monte Carlo sampling.
The Proposed Model
Network Sub-System.
The network subsystem resides at the core of the wireless network. A Bayesian Neural Network is used as the users' movement prediction model. Historical information is used to train the model, which predicts the next cell where the user is expected to be. We can also use this model to predict, in advance, the services the users are expected to use.
The Proposed Model
Mobile Host Sub-System.
An application that is installed on every mobile phone. Its purpose is to check whether the predicted cell (predicted by the network subsystem) is the same as the actual cell. If the actual movement of the user agrees with the predicted scenario, nothing is done. Otherwise, the mobile station initiates a location update to the core network, since the predicted cell is not the same as the actual cell. (This check is sketched below.)
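A minimal sketch of this check on the handset, with `send_location_update` standing in for the phone's normal registration procedure (all names are illustrative):

```python
def on_cell_change(actual_cell, predicted_cell, send_location_update):
    """Mobile-host subsystem: compare the network's prediction to reality."""
    if actual_cell == predicted_cell:
        return                            # prediction correct: stay silent
    send_location_update(actual_cell)     # mismatch: normal location update
```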
Input Data

Person Characteristic | Value
Phonenumber_oid | 264
survey_Sick_Recently | No
survey_Position | Student
survey_Travel | Often - week/month
Survey_Data_Plan | Unlimited
survey_Provider | T-mobile
Survey_Calling_plan | National
survey_Minutes | 1000*
survey_Texts | Very often
Survey_Like_Intros | Occasionally
survey_ML_Community | Very close
survey_Neighborhood | Boston
survey_Hours | 10am-8pm*
survey_Regular | Somewhat
Survey_Hangouts | Restaurant/bar; friends
survey_Predictable_life | Very
survey_Forget_phone | Never
survey_Run_out_of_batteries | Rarely once/month
survey_How_often_get_sick | occasionally (2-4 times a year)
Experiments
Compare results from the Bayesian Neural Network with results from traditional neural networks (backpropagation learning): accuracy, speed, and complexity. Vary the number of hidden nodes. Vary the number of samples generated by Markov chain Monte Carlo (MCMC) methods. Prediction accuracy for weekends against weekdays. Test prediction accuracy for a wider window. Paging 6 neighbor cells. Services prediction.
Wireless Networks
Wireless mobile networks are flexible data communication systems that use wireless media to transmit and receive data over the air, minimizing the need for wired connections. They must provide mobile users with access to services and resources while roaming: location management and handoff management. Different types and architectures exist: Wireless Cellular Networks (2G, 3G), Ad-Hoc networks, WiMax, and Mobile IP.
Neural Networks in Movement Prediction
1) Prediction-based location management using multilayer neural networks
Predict the future location of the mobile object in wireless networks.
Model with 3 layers (input-hidden-output); the number of neurons is chosen through experimentation and trial-and-error. Each input is a pair (d, ds), where d is the direction of movement and ds is the distance traveled; the output is the next direction and distance traveled. Example input: {p1, p2, p3} = {(d1, ds1), (d2, ds2), (d3, ds3)} = {(North, 2), (East, 1), (East, 3)}.
Results: 93% prediction accuracy for uniform movement, 40% to 70% for regular movement, and 2% to 30% for random movement patterns.
2) A User Pattern Learning Strategy for Managing Users’ Mobility in UMTS Networks
Presents a user pattern learning strategy using neural networks to reduce the location update signaling cost by increasing the intelligence of the location procedure in 3G cellular networks. An Artificial Neural Network model learns the movement patterns of roaming users. The strategy associates with each user a list of cells where he is likely to be, with a given probability, in each time interval. Users are divided into three categories:
Users who have a very high probability of being where the system expects them to be. Users who have a certain likelihood of being where the system expects them to be. Users whose position at a given moment is unpredictable.
3) Person Movement Prediction Using Neural Networks
Location prediction of person movements in an office building, with a next-location prediction accuracy of 92%. The NN model is composed of three layers (input-hidden-output). The number of input neurons depends on the room history length, which is optimally 2. The number of neurons in the hidden layer is N+1, where N is the number of neurons in the input layer. The output layer predicts the next location of the user. Prediction was tested against other models: Bayesian Networks, Markov Models, Hidden Markov Models. No model outperforms the others in all prediction cases. Future research direction: use a hybrid model to improve prediction accuracy.
4) The Prediction of Bus Arrival Time Using Automatic Vehicle Location Systems Data
Develops and applies a model to predict bus arrival time using Automatic Vehicle Location data, identifying the prediction interval of bus arrival time and the probability of a bus being on time. Inputs to the neural network are: bus arrival time, dwell time, schedule adherence, and traffic congestion. The optimal number of neurons in the hidden layer is 15 according to experiments. The output of the neural net is the predicted bus arrival time at the following stops.
Bayesian Belief Networks in Movement Prediction
1) Managing Uncertainty: Modeling Users in Location-Tracking Applications
This paper presents a user model based on the user's characteristics and preferences, to facilitate more precise estimation of the user's location. The technique can cut the required bandwidth in half without losing any precision in location estimation, compared with standard models. The location prediction is done using a Bayesian Belief Network, and the model is tested by predicting the location of taxicabs. Variables used in the model:
Temporal variables represent when events occur: daymonth-year. Spatial variables represent possible user's locations: town-highway-building. Environmental variables represent things such as weather conditions, road conditions, and special events. Behavioral variables represent things such as typical speeds, resting patterns, preferred work areas, and common reactions in certain situations.
The user model proposed takes all these variables and builds a set of causality relationships among them.
2) Opportunity Knocks: a System to Provide Cognitive Assistance with Transportation Services
This system aids people with below-average cognitive abilities in using public transportation. The proposed system is based on a Hierarchical Dynamic Bayesian Network that models time at three levels: the top level estimates the current goal; the middle layer represents segments of a trip and the mode of transportation; the lowest layer estimates the person's location on the street map.
3) Modeling Transportation routines using Dynamic Mixed Networks
This research describes the application of Hybrid Dynamic Mixed Networks to a real-world problem of inferring car travel activity of individuals.
A Hybrid Dynamic Mixed Network is an extension of a Dynamic Bayesian Network that includes both discrete and continuous variables in the model.
The major query in the model is to predict where a traveler is likely to go and what his/her route to the destination is likely to be, given the current location of the traveler's car. It extends the location prediction model with higher-level variables, such as time of day and day of week, that affect the user's goal.
Movement Prediction
Prediction attempts to form patterns from historical data that permit predicting the next events given the available input data. We are creatures of habit:
We follow regular routines. We move with a destination in mind. Location prediction is based on historical movements.
Intelligent Decisions Better Bandwidth Utilization Reducing Update Cost
Services Prediction
Quality of Service
Ad-Hoc Network
A self-configuring network of mobile routers and associated hosts, with a distributed architecture and no centralized access point. Nodes might suddenly disappear from, or show up in, the network (dynamic network topology), and the routers are free to move randomly and organize themselves arbitrarily. Here we can predict the future network topology based on the movement patterns of nodes.
WiMax (Worldwide Interoperability for Microwave Access )
802.16e (Full Mobility)
Wireless technology that provides high-throughput broadband connections over long distances: up to 50 km at 70 Mbit/s
Location Management
Handover Management
Next Cell Prediction
Mobile IP
Provides an efficient, scalable mechanism for node mobility within the Internet: nodes may change their point of attachment to the Internet without changing their IP address. Here we can predict the next point of attachment based on the movement history of the node.
Representing the Prior and Posterior Distributions by Samples
The complex distributions we will often use as priors, or obtain as posteriors, may not be easily represented or understood using formulas. A very general technique is to represent a distribution by a sample of many values drawn randomly from it. We can then: visualize the distribution by viewing these sample values, or low-dimensional projections of them; make Monte Carlo estimates for probabilities or expectations with respect to the distribution, by taking averages over these sample values. Obtaining a sample from the prior is often easy. Obtaining a sample from the posterior is usually more difficult, but this is nevertheless the dominant approach to Bayesian computation.
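A tiny illustration of such a Monte Carlo estimate, using a distribution we can sample from directly:

```python
import numpy as np

# Estimate E[theta^2] under N(0, 1) by averaging over random samples;
# the true value is 1.
samples = np.random.default_rng(1).normal(0.0, 1.0, size=100_000)
print(np.mean(samples ** 2))   # prints a value close to 1.0
```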
Priors
Objective Priors: non-informative priors that attempt to capture ignorance.
Subjective Priors: priors should capture our beliefs as well as possible. They are subjective but not arbitrary.
Hierarchical Priors: multiple levels of priors.
Empirical Priors: learn some of the parameters of the prior from the data ("Empirical Bayes").
The Challenge of Specifying Models and Priors
The first challenge in making the Bayesian approach work is to choose a suitable model and prior. This can be especially difficult for the complex, high-dimensional problems that are traditional in machine learning. A suitable model should encompass all the possibilities that are thought to be at all likely. Unrealistically limited forms of functions (eg, linear) or distributions should be avoided. A suitable prior should avoid giving zero or tiny probability to real possibilities, but should also avoid spreading out the probability over all possibilities, however unrealistic. Unfortunately, the effort in doing a good job can easily get out of hand. One strategy is to introduce latent variables into the model, and hyperparameters into the prior. Both of these are devices for modeling dependencies in a tractable way.
Bayesian Methodology*
We formulate our knowledge about the situation probabilistically: We define a model that expresses qualitative aspects of our knowledge (eg, forms of distributions, independence assumptions). The model will have some unknown parameters. We specify a prior probability distribution for these unknown parameters that expresses our beliefs about which values are more or less likely, before seeing the data. We gather data. We compute the posterior probability distribution for the parameters, given the observed data. We use this posterior distribution to: Make predictions by averaging over the posterior distribution.
*Slides extracted from : Radford Neal. Tutorial: Bayesian Methods for Machine Learning. Neural Information Processing Systems Conference- 2004.
Inference at a Higher Level: Model Comparison
So far, we've assumed we were able to start by making a definite choice of model. What if we're unsure which model is right? We can compare models based on the marginal likelihood (the evidence) for each model, which is the probability the model assigns to the observed data. This is the normalizing constant in Bayes' Rule that we previously ignored (the denominator in Bayes formula):
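In standard form, this marginal likelihood integrates the likelihood against the prior over the model's parameter space:

$$P(\text{data} \mid M_1) = \int P(\text{data} \mid \theta, M_1)\, P(\theta \mid M_1)\, d\theta$$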
Here, M1 represents the condition that model M1 is the correct one (which previously we just assumed). Similarly, we can compute P(data | M2), for some other model (which may have a different parameter space). We might choose the model that gives higher probability to the data, or average predictions from both models with weights based on their marginal likelihood, multiplied by any prior preference we have for M1 versus M2.
Model Comparison
Compare model classes, e.g. m and m’, using posterior probabilities given Data
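The comparison uses the standard posterior over model classes:

$$P(m \mid \text{Data}) = \frac{P(\text{Data} \mid m)\, P(m)}{\sum_{m'} P(\text{Data} \mid m')\, P(m')}$$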
Model classes that are too simple are unlikely to generate the data set. Model classes that are too complex can generate many possible data sets, so again, they are unlikely to generate that particular data set at random.
We can also base predictions on several models, not just one, weighted by their posterior probabilities.
Hyperparameters
The priors we give to weights, such as w_ij ~ Gaussian(0, σ_u²), affect the nature of functions drawn from the network prior. Sampling functions (of one input) from the priors of networks with 1000 hidden units, for two different values of σ_u, shows that a larger σ_u produces "wigglier" functions. Usually we won't know exactly how wiggly the function should be, so we make σ_u a variable hyperparameter and give it a prior distribution that spans a few orders of magnitude. (Sampling from such a prior is sketched below.)
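A sketch of drawing such functions from the network prior; the 1/√(number of hidden units) scaling of the output weights is a common convention assumed here, not taken from the slides:

```python
import numpy as np

def sample_prior_function(xs, n_hidden=1000, sigma_u=1.0, seed=None):
    """Draw one random function from the prior of a 1-input MLP.

    sigma_u sets the input-to-hidden weight scale and hence how
    'wiggly' the sampled function is; output weights are scaled by
    1/sqrt(n_hidden) so the output stays O(1) as the network grows.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, sigma_u, (1, n_hidden))
    b1 = rng.normal(0, sigma_u, n_hidden)
    W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, 1))
    return (np.tanh(xs[:, None] @ W1 + b1) @ W2).ravel()

xs = np.linspace(-3, 3, 200)
smooth = sample_prior_function(xs, sigma_u=0.5, seed=1)  # smoother draw
wiggly = sample_prior_function(xs, sigma_u=5.0, seed=1)  # wigglier draw
```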
Markov Chain Monte Carlo Sampling Methods
In the MCMC, samples are generated using a Markov chain that has the desired posterior distribution as its stationary distribution. The strategy is to start with arbitrary values θ, let the Markov chain run until it has practically reached convergence, say after T iterations, and use the next k observed values of the chain as an approximate posterior sample A = { θ1, θ2..., θk}. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. Metropolis-Hastings sampling Hybrid Monte Carlo sampling Gibbs sampling Reversible jump Markov chain Monte Carlo sampling MCMCStuff toolbox in Matlab