Churn models for prepaid customers in the cellular telecommunication ...

Expert Systems with Applications 37 (2010) 4710–4712


Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Short Communication

Churn models for prepaid customers in the cellular telecommunication industry using large data marts

Marcin Owczarczuk

Institute of Econometrics, Warsaw School of Economics, Al. Niepodleglosci 164, 02-554 Warsaw, Poland

Article info

Keywords: Churn prediction, Retention, Wireless, Cellular, CRM

Abstract

In this article, we test the usefulness of popular data mining models for predicting churn among the clients of a Polish cellular telecommunication company. Compared with previous studies on this topic, our research is novel in the following areas: (1) we deal with prepaid clients (previous studies dealt with postpaid clients), who are far more likely to churn, are less stable, and about whom much less is known (no application, demographic or personal data); (2) we have 1381 potential variables derived from the clients' usage (previous studies dealt with data with at most tens of variables); and (3) we test the stability of the models across time for all the percentiles of the lift curve: our test sample was collected six months after the estimation of the model. The main finding of our research is that linear models, especially logistic regression, are a very good choice when modelling churn of prepaid clients. Decision trees are unstable in the high percentiles of the lift curve, and we do not recommend their usage.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. The need for churn models

In telecommunication companies, the retention of customers is one of the key activities of CRM (customer relationship management) departments. CRM actions are based on direct communication with the customer, for example via SMS or a direct call. In this communication, certain services are offered in order to make the customer stay. The following SMS is an illustrative example: "make a recharge of at least 30 PLN during the next 7 days and you will receive an additional 10 PLN for calls".¹ When the offer is accepted, the company bears a certain cost associated with the bonus, but also earns the profit generated by the recharge. In addition, the "life" of the customer is extended by tens of days, which is the usual time needed to spend the recharge and the bonus.

Of course, a natural question arises: which clients should be the target group of such marketing actions? These actions should not be addressed to loyal customers who would make a recharge anyway, because that generates only the loss associated with the bonus. In contrast, customers who are likely to churn may change their mind after receiving such a message. So it is important to predict which customers are likely to churn in the near future and to address the marketing message only to them.

The problem of churn prediction, regardless of the economic sector, is well documented; see, for example, Ngai, Xiu, and Chau (2009) for an overview. As far as churn in the cellular telecommunication industry is concerned, see, for example, Pendharkar (2009), Wei and Chiu (2002), and Hung, Yen, and Wang (2006). In these papers, data is collected on contractual customers. This sector is called postpaid. Churn there is well defined: if the client wants to churn, he or she has to sign a proper document, usually a month in advance. Also, much is known about such customers: personal data such as age, gender and address. We also have information from their call detail records (CDR), and we may derive additional variables from the CDR, such as average minutes of usage.

1.2. Prepaid customers

In this article, we deal with a different type of client, namely prepaid. In our opinion, modelling prepaid churn is far more challenging. Prepaid clients do not sign any contract and are anonymous, so we do not have any personal data about them. All that is known is their tariff and their usage derived from the CDR. Prepaid customers do not pay a monthly subscription fee and their usage is less regular. We also do not have a strict definition of churn. Of course, there is the notion of the expiration of the SIM card, but in our opinion, it is not a good definition of churn. This problem is described in the next subsection, and the description is based on the Polish cellular telecommunication market.

1.3. Churn definition

E-mail address: [email protected]

¹ PLN is the abbreviation of the Polish currency unit.

0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.11.083

M. Owczarczuk / Expert Systems with Applications 37 (2010) 4710–4712

When a prepaid customer makes a recharge, he or she is able to make outgoing calls during a certain period of time. In Poland, when the value of the recharge equals 30 PLN, this period is usually 1 month (or 30 days, depending on the brand). After that period, the customer is able only to receive calls, again for a certain period of time (usually 365 days, depending on the brand and the value of the recharge). After that period, the SIM card is deactivated. Of course, until the deactivation, the customer may make a recharge, and the expiration date is then extended.

Let us analyze the following example: a client makes a recharge, spends its value during the same day and throws the SIM card away. Under the above regulations, the telecommunication company deactivates the SIM card one year after the real churn. Retention actions are usually addressed shortly before the expected date of churn, so relying on the date of expiration can be very ineffective.

So it is crucial to have a proper definition. In this study, we use the following: "the client churned if he or she had a 6-week period without incoming and outgoing calls". The moment of churn is the beginning of this period. Because the marketing message should be sent shortly before the client churns, we predict whether a client churns four weeks after the moment of analysis. In other words, we want to predict, four weeks in advance, the occurrence of six weeks of inactivity.

This definition is the result of a separate analysis and we discuss it here only briefly. The churn definition should allow fast verification: we want to wait as short a time as possible to find out that the client has really stopped using the service. The definition should also be certain: we want to be sure that after this period of inactivity, the client makes no call and receives no call until the SIM card deactivation. Six weeks is a compromise between these two goals: clients who did not use the service for six weeks rarely made a call after that period, whereas many clients become active again after three, four or five weeks of inactivity.
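The definition above can be turned into a labelling rule. The following is a minimal Python sketch and not the procedure used in the study: the function names, the day-level granularity, and the reading of "churns four weeks after the moment of analysis" as "a silent period begins within the four weeks after the analysis date" are all assumptions made for illustration.

```python
from datetime import date, timedelta

INACTIVITY_DAYS = 6 * 7  # six weeks without incoming or outgoing calls
HORIZON_DAYS = 4 * 7     # prediction horizon after the analysis date (assumed reading)


def silent_period_starts(activity_dates: list[date], observed_until: date) -> list[date]:
    """Return the first day of every silent period lasting at least six weeks.

    activity_dates: days on which the client made or received at least one call.
    observed_until: last day for which call records are available.
    """
    days = sorted(set(activity_dates))
    starts = []
    for i, day in enumerate(days):
        if i + 1 < len(days):
            # Silent days strictly between two consecutive active days.
            silent_days = (days[i + 1] - day).days - 1
        else:
            # Silent days after the last active day, up to the end of the records.
            silent_days = (observed_until - day).days
        if silent_days >= INACTIVITY_DAYS:
            starts.append(day + timedelta(days=1))
    return starts


def churn_label(activity_dates: list[date], analysis_date: date, observed_until: date) -> int:
    """1 if a six-week silent period begins within four weeks after analysis_date, else 0."""
    window_end = analysis_date + timedelta(days=HORIZON_DAYS)
    starts = silent_period_starts(activity_dates, observed_until)
    return int(any(analysis_date < start <= window_end for start in starts))
```

In this sketch the label can only be computed once call records are available for at least ten weeks after the analysis date (four weeks of horizon plus six weeks of verification), which is still far shorter than waiting for the SIM card to expire.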
In addition, if we wanted to rely on the date of expiration, we could use only outdated data about clients, because we would have to wait 365 days until the expiration to find out which clients really churned in order to calculate the dependent variable for the models. The telecommunication sector changes very quickly and such a long delay is unacceptable.

1.4. Data mart

Previous studies on churn prediction in the cellular telecommunication market, for example Pendharkar (2009), Wei and Chiu (2002), and Hung et al. (2006), used data with a relatively small number of explanatory variables. We are surprised how little was known about the customers. In our study, there are 1381 variables. All of them are derived from the CDR, tariffs and components. Variables associated with components usually represent the presence or absence of packages that the clients may activate, for example a package of cheaper calls to selected clients. In comparison with previous studies, our variables derived from the CDR data are far more detailed. For example, we collect data about overall minutes of usage, but also minutes of usage split by day of the week (working days and weekend) and time of call (morning, midday, night). We also collected data about the "dispersion" of calls: we measured how many calls were made to the most frequent number and received from the most frequent number. Our data was gathered from one of the Polish mobile operators in 2007 and 2008.

2. Churn modelling

In our study, we used the following models: logistic regression, linear regression, Fisher linear discriminant analysis and decision trees. The grounds for this choice are as follows: we want to use interpretable models that give an understanding of what the reason for (or at least a symptom of) churn is. We are aware of black-box models like random forests or support vector machines, but we argue that their usage is improper when predicting churn. Linear models like regression or Fisher discriminant analysis have a simple interpretation: a positive coefficient of a variable suggests that larger values of this feature are symptoms of churn. Decision trees also have a clear interpretation, which can be expressed in terms of what-if rules.

Interpretable models are also much easier to debug, which is very important when using such a large data mart. Our data mart is a general-purpose mart, not necessarily churn-oriented. So it is very easy to accidentally include irrelevant variables (like the client's identifier) or to treat variables in an improper way, for example using categorical variables as if they were numerical (for example, the identifier of the client's SIM card status, which is coded as a numerical variable with a few levels). When the model is interpretable and selects only a small subset of significant features, such errors are easy to detect. Also, mistakes made during the data mart generation phase (like a lack of certain attributes for certain clients, or abnormal values) may be easily detected when the model uses only a small subset of the variables and does so in a clear way.

2.1. Data

Our data set consists of the train sample (85,274 observations), the calibration sample (36,824 observations) and the test sample (45,497 observations). Data in the train sample and the calibration sample come from a dataset collected at the same time, which was then split randomly into the train and calibration parts. The test sample was collected six months after the train and calibration samples.

2.2. Models

Since applying regression models and Fisher discriminant analysis directly to such a large data set may be difficult (long computation time, possible numerical instability due to collinearity), we applied the following preliminary variable selection (calculated on the training set).
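As a rough illustration of the screening step described in the next paragraph (a two-sample t-test per variable, keeping the variables with the largest absolute t statistic), a minimal pure-Python sketch might look as follows. The function names, the dictionary-of-columns layout and the use of the pooled-variance form of the statistic are assumptions for illustration; the text does not say which software or which variant of the test was used.

```python
import math
from statistics import mean, variance


def t_statistic(churners: list[float], non_churners: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance.

    Each group needs at least two observations.
    """
    n1, n0 = len(churners), len(non_churners)
    diff = mean(churners) - mean(non_churners)
    pooled = ((n1 - 1) * variance(churners)
              + (n0 - 1) * variance(non_churners)) / (n1 + n0 - 2)
    if pooled == 0:
        # Both groups are constant: no difference, or a perfectly separating one.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n0))


def select_top_variables(columns: dict[str, list[float]],
                         is_churner: list[int], k: int = 50) -> list[str]:
    """Return the names of the k variables with the largest |t|.

    columns maps a variable name to its values, one per client, in the same
    order as the 0/1 flags in is_churner.
    """
    scores = {}
    for name, values in columns.items():
        churners = [v for v, y in zip(values, is_churner) if y == 1]
        others = [v for v, y in zip(values, is_churner) if y == 0]
        scores[name] = abs(t_statistic(churners, others))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because each variable is scored on its own, the cost grows only linearly with the number of columns, which is what makes such a screen practical on a data mart with more than a thousand variables. A purely univariate screen does not remove collinear variables, however, which is presumably why further selection is applied afterwards.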
To each variable, Student's t-test was applied. The null hypothesis states that the means of the particular variable among churners and non-churners do not differ. The alternative hypothesis states that there is a significant difference between these two means. Variables that are potentially interesting when modelling churn should therefore show a significant difference between these means. So we selected the 50 variables with the highest absolute values of the t statistic and used these 50 variables in the regression models. We used full regression and regression with stepwise, forward and backward selection based on the Wald test, applied to the previously selected 50 variables.

As far as decision trees are concerned, we used two versions: one with all 1381 variables (decision trees are computationally fast, so it is possible to estimate such models) and one with the 50 best variables according to the t statistic, as in the regression approach.

2.3. Results

We tested our models using lift curves, which measure the ratio of the fraction of churners in the top deciles of the score generated by the models to the fraction of churners in the whole population (lifts are expressed as factors, not as percentages). All the linear models had similar performance, regardless of the additional variable selection method (stepwise, backward, forward, none), but logistic regression was slightly better than linear regression and Fisher discriminant analysis; therefore, in this article, we present results only for logistic regression with stepwise selection. Applying the preliminary variable selection to decision trees gives results similar to those of the full decision tree, so we present only decision trees with the

preliminary variable selection. The results for the train dataset are presented in Fig. 1, for the calibration dataset in Fig. 2 and for the test dataset in Fig. 3.

[Figs. 1–3 are lift-curve plots. Horizontal axis: quantile (0.0 to 1.0); vertical axis: lift (0 to 15); legend: o = decision tree, + = logistic regression.]

Fig. 1. Lift curves for train sample.

Fig. 2. Lift curves for calibration sample.

Fig. 3. Lift curves for test sample.

We may observe that all the models give similar results for very low quantiles in all data sets. This corresponds to the situation in which the target group of the marketing campaign is large. The key difference can be observed for high and medium quantiles. High quantiles correspond to the situation in which a really small group of clients should be selected for the marketing campaign.

As far as the calibration sample is concerned, logistic regression (and all the linear models) outperforms decision trees. However, these differences are significant only for medium quantiles. This may suggest that all the models have similar performance in the short term and are valid for the period in which the model is built.

The key question is whether the models are stable and valid when applied to datasets collected long after the models were built. When comparing results on the test dataset, we may observe that the lift curve of the decision trees is much lower than the lift curve of the logistic regression for high quantiles. For the logistic regression, the shape of the lift curve is similar for the calibration and test samples. For the decision tree, the shape of the lift curve differs between the calibration and test samples; for the test sample, it is non-monotonic. This suggests that linear models are more stable, which is a very important aspect of retention programs. Models that age very quickly need frequent updates, which is time- and cost-consuming.

The poor results for the decision trees may be easily explained. Trees use rules of the form "if $x_i < C_i$ then $p(\mathrm{churn}) = q$". The telecommunication market in Poland evolves very quickly: calls are cheaper and cheaper, so clients make more and more calls. New services are introduced, some components become very common, and others become obsolete. So the distribution of the variables shifts in time. When fixed splits "$x_i < C_i$" are used, more and more clients start to fulfill these conditions or their negations. As a result, more and more clients fall into certain leaves of the tree and the prediction becomes less precise. In contrast, when constructing the lift curve for the linear models, observations are sorted by the score, which is in fact a linear combination of the variables and has the form "$a_1 x_1 + \cdots + a_k x_k$". So the shift in distribution generates a shift in the score, but the sorting does not change and the lift curve remains stable.

3. Summary, conclusions and direction for future work

In this article, we evaluated the usefulness of regression and decision tree approaches to the problem of modelling churn in the prepaid sector of a cellular telecommunication company. Linear models are more stable than decision trees, which age quickly: their performance weakens over time, especially in the top deciles of the score. Nevertheless, we showed that prepaid churn can be effectively predicted using a large data mart.

As far as future work is concerned, it would be interesting to model churn in the sector that lies somewhere between postpaid and prepaid, the mix sector. Mix clients have to sign a contract, and personal data is available for them, as for postpaid customers. In addition, they make recharges, which makes them similar to prepaid customers.

References

Hung, S. Y., Yen, D. C., & Wang, H. Y. (2006). Applying data mining to telecom churn management. Expert Systems with Applications, 31, 515–524.

Ngai, E. W. T., Xiu, L., & Chau, D. C. K. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36, 2592–2602.

Pendharkar, P. C. (2009). Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Systems with Applications, 36, 6714–6720.

Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: A data mining approach. Expert Systems with Applications, 23, 103–112.