Learning from Experience, Simply

by Song Lin, Juanjuan Zhang, and John R. Hauser

June 1, 2012

Song Lin is a PhD Candidate, MIT Sloan School of Management, Massachusetts Institute of Technology, E62-580, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 225-1639, [email protected].

Juanjuan Zhang is an Associate Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, E62-537, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 452-2790, [email protected].

John R. Hauser is the Kirin Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, E62-538, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 253-2929, [email protected].

Abstract

There is substantial interest in the marketing literature in modeling and estimating consumer-learning dynamics. However, (approximately) optimal solutions to forward-looking learning problems are computationally complex, limiting their empirical applicability and behavioral plausibility. Drawing on theories of cognitive simplicity from marketing, psychology, and economics, we propose a behaviorally intuitive (and tractable) solution – index strategies. We argue that index strategies balance thinking costs and discounted utility. Index strategies also avoid exponential growth in computational complexity as the size of the decision problem increases, enabling researchers to study learning models in more complex situations. The existence of index strategies depends upon a structural property called indexability, which is hard to establish in general. We prove the indexability of canonical consumer learning models in which both brand quality and future utility shocks are uncertain. We establish invariance properties which make index strategies feasible for consumers to intuit. Using synthetic data, we demonstrate that index strategies achieve nearly optimal utility at low computational costs. Using IRI data for a product category where we expect forward-looking learning, we find that an index-strategy model fits behavior well, provides plausible parameter estimates, predicts out-of-sample as well as or better than alternative models, and requires substantially lower computational costs.

Keywords: dynamic consumer learning, structural models, cognitive simplicity, index strategies, heuristics, multi-armed bandit problems, restless bandits, indexability


1. Introduction and Motivation

Considerable effort in marketing is devoted to modeling and estimating the dynamics by which consumers learn from their consumption experience (e.g., Roberts and Urban 1988; Erdem and Keane 1996; Erdem et al. 2005; Narayanan and Manchanda 2009; Ching and Ishihara 2010; Ching et al. 2011). Researchers have developed theory-rich models of optimizing forward-looking consumers who balance exploitation (choosing the brand that yields the highest current reward) with exploration (trying brands to gather information so that future consumption experiences might improve). Pillars of these models are an explicitly specified description of consumer utility and an explicitly specified process by which consumers learn. Most models assume consumers choose brands by solving a dynamic program which maximizes expected total utility taking learning into account. Researchers argue that theory-based models are more likely to uncover insight and be invariant for new-domain policy simulations (Chintagunta et al. 2006, p. 604). However, these advantages often come at the expense of difficult and time-consuming solution methods.

The dynamic programs for forward-looking learning models are, themselves, extremely difficult to solve optimally. We cite evidence below that the problems are PSPACE-hard – at least as hard as any problem solvable with polynomial space (i.e., computation memory). PSPACE-hard implies the more-familiar notion of NP-hard. This intractability presents both practical and theoretical challenges. Practically, problem difficulty requires researchers to rely on approximate solutions. Without explicit comparisons to the optimal solution, we do not know the impact of the approximations on estimation results. Moreover, the well-known "curse of dimensionality" prevents researchers from investigating problems with moderate or large numbers of brands or marketing variables, for which even approximate solutions are not feasible. Theoretically, it is reasonable to


posit that a consumer cannot optimally solve in his or her head a dynamic problem that the best computers cannot solve. In contrast, well-developed theories in marketing, psychology, and economics suggest that observed consumer decision rules are cognitively simple (e.g., Payne et al. 1988, 1993; Gigerenzer and Goldstein 1996).

We propose and evaluate an alternative theory of forward-looking consumers' solutions to dynamic learning problems – index strategies. We retain the basic pillars of structural modeling: an explicit description of consumer utility, an explicit Bayesian learning process, and an assumption that consumers seek to optimize expected discounted utility. We posit in addition a cost to thinking (e.g., Shugan 1980; Johnson and Payne 1985). We assume the consumer chooses a strategy that is likely to optimize expected total utility minus thinking costs. While thinking costs might be observable in the laboratory, say through response latency, they are inherently unobservable in vivo. To posit index strategies we identify domains where index strategies are nearly optimal in the reduced problem of maximizing expected discounted utility. If, in such domains, index strategies are substantially simpler for the consumer to implement, then it is likely that savings in thinking costs exceed the slight reduction in optimality in the reduced problem. In the special cases where index strategies are optimal in the reduced problem, we argue they are superior as a description of forward-looking consumers. Following the same logic, we also establish conditions where myopic learning strategies suffice.
To prove the viability of index strategies as a descriptive model of consumers we must (1) establish when well-defined index strategies exist, (2) provide intuition that they are cognitively simple and behaviorally intuitive (and hence might be used by consumers), (3) investigate when index strategies are optimal or near optimal solutions to the reduced problem of utility maximization even if there were no thinking costs, and (4) test whether index strategies explain


observed consumer behavior at least as well as alternative models. We address #1 analytically by proving the "indexability" property of canonical forward-looking learning models. (Indexability is hard to establish in general.) We address #2 by examining the form and properties of index strategies and arguing they are cognitively simple and behaviorally intuitive relative to current solution methods. We address #3 with both analytical arguments and synthetic data. We address #4 by estimating alternative models using IRI data on the purchase of diapers, a product category where we expect to see forward-looking learning.

We begin with a review of concepts upon which we build. Our basic hypothesis is that consumers use a cognitively-simple index strategy. We demonstrate viability by showing that the "Whittle index" (to be defined later) is one candidate that satisfies the four criteria. Figure 1 is a conceptual summary of our hypothesis. The concept of an index solution could apply to other indices (or approximations to the Whittle index) if such indices are shown to be better descriptions of observed consumer behavior. Future papers might evaluate such indices on the four criteria and compare their performance to the Whittle index.

[Insert Figure 1 about here.]

2. Related Literatures

We draw on concepts from four literatures: learning dynamics, cognitive simplicity, descriptive models based on optimal solutions to reduced problems, and bandit problems.

2.1. Learning Dynamics

Many influential papers study consumer learning dynamics and apply learning models to explain or forecast consumer choices. Using data from automotive consumers, Roberts and Urban (1988) estimate a model in which consumers use Bayesian learning to integrate information from a variety of sources to resolve uncertainty about "brand quality." They, like subsequent authors,


define brand quality generally, either by a multi-attributed utility function or by a match of a brand's features to the consumer's needs. Erdem and Keane (1996) build upon the concept of Bayesian learning to include forward-looking consumers who trade off exploitation with exploration. For frequently purchased goods, their model fits data better than a purely myopic model (a reduced form of Guadagni and Little 1983) and as well as the Roberts-Urban myopic-learning model. These papers stimulated a line of research that estimates the dynamics of consumer learning – for a recent review see Ching et al. (2011). Some models retain myopic consumers with Bayesian learning (e.g., Narayanan et al. 2005; Mehta et al. 2008; Chintagunta et al. 2009; Narayanan and Manchanda 2009; Ching and Ishihara 2010), while others explicitly model forward-looking consumers (e.g., Ackerberg 2003; Crawford and Shum 2005; Erdem et al. 2005, 2008; Kim et al. 2010).[1] Because forward-looking learning problems are computationally intractable, many applications estimate models based on myopic learning (Narayanan and Manchanda 2009; Ching and Ishihara 2010). In general, forward-looking models fit empirical data well but have not improved prediction much relative to myopic-learning models. However, when the theory is accurately descriptive, the more-complex models should improve policy simulations. Because the forward-looking assumption requires consumers to solve computationally hard dynamic problems, some authors have suggested that "the future development of structural models in marketing will focus on the interface between economics and psychology" (Chintagunta et al. 2006, p. 614).

2.2. Cognitive Simplicity

[Footnote 1: Kim et al. (2010) model consumers' sequential search of products. The search problem can be seen as a forward-looking learning problem in which a consumer's product "experience" completely reveals product value.]

Parallel literatures in marketing, psychology, and economics provide evidence that consumers use decision rules that are cognitively simple. In marketing, Bettman et al. (1998) and

Payne et al. (1988, 1993) present evidence that consumers use simple heuristic decision rules to evaluate products. For example, under time pressure, consumers often use conjunctive rules (require a few "must have" features) rather than linear-utility-based rules. Using simulated thinking costs with "elementary information processes," Johnson and Payne (1985) illustrate how heuristic decision rules can be rational when balancing utility and thinking costs. Methods to estimate the parameters of cognitively-simple decision rules vary, but, in general, such rules predict as well as or better than linear utility (e.g., Bröder 2000; Gilbride and Allenby 2004; Kohli and Jedidi 2007; Yee et al. 2007; Hauser et al. 2010).

Building on Simon's (1955, 1956) theory of bounded rationality, researchers in psychology argue that human beings use cognitively simple rules that are "fast and frugal" (e.g., Gigerenzer and Goldstein 1996; Martignon and Hoffrage 2002). Fast-and-frugal rules evolve when consumers learn decision rules from experience. Consumers continue to use the decision rules because they lead to good outcomes in familiar environments (Goldstein and Gigerenzer 2002). For example, when judging the size of cities, "take the best" often leads to better judgments than a linear rule.[2] Indeed, in 2010-2011 two issues of Judgment and Decision Making were devoted to the recognition heuristic alone (e.g., Marewski et al. 2010). Related concepts include accessibility (e.g., Bruner 1957), fluency (e.g., Jacoby and Dallas 1981), and availability (e.g., Tversky and Kahneman 1973).

[Footnote 2: The take-the-best rule is, simply: if you recognize one city and not the other, the recognized city is likely larger; if you recognize both, use the most diagnostic feature to make the choice.]

The costly nature of cognition has also received attention in economics (see Camerer 2003 for a review). A line of research looks to extend or revise standard dynamic decision-making models with the explicit recognition that cognition is costly. For example, Gabaix and Laibson (2000) empirically test a behavioral solution to decision-tree problems – decision-

makers actively eliminate low-probability branches to simplify the task. Gabaix et al. (2006) develop a "directed cognition model," in which a decision-maker acts as if he/she has only one more opportunity to search. In the laboratory, the directed cognition model explains subjects' solutions to a simple dynamic problem better than a standard search model that assumes costless cognition.

Cognitive process mechanisms are still being debated in the marketing, psychology, and economics literatures. Our hypothesis that consumers use index or index-like strategies needs only the observation that consumers favor decision rules that are cognitively simple and that such rules often lead to very good outcomes. The cognitive-simplicity hypothesis assumes that consumers trade off utility gains against thinking costs, but does not require explicit measurement of thinking costs.

2.3. Descriptive Models Based on Optimal Solutions to Reduced Problems

If a ball player wants to catch a ball that is already high in the air and traveling directly toward the player, then all the player need do is gaze upon the ball, start running, and adjust his/her speed to maintain a constant gaze angle with the ball (Hutchison and Gigerenzer 2005, p. 102).[3] The gaze heuristic is an example where a cognitively simple rule accomplishes a task that might otherwise involve solving difficult differential equations. But the principle is more general: complex optimization problems often have simple solutions. Suppose a consumer knows the utilities and prices of a set of durable goods and wishes to choose the maximum-utility set subject to a budget constraint. If the brands were infinitely divisible, then simple "greedy" solutions lead to the optimal allocation: choose brands in order of either utility/price or utility − λ×price (where λ is the shadow price of the budget constraint).

[Footnote 3: Professional athletes use more-complicated heuristics that give them greater range, for example, in baseball, pre-positioning based on prior tendencies and the expected pitch, and the sound as the bat hits the ball.]
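As an illustration, a minimal sketch of the utility/price greedy rule for the indivisible case (choose in descending utility/price order until the budget is spent); the utilities, prices, and budget below are hypothetical illustration values, not estimates from this paper.

```python
# Greedy budget-allocation sketch. Hypothetical inputs for illustration only.

def greedy_allocation(utilities, prices, budget):
    """Choose brands in descending utility/price order until the budget is spent."""
    ranked = sorted(range(len(utilities)),
                    key=lambda j: utilities[j] / prices[j], reverse=True)
    chosen, spent = [], 0.0
    for j in ranked:
        if spent + prices[j] <= budget:   # skip brands that no longer fit the budget
            chosen.append(j)
            spent += prices[j]
    return chosen, spent

utilities = [9.0, 7.0, 4.0, 3.0]   # hypothetical utilities
prices = [3.0, 2.0, 2.0, 1.0]      # hypothetical prices
chosen, spent = greedy_allocation(utilities, prices, budget=5.0)
print(chosen, spent)
```

With indivisible brands this greedy selection is a heuristic rather than the optimum, which is exactly the point made below about its descriptive (not normative) value.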

Continue choosing until the budget is exhausted. When brands are not infinitely divisible, neither heuristic is optimal, but both greedy heuristics provide excellent (and empirically indistinguishable) descriptions of consumer behavior (Hauser and Urban 1986). Greedy heuristics are one justification for a utility specification that is linear in price. Another heuristic, choosing the information source with the maximum gain in value per unit time while looking only one step ahead, explains well how consumers search for automobiles (Hauser et al. 1993). There are many examples in psychology and marketing where seemingly simple decision rules solve more-complex problems. We argue in §4 and §5 that some index strategies are simple decision rules that solve complex dynamic optimization problems.

2.4. Bandit Problems

The multi-armed bandit problem is a prototypical problem that illustrates the fundamental tradeoff between exploration and exploitation in sequential decision making under uncertainty. In a bandit problem the consumer faces a finite number of choices, each of which has an uncertain value. The consumer must make choices, observe outcomes, and update beliefs with a sequential decision rule to maximize expected discounted values. First formulated by the British in World War II, the problem had no known simple solution for over thirty years. Then Gittins and Jones (1974) demonstrated a simple index solution – develop an index for each "arm" (each choice alternative) by solving a sub-problem that involves only that arm, then choose the arm with the largest index. Gittins and Jones (1974) proved the surprising result that the index solution is the optimal solution to the classic bandit problem whenever the non-chosen choice alternatives do not change over time.[4] When non-chosen choice alternatives change over time, say due to random shocks,

[Footnote 4: The Gittins index has been successfully applied in a variety of fields. For example, Hauser et al. (2009) apply the Gittins index to derive optimal "website morphing" strategies that match website design with customers' cognitive styles. Recently morphing was applied to AT&T's banner advertising on CNET.]

Gittins' index is no longer guaranteed to be optimal. Such problems are known as "restless bandits" (Whittle 1988) and, in general, are computationally intractable (Papadimitriou and Tsitsiklis 1999). In his seminal paper, Whittle (1988) proposed a tractable heuristic solution grounded on the optimal solution to a relaxed problem. His solution generalizes Gittins' index such that the problem can be solved optimally or near optimally by associating an index (referred to as Whittle's index) separately with each alternative and choosing the alternative with the largest index. This index solution reduces an exponentially complex, intractable problem to a set of one-dimensional problems. The existence of well-defined index solutions relies on a structural property called indexability, which is not guaranteed for all restless bandit problems. We show that the canonical forward-looking learning problem is indexable. We also show that the index of an alternative is a simple function of key parameters pertaining to this brand, including means and variances of brand quality, quality beliefs, and utility shocks. We then argue that it is reasonable for the consumer to intuit how an index varies as a function of these parameters.

3. Canonical Forward-Looking Learning Problem

We consider the following canonical forward-looking learning problem. A consumer sequentially chooses from a set $J$ containing $n$ brands. Let $j$ index brands and $t$ index time. The consumer's utility, $u_{jt}$, from choosing $j$ at $t$ has three components. "Quality" (enjoyment, fit with needs, weighted sum of brand features, etc.), $q_{jt}$, is drawn from a distribution $F_q(q_{jt}; \theta_j)$ with parameters $\theta_j$ uncertain to the consumer. The distribution functions are independent over $j$. This independence assumption rules out learning about brand $j$ by choosing another brand. Quality is realized after each consumption occasion. Conditional on $\theta_j$, quality draws $q_{jt}$ are independent over time; however, quality draws may be correlated over time through $\theta_j$. The


interpretation is as follows. Because quality is a general concept that includes enjoyment and fit with needs that might vary among consumption occasions, a single brand experience is not sufficient to resolve all uncertainty. However, other things being equal, a good brand experience suggests that future experiences with the same brand are likely to be good. This gives forward-looking consumers the "exploration" incentive – by trying out a brand, consumers make better-informed decisions about this brand in the future.

The second component of utility is a set of "observable shocks", $x_{jt}$, such as advertising, price, promotion, and other control variables that are observable to the researcher and the consumer.[5] For simplicity, we assume the $x_{jt}$ affect utility directly, although the model is extendable to indirect effects as in Ackerberg (2003), Erdem and Keane (1996), and Narayanan et al. (2005). The third component of utility is an "unobservable shock", $\varepsilon_{jt}$, which represents random preference fluctuations observed by the consumer but not by the researcher. We refer to the weighted sum of observable and unobservable shocks, $e_{jt} = x_{jt}'\omega + \varepsilon_{jt}$, as "utility shocks," where $\omega$ is a vector of weight parameters. Utility shocks are important because, without them, the consumer would learn to choose a single brand (if the exogenous variables stabilized to a known value, say constant price and advertising), an outcome that is often violated in real-world observations. We let utility shocks be drawn from a joint distribution, $F_e(e_{1t}, \ldots, e_{nt}; \lambda)$, independently over time with parameters $\lambda$.[6] The consumer knows the distribution $F_e$ and the value of $\lambda$, observes the current utility shocks prior to his/her decision at $t$, but does not know future realizations of the shocks. We make the conservative assumption that utility shocks are independent of $q_{jt}$ and thus do not help consumers learn quality directly. However, utility shocks do shape learning indirectly by varying consumers' utility from exploitation, which in turn affects their incentive for exploration. In summary, we write the consumer's utility from choosing $j$ at $t$ as follows:

(1)   $u_{jt} = q_{jt} + x_{jt}'\omega + \varepsilon_{jt}$.

[Footnote 5: Our consumer learning model treats observable utility shocks as exogenous. However, the same insight applies to endogenous observable utility shocks as long as (1) each "atomic" consumer's learning does not affect these shocks (e.g., a brand's advertising expenditure), and (2) these shocks do not directly convey quality information.]

[Footnote 6: Observable shocks can be independently distributed over time for a number of reasons. For example, firms may intentionally randomize price promotions in response to competition. Such "mixed strategies" can generate observed prices that appear to be freshly drawn in each period from a known distribution (Narasimhan 1988).]
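To fix ideas, a minimal simulation of a single utility draw with the three components of Equation 1; every parameter value here is hypothetical and chosen only for illustration.

```python
# One draw of u_jt = q_jt + x_jt'w + eps_jt. Hypothetical parameter values.
import random

def draw_utility(theta_j, sigma_q, x_jt, w, sigma_eps, rng):
    q_jt = rng.gauss(theta_j, sigma_q)               # quality draw, centered on theta_j
    shock = sum(xi * wi for xi, wi in zip(x_jt, w))  # observable shock x_jt'w
    eps = rng.gauss(0.0, sigma_eps)                  # unobservable preference shock
    return q_jt + shock + eps

rng = random.Random(0)
u = draw_utility(theta_j=1.0, sigma_q=0.5, x_jt=[1.0, -0.5], w=[0.3, 0.2],
                 sigma_eps=0.2, rng=rng)
print(u)
```

Note that only the sum is experienced as utility; the consumer's inference problem (next paragraph) is to separate persistent quality from transient shocks.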

The consumer uses Bayes Theorem to update his or her beliefs about the parameters, $\theta_j$, after each consumption experience (assumed to occur after choice but before the next choice). Let $I_{jt}$ be a set of parameters that summarize the consumer's beliefs about $\theta_j$ at time $t$. At $t = 0$, beliefs about $\theta_j$ are summarized by a prior distribution, $P(\theta_j; I_{j0})$, where $I_{j0}$ is based on all relevant prior experience. After the $t$-th consumption experience the consumer's posterior beliefs are summarized by $P(\theta_j; I_{jt})$. For example, when both $F_q$ and prior beliefs are normal, Bayesian updating is naturally conjugate. We obtain $I_{jt} = \{\bar{q}_{jt}, \sigma_{jt}\}$ using standard updating formulae.

The parameters of posterior beliefs, $I_{jt} \in \Omega$, and the realized utility shocks, $x_{jt}$ and $\varepsilon_{jt}$, summarize the state of information about brand $j$. The collection of brand-specific states, $S_t = \{I_{1t}, \ldots, I_{nt}, x_{1t}, \ldots, x_{nt}, \varepsilon_{1t}, \ldots, \varepsilon_{nt}\}$, represents the set of states relevant to the decision problem at $t$.

We seek to model a decision strategy, $\Pi: \Omega^n \times E^n \to J$, that maps the state space to the choice set. Without further assumptions, the consumer must choose a decision strategy to maximize expected discounted utility:

(2)   $\max_{\Pi} \; E_{\Pi}\!\left[\sum_{t=0}^{\infty} \delta^{t}\, u_{\Pi(S_t),\,t}\right]$,

where $\delta$ is the discount factor and the expectation $E_{\Pi}$ is taken over the stochastic process generated by the decision strategy (in particular, the transition between states that may depend on the consumer's brand choice), its implication for Bayesian updating, and the $F_q$, $F_e$, and $P$ distribution functions. The infinite horizon can be justified either by repeat consumption over a long horizon or by the consumer's subjective belief that he/she will terminate the decision problem randomly. The optimal solution to the consumer's decision problem can be characterized as the solution to Bellman's equation:

(3)   $V(S_t) = \max_{j \in J} \left\{ u_{jt} + \delta\, E\!\left[ V(S_{t+1}) \,\middle|\, S_t, j \right] \right\}$.

While Bellman's equation is conceptually simple, the full solution is computationally intractable because, even after integrating out the utility shocks $x_{jt}$ and $\varepsilon_{jt}$, it evolves on a state space of size $|\Omega|^n$, where $|\Omega|$ is the number of elements in $\Omega$. Not only is this exponential in the number of brands, but problem dimensionality gets extremely large if the state space is large. Even when the optimal solution is approximated by choosing discrete points to represent $\Omega$, $|\Omega|^n$ is large.

4. An Index Strategy in the Absence of Utility Shocks

The learning problem we examine includes utility shocks, but it is easier to illustrate the intuition of index strategies using a problem without utility shocks. Temporarily assume $x_{jt} = 0$ and $\varepsilon_{jt} = 0$ for all $j$ and $t$, although the same result holds when there is no inter-temporal variation in $e_{jt}$. In this special case the consumer's decision problem is a classic multi-armed bandit.

Gittins' insight is as follows. To evaluate a brand $j$, the consumer thinks as if he/she is choosing between this brand and a fixed reward $r$, which, once chosen, is chosen for all future periods. In each period $t$, the consumer solves an independent sub-problem for each brand – he/she either consumes this brand to gain more information about it, or exploits the fixed reward $r$


. In the latter case, the consumer's beliefs about the brand cease to evolve, such that $I_{j,t+1} = I_{jt}$. The optimal solution to this sub-problem is determined by a greatly simplified version of Bellman's equation:

(4)   $V_j(I_{jt}, r) = \max \left\{ r + \delta\, V_j(I_{jt}, r), \;\; \bar{q}_{jt} + \delta\, E\!\left[ V_j(I_{j,t+1}, r) \,\middle|\, I_{jt} \right] \right\}$.

Notice that each sub-problem only depends on the state evolution of a single brand, which is much simpler than the full problem specified in Equation 3.

Gittins' index, $G_{jt}$, is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. We obtain $G_{jt}$ by equating the two terms in brackets in Equation 4. Gittins proposed that $G_{jt}$ could be used as a measuring device for the value of exploring brand $j$ – if there is more uncertainty about a brand left to explore, the consumer will demand a higher fixed reward to be willing to stop exploration. Gittins' index is updated when new information is realized.

Gittins' surprising result is the Index Theorem. The optimal solution in each period is to choose the brand with the highest index in that period. The consumer suffers no loss in expected discounted utility by using an index strategy. A computationally difficult problem has thus been decomposed into simpler sub-problems.

Index Theorem (Gittins and Jones 1974). The optimal decision strategy when there are no utility shocks is $\Pi^*(S_t) = \arg\max_{j} \{ G_{jt} \}$.
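The calibration logic of Equation 4 (find the fixed reward that makes the consumer indifferent) can be sketched numerically. As a simplifying assumption for compactness, the sketch below uses Bernoulli rewards with Beta(a, b) beliefs rather than the normal-quality specification above; the discount factor and truncation horizon are illustrative choices, not values from this paper.

```python
# Gittins-index sketch: bisection on the fixed reward r, with the one-arm
# sub-problem solved by memoized recursion. Bernoulli/Beta is an assumption
# made here for compactness; DELTA and N are illustrative.
from functools import lru_cache

DELTA = 0.9    # discount factor (illustrative)
N = 200        # truncation: beliefs are nearly settled after N observations

def continue_value(a, b, r):
    """Value of the one-arm sub-problem: consume the brand or take r forever."""
    @lru_cache(maxsize=None)
    def V(a_, b_):
        p = a_ / (a_ + b_)                 # posterior mean quality
        if a_ + b_ >= N:                   # beliefs ~ settled: act myopically
            return max(r, p) / (1 - DELTA)
        explore = p + DELTA * (p * V(a_ + 1, b_) + (1 - p) * V(a_, b_ + 1))
        return max(r / (1 - DELTA), explore)
    return V(a, b)

def gittins_index(a, b, tol=1e-6):
    """Smallest fixed reward r at which exploring the brand stops being preferred."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        r = (lo + hi) / 2
        if continue_value(a, b, r) > r / (1 - DELTA) + 1e-12:
            lo = r                         # still prefers to explore: raise r
        else:
            hi = r
    return (lo + hi) / 2

# Same mean belief (0.5), different uncertainty: the more uncertain arm
# carries a larger exploration premium, as in Figure 2b's intuition.
print(gittins_index(1, 1), gittins_index(10, 10))
```

Both indices exceed the myopic mean of 0.5, and the index of the high-uncertainty arm is larger, which is exactly the "value of learning" property discussed around Figure 2b.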

Figure 2 illustrates the Index Theorem. In Figure 2a we computed Gittins' indices for two brands (for normal $F_q$ and normal priors). The true mean quality for Brand 1 is normalized to zero and the true mean quality of Brand 2 is negative. The consumers' mean prior beliefs reflect the true mean qualities, but the consumer is uncertain about these beliefs. Prior to period 7 the index strategy causes the consumer to bounce between choosing Brand 1 and Brand 2. In periods 7 to 12, he/she continues to sample Brand 2 until by period 13 he/she has learned enough about the brands. After period 13 the consumer switches to Brand 1 and remains with Brand 1 indefinitely. A brand's index remains flat when it is not being consumed because the simplified problem does not include utility shocks.

[Insert Figure 2 about here.]

Index strategies are much simpler than a brute-force solution to Bellman's equation, but can the consumer intuit (perhaps approximately) how an index varies with experience and beliefs? We expect future laboratory experiments to address this issue explicitly. In this paper, we argue that index strategies have intuitive properties and that it is not unreasonable for the consumer to intuit those properties.

Figure 2b illustrates the intuitive properties of Gittins' index. We plot the expected value of Gittins' index as it would evolve if the brand were chosen repeatedly (assuming the mean of the prior distribution was accurate). The two lines represent two brands that differ in mean quality but not uncertainty. In each given period, Gittins' index is larger for the higher-quality brand. Naturally, with the same amount of remaining uncertainty, a higher-quality brand offers a greater value of exploitation. Both index curves decline smoothly and converge with experience toward true brand quality. The index converges because the value of exploration decreases as the consumer learns more about the distribution of brand quality. The amount by which the index exceeds the mean quality beliefs is the value of learning (the value of reducing uncertainty). When we plot Gittins' index as a function of the consumer's posterior quality uncertainty $\sigma_{jt}$ (not shown), it is also intuitive – the index increases with $\sigma_{jt}$ because the value of exploration increases with the remaining amount of quality uncertainty. We therefore posit that it is not unreasonable for the consumer to intuit the (approximate) shape of Gittins' index as a function of the parameters of the problem.

5. The Index Strategy When Utility Shocks are Present

We now allow utility shocks. Observable shocks ($x_{jt}$) include effects that researchers might observe and model, such as changes in advertising, promotion, or price. Unobservable shocks ($\varepsilon_{jt}$) include effects that researchers do not observe, such as changes in the characteristics of the consumption occasion or changes in idiosyncratic taste. Because shocks enter the utility function, the consumer may, at any period, switch among brands. Without utility shocks the consumer's choice converges to a single brand as in Figure 2a. When the model includes utility shocks, the Gittins-Jones Index Theorem no longer applies because non-chosen brands do not remain constant. With shocks, the consumer's problem belongs to the class of "restless-bandit problems" introduced by Whittle (1988). In general such optimization problems are PSPACE-hard (Papadimitriou and Tsitsiklis 1999), making the problem extremely difficult (if not infeasible) to solve and making it implausible that the consumer would solve the problem with a strategy based on Equation 3.

Whittle (1988) proposed a solution that generalizes Gittins' index. In each period, to evaluate a brand $j$, the consumer thinks as if he or she must choose between brand $j$ and a fixed reward, $r$. Bellman's equation for the sub-problem (which now includes the utility shocks) is:

(5)   $V_j(I_{jt}, x_{jt}, \varepsilon_{jt}, r) = \max \Big\{\, r + \delta\, E\!\left[ V_j(I_{jt}, x_{j,t+1}, \varepsilon_{j,t+1}, r) \,\middle|\, I_{jt} \right], \;\; \bar{q}_{jt} + x_{jt}'\omega + \varepsilon_{jt} + \delta\, E\!\left[ V_j(I_{j,t+1}, x_{j,t+1}, \varepsilon_{j,t+1}, r) \,\middle|\, I_{jt} \right] \Big\}$.

The index is defined as the smallest value of $r$ such that the consumer at time $t$ is just indifferent between experiencing brand $j$ and receiving the fixed reward. For such an index to be well-defined and meaningful, the "indexability" condition needs to be satisfied (Whittle 1988). Let $P_j(r) \subseteq \Omega \times E$ be the set of states for which choosing the fixed reward $r$ at time $t$ is optimal:

(6)   $P_j(r) = \Big\{ (I_{jt}, x_{jt}, \varepsilon_{jt}) \in \Omega \times E : \; r + \delta\, E\!\left[ V_j(I_{jt}, x_{j,t+1}, \varepsilon_{j,t+1}, r) \,\middle|\, I_{jt} \right] \geq \bar{q}_{jt} + x_{jt}'\omega + \varepsilon_{jt} + \delta\, E\!\left[ V_j(I_{j,t+1}, x_{j,t+1}, \varepsilon_{j,t+1}, r) \,\middle|\, I_{jt} \right] \Big\}$.

Indexability is defined as:

Definition: A brand $j$ is indexable if, for any $r \leq r'$, $P_j(r) \subseteq P_j(r')$.

Indexability says that as the fixed reward increases, the collection of states for which the fixed reward is optimal does not decrease. Intuitively, indexability requires that, if under some state it is optimal to choose the fixed reward, then it must also be optimal to choose a higher fixed reward. Indexability implies a consistent ordering of brands for any state, so an index strategy is meaningful. Because this condition does not hold for all restless-bandit problems, we must establish indexability when the model includes utility shocks. In a companion online appendix we prove the following proposition.

Proposition 1. The canonical forward-looking learning problem is indexable.

Once the indexability condition is established, a well-defined index strategy is to choose in each period the brand with the largest index. The index strategy breaks "the curse of dimensionality" by decomposing a problem with exponential complexity into much-simpler sub-problems, each on a state space of size $|\Omega|$ after integrating out the utility shocks $x_{jt}$ and $\varepsilon_{jt}$. By breaking this curse, it is more plausible that the consumer might use the decision strategy. As a bonus, estimation is far simpler when a dynamic learning problem is nested. It remains to be shown that the index strategy with utility shocks is invariant to scale, intuitive, and implies a reasonable utility vs. thinking cost tradeoff.

5.1. The Index Strategy is Invariant to Scale and Behaves Intuitively

An index strategy would be difficult for the consumer to use if the strategy were not invariant to scale. If it is invariant, the consumer can intuit (or learn) the basic shape of the index function and use that intuited shape in many situations. Invariance facilitates ecological rationality. The following results hold for fairly general distributions of quality, $F_q$, and joint distributions of utility shocks, $F_{x,\varepsilon}$, as long as they have scale and location parameters and the quality belief $F_{\bar{q}}$ is conjugate. To ease interpretation, we assume that $F_q$ and $F_{\bar{q}}$ are normal distributions with parameters as defined earlier: $\mu_j$ and $\sigma_j$ for quality; $\bar{q}_{jt}$ and $\sigma_{jt}$ for posterior beliefs about quality; and $\mu_\varepsilon$ and $\sigma_\varepsilon$ for utility shocks. In a companion online appendix we prove the following proposition.

Proposition 2. Let $\tilde{\lambda}$ be Whittle's index for the canonical learning problem computed when the posterior mean quality ($\bar{q}_{jt}$) is zero, the mean utility shock ($\mu_\varepsilon$) is zero, and the inherent variation of quality ($\sigma_j$) is 1. For forward-looking consumers, Whittle's index is scalable in these parameters. That is:

$\lambda_{jt}(\bar{q}_{jt}, \sigma_{jt}, \sigma_j, \mu_\varepsilon, \sigma_\varepsilon) \;=\; \bar{q}_{jt} + \mu_\varepsilon + \sigma_j\, \tilde{\lambda}\big(0,\; \sigma_{jt}/\sigma_j,\; 1,\; 0,\; \sigma_\varepsilon/\sigma_j\big).$

Proposition 2 implies that the consumer can simplify his or her mental evaluations by decomposing the index for each brand into (1) the mean utility gained from "myopic learning," $\bar{q}_{jt} + \mu_\varepsilon$, which reflects the exploitation of posterior beliefs, and (2) the incremental benefit of looking forward, $\sigma_j \tilde{\lambda}$, which captures quality information gained through exploration. The consumer needs only intuit the shape of $\tilde{\lambda}$ for a limited range of parameter values and scale it by $\sigma_j$.

To provide further intuition, we prove the following proposition in an online appendix. The proposition shows that Whittle's index behaves as expected when the parameters of the problem vary. The consumer likes increases in quality and utility shocks, dislikes inherent uncertainty in quality and utility shocks, but values uncertainty in beliefs about mean quality because such uncertainty increases the value of learning about that alternative.

Proposition 3. Whittle's index for the canonical learning problem is (1) increasing in the posterior mean of quality ($\bar{q}_{jt}$), the observable utility shock ($x_{jt}$), and the unobservable (to the researcher) utility shock ($\varepsilon_{jt}$); (2) weakly decreasing in the inherent uncertainty in quality ($\sigma_j$) and the uncertainty in the utility shocks ($\sigma_x$, $\sigma_\varepsilon$); and (3) weakly increasing in the consumer's posterior uncertainty about quality ($\sigma_{jt}$).

Figure 3 illustrates Whittle's index where we set the posterior mean of quality to zero. (We observe similar shapes of Whittle's index for other parameter values.) Like Gittins' index, Whittle's index is a smooth decreasing function of experience (experience reduces posterior quality uncertainty). With sufficient experience, Whittle's index converges toward zero, implying that asymptotically the value of a brand is based on the posterior mean of quality (Proposition 2). Unlike Gittins' index, Whittle's index is a function of the magnitude of the utility shocks ($\sigma_x$, $\sigma_\varepsilon$). As the magnitude of the utility shocks gets larger, it is less important for the consumer to learn about quality, and the value of the index decreases as shown in Figure 3. These properties (and the shape of the curve itself) are intuitive.

[Insert Figure 3 about here.]

Figure 3 and Proposition 3 suggest that, other things being equal, when the magnitude of the utility shocks is larger, the realized utility shocks are more likely to be the deciding factor in consumers' brand choices. When $\sigma_\varepsilon = 5$ in Figure 3, the Whittle-index curve is almost flat, implying an almost myopic strategy. To formalize this insight, we state the following corollary to Proposition 3:

Corollary. (1) When the consumer's posterior quality uncertainty dominates the uncertainty in utility shocks, the value to the consumer from looking forward is high. (2) When the uncertainty in utility shocks dominates the consumer's posterior quality uncertainty, the value from looking forward is low. In this case, a myopic learning strategy (i.e., exploiting posterior beliefs) suffices, and could be the optimal strategy if it requires lower thinking costs than a forward-looking learning strategy.

6. Examination of the Near Optimality of an Index Strategy (Synthetic Data)

We now examine whether an index strategy implies a reasonable tradeoff between optimality and thinking costs. Thinking costs remain unobservable, but §4 and §5 suggest that thinking costs could be substantially smaller with an index strategy than with the direct solution of the PSPACE-hard version of Bellman's equation. It remains to show that the loss in utility is small. To examine this issue we switch from analytic derivations to synthetic data because the


loss in utility is an issue of magnitude rather than direction. Synthetic data establish existence (rather than universality) of situations where index strategies are close to optimal. We set the observable shocks $x_{jt}$ to zero and examine near optimality for the special case when $F_q$ and $F_{\bar{q}}$ are normal distributions. We compare four decision strategies that the consumer might use.

1. No Learning. In this naïve strategy the consumer chooses the brand based only on his or her prior beliefs of quality and the current utility shocks. This strategy provides a baseline to evaluate the incremental value of learning.

2. Myopic Learning. In this strategy the consumer chooses the brand based only on his or her posterior quality beliefs and the current utility shocks. This strategy exploits the consumer's posterior knowledge about brand quality. The Corollary predicts that this strategy will suffice when the magnitude of the utility shocks is relatively high.

3. Index Strategy. This strategy assumes the consumer can intuit the shape of Whittle's index. As per Proposition 2, this strategy improves on the myopic-learning strategy to take into account the option value of learning. Brand choices now reflect the consumer's tradeoff between exploitation and exploration. The Corollary predicts the index strategy will outperform myopic learning especially when the magnitude of the utility shocks is relatively low.

4. Approximate Optimality. The PSPACE-hard forward-looking learning problem cannot be solved optimally, hence researchers use approximate solutions. Although approximation methods vary in the literature, discrete optimization is representative and should converge to the optimal solution with finer grids (Rust 1996). We discretize the state space, $(\bar{q}_{jt}, \sigma_{jt})$, into a set of grid points for each of the brands.
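A toy version of this comparison can be simulated directly. The sketch below (ours) uses two brands, conjugate normal updating, and an ad-hoc exploration bonus of 0.5 times the posterior standard deviation standing in for the computed Whittle curve; parameter values are illustrative, not those of Table 1.

```python
# Toy strategy comparison (our illustration): two brands, normal quality
# learning, T periods. The "index" bonus is an ad-hoc stand-in for the
# computed Whittle curve.
import math
import random

def avg_utility(strategy, n_sims=2000, T=50, sigma_eps=0.3, seed=7):
    """Average per-period realized utility under a decision strategy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        mu = [rng.gauss(0.0, 0.5), rng.gauss(0.0, 1.0)]  # true mean qualities
        prior_mean, prior_var = [0.0, 0.0], [0.25, 1.0]  # consistent priors
        mean, var = list(prior_mean), list(prior_var)
        for _ in range(T):
            eps = [rng.gauss(0.0, sigma_eps) for _ in range(2)]
            if strategy == "no_learning":
                score = [prior_mean[j] + eps[j] for j in range(2)]
            elif strategy == "myopic":
                score = [mean[j] + eps[j] for j in range(2)]
            else:  # "index": myopic value plus an exploration bonus
                score = [mean[j] + eps[j] + 0.5 * math.sqrt(var[j])
                         for j in range(2)]
            j = 0 if score[0] >= score[1] else 1
            q = rng.gauss(mu[j], 1.0)        # realized quality experience
            total += q + eps[j]              # realized utility this period
            prec = 1.0 / var[j] + 1.0        # conjugate update (sigma_q = 1)
            mean[j] = (mean[j] / var[j] + q) / prec
            var[j] = 1.0 / prec
    return total / (n_sims * T)
```

Comparing avg_utility("no_learning"), avg_utility("myopic"), and avg_utility("index") while varying sigma_eps traces the qualitative pattern predicted by the Corollary.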

We choose parameters that illustrate the phenomena and we expect they are empirically reasonable. The approximately optimal solution requires a finite time horizon; we select $T = 50$ periods. If the discount factor is $\delta = 0.90$, truncation to a finite horizon should be negligible and, in any case, is biased against index strategies. We choose 200 grid points, which should be very close to optimal in the continuous problem. To simplify integration we draw the utility shocks from a Gumbel distribution with scale parameter $\sigma_\varepsilon$ and normalize the location parameter such that the utility shocks have zero unconditional means. Setting $J = 2$ brands is sufficient to examine optimality and makes the approximately optimal solution feasible – it evolves on the joint state space of both brands' beliefs. In this sense, the two-brand case provides a conservative test of the relative cognitive simplicity of index strategies. The ongoing uncertainties in quality for both brands, $\sigma_j$, are equal and normalized to 1. We vary the parameter values to capture three possibilities: (1) the mean and uncertainty both favor one brand, (2) the means are the same but uncertainty favors one brand, and (3) the mean and uncertainty favor different brands. Because quality beliefs are relative and because we can interpret Brand 1 and Brand 2 interchangeably, we need vary only the prior means in quality beliefs for one brand. Therefore we fix the prior mean quality belief of Brand 1 to zero and vary the prior mean quality beliefs for Brand 2 relative to Brand 1. We normalize the standard deviation of Brand 2's prior quality belief to one and let the standard deviation of Brand 1's prior quality belief be 1/2. While other relative variations are possible, these variations suffice to illustrate the phenomena. Finally, to test the predictions in §5 we allow the uncertainty in shocks to vary from relatively small to relatively large.⁷

⁷ Specifically, we normalize $\bar{q}_{1,0} = 0$ and $\sigma_{1,0} = \sigma_{2,0}/2$. We vary the means of the brand with greater prior variance with $\bar{q}_{2,0} \in \{-0.3, 0.0, 0.3\}$. We vary the relative uncertainty in utility shocks with $\sigma_\varepsilon \in \{0.1, 1.0\}$.

We compute the consumer's expected total utilities for 50 periods under different decision strategies. Details are provided in a companion online appendix. The results are summarized in Table 1.

[Insert Table 1 about here.]

We first examine cumulative computation times, which are surrogates for thinking costs. As expected, the no-learning and myopic-learning strategies impose negligible thinking costs, the index strategy has moderate thinking costs, and the approximately optimal solution requires substantial thinking costs – 600 times the thinking costs under an index strategy even for this simple problem (the number would be even greater with finer grids). Next, we examine the consumer's expected utilities when the magnitude of the utility shocks is relatively low (upper panel of Table 1). In all cases examined, the no-learning strategy leads to the lowest utility, which suggests that learning is valuable. As predicted by the Corollary, the index strategy and the approximately optimal strategy generate significantly higher utility than myopic learning for this problem. Furthermore, the index strategy is indistinguishable from the approximately optimal strategy. As long as thinking costs matter even a little, an index strategy will be better on utility minus thinking costs. We then examine the case of a relatively high magnitude of utility shocks (lower panel of Table 1). As predicted by the Corollary, the myopic-learning model performs virtually the same as either the index strategy or the approximately optimal strategy. The differences are not significant. In this case, the consumer might achieve the best utility minus thinking costs with a myopic strategy (among the models tested).

Analysis of synthetic data never covers all cases. Table 1 is best interpreted as providing evidence that (1) there exist reasonable situations where an index solution is better than the approximately optimal solution on utility minus thinking costs and (2) there exist domains where


myopic learning is best on utility minus thinking costs. We now examine empirical data.

7. Field Estimation of an Index Strategy (IRI Data on Diaper Purchases)

We seek to identify at least one situation where consumers are likely to be forward-looking and, in that situation, examine whether an index solution fits (predicts) as well as or better than an approximately optimal solution. Even if an index solution does no better than an approximately optimal solution, we consider the result promising because an index solution is cognitively simpler. As a test of face validity, we also expect learning strategies to outperform no-learning strategies and forward-looking strategies to outperform myopic-learning strategies when the situation favors forward-looking behavior.

7.1. IRI Data on Diaper Purchases

We select the diaper category from the IRI Marketing Dataset that is maintained by the SymphonyIRI Group and available to academic researchers (Bronnenberg, et al. 2008). Diaper consumers are likely to be learning and forward-looking. Parents typically begin purchasing diapers based on a discrete birth event. Even if the birth is a second or subsequent child, “quality” may have changed. Informal qualitative interviews suggest that parents learn about whether diaper brands match their needs through experience (often more than one purchase), that diapers are sufficiently important that parents take learning seriously, and that parents often try multiple brands before settling on a favorite brand. There are observable shocks due to price promotions and shocks due to unobservable events. For example, a baby might go through a stage where a different brand is best suited to the parent/child’s needs. Diapers also have the advantages of being regular purchases (the no-choice option is less of a concern) and consumers tend to be in the market for many purchase occasions. To isolate a situation favoring forward-looking behavior we focus on consumers who


purchase branded products and whose purchases are likely triggered by a birth event. To do so, we eliminate private-label consumers and we select consumers whose first purchase occurs 30 weeks after the start of data collection. After applying these screening criteria the data contain 1,385 households who make 8,089 purchases. The market is dominated by three brands, Pampers, Huggies, and Luvs. We aggregate all other purchases as "Other Brands." As a first-order view, Table 2 compares switching behavior during the first eight purchases to switching behavior after the first eight purchases. Brand loyalty is higher after eight periods than within the first eight periods, suggesting that consumers may learn about "quality" over time. Notice also the small (and decreasing) switching to "Other Brands" among major-brand consumers. Finally, although the category was chosen as a likely test-bed for consumer learning, high brand loyalty, even during the initial eight weeks, suggests that there is no guarantee a forward-looking strategy will fit the data.

[Insert Table 2 about here.]

For this initial test of an index solution, we limit the observable shocks $x_{hjt}$ to the observed price for brands purchased and the average price across other panelists in the same period for brands not purchased. We randomly select 150 households for estimation and 200 households for validation.⁸
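For concreteness, the price covariate can be built as follows; the record layout and field names are hypothetical, not those of the IRI files.

```python
# Sketch of constructing the observable shock x_hjt from panel data: the
# observed price when household h bought brand j in week t, otherwise the
# average price paid by other panelists for j that week. Field names are
# hypothetical illustrations, not the IRI schema.
def price_covariate(purchases, h, j, t):
    """purchases: list of dicts with keys 'hh', 'brand', 'week', 'price'."""
    own = [p["price"] for p in purchases
           if p["hh"] == h and p["brand"] == j and p["week"] == t]
    if own:
        return own[0]
    others = [p["price"] for p in purchases
              if p["hh"] != h and p["brand"] == j and p["week"] == t]
    return sum(others) / len(others) if others else None
```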

purchased and the average price across other panelists in the same period for brands not purchased. We randomly select 150 households for estimation and 200 households for validation8. 7.2. Empirical Specification

We index each of $N$ households by $h$ and denote by $T_h$ household $h$'s time horizon. We assume that the quality and quality-belief distributions, $F_q$ and $F_{\bar{q}}$, are normal and that unobservable shock distributions are Gumbel. The decision strategies are given below ($x_{hjt}$ and $\varepsilon_{hjt}$ are now scalars.)

⁸ The actual sample sizes for estimation and validation are 138 and 189 because we drop a few households whose order of brand purchases is unobserved.

No Learning: $\Pi_{hjt} = \bar{q}_{hj,0} + x_{hjt} + \varepsilon_{hjt}$

Myopic Learning: $\Pi_{hjt} = \bar{q}_{hjt} + x_{hjt} + \varepsilon_{hjt}$

Index Strategy: $\Pi_{hjt} = \bar{q}_{hjt} + x_{hjt} + \varepsilon_{hjt} + \sigma_j\, \tilde{\lambda}\big(0,\; \sigma_{hjt}/\sigma_j,\; 1,\; 0,\; \sigma_\varepsilon/\sigma_j\big)$

Approximately Optimal: $\Pi_{hjt} = \bar{q}_{hjt} + x_{hjt} + \varepsilon_{hjt} + \delta\, E\big[V(\bar{q}_{h,t+1}, \sigma_{h,t+1}, x_{h,t+1}, \varepsilon_{h,t+1}) \,\big|\, d_{ht} = j\big]$
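Schematically, the four specifications differ only in the deterministic part of $\Pi_{hjt}$. The dispatch below is our sketch: bonus() is a placeholder for the normalized index curve, and the approximately optimal strategy is omitted because it has no closed form (it requires solving the full dynamic program).

```python
import math

def det_utility(strategy, prior_mean, post_mean, post_sd, x,
                sigma_j=1.0, sigma_eps=1.0, delta=0.90):
    """Deterministic part of Pi_hjt for each decision strategy; the Gumbel
    shock is added via the logit formula at estimation time. bonus() is a
    placeholder shape for the normalized index curve, not the computed one."""
    def bonus():
        s = post_sd / sigma_j
        r = sigma_eps / sigma_j
        return sigma_j * delta * s * math.sqrt(s) / ((1 + r) * (1 + s))
    if strategy == "no_learning":
        return prior_mean + x
    if strategy == "myopic":
        return post_mean + x
    if strategy == "index":
        return post_mean + x + bonus()
    raise ValueError("approximately optimal requires the full DP")
```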

7.3. Issues of Identification

Although we would like to identify all parameters of the various models, we cannot do so from choice data because utility is specified only up to an affine transformation, and because the parameters that matter are relative parameters. For the no-learning model we can identify only the relative means of prior beliefs. For the myopic-learning model we can identify only the relative means of prior beliefs, the relative uncertainties of prior beliefs, and the means of quality. For the no-learning and myopic-learning models time discounting does not matter.

For the index strategy and approximately optimal strategy we set the mean of prior beliefs of one brand ($\bar{q}_{hj,0}$) to zero and normalize the variance of quality ($\sigma_j^2$) to one to set the scale of quality. (Only $\sigma_\varepsilon/\sigma_j$ matters.) We cannot simultaneously identify a brand-specific mean of quality and a brand-specific mean of the unobservable shock, so we set the latter to zero ($\mu_\varepsilon = 0$). We can identify the means of quality, $\mu_j$, because we have fixed the prior mean for one brand. The standard deviation of the observable shock is observed in the data and, hence, $\sigma_\varepsilon$ is computable because the observable and unobservable shocks are independent. As in most dynamic discrete choice processes (Rust 1994), the discount factor is not identified; we set it to 0.90.⁹ Finally, as in Erdem and Keane (1996), we suppress "parameter heterogeneity" among

⁹ As expected, sensitivity analyses with other discount rates (e.g., 0.95 and 0.99) yield almost identical log-likelihood statistics and similar parameter estimates for the index-strategy model. Anticipating the results of §7.5, we expect similar (lack of) sensitivity for the approximately optimal strategy. The ease with which such sensitivity checks can be run is a benefit of the computational tractability of the index-strategy model.


households. We continue to allow each household's quality perceptions to evolve idiosyncratically, but we do not attempt to estimate heterogeneity in prior beliefs, in the long-term mean quality, or in the standard deviations. We abstract away from parameter heterogeneity for the following reasons. First, there are, on average, only 8.39 purchases per household. We would overly strain the model by attempting to estimate heterogeneity in all of the parameters.¹⁰ Second, we wish to focus on behavioral heterogeneity that is generated endogenously by forward-looking learning models. Even if households start with exogenously homogeneous prior beliefs, different quality draws and utility shocks lead to different exploitation-versus-exploration tradeoffs and different learning dynamics. We seek to evaluate heterogeneous learning dynamics against the data, rather than using heterogeneous parameters to fit the data. For an initial test of an index strategy, this simplification is conservative because it biases against a good model fit. Despite this focus, it turns out the index strategy explains the data well and predicts well.

¹⁰ Doing so is technically feasible, but would likely over-parameterize the model and exploit noise in the data. More importantly, our goal is to demonstrate that an index solution is a viable representation of cognitive simplicity and that cognitive simplicity (relative to the approximately optimal solution) is a phenomenon worth study in structural models. We leave explicit modeling of parameter heterogeneity to future research using data with longer purchase histories.

7.4. Maximum Simulated Likelihood Estimation

We estimate each model's parameters with maximum simulated likelihood estimation. To simplify notation let $\theta$ denote the vector of parameters to be estimated. Let $d_{ht}$ denote household $h$'s decision at time $t$ and let $D_{ht}$ denote $h$'s decision sequence up to time $t$. The likelihood of observing the choice sequences as a function of $\theta$ is:

(7) $\mathcal{L}(\theta) = \prod_{h=1}^{N} \Pr(D_{hT_h};\, \theta).$

Learning strategies depend upon the evolution of the unobserved belief states, complicating the inference process. If we were to write the likelihood function as a function of each consumer's unobserved belief states and shocks over time, we would need to sample from an extremely complicated joint density of belief states and shocks. Instead, we augment the data and sample directly from the more-fundamental unobservables – the quality experiences, $q_{hjt}$, that are drawn conditionally i.i.d. from normal distributions with means, $\mu_j$, and standard deviations, $\sigma_j$. Given a set of quality experiences and a set of prior beliefs, we obtain the unobserved belief states, $(\bar{q}_{hjt}, \sigma_{hjt})$, by conjugate updating formulae:

(8) $\bar{q}_{hjt} = \dfrac{\sigma_{hj,0}^{-2}\, \bar{q}_{hj,0} + n_{hjt}\, \sigma_j^{-2}\, \tilde{q}_{hjt}}{\sigma_{hj,0}^{-2} + n_{hjt}\, \sigma_j^{-2}}, \qquad \sigma_{hjt}^{2} = \big( \sigma_{hj,0}^{-2} + n_{hjt}\, \sigma_j^{-2} \big)^{-1},$

where $n_{hjt} = \sum_{\tau \le t} 1(d_{h\tau} = j)$ is the cumulative number of purchases of brand $j$ by consumer $h$ up to and including period $t$. (We use $1(\cdot)$ as an indicator function.) We use $\tilde{q}_{hjt}$ to denote the average quality experience observed by the consumer up to and including period $t$.

We introduce vector notation to simplify exposition. Let $\mu$ be the vector (over $j$) of mean qualities, let $\sigma$ be the vector (over $j$) of the standard deviations of quality draws, and let $\sigma_\varepsilon$ be the vector (over $j$) of the standard deviations of the unobservable shocks. Let the sequence of quality draws up to and including period $t$ be $Q_{ht}$. And let $X_{ht}$ and $E_{ht}$ be the vectors (over $j$) of prices and unobservable utility shocks. Let $f(\cdot)$ be a probability density function. Then the likelihood for household $h$ is given by:

(9) $\Pr(D_{hT_h};\, \theta) = \displaystyle\int \prod_{t=1}^{T_h} \Pr\big(d_{ht} \,\big|\, \bar{q}_{ht}, \sigma_{ht}, X_{ht}, E_{ht};\, \theta\big)\, f(Q_{hT_h};\, \mu, \sigma)\, f(E_{hT_h};\, \sigma_\varepsilon)\, dQ_{hT_h}\, dE_{hT_h}.$

To compute the likelihood we integrate over quality draws and unobservable shocks. To integrate numerically we sample $R$ sequences of quality draws (each sequence has $T_h$ draws for consumer $h$) from a multivariate normal distribution with parameters $\mu$ and $\sigma$. We assume the unobservable shocks follow a zero-mean Gumbel distribution with homogeneous variance $\sigma_\varepsilon^2$ for all brands. This assumption allows us to use the well-known logit formula to substantially simplify the computation of choice probabilities for all models, and the computation of the value function used in approximately optimal strategies (Rust 1994). To use the logit formula for the index strategy, we linearize the index function as a linear function of the unobserved shocks $\varepsilon_{hjt}$, while preserving monotonicity:

$\Pi_{hjt} = \bar{q}_{hjt} + x_{hjt} + \sigma_j\, \tilde{\lambda}\big(0,\; \sigma_{hjt}/\sigma_j,\; 1,\; 0,\; \sigma_\varepsilon/\sigma_j\big) + \varepsilon_{hjt}.$
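Putting the pieces together, a stripped-down evaluation of one household's simulated likelihood, with conjugate updating as in Equation (8) and the logit formula applied to the linearized index, might look as follows. This is our sketch of the recipe, not the authors' implementation; the forward-looking bonus enters through the optional bonus argument (None gives the myopic-learning model).

```python
# Sketch of evaluating the simulated likelihood for one household: average,
# over R simulated quality-experience sequences, the product of logit choice
# probabilities implied by the (linearized) index utilities.
import math
import random

def household_likelihood(choices, prices, mu, sigma_j, sigma_eps,
                         bonus=None, R=200, seed=11):
    """choices: chosen brand indices; prices: per-period price vectors.
    mu: true mean qualities; bonus(post_var) adds the forward-looking term
    (None gives the myopic-learning model)."""
    rng = random.Random(seed)
    J = len(mu)
    total = 0.0
    for _ in range(R):
        mean, var = [0.0] * J, [1.0] * J          # prior beliefs
        like = 1.0
        for d, p in zip(choices, prices):
            v = []
            for j in range(J):
                u = mean[j] - p[j]                # myopic part (price as cost,
                if bonus is not None:             #  our sign convention)
                    u += bonus(var[j])            # linearized index bonus
                v.append(u / sigma_eps)           # Gumbel scale
            m = max(v)
            expv = [math.exp(x - m) for x in v]
            like *= expv[d] / sum(expv)           # logit choice probability
            q = rng.gauss(mu[d], sigma_j)         # simulated experience
            prec = 1.0 / var[d] + 1.0 / sigma_j**2
            mean[d] = (mean[d] / var[d] + q / sigma_j**2) / prec
            var[d] = 1.0 / prec                   # Equation (8) updating
        total += like
    return total / R
```

Only the chosen brand's beliefs update each period, which is what generates the endogenous behavioral heterogeneity discussed in §7.3 even with homogeneous parameters.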

7.5. Estimation Results

Table 3 summarizes the fit statistics for the 651 diaper purchases in the in-sample estimation and the 676 purchases in the out-of-sample validation. $U^2$ is an information-theoretic measure that calculates the percent of uncertainty explained by the model (Hauser 1978); AIC and BIC attempt to correct the likelihood function based on the number of parameters in the in-sample estimation, BIC more so than AIC. (There are no free parameters in the out-of-sample validation.)

[Insert Table 3 about here.]

First, on all measures there are sizable gains to learning – all learning models explain and predict brand choices substantially better than the no-learning strategy. Second, the index strategy improves in-sample fit and out-of-sample predictions relative to myopic learning. The likelihood is significantly better ($\chi^2 = 22$, $p = 0.002$ in-sample; $\chi^2 = 13.52$, $p = 0.009$ out-of-sample), although the percent improvement in $U^2$ is not large. This result is consistent with our expectation that diaper buyers are forward-looking. Third, the index strategy outperforms the approximately optimal solution for in-sample fit and out-of-sample predictions, although the differences are small. (The models are not nested so a $\chi^2$ test does not apply.) This empirical result is

consistent with the synthetic-data analysis. When two forward-looking strategies yield similar expected utilities, we expect them to be statistically indistinguishable.

Table 4 summarizes the estimated parameter values. Across all learning models, all four brands increase substantially in mean quality relative to prior beliefs, which implies that diaper buyers learn by experience. Results are consistent with the switching patterns in Table 2.

[Insert Table 4 about here.]

Forward-looking models identify the uncertainty in utility shocks relative to ongoing quality uncertainty (last panel of Table 4). Because the relative shock uncertainty varies across brands, the index curve implies different behavior than myopic learning for those brands. This explains why forward-looking models fit and predict better. For example, Huggies has lower relative shock uncertainty than other brands, providing greater incentives for consumers to explore Huggies (as per the Corollary). Because the myopic-learning model ignores this difference, it compensates by underestimating the relative mean prior beliefs. Managerially, Huggies has a higher mean quality than Pampers or Luvs, but also higher ongoing relative uncertainty in quality across consumption. (The table reports the ratio of shock uncertainty to quality uncertainty – a smaller number means higher relative quality uncertainty.)

Computational time in the embedded optimization problem is a rough surrogate for thinking costs. The last row of Table 3 reports the time per iteration of each model. Because the index strategy is substantially faster than the approximately optimal strategy, it is reasonable to posit that consumers view the index strategy as having lower thinking costs. (The myopic-learning strategy is even faster and is a reasonable model for categories in which forward-looking is less


likely as was predicted by the Corollary.) Both the index-strategy and the approximately-optimal strategy lead to similar parameter estimates. Parameter estimates of either model are usually within confidence regions of the alternative model. This result is consistent with the synthetic-data analyses that suggest that both strategies lead to near-optimal utility. If we accept cognitive simplicity as a paradigm, then the index strategy is preferred as a more plausible description of consumer behavior.

In summary, using IRI data on diaper purchases we find that (1) learning models fit and predict substantially better than the no-learning model; (2) forward-looking learning models fit and predict significantly better than the myopic-learning model; (3) the index strategy and the approximately optimal solution achieve similar in-sample fit and out-of-sample forecasts, as well as reasonably close parameter estimates; and (4) computational (and thinking) costs favor the index-strategy model relative to the approximately optimal model.

8. Summary, Conclusions, and Future Research

Models of forward-looking consumer learning are important to marketing. These theory-driven models examine how consumers make tradeoffs between exploiting and exploring brand information. Managerially, these models enable researchers to investigate effects due to ongoing quality variation, Bayesian learning, and the variation in utility shocks. However, the optimal solution (assuming no thinking costs) requires an assumption that consumers solve in their heads an optimization problem that is PSPACE-hard. Cognitive simplicity is a plausible theory that is rooted in consumer behavior, psychology, and economics. Cognitive simplicity assumes that consumers solve the meta problem that maximizes utility minus thinking costs. In this paper we propose and evaluate an alternative theory of learning – that consumers use an index strategy when looking forward.
We prove analytically that an index strategy exists


for canonical consumer learning models and that the index function has simple properties that consumers can intuit. Using synthetic data we demonstrate that an index solution achieves near optimal expected utility, and is fast to compute. Using IRI data on diaper purchases, we show that at least one index solution fits the data and predicts out-of-sample significantly better than either a no-learning model or a myopic-learning model. The index strategy produces estimation results (and hence managerial implications) that are quite similar to an approximately optimal solution. The index solution has significantly lower computational costs and, we believe, is more likely to describe consumer behavior. We address many issues, but many issues remain. We abstract away from risk aversion, advertising as a quality signal (the IRI dataset for the diaper category does not track advertising), and inventory problems. Theoretically, risk aversion does not affect the indexability result. The consequence of incorporating advertising signals depends on how consumers learn. Although previous work has not found significant inventory effects (Ching et al. 2011), it is theoretically interesting to extend index strategies to capture inventory concerns. Finally, diaper buyers are likely forward-looking, but consumers in other product categories may not be. Our theory suggests that consumers are most likely to be forward-looking when shock uncertainty is small compared to quality uncertainty; we expect myopic-learning models to do well when shock uncertainty is large. An index solution appears to be a reasonable tradeoff for diaper consumers, but other cognitively-simple solutions might do even better. Future research can explore these solutions using either field data or laboratory experiments.



References

Ackerberg, D. A. 2003. Advertising, learning, and consumer choice in experience good markets: an empirical examination. International Economic Review, 44 (3), 1007–1040.

Bettman, J. R., M. F. Luce, and J. W. Payne. 1998. Constructive consumer choice processes. Journal of Consumer Research, 25 (3), 187–217.

Brezzi, M. and T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control, 27 (1), 87–108.

Bröder, A. 2000. Assessing the empirical validity of the "take the best" heuristic as a model of human probabilistic inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26 (5), 1332–1346.

Bronnenberg, B. J., M. W. Kruger, and C. F. Mela. 2008. The IRI marketing data set. Marketing Science, 27 (4), 745–748.

Bruner, J. S. 1957. On perceptual readiness. Psychological Review, 64, 123–152.

Camerer, C. F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction (Roundtable Series in Behavioral Economics). Princeton University Press.

Chang, F. and T. L. Lai. 1987. Optimal stopping and dynamic allocation. Advances in Applied Probability, 19 (4), 829–853.

Ching, A., T. Erdem, and M. P. Keane. 2011. Learning models: an assessment of progress, challenges and new developments. SSRN eLibrary.

Ching, A. and M. Ishihara. 2010. The effects of detailing on prescribing decisions under quality uncertainty. Quantitative Marketing and Economics, 8, 123–165.

Chintagunta, P., T. Erdem, P. E. Rossi, and M. Wedel. 2006. Structural modeling in marketing: review and assessment. Marketing Science, 25 (6), 604–616.

Chintagunta, P., R. Jiang, and G. Z. Jin. 2009. Information, learning, and drug diffusion: the case of Cox-2 inhibitors. Quantitative Marketing and Economics, 7 (4), 399–443.

Crawford, G. S. and M. Shum. 2005. Uncertainty and learning in pharmaceutical demand. Econometrica, 73 (4), 1137–1173.

Erdem, T. and M. P. Keane. 1996. Decision-making under uncertainty: capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science, 15 (1), 1–20.

Erdem, T., M. P. Keane, and B. Sun. 2008. A dynamic model of brand choice when price and advertising signal product quality. Marketing Science, 27 (6), 1111–1125.

Erdem, T., M. P. Keane, T. S. Öncü, and J. Strebel. 2005. Learning about computers: an analysis of information search and technology choice. Quantitative Marketing and Economics, 3, 207–247.

Gabaix, X. and D. Laibson. 2000. A boundedly rational decision algorithm. American Economic Review, 90 (2), Papers and Proceedings, 433–438.

Gabaix, X., D. Laibson, G. Moloche, and S. Weinberg. 2006. Costly information acquisition: experimental analysis of a boundedly rational model. American Economic Review, 96 (4), 1043–1068.

Gigerenzer, G. and D. G. Goldstein. 1996. Reasoning the fast and frugal way: models of bounded rationality. Psychological Review, 103 (4), 650–669.

Gilbride, T. J. and G. M. Allenby. 2004. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science, 23 (3), 391–406.

Gittins, J. 1989. Bandit Processes and Dynamic Allocation Indices. John Wiley & Sons: New York, NY.

Gittins, J. and D. Jones. 1974. A dynamic allocation index for the sequential design of experiments. In J. Gani (Ed.), Progress in Statistics, North-Holland: Amsterdam, 241–266.

Goldstein, D. G. and G. Gigerenzer. 2002. Models of ecological rationality: the recognition heuristic. Psychological Review, 109 (1), 75–90.

Guadagni, P. M. and J. D. C. Little. 1983. A logit model of brand choice calibrated on scanner data. Marketing Science, 2 (3), 203–238.

Hauser, J. R. 1978. Testing the accuracy, usefulness and significance of probabilistic models: an information theoretic approach. Operations Research, 26 (3), 406–421.

Hauser, J. R., O. Toubia, T. Evgeniou, D. Dzyabura, and R. Befurt. 2010. Cognitive simplicity and consideration sets. Journal of Marketing Research, 47 (June), 485–496.

Hauser, J. R. and G. L. Urban. 1986. The value priority hypotheses for consumer budget plans. Journal of Consumer Research, 12 (4), 446–462.

Hauser, J. R., G. L. Urban, G. Liberali, and M. Braun. 2009. Website morphing. Marketing Science, 28 (2), 202–223.

Hauser, J. R., G. L. Urban, and B. Weinberg. 1993. How consumers allocate their time when searching for information. Journal of Marketing Research, 30 (4), 452–466.

Hutchinson, J. M. C. and G. Gigerenzer. 2005. Simple heuristics and rules of thumb: where psychologists and behavioural biologists might meet. Behavioural Processes, 69, 97–124.

Jacoby, L. L. and M. Dallas. 1981. On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306–340.

Johnson, E. J. and J. W. Payne. 1985. Effort and accuracy in choice. Management Science, 31, 395–414.

Kim, J. B., P. Albuquerque, and B. J. Bronnenberg. 2010. Online demand under limited consumer search. Marketing Science, 29 (6), 1001–1023. Kohli, R., and K. Jedidi. 2007. Representation and inference of lexicographic preference models and their variants. Marketing Science, 26 (3), 380-399. Marewski, J. N., R. F. Pohl, and O. Vitouch. 2010. Recognition-based judgments and decisions: introduction to the special issue (Vol. 1). Judgment and Decision Making, 5 (4), 207–215 Martignon, L. and U. Hoffrage. 2002. Fast, frugal, and fit: simple heuristics for paired comparisons. Theory and Decision, 52, 29-71. Mehta, N., X. J. Chen, and O. Narasimhan. 2008. Informing, transforming, and persuading: disentangling the multiple effects of advertising on brand choice decisions. Marketing Science, 27 (3), 334 – 355. Narasimhan, C. 1988. Competitive promotional strategies. Journal of Business, 61 (4), 427 – 449. Narayanan, S. and P. Manchanda. 2009. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science, 28 (3), 424 – 441. Narayanan, S., P. Manchanda, and P. K. Chintagunta. 2005. Temporal differences in the role of marketing communication in new product categories. Journal of Marketing Research, 42 (3), 278–290. Papadimitriou, C. H. and J. N. Tsitsiklis. 1999. The complexity of optimal queuing network control. Mathematics of Operations Research, 24 (2), 293–305. Payne J.W., J. R. Bettman and E. J. Johnson. 1988. Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534552.

35

Learning from Experience, Simply

Payne J.W., J. R. Bettman and E. J. Johnson. 1993. The Adaptive Decision Maker. Cambridge University Press: Cambridge UK. Roberts, J. H. and G. L. Urban. 1988. Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Science, 34 (2), 167–185. Rust, J. 1994. Structural estimation of Markov decision processes. Handbook of Econometrics, Chapter 51. Elsevier: Maryland Heights, MO. Rust, J. 1996. Numerical dynamic programming in economics, Handbook of Computational Economics, Chapter 14. Elsevier: Maryland Heights, MO. Shugan, S. M. 1980. The cost of thinking. Journal of Consumer Research, 7 (2), 99–111. Simon, H. A. 1955. A behavioral model of rational choice. Quarterly Journal of Economics, 69 (1), 99–118. Simon, H. A. 1956. Rational choice and the structure of the environment. Psychological Review, 63 (2), 129–38. Tversky, A. and D. Kahneman. 1973. Availability: a heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232. Whittle, P. 1988. Restless bandits: activity allocation in a changing world. .Journal of Applied Probability, 25, 287–298. Yee, M., E. Dahan, J. R. Hauser and J. Orlin. 2007. Greedoid-based noncompensatory inference. Marketing Science, 26 (4), (July-August), 532-549.

36

Learning from Experience, Simply

Table 1. Comparing Decision Strategies on Expected Utility and Thinking Costs

| | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| Number of grid points | n/a | n/a | 200 × 50 | (200 × 50)² |
| Computation time (surrogate for thinking costs) | negligible | negligible | 102 seconds | 6 × 10⁴ seconds |

Average consumer utility [subject to affine transformation], standard errors in parentheses.

Relatively low uncertainty in utility shocks; mean of prior quality beliefs (q̄₁, q̄₂) for (Brand 1, Brand 2):

| (q̄₁, q̄₂) | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| (0.0, −0.3) | 0.041 (0.003) | 1.801 (0.043) | 1.992 (0.045) | 1.996 (0.045) |
| (0.0, 0.0) | 0.618 (0.003) | 3.352 (0.049) | 3.544 (0.052) | 3.547 (0.052) |
| (0.0, +0.3) | 3.036 (0.003) | 5.298 (0.056) | 5.323 (0.056) | 5.327 (0.056) |

Relatively high uncertainty in utility shocks; mean of prior quality beliefs (Brand 1, Brand 2):

| (q̄₁, q̄₂) | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| (0.0, −0.3) | 4.919 (0.026) | 5.762 (0.047) | 5.767 (0.047) | 5.768 (0.047) |
| (0.0, 0.0) | 6.182 (0.027) | 7.150 (0.050) | 7.190 (0.052) | 7.190 (0.052) |
| (0.0, +0.3) | 7.946 (0.026) | 8.912 (0.054) | 8.911 (0.054) | 8.912 (0.054) |


Table 2. Transition Percentages among Diaper Brands
Percent of households that purchase the column brand in period t + 1, having purchased the row brand in period t.

Within the first eight purchases:

| Purchased row brand | Pampers | Huggies | Luvs | Other Brands |
|---|---|---|---|---|
| Pampers | 66.7% | 19.7% | 10.9% | 2.7% |
| Huggies | 23.1% | 64.3% | 11.3% | 1.2% |
| Luvs | 18.1% | 19.7% | 58.8% | 3.4% |
| Other Brands | 21.2% | 21.2% | 20.0% | 37.6% |

After the first eight purchases:

| Purchased row brand | Pampers | Huggies | Luvs | Other Brands |
|---|---|---|---|---|
| Pampers | 76.5% | 12.6% | 9.7% | 1.1% |
| Huggies | 18.7% | 75.9% | 4.9% | 0.5% |
| Luvs | 20.7% | 9.5% | 67.5% | 2.3% |
| Other Brands | 14.1% | 9.0% | 23.1% | 53.8% |

Table 3. In-Sample and Out-of-Sample Fit Statistics for Diaper Data

| | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| In-sample estimation statistics | | | | |
| Log likelihood | −777.17 | −480.81 | −469.81 | −472.16 |
| U² (percent information) | 78.50% | 86.68% | 86.99% | 86.92% |
| AIC | 1562.34 | 985.63 | 971.63 | 976.33 |
| BIC | 1580.25 | 1039.37 | 1043.29 | 1047.99 |
| Number of parameters | 4 | 12 | 16 | 16 |
| Out-of-sample validation statistics | | | | |
| Log likelihood | −1058.15 | −745.55 | −738.79 | −741.63 |
| U² (percent information) | 78.77% | 85.04% | 85.18% | 85.15% |
| Computation time (sec) | 0.15 | 0.29 | 40.5 | 857.8 |


Table 4. Maximum Likelihood Estimates for Prior Beliefs, Mean Quality, and the Relative Magnitude of the Variation in Utility Shocks† (standard errors in parentheses)

Relative mean of prior beliefs, q̄ (Pampers is the baseline and is set to 0.00):

| Brand | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| Pampers | 0.00 | 0.00 | 0.00 | 0.00 |
| Huggies | −0.29 (0.10) | −0.08 (0.18) | −0.87 (0.21) | −0.48 (0.12) |
| Luvs | −0.66 (0.12) | −0.90 (0.22) | −1.40 (0.19) | −1.19 (0.81) |
| Other Brands | −2.80 (0.25) | −2.49 (0.36) | −2.03 (0.57) | −2.97 (1.64) |

Uncertainty of prior beliefs, relative to ongoing quality uncertainty:

| Brand | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| Pampers | – | 0.56 (0.12) | 0.56 (0.07) | 0.62 (0.16) |
| Huggies | – | 0.44 (0.09) | 0.43 (0.01) | 0.47 (0.05) |
| Luvs | – | 1.50 (0.51) | 1.54 (0.07) | 2.01 (1.59) |
| Other Brands‡ | – | 11.68 (41.08) | 17.67 (109.44) | 13.88 (15.27) |

Mean quality (long-term):

| Brand | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| Pampers | – | 4.84 (0.74) | 3.13 (0.71) | 2.69 (0.56) |
| Huggies | – | 6.07 (1.09) | 6.36 (0.58) | 5.60 (0.11) |
| Luvs | – | 3.21 (0.45) | 2.26 (0.50) | 1.95 (0.67) |
| Other Brands | – | 0.71 (0.55) | 0.54 (0.37) | 0.45 (0.32) |

Uncertainty in utility shocks, relative to ongoing quality uncertainty:

| Brand | No Learning | Myopic Learning | Index Strategy | Approximately Optimal |
|---|---|---|---|---|
| Pampers | – | – | 0.70 (0.25) | 0.65 (0.27) |
| Huggies | – | – | 0.11 (0.04) | 0.11 (0.04) |
| Luvs | – | – | 0.57 (0.12) | 0.56 (0.23) |
| Other Brands | – | – | 2.65 (0.53) | 2.02 (0.86) |

† Standard errors are relative to the ongoing quality uncertainty. ‡ The likelihood is particularly flat in this parameter estimate.


Figure 1. Index Strategies Balance Utility and Cognitive Simplicity (Conceptual Diagram)

[Schematic: utility on the vertical axis, cognitive simplicity on the horizontal axis. Moving from the optimal-utility solution through approximately optimal solutions, the index strategy, myopic learning, and no learning, each strategy gives up utility in exchange for greater cognitive simplicity; the index strategy sits near the optimal utility level while remaining cognitively simple.]


Figure 2: Gittins’ Index (Evolution and Variation)

[Panel (a): realized Gittins’ indices for Brand 1 and Brand 2 over periods 0–50; the consumer chooses the brand with the largest index value. Panel (b): normalized Gittins’ index values over periods 0–50 for posterior-belief means of 0 and 0.2, showing how index values vary with the consumer’s experience.]
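The index logic in this figure can be illustrated with a small, self-contained dynamic program. The sketch below computes a Gittins index for a single Bernoulli arm with a Beta posterior by bisecting on the retirement reward — a deliberately simpler discrete setting than the paper’s normal-learning model. The function name, horizon, and tolerance are our illustrative choices, not the paper’s implementation.

```python
def gittins_index(a, b, delta=0.9, horizon=100, tol=1e-4):
    """Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Bisects on the retirement reward lam: the index is the lam at which
    retiring forever (value lam / (1 - delta)) just matches continuing."""
    def retire(lam):
        return lam / (1.0 - delta)

    def value_excess(lam):
        # Terminal layer: approximate states at depth `horizon` by retiring.
        V = [retire(lam)] * (horizon + 1)
        for d in range(horizon - 1, -1, -1):
            new_V = []
            for i in range(d + 1):            # i successes, d - i failures so far
                p = (a + i) / (a + b + d)     # posterior mean of the arm
                cont = p + delta * (p * V[i + 1] + (1.0 - p) * V[i])
                new_V.append(max(retire(lam), cont))
            V = new_V
        return V[0] - retire(lam)             # > 0 iff continuing beats retiring

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a uniform Beta(1, 1) prior and δ = 0.9 the index exceeds the myopic mean of 0.5, and the exploration premium shrinks as experience accumulates (e.g., at Beta(10, 10)), mirroring how the realized indices in panel (a) settle down over time.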


Figure 3. Whittle’s Index as Utility Shock Magnitude and Experience Vary

[Plot of Whittle’s index over periods 0–50 for utility-shock magnitudes 0.01, 0.1, 1.0, and 5.0: the index declines with experience and is lower at larger shock magnitudes.]


Online Appendix A. Proof of Indexability

Without loss of generality, we set the observable shock to zero and use $\epsilon_t = (\epsilon_{0t}, \epsilon_{1t})$ to represent all utility shocks. The focus is on the sub-problem in which the consumer chooses between an uncertain brand and a certain reward $\lambda$. To simplify notation, we drop the brand identifier $j$. The Bellman equation for this problem is

(A1) $V(I_t, \epsilon_t; \lambda) = \max\big\{\, \lambda + \epsilon_{0t} + \delta\, E[V(I_t, \epsilon_{t+1}; \lambda)],\;\; \bar{q}_t + \epsilon_{1t} + \delta\, E[V(I_{t+1}, \epsilon_{t+1}; \lambda) \mid I_t] \,\big\}$,

where $I_t = (\bar{q}_t, \sigma_t)$ summarizes the consumer's belief about the brand at time $t$; the belief does not update when the fixed reward is chosen. The definition of indexability is that, for any state $(I_t, \epsilon_t)$, if it is optimal to choose the fixed reward $\lambda$, then it must also be optimal to choose the fixed reward $\lambda'$ for any $\lambda' \geq \lambda$. This is equivalent to the following:

(A2) $\dfrac{\partial}{\partial\lambda}\Big\{\delta\, E[V(I_{t+1}, \epsilon_{t+1}; \lambda) \mid I_t] - \delta\, E[V(I_t, \epsilon_{t+1}; \lambda)]\Big\} \leq 1$.

Intuitively, this is saying the expected future value of choosing the uncertain brand should not grow too much, compared to that of choosing the fixed reward $\lambda$, as $\lambda$ increases. It turns out the assumptions of the canonical problem of consumer learning are sufficient, though not necessary.

We first define the expected value function $W(I_t; \lambda)$ by integrating out $\epsilon_t$:

(A3) $W(I_t; \lambda) = \int V(I_t, \epsilon_t; \lambda)\, dF(\epsilon_t)$.

The sub-problem can then be reduced to the fixed point:

(A4) $W(I_t; \lambda) = E_\epsilon \max\big\{\, \lambda + \epsilon_{0t} + \delta\, W(I_t; \lambda),\;\; \bar{q}_t + \epsilon_{1t} + \delta\, E[W(I_{t+1}; \lambda) \mid I_t] \,\big\}$.

To see this, integrate both sides of Equation (A1) over $\epsilon_t$, using the definition in Equation (A3), the assumption that the $\epsilon_t$ are i.i.d. over time — so that $E[V(I, \epsilon_{t+1}; \lambda)] = W(I; \lambda)$ for either continuation state — and the assumption that the belief transition from $I_t$ to $I_{t+1}$ is independent of $\epsilon_t$.

Denote 0 as the option of the certain reward $\lambda$, and 1 as the uncertain brand. We define the deterministic components of the two choice values:

(A5) $v_0(I_t; \lambda) = \lambda + \delta\, W(I_t; \lambda)$ and $v_1(I_t; \lambda) = \bar{q}_t + \delta\, E[W(I_{t+1}; \lambda) \mid I_t]$.

First observe that the conditional probability of choosing 1 is given by:

(A6) $P(1 \mid I_t; \lambda) = \Pr(v_1 + \epsilon_{1t} \geq v_0 + \epsilon_{0t}) = \dfrac{\partial}{\partial v_1}\, E_\epsilon \max\{v_0 + \epsilon_{0t},\; v_1 + \epsilon_{1t}\}$.

The last equality is by interchanging the integration and differentiation and uses the definition of the max function. Similarly we have

(A7) $P(0 \mid I_t; \lambda) = \dfrac{\partial}{\partial v_0}\, E_\epsilon \max\{v_0 + \epsilon_{0t},\; v_1 + \epsilon_{1t}\}$.

Differentiate both sides of Equation (A4) with respect to $\lambda$ and use the chain rule:

(A8) $\dfrac{\partial W(I_t;\lambda)}{\partial\lambda} = P(0 \mid I_t;\lambda)\Big(1 + \delta\,\dfrac{\partial W(I_t;\lambda)}{\partial\lambda}\Big) + P(1 \mid I_t;\lambda)\,\delta\,\dfrac{\partial}{\partial\lambda}E[W(I_{t+1};\lambda) \mid I_t]$,

where the last equality uses Equations (A5), (A6), and (A7). The following lemma is useful to establish indexability.

Lemma 1. For all $I, \lambda$, we have

(A9) $0 \leq \dfrac{\partial W(I;\lambda)}{\partial\lambda} \leq \dfrac{1}{1-\delta}$.

Proof. Fix any $I$, $\epsilon$, and $\lambda$. Suppose $\pi^*$ is the optimal policy that solves $V(I,\epsilon;\lambda)$. First, if a positive constant $c$ is added only to the fixed reward $\lambda$ in every period but the uncertain brand remains unchanged, then following $\pi^*$ yields an expected total utility at least as large as $V(I,\epsilon;\lambda)$; therefore $V(I,\epsilon;\lambda+c) \geq V(I,\epsilon;\lambda)$. Second, if a positive constant $c$ is added to both the fixed reward and the uncertain brand in every period, then $\pi^*$ remains optimal and yields expected total utility of $V(I,\epsilon;\lambda) + c/(1-\delta)$. By construction, adding a positive constant to both options yields expected utility at least as high as adding the constant only to the fixed reward: $V(I,\epsilon;\lambda) + c/(1-\delta) \geq V(I,\epsilon;\lambda+c)$. Integrating out $\epsilon$ we have:

(A10) $W(I;\lambda) \leq W(I;\lambda+c) \leq W(I;\lambda) + \dfrac{c}{1-\delta}$.

It follows that:

(A11) $0 \leq \dfrac{W(I;\lambda+c) - W(I;\lambda)}{c} \leq \dfrac{1}{1-\delta}$.

Taking the limit on both sides as $c \to 0$ establishes the lemma. ∎

Lemma 1 implies that $0 \leq \delta\,\partial E[W(I_{t+1};\lambda) \mid I_t]/\partial\lambda \leq \delta/(1-\delta)$. This result, together with Equation (A8) — solving (A8) for $\partial W(I_t;\lambda)/\partial\lambda$ and substituting — implies that:

(A12) $\dfrac{\partial}{\partial\lambda}\big[v_0(I_t;\lambda) - v_1(I_t;\lambda)\big] = 1 + \delta\,\dfrac{\partial W(I_t;\lambda)}{\partial\lambda} - \delta\,\dfrac{\partial}{\partial\lambda}E[W(I_{t+1};\lambda)\mid I_t] = \dfrac{1 - (1-\delta)\,\delta\,\partial E[W(I_{t+1};\lambda)\mid I_t]/\partial\lambda}{1 - \delta\,P(0 \mid I_t;\lambda)} \geq 0$,

which establishes the indexability condition in Equation (A2).

Online Appendix B. Proof of Proposition 2

We first prove two useful lemmas. The focus is again on the sub-problem of a single brand, so we drop the brand identifier $j$.

Lemma 2 (scale invariance). Fix a scalar $\alpha > 0$. Consider a modified version of the original sub-problem in which the quality samples become $\alpha q_t$, the prior becomes $(\alpha\bar{q}_1, \alpha\sigma_1)$, the utility shocks become $\alpha\epsilon_t$ for all $t$, and the fixed reward becomes $\alpha\lambda$. Denote by $\widetilde{W}$ and $\widetilde{m}$ the expected value function and index value for the modified problem. Then for any belief state $I = (\bar{q}, \sigma)$, writing $\alpha I = (\alpha\bar{q}, \alpha\sigma)$:

(A13) $\widetilde{W}(\alpha I; \alpha\lambda) = \alpha\, W(I;\lambda)$,
(A14) $\widetilde{m}(\alpha I) = \alpha\, m(I)$.

Proof. We first verify that $\alpha W(I;\lambda)$ satisfies the fixed point defined by Equation (A4) for the modified problem. Bayesian updating with normal priors and normal quality draws implies that scaling the prior and the quality samples by $\alpha$ scales every posterior $(\bar{q}_{t+1}, \sigma_{t+1})$ by $\alpha$ as well. Suppose Equation (A13) holds; then the deterministic choice values of the modified problem satisfy

(A15) $\tilde{v}_0(\alpha I; \alpha\lambda) = \alpha\lambda + \delta\,\widetilde{W}(\alpha I;\alpha\lambda) = \alpha\, v_0(I;\lambda)$,
(A16) $\tilde{v}_1(\alpha I; \alpha\lambda) = \alpha\bar{q}_t + \delta\, E[\widetilde{W}(\alpha I_{t+1}; \alpha\lambda) \mid I_t] = \alpha\, v_1(I;\lambda)$,

where the definitions of $\tilde{v}_0$ and $\tilde{v}_1$ follow Equation (A5) for the modified problem. Define $\Delta(I;\lambda) = v_0(I;\lambda) - v_1(I;\lambda)$. The assumption that the distribution of $\epsilon$ has scale and location parameters implies

(A17) $\Pr(\alpha\epsilon_0 - \alpha\epsilon_1 \leq \alpha\Delta) = \Pr(\epsilon_0 - \epsilon_1 \leq \Delta)$ and $E_{\alpha\epsilon}\max\{\alpha v_0 + \alpha\epsilon_0,\; \alpha v_1 + \alpha\epsilon_1\} = \alpha\, E_\epsilon\max\{v_0+\epsilon_0,\; v_1+\epsilon_1\}$.

The right-hand side of Equation (A4) for the modified problem becomes

(A18) $E_{\alpha\epsilon}\max\{\tilde{v}_0 + \alpha\epsilon_0,\; \tilde{v}_1 + \alpha\epsilon_1\} = \alpha\, E_\epsilon\max\{v_0+\epsilon_0,\; v_1+\epsilon_1\} = \alpha\, W(I;\lambda)$,

which is the left-hand side of Equation (A4) evaluated at $\widetilde{W} = \alpha W$. The first equality follows from Equations (A15)–(A17); the second uses the fact that $W(I;\lambda)$ is the fixed point of Equation (A4). Therefore $\alpha W(I;\lambda)$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part of the lemma, we use the definition of the Whittle index, which is the break-even value of $\lambda$ such that the two terms inside the curly brackets of Equation (1) are equal in their deterministic components:

(A19) $v_0(I; m(I)) = v_1(I; m(I))$.

It suffices to show that $\alpha m(I)$ solves the corresponding equality for the modified problem. Setting $\alpha\lambda = \alpha m(I)$ in Equations (A15) and (A16) gives $\tilde{v}_0 = \alpha v_0(I; m(I)) = \alpha v_1(I; m(I)) = \tilde{v}_1$, where the middle equality is the definition of the Whittle index for the original problem. Then, by the definition of the Whittle index for the modified problem, $\widetilde{m}(\alpha I) = \alpha m(I)$. ∎

Lemma 3 (location invariance). Fix the original sub-problem and a constant $c$. Consider a modified problem in which the quality samples become $q_t + c$ for all $t$, the prior belief becomes $(\bar{q}_1 + c, \sigma_1)$, the utility shocks are unchanged, and the fixed reward becomes $\lambda + c$. Denote by $\widetilde{W}$ and $\widetilde{m}$ the expected value function and index value for the modified problem. Then for any belief state $I = (\bar{q}, \sigma)$:

(A20) $\widetilde{W}\big((\bar{q}+c, \sigma);\, \lambda + c\big) = W\big((\bar{q},\sigma);\, \lambda\big) + \dfrac{c}{1-\delta}$,
(A21) $\widetilde{m}\big((\bar{q}+c, \sigma)\big) = m\big((\bar{q},\sigma)\big) + c$.

Proof. The proof is similar to that of Lemma 2. Note that Bayesian updating implies that for all $t$ the precision of the modified problem remains the same as that of the original problem:

(A22) $\dfrac{1}{\tilde{\sigma}^2_{t+1}} = \dfrac{1}{\tilde{\sigma}^2_t} + \dfrac{1}{\sigma_q^2} = \dfrac{1}{\sigma^2_t} + \dfrac{1}{\sigma^2_q} = \dfrac{1}{\sigma^2_{t+1}}$.

It follows that the updated posterior mean and variance have the following relationships:

(A23) $\tilde{\bar{q}}_{t+1} = \bar{q}_{t+1} + c$,
(A24) $\tilde{\sigma}^2_{t+1} = \sigma^2_{t+1}$.

Therefore, the belief state in the next period preserves the relationship:

(A25) $\tilde{I}_{t+1} = (\bar{q}_{t+1} + c,\; \sigma_{t+1})$.

We then check that $W(I;\lambda) + c/(1-\delta)$ satisfies the fixed point defined by Equation (A4) for the modified problem. Suppose Equation (A20) holds; then, using $c + \delta c/(1-\delta) = c/(1-\delta)$,

(A26) $\tilde{v}_0 = (\lambda + c) + \delta\Big(W(I;\lambda) + \dfrac{c}{1-\delta}\Big) = v_0(I;\lambda) + \dfrac{c}{1-\delta}$,
(A27) $\tilde{v}_1 = (\bar{q}_t + c) + \delta\Big(E[W(I_{t+1};\lambda)\mid I_t] + \dfrac{c}{1-\delta}\Big) = v_1(I;\lambda) + \dfrac{c}{1-\delta}$,

where the conditional expectation over the shifted beliefs follows from normality and the conjugate-prior assumptions for the distribution of quality and beliefs, via Equations (A22)–(A25). The assumption that the distribution of $\epsilon$ has scale and location parameters implies

(A28) $E_\epsilon\max\Big\{v_0 + \dfrac{c}{1-\delta} + \epsilon_0,\; v_1 + \dfrac{c}{1-\delta} + \epsilon_1\Big\} = E_\epsilon\max\{v_0 + \epsilon_0,\; v_1 + \epsilon_1\} + \dfrac{c}{1-\delta}$.

The right-hand side of Equation (A4) for the modified problem becomes

(A29) $E_\epsilon\max\{\tilde{v}_0 + \epsilon_0,\; \tilde{v}_1 + \epsilon_1\} = W(I;\lambda) + \dfrac{c}{1-\delta}$,

which is the left-hand side of Equation (A4) evaluated at $\widetilde{W} = W + c/(1-\delta)$. The equality follows from Equations (A26)–(A28) and the fact that $W(I;\lambda)$ is the fixed point of Equation (A4). Therefore $W(I;\lambda) + c/(1-\delta)$ also satisfies the fixed point of Equation (A4) for the modified problem.

For the second part, we again use the definition of the Whittle index in Equation (A19). Setting the modified fixed reward to $m(I) + c$ in Equations (A26) and (A27) gives $\tilde{v}_0 = v_0(I;m(I)) + c/(1-\delta) = v_1(I;m(I)) + c/(1-\delta) = \tilde{v}_1$, so $\widetilde{m}((\bar{q}+c,\sigma)) = m((\bar{q},\sigma)) + c$. ∎

To complete the proof of the proposition, write the index as $m(\bar{q}, \sigma;\, \sigma_q, \sigma_\epsilon)$ to make the quality uncertainty $\sigma_q$ and shock scale $\sigma_\epsilon$ explicit. By Lemma 3 with $c = -\bar{q}$, and then by Lemma 2 with $\alpha = 1/\sigma_q$,

$m(\bar{q}, \sigma;\, \sigma_q, \sigma_\epsilon) = \bar{q} + m(0, \sigma;\, \sigma_q, \sigma_\epsilon) = \bar{q} + \sigma_q\, m\big(0,\; \sigma/\sigma_q;\; 1,\; \sigma_\epsilon/\sigma_q\big)$,

where the second equality uses Lemma 2.

Online Appendix C. Proof of Proposition 3

The focus is again on the sub-problem of a single brand, so we drop the brand identifier $j$.

Proof of Proposition 3(1). The first part, that the Whittle index is increasing in the posterior mean $\bar{q}$, is evident from Proposition 2. For the second part, fix a belief state $I$ and recall from Equation (A5) that $\Delta(I;\lambda) = v_0(I;\lambda) - v_1(I;\lambda)$. By the definition of the index, $\Delta(I; m(I)) = 0$, and for any $\lambda \geq m(I)$,

(A30) $\Delta(I;\lambda) = \Delta(I; m(I)) + \displaystyle\int_{m(I)}^{\lambda} \dfrac{\partial \Delta(I;s)}{\partial s}\, ds \geq 0$,

where the inequality is implied by Equation (A12). It then follows that the fixed reward is (weakly) optimal whenever $\lambda \geq m(I)$, and the uncertain brand whenever $\lambda \leq m(I)$: the index characterizes the optimal decision rule of the sub-problem.

Proof of Proposition 3(2). We prove the first part and omit the second part, which uses a similar argument. Fix some $\sigma'_\epsilon \geq \sigma_\epsilon$. Let $\epsilon_1, \epsilon_2, \ldots$ be a sequence of random variables conditionally i.i.d. from the shock distribution with scale $\sigma_\epsilon$, and let $\eta_1, \eta_2, \ldots$ be conditionally i.i.d. mean-zero random variables, independent of the first sequence, with scale chosen so that the constructed sequence $\epsilon'_t = \epsilon_t + \eta_t$ is conditionally i.i.d. with scale $\sigma'_\epsilon$. Fix some policy $\pi$ applied to solve the problem under $\{\epsilon_t\}$, and denote $a_t = 0$ if the fixed reward $\lambda$ is chosen and $a_t = 1$ otherwise. Applying $\pi$ (which does not respond to $\eta$) to the problem under $\{\epsilon'_t\}$ yields the value

$V^{\pi}(I, \epsilon'; \lambda; \sigma'_\epsilon) = E\Big[\sum_{t} \delta^{t-1}\big((1-a_t)(\lambda + \epsilon_{0t}) + a_t(\bar{q}_t + \epsilon_{1t})\big)\Big] + E\Big[\sum_{t}\delta^{t-1}\big((1-a_t)\eta_{0t} + a_t\eta_{1t}\big)\Big] = V^{\pi}(I, \epsilon; \lambda; \sigma_\epsilon)$,

where the last equality uses the fact that the second term is equal to zero ($\eta_t$ has mean zero and is independent of $a_t$). Note that $V^{\pi} \leq V^{*}$, the optimal value function under the larger shocks, because the latter is the optimal value function. Therefore $W(I; \lambda; \sigma_\epsilon) \leq W(I; \lambda; \sigma'_\epsilon)$ for all $I$ and $\lambda$. Differentiating both sides of Equation (A4) with respect to the shock scale and using the chain rule, as in Equation (A8), shows that this gain accrues weakly more to the continuation value of the fixed reward than to that of the uncertain brand, so that $\Delta(I; \lambda; \sigma'_\epsilon) \geq \Delta(I; \lambda; \sigma_\epsilon)$. Let $m'$ denote the Whittle index corresponding to $\sigma'_\epsilon$. Since by definition $\Delta(I; m(I); \sigma_\epsilon) = 0$, we have $\Delta(I; m(I); \sigma'_\epsilon) \geq 0$, and it then follows from Equation (A30) that $m'(I) \leq m(I)$.

Proof of Proposition 3(3). By the invariance property established in Proposition 2,

$m(\bar{q}, \sigma, \sigma_\epsilon;\, \sigma_q) = \bar{q} + \sigma_q\, m\big(0,\; \sigma/\sigma_q,\; \sigma_\epsilon/\sigma_q;\; 1\big)$,

and the result follows by applying the monotonicity established in Proposition 3(2) to the normalized index on the right-hand side, where the first equality follows from Proposition 3(2).

Online Appendix D. Computation of Index Function

We can use the invariance property to simplify the computation of the Whittle index. The computation is based on the fixed-point problem in Equation (A4) and the definition of the Whittle index. The product identifier $j$ is dropped to simplify notation. Note that the $W$ function evaluated at $\bar{q}_t = 0$ satisfies

$W(0, \sigma_t; \lambda) = E_\epsilon \max\Big\{\lambda + \epsilon_{0t} + \delta\, W(0, \sigma_t; \lambda),\;\; \epsilon_{1t} + \delta\, E\big[W(0, \sigma_{t+1};\, \lambda - \bar{q}_{t+1}) \,\big|\, \bar{q}_t = 0\big]\Big\}$.

The first equality is implied by the Bayesian updating formula for the normal distribution. The continuation value of the uncertain brand uses Equation (A21) from Lemma 3, $W(\bar{q}_{t+1}, \sigma_{t+1}; \lambda) = W(0, \sigma_{t+1};\, \lambda - \bar{q}_{t+1}) + \bar{q}_{t+1}/(1-\delta)$, together with the fact that the expectation of $\bar{q}_{t+1}$ conditional on $\bar{q}_t = 0$ is zero, so the $\bar{q}_{t+1}/(1-\delta)$ term drops out.

We now treat $\lambda$ as a state variable. Note that the distribution of $\bar{q}_{t+1}$ conditional on the belief $(\bar{q}_t = 0, \sigma_t)$ is normal with mean $0$ and standard deviation $\rho_t = \sigma_t^2/\sqrt{\sigma_t^2 + \sigma_q^2}$. The fixed-point problem now only involves the function $W(0, \cdot\,; \cdot)$, fixed at $\bar{q} = 0$, and evolves on the state space $(\sigma, \lambda)$:

$W(0, \sigma_t; \lambda) = E_\epsilon \max\Big\{\lambda + \epsilon_{0t} + \delta\, W(0, \sigma_t; \lambda),\;\; \epsilon_{1t} + \delta \displaystyle\int W(0, \sigma_{t+1};\, \lambda - \rho_t z)\,\phi(z)\, dz\Big\}$,

where $\phi$ is the standard normal density. Standard dynamic-programming algorithms can be used to solve the above fixed point. Given the solution $W(0, \sigma; \lambda)$, we can compute the Whittle index for various values of the random shocks $\epsilon$ under $\bar{q} = 0$, i.e., $m(0, \sigma, \epsilon)$. The index evaluated at any value of the posterior mean $\bar{q}$ is then computed by linear summation, $m(\bar{q}, \sigma, \epsilon) = \bar{q} + m(0, \sigma, \epsilon)$, as implied by the invariance property.
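A minimal sketch of this grid-based computation is given below, assuming Gumbel utility shocks (so the expectation over $\epsilon$ takes the familiar logsumexp form), a truncated number of belief updates after which learning stops, and illustrative grid sizes; all names are ours rather than the paper's code. It exploits the invariance property by solving the fixed point only at posterior mean zero.

```python
import numpy as np

def index_function(sigma0=1.0, sigma_q=1.0, delta=0.9, T=30,
                   lam_grid=np.linspace(-4.0, 4.0, 161), n_quad=15):
    """Index at posterior mean 0 for each experience level t, via backward
    recursion on the zero-mean expected value function W_t(lambda)."""
    # Deterministic variance path from Bayesian updating with normal draws.
    sig2 = np.empty(T + 1)
    sig2[0] = sigma0 ** 2
    for t in range(T):
        sig2[t + 1] = 1.0 / (1.0 / sig2[t] + 1.0 / sigma_q ** 2)

    # Standard-normal quadrature nodes for the posterior-mean innovation.
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / w.sum()

    # Terminal value: no more learning, so W = delta*W + log(1 + e^lam).
    W_next = np.logaddexp(0.0, lam_grid) / (1.0 - delta)
    indices = np.empty(T)
    for t in range(T - 1, -1, -1):
        rho = sig2[t] / np.sqrt(sig2[t] + sigma_q ** 2)  # sd of mean update
        # Active value: choose the brand; mean renormalized to 0 (invariance).
        a = delta * sum(wj * np.interp(lam_grid - rho * xj, lam_grid, W_next)
                        for xj, wj in zip(x, w))
        # Solve the implicit fixed point W = logaddexp(lam + delta*W, a).
        W = np.zeros_like(lam_grid)
        for _ in range(200):
            W = np.logaddexp(lam_grid + delta * W, a)
        # Index: lambda at which the two deterministic components are equal
        # (f is increasing in lambda -- this is exactly indexability).
        f = lam_grid + delta * W - a
        indices[t] = np.interp(0.0, f, lam_grid)
        W_next = W
    return indices  # indices[t] = index at posterior mean 0 after t updates
```

By the invariance property, the full index at posterior mean $\bar q$ is $\bar q +$ `indices[t]` (for $\sigma_q = 1$); the decline of `indices[t]` with experience reproduces the qualitative pattern in Figure 3.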


Online Appendix E. Simulation of Value Function

Given a decision rule $\Pi$, we compute the value function (expected total utility) by forward-simulating utilities over a sufficiently long horizon. Starting at a given state, we sample a large number $D$ of Markov chains $\{(\bar{q}^{(d)}_{j,n}, \sigma^{(d)}_{j,n})\}$, $d = 1, 2, \ldots, D$, for each brand $j$, with chain length greater than the truncated time horizon $T$. Bayesian updating of the normal distribution leads to the following state transition:

$\bar{q}_{j,n+1} \,\big|\, (\bar{q}_{j,n}, \sigma_{j,n}) \;\sim\; N\!\Big(\bar{q}_{j,n},\; \dfrac{\sigma^4_{j,n}}{\sigma^2_{j,n} + \sigma^2_{q_j}}\Big), \qquad \sigma^2_{j,n+1} = \Big(\dfrac{1}{\sigma^2_{j,n}} + \dfrac{1}{\sigma^2_{q_j}}\Big)^{-1}$,

where $n$ counts the number of experiences with brand $j$. These sequences of belief states are then fixed in advance and reused for each decision rule, so that competing rules are compared on the same sample paths. Under a decision rule $\Pi$, the empirical estimate of its expected total utility for a truncated time horizon $T$ is given by:

$\widehat{V}^{\Pi} = \dfrac{1}{D} \sum_{d=1}^{D} \sum_{t=1}^{T} \delta^{t-1}\, u\big(\bar{q}^{(d)}_{j_t,\, n_{j_t t}}\big)$,

where $j_t$ is the brand chosen by $\Pi$ in period $t$ and $n_{jt}$ is the cumulative number of trials of brand $j$ up to time $t$. Note that the realized state values are chosen from the pre-drawn sample paths, with $n_{jt}$ indicating which state in the sample path is chosen.
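The forward simulation described in this appendix can be sketched as follows, assuming normal quality learning and scoring each period by the current posterior mean (a simplification relative to drawing realized utilities); the function and its default sizes D and T are illustrative choices, not the paper's code. Pre-drawing the standardized innovations per brand and path implements the common-random-numbers reuse described above.

```python
import numpy as np

def simulate_value(policy, q_bar0, sig2_0, sig2_q, delta=0.9, T=100, D=200, seed=0):
    """Monte Carlo estimate of a decision rule's expected discounted utility.

    `policy(means, sig2s)` returns the index of the brand to consume.
    Innovations are pre-drawn per path and brand (common random numbers),
    so competing rules are compared on the same sample paths."""
    rng = np.random.default_rng(seed)
    J = len(q_bar0)
    z = rng.standard_normal((D, J, T))  # pre-drawn standardized innovations
    total = 0.0
    for d in range(D):
        means = np.array(q_bar0, dtype=float)
        sig2 = np.array(sig2_0, dtype=float)
        n = np.zeros(J, dtype=int)          # trials per brand (path position)
        for t in range(T):
            j = policy(means, sig2)
            total += delta ** t * means[j]  # expected utility this period
            # Bayesian update of brand j's belief with one more experience.
            innov_sd = sig2[j] / np.sqrt(sig2[j] + sig2_q)
            means[j] = means[j] + innov_sd * z[d, j, n[j]]
            sig2[j] = 1.0 / (1.0 / sig2[j] + 1.0 / sig2_q)
            n[j] += 1
    return total / D
```

For example, a myopic argmax rule evaluated on the same pre-drawn paths should outperform a rule that always consumes a lower-mean brand, and the same harness can score an index rule by passing a policy that picks the brand with the largest index value.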


indicating which state in the sample