Solution Concepts in A-Loss Recall Games: Existence and Computational Complexity
Jiří Čermák, Branislav Bošanský, and Michal Pěchouček
Dept. of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic, {cermak,bosansky,pechoucek}@agents.fel.cvut.cz
Abstract. Imperfect recall games represent dynamic interactions where players forget previously known information, such as the history of played actions. The importance of imperfect recall games stems from allowing a more concise representation of strategies compared to perfect recall games, where players remember all information. However, most algorithmic results for imperfect recall games are negative – a Nash equilibrium (NE) does not have to exist, and computing a best response or a maxmin strategy is NP-hard. We focus on a subclass of imperfect recall games, called A-loss recall games, where a best response can be found in polynomial time. We derive novel properties of A-loss recall games, including (1) a sufficient and necessary condition for the existence of a NE in A-loss recall games, and (2) NP-completeness of problems related to finding maxmin strategies and to the existence of a NE strategy.
1 Introduction
Dynamic games with a finite number of moves are modeled as extensive-form games (EFGs). These games are visualized as game trees, where nodes correspond to the states of the game and edges to the actions executed by the players. This representation is general enough to model stochastic events and imperfect information, where players are unable to distinguish among several states. If we assume that all players perfectly remember the history and all information gained during the course of the game, we say that the game has perfect recall. If this assumption does not hold¹, we talk about EFGs with imperfect recall, or imperfect recall games.

There are several strategy representations that can be used in EFGs. A pure strategy assigns one action to every decision point of the game; a mixed strategy is a probability distribution over the set of pure strategies. A behavioral strategy, on the other hand, uses a probability distribution over the actions in every decision point. In perfect recall games, there is an equivalence between mixed and behavioral strategies [11]. However, this equivalence no longer holds in imperfect recall games.
¹ For example, because a player does not distinguish between two states reached by playing two different actions of this player.
Fig. 1. An EFG with imperfect recall. Circle nodes represent the states of the game, numbers in the circles show which player acts in that node (player 1 or 2), dashed lines connect indistinguishable states, and box nodes are the terminal states with utility values (player 1 maximizes the utility, player 2 minimizes it). This game has A-loss recall, but a NE in behavioral strategies does not exist (from [16]).
Example (known): Consider the game depicted in Figure 1. This game has 4 pure strategies for player 1: S = {(a, c), (a, d), (b, c), (b, d)}. A mixed strategy can condition the actions of a player on information that the player should have forgotten. For example, a mixed strategy where (a, c) and (b, d) are each played with probability 0.5 allows player 1 to condition playing c and d on the outcome of the non-deterministic choice in the root of the game, and thus to randomize between the leftmost and the rightmost state in the information set of player 2. One cannot model the same behavior using a behavioral strategy, since a behavioral strategy assigns to every decision point a probability distribution over the available actions, and therefore no additional information can be disclosed to the player. To exploit the concise strategy representation allowed in imperfect recall games, however, one must seek optimal behavioral strategies.

Finding optimal behavioral strategies in imperfect recall games is a difficult problem. First of all, the main game-theoretic solution concept, the Nash equilibrium (NE), does not have to exist in behavioral strategies even in zero-sum games (Figure 1 provides an example from [16] of such a game), and checking whether an imperfect recall game has a NE in behavioral strategies was shown to be NP-hard [3]². Secondly, many related problems are NP-hard in imperfect recall games, including computing a best response and computing mixed maxmin strategies [7,3]; computing pure maxmin strategies is even Σ2^p-complete [7]. Finally, a maxmin strategy may require irrational numbers even in games with rational payoffs [7].

These negative results raise a natural question about the existence and computation of NE in behavioral strategies in imperfect recall games: Is there a subclass of imperfect recall games where the existence of NE can be exactly characterized and optimal strategies computed?
² Recall that the original proof of the existence of a NE for finite games considers only mixed strategies [13], which, as we showed in the example, are not well-suited for imperfect recall games.
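The inequivalence noted in the example (and in footnote 2) can be made concrete with a short Python sketch. It is our own illustration using an abstract two-move structure rather than the exact game of Figure 1: player 1 first picks a/b, then c/d in a decision point that forgets the first move. The uniform mix of (a, c) and (b, d) correlates the two moves in a way no behavioral strategy can reproduce.

from itertools import product

def dist_mixed(mix):                     # mix: {(first, second): probability}
    return {s1 + s2: sum(p for (a, c), p in mix.items() if (a, c) == (s1, s2))
            for s1, s2 in product("ab", "cd")}

def dist_behavioral(p_a, p_c):           # one independent coin per decision point
    pa, pc = {"a": p_a, "b": 1 - p_a}, {"c": p_c, "d": 1 - p_c}
    return {s1 + s2: pa[s1] * pc[s2] for s1, s2 in product("ab", "cd")}

target = dist_mixed({("a", "c"): 0.5, ("b", "d"): 0.5})   # correlates the moves
# No behavioral strategy reproduces it: p_a * p_c = 0.5 and
# (1 - p_a) * (1 - p_c) = 0.5 cannot hold at the same time.
assert all(dist_behavioral(i / 10, j / 10) != target
           for i in range(11) for j in range(11))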
Answering this question can help in finding a concise representation of dynamic games that does not require perfect memory, thus allowing us to solve large dynamic games. Previous works answer this question only in part – specifically, by defining the (skewed) well-formed games as a subset of imperfect recall games. In (skewed) well-formed games, players are only allowed to forget information that leads to joining two decision points with similar histories, and there must exist a mapping between the terminal states reachable from these decision points such that the difference of payoffs between the mapped leaves is bounded. With this assumption, standard perfect-recall algorithms (namely Counterfactual Regret Minimization; CFR) are still guaranteed to find an approximate NE (no-regret) behavioral strategy [12,10]. However, the restrictions defining (skewed) well-formed games are rather strict, and they give only a sufficient condition for the existence of a NE and for the convergence of CFR.

We focus on the subclass of A-loss recall games [4,5] that form a significantly less restrictive subset of imperfect recall games compared to (skewed) well-formed games. In A-loss recall games, every loss of information of a player can be tracked back to forgetting her own action. A positive effect of this assumption is that computing a best response is in P (recall that the problem is NP-hard in general imperfect recall games). Our results show that the assumption of A-loss recall also allows us to characterize the existence of a NE in behavioral strategies, and additionally has an impact on the computational complexity of computing maxmin strategies. Specifically:

– we give a sufficient and necessary condition for any two-player A-loss recall game to have a NE in behavioral strategies (Theorem 1),
– we show that NE strategies use only rational numbers, if they exist (Theorem 2),
– we show that it is NP-complete (NPC) to check whether there exists a Nash equilibrium in a two-player zero-sum A-loss recall game (Theorem 7),
– we show that a maxmin strategy may still need irrational numbers in A-loss recall games with rational payoffs (Theorem 3),
– we show that computing a maxmin behavioral strategy for a player is NPC for a rational maxmin value (Theorem 5).
2 Extensive-Form Games
We give the necessary technical introduction to extensive-form games (EFGs) and define different types of recall in EFGs. A two-player EFG G is a tuple {P, H, Z, P, u, I, A}. P = {1, 2} denotes the set of players; we use i to refer to a player and −i to refer to the opponent of i. The set H contains all the states of the game. P : H → P ∪ {c} is the function associating a player from P or the nature c with every h ∈ H; the nature c represents the stochastic environment of the game. Z ⊆ H is the set of terminal states. ui is a utility function assigning to each leaf the value of preference for player i; ui : Z → R. For zero-sum games it holds that ui(z) = −u−i(z), ∀z ∈ Z. The imperfect information is defined using information sets: Ii is a partitioning of all {h ∈ H : P(h) = i} into these information sets. All states h contained in one information set Ii ∈ Ii are indistinguishable to player i. The set of available actions A(h) is the same ∀h ∈ Ii; we overload the notation and use A(Ii) for the actions available in Ii.

Fig. 2. An EFG without A-loss recall where a local best response (playing the best action in an information set) is not necessarily the ex ante best response.

A pure strategy si for player i is a mapping assigning to every Ii ∈ Ii a member of A(Ii); Si is the set of all pure strategies for player i. A mixed strategy mi is a probability distribution over Si; the set of all mixed strategies of i is denoted Mi. A behavioral strategy bi assigns a probability distribution over A(Ii) to each Ii; Bi is the set of all behavioral strategies for i, and B^p_i ⊆ Bi denotes the set of deterministic behavioral strategies for i. A sequence σi is a list of actions of player i ordered by their occurrence on the path from the root of the game tree to some node. By seqi(h) we denote the sequence of player i leading to the state h. A strategy profile is a set of strategies, one strategy for each player.

A mixed strategy can be formulated as a realization plan ri that for a sequence σi represents the probability of playing the actions in σi, assuming the other players play such that the actions of σi can be executed. A realization plan ri has to satisfy the network flow property, i.e., ri(σi) = Σ_{a∈A(Ii)} ri(σi · a), where Ii is the information set reached by the sequence σi and σi · a stands for σi extended by the action a. We say that σi′ is a continuation of σi if σi forms a prefix of σi′. Ri is the set of realization plans for i; R^p_i ⊆ Ri denotes the set of pure realization plans for i. We say that a pair of strategies with arbitrary representation is behaviorally equivalent if they generate the same probability distribution over all z ∈ Z. We overload the notation and use ui as the expected utility of player i when the players play according to pure (mixed, behavioral) strategies or realization plans.
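The network flow property is easy to check mechanically. The following Python sketch is our own encoding (sequences as tuples of action labels, plus an illustrative mapping from each extendable sequence to the actions available in the information set it reaches); the example plan corresponds to the uniform mix of (a, c) and (b, d) discussed in the introduction.

def is_realization_plan(r, infoset_actions, root=()):
    """Check r(sigma) == sum over a of r(sigma + (a,)) for every
    extendable sequence sigma, and r(empty sequence) == 1."""
    if abs(r.get(root, 0.0) - 1.0) > 1e-9:
        return False
    for sigma, actions in infoset_actions.items():
        outflow = sum(r.get(sigma + (a,), 0.0) for a in actions)
        if abs(r.get(sigma, 0.0) - outflow) > 1e-9:
            return False
    return True

# Player 1 of Figure 1: choose a/b first, then c/d (illustrative encoding).
infoset_actions = {(): ["a", "b"], ("a",): ["c", "d"], ("b",): ["c", "d"]}
r = {(): 1.0, ("a",): 0.5, ("b",): 0.5, ("a", "c"): 0.5, ("a", "d"): 0.0,
     ("b", "c"): 0.0, ("b", "d"): 0.5}
assert is_realization_plan(r, infoset_actions)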
2.1 Perfect, Imperfect, and A-loss Recall
A game has perfect recall iff ∀i ∈ P ∀Ii ∈ Ii all the states in Ii share the same history for i. If there exists at least one information set where this does not hold, the game has imperfect recall.
Interestingly, many properties that hold for perfect recall games do not hold for imperfect recall games. Consider, for example, the computation of a best response for a player. In so-called absent-minded games, where there exists a path from the root of the game tree to some leaf such that an information set Ii ∈ Ii is visited more than once on this path, the best response may need randomization (e.g., in the game with the absent-minded driver [14]). Even in imperfect recall games without absent-mindedness, playing the best action in each information set (called a time consistent strategy [5]) need not result in an ex ante best response. Consider the example game in Figure 2, played between player 1 and chance. The ex ante best response of player 1 in this game is to play B, D, F, yielding the utility of (5 − ε)/2. Note, however, that when player 1 chooses the best action to play in each information set I separately, given possible beliefs over the states in I, the strategy B, C, E with the expected utility of 2 also satisfies this condition. The equivalence between time consistent strategies and ex ante best responses does hold in a subset of imperfect recall games called A-loss recall games; additionally, the computation of a best response is in P in A-loss recall games [4,5]. This motivates our focus on A-loss recall games in this paper.

Definition 1. Player i has A-loss recall if and only if for every I ∈ Ii and nodes h, h′ ∈ I it holds either (1) seqi(h) = seqi(h′), or (2) ∃I′ ∈ Ii and two distinct actions a, a′ ∈ Ai(I′), a ≠ a′, such that a ∈ seqi(h) ∧ a′ ∈ seqi(h′).

Condition (1) in the definition implies that if player i has perfect recall, then she also has A-loss recall. Condition (2) says that every loss of information must be traceable to some information set I′ of player i where two distinct actions a, a′ were taken on the paths to the two merged nodes. Note that player 1 does not have A-loss recall in the example game of Figure 2, since the parents of the nodes in information set I3 lie in two distinct information sets I1, I2 and their common predecessor is a chance node. On the other hand, the example game from Figure 1 has A-loss recall.
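Definition 1 can be checked directly from the players' sequences. The following Python sketch is our own encoding: each node of an information set is represented by seqi(h), a tuple of (information set, action) pairs of player i. The illustrative asserts assume, for the sake of the example, that the two nodes of I3 in Figure 2 are reached via actions taken in the distinct information sets I1 and I2, while the second information set of player 1 in Figure 1 merges histories a and b taken in the same earlier information set.

from itertools import combinations

def has_a_loss_recall(infosets):
    """infosets: list of information sets, each a list of seq_i(h) tuples;
    every sequence element is an (infoset_id, action) pair."""
    for I in infosets:
        for s, t in combinations(I, 2):
            if s == t:                     # condition (1): identical sequences
                continue
            # condition (2): some I' where the two histories took two
            # distinct actions of player i
            if not any(a != b for (i, a) in s for (j, b) in t if i == j):
                return False
    return True

# Figure 1, player 1: histories a and b come from the same set I1 -> A-loss.
assert has_a_loss_recall([[(("I1", "a"),), (("I1", "b"),)]])
# Figure 2, player 1: the nodes of I3 come via two distinct information
# sets I1, I2 (illustrative action labels) -> no A-loss recall.
assert not has_a_loss_recall([[(("I1", "B"),), (("I2", "D"),)]])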
3 Existence of Nash Equilibria in A-loss Recall Games
In this section we provide a sufficient and necessary condition for the existence of a Nash equilibrium in behavioral strategies in two-player A-loss recall games without absent-mindedness. Let us first define a Nash equilibrium in behavioral strategies.

Definition 2. A strategy profile b = {bi, b−i} is a Nash equilibrium in behavioral strategies iff ∀i ∈ P ∀b^p_i ∈ B^p_i : ui(bi, b−i) ≥ ui(b^p_i, b−i).

Next, we define a partition H(Ii) of the states in every information set Ii of some imperfect recall game G into the largest possible subsets not causing imperfect recall. More formally, let H(Ii) = {H1, ..., Hn} be a disjoint partition of all h ∈ Ii, where ⋃_{j=1}^{n} Hj = Ii and ∀Hj ∈ H(Ii) ∀hk, hl ∈ Hj : seqi(hk) = seqi(hl); additionally, for any two distinct Hk, Hl ∈ H(Ii) : seqi(Hk) ≠ seqi(Hl), where seqi(Hj) denotes the sequence shared by all states in Hj.
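Each block of H(Ii) becomes an information set of the coarsest perfect recall refinement defined next. A minimal Python sketch of computing this partition (our own encoding; seq_i is assumed to map a node to the tuple of player i's actions on the path to it):

from collections import defaultdict

def coarsest_partition(infoset, seq_i):
    """Group the nodes of one information set by player i's own sequence
    leading to them; the blocks are exactly H_1, ..., H_n of H(I_i)."""
    blocks = defaultdict(list)
    for h in infoset:
        blocks[seq_i(h)].append(h)
    return list(blocks.values())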
Definition 3. The coarsest perfect recall refinement G′ of an imperfect recall game G = {P, H, Z, P, u, I, A} is the tuple {P, H, Z, P, u, I′, A′}, where the partitions H(Ii) for all i ∈ P and Ii ∈ Ii define the information set partition I′. A′ is a modification of A which guarantees that ∀I ∈ I′ ∀hk, hl ∈ I : A′(hk) = A′(hl), while for any two distinct I^k, I^l ∈ I′ : A′(I^k) ≠ A′(I^l).

In Definition 3 we change the labelling of the actions described by A to A′, since we modify the structure of the imperfect information I to I′. We denote by Φ : E → A′ the function which, for an edge e ∈ E (where E is the set of all edges of the game tree of G, and therefore of G′), returns its action label in G′; similarly, we define Ψ : E → A. In the rest of this section, when we talk about the equivalence of strategies (in an arbitrary representation) or sequences in G and G′, we mean equivalence with respect to Φ and Ψ. The same holds for applying a strategy from G to G′ and vice versa.

Lemma 1. When player i plays according to a pure realization plan ri in an A-loss recall game G, then in every imperfect recall information set Ii of i, only the states in one Hk ∈ H(Ii) can have a positive probability of occurrence if Ii gets visited.

Proof. This follows directly from the A-loss recall property. ⊓ ⊔

Lemma 2. The set of pure realization plans R^p_G of an arbitrary imperfect recall game G forms a subset of the set R^p_G′ of its coarsest perfect recall refinement.

Proof. This holds since G is created from G′ by merging information sets, which only invalidates the pure realization plans prescribing mutually exclusive behavior in the merged information sets. ⊓ ⊔

Lemma 3. The set of pure realization plans R^p_G of an A-loss recall game G is equal to R^p_G′ of its coarsest perfect recall refinement.

Proof. First, we prove that ∀i ∈ P ∀ri′ ∈ R^p_G′, ri′ forms a pure realization plan of G. This follows from Lemma 1, as each Hk coincides with an information set in G′ and i can choose actions independently in every Hk when following a pure realization plan in G; ri′ must therefore be a valid pure realization plan of G. Next, we show that ∀i ∈ P ∀ri ∈ R^p_G it holds that ri forms a pure realization plan of G′. This follows directly from Lemma 2. ⊓ ⊔

Lemma 4. Every pure realization plan of an arbitrary imperfect recall game G can be represented as a deterministic behavioral strategy, and vice versa.

Proof. It is straightforward to create a behavioral strategy b^p_i ∈ B^p_i equivalent to any r^p_i ∈ R^p_i. Moving from the root of the game tree of G, for every Ii for which r^p_i prescribes an action, we set b^p_i(Ii, a) = 1 for that action a. This creates equivalent behavior in all Ii. Creating a pure realization plan from a deterministic behavioral strategy can be done by the same procedure, ignoring the parts of the tree unreachable when playing this strategy. ⊓ ⊔
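A minimal Python sketch of the construction used in the proof of Lemma 4 (our own encoding, consistent with the earlier sketches: sequences as tuples, plus illustrative mappings from sequences to information sets and from information sets to actions):

def realization_plan_to_behavioral(r, infoset_of, actions_of):
    """Convert a pure realization plan r (sequence tuple -> {0, 1}) into an
    equivalent deterministic behavioral strategy, walking only the
    sequences played with probability 1."""
    b = {}
    for sigma, prob in r.items():
        if prob != 1 or sigma not in infoset_of:
            continue                          # skip unreached/terminal sequences
        I = infoset_of[sigma]
        for a in actions_of[I]:
            if r.get(sigma + (a,), 0) == 1:   # the action prescribed in I
                b[(I, a)] = 1.0
    return b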
Fig. 3. (Left) An imperfect recall game without A-loss recall where the only Nash equilibrium in behavioral strategies is not a Nash equilibrium of the coarsest perfect recall refinement. (Right) Its coarsest perfect recall refinement.
Now we are ready to state the main result of this section.

Theorem 1. An A-loss recall game G has a Nash equilibrium in behavioral strategies if and only if there exists a Nash equilibrium b in behavioral strategies of the coarsest perfect recall refinement G′ of G such that ∀I ∈ I_G ∀Hk, Hl ∈ H(I) : b(Hk) = b(Hl), where b(H) stands for the behavioral strategy in the information set of G′ formed by the states in H.

Proof. First, since b is a Nash equilibrium of G′, no player has an incentive to deviate to any pure behavioral strategy in G′. From the interchangeability of pure behavioral strategies and pure realization plans in games with perfect recall [11], we know that there exists no pure realization plan to which any of the players wants to deviate. From Lemma 2 it follows that there can exist no pure realization plan in G, and therefore, by Lemma 4, no pure behavioral strategy in G, to which any of the players wants to deviate. This, in combination with the fact that b prescribes a valid strategy in G, implies that b is a Nash equilibrium in behavioral strategies of G.

Second, we prove that there exists no Nash equilibrium b′ in behavioral strategies of G which is not a Nash equilibrium of G′. Assume that such a b′ exists. Then there is no pure behavioral strategy in G to which the players want to deviate when playing according to b′, and therefore, by Lemma 4, no pure realization plan. Lemma 3 then implies that there is no pure realization plan, and therefore no pure behavioral strategy, in G′ to which the players want to deviate either, so b′ is a Nash equilibrium of G′. This contradicts the assumption and completes the proof. ⊓ ⊔

Informally, Theorem 1 states that G has a Nash equilibrium in behavioral strategies if and only if there exists a behavioral Nash equilibrium b of G′ which prescribes the same behavior in all information sets of G′ originating from the same imperfect recall information set of G.
Fig. 4. An example where only a degenerate Nash equilibrium of the coarsest perfect recall refinement forms a Nash equilibrium of the A-loss recall game created by joining information sets I1 and I2 .
Corollary 1. An imperfect recall game G has a Nash equilibrium in behavioral strategies if there exists a Nash equilibrium b in behavioral strategies of the coarsest perfect recall refinement G′ of G such that ∀I ∈ I_G ∀Hk, Hl ∈ H(I) : b(Hk) = b(Hl), where b(H) stands for the behavioral strategy in the information set of G′ formed by the states in H; the opposite implication does not hold.

Proof. The implication follows directly from the first part of the proof of Theorem 1. To show that the opposite direction does not hold, we provide a counter-example. Consider the game in the left subfigure of Figure 3. Here, the only Nash equilibrium in behavioral strategies is to play d and e deterministically and to mix uniformly between g and h. The only Nash equilibrium of its coarsest perfect recall refinement (shown in the right subfigure of Figure 3) is, however, to play d, e, h and i deterministically. ⊓ ⊔
4 Optimal Strategies in A-Loss Recall Games
The sufficient and necessary condition for the existence of a NE described in the previous section allows us to provide an additional characterization of NE strategies in A-loss recall games. First, we show that a Nash equilibrium of a two-player zero-sum A-loss recall game might be formed only by a degenerate equilibrium of its coarsest perfect recall refinement. Next, we show that if there exists a NE in behavioral strategies, it can be expressed using only rational numbers. However, a maxmin strategy might still require irrational numbers in A-loss recall games if a NE does not exist.
4.1 Properties of Nash Equilibria of A-Loss Recall Games
The possibly infinite set of all Nash equilibria of a game G can be completely described by a set of finitely many extreme Nash equilibria E (see, e.g., [1]). These equilibria have the property that they cannot be obtained as a convex combination of other Nash equilibria of G. Additionally, all the non-extreme (also called degenerate) Nash equilibria of G are obtained as convex combinations of the entries of E. For the case of two-player zero-sum games, we can write E = E1 × E2, where Ei stands for the strategies of player i that form a part of some extreme Nash equilibrium. In perfect recall games, the entries of Ei form the vertices of a convex polyhedron of optimal solutions of the sequence-form LP [9] computing the strategy for i. Finally, in zero-sum games the expected value of every player is the same under every Nash equilibrium strategy profile.

Let us consider the game from Figure 4. Player 1 has 4 extreme Nash equilibria. All of them prescribe b(a) = 8/9; they differ, however, in information sets I1 and I2. The extreme equilibria are {b1^1(e) = b1^1(g) = 1/2, b1^1(h) = 2/5, b1^1(i) = 3/5}, {b1^2(e) = b1^2(g) = 1/2, b1^2(j) = 1}, {b1^3(f) = b1^3(g) = 1/2, b1^3(h) = 2/5, b1^3(i) = 3/5}, and {b1^4(f) = b1^4(g) = 1/2, b1^4(j) = 1}. All the equilibria of player 1 can be obtained as convex combinations of these extreme equilibria.

Claim. All the Nash equilibria of a two-player zero-sum A-loss recall game might be formed by degenerate equilibria of its coarsest perfect recall refinement.

Proof. Consider the A-loss recall game G created from the game in Figure 4 by merging information sets I1 and I2, mapping action e to h, f to i, and g to j. We check whether there is a Nash equilibrium in G. On the right side of Figure 4 we depict a triangle representing all the behavioral strategies that can occur in the merged information set. The dashed segment represents all the valid behavioral strategies consistent with a Nash equilibrium in I1; the dotted segment depicts the same for I2 (notice that the extreme equilibria form the endpoints of these segments). The segments intersect in a single point, so there exists exactly one valid behavior consistent with a Nash equilibrium of the resulting game, created by combining (1/5)b1^1 + (1/5)b1^2 + (3/10)b1^3 + (3/10)b1^4. ⊓ ⊔
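The claimed combination can be verified numerically. A short Python check (our own; it relies on the fact that all four extreme equilibria reach I1 and I2 with the same probability, since all prescribe b(a) = 8/9, so behavior in these information sets combines linearly):

from fractions import Fraction as F

extremes = [                                   # b1^1 ... b1^4 from the text
    {"e": F(1, 2), "f": F(0), "g": F(1, 2), "h": F(2, 5), "i": F(3, 5), "j": F(0)},
    {"e": F(1, 2), "f": F(0), "g": F(1, 2), "h": F(0), "i": F(0), "j": F(1)},
    {"e": F(0), "f": F(1, 2), "g": F(1, 2), "h": F(2, 5), "i": F(3, 5), "j": F(0)},
    {"e": F(0), "f": F(1, 2), "g": F(1, 2), "h": F(0), "i": F(0), "j": F(1)},
]
weights = [F(1, 5), F(1, 5), F(3, 10), F(3, 10)]

mix = {a: sum(w * b[a] for w, b in zip(weights, extremes)) for a in "efghij"}
# identical behavior across the merged information set under e->h, f->i, g->j
assert (mix["e"], mix["f"], mix["g"]) == (mix["h"], mix["i"], mix["j"])
print(mix)   # e = h = 1/5, f = i = 3/10, g = j = 1/2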
4.2 Representation of Nash and Maxmin Strategies
In two-player zero-sum perfect recall games with rational payoffs, there exists an optimal behavioral strategy which uses only rational probabilities [7]. In imperfect recall games this no longer holds [7]. We show that in a zero-sum A-loss recall game a NE in behavioral strategies can be expressed using rational numbers only, if some NE exists. However, if a NE does not exist, a maxmin strategy may still need irrational numbers, even in two-player zero-sum A-loss recall games with rational payoffs.

Theorem 2. If there exists a Nash equilibrium of a two-player zero-sum A-loss recall game G with rational payoffs, there must exist a Nash equilibrium in behavioral strategies using rational numbers only.

Proof. Let us denote by G′ the coarsest perfect recall refinement of G. All the extreme Nash equilibria in behavioral strategies of a two-player zero-sum perfect recall game (and therefore also of G′) use rational numbers only [7].
This alone is not enough, however, as shown in Section 4.1. Consider the following approach for player i. For each pair Ii, Ii′ of information sets of player i in G′ which are unified into some imperfect recall information set in G, we create the following mixed integer linear program (MILP):

bi(Ii, a) = bi(Ii′, a)   ∀a ∈ A(Ii)   (1)
π(I) ≥ Σ_{k=1}^{n} λk · pk(I)   ∀I ∈ {Ii, Ii′}   (2)
π(I) ∈ {0, 1}   ∀I ∈ {Ii, Ii′}   (3)
bi(I, a) ≤ Σ_{k=1}^{n} λk · bi^k(I, a) + 1 − π(I)   ∀I ∈ {Ii, Ii′}, ∀a ∈ A(I)   (4)
bi(I, a) ≥ Σ_{k=1}^{n} λk · bi^k(I, a) − 1 + π(I)   ∀I ∈ {Ii, Ii′}, ∀a ∈ A(I)   (5)
λk ∈ [0, 1]   ∀k ∈ {1, ..., n}   (6)
Σ_{k=1}^{n} λk = 1   (7)
The MILP computes coefficients λk of a convex combination of the extreme Nash equilibria bi^k (where k = 1, . . . , n) which generates equal behavior in both Ii and Ii′, or makes sure that at least one of them is not reached, in which case the behavior there can be arbitrary. To do this, we use a binary variable π(I) for each of Ii and Ii′ to distinguish whether the information set I is visited with positive probability under the current convex combination (Constraints (2) and (3), where pk(I) is a constant representing the probability that I is visited under bi^k). Next, in Constraints (4) and (5) we set the variables bi(I, a) equal to the convex combination of the extreme Nash equilibria in case I is visited with positive probability under the current convex combination; in case I is not visited, bi(I, a) can be arbitrary. Finally, Constraints (6) and (7) make sure that we indeed compute a convex combination. If we create a MILP which consists of Constraints (1) to (7) for every pair of information sets of G′ which is unified into some imperfect recall information set of G, we obtain a MILP whose feasible solutions represent all the convex combinations of extreme Nash equilibria of G′ which generate a valid behavior in G, and which therefore form Nash equilibria in behavioral strategies of G by Theorem 1 (if the MILP is infeasible, there is no Nash equilibrium in behavioral strategies in G). Finally, since the feasible space of a MILP is a union of polytopes, each defined by linear constraints with rational coefficients, there exists a feasible solution with variables taking rational values only. Since these variables are the coefficients of a convex combination of rational behavioral strategies, the resulting strategy is also rational. ⊓ ⊔
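A compact sketch of this feasibility MILP for a single merged pair, written in Python with the PuLP library (our own illustration; the input format — extreme equilibria as dictionaries mapping (information set, action) pairs to probabilities, and reach probabilities pk(I) — is an assumption of this sketch):

import pulp

def build_milp(extremes, reach, I, I_prime, actions):
    """extremes[k][(J, a)] = b_i^k(J, a); reach[k][J] = p_k(J)."""
    n = len(extremes)
    prob = pulp.LpProblem("NE_existence", pulp.LpMinimize)
    lam = [pulp.LpVariable(f"lam_{k}", 0, 1) for k in range(n)]            # (6)
    pi = {J: pulp.LpVariable(f"pi_{J}", cat=pulp.LpBinary)
          for J in (I, I_prime)}                                           # (3)
    b = {(J, a): pulp.LpVariable(f"b_{J}_{a}", 0, 1)
         for J in (I, I_prime) for a in actions}
    prob += pulp.lpSum(lam)           # dummy objective; we only need feasibility
    prob += pulp.lpSum(lam) == 1                                           # (7)
    for a in actions:
        prob += b[(I, a)] == b[(I_prime, a)]                               # (1)
    for J in (I, I_prime):
        prob += pi[J] >= pulp.lpSum(lam[k] * reach[k][J] for k in range(n))  # (2)
        for a in actions:
            conv = pulp.lpSum(lam[k] * extremes[k][(J, a)] for k in range(n))
            prob += b[(J, a)] <= conv + 1 - pi[J]                          # (4)
            prob += b[(J, a)] >= conv - 1 + pi[J]                          # (5)
    return prob

# prob = build_milp(...); prob.solve()
# pulp.LpStatus[prob.status] == "Optimal" indicates feasibility for this pair;
# for the whole game, pool Constraints (1)-(7) over every merged pair.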
Fig. 5. An A-loss recall game where the maxmin strategy requires irrational numbers.
Theorem 3. The maxmin strategy may require irrational numbers, even in a two-player zero-sum A-loss recall game with rational payoffs.

Proof. Consider the game from Figure 5, which is a modification of the example from Figure 2 in [7]. The maxmin strategy of player 1 maximizes

min{3·b1(a)·b1(c), 3·(1 − b1(a))·(1 − b1(c)), b1(a)·(1 − b1(c)) + (1 − b1(a))·b1(c)}.   (8)

The maximum is attained when

3·b1(a)·b1(c) = 3·(1 − b1(a))·(1 − b1(c)) = b1(a)·(1 − b1(c)) + (1 − b1(a))·b1(c).   (9)

The first equality forces b1(c) = 1 − b1(a); substituting into the second yields 5·b1(a)² − 5·b1(a) + 1 = 0, which leads to b1(a) = (5 ± √5)/10. ⊓ ⊔
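The roots of (9) can be verified symbolically; a quick check of our own with sympy, writing p = b1(a) and q = b1(c):

from sympy import Eq, solve, symbols

p, q = symbols("p q")
u1, u2, u3 = 3*p*q, 3*(1 - p)*(1 - q), p*(1 - q) + (1 - p)*q
print(solve([Eq(u1, u2), Eq(u2, u3)], [p, q], dict=True))
# expected roots: p = 1/2 -+ sqrt(5)/10 with q = 1 - p,
# i.e., b1(a) = (5 ± √5)/10 as in the proof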
5 Computational Complexity in A-loss Recall Games
We now turn to the computational complexity and show that in A-loss recall games, computing a maxmin strategy is NPC when the maxmin value is rational, and that checking whether there exists a Nash equilibrium in behavioral strategies is also NPC. In order to state the complexity results, we first need to specify the length of the input. We use the measure of Koller and Megiddo [7]: the size of an integer N is log2(1 + |N|), the size of a rational number is the sum of the sizes of its numerator and denominator in reduced form, and the size of the game equals the sum of the sizes of the payoffs and of the probabilities associated with chance moves. The hardness results use a transformation that extends the original transformation used by Koller and Megiddo in Proposition 2.6 of [7].

Theorem 4. The problem of deciding whether player 2, having A-loss recall, can guarantee an expected payoff of at least λ is NP-hard, even if player 1 has perfect recall, there are no chance moves, and the game is zero-sum.

Proof. The proof is by reduction from the 3-SAT problem. Given n clauses xj,1 ∨ xj,2 ∨ xj,3, we create a two-person zero-sum game in the following way. In the root of the game, player 2 chooses between n actions, each corresponding to one clause.
Fig. 6. The A-loss recall game obtained by the reduction from the 3-SAT instance (x1 ∨ ¬x3 ∨ x4) ∧ (¬x2 ∨ x3 ∨ ¬x4).
Player 1 plays next, with no information about the action chosen by player 2. He again has n actions available. In every state of player 1, n − 1 actions lead directly to a terminal state with utility (0, 0), and one action (with index corresponding to the index of the action of player 2 preceding this state) leads to a state of player 2. Every such state of player 2 corresponds to the variable xj,1, where j is the index of the clause chosen in the root of the game, and has actions Txj,1 and Fxj,1 available; these actions correspond to setting the variable xj,1 to true or false, respectively. After both Txj,1 and Fxj,1 we reach a state representing the assignment to xj,2 with the same setup (a state representing the assignment to xj,3 is reached after that). After the assignment to xj,3 we reach a terminal state with utility (−nλ, nλ) if the assignment to xj,1, xj,2 and xj,3 satisfies the clause xj,1 ∨ xj,2 ∨ xj,3, and (0, 0) otherwise. The information sets of player 2 group together all the states corresponding to the assignment to one variable of the original 3-SAT problem (note that we assume that the order of variables in every clause follows some complete ordering on the whole set of variables of the 3-SAT problem). An example of the reduction is shown in Figure 6.

With the reduction done in this way, player 2 can guarantee the outcome λ if and only if there is an assignment of variables satisfying the original 3-SAT problem. Indeed, if at least one clause were not satisfied, player 1 could choose the action corresponding to this clause, guaranteeing an expected outcome of 0 for both players. On the other hand, if player 2 can guarantee an outcome of λ, all the clauses must be satisfiable, since otherwise player 1 could again choose the action corresponding to an unsatisfied clause, leading to an outcome of 0 for both. The reduction is polynomial, since the game has n(n − 1) + 2³·n leaves. The last thing which remains to be shown is that player 2 has A-loss recall. This holds since any loss of information about the actions of player 1 can be tracked back to player 2 forgetting her own action taken in the root. ⊓ ⊔
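A minimal Python sketch of this construction (our own encoding of the game tree as nested tuples; clauses are lists of (variable, polarity) literals, and player 1's lack of information about the root action, as well as the grouping of player 2's assignment states by variable, are recorded only by the information set labels):

def build_reduction(clauses, lam):
    """Builds the Theorem 4 game: ("p2"/"p1", infoset label, {action: child})
    for decision nodes and ("leaf", (u1, u2)) for terminal states."""
    n = len(clauses)

    def assign(clause, idx, values):           # player 2 sets x_{j,idx}
        if idx == 3:
            sat = any(values[var] == pol for var, pol in clause)
            return ("leaf", (-n * lam, n * lam) if sat else (0, 0))
        var = clause[idx][0]
        return ("p2", f"assign_{var}",         # one information set per variable
                {v: assign(clause, idx + 1, {**values, var: v})
                 for v in (True, False)})

    def p1_node(j):                            # player 1, single information set
        return ("p1", "p1",
                {k: assign(clauses[j], 0, {}) if k == j else ("leaf", (0, 0))
                 for k in range(n)})

    return ("p2", "root", {j: p1_node(j) for j in range(n)})

game = build_reduction(
    [[("x1", True), ("x3", False), ("x4", True)],     # x1 OR not x3 OR x4
     [("x2", False), ("x3", True), ("x4", False)]],   # not x2 OR x3 OR not x4
    lam=1)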
Theorem 5. If player 1 has A-loss recall, the problem of deciding whether player 2, having imperfect recall, can guarantee a rational expected payoff of at least λ is NP-complete in two-player general-sum games.

Proof. Theorem 4 shows that the problem is NP-hard. To finish the proof, we show that it is in NP. For a given strategy b2 we can check whether it guarantees the expected outcome by computing a best response of player 1 in the zero-sum game created by setting u1(z) = −u2(z), ∀z ∈ Z, and by converting the states of player 2 to chance nodes with the distribution prescribed by b2. The information set structure of player 1 is taken as the coarsest perfect recall refinement of his original information set structure. Since the resulting game has perfect recall, finding the best response takes O(|H1| · max_{h∈H1} |A(h)|) time, which is polynomial in the size of the input (the result is a best response also for the A-loss recall information set structure by Lemmas 3 and 4). Note that the requirement of rationality of λ is crucial here: as shown in Section 4.2, if the maxmin value is irrational, we would need to evaluate strategies with irrational numbers, which do not have a representation of size polynomial in the input size. In case the game has an irrational maxmin value, it is NPC to find its closest rational approximation. ⊓ ⊔

Theorem 6. If a two-player game has A-loss recall, it is NP-hard to check whether it has a Nash equilibrium in behavioral strategies.

Proof. The proof is by reduction from the 3-SAT problem, and it is similar to the one in the proof of Theorem 4. The only change in the resulting game is the substitution of the utilities in the leaves reached after the actions of player 1 by (−1, 1), and in the leaves corresponding to satisfying the given clause by (−0.5, 0.5). With the reduction done in this way, the game has a Nash equilibrium in behavioral strategies if and only if there is an assignment of variables satisfying the original 3-SAT problem. This holds since, if at least one clause is unsatisfied in every assignment, there is no stable strategy: player 2 can deviate from any strategy to a strategy making some clause k satisfied (one clause can always be satisfied in a SAT problem) and play the action corresponding to clause k deterministically in the root. This, however, does not lead to a stable strategy either, since player 1 then wants to deviate to the action corresponding to clause k, as he gets a better expected utility from the subtree than the immediate −1 from the leaves reached by his other actions. Player 2 then, however, wants to deviate to any other action in the root, because it yields him the expected value of 1. After that, the whole process repeats itself. On the other hand, if there exists a Nash equilibrium in behavioral strategies, all the clauses must be satisfiable, since, as discussed above, there is no stable strategy if at least one clause is unsatisfied in every assignment. If all clauses are satisfied by some assignment, then a Nash equilibrium is for player 2 to play uniformly in the root and according to the assignment in the rest of the tree, and for player 1 to play uniformly. ⊓ ⊔
Theorem 7. If a two-player game has A-loss recall, the problem of deciding whether there exists a Nash equilibrium in behavioral strategies is NP-complete.

Proof. Theorem 6 shows that the problem is NP-hard. To finish the proof, we show that it is in NP: given a strategy profile, we compute, in polynomial time, a best response for each player in a game created in the same way as described in the proof of Theorem 5. If the expected value of each player's best response is equal to the expected value the player obtains under the strategy profile, the strategy profile is a Nash equilibrium. ⊓ ⊔
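The NP-membership arguments of Theorems 5 and 7 both reduce to a polynomial best-response computation in a perfect recall game against chance. A Python sketch under our own encoding — ("leaf", u1), ("chance", [(prob, child), ...]), and ("p1", infoset_id, [children indexed by action]) — where the opponent's fixed behavioral strategy is assumed to be already folded into the chance nodes:

from collections import defaultdict

def br_value(slice_):
    """Best-response value of player 1 for a list of (probability, node)
    pairs that player 1 cannot tell apart beyond his information sets.
    Assumes perfect recall, so all nodes of an information set always
    appear in the same slice."""
    value, p1_nodes, stack = 0.0, [], list(slice_)
    while stack:                          # expand chance nodes, absorb leaves
        p, node = stack.pop()
        if node[0] == "leaf":
            value += p * node[1]
        elif node[0] == "chance":
            stack.extend((p * q, child) for q, child in node[1])
        else:
            p1_nodes.append((p, node))
    groups = defaultdict(list)            # independent information sets
    for p, node in p1_nodes:
        groups[node[1]].append((p, node))
    for nodes in groups.values():         # best action per information set
        value += max(br_value([(p, nd[2][a]) for p, nd in nodes])
                     for a in range(len(nodes[0][1][2])))
    return value

# br_value([(1.0, root)]) returns player 1's best-response value; each node
# is examined once per available action, matching the bound in Theorem 5.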
6 Conclusion
We provide several key theoretical contributions about a subclass of imperfect recall games, called A-loss recall games, where every loss of information of a player can be tracked back to forgetting her own actions. We give new insights into solution concepts by deriving a sufficient and necessary condition for the existence of a Nash equilibrium in behavioral strategies in two-player A-loss recall games, and by showing that these equilibrium strategies can be expressed using only rational numbers, if they exist. We also show NP-completeness of the computational problems related to the existence of a Nash equilibrium and to computing a maxmin strategy. The results can be exploited to design an exact algorithm for computing a Nash equilibrium in A-loss recall games. Another possible continuation is to generalize the conditions for the existence of a Nash equilibrium to general imperfect recall games.

Acknowledgements. This research was supported by the Czech Science Foundation (grant no. 15-23235S) and by the Grant Agency of the Czech Technical University in Prague, grant no. SGS16/235/OHK3/3T/13.
References

1. Audet, C., Belhaiza, S., Hansen, P.: A New Sequence Form Approach for the Enumeration and Refinement of All Extreme Nash Equilibria for Extensive Form Games. International Game Theory Review 11 (2009)
2. Halpern, J.Y., Pass, R.: Sequential Equilibrium and Perfect Equilibrium in Games of Imperfect Recall. Unpublished manuscript (2009)
3. Hansen, K.A., Miltersen, P.B., Sørensen, T.B.: Finding Equilibria in Games of No Chance. In: Computing and Combinatorics, pp. 274–284. Springer (2007)
4. Kaneko, M., Kline, J.J.: Behavior Strategies, Mixed Strategies and Perfect Recall. International Journal of Game Theory 24, 127–145 (1995)
5. Kline, J.J.: Minimum Memory for Equivalence between Ex Ante Optimality and Time-Consistency. Games and Economic Behavior 38, 278–305 (2002)
6. Kline, J.J.: Imperfect Recall and the Relationships between Solution Concepts in Extensive Games. Economic Theory 25(3), 703–710 (2005)
7. Koller, D., Megiddo, N.: The Complexity of Two-Person Zero-Sum Games in Extensive Form. Games and Economic Behavior 4, 528–552 (1992)
8. Koller, D., Megiddo, N.: Finding Mixed Strategies with Small Supports in Extensive Form Games. International Journal of Game Theory 25(1), 73–92 (1996)
9. Koller, D., Megiddo, N., von Stengel, B.: Fast Algorithms for Finding Randomized Strategies in Game Trees. In: 26th Annual ACM Symposium on Theory of Computing (1994)
10. Kroer, C., Sandholm, T.: Extensive-Form Game Imperfect-Recall Abstractions with Bounds. arXiv preprint arXiv:1409.3302 (2014)
11. Kuhn, H.W.: Extensive Games and the Problem of Information. Annals of Mathematics Studies (1953)
12. Lanctot, M., Gibson, R., Burch, N., Zinkevich, M., Bowling, M.: No-Regret Learning in Extensive-Form Games with Imperfect Recall. arXiv preprint arXiv:1205.0622 (2012)
13. Nash, J.F.: Equilibrium Points in n-Person Games. Proc. Nat. Acad. Sci. USA 36(1), 48–49 (1950)
14. Piccione, M., Rubinstein, A.: On the Interpretation of Decision Problems with Imperfect Recall. Games and Economic Behavior 20(1), 3–24 (1997)
15. Waugh, K., Zinkevich, M., Johanson, M., Kan, M., Schnizlein, D., Bowling, M.H.: A Practical Use of Imperfect Recall. In: SARA (2009)
16. Wichardt, P.C.: Existence of Nash Equilibria in Finite Extensive Form Games with Imperfect Recall: A Counterexample. Games and Economic Behavior 63(1), 366–369 (2008)