Quantum Inference on Bayesian Networks

Guang Hao Low, Theodore J. Yoder, Isaac L. Chuang

arXiv:1402.7359v1 [quant-ph] 28 Feb 2014

Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, United States of America
(Dated: March 3, 2014)

Performing exact inference on Bayesian networks is known to be #P-hard. Typically approximate inference techniques are used instead to sample from the distribution on query variables, given the values e of evidence variables. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time O(nm P(e)^{-1}), depending critically on P(e), the probability that the evidence might occur in the first place. By implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking O(n 2^m P(e)^{-1/2}) time per sample. We exploit the Bayesian network's graph structure to efficiently construct a quantum state, a q-sample, representing the intended classical distribution, and also to efficiently apply amplitude amplification, the source of our speedup. Thus, our speedup is notable as it is unrelativized – we count primitive operations and require no blackbox oracle queries.

PACS numbers: 02.50.Tt, 03.67.Ac

I. INTRODUCTION

How are rational decisions made? Given a set of possible actions, the logical answer is the one with the largest corresponding utility. However, estimating these utilities accurately is the problem. A rational agent endowed with a model and partial information of the world must be able to evaluate the probabilities of various outcomes, and this is often done through inference on a Bayesian network [1], which efficiently encodes a joint probability distribution in a directed acyclic graph of conditional probability tables. In fact, the standard model of a decision-making agent in a probabilistic time-discretized world, known as a Partially Observable Markov Decision Process, is a special case of a Bayesian network. Furthermore, Bayesian inference finds application in processes as diverse as system modeling [2], model learning [3, 4], data analysis [5], and decision making [6], all falling under the umbrella of machine learning [1].

Unfortunately, despite the vast space of applications, Bayesian inference is difficult. To begin with, exact inference is #P-hard in general [1]. It is often far more feasible to perform approximate inference by sampling, such as with the Metropolis-Hastings algorithm [7] and its innumerable specializations [8], but doing so is still NP-hard in general [9]. This can be understood by considering rejection sampling, a primitive operation common to many approximate algorithms that generates unbiased samples from a target distribution P(Q|E) for some set of query variables Q conditional on some assignment of evidence variables E = e. In the general case, rejection sampling requires sampling from the full joint distribution P(Q, E) and throwing away samples with incorrect evidence. In the specific case in which the joint distribution is described by a Bayesian network with n nodes, each with no more than m parents, it takes time O(nm) to generate a sample from the joint distribution, and so a sample from the conditional distribution P(Q|E) takes average time O(nm P(e)^{-1}). Much of the computational difficulty is related to how the marginal P(e) = P(E = e) becomes exponentially small as the number of evidence variables increases, since only samples with the correct evidence assignments are recorded.

One very intriguing direction for speeding up approximate inference is the development of hardware implementations of sampling algorithms, for which promising results such as natively probabilistic computing with stochastic logic gates have been reported [10]. In this same vein, we could also consider physical systems that already describe probabilities and their evolution in a natural fashion, to discover whether such systems would offer similar benefits.

Quantum mechanics can in fact describe such naturally probabilistic systems. Consider an analogy: if a quantum state is like a classical probability distribution, then measuring it should be analogous to sampling, and unitary operators should be analogous to stochastic updates. Though this analogy is qualitatively true and appealing, it is inexact in ways yet to be fully understood. Indeed, it is a widely held belief that quantum computers offer a strictly more powerful set of tools than classical computers, even probabilistic ones [11], though this appears difficult to prove [12]. Notable examples of the power of quantum computation include exponential speedups for finding prime factors with Shor's algorithm [13], and square-root speedups for generic classes of search problems through Grover's algorithm [14]. Unsurprisingly, there is an ongoing search for ever more problems amenable to quantum attack [15–17]. For instance, the quantum rejection sampling algorithm for approximate inference was only developed quite recently [18], alongside a proof, relativized by an oracle, of a square-root speedup in runtime over the classical algorithm. The algorithm, just like its classical counterpart, is an extremely general method of doing approximate inference, requiring preparation of a quantum pure state representing the joint distribution P(Q, E) and amplitude amplification to amplify the part of the superposition with the correct evidence. Owing to its generality, the procedure assumes access to a state-preparation oracle Â_P, and the runtime is therefore measured by the query complexity [19], the number of times the oracle must be used. Unsurprisingly, such oracles may not be efficiently implementable in general, as the ability to prepare arbitrary states allows for witness generation to QMA-complete problems [18, 20]. This also corresponds consistently to the NP-hardness of classical sampling.

In this paper, we present an unrelativized (i.e. no oracle) square-root quantum speedup to rejection sampling on a Bayesian network. Just as the graphical structure of a Bayesian network speeds up classical sampling, we find that the same structure allows us to construct the state-preparation oracle Â_P efficiently. Specifically, quantum sampling from P(Q|E = e) takes time O(n 2^m P(e)^{-1/2}), compared with O(nm P(e)^{-1}) for classical sampling, where m is the maximum indegree of the network. We exploit the structure of the Bayesian network to construct an efficient quantum circuit Â_P composed of O(n 2^m) controlled-NOT gates and single-qubit rotations that generates the quantum state |ψ_P⟩ representing the joint distribution P(Q, E). This state must then be evolved to |Q⟩ representing P(Q|E = e), which can be done by performing amplitude amplification [21], the source of our speedup and the heart of quantum rejection sampling in general [18]. The desired sample is then obtained in a single measurement of |Q⟩.

We better define the problem of approximate inference with a review of Bayesian networks in section II. We discuss a sensible encoding of a probability distribution in a quantum state axiomatically in section III. This is followed by an overview of amplitude amplification in section IV. The quantum rejection sampling algorithm is given in section V. As our main result, we construct circuits for the state preparation operator in sections VI A and VI B and circuits for the reflection operators for amplitude amplification in section VI C. The total time complexity of quantum rejection sampling on Bayesian networks is evaluated in section VI D, and we present avenues for further work in section VII.

II. BAYESIAN NETWORKS

A Bayesian network is a directed acyclic graph structure that represents a joint probability distribution over n bits. A significant advantage of the Bayesian network representation is that its space complexity can be made much smaller than in the general case by exploiting conditional dependencies in the distribution. This is achieved by associating with each graph node a conditional probability table for each random variable, with directed edges representing conditional dependencies, such as in Fig. 1a.

We adopt the standard convention of capital letters (e.g. X) representing random variables while lowercase letters (e.g. a) are particular fixed values of those variables. For simplicity, the random variables are taken to be binary. Accordingly, probability vectors are denoted P(X) = {P(X = 0), P(X = 1)} while P(x) ≡ P(X = x). Script letters represent a set of random variables 𝒳 = {X_1, X_2, ..., X_n}. An arbitrary joint probability distribution P(x_1, x_2, ..., x_n) on n bits can always be factored by recursive application of Bayes' rule P(X, Y) = P(X)P(Y|X),

$$P(x_1, x_2, \ldots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_1, \ldots, x_{i-1}). \qquad (1)$$

However, in most practical situations a given variable X_i will be dependent on only a few of its predecessors' values, those we denote by parents(X_i) ⊆ {X_1, ..., X_{i-1}} (see Fig. 1a). Therefore, the factorization above can be simplified to

$$P(x_1, x_2, \ldots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid \mathrm{parents}(X_i)). \qquad (2)$$

A Bayes net diagrammatically expresses this simplification, with a topological ordering on the nodes X_1 ≺ X_2 ≺ ... ≺ X_n in which parents are listed before their children. With each node X_i in the Bayes net, the conditional probability factor P(x_i = 1 | parents(X_i)) is stored as a table of 2^{m_i} values [1], where m_i is the number of parents of node X_i, also known as its indegree. Letting m denote the largest m_i, the Bayes net data structure stores at most O(n 2^m) probabilities, a significant improvement over the direct approach of storing O(2^n) probabilities [1].

A common problem for any probability distribution is inference. Say we have a complete joint probability distribution on n bits, P(𝒳). Given the values e = e_{|E|}...e_2 e_1 for a set E ⊆ 𝒳 of random variables, the task is to find the distribution over a collection of query variables Q ⊆ 𝒳\E. That is, the exact inference problem is to calculate P(Q|E = e). Exact inference is #P-hard [1], since one can create a Bayes net encoding the n-variable k-SAT problem, with nodes for each variable, each clause, and the final verdict – a count of the satisfying assignments.

Approximate inference on a Bayesian network is much simpler, thanks to the graphical structure. The procedure for sampling is as follows: working from the top of the network, generate a value for each node, given the values already generated for its parents. Since each node has at most m parents that we must inspect before generating a value, and there are n nodes in the network, obtaining a sample {x_1, x_2, ..., x_n} takes time O(nm). Yet we must postselect on the correct evidence values E = e, leaving us with an average time per sample of O(nm P(e)^{-1}), which suffers when the probability P(e) becomes small, typically exponentially small in the number of evidence variables |E|. Quantum rejection sampling, however, will improve the factor of P(e)^{-1} to P(e)^{-1/2}, while preserving the linear scaling in the number of variables n, given that we use an appropriate quantum state to represent the Bayesian network.

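For concreteness, the classical procedure just described can be sketched in a few lines of Python. The two-node network and its probabilities below are illustrative assumptions of ours, not an example from the paper; each node stores a conditional probability table indexed by its parents' values, and samples with the wrong evidence are rejected.

```python
import random

# A toy Bayesian network: each node maps a tuple of parent values to P(node = 1).
# Insertion order (Python 3.7+) is topological, so parents always precede children.
NETWORK = {
    "X1": {"parents": [], "cpt": {(): 0.3}},
    "X2": {"parents": ["X1"], "cpt": {(0,): 0.8, (1,): 0.2}},
}

def sample_joint(net):
    """Ancestral sampling: O(nm) work for n nodes with at most m parents each."""
    values = {}
    for name, node in net.items():
        parent_vals = tuple(values[p] for p in node["parents"])
        p1 = node["cpt"][parent_vals]
        values[name] = 1 if random.random() < p1 else 0
    return values

def rejection_sample(net, evidence, max_tries=100000):
    """Draw one sample from P(Q | E = e); the expected cost scales as 1/P(e)."""
    for _ in range(max_tries):
        values = sample_joint(net)
        if all(values[var] == val for var, val in evidence.items()):
            return values
    raise RuntimeError("evidence too unlikely; no sample found")

print(rejection_sample(NETWORK, {"X2": 1}))
```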
FIG. 1: a) An example of a directed acyclic graph (nodes X_1 through X_7) which can represent a Bayesian network by associating with each node a conditional probability table. For instance, associated with the node X_1 is the single value P(X_1 = 1), while that of X_5 consists of four values, the probabilities that X_5 = 1 given each setting of the parent nodes X_2 and X_3. b) A quantum circuit that efficiently prepares the q-sample representing the full joint distribution of (a). Notice in particular how the edges in the graph are mapped to conditioning nodes in the circuit. The |ψ_j⟩ represent the state of the system after applying the operator sequence U_1...U_j to the initial state |0000000⟩.

III. QUANTUM SAMPLING FROM P(𝒳)

This section explores the analogy between quantum states and classical probability distributions from first principles. In particular, for a classical probability distribution function P(𝒳) on a set of n binary random variables 𝒳, what quantum state ρ_P (possibly mixed, on d qubits) should we use to represent it? The suitable state, which we call a quantum probability distribution function (qpdf), is defined with three properties.

Definition 1. A qpdf for the probability distribution P(𝒳) has the following three properties:

1. Purity: In the interest of implementing quantum algorithms, we require the qpdf to be a pure state ρ_P = |Ψ_P⟩⟨Ψ_P|.

2. Q-sampling: A single qpdf can be measured to obtain a classical n-bit string, a sample from P(𝒳). Furthermore, for any subset of variables W ⊂ 𝒳, a subset of qubits in the qpdf can be measured to obtain a sample from the marginal distribution P(W). We call these measurement procedures q-sampling.

3. Q-stochasticity: For every stochastic matrix T there is a unitary U_T such that whenever T maps the classical distribution P(𝒳) to P'(𝒳), U_T maps the qpdf |Ψ_P⟩ to |Ψ_{P'}⟩ = U_T |Ψ_P⟩.

The motivation for property 3 is the implementation of Markov chains, Markov decision processes, or even sampling algorithms such as Metropolis-Hastings, on quantum states. The question we pose, and leave open, is whether a qpdf exists.

The simplest way to satisfy the first two criteria, but not the third, is to initialize a single qubit for each classical binary random variable. This leads to what is called the q-sample, defined in prior work [22] as:

Definition 2. The q-sample of the joint distribution P(x_1, ..., x_n) over n binary variables {X_i} is the n-qubit pure state |ψ_P⟩ = Σ_{x_1,...,x_n} √P(x_1, ..., x_n) |x_1...x_n⟩.

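As an illustration (ours, not from the paper), a q-sample's amplitude vector is simply the entrywise square root of the joint probability vector, and q-sampling a marginal amounts to measuring a subset of qubits. A minimal statevector sketch with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint distribution over n = 2 bits, indexed by the integer x = (x1 x2)_2.
P = np.array([0.1, 0.2, 0.3, 0.4])
psi = np.sqrt(P)                       # q-sample amplitudes |psi_P> = sum_x sqrt(P(x)) |x>

# Q-sampling: measuring all qubits returns x with probability |<x|psi_P>|^2 = P(x).
x = rng.choice(len(P), p=np.abs(psi) ** 2)

# Measuring only the first qubit samples the marginal P(X1): group the outcome
# probabilities by the value of the measured qubit (the high bit of x here).
probs = np.abs(psi) ** 2
p_x1 = np.array([probs[0] + probs[1], probs[2] + probs[3]])  # P(X1=0), P(X1=1)
print(x, p_x1)
```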
The q-sample possesses property 1 and the eponymous property 2 above. However, it does not allow for stochastic updates as per property 3, as a simple single-qubit example shows. In that case, property 3 requires

$$\begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix} \begin{pmatrix} \sqrt{p} \\ \sqrt{1-p} \end{pmatrix} = \begin{pmatrix} \sqrt{p\,T_{11} + (1-p)\,T_{12}} \\ \sqrt{p\,T_{21} + (1-p)\,T_{22}} \end{pmatrix}, \qquad (3)$$

for all p ∈ [0, 1]. Looking at Eq. 3 for p = 0 and p = 1 constrains U completely: p = 1 fixes its first column to (√T_{11}, √T_{21})^T and p = 0 fixes its second column to (√T_{12}, √T_{22})^T, and the resulting matrix fails to be unitary for a general stochastic T (its columns are not orthogonal unless T is a permutation). Thus, the q-sample fails to satisfy property 3. Yet, the q-sample satisfies properties 1 and 2 in a very simplistic fashion, and various more complicated forms might be considered. For instance, relative phases could be added to the q-sample, giving Σ_x e^{iφ(x)} √P(x) |x⟩, though this alone does not guarantee property 3, which is easily checked by adding phases to the proof above. Other extensions of the q-sample may include ancilla qubits, different measurement bases, or a post-processing step including classical randomness to translate the measurement result into a classical sample. It is an open question whether a more complicated representation satisfying all three properties exists, including q-stochasticity, so that we would have a qpdf possessing all the defining properties.

Nevertheless, although the q-sample is not a qpdf by our criteria, it will still be very useful for performing quantum rejection sampling. The property that a sample from a marginal distribution is obtained by simply measuring a subset of qubits means that, using conditional gates, we can form a q-sample for a conditional distribution from the q-sample for the full joint distribution, as we will show in section VI. This corresponds to the classical formula P(V|W) = P(V, W)/P(W), which is the basis behind rejection sampling. The way it is actually done quickly on a q-sample is through amplitude amplification, reviewed next, in the general case.

IV. AMPLITUDE AMPLIFICATION

Amplitude amplification [21] is a well-known extension of Grover's algorithm and is the second major concept in the quantum inference algorithm. Given a quantum circuit Â for the creation of an n-qubit pure state |ψ⟩ = Â|0⟩^⊗n = α|ψ_t⟩ + β|ψ̄_t⟩, where ⟨ψ_t|ψ̄_t⟩ = 0, the goal is to return the target state |ψ_t⟩ with high probability. To make our circuit constructions more explicit, we assume target states are marked by a known evidence bit string e = e_{|E|}...e_2 e_1, so that |ψ_t⟩ = |Q⟩|e⟩ lives in the tensor product space H_Q ⊗ H_E and the goal is to extract |Q⟩.

Just like in Grover's algorithm, a pair of reflection operators are applied repetitively to rotate |ψ⟩ into |ψ_t⟩. Reflection about the evidence is performed by Ŝ_e = Î ⊗ (Î − 2|e⟩⟨e|), followed by reflection about the initial state, Ŝ_ψ = (Î − 2|ψ⟩⟨ψ|). Given Â, we have Ŝ_ψ = Â Ŝ_0 Â†, where Ŝ_0 = (Î − 2|0⟩⟨0|^⊗n).

The analysis of the amplitude amplification algorithm is elucidated by writing the Grover iterate Ĝ = −Ŝ_ψ Ŝ_e = −Â Ŝ_0 Â† Ŝ_e in the basis of (α/|α|)|ψ_t⟩ ≡ (1, 0)^T and (β/|β|)|ψ̄_t⟩ ≡ (0, 1)^T [19],

$$\hat G = \begin{pmatrix} 1 - 2|\alpha|^2 & 2|\alpha|\sqrt{1 - |\alpha|^2} \\ -2|\alpha|\sqrt{1 - |\alpha|^2} & 1 - 2|\alpha|^2 \end{pmatrix}. \qquad (4)$$

In this basis, the Grover iterate corresponds to a rotation by the small angle θ = cos^{-1}(1 − 2|α|^2) ≈ 2|α|. Therefore, applying the iterate N times rotates the state by Nθ. We conclude that Ĝ^N |ψ⟩ is closest to (α/|α|)|ψ_t⟩ after N = O(π/(4|α|)) iterations.

Usually, amplitude amplification needs to be used without knowing the value of |α|. In that case, N is not known. However, the situation is remedied by guessing the correct number of Grover iterates to apply in exponential progression. That is, we apply Ĝ 2^k times, with k = 0, 1, 2, ..., measure the evidence qubits |E⟩ after each attempt, and stop when we find E = e. It has been shown [21] that this approach also requires on average O(1/|α|) applications of Ĝ.
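To see the scaling numerically, one can iterate the 2×2 rotation of Eq. (4) directly. This is a sketch of ours with an arbitrary value of α, not part of the paper's construction:

```python
import numpy as np

alpha = 0.05                                         # amplitude of the target component
c, s = 1 - 2 * alpha**2, 2 * alpha * np.sqrt(1 - alpha**2)
G = np.array([[c, s], [-s, c]])                      # Grover iterate of Eq. (4)

state = np.array([alpha, np.sqrt(1 - alpha**2)])     # (target, non-target) amplitudes
n = 0
while True:
    new = G @ state
    if abs(new[0]) < abs(state[0]):                  # overlap with the target starts decreasing
        break
    state, n = new, n + 1

# n is the first maximum of the target amplitude; it sits near pi / (4*alpha) ~ 15.7 here.
print(n, round(abs(state[0]) ** 2, 4), round(np.pi / (4 * alpha), 1))
```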

V. THE QUANTUM REJECTION SAMPLING ALGORITHM

The quantum rejection sampling algorithm [18], which we review now, is an application of amplitude amplification on a q-sample. The general problem, as detailed in section II, is to sample from the n-bit distribution P(Q|E = e). We assume that we have a circuit Â_P that can prepare the q-sample |ψ_P⟩ = Â_P|0⟩^⊗n. Now, permuting qubits so the evidence lies to the right, the q-sample can be decomposed into a superposition of states with correct evidence and states with incorrect evidence,

$$|\psi_P\rangle = \sqrt{P(e)}\, |Q\rangle|e\rangle + \sqrt{1 - P(e)}\, |\overline{Q, e}\rangle, \qquad (5)$$

where |Q⟩ denotes the q-sample of P(Q|E = e), our target state. Next, perform the amplitude amplification algorithm from the last section to obtain |Q⟩ with high probability. Note that this means the state preparation operator Â_P must be applied O(P(e)^{-1/2}) times. Once obtained, |Q⟩ can be measured to get a sample from P(Q|E = e), and we have therefore done approximate inference. Pseudocode is provided as Algorithm 1.

However, we are so far missing a crucial element. How is the q-sample preparation circuit Â_P actually implemented, and can this implementation be made efficient, that is, polynomial in the number of qubits n? The answer to this question removes the image of Â_P as a featureless black box and is addressed in the next section.

Algorithm 1 Quantum rejection sampling: generate one sample from P(Q|E = e) given a q-sample preparation circuit Â_P
  k ← −1
  while evidence E ≠ e do
    k ← k + 1
    |ψ_P⟩ ← Â_P|0⟩^⊗n          // prepare a q-sample of P(𝒳)
    |ψ'_P⟩ ← Ĝ^{2^k} |ψ_P⟩      // where Ĝ = −Â_P Ŝ_0 Â_P† Ŝ_e
    measure evidence qubits E of |ψ'_P⟩
  end while
  measure the query qubits to obtain a sample Q = q
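A statevector simulation of Algorithm 1 can be written directly from the operators above. The joint distribution, the qubit layout, and the random seed below are our own toy choices, and the measurements are simulated classically; this is a sketch, not the paper's circuit-level construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy joint distribution P(x) over n = 3 bits; the least significant bit of x is the evidence E.
n = 3
P = rng.dirichlet(np.ones(2 ** n))        # an arbitrary, hypothetical joint distribution
psi = np.sqrt(P)                          # q-sample |psi_P> = A_P |0>^n
e_val = 1                                 # condition on E = 1

evidence_bit = np.arange(2 ** n) & 1      # value of E for each basis state
S_e = np.where(evidence_bit == e_val, -1.0, 1.0)   # reflection about the evidence

def grover(v):
    """G v = -S_psi S_e v, with S_psi = I - 2|psi><psi| (= A S_0 A^dagger)."""
    v = S_e * v
    return -(v - 2 * np.dot(psi, v) * psi)

# Algorithm 1: exponentially growing number of Grover iterates until E = e is measured.
k = -1
while True:
    k += 1
    state = psi.copy()
    for _ in range(2 ** k):
        state = grover(state)
    p_good = np.sum(np.abs(state[evidence_bit == e_val]) ** 2)
    if rng.random() < p_good:             # simulated measurement of the evidence qubits
        break

# Post-measurement state on the query qubits: a sample q from P(Q | E = e).
good = np.abs(state[evidence_bit == e_val]) ** 2
q = rng.choice(len(good), p=good / good.sum())
print("k =", k, "query sample (as an integer):", q)
```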

VI. CIRCUIT CONSTRUCTIONS

While the rejection sampling algorithm from section V is entirely general for any distribution P(𝒳), the complexity of q-sample preparation, in terms of the total number of CNOTs and single-qubit rotations involved, is generally exponential in the number of qubits, O(2^n). We show this in section VI A. The difficulty is not surprising, since arbitrary q-sample preparation encompasses witness generation to QMA-complete problems [18, 20]. However, there are cases in which the q-sample can be prepared efficiently [22]. The main result of this paper is that, for probability distributions resulting from a Bayesian network B with n nodes and maximum indegree m, the circuit complexity of the q-sample preparation circuit Â_B is O(n 2^m). We show this in section VI B. The circuit constructions for the remaining parts of the Grover iterate, the phase flip operators, are given in section VI C. Finally, we evaluate the complexity of our constructions as a whole in section VI D and find that approximate inference on Bayesian networks can be done with a polynomially sized quantum circuit.

Throughout this section we will denote the circuit complexity of a circuit Ĉ as Q_Ĉ. This complexity measure is the count of the number of gates in Ĉ after compilation into a complete, primitive set. The primitive set we employ includes the CNOT gate and all single-qubit rotations.

A. Q-sample Preparation

If P(x) lacks any kind of structure, the difficulty of preparing the q-sample |ψ_P⟩ = Â_P|0⟩^⊗n with some unitary Â_P scales at least exponentially with the number of qubits n in the q-sample. Since P(x) contains 2^n − 1 arbitrary probabilities, Â_P must contain at least that many primitive operations. In fact, the bound is tight — we can construct a quantum circuit preparing |ψ_P⟩ with complexity O(2^n).

Theorem 1. Given an arbitrary joint probability distribution P(x_1, ..., x_n) over n binary variables {X_i}, there exists a quantum circuit Â_P that prepares the q-sample Â_P|0⟩^⊗n = |ψ_P⟩ = Σ_{x_1,...,x_n} √P(x_1, ..., x_n) |x_1...x_n⟩ with circuit complexity O(2^n).

Proof. Decompose P(x) = P(x_1) ∏_{i=2}^n P(x_i|x_1...x_{i-1}) as per Eq. (1). For each conditional distribution P(X_i|x_1...x_{i-1}), let us define the i-qubit uniformly controlled rotation Û_i such that, given an (i−1)-bit string assignment x_c ≡ x_1...x_{i-1} on the control qubits, the action of Û_i on the i-th qubit initialized to |0⟩_i is a rotation about the y-axis by angle 2 tan^{-1}(√(P(x_i = 1|x_c)/P(x_i = 0|x_c))), or Û_i|0⟩_i = √P(x_i = 0|x_c) |0⟩_i + √P(x_i = 1|x_c) |1⟩_i. With this definition, the action of the single-qubit Û_1 is Û_1|0⟩_1 = √P(x_1 = 0) |0⟩_1 + √P(x_1 = 1) |1⟩_1. By applying Bayes' rule in reverse, the operation Â_P = Û_n...Û_1 then produces |ψ_P⟩ = Â_P|0⟩^⊗n. As each k-qubit uniformly controlled rotation is decomposable into O(2^k) CNOTs and single-qubit rotations [23], the circuit complexity of Â_P is Q_{Â_P} = Σ_{i=1}^n O(2^{i-1}) = O(2^n).

The key quantum compiling result used in this proof is the construction of Bergholm et al. [23] that decomposes k-qubit uniformly controlled gates into O(2^k) CNOTs and single-qubit operations. Each uniformly controlled gate is the realization of a conditional probability table from the factorization of the joint distribution. We use this result again in Bayesian q-sample preparation.
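The rotation angles in this proof are fixed entirely by the conditional probabilities. The helper below (ours, for illustration) extracts, from a given joint distribution, the angle of every y-rotation in each uniformly controlled gate Û_i, making the O(2^n) parameter count explicit:

```python
import numpy as np

def ucr_angles(P):
    """For a joint distribution P over n bits (length 2**n, indexed by x = x1...xn in binary),
    return, for each qubit i, a table mapping the control string x1...x_{i-1} to the angle
    2*arctan(sqrt(P(x_i=1|xc)/P(x_i=0|xc))) of the uniformly controlled rotation U_i."""
    n = int(np.log2(len(P)))
    angles = []
    for i in range(1, n + 1):
        table = {}
        block = 2 ** (n - i)                      # states sharing the first i bits
        for xc in range(2 ** (i - 1)):
            p0 = P[(xc * 2 + 0) * block:(xc * 2 + 1) * block].sum()   # P(x1..x_{i-1}=xc, x_i=0)
            p1 = P[(xc * 2 + 1) * block:(xc * 2 + 2) * block].sum()   # P(x1..x_{i-1}=xc, x_i=1)
            table[xc] = 2 * np.arctan2(np.sqrt(p1), np.sqrt(p0))      # robust when p0 = 0
        angles.append(table)
    return angles   # sum_i 2**(i-1) angles in total: the O(2**n) scaling of Theorem 1

P = np.array([0.1, 0.2, 0.3, 0.4])
print(ucr_angles(P))
```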

B. Bayesian Q-sample Preparation

We now give our main result, a demonstration that the circuit Â_B that prepares the q-sample of a Bayesian network is exponentially simpler than the general q-sample preparation circuit Â_P. We begin with a Bayesian network with, as usual, n nodes and maximum indegree m that encodes a distribution P(𝒳). As a minor issue, because the Bayesian network may have its nodes reordered, the indegree m is actually a function of the specific parentage of nodes in the network. This non-uniqueness of m corresponds to the non-uniqueness of the decomposition P(x_1, ..., x_n) = P(x_1) ∏_{i=2}^n P(x_i|x_1...x_{i-1}) due to permutations of the variables. Finding the variable ordering minimizing m is unfortunately an NP-hard problem [24], but typically the variables have real-world meaning and the natural causal ordering often comes close to optimal [25]. In any case, we take m as a constant much less than n.

Definition 3. If P(𝒳) is the probability distribution represented by a Bayesian network B, the Bayesian q-sample |ψ_B⟩ denotes the q-sample of P(𝒳).

Theorem 2. The Bayesian q-sample of the Bayesian network B with n nodes and bounded indegree m can be prepared efficiently by an operator Â_B with circuit complexity O(n 2^m) acting on the initial state |0⟩^⊗n.

Proof. As a Bayesian network is a directed acyclic graph, let us order the node indices topologically such that, for all 1 ≤ i ≤ n, we have parents(X_i) ⊆ {X_1, X_2, ..., X_{i-1}}, and max_i |parents(X_i)| = m. Referring to the construction from the proof of Theorem 1, the state preparation operator Â = Û_n...Û_1 then contains at most m-qubit uniformly controlled operators, each with circuit complexity O(2^m), again from Bergholm et al. [23]. The circuit complexity of Â_B is thus Q_{Â_B} = Σ_{i=1}^n O(2^m) = O(n 2^m). Fig. 1b shows the circuit we have just described.

Bayesian q-sample preparation forms part of the Grover iterate required for amplitude amplification. The rest is comprised of the reflection, or phase flip, operators.
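Because each node's rotation is controlled only by its parents, the angles can be read directly off the network's conditional probability tables, Σ_i 2^{m_i} = O(n 2^m) of them in total rather than O(2^n). The sketch below uses a toy three-node network of our own devising; note that the full statevector it assembles for checking purposes is still exponentially large, while the parameter bookkeeping of Â_B is not.

```python
import numpy as np
from itertools import product

# Toy Bayesian network in topological order; each cpt maps parent assignments to P(node = 1).
NET = [
    ("X1", [],           {(): 0.3}),
    ("X2", ["X1"],       {(0,): 0.8, (1,): 0.2}),
    ("X3", ["X1", "X2"], {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}),
]
index = {name: i for i, (name, _, _) in enumerate(NET)}

# Number of rotation angles A_B needs: sum_i 2**m_i, i.e. O(n 2**m) for indegree <= m.
n_angles = sum(2 ** len(parents) for _, parents, _ in NET)

# Amplitudes of the Bayesian q-sample |psi_B>: square roots of the factored joint probability.
n = len(NET)
psi = np.zeros(2 ** n)
for bits in product([0, 1], repeat=n):
    p = 1.0
    for (name, parents, cpt), b in zip(NET, bits):
        p1 = cpt[tuple(bits[index[par]] for par in parents)]
        p *= p1 if b == 1 else 1 - p1
    psi[int("".join(map(str, bits)), 2)] = np.sqrt(p)

print(n_angles, np.isclose(np.sum(psi ** 2), 1.0))   # 7 angles here; state is normalized
```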

C. Phase Flip Operators

Here we show that the phase flip operators are also efficiently implementable, so that we can complete the argument that amplitude amplification on a Bayesian q-sample is polynomial time. Note first that the phase flip operator Ŝ_e acting on k = |E| ≤ n qubits can be implemented with a single k-qubit controlled-Z operation along with at most 2k bit flips. The operator Ŝ_0 is the special case Ŝ_{e=0^n}. A Bayesian q-sample can be decomposed exactly as in Eq. (5),

$$|\psi_B\rangle = \sqrt{P(e)}\, |Q\rangle|e\rangle + \sqrt{1 - P(e)}\, |\overline{Q, e}\rangle. \qquad (6)$$

Recall |Q⟩ is the q-sample of P(Q|E = e) and |Q̄,ē⟩ contains all states with invalid evidence E ≠ e. We write the evidence as a k-bit string e = e_k...e_2 e_1 and X̂_i as the bit flip on the i-th evidence qubit. The controlled phase, denoted Ẑ_{1...k}, acts on all k evidence qubits symmetrically, flipping the phase if and only if all qubits are 1. Then Ŝ_e is implemented by

$$\hat S_e = \hat B \hat Z_{1\ldots k} \hat B, \qquad (7)$$

where B̂ = ∏_{i=1}^k X̂_i^{ē_i} with ē_i ≡ 1 − e_i. Explicitly,

$$\begin{aligned} \hat S_e |\psi_B\rangle &= \hat B \hat Z_{1\ldots k} \hat B \left[ \sqrt{P(e)}\, |Q\rangle|e\rangle + \sqrt{1 - P(e)}\, |\overline{Q, e}\rangle \right] \\ &= \hat B \hat Z_{1\ldots k} \left[ \sqrt{P(e)}\, |Q\rangle|1^k\rangle + \sqrt{1 - P(e)}\, |\overline{Q, 1^k}\rangle \right] \\ &= \hat B \left[ -\sqrt{P(e)}\, |Q\rangle|1^k\rangle + \sqrt{1 - P(e)}\, |\overline{Q, 1^k}\rangle \right] \\ &= -\sqrt{P(e)}\, |Q\rangle|e\rangle + \sqrt{1 - P(e)}\, |\overline{Q, e}\rangle. \end{aligned} \qquad (8)$$

The circuit diagram representing Ŝ_e is shown in Fig. 2. The k-qubit controlled phase can be constructed from O(k) CNOTs and single-qubit operators using O(k) ancillas [16] or, alternatively, O(k²) CNOTs and single-qubit operators using no ancillas [26].

FIG. 2: Quantum circuit for implementing the phase flip operator Ŝ_e with one ancilla qubit initialized to the zero state |0⟩. The k = |E|-qubit Toffoli operator acts on the set of evidence qubits E, where an open (closed) circle denotes control on the qubit being |0⟩ (|1⟩), determined conditionally on the i-th evidence bit e_i = 1(0) in the k-bit evidence string e = e_k...e_1. The circuit complexity of Ŝ_e, dominated by that of the k-qubit Toffoli, in terms of the number of CNOTs and single-qubit operations, is O(k) if provided with O(k) ancilla qubits, and O(k²) with no ancillas.

D. Time Complexity

The circuit complexities of the various elements in the Grover iterate Ĝ = −Â Ŝ_0 Â† Ŝ_e are presented in Table I. As the circuit complexity of the phase flip operator Ŝ_0 (Ŝ_e) scales linearly with the number of qubits n (|E|), Q_Ĝ is dominated by that of the state preparation operator.

  Û     Q_Û        Comments
  Â_P   O(2^n)     Q-sample preparation
  Â_B   O(n 2^m)   Bayesian state preparation
  Ŝ_0   O(n)       O(n) ancilla qubits
  Ŝ_e   O(|E|)     O(|E|) ancilla qubits

TABLE I: Circuit complexity Q_Û of implementing the operators Û discussed in the text. The Grover iterate Ĝ for amplitude amplification of a Bayesian q-sample (general q-sample) consists of two instances of the preparation circuit Â_B (Â_P) and one instance each of Ŝ_0 and Ŝ_e. The time to collect one sample from P(Q|E = e) is O(Q_Ĝ P(e)^{-1/2}).

Although Q_{Â_P} scales exponentially with the number of nodes n for general q-sample preparation, Bayesian q-sample preparation on a network of bounded indegree m is efficient. Namely, Q_{Â_B} = O(n 2^m) scales linearly with n, as in classical sampling from a Bayesian network. It takes O(P(e)^{-1/2}) applications of Â_B to perform the rejection sampling algorithm from section V and, thus, a single sample from P(Q|E) can be obtained by a quantum computer in time O(n 2^m P(e)^{-1/2}). In section II, we saw that classical Bayesian inference takes time O(nm P(e)^{-1}) to generate a single sample. Thus, quantum inference on a Bayesian network provides a square-root speedup over the classical case. The quantum circuit diagram for Bayesian inference is outlined in Fig. 3.
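Up to the constants hidden in the O(·) notation, which the paper does not specify, the two per-sample scalings can be compared directly; the helper below simply evaluates the expressions from Table I and this section for a few hypothetical evidence probabilities.

```python
def classical_cost(n, m, p_e):
    """Expected primitive operations per sample, classical rejection sampling: ~ n*m / P(e)."""
    return n * m / p_e

def quantum_cost(n, m, p_e):
    """Gates per sample, quantum rejection sampling on a Bayes net: ~ n * 2**m / sqrt(P(e))."""
    return n * 2 ** m / p_e ** 0.5

# Example: 50 nodes, at most 3 parents each, for increasingly unlikely evidence.
for p_e in (1e-2, 1e-4, 1e-6):
    print(p_e, classical_cost(50, 3, p_e), quantum_cost(50, 3, p_e))
```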

VII. CONCLUSION

We have shown how the structure of a Bayesian network allows for a square-root, quantum speedup in approximate inference. We explicitly constructed a quantum circuit from CNOT gates and single-qubit rotations that returns a sample from P(Q|E = e) using just O(n 2^m P(e)^{-1/2}) gates. For more general probability distributions, the Grover iterate would include a quantity of gates exponential in n, the number of random variables, and thus not be efficient.

This efficiency of our algorithm implies experimental possibilities. As a proof of principle, one could experimentally perform inference on a two-node Bayesian network with only two qubits with current capabilities of ion trap qubits [27].

We also placed the idea of a q-sample into the broader context of an analogy between quantum states and classical probability distributions. If a qpdf can be found that is pure, can be q-sampled, and allows q-stochastic updates, the quantum machine learning subfield would greatly benefit. Algorithms for many important routines, such as Metropolis-Hastings, Gibbs sampling, and even Bayesian learning, could find square-root speedups in a similar manner to our results here.

Artificial intelligence and machine learning tasks are often at least NP-hard. Although exponential speedups on such problems are precluded by BBBV [28], one might hope for square-root speedups, as we have found here, for a variety of tasks. For instance, a common machine learning environment is online or interactive, in which the agent must learn while making decisions. Good algorithms in this case must balance exploration, finding new knowledge, with exploitation, making the best of what is already known. The use of Grover's algorithm in reinforcement learning has been explored [29], but much remains to be investigated. One complication is that machine learning often takes place in a classical world; a robot is not usually allowed to execute a superposition of actions. One might instead focus on learning tasks that take place in a purely quantum setting. For instance, quantum error correcting codes implicitly gather information on what error occurred in order to correct it. Feeding this information back into the circuit could create an adaptive, intelligent error correcting code.

FIG. 3: a) Quantum Bayesian inference on a Bayes net B for evidence E = e is done by repetition of the circuit shown, with k incrementing as k = 0, 1, ..., stopping when the measurement result x contains the evidence bits e. Then x can be recorded as a sample from the conditional distribution P(Q|E). This corresponds to Algorithm 1. b) The constituents of the Grover iterate Ĝ: the state preparation Â_B and the phase flip operators Ŝ_e and Ŝ_0. The state preparation operator is constructed from Theorem 2, and an example is shown in Fig. 1b. The phase flip operators are constructed as shown in Fig. 2.

[1] S. J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 2nd Edition, Pearson Education, 2003.
[2] M. Bensi, A. D. Kiureghian, D. Straub, Efficient Bayesian network modeling of systems, Reliability Engineering and System Safety 112 (2013) 200–213.
[3] R. E. Neapolitan, Learning Bayesian Networks, illustrated Edition, Prentice Hall, 2003.
[4] G. Cooper, E. Herskovits, A Bayesian method for the induction of probabilistic networks from data, 9 (4) (1992) 309–347.
[5] N. Friedman, M. Goldszmidt, A. Wyner, Data analysis with Bayesian networks: A bootstrap approach, 1999.
[6] F. V. Jensen, Bayesian Networks and Decision Graphs (Information Science and Statistics), Springer, 2001.
[7] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092.
[8] S. Chib, E. Greenberg, Understanding the Metropolis-Hastings algorithm, The American Statistician 49 (4) (1995) 327–335.
[9] P. Dagum, M. Luby, Approximating probabilistic inference in Bayesian belief networks is NP-hard, Artificial Intelligence 60 (1) (1993) 141–153.
[10] V. K. Mansinghka, Natively Probabilistic Computation, Ph.D. thesis, MIT, 2009.
[11] E. Bernstein, U. Vazirani, Quantum complexity theory, in: Proc. 25th Annual ACM Symposium on Theory of Computing, ACM, 1993, pp. 11–20.
[12] S. Aaronson, BQP and the polynomial hierarchy, in: Proceedings of the Forty-second ACM Symposium on Theory of Computing, STOC '10, ACM, New York, NY, USA, 2010, pp. 141–150.
[13] P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. on Computing (1997) 1484–1509.
[14] L. K. Grover, A fast quantum mechanical algorithm for database search, in: Annual ACM Symposium on Theory of Computing, ACM, 1996, pp. 212–219.
[15] A. Galindo, M. A. Martin-Delgado, Information and computation: Classical and quantum aspects, Rev. Mod. Phys. 74 (2002) 347–423.
[16] M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information, 1st Edition, Cambridge University Press, 2004.
[17] S. Jordan, Quantum algorithm zoo, http://math.nist.gov/quantum/zoo/.
[18] M. Ozols, M. Roetteler, J. Roland, Quantum rejection sampling, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, ACM, New York, NY, USA, 2012, pp. 290–308.
[19] L. Grover, Rapid sampling through quantum computing, STOC '00.
[20] A. Bookatz, QMA-complete problems, arXiv:1212.6312.
[21] G. Brassard, P. Høyer, M. Mosca, Quantum amplitude amplification and estimation, 2002, pp. 53–74.
[22] D. Aharonov, A. Ta-Shma, Adiabatic quantum state generation and statistical zero knowledge, in: Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC '03, ACM, New York, NY, USA, 2003, pp. 20–29.
[23] V. Bergholm, J. J. Vartiainen, M. Möttönen, M. M. Salomaa, Quantum circuits with uniformly controlled one-qubit gates, Phys. Rev. A 71 (2005) 052330.
[24] D. M. Chickering, D. Heckerman, C. Meek, Large-sample learning of Bayesian networks is NP-hard, J. Mach. Learn. Res. 5 (2004) 1287–1330.
[25] M. Druzdzel, H. Simon, Causality in Bayesian belief networks, in: Proceedings of the Ninth Annual Conference on Uncertainty in Artificial Intelligence (UAI-93), Morgan Kaufmann Publishers, Inc., 1993, pp. 3–11.
[26] M. Saeedi, M. Pedram, Linear-depth quantum circuits for n-qubit Toffoli gates with no ancilla, arXiv preprint arXiv:1303.3557.
[27] D. Hanneke, J. P. Home, J. D. Jost, J. M. Amini, D. Leibfried, D. J. Wineland, Realization of a programmable two-qubit quantum processor, Nat. Phys. 6 (1) (2010) 13–16.
[28] C. H. Bennett, E. Bernstein, G. Brassard, U. Vazirani, Strengths and weaknesses of quantum computing, SIAM Journal of Computation (1997) 1510–1523.
[29] D. Dong, C. Chen, H. Li, T. Tarn, Quantum reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (5) (2008) 1207–1220.