Quantum Compressed Sensing Using 2-Designs

Report 2 Downloads 106 Views
Quantum Compressed Sensing Using 2-Designs Shelby Kimmel1 and Yi-Kai Liu1, 2

arXiv:1510.08887v1 [quant-ph] 29 Oct 2015

1

Joint Center for Quantum Information and Computer Science (QuICS), University of Maryland 2 National Institute of Standards and Technology (NIST), Gaithersburg, MD

We develop a method for quantum process tomography that combines the efficiency of compressed sensing with the robustness of randomized benchmarking. Our method is robust to state preparation and measurement errors, and it achieves a quadratic speedup over conventional tomography when the unknown process is a generic unitary evolution. Our method is based on PhaseLift, a convex programming technique for phase retrieval. We show that this method achieves approximate recovery of almost all signals, using measurements sampled from spherical or unitary 2-designs. This is the first positive result on PhaseLift using 2-designs. We also show that exact recovery of all signals is possible using unitary 4-designs. Previous positive results for PhaseLift required spherical 4-designs, while PhaseLift was known to fail in certain cases when using spherical 2-designs.

I. A.

INTRODUCTION

Fast, robust quantum process tomography

Quantum process tomography is an important tool for the experimental development of large-scale quantum information processors. Process tomography is a means of obtaining complete knowledge of the dynamical evolution of a quantum system. This allows for accurate calibration of quantum gates, as well as characterization of qubit noise processes, such as dephasing, leakage, and cross-talk. These are the types of error processes that must be understood and ameliorated in order build scalable quantum computers with error rates below the fault-tolerance threshold. Process tomography is challenging for (at least) two reasons. First, it requires estimating a large number of parameters: for a system of n qubits, the Hilbert space has dimension d = 2n , and a general quantum process has approximately d4 = 24n degrees of freedom. For example, to characterize the cross-talk between a pair of two-qubit gates acting on n = 4 qubits, using the most general approach, one must estimate ∼ 216 parameters. To address this issue, several authors have proposed methods based on compressed sensing, which exploit sparse or lowrank structure in the unknown state or process, to reduce the number of measurements that must be performed [1, 11, 12, 18, 20, 34, 35]. The second challenge is that, in most real experimental setups, one must perform process tomography using state preparation and measurement devices that are imperfect. These devices introduce state preparation and measurement errors (“SPAM errors”), which limit the accuracy of process tomography. This limit is encountered in practice, and often produces non-physical estimates of processes [19]. Perhaps surprisingly, there are methods

that are robust to SPAM errors, such as randomized benchmarking [21, 22, 27, 28], and gate set tomography [15, 30, 36]. However, there seem to be few methods for full process tomography that are both fast and robust to SPAM errors. In this paper we develop a new compressed sensing method that combines both attributes. One one hand, our method achieves a quadratic speedup over conventional tomography when applied to almost any unitary process. That is, our method works to characterize processes that consist of unitary evolution, E : ρ 7→ U ρU † ,

(I.1)

for almost all choices of U in the unitary group (i.e., with probability exponentially close to 1, when U is sampled at random from the Haar measure). On the other hand, our method makes use of measurements that can be extracted using randomized benchmarking protocols, and so these observables are robust to SPAM errors. B.

The problem

Let E be some quantum process that acts on a Hilbert space of dimension d that we wish to learn. Using randomized benchmarking, we can estimate quantities of the form tr(E † C), where C : ρ 7→ CρC † can be any Clifford operation [21] or more generally, any element in a unitary 2-design group of dimension d. We can rewrite this as tr(E † C) = d2 tr(J(E)J(C)),

(I.2)

where J(E) and J(C) are the Choi-Jamiolkowski states corresponding to E and C. If E is a unitary process, as in equation (I.1), then J(E) is a pure state, described by a rank-1 density matrix: J(E) = (U ⊗ I)|Φ+ ihΦ+ |(U † ⊗ I)

(I.3)

2 Pd where |Φ+ i = √1d i=1 |iii. Learning low rank matrices is a hallmark of compressed sensing techniques. Therefore, one natural approach is to try to reconstruct J(E) by solving a convex program: find a matrix that minimizes the trace norm, subject to linear constraints given by the measurements (I.2). When this technique works, it leads to a quadratic speedup compared to conventional tomography: the number of measurement settings needed to reconstruct J(E) is only O(d2 poly log d), rather than d4 . The success of this approach depends crucially on the type of measurements being used. Unfortunately, it turns out that none of the previous results on compressed tomography will work with the measurements shown in (I.2). Most previous work [12, 16, 18, 25] used Pauli measurements, or more generally, any observables √ P that have small operator norm, kP k ≤ O(1/ d)kP kF . (Here, kP kF = p tr(P † P ) is the Frobenius norm.) Clearly, the observables J(C) appearing in (I.2) do not satisfy this condition, since they have rank 1. Another approach is to view the measurements in equation (I.2) as an example of the phase retrieval problem. Using (I.3), we can write the measurements as 2 tr(E † C) = d2 hΦ+ |(U † ⊗ I)(C ⊗ I)|Φ+ i , (I.4)

so each measurement projects the unknown “signal vector” (U ⊗ I)|Φ+ i onto a known “measurement vector” (C ⊗ I)|Φ+ i, and returns the squared absolute value of this quantity, but not its phase. The problem of reconstructing U from this information is called phase retrieval. One method for phase retrieval, called PhaseLift, works by “lifting” this task to a low-rank matrix recovery problem, then solving a convex relaxation that involves trace-norm minimization [2, 5, 17, 24]. When PhaseLift works, it leads to a quadratic speedup, similar to that obtained using low-rank compressed tomography. Again, the success of PhaseLift depends on the types of measurements that can be performed. For our purposes, it is useful to consider measurement vectors that are sampled from spherical or unitary t-designs. (A spherical t-design is a distribution of vectors that reproduces the tth -order moments of the uniform distribution on the sphere. A unitary tdesign is a distribution of matrices that reproduces the tth -order moments of the Haar distribution on the unitary group.) PhaseLift is known to succeed for phase retrieval and for reconstructing low rank matrices when the measurement vectors are sampled from a spherical 4-design [24]. However, it is also known that PhaseLift can fail when the measurements are sampled from a spherical 2-design [17].

For applications to randomized benchmarking, we are particularly interested in unitary 2-designs. This is because randomized benchmarking can be used to extract observables of the form of Eq. (I.2), where C is an element of a unitary 2-design group. Thus, it is a natural question to ask whether unitary 2designs are sufficient to recover unitary maps using PhaseLift. Setting aside the qualitative difference between spherical and unitary designs, the No-Go results for spherical 2-designs with PhaseLift suggests that PhaseLift is unlikely to succeed when combined with unitary 2-design only.

C.

Our solution

Our solution is to develop a variant of PhaseLift that uses measurements sampled from any 2-design, and achieves approximate recovery of almost all signals. Here, approximate recovery means that our method recovers the true signal up to an additive error of constant size δ, where δ can be chosen arbitrarily small, and δ can be independent of the dimension d. Our method recovers all signals that are non-spiky, and we show that almost all signals have this property. (This notion of non-spikiness is reminiscent of previous work on low-rank matrix completion [4, 16, 31].) This result helps answer the question of what kinds of measurements are sufficient for PhaseLift to succeed. In a sense, 2-designs are almost sufficient: while PhaseLift can sometimes fail using 2-design measurements, our result shows that these failures occur very infrequently. We prove two versions of our result, for spherical and unitary 2-designs. Both proofs use the same strategy as in [24], based on Mendelson’s small-ball method (see also [37]). However, our proofs require some new techniques. Since we only use spherical 2designs, rather than spherical 4-designs as in [24], we must handle certain fourth-moment calculations differently, relying instead on the non-spikiness properties of the unknown signal. Furthermore, we extend these techniques from spherical 2-designs to unitary 2-designs, which requires bounding certain second moments of the Haar distribution on the unitary group. As far as we are aware, these are the first phase retrieval results using unitary (rather than spherical) designs. While our result achieves approximate recovery, this is in contrast to previous results (e.g. [3, 5, 16]), which achieve exact recovery. By exact recovery, we mean that when there is no additive noise in the measurement, the procedure recovers the signal exactly with high probability.

3 D.

Unitary 3-Designs and Unitary 4-Designs

A natural question is whether unitary 3- or 4designs give better recovery than unitary 2-designs. We know Phaselift succeeds in the recovery of low rank matrices when the measurement vectors are sampled from a spherical 4-design, and achieves a square-root speedup in the number of measurement settings [24]. In Appendix B, we show that by sampling from a unitary 4-design, we can use Phaselift to uniformly recover all unitary maps (without the non-spikiness condition, and with exact rather than approximate recovery), and achieve a square root reduction in the number of required measurement settings compared to standard tomography. We use a strategy similar to the one we use for 2designs to prove unitary 4-designs can be used for recovery of unitary processes. While in the case of 2-designs, we could not bound the 4th moment terms directly, and had to take advantage of the nonspikiness condition, in the case of unitary 4-designs, we can bound the 4th moment terms by averaging over Haar random unitaries. We provide new techniques for calculating and bounding these 4th moment terms using Weingarten functions [7, 8, 33] and commutative diagrams. It is an open question what recovery is possible with unitary 3-designs. Spherical 3-designs have been shown to give some improvement in terms of the number of experiments needed for phase retrieval [17], but they have not been proven to perform as well as spherical 4-designs. The question of recovery with unitary 3-designs has recently become an extremely relevant question, as it was proven that Cliffords (a known unitary 2-design [9]) form a unitary 3-design [39, 40] [41]. While in theory any unitary 2-design that is also a group can be used to perform randomized benchmarking, in practice, Clifford unitaries are the unitary 2-design of choice for randomized benchmarking. Clifford operations are a group and can be efficiently sampled, implemented, and simulated [10, 14], which makes randomized benchmarking experiments both easy and efficient to implement. It is possible that better recovery results regarding randomized benchmarking can be obtained using the fact that Cliffords are a 3-design. It is known that Cliffords are not a 4-design [39, 40], so our recovery results for unitary 4-designs will not be applicable. While it may be possible to achieve better recovery bounds using Cliffords because they form a 3design, recovery bounds using 2-designs may still be of interest. For example, there is a subset of the Clifford group (that may also be subgroup), which is more efficient to implement than the full Clifford group, but is still a unitary 2-design [6].

E.

Open Questions

As mentioned in Section I D, we are particularly interested in whether compressed sensing is possible with unitary 3-designs. We are also curious whether for 2-designs the approximate recovery bounds are necessary, or whether they are just an artifact of our analysis. Finally, we only prove recovery for unitaries, but we expect these results can be extended to quantum maps with low rank Choi representations. F.

Notation

Let Cd×d Herm be the set of d × d complex Hermitian matrices. Let L(Cd×d , Cd×d ) be the set of linear maps from Cd×d to Cd×d . We will use calligraphic letters to denote these maps, e.g., E, C. In general we will consider unitary maps, where E : ρ → U ρU † for a unitary U , and C : ρ → CρC † for a unitary C. We will us E to represent the unknown map we want to recover, and C to represent the measurement maps we compare with E. Thus sometimes C will represent an element of a unitary 2-design, and sometimes it will represent an elemento of a unitary 4-design. For any matrix A, let A† be its adjoint, let AT be its transpose, and let A∗ be its complex conjugate. For any matrix A, with singular values σ1 (A) ≥ σ2 (A) ≥ · · · ≥ σd (A) ≥ 0, letpkAk = σ1 (A) be P the operator norm, let kAkF = Pi σi2 (A) be the Frobenius norm, and let kAk∗ = i σi (A) be the nuclear or trace norm. Because we use very similar approaches for the three cases of spherical 2-designs, unitary 2-designs, and unitary 4-designs, we will have similar notation in each section. In general, un-addorned notation (e.g. f , A) will be used for spherical 2-designs, no˜ will be used for unitary tation with tildes (e.g. f˜, A) ˆ will be 2-designs, and notation with hats (e.g. fˆ, A) used for unitary 4-designs. II.

LOW-RANK MATRIX RECOVERY USING SPHERICAL 2-DESIGNS

We consider the problem of reconstructing an unknown matrix X ∈ Cd×d Herm , having rank at most r, from quadratic measurements of the form yi = wi† Xwi + εi ,

i = 1, 2, . . . , m,

(II.1)

where the measurement vectors wi ∈ Cd are known and are sampled independently at random from a spherical 2-design, and the εi ∈ R are unknown contributions due to additive noise.

4 This problem has been studied previously, particularly in the rank-1 case (setting r = 1), where it corresponds to phase retrieval [17, 24]. Much of this previous work involves measurement vectors that are chosen from spherical t-designs using larger values of t, which provide a better approximation to the Haar-random distribution on the unit sphere. Roughly speaking, it is known for the rank-1 case, that 4-designs are sufficient to recover X efficiently [24], whereas there exists a 2-design that can not recover X efficiently [17]. More precisely, X can be recovered (uniformly), via convex relaxation, from m = O(rd poly log d) measurements chosen from a spherical 4-design; while on the other hand, X cannot be recovered, by any method, from fewer than m = Ω(d2 ) measurements chosen from a particular spherical 2-design. Here, we show that 2-designs are sufficient to recover a large subset of all the rank-r matrices in Cd×d Herm . More precisely, we show that 2-designs achieve efficient approximate recovery of all lowrank matrices X that are non-spiky with respect to the measurement vectors (we will define this more precisely below). This implies that 2-designs are sufficient to recover generic (random) low-rank matrices X, since these matrices satisfy the nonspikiness requirement with probability close to 1. Here, “efficient approximate recovery” means recovery up to an arbitrarily small constant error, using m = O(rd poly log d) measurements, by solving a convex relaxation. This is reminiscent of results on non-spiky low-rank matrix completion [4, 16, 31]. A.

Non-spikiness condition

Let G be a finite set of vectors in Cd , each of length 1. We say that a matrix X ∈ Cd×d Herm is non-spiky with respect to G (with parameters β0 ≤ 0 ≤ β1 ) if the following holds: β0 β1 ≤ w† Xw ≤ , d d

∀w ∈ G.

(II.2)

Generally speaking, we will say that X is non-spiky when the parameters β0 and β1 are much smaller than d, e.g., of size poly(log d). We now show that, when the set G is not too large, almost all rank-r matrices X ∈ Cd×d Herm are non-spiky with respect to G. Proposition II.1. Fix some rank-r matrix W ∈ Cd×d Herm , and construct a random X by setting X = U W U † , where U ∈ Cd×d is a Haar-random unitary operator. Write W as a sum of positive-semidefinite and negative-semidefinite parts, W = W0 + W1 , where W0  0  W1 .

Then for any t ≥ 0, with probability at least 1 − 2e−t (over the choice of U ), X is non-spiky with respect to G, with parameters βi = 9π 3 (t+ ln|G|+ ln r) tr(Wi ),

i ∈ {0, 1}. (II.3)

Proof: Let S d−1 denote the unit sphere in Cd . Fix any vector w ∈ S d−1 , and let x be a uniformly random vector in S d−1 . Using Levy’s lemma [29, 32], we have that  dε2 (II.4) Pr[|w† x| ≥ ε] ≤ 2 exp − 9π 3 . q 3 Setting ε = 9πd (t + ln|G| + ln r), we get that Pr[|w† x| ≥ ε] ≤ 2e−t |G|−1 r−1 .

(II.5)

Now recall that X = U W U † , and write the specPr tral decomposition of W as W = i=1 λi vi vi† . Then for any w ∈ G, we can write w† Xw as follows: w† Xw = w† U W U † w =

r X i=1

λi |w† U vi |2 .

(II.6)

Each vector U vi is uniformly random in S d−1 , hence it obeys the bound in (II.5). Using the union bound over all w ∈ G and all v1 , . . . , vr , we conclude that: |w† U vi | ≤ ε,

∀w ∈ G, ∀i = 1, . . . , r,

(II.7)

with failure probability at most 2e−t . Combining this with (II.6) proves the claim.  We will be interested in cases where the vectors in G form a spherical 2-design in Cd . In these cases, the set G can be relatively small, i.e., sub-exponential in d. As an example, let d = 2n , and let G be the set of stabilizer states on n qubits, so we have 2 |G| ≤ 4n . Also suppose that W is a density matrix representing a quantum state, so we have W0 = 0 and tr(W1 ) = 1. Then with high probability, X is non-spiky with respect to G, with parameters β0 = 0 and β1 = O(lg2 d). B.

Convex relaxation

In the following sections, it will be convenient to renormalize the measurement vectors to be more like Gaussian random vectors. To that end, let us define new measurement vectors ai = [(d + 1)d]1/4 wi ,

i = 1, . . . , m.

(II.8)

We also let A be the renormalized version of the spherical 2-design G, A = {[(d + 1)d]1/4 w | w ∈ G}.

(II.9)

5 We can rewrite the non-spikiness conditions on X as follows: q q β0 1 + d1 ≤ a† Xa ≤ β1 1 + d1 , ∀a ∈ A. (II.10) We define the sampling operator A : Cd×d Herm → R , m

A(Z) = a†i Zai

m

i=1

.

(II.11)

Using this notation, the measurement process returns a vector y = A(X) + ε,

kεk2 ≤ η,

(II.12)

where we assume we know an upper-bound η on the size of the noise term. We will solve the following convex relaxation: arg min kZk∗ such that Z∈Cd×d Herm

kA(Z) − yk2 ≤ η, q q β0 1 + d1 ≤ a† Za ≤ β1 1 + d1 ,

C.

∀a ∈ A. (II.13)

Approximate recovery of non-spiky low-rank matrices

We show the following uniform recovery guarantee for the convex relaxation shown in equation (II.13): Theorem II.2. Consider the above scenario where m measurement vectors a1 , . . . , am are chosen independently at random from a (renormalized) spherical 2-design A ⊂ Cd . Let 1 ≤ r ≤ d and β0 ≤ 0 ≤ β1 . Fix any δ > 0 and C0 > 169. Suppose that the number of measurements satisfies m≥



64(β1 − β0 )2 (1 + d1 ) 9δ 2

2

· C0 rd log(2d).

(II.14) Then with probability at least 1 − e−2rd log(2d) (over the choice of the ai ), we have the following uniform recovery guarantee: For any rank-r matrix Xtrue ∈ Cd×d Herm that is nonspiky with respect to A (in the sense of (II.10)), any solution Xopt to the convex program (II.13) with noisy measurements (II.12) must satisfy:

The proof follows the same strategy used in [24] to show low-rank matrix recovery using spherical 4-design measurements, by means of Mendelson’s small-ball method [37]. The key difference is that our measurements are spherical 2-designs only, so we do not have control over their fourth moments. Instead, we use the non-spikiness properties of Xtrue and Xopt to bound the fourth moments that appear in the proof. This allows us to show approximate recovery of Xtrue , up to an arbitrarily small constant error δ. We will present this proof in several steps.

D.

Approximate recovery via modified descent cone

We begin by defining a modified version of the descent cone used in [24, 37]. Let f : Rd → R ∪ {∞} be a proper convex function, and let xtrue ∈ Rd and δ ≥ 0. Then we define the modified descent cone D′ (f, xtrue , δ) as follows: D′ (f, xtrue , δ) = {y ∈ Rd | ∃τ > 0 such that f (xtrue + τ y) ≤ f (xtrue ), kτ yk2 ≥ δ}. (II.16) This is the set of all directions, originating at the point xtrue , that cause the value of f to decrease, when one takes a step of size at least δ. Note that this is a cone, but not necessarily a convex cone. We use this to state an approximate recovery bound, analogous to Prop. 7 in [24] and Prop. 2.6 in [37]. Let y be a noisy linear measurement of xtrue , given by y = Φxtrue + ε, where Φ ∈ Rm×d and kεk2 ≤ η. Then let xopt be a solution of the convex program arg min f (z) such that kΦz − yk2 ≤ η. z∈Rd

(II.17)

Then we have the following recovery bound:   2η , kxopt − xtrue k2 ≤ max δ, λmin (Φ; D′ (f, xtrue , δ)) (II.18) where λmin is the conic minimum singular value, λmin (Φ; D′ (f, xtrue , δ)) = inf{kΦuk2 | u ∈ S d−1 ∩ D′ (f, xtrue , δ)}. (II.19)

 This is proved easily, using the same argument as in  2η p . [24, 37]. kXopt − Xtrue kF ≤ max δ, √ ( C0 − 13) rd log(2d) We now re-state this bound for the setup in The(II.15) orem II.2. Here the function f : Cd×d Herm → R ∪ {∞}

6 is given by

f (Z) =

and we use

  kZk∗  ∞

if z is non-spiky in the sense of (II.10), otherwise,

(II.20)

where we define n o Pr [|tr(aa† Y )| ≥ ξ] , (II.26) Qξ (E) = inf Y ∈E a∼A   m 1 X sup tr(HY ) , H = √ Wm (E) = E εi ai a†i , ǫi ∼±1 Y ∈E m i=1 ai ∼A

D(f, Xtrue , δ) = {Y ∈Cd×d Herm | ∃τ > 0 such that f (Xtrue + τ Y ) ≤ f (Xtrue ), kτ Y kF ≥ δ}. (II.21)

(II.27)

where ε1 , . . . , εm are Rademacher random variables. F.

Lower-bounding Qξ (E)

We will lower-bound Qξ (E) as follows. Fix some Y ∈ E. We know that Y is in E(Xtrue ), for some Our recovery bound is: X ∈ Cd×d true Herm that has rank at most r and is non  spiky in the sense of (II.10). Hence we know that 2η , kY k = 1, and there exists some τ > 0 such that kXopt − Xtrue kF ≤ max δ, F λmin (A; D(f, Xtrue , δ)) Xtrue + τ Y is non-spiky in the sense of (II.10), and (II.22) we have and we want to lower-bound the quantity λmin : kXtrue + τ Y k∗ ≤ kXtrue k∗ ,

λmin (A; D(f, Xtrue , δ)) 1/2 X m † 2 , |tr(ai ai Y )| = inf Y ∈E(Xtrue )

i=1

where we define E(Xtrue ) = {Y ∈ D(f, Xtrue , δ) | kY kF = 1}. (II.23)

kτ Y kF ≥ δ. (II.28)

Note that we have kτ Y kF = τ ≥ δ. We will lower-bound Pr[|tr(aa† Y )| ≥ ξ], for any ξ ∈ [0, 1], using the Paley-Zygmund inequality, and appropriate bounds on the second and fourth moments of tr(aa† Y ). Let us define a random variable S ≡ | tr(aa† τ Y )|. We can lower-bound E(S 2 ), using the same calculation as in Prop. 12 of [24]: E(S 2 ) = tr(τ Y )2 + kτ Y k2F ≥ 0 + τ 2 .

(II.29)

4

E.

Mendelson’s small-ball method

In order to lower-bound λmin , we will use Mendelson’s small-ball method, following the steps described in [24] (see also [37]). We will prove a uniform lower-bound that holds for all Xtrue simultaneously. We define E=

[

E(Xtrue ),

(II.24)

Xtrue

where the union runs over all Xtrue ∈ Cd×d Herm that have rank at most r and satisfy the non- spikiness conditions (II.10). We then take the infimum over all Y ∈ E. Using Mendelson’s small-ball method (Theorem 8 in [24]), we have that for any ξ > 0 and t ≥ 0, with 2 probability at least 1 − e−2t , inf

Y ∈E

X m i=1

|tr(ai a†i Y

)|

2

1/2

√ ≥ ξ mQ2ξ (E) − 2Wm (E) − ξt, (II.25)

To handle E(S ), we need to use a different argument from [24], since our a is sampled from a spherical 2-design, not a 4-design. Our solution is to upper-bound S, using the non-spikiness properties of Xtrue and Xtrue + τ Y : S = − tr(aa† Xtrue ) + tr(aa† (Xtrue + τ Y )) q ≤ (β1 − β0 ) 1 + d1 . (II.30) This implies an upper-bound on E(S 4 ): E(S 4 ) ≤ (β1 − β0 )2 (1 + d1 )E(S 2 ).

(II.31)

Putting it all together, we have: Pr [| tr(aa† Y )| ≥ ξ]

a∼A

= Pr[S 2 ≥ τ 2 ξ 2 ]

≥ Pr[S 2 ≥ ξ 2 E(S 2 )] ≥ (1 − ξ 2 )2

E(S 2 )2 E(S 4 )

≥ (1 − ξ 2 )2

τ2 . (β1 − β0 )2 (1 + d1 )

(II.32)

7 Finally, using the fact that τ ≥ δ, we have that Q1/2 (E) ≥ (1 − ξ 2 )2 G.

2

δ . (β1 − β0 )2 (1 + d1 )

(II.33)

Upper-bounding Wm (E)

Next, we will upper-bound Wm (E), using essentially the same argument as in [24]. Recall that the argument in [24] showed an upper-bound on Wm (F ), where F was a slightly different set than our E, and the ai were sampled from a spherical 4-design rather than a 2-design. The argument used the following steps: Wm (F ) = E sup tr(HY ) Y ∈F √ ≤ E 2 rkHk p √ ≤ 2 r · (3.1049) d log(2d),

(II.34)

using Lemma 10 and Prop. 13 in [24], and provided that m ≥ 2d log d. The first step still works to upper-bound Wm (E), because E ⊂ F . To see this, recall that the set F was defined as follows: [ F = F (Xtrue ), (II.35) Xtrue

where the union was over all Xtrue ∈ Cd×d Herm with rank at most r, and F (Xtrue ) = {Y ∈ D(k·k∗ , Xtrue , 0) | kY kF = 1}. (II.36) This can be compared with our definition of the set E in (II.24) and (II.23). The second step still works for our choice of the ai , because Prop. 13 in [24] does not use the full power of the spherical 4-design. In fact it only requires that the ai are sampled from a spherical 1-design (because the proof relies mainly on the Rademacher random variables εi ). Thus we conclude that p √ Wm (E) ≤ 2 r · (3.1049) d log(2d). (II.37) H.

III.

QUANTUM PROCESS TOMOGRAPHY USING UNITARY 2-DESIGNS

A.

Robust measurements using randomized benchmarking

A quantum process is represented by a completelypositive trace-preserving linear map E : Cd×d → Cd×d .

(III.1)

Quantum process tomography is the problem of reconstructing E from experimental measurements. Quantum process tomography can be carried out using randomized benchmarking techniques [21]. This approach is advantageous because it is robust to state preparation and measurement errors. The basic idea is to use randomized benchmarking to estimate the quantity tr(C † E), for different Clifford operations C : ρ 7→ CρC †

(III.2)

where C ∈ Cd×d is a unitary operation belonging to the Clifford group. Using these measurements, one can reconstruct the “unital part” of E. This is the part of E that lies in the “unital subspace” consisting of all linear maps that take the identity I ∈ Cd×d to itself. In many cases of interest, such as unitary operations or dephasing noise, E lies entirely within the unital subspace, and so learning the unital part of E is sufficient. B.

Reconstruction of unitary processes

Final result

Combining equations (II.23), (II.25), (II.33) and (II.37), and setting ξ = 41 , we get that: inf λmin (A; D(f, Xtrue , δ))

Xtrue

9δ 2 ≥4 m 16(β1 − β0 )2 (1 + 1d ) p − (12.44) rd log(2d) − 14 t p p ≥ ( C0 − 12.69) rd log(2d), √ 1

where we set   64(β1 − β0 )2 (1 + 1d ) 2 m≥ · C0 rd log(2d), 9δ 2 (II.39) p and t = rd log(2d). This can be plugged into our approximate recovery bound (II.22). This finishes the proof of Theorem II.2. 

(II.38)

We will focus on the special case where E is a unitary operation, E : ρ 7→ U ρU † ,

(III.3)

for some unitary U ∈ Cd×d . In this case E has a low-rank structure, which we will exploit in a way that is analogous to our earlier results in Section II. It will be convenient to use the Choi-Jamiolkowski representation of E. This is the quantum state that is obtained by applying E to one half of a maximally

8 √ Pd entangled state |Φ+ i = (1/ d) i=1 |ii ⊗ |ii. We d2 ×d2 can write it as a density matrix J(E) ∈ CHerm , J(E) = (E ⊗ I)(|Φ+ ihΦ+ |) =

d 1 X E(|iihj|) ⊗ |iihj|. d i,j=1

(III.4)

β=

When E is a unitary operation, it is easy to see that J(E) is a pure state, with rank 1: J(E) = (U ⊗ I)|Φ+ ihΦ+ |(U † ⊗ I).

(III.5)

In addition, the measurements performed via randomized benchmarking can be rewritten in terms of the Jamiolkowski state, and then in terms of the unitaries U and C: tr(C † E) = d2 tr(J(E)J(C)) †

2

= |tr(U C)| .

(III.6)

(See Appendix A for details.) This reveals the similarity with the quadratic measurements discussed in Section II. Here, our measurements make use of unitary matrices C that belong to the Clifford group, which is a unitary 2-design. It is not known whether these measurements are sufficient to reconstruct all unitary processes. (In Appendix B, we show that measurements chosen from a unitary 4-design are sufficient for this purpose, while one may suspect that measurements chosen from a unitary 2-design are not sufficient, in light of the counterexamples for spherical 2-designs mentioned earlier.) We will show that measurements chosen from a unitary 2-design are sufficient to reconstruct almost all unitary processes. To do this, we will first show that almost all unitary processes have a certain nonspikiness property. Then we will show that measurements chosen from a unitary 2-design are sufficient to reconstruct all non-spiky unitary processes. C.

Non-spikiness condition

˜ be a finite set of unitary matrices in Cd . For Let G ˜ let C : ρ 7→ CρC † be the corresponding any C ∈ G, quantum process. We say that a quantum process ˜ E : Cd×d → Cd×d is non-spiky with respect to G (with parameter β ≥ 0) if the following holds: β ˜ , ∀C ∈ G. (III.7) d2 Generally speaking, we will say that E is non-spiky when β ≪ d, e.g., β ≤ poly(log d). ˜ is not too We now show that, when the set G large, almost all unitary processes E : ρ 7→ U ρU † are ˜ non-spiky with respect to G. 0 ≤ tr(J(E)J(C)) ≤

Proposition III.1. Choose U ∈ Cd×d to be a Haarrandom unitary operator, and let E : ρ 7→ U ρU † . Then for any t ≥ 0, with probability at least 1 − 4e−t (over the choice of U ), E is non-spiky with ˜ with parameter respect to G, 9π 3 2 (t

˜ + ln|G|).

(III.8)

˜ Using Levy’s lemma [26], and Proof: Fix any C ∈ G. noting that the function√U 7→ tr(U † C) has Lipshitz coefficient η ≤ kCkF = d, we have that  Pr[|tr(U † C)| ≥ ε] ≤ 4 exp − 9π2 3 ε2 . (III.9) q 3 ˜ we get that Setting ε = 9π2 (t + ln|G|), ˜ −1 . Pr[|tr(U † C)| ≥ ε] ≤ 4e−t |G|

(III.10)

˜ we conclude Using the union bound over all C ∈ G, that: |tr(U † C)| ≤ ε,

˜ ∀C ∈ G,

(III.11)

with failure probability at most 4e−t . Combining this with equation (III.6) proves the claim.  We will be interested in cases where the vectors ˜ form a unitary 2-design in Cd×d . In these in G ˜ can be relatively small, i.e., subcases, the set G exponential in d. As an example, let d = 2n , and let ˜ be the set of Clifford operations on n qubits, so G ˜ ≤ 22n2 +3n . Then with high probability, we have |G| ˜ with parameter E is non-spiky with respect to G, 2 β = O(lg d). D.

Convex relaxation

Let E : Cd×d → Cd×d be an unknown unitary process that we want to learn. We will reconstruct d2 ×d2 the Choi- Jamiolkowski state J(E) ∈ CHerm . ˜ ⊂ Cd×d be a unitary 2-design. First, we let G ˜ We choose C1 , . . . , Cm uniformly at random from G, † and for each Ci , we let Ci : ρ 7→ Ci ρCi be the corresponding quantum process. Next we define the sampling operator A˜ that describes our measurement procedure. We define d2 ×d2 A˜ : CHerm → Rm , which acts on Choi-Jamiolkowski states ρ, as follows: h im ˜ A(ρ) = tr(ρJ(Ci )) i=1 (III.12) h im † + = hΦ |(Ci ⊗ I)ρ(Ci ⊗ I)|Φ+ i . i=1

Note that in Section II, equation (II.8), we scaled the measurement vectors ai ∈ Cd to have length

9 √ ≈ d, similar to Gaussian random vectors. Here, we have not scaled the measurement vectors in this fashion. Each measurement vector (Ci ⊗ I)|Φ+ i ∈ 2 Cd has length 1, whereas a Gaussian random vector would have length ≈ d. Using this notation, our measurement procedure returns a vector ˜ y = A(J(E)) + ε,

kεk2 ≤ η/d2 ,

min 2

×d ρ∈Cd Herm

2

tr2 (J(Z)) =

d 1 X Z(|iihj|) tr(|iihj|) d i,j=1 d

=

(III.13)

where we assume we know an upper-bound η/d2 on the size of the noise term ε. Note that this bound is analogous to equation (II.12) in Section II. In particular, η plays the same role in both equations, and the extra factor of 1/d2 in equation (III.13) is simply due to the different normalization of the measurement vectors (Ci ⊗ I)|Φ+ i. In order to reconstruct E, we will solve the following convex relaxation: arg

The third constraint says that

1X Z(|iihi|) = Z(I/d) d i=1

∝ I.

(III.17)

We can interpret this as a “unitality” constraint: if the input to Z is the maximally mixed state, then the output must be proportional to the maximally mixed state.

E.

Approximate recovery of non-spiky unitary processes

tr(ρ) such that

˜ − yk2 ≤ η/d2 , kA(ρ) β ˜ 0 ≤ tr(ρJ(C)) ≤ 2 , ∀C ∈ G, d ρ  0, tr1 (ρ) = tr(ρ) I/d, tr2 (ρ) = tr(ρ) I/d.

We show the following uniform recovery guarantee for the convex relaxation shown in equation (III.14): (III.14)

In the last two lines, tr1 (·) denotes the partial trace over the first d-dimensional subsystem of the ChoiJamiolkowski state, while tr2 (·) denotes the partial trace over the second d-dimensional subsystem. The last three constraints ensure that ρ is (proportional to) the Choi-Jamiolkowski state J(Z) of some map Z that is completely positive, trace preserving, and unital. The third to last constraint says that J(Z)  0, which enforces complete positivity. The second constraint says that d 1 X tr1 (J(Z)) = tr(Z(|iihj|)) ⊗ |iihj| d i,j=1 X |iihi|. (III.15) ∝

Theorem III.2. Consider the above scenario where m measurement matrices C1 , . . . , Cm are chosen in˜⊂ dependently at random from a unitary 2-design G Cd×d . Then there exists a numerical constant c5 > 0 such that the following holds. Let β ≥ 0. Fix any δ > 0 and c0 > (2c5 + 1)2 . Suppose that the number of measurements satisfies m≥



64β 2 9δ 2

2

· c0 d2 ln d.

(III.18) 2

Then with probability at least 1 − e−2d ln d (over the choice of the Ci ), we have the following uniform recovery guarantee: For any unitary process E ∈ L(Cd×d , Cd×d ) that is ˜ (in the sense of (III.7)), non-spiky with respect to G any solution ρopt to the convex program (III.14) with noisy measurements (III.13) must satisfy:

i

(III.16)

  2η √ kρopt − J(E)kF ≤ max δ, √ . ( c0 − 2c5 − 14 )d ln d (III.19)

where δij is the Kronecker delta function. We can interpret this as a “trace preservation” constraint, because if the input to Z has trace 0, then the output must also have trace 0, while if the input has trace 1, then the output must have trace k, for some constant k.

Note that this error bound scales similarly to what one would get by applying Theorem II.2 to rank-1 matrices of size d2 × d2 . We will present this proof in several steps, following the same strategy as in Section II.

To meet this requirement, we see that we need tr(Z(|iihj|)) ∝ δij

10 F.

where ε1 , . . . , εm are Rademacher random variables.

Modified descent cone

We define the follows:  tr(ρ)     f˜(ρ) =     ∞

d2 ×d2 function f˜ : CHerm → R ∪ {∞} as

if ρ is non-spiky in the sense of (III.7), and also satisfies the last three constraints in (III.14), otherwise. (III.20) Our recovery bound is:   2η/d2 kρopt − J(E)kF ≤ max δ, , ˜ D(f˜, J(E), δ)) λmin (A; (III.21) and we want to lower-bound the quantity λmin : X 1/2 m 2 ˜ ˜ λmin (A; D(f , J(E), δ)) = inf |tr(Y J(Ci ))| , ˜ Y ∈E(E)

i=1

(III.22)

where we define ˜ E(E) = {Y ∈ D(f˜, J(E), δ) | kY kF = 1}. (III.23) G.

E

where the union runs over all unitary processes E ∈ L(Cd×d , Cd×d ) that satisfy the non-spikiness conditions (III.7) and the last three constraints in ˜ (III.14). We then take the infimum over all Y ∈ E. We will then lower-bound this quantity, using Mendelson’s small-ball method (Theorem 8 in [24]): for any ξ > 0 and t ≥ 0, with probability at least 2 1 − e−2t , we have that 1/2 X m |tr(Y J(Ci ))|2 inf ˜ Y ∈E

˜ ξ (E) ˜ as follows. Fix some We will lower-bound Q ˜ We know that Y is in E(E), ˜ Y ∈ E. for some unitary process E such that J(E) is non-spiky in the sense of (III.7) and satisfies the last three constraints in (III.14). Hence we know that kY kF = 1, and there exists some τ > 0 such that J(E) + τ Y is non-spiky in the sense of (III.7) and satisfies the last three constraints in (III.14), and we have tr(J(E) + τ Y ) ≤ tr(J(E)),

i=1

√ ˜ 2ξ (E) ˜ − 2W ˜ m (E) ˜ − ξt, (III.25) ≥ ξ mQ

where we define ˜ ξ (E) ˜ = inf Q



Pr [|tr(Y J(C))| ≥ ξ] , (III.26) ˜ C∼G " # ˜ , sup tr(Y H) E

˜ Y ∈E

˜ m (E) ˜ = W



ǫi ∼±1 ˜ C∼G

˜ Y ∈E

˜ = √1 with H m

m X i=1

εi J(Ci ),

(III.27)

kτ Y kF ≥ δ. (III.28)

Note that we have kτ Y kF = τ ≥ δ. We will lower-bound Pr[|tr(Y J(C))| ≥ ξ], for any ξ ∈ [0, 1], using the Paley-Zygmund inequality, and appropriate bounds on the second and fourth moments of tr(Y J(C)). To simplify the notation, let us define a random variable S˜ ≡ τ |tr(Y J(C))|. 1.

Mendelson’s small-ball method

We will use Mendelson’s small-ball method to lower-bound λmin . We will prove a uniform lowerbound that holds for all E simultaneously. We define [ ˜= ˜ E E(E), (III.24)

Lower-bounding Qξ (E)

H.

Lower-bounding E(S˜2 )

We will first put S˜ into a form that is easier to work with. We can write Y as the ChoiJamiolkowski matrix J(Y) of some linear map Y ∈ L(Cd×d , Cd×d ). Using the relationship between the trace of Choi and Liouville representations (see Appendix A), we have S˜ =τ | tr(J(C1 )J(Y))|

=(τ /d2 )| tr((C1L )† Y L )|

=(τ /d2 )| tr((C1† ⊗ C1T )Y L )|.

(III.29)

We can calculate E(S˜2 ) by using the fact that C1 ∈ G is chosen from a unitary 2-design: Z  τ2 tr (U ⊗ U ∗ ⊗ U ⊗ U ∗ )(Y L ⊗ Y L ) dU E(S˜2 ) = 4 d Haar Z  τ 2 ∗ ∗ L L ≥ 4 tr (U ⊗ U ⊗ U ⊗ U )dU (Y ⊗ Y ) . d Haar (III.30) Now using Weingarten functions [7, 8, 33], we have that Z (U ⊗U ⊗ U † ⊗ U † )dU Haar

=

P3421 + P4312 P3412 + P4321 − 2 d −1 d(d2 − 1) (III.31)

11 where

Then note that

Pσ(1),σ(2),...σ(t) X |j1 ihjσ(1) | ⊗ |j2 ihjσ(2) | ⊗ · · · ⊗ |jt ihjσ(t) | = j1 ,j2 ,...,jt

(III.32)

P1324 T2

Z





(U ⊗ U ⊗ U ⊗ U )dU P1324 Z (U ⊗ U ∗ ⊗ U ⊗ U ∗ )dU. =

Haar

is a permutation of the registers. Let T2 (·) transpose the second half of the registers of a matrix That is, if X X= xijkl |iihj| ⊗ |kihl|, (III.33)



Haar

(III.35)

ijkl

then T2 (X) =

X ijkl

xijkl |iihj| ⊗ |lihk|.

E(S˜2 ) ≥

(III.34)

Combining with Eq. (III.30) and Eq. (III.31), we have

    τ 2 P3412 + P4321 P3421 + P4312 L L P (Y ⊗ Y tr P T − 1324 1324 2 . 4 2 2 d d −1 d(d − 1)

Because addition commutes with permutation and transposition and trace, we need to calculate  tr P1324 T2 (Pσ )P1324 (Y L ⊗ Y L )

(III.36)

(III.37)

for σ = {3412, 4312, 3421, 4321}. Letting σ = (abcd), we have  tr P1324 T2 (Pabcd )P1324 (Y L ⊗ Y L )     X = tr P1324 T2  |j1 ihja | ⊗ |j2 ihjb | ⊗ |j3 ihjc | ⊗ |j4 ihjd | P1324 (Y L ⊗ Y L ) j1 ,j2 ,j3 ,j4



= tr P1324 

= tr  =

X

j1 ,j2 ,j3 ,j4

X

j1 ,j2 ,j3 ,j4

X

j1 ,j2 ,j3 ,j4



|j1 ihja | ⊗ |j2 ihjb | ⊗ |jc ihj3 | ⊗ |jd ihj4 |P1324 (Y L ⊗ Y L ) 

|j1 ihja | ⊗ |jc ihj3 | ⊗ |j2 ihjb | ⊗ |jd ihj4 |(Y L ⊗ Y L )

hja j3 |Y L |j1 jc ihjb j4 |Y L |j2 jd i.

Because of the unitality constraint in (III.14), we have X

i1 ,i2 ,i3

hi2 i3 |Y L |i1 i1 i = δi2 ,i3 hi2 i3 |Y L |i1 i1 i

(III.38)

(III.14), we have X hi1 i1 |Y L |i2 i3 i = δi2 ,i3 hi1 i1 |Y L |i2 i3 i i1 ,i2 ,i3

= d tr(J(Y)).

(III.40)

(III.39)

We now go through the four permutations of Eq. (III.36):

and because of the trace-preservation constraint in

• P3412 . In this case, a = 3, b = 4, c = 1, and

= d tr(J(Y)),

12 d = 2. Plugging into Eq. (III.38) and using Eqs. (III.40) and (III.39), we have  tr P1324 T2 (P3412 )P1324 (Y L ⊗ Y L ) = d2 tr(J(Y))2 . (III.41) • P4321. In this case, a = 4, b = 3, c = 2, and d = 1. Plugging into Eq. (III.38), we have  tr P1324 T2 (P4321 )P1324 (Y L ⊗ Y L ) X hj4 j3 |Y L |j1 j2 ihj3 j4 |Y L |j2 j1 i = j1 ,j2 ,j3 ,j4

=

X

j1 ,j2 ,j3 ,j4

=

X

j1 ,j2 ,j3 ,j4

hj4 j3 |Y L |j1 j2 ihj4 j3 |(Y L )∗ |j1 j2 i |hj4 j3 |Y L |j1 j2 i|2

=kY L k2F

=d2 kJ(Y)k2F ,

of J(E) and J(E) + τ Y : S˜ = − tr(J(E)J(C1 )) + tr((J(E) + τ Y )J(C1 )) β ≤ 2. d (III.46) This implies an upper-bound on E(S˜4 ):

Pr[| tr(Y J(C))| ≥ ξ] = Pr[S˜2 ≥ ξ 2 τ 2 ]

≥ Pr[S˜2 ≥ d4 ξ 2 E(S˜2 )] E(S˜2 )2 ≥ (1 − d4 ξ 2 )2 E(S˜4 )

where we used Eq. (A.3).

• P4312. In this case, a = 4, b = 3, c = 1, and d = 2. Plugging into Eq. (III.38) and using Eqs. (III.40) and (III.39), we have  tr P1324 T2 (P3412 )P1324 (Y L ⊗ Y L ) = d2 tr(J(Y))2 . (III.44) Putting it all together, we have τ 2 d2 tr(J(Y))2 + kJ(Y)k2F 2 tr(J(Y))2 E(S˜2 ) ≥ 4 − d d2 − 1 d(d2 − 1) τ2 τ2 ≥ 4 (1 − 2d ) tr(J(Y))2 + 4 kJ(Y)k2F d d 2 τ τ2 τ2 ≥ 0 + 4 kJ(Y)k2F = 4 kY k2F = 4 . d d d (III.45)

2.

Upper-bounding E(S˜4 )

To handle E(S˜4 ), we need to use a different argument from that in [24], since we are sampling from a unitary 2-design, not a 4-design. Our solution is to ˜ using the non-spikiness properties upper-bound S,

(III.47)

Putting it all together, and using the PaleyZygmund inequality, for any ξ ∈ [0, 1], we have:

(III.42)

• P3421. In this case, a = 3, b = 4, c = 2, and d = 1. Plugging into Eq. (III.38) and using Eqs. (III.40) and (III.39), we have  tr P1324 T2 (P3412 )P1324 (Y L ⊗ Y L ) = d2 tr(J(Y))2 . (III.43)

β 2 ˜2 E(S ). d4

E(S˜4 ) ≤

≥ (1 − d4 ξ 2 )2

2 τ 2 d4 4 2 2τ = (1 − d ξ ) . d4 β 2 β2 (III.48)

Thus, using the fact that τ ≥ δ, we have that 2

˜ ξ (E) ˜ ≥ (1 − d4 ξ 2 )2 τ . Q β2

(III.49)

˜ m (E) Upper-bounding W

I.

In this section, we will bound m

˜ m (E) = E sup tr(Y H), ˜ W ˜ Y ∈E

X ˜ = √1 H εi J(Ci ), m i=1

(III.50) where the εi are Rademacher random variables, and the expectation is taken both over the εi and the choice of the unitaries Ci . For this bound, we will only need the fact that the Ci are chosen from a unitary 1-design, rather than a unitary 2-design. We start by following the argument in [24], where it is shown that √ ˜ ˜ ≤ E2 rkHk (III.51) E sup tr(Y H) Y ∈Fr

where Fr =

[

Xtrue

F (Xtrue ),

(III.52)

13 and the union is over all Xtrue ∈ Cd×d Herm with rank at most r, and F (Xtrue ) = {Y ∈ D(k·k∗ , Xtrue , 0) | kY kF = 1}. (III.53) This can be compared with our definition of the set ˜ in (III.24) and (III.23). E ˜ ˜ = S E(E), where In our case, we have E E the union is over all processes E whose Choid2 ×d2 Jamiolkowski state J(E) ∈ CHerm has rank at most ˜ 1. E(E) is defined similarly to F (Xtrue ), but using the function f from (III.20) rather than the trace norm. While f involves the trace tr(·) instead of the trace norm k·k∗ , because we are considering only positive semidefinite matrices, it can be replaced by the trace norm. Hence E ⊂ F1 , and so ˜ ≤ E sup tr(Y H) ˜ ≤ 2EkHk. ˜ E sup tr(Y H) ˜ Y ∈E

Y ∈F1

(III.54)

Now we analyze

m

1 X

Eε,C √ εi J(Ci ) ,

m

i=1

(III.55)

using similar tools to what is used in [24]. Since the ǫj ’s form a Rademacher sequence and J(Ci ) are Hermitian, we can apply the non-commutative Khintchine inequality [13, 38]:

m

1 X

εi J(Ci ) Eε,C √

m i=1

1/2

m

X p

2 2 ≤ EC 2 ln(2d )/m J(Ci )

i=1

!1/2

m

X p

≤ 2 ln(2d2 )/m EC J(Ci )

i=1

(III.56)

where we’ve used Jensen’s inequality in the second line, and used the fact that J(Ci ) is a rank one projector with trace 1, so J(Ci )2 = J(Ci ). We now apply the matrixPChernoff inequality of m Theorem 15 in [24] to EC k i=1 J(Ci )k. To apply the theorem, we need to bound

m

X

J(Ci ) and kJ(Ci )k. (III.57)

EC

i=1

Since J(Ci ) corresponds to a quantum state, its maximum eigenvalue is 1, so kJ(Ci )k = 1. Next, notice m X J(Ci ) = mEC J(Ci ). (III.58) EC i=1

Because we’ve assume Ci are drawn from a unitary 2-design, EC J(Ci ) is the state that results when a

depolarizing channel is applied to one half of a maximally entangled state. (Randomly applying operations from a unitary 1-design results in the depolarizing channel.) The resulting state is the maximally mixed state. Thus

m

X m

J(Ci ) = 2 . (III.59)

EC

d

i=1 Then the Matrix Chernoff Inequality (Theorem 15 in [24]) tells us that for any γ > 0,

m

X

(eγ − 1)m + d2 log d2

. (III.60) E J(Ci )

≤ d2 γ

j=1

Minimizing γ gives



m

X 2

J(C ) E i ≤ c4 m/d

j=1

(III.61)

for some numerical constant c4 > 0. Plugging this expression into Eq. (III.56), we obtain the desired bound: √ ˜ m (E) ˜ ≤ c5 ln d · 1 , (III.62) W d where c5 > 0 is some numerical constant. J.

Final result

Combining equations (III.22), (III.25), (III.49) and (III.62), and setting ξ = 1/(4d2 ), we get that: ˜ D(f˜, J(E), δ)) inf λmin (A; E

√ 1 1 √ 9δ 2 ln d − 2c5 − 2t ≥ 2 m 2 4d 16β d 4d √ √ ln d ≥ ( c0 − 2c5 − 14 ) , d

(III.63)

where we set m≥



64β 2 9δ 2

2

· c0 d2 ln d,

(III.64)

√ and t = d ln d. This can be plugged into our approximate recovery bound (III.21). This finishes the proof of Theorem III.2.  Acknowledgements

The authors thank Richard Kueng and David Gross for helpful discussions about this problem. Contributions to this work by NIST, an agency of the US government, are not subject to US copyright.

14

[1] Charles H. Baldwin, Amir Kalev, and Ivan H. Deutsch. Quantum process tomography of unitary and near-unitary maps. Phys. Rev. A, 90:012110, Jul 2014. [2] Emmanuel J. Candes and Xiaodong Li. Solving quadratic equations via phaselift when there are about as many equations as unknowns. Found. Comput. Math., 14(5):1017–1026, 2014. [3] Emmanuel J Candes and Yaniv Plan. A probabilistic and ripless theory of compressed sensing. Information Theory, IEEE Transactions on, 57(11):7235–7254, 2011. [4] Emmanuel J. Candes and Benjamin Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009. [5] Emmanuel J. Candes, Thomas Strohmer, and Vladislav Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Commum. Pure and Applied Math., 66(8):1241–1274, 2013. [6] Richard Cleve, Debbie Leung, Li Liu, and Chunhao Wang. Near-linear constructions of exact unitary 2-designs. arXiv preprint arXiv:1501.04592, 2015. [7] Benoit Collins. Moments and cumulants of polynomial random variables on unitary groups, the itzykson-zuber integral and free probability. Int. Math. Res. Notes, 17, 2003. [8] Benoit Collins and Piotr Sniady. Integration with respect to the haar measure on unitary, orthogonal and symplectic group. Comm. Math. Phys., 264, 2006. [9] Christoph Dankert, Richard Cleve, Joseph Emerson, and Etera Livine. Exact and approximate unitary 2-designs and their application to fidelity estimation. Physical Review A, 80(1):012304, 2009. [10] David P DiVincenzo, Debbie W Leung, and Barbara M Terhal. Quantum data hiding. Information Theory, IEEE Transactions on, 48(3):580–598, 2002. [11] Marcus Cramer et al. Efficient quantum state tomography. Nature Commun., 1(149), 2010. [12] Steven T Flammia, David Gross, Yi-Kai Liu, and Jens Eisert. Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators. New Journal of Physics, 14(9):095022, 2012. [13] Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Springer, 2013. [14] Daniel Gottesman. Stabilizer codes and quantum error correction. arXiv preprint quant-ph/9705052, 1997. [15] Daniel Greenbaum. Introduction to quantum gate set tomography. ArXiv:1509.02921, 2015. [16] David Gross. Recovering low-rank matrices from few coefficients in any basis. Information Theory, IEEE Transactions on, 57(3):1548–1566, March 2011. [17] David Gross, Felix Krahmer, and Richard Kueng. A partial derandomization of phaselift using spher-

[18]

[19]

[20]

[21]

[22]

[23]

[24] [25]

[26]

[27]

[28]

[29] [30]

[31]

ical designs. J. Fourier Analysis and Applications, 21(2):229–266, 2015. David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105:150401, Oct 2010. Blake R Johnson, Marcus P da Silva, Colm A Ryan, Shelby Kimmel, Jerry M Chow, and Thomas A Ohki. Demonstration of robust quantum gate tomography via randomized benchmarking. arXiv preprint arXiv:1505.06686, 2015. Amir Kalev, Robert L. Kosut, and Ivan H. Deutsch. Informationally complete measurements from compressed sensing methodology. ArXiv:1502.00536, 2015. Shelby Kimmel, Marcus P. da Silva, Colm A. Ryan, Blake R. Johnson, and Thomas Ohki. Robust extraction of tomographic information via randomized benchmarking. Phys. Rev. X, 4:011050, Mar 2014. E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Phys. Rev. A, 77:012307, Jan 2008. Richard Kueng and David Gross. Qubit stabilizer states are complex projective 3-designs. arXiv:1510.02767, 2015. Richard Kueng, Holger Rauhut, and Ulrich Terstiege. Low rank matrix recovery from rank one measurements. CoRR, abs/1410.6913, 2014. Yi-Kai Liu. Universal low-rank matrix recovery from pauli measurements. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, pages 1638–1646, 2011. Richard A. Low. Large deviation bounds for kdesigns. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 465(2111):3289–3308, 2009. Easwar Magesan, J. M. Gambetta, and Joseph Emerson. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett., 106:180504, May 2011. Easwar Magesan, Jay M. Gambetta, B. R. Johnson, Colm A. Ryan, Jerry M. Chow, Seth T. Merkel, Marcus P. da Silva, George A. Keefe, Mary B. Rothwell, Thomas A. Ohki, Mark B. Ketchen, and M. Steffen. Efficient measurement of quantum gate error by interleaved randomized benchmarking. Phys. Rev. Lett., 109:080505, Aug 2012. Jiri Matousek. Lectures on Discrete Geometry. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2002. Seth T. Merkel, Jay M. Gambetta, John A. Smolin, Stefano Poletto, Antonio D. C´ orcoles, Blake R. Johnson, Colm A. Ryan, and Matthias Steffen. Selfconsistent quantum process tomography. Phys. Rev. A, 87:062119, Jun 2013. Sahand Negahban and Martin J. Wainwright. Re-

15

[32]

[33]

[34]

[35] [36]

[37] [38] [39] [40] [41]

stricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res., 13:1665–1697, May 2012. Sandu Popescu, Anthony J Short, and Andreas Winter. Entanglement and the foundations of statistical mechanics. Nature Physics, 2(11):754–758, 2006. Andrew James Scott. Optimizing quantum process tomography with unitary 2-designs. Journal of Physics A: Mathematical and Theoretical, 41(5):055308, 2008. Alireza Shabani, Masoud Mohseni, Seth Lloyd, Robert L. Kosut, and Herschel Rabitz. Estimation of many-body quantum hamiltonians via compressive sensing. Phys. Rev. A, 84:012107, Jul 2011. Alireza et al Shabani. Efficient measurement of quantum dynamics via compressive sensing. Phys. Rev. Lett., 106:100401, Mar 2011. Cyril Stark. Self-consistent tomography of the state-measurement gram matrix. Phys. Rev. A, 89:052109, May 2014. Joel A. Tropp. Convex recovery of a structured signal from independent random linear measurements. CoRR, abs/1405.1102, 2014. R. Vershynin. Compressed sensing: theory and applications. Cambridge University Press, 2012. Zak Webb. The clifford group forms a unitary 3design. arXiv:1510.02769, 2015. Huangjun Zhu. Multiqubit clifford groups are unitary 3-designs. arXiv:1510.02619, 2015. This implies that stabilizer states are spherical 3designs, as was also proven independently in [23].

Appendix A: Representations of Quantum Operations

Given a completely positive and trace preserving quantum operation F : Cd×d → C d×d , there are several useful representation of F . In Eq. (III.4), we introduced the Choi-Jamiolkowski representation. Another representation that we use is the Liouville representation. We denote Liouville representation of F , as F L ∈ d2 ×d2 C as hkl|F L |iji = hk|F (|iihj|)|li.

(A.1)

where |ii, |ji, |li and |ki are any standard basis states. All representations are completely equivalent, and it is a simple exercise to convert between them. However, certain representations make it easier to check for properties like complete positivity. For example from the complete positivity constraint, we have that J(X ) is Hermitian, so hki|J(X )|lji = hlj|J(X )∗ |kii.

(A.2)

Converting this into Liouville representation, we have hkl|X L |iji = hlk|(X L )∗ |jii.

(A.3)

Using this fact, we can show that for completely positive superoperators F and K,

1 tr(F L (KL )† ). (A.4) d2 To see this, we calculate   1 X tr(J(F )J(K)) = 2 tr F (|iihj|)K(|jihi|) d ij tr(J(F )J(K)) =

=

1 X hk|F (|iihj|)|lihl|K(|jihi|)|ki d2 ijkl

1 X hkl|F L |ijihkl|(KL )∗ |iji = 2 d ijkl

1 = 2 tr(F L (KL )† ). (A.5) d Additionally, we note that for unitary maps U corresponding to a map E, the corresponding Liouville representation take the form E L = U ⊗ U ∗,

(A.6)

E(|iihj|) = U |iihj|U †

(A.7)

because

Using Eq. (A.4) and Eq. (A.6), we have that for a map E representing a unitary U and a map C representing a unitary C, that tr(J(E)J(C)) = (1/d2 )|tr(U † C)|2 .

(A.8)

Appendix B: Recovery Using Unitary 4-Designs

In this Appendix, we show that it is possible to recover unitary maps with PhaseLift using measurements taken from a unitary 4-design. In Section III, using a unitary 2-design, we could not calculate the 4th moment of S˜ directly, and instead had to bound it using the non-spikiness condition. In this section, because we have a unitary 4-design, we can calculate 4th moments more directly using properties of Haar random unitaries. (This is analogous to the approach in [24], where they show phase retrieval is possible with spherical 4-designs.) Because we can achieve better bounds on the 4th moment terms, we can achieve exact, rather than approximate recovery, and can recover all unitaries rather than just those that satisfy the non-spikiness condition. To calculate these 4th moment terms, we use commutative diagrams. This technique may be of broader interest.

16 1.

Convex Relaxation

Let E : Cd×d → Cd×d be an unknown unitary process that we want to learn. We will reconstruct d2 ×d2 the Choi- Jamiolkowski state J(E) ∈ CHerm . ˆ ⊂ Cd×d be a unitary 4-design. We We let G ˆ and choose C1 , . . . , Cm uniformly at random from G, † for each Ci , we let Ci : ρ 7→ Ci ρCi be the corresponding quantum process. We define the sampling operator Aˆ that describes our measurement procedure. We define Aˆ : d2 ×d2 CHerm → Rm , which acts on Choi-Jamiolkowski states ρ, as follows: h im ˆ A(ρ) = tr(ρJ(Ci )) i=1 (B.1) h im † + = hΦ |(Ci ⊗ I)ρ(Ci ⊗ I)|Φ+ i . i=1

Using this notation, our measurement procedure returns a vector ˆ y = A(J(E)) + ε,

kεk2 ≤ η/d2 ,

(B.2)

where we assume we know an upper-bound η/d2 on the size of the noise term ε. Note that this bound is analogous to equation (II.12) in Section II. In particular, η plays the same role in both equations, and the extra factor of 1/d2 in equation (B.2) is simply due to the different normalization of the measurement vectors (Ci ⊗ I)|Φ+ i. In order to reconstruct E, we will solve the following convex relaxation: arg

min 2

d ×d ρ∈CHerm

2

tr(ρ) such that

ˆ − yk2 ≤ η/d2 , kA(ρ) ρ  0, tr1 (ρ) = tr(ρ) I/d, tr2 (ρ) = tr(ρ) I/d.

(B.3)

These constraints are identical to those in Eq. (III.14), (see the discussion after Eq. (III.14) for more details) except we have removed the requirement of non-spikiness.

2.

Approximate recovery of non-spiky unitary processes

We show the following uniform recovery guarantee for the convex relaxation shown in Eq. (B.3): Theorem B.1. Consider the above scenario where m measurement matrices C1 , . . . , Cm are chosen inˆ⊂ dependently at random from a unitary 4-design G

Cd×d . Then there exists a numerical constants c5 > 0 such that the following holds. Suppose that c0 > (2c5 + 1)2 and the number of measurements satisfies m≥

2  32(4!)2 · c0 d2 ln d.

(B.4)

2

Then with probability at least 1 − e−2d ln d (over the choice of the Ci ), we have the following uniform recovery guarantee: For any unitary process E ∈ L(Cd×d , Cd×d ), any solution ρopt to the convex program (B.3) with noisy measurements (B.2) must satisfy: 2η √ kρopt − J(E)kF ≤ √ . ( c0 − 2c5 − 14 )d ln d

(B.5)

Here, we see that we achieve exact recovery: when η = 0, we can recover J(E) exactly. We will present this proof in several steps, following a similar strategy as in Section II.

3.

Descent cone

For the case of unitary 4-designs, we use a similar descent cone to the one used in [24, 37]: that is, for f : C_Herm^{d²×d²} → R ∪ {∞} a proper convex function, and Xtrue ∈ C_Herm^{d²×d²}, we use the descent cone D(f, Xtrue, 0). This is the set of all directions, originating at the point Xtrue, that cause the value of f to decrease.

We define the function f̂ : C_Herm^{d²×d²} → R ∪ {∞} as follows:

    f̂(ρ) = tr(ρ)  if ρ satisfies the last three constraints in (B.3);  f̂(ρ) = ∞  otherwise.        (B.6)

By Prop. 7 in [24], our recovery bound is:

    ‖ρopt − J(E)‖_F ≤ (2η/d²) / λmin(Â; D(f̂, J(E), 0)),        (B.7)

and we want to lower-bound the quantity λmin:

    λmin(Â; D(f̂, J(E), 0)) = inf_{Y ∈ Ê(E)} ( Σ_{i=1}^m |tr(Y J(Ci))|² )^{1/2},        (B.8)

where we define

    Ê(E) = { Y ∈ D(f̂, J(E), 0) | ‖Y‖_F = 1 }.        (B.9)
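For concreteness, a small numpy helper (ours, not from the paper; the function name f_hat and the tolerance are our choices) that evaluates the extended-value function f̂ of Eq. (B.6) up to numerical precision might look as follows.

```python
# Hypothetical helper (ours): evaluate f_hat(rho) from Eq. (B.6) with a tolerance.
import numpy as np

def f_hat(rho, d, tol=1e-9):
    """Return tr(rho) if rho satisfies the last three constraints of (B.3), else inf."""
    t = np.trace(rho).real
    # rho >= 0 (allow tiny negative eigenvalues from round-off).
    psd_ok = np.min(np.linalg.eigvalsh(rho)) >= -tol
    # View rho as a 4-index tensor rho[(a,b),(c,e)] -> r[a, b, c, e].
    r = rho.reshape(d, d, d, d)
    tr1 = np.einsum('abae->be', r)   # partial trace over the first register
    tr2 = np.einsum('abcb->ac', r)   # partial trace over the second register
    target = t * np.eye(d) / d
    marg_ok = np.allclose(tr1, target, atol=tol) and np.allclose(tr2, target, atol=tol)
    return t if (psd_ok and marg_ok) else np.inf
```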

4. Mendelson's small-ball method

We will use Mendelson's small-ball method to lower-bound λmin. We will prove a uniform lower-bound that holds for all E simultaneously. We define

    Ê = ⋃_E Ê(E),        (B.10)

where the union runs over all unitary processes E ∈ L(C^{d×d}, C^{d×d}) such that J(E) satisfies the last three constraints in (B.3). We then take the infimum over all Y ∈ Ê.

We will then lower-bound this quantity, using Mendelson's small-ball method (Theorem 8 in [24]): for any ξ > 0 and t ≥ 0, with probability at least 1 − e^{−2t²}, we have that

    inf_{Y ∈ Ê} ( Σ_{i=1}^m |tr(Y J(Ci))|² )^{1/2} ≥ ξ√m Q̂_{2ξ}(Ê) − 2Ŵm(Ê) − ξt,        (B.11)

where we define

    Q̂_ξ(Ê) = inf_{Y ∈ Ê} Pr_{C∼Ĝ}[ |tr(Y J(C))| ≥ ξ ],        (B.12)

    Ŵm(Ê) = E_{εi∼±1, Ci∼Ĝ}[ sup_{Y ∈ Ê} tr(Y Ĥ) ],    Ĥ = (1/√m) Σ_{i=1}^m εi J(Ci),        (B.13)

where ε1, ..., εm are Rademacher random variables.

5. Lower-bounding Q̂_ξ(Ê)

We will lower-bound Pr_{C∼Ĝ}[ |tr(Y J(C))| ≥ ξ ], for any ξ ∈ [0, 1], using the Paley-Zygmund inequality, and appropriate bounds on the second and fourth moments of |tr(Y J(C))|. To simplify the notation, let us define a random variable Ŝ ≡ |tr(Y J(C))|.

We will first put Ŝ into a form that is easier to work with. We can write Y as the Choi-Jamiolkowski matrix J(Y) of some linear map Y ∈ L(C^{d×d}, C^{d×d}). Using the relationship between the trace of Choi and Liouville representations (see Appendix A), we have

    Ŝ = |tr(J(C) J(Y))| = (1/d²) |tr( (C^L)† Y^L )| = (1/d²) |tr( (C ⊗ C*)^T Y^L )|.        (B.14)

a. Lower-bounding E(Ŝ²)

Because we are working with a unitary 4-design, and Ŝ² only depends on the second moment of the distribution, the bound will be the same as for a unitary 2-design, and we can use our bound from Section III H 1, Eq. (III.45):

    E(Ŝ²) ≥ (1/d⁴)(1 − 1/d²) tr(J(Y))² + (1/d⁴) ‖J(Y)‖²_F ≥ (1/(2d⁴)) max{ tr(J(Y))², ‖J(Y)‖²_F },        (B.15)

but also (using Eq. (III.45))

    E(Ŝ²) ≥ 1/d⁴.        (B.16)
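As a quick numerical sanity check (ours, not part of the paper's argument), one can estimate E(Ŝ²) by Monte Carlo over Haar-random C. Here Y is taken as a normalized difference of two unitary Choi states, which is traceless, has the required partial traces, and unit Frobenius norm, so it is a plausible element of Ê; the helper name choi_of_unitary is ours.

```python
# Monte Carlo sketch (ours): estimate E(S^2) = E_C |tr(Y J(C))|^2 over Haar-random C
# and compare with the lower bound 1/d^4 from Eq. (B.16).
import numpy as np
from scipy.stats import unitary_group

def choi_of_unitary(U, d):
    phi = np.eye(d).reshape(-1) / np.sqrt(d)
    v = np.kron(U, np.eye(d)) @ phi
    return np.outer(v, v.conj())

d, trials = 3, 10000

# A plausible descent direction: difference of two unitary Choi states, normalized.
Y = choi_of_unitary(unitary_group.rvs(d, random_state=1), d) \
  - choi_of_unitary(unitary_group.rvs(d, random_state=2), d)
Y /= np.linalg.norm(Y)          # Frobenius normalization, ||Y||_F = 1

samples = [abs(np.trace(Y @ choi_of_unitary(unitary_group.rvs(d), d))) ** 2
           for _ in range(trials)]

print("estimated E(S^2):", np.mean(samples))
print("lower bound 1/d^4:", 1.0 / d**4)
```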

b. Upper-bounding E(Ŝ⁴)

Now we would like to bound E[Ŝ⁴]. Because we are considering a unitary 4-design, which has the same fourth moments as the Haar measure on unitaries, instead of taking the average E[Ŝ⁴] over the 4-design, we will take the average over Haar random unitaries. We have

    E[Ŝ⁴] = (1/d⁸) ∫_Haar dU | tr[ (U ⊗ U*) Y^L ] |⁴
          = (1/d⁸) ∫_Haar dU ( tr[ (U ⊗ U*) Y^L ] · tr[ (U* ⊗ U) (Y^L)* ] )²
          = (1/d⁸) ∫_Haar dU tr[ ( U ⊗ (U*)^{⊗2} ⊗ U )^{⊗2} ( Y^L ⊗ (Y^L)* )^{⊗2} ].        (B.17)

Using similar tricks as in the case of the second moment, we have that

    ∫_Haar dU ( U ⊗ (U*)^{⊗2} ⊗ U )^{⊗2} = P_{15842673} T2( ∫_Haar dU U^{⊗4} ⊗ (U†)^{⊗4} ) P_{15842673}.        (B.18)

We will use P* ≡ P_{15842673}. Then we use Weingarten functions to obtain an expression for the integral on the right-hand side of Eq. (B.18) in terms of permutation operators:

    ∫_Haar dU U^{⊗4} ⊗ (U†)^{⊗4} = Σ_{σ,τ ∈ S4} Wg(d, 4, στ⁻¹) P_{σ,τ},        (B.19)

where S4 is the symmetric group on 4 elements, and

    P_{σ,τ} = P_{τ(1)+4, τ(2)+4, τ(3)+4, τ(4)+4, σ⁻¹(1), σ⁻¹(2), σ⁻¹(3), σ⁻¹(4)}.        (B.20)

Combining Eqs. (B.17), (B.18), and (B.19), we have

    E[Ŝ⁴] = (1/d⁸) Σ_{σ,τ ∈ S4} Wg(d, 4, στ⁻¹) × tr[ P* T2(P_{σ,τ}) P* ( Y^L ⊗ (Y^L)* )^{⊗2} ].        (B.21)

The permutations P_{σ,τ} in the sum will be of the form P_{wxyzabcd} (where (wxyz) is a non-repeating sequence of elements in the set {5, 6, 7, 8} and (abcd) is a non-repeating sequence of elements in the set {1, 2, 3, 4}). Each term in the above sum is therefore of the form:

    tr[ P* T2(P_{wxyzabcd}) P* ( Y^L ⊗ (Y^L)* )^{⊗2} ]
      = tr[ P* T2( Σ_{j1,...,j8} |j1, j2, j3, j4, j5, j6, j7, j8⟩⟨jw, jx, jy, jz, ja, jb, jc, jd| ) P* ( Y^L ⊗ (Y^L)* )^{⊗2} ]
      = tr[ P* ( Σ_{j1,...,j8} |j1, j2, j3, j4, ja, jb, jc, jd⟩⟨jw, jx, jy, jz, j5, j6, j7, j8| ) P* ( Y^L ⊗ (Y^L)* )^{⊗2} ]
      = tr[ ( Σ_{j1,...,j8} |j1, ja, jd, j4, j2, jb, jc, j3⟩⟨jw, j5, j8, jz, jx, j6, j7, jy| ) ( Y^L ⊗ (Y^L)* )^{⊗2} ]
      = Σ_{j1,...,j8} ⟨jw, j5|Y^L|j1, ja⟩ ⟨j8, jz|(Y^L)*|jd, j4⟩ ⟨jx, j6|Y^L|j2, jb⟩ ⟨j7, jy|(Y^L)*|jc, j3⟩
      = Σ_{j1,...,j8} ⟨jw, j5|Y^L|j1, ja⟩ ⟨jz, j8|Y^L|j4, jd⟩ ⟨jx, j6|Y^L|j2, jb⟩ ⟨jy, j7|Y^L|j3, jc⟩
      = Σ_{j1,...,j8} ⟨j1, j5|Ŷ|ja, jw⟩ ⟨j4, j8|Ŷ|jd, jz⟩ ⟨j2, j6|Ŷ|jb, jx⟩ ⟨j3, j7|Ŷ|jc, jy⟩,        (B.22)

where in the second-to-last line we have used Eq. (A.3), and in the last line we have reordered the elements of Y^L as

    ⟨jr, js|Y^L|jt, ju⟩ = ⟨jt, js|Ŷ|ju, jr⟩.        (B.23)
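The reindexing in Eq. (B.23) is just a permutation of tensor legs. The following numpy sketch (ours) builds Ŷ from Y^L and spot-checks one matrix element.

```python
# Sketch (ours) of the reindexing in Eq. (B.23): <r,s|Y^L|t,u> = <t,s|Yhat|u,r>.
import numpy as np

d = 3
rng = np.random.default_rng(0)
YL = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))

# View Y^L as a 4-leg tensor YL4[r, s, t, u] = <r,s|Y^L|t,u>, then permute legs.
YL4 = YL.reshape(d, d, d, d)
Yhat = YL4.transpose(2, 1, 3, 0).reshape(d * d, d * d)   # Yhat[(t,s),(u,r)] = YL[(r,s),(t,u)]

# Spot-check a single matrix element.
r, s, t, u = 1, 0, 2, 1
assert np.isclose(YL[r * d + s, t * d + u], Yhat[t * d + s, u * d + r])
```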

Now we will use a graphical representation of the matrices Ŷ in order to evaluate these terms. [The diagrams themselves are not reproduced here.] In this notation, a single box labeled Ŷ with open legs represents a matrix element,

    ⟨jt, js|Ŷ|ju, jr⟩,        (B.24)

joining the legs of two boxes corresponds to matrix multiplication,

    Σ_{u,r} ⟨jt, js|Ŷ|ju, jr⟩ ⟨ju, jr|Ŷ|jm, jn⟩ = ⟨jt, js|Ŷ Ŷ|jm, jn⟩,        (B.25)

and closing both registers of a box into loops gives the trace,

    Σ_{js,jt} ⟨jt, js|Ŷ|jt, js⟩ = tr(Ŷ).        (B.26)

Furthermore, because of Eq. (III.39) and Eq. (III.40), a diagram in which only one register of Ŷ is closed into a self-loop equals the diagram in which both registers are closed, and evaluates to

    d tr(J(Y)).        (B.27)

We see that a single self-loop on either register forces the other register to also have a self-loop. Because of this simplifying effect, we can enumerate all possible configurations of commutative diagrams that arise from choices of (wxyzabcd). There are seven configurations, labeled I through VII. [The corresponding diagrams, Eqs. (B.28)-(B.30), are not reproduced here.]

We ignore figures that are identical to figures that are depicted, but which have one or more loops reversed in direction, or that have dashed and solid arrows switched. By reordering the elements of the matrix, as we did in Eq. (B.23), we can create a new figure which looks identical to ones shown above, but involving a new matrix Ŷ′ whose elements are the same as Ŷ. In our analysis below, we will ultimately see that our bounds on the contributions due to each figure will depend only on the Frobenius norm of Y^L, which only depends on the elements of the matrix, and not on its ordering. (We also have contributions that depend on tr(Y), but these terms only come from self-loops about a single element, for which reordering does not produce a new term.)

For a square matrix A and integer i > 1, we will use the following bound:

    tr(A^i) = tr(A^{i−1} A) = vec((A^{i−1})^T)^T vec(A) ≤ ‖(A^{i−1})^T‖_F ‖A‖_F ≤ ‖A‖_F^i,        (B.31)

where we have used Cauchy-Schwarz, the submultiplicative property of the Frobenius norm, and the fact that ‖A^T‖_F = ‖A‖_F.

We now bound the contribution due to each diagram.

I: We read off the contribution of tr(Ŷ)⁴ = d⁴ tr(J(Y))⁴.

II: We have a contribution of

    d² tr(J(Y))² tr(Ŷ²) ≤ d² tr(J(Y))² ‖Ŷ‖²_F = d⁴ tr(J(Y))² ‖J(Y)‖²_F.        (B.32)

III: We have a contribution of

    d tr(J(Y)) tr(Ŷ³) ≤ d tr(J(Y)) ‖Ŷ‖³_F = d⁴ tr(J(Y)) ‖J(Y)‖³_F.        (B.33)

IV: We have a contribution of

    tr(Ŷ⁴) ≤ ‖Ŷ‖⁴_F = d⁴ ‖J(Y)‖⁴_F.        (B.34)
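A two-line numerical check (ours, not from the paper) of the bound (B.31) on a random matrix:

```python
# Numeric check (ours) of Eq. (B.31): |tr(A^i)| <= ||A||_F^i for a random square A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
for i in range(2, 6):
    lhs = np.trace(np.linalg.matrix_power(A, i))
    rhs = np.linalg.norm(A, 'fro') ** i
    assert abs(lhs) <= rhs + 1e-12
```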



V: We reprint the diagram here, but with labels α, β, γ, δ, ξ, χ, φ, ψ on the legs of the boxes. [Diagram (B.35) not reproduced.] Let K be the d × d matrix such that

    ⟨ja|K|jb⟩ = √( Σ_{jc,jd} |⟨ja, jc|Ŷ|jb, jd⟩|² ).        (B.36)

Then the contribution of the diagram is bounded by

    Σ_{jα,jβ,jγ,jδ,jξ,jχ,jφ,jψ} ⟨jφ, jβ|Ŷ|jχ, jα⟩ ⟨jχ, jα|Ŷ|jξ, jβ⟩ ⟨jξ, jδ|Ŷ|jψ, jγ⟩ ⟨jψ, jγ|Ŷ|jφ, jδ⟩
      ≤ Σ_{jξ,jχ,jφ,jψ} √( Σ_{jα,jβ} |⟨jφ, jβ|Ŷ|jχ, jα⟩|² ) · √( Σ_{jα,jβ} |⟨jχ, jα|Ŷ|jξ, jβ⟩|² )
                         · √( Σ_{jγ,jδ} |⟨jξ, jδ|Ŷ|jψ, jγ⟩|² ) · √( Σ_{jγ,jδ} |⟨jψ, jγ|Ŷ|jφ, jδ⟩|² )
      = Σ_{jξ,jχ,jφ,jψ} ⟨jφ|K|jχ⟩ ⟨jχ|K|jξ⟩ ⟨jξ|K|jψ⟩ ⟨jψ|K|jφ⟩
      = tr(K⁴) ≤ ‖K‖⁴_F = ( Σ_{ja,jb} Σ_{jc,jd} |⟨ja, jc|Ŷ|jb, jd⟩|² )² = ‖Ŷ‖⁴_F = d⁴ ‖J(Y)‖⁴_F.        (B.37)

VI: We reprint the diagram here, but with labels. [Diagram (B.38) not reproduced.] Let K′ be the d × d matrix such that

    ⟨ja|K′|jb⟩ = √( Σ_{jc,jd} |⟨ja, jc|Ŷ|jb, jd⟩|² ).        (B.39)

Then the contribution of the diagram is bounded by

    Σ_{jα,jβ,jγ,jδ,jξ,jχ,jφ,jψ} ⟨jξ, jβ|Ŷ|jχ, jα⟩ ⟨jφ, jα|Ŷ|jψ, jβ⟩ ⟨jψ, jδ|Ŷ|jφ, jγ⟩ ⟨jχ, jγ|Ŷ|jξ, jδ⟩
      ≤ Σ_{jξ,jχ,jφ,jψ} √( Σ_{jα,jβ} |⟨jξ, jβ|Ŷ|jχ, jα⟩|² ) · √( Σ_{jα,jβ} |⟨jφ, jα|Ŷ|jψ, jβ⟩|² )
                         · √( Σ_{jγ,jδ} |⟨jψ, jδ|Ŷ|jφ, jγ⟩|² ) · √( Σ_{jγ,jδ} |⟨jχ, jγ|Ŷ|jξ, jδ⟩|² )
      = Σ_{jξ,jχ,jφ,jψ} ⟨jξ|K′|jχ⟩ ⟨jχ|K′|jξ⟩ ⟨jφ|K′|jψ⟩ ⟨jψ|K′|jφ⟩
      = tr(K′²)² ≤ ‖K′‖⁴_F = ( Σ_{ja,jb} Σ_{jc,jd} |⟨ja, jc|Ŷ|jb, jd⟩|² )² = ‖Ŷ‖⁴_F = d⁴ ‖J(Y)‖⁴_F.        (B.40)

VII: We have a contribution of

    tr(Ŷ²) tr(Ŷ²) ≤ ‖Ŷ‖⁴_F ≤ d⁴ ‖J(Y)‖⁴_F.        (B.41)

Looking at all possible diagrams, we see that the total contribution of any diagram is less than

    d⁴ max{ tr(J(Y))⁴, ‖J(Y)‖⁴_F }.        (B.42)

Hence this bounds the size of any term in Eq. (B.21). Each term in Eq. (B.21) is multiplied by a Weingarten function Wg(d, 4, στ⁻¹). The largest Weingarten term is

    Wg(d, 4, (1234)) = (d⁴ − 8d² + 6) / ( (d + 3)(d + 2)(d + 1)(d − 1)(d − 2)(d − 3) d² ) < 2/d⁴        (B.43)

for d > 3. There are (4!)² total terms in the sum. Putting it all together, we have

    E[Ŝ⁴] ≤ (2(4!)² / d⁸) max{ tr(J(Y))⁴, ‖J(Y)‖⁴_F }        (B.44)

for d > 3. Now we combine Eqs. (B.15), (B.16), and (B.44), and the Paley-Zygmund inequality, to obtain

    P[ Ŝ ≥ ξ ] ≥ P[ Ŝ² ≥ (ξd²)² E[Ŝ²] ]
               ≥ (1 − (ξd²)²)² (E Ŝ²)² / E Ŝ⁴
               ≥ (1 − (ξd²)²)² · [ max{tr(J(Y))⁴, ‖J(Y)‖⁴_F} / (4d⁸) ] / [ (2(4!)²/d⁸) max{tr(J(Y))⁴, ‖J(Y)‖⁴_F} ]
               = (1 − (ξd²)²)² / (8(4!)²).        (B.45)

Thus

    Q̂_ξ(Ê) ≥ (1 − (ξd²)²)² / (8(4!)²).        (B.46)
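As an informal numerical illustration (ours, not part of the proof), one can estimate E[Ŝ⁴] by Monte Carlo over Haar-random unitaries and compare it with the bound (B.44). For this sketch we take Y to be a normalized difference of two unitary Choi states, and we read tr(J(Y)) and ‖J(Y)‖_F in (B.44) as tr(Y) and ‖Y‖_F; both of these are modeling choices on our part.

```python
# Monte Carlo sketch (ours) of the fourth-moment bound (B.44) and the resulting
# Paley-Zygmund small-ball estimate. Helper name choi_of_unitary is ours.
import numpy as np
from math import factorial
from scipy.stats import unitary_group

def choi_of_unitary(U, d):
    phi = np.eye(d).reshape(-1) / np.sqrt(d)
    v = np.kron(U, np.eye(d)) @ phi
    return np.outer(v, v.conj())

d, trials = 4, 20000   # (B.44) is stated for d > 3
Y = choi_of_unitary(unitary_group.rvs(d, random_state=1), d) \
  - choi_of_unitary(unitary_group.rvs(d, random_state=2), d)
Y /= np.linalg.norm(Y)   # ||Y||_F = 1, tr(Y) = 0

S = np.array([abs(np.trace(Y @ choi_of_unitary(unitary_group.rvs(d), d)))
              for _ in range(trials)])

bound = 2 * factorial(4) ** 2 / d ** 8 * max(abs(np.trace(Y)) ** 4, 1.0)
print("estimated E[S^4]:", np.mean(S ** 4))
print("bound from (B.44):", bound)
# Paley-Zygmund lower bound on P[S >= 1/(2d^2)], using the estimated moments.
print("small-ball lower bound:", (1 - 0.25) ** 2 * np.mean(S ** 2) ** 2 / np.mean(S ** 4))
```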

6. Upper-bounding Ŵm(Ê)

In this section, we will bound

    Ŵm(Ê) = E[ sup_{Y ∈ Ê} tr(Y Ĥ) ],    Ĥ = (1/√m) Σ_{i=1}^m εi J(Ci),        (B.47)

where the εi are Rademacher random variables, and the expectation is taken both over the εi and the choice of the unitaries Ci in the unitary 4-design. Because a unitary 4-design is also a unitary 2-design, a nearly identical argument to that used in Sec. III I holds, and we have

    Ŵm(Ê) ≤ c5 √(ln d) · (1/d),        (B.48)

where c5 > 0 is some numerical constant.

7. Final result

Combining equations (B.8), (B.11), (B.46) and (B.48), and setting ξ = 1/(2d²), we get that:

    inf_E λmin(Â; D(f̂, J(E), δ)) ≥ (√m / (2d²)) · (1 / (4²(4!)²)) − 2c5 √(ln d)/d − t/(4d²)
                                  ≥ (√c0 − 2c5 − 1/4) √(ln d)/d,        (B.49)

where we set

    m ≥ (32(4!)²)² · c0 d² ln d,        (B.50)

and t = d√(ln d). This can be plugged into our approximate recovery bound (B.7). This finishes the proof of Theorem B.1.
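As a small worked example (ours), one can plug illustrative numbers into Eqs. (B.4) and (B.5). The constant c5 is unspecified in the theorem, so the value used below is a placeholder, not a value from the paper.

```python
# Worked numerical example (ours): evaluate the sample-complexity bound (B.4) and the
# recovery-error bound (B.5) for illustrative parameter choices.
from math import factorial, log, sqrt

n = 4                           # number of qubits
d = 2 ** n                      # Hilbert-space dimension
c5 = 1.0                        # placeholder value for the unspecified constant c5
c0 = (2 * c5 + 1) ** 2 * 1.1    # any c0 > (2*c5 + 1)^2 is allowed
eta = 0.01                      # assumed noise level

m_bound = (32 * factorial(4) ** 2) ** 2 * c0 * d ** 2 * log(d)          # Eq. (B.4)
err_bound = 2 * eta / ((sqrt(c0) - 2 * c5 - 0.25) * d * sqrt(log(d)))   # Eq. (B.5)
print(f"m >= {m_bound:.3e}, recovery error <= {err_bound:.3e}")
```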