Rates of Convergence for Quicksort

Ralph Neininger
School of Computer Science, McGill University, 3480 University Street, Montreal, H3A 2K6, Canada

Ludger Rüschendorf
Institut für Mathematische Stochastik, Universität Freiburg, Eckerstr. 1, 79104 Freiburg, Germany

August 13, 2001

Abstract

The normalized number of key comparisons needed to sort a list of randomly permuted items by the Quicksort algorithm is known to converge in distribution. We identify the rate of convergence to be of the order $\Theta(\ln(n)/n)$ in the Zolotarev metric. This implies several $\ln(n)/n$ estimates for other distances and local approximation results, e.g., for characteristic functions, for density approximation, and for the integrated distance of the distribution functions.

AMS subject classifications. Primary: 60F05, 68Q25; secondary: 68P10.
Key words. Quicksort, analysis of algorithms, rate of convergence, Zolotarev metric, local approximation, contraction method.

1 Introduction and main result

The distribution of the number of key comparisons $X_n$ of the Quicksort algorithm needed to sort an array of $n$ randomly permuted items is known to converge, after normalization, in distribution as $n \to \infty$; see Régnier [8], Rösler [9]. Recently, the problem of identifying the exact rate of this convergence was posed more insistently. Some estimates for the rate are obtained by Fill and Janson [3], who, roughly speaking, get upper estimates $O(n^{-1/2})$ for the convergence in the minimal $L_p$-metrics $\ell_p$, $p > 1$, and $O(n^{-1/2+\varepsilon})$ for the Kolmogorov metric for all $\varepsilon > 0$, as well as the lower estimates $\Omega(\ln(n)/n)$ for the $\ell_p$ metrics, $p > 1$, and $\Omega(1/n)$ for the Kolmogorov metric. After they presented their results at "The Seventh Seminar on Analysis of Algorithms" on Tatihou in July 2001, some indication was given at the meeting that $\Theta(\ln(n)/n)$ might be the right order of the rate of convergence for many metrics of interest. In this note we confirm this conjecture for the Zolotarev metric $\zeta_3$. Since $\zeta_3$ serves as an upper bound for several other distance measures, this implies $\ln(n)/n$ bounds as well for some local metrics, for characteristic functions, and for weighted global metrics. For the proof we use a form of the contraction method as developed in Rachev and Rüschendorf [7] and Cramer and Rüschendorf [1]. We establish explicit estimates to identify the rate of convergence.

Research supported by NSERC grant A3450 and the Deutsche Forschungsgemeinschaft.


The paper is organized as follows: In this section we recall some known properties of the sequence $(X_n)$, introduce the Zolotarev metric $\zeta_3$, and state our main theorem, which is proved in Section 2. In the last section implications of the $\zeta_3$ convergence rate are drawn, based on several inequalities between probability metrics.

The sequence of the numbers of key comparisons $(X_n)$ needed by the Quicksort algorithm to sort an array of $n$ randomly permuted items satisfies $X_0 = 0$ and the recursion

$$X_n \stackrel{D}{=} X_{I_n} + X'_{n-1-I_n} + n - 1, \qquad n \ge 1, \tag{1}$$

where $\stackrel{D}{=}$ denotes equality in distribution, $(X_n)$, $(X'_n)$, $I_n$ are independent, $I_n$ is uniformly distributed on $\{0, \ldots, n-1\}$, and $X_k \sim X'_k$, $k \ge 0$, where $\sim$ denotes as well equality of distributions. The mean and variance of $X_n$ are exactly known and satisfy

$$\mathbb{E}\,X_n = 2n\ln(n) + (2\gamma - 4)n + O(\ln(n)), \qquad \operatorname{Var}(X_n) = \sigma^2 n^2 - 2n\ln(n) + O(n),$$

where $\gamma$ denotes Euler's constant and $\sigma := \sqrt{7 - 2\pi^2/3} > 0$. We introduce the normalized quantities $Y_0 := 0$ and
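Taking expectations in (1) yields the recurrence $\mu(n) = n - 1 + \frac{2}{n}\sum_{k=0}^{n-1}\mu(k)$ for $\mu(n) := \mathbb{E}\,X_n$, which can be evaluated directly; the following sketch (function names are ours) checks the mean expansion numerically:

```python
import math

def mean_comparisons(nmax):
    # mu(n) = E X_n satisfies mu(n) = n - 1 + (2/n) * sum_{k<n} mu(k),
    # obtained by taking expectations in recursion (1)
    mu = [0.0] * (nmax + 1)
    s = 0.0  # running sum mu(0) + ... + mu(n-1)
    for n in range(1, nmax + 1):
        mu[n] = n - 1 + 2.0 * s / n
        s += mu[n]
    return mu

n = 10_000
mu = mean_comparisons(n)
gamma = 0.5772156649015329  # Euler's constant
# E X_n = 2 n ln(n) + (2*gamma - 4) n + O(ln n)
remainder = mu[n] - (2 * n * math.log(n) + (2 * gamma - 4) * n)
print(remainder)  # of order ln(n), small relative to n
```

The remainder is of order $\ln(n)$, in agreement with the expansion above.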

$$Y_n := \frac{X_n - \mathbb{E}\,X_n}{n}, \qquad n \ge 1,$$

which satisfy, see Régnier [8], Rösler [9], a limit law $Y_n \to Y$ in distribution for $n \to \infty$. Rösler [9] showed that $Y$ satisfies the distributional fixed-point equation

$$Y \stackrel{D}{=} U Y + (1 - U) Y' + C(U), \tag{2}$$

where $Y, Y', U$ are independent, $Y \sim Y'$, $U$ is uniform $[0,1]$ distributed, and $C(u) := 1 + 2u\ln(u) + 2(1-u)\ln(1-u)$, $u \in [0,1]$. Moreover, this identity, subject to $\mathbb{E}\,Y = 0$ and $\operatorname{Var}(Y) < \infty$, characterizes $Y$, and convergence and finiteness of the moment generating functions hold. We will use subsequently that $\operatorname{Var}(Y) = \sigma^2$ and $\|Y\|_3 < \infty$, where $\|Y\|_p := (\mathbb{E}|Y|^p)^{1/p}$, $1 \le p < \infty$, denotes the $L_p$-norm.

The purpose of the present note is to estimate the rate of the convergence $Y_n \to Y$. Our basic distance is the Zolotarev metric $\zeta_3$, given for distributions $\mathcal{L}(V), \mathcal{L}(W)$ by

$$\zeta_3(\mathcal{L}(V), \mathcal{L}(W)) := \sup_{f \in \mathcal{F}_3} \left|\mathbb{E}\,f(V) - \mathbb{E}\,f(W)\right|,$$

where $\mathcal{F}_3 := \{f \in C^2(\mathbb{R}, \mathbb{R}) : |f''(x) - f''(y)| \le |x - y|\}$ is the space of all twice differentiable functions with second derivative being Lipschitz continuous with Lipschitz constant 1. We will use the short notation $\zeta_3(V, W) := \zeta_3(\mathcal{L}(V), \mathcal{L}(W))$. It is well known that convergence in $\zeta_3$ implies weak convergence and that $\zeta_3(V, W) < \infty$ if $\mathbb{E}\,V = \mathbb{E}\,W$, $\mathbb{E}\,V^2 = \mathbb{E}\,W^2$, and $\|V\|_3, \|W\|_3 < \infty$. The metric $\zeta_3$ is ideal of order 3, i.e., for $T$ independent of $(V, W)$ and $c \ne 0$ we have

$$\zeta_3(V + T, W + T) \le \zeta_3(V, W), \qquad \zeta_3(cV, cW) = |c|^3\, \zeta_3(V, W).$$

For general reference and properties of $\zeta_3$ we refer to Zolotarev [10] and Rachev [5]. Our main result states:

Theorem 1.1 The number of key comparisons $(X_n)$ needed by the Quicksort algorithm to sort an array of $n$ randomly permuted items satisfies

$$\zeta_3\!\left(\frac{X_n - \mathbb{E}\,X_n}{\sqrt{\operatorname{Var}(X_n)}},\, X\right) = \Theta\!\left(\frac{\ln(n)}{n}\right), \qquad (n \to \infty),$$

where $X := Y/\sigma$ is a scaled version of the limiting distribution given in (2). For related results with respect to other distance measures see Section 3.
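The fixed-point equation (2) suggests a simple way to sample $Y$ approximately: start from the point mass at 0 and iterate the map of (2) on an i.i.d. population. This Monte Carlo sketch is ours, not part of the paper; since the map is a contraction in the minimal $\ell_p$-metrics, a moderate number of iterations suffices. The sample mean and variance should approach $\mathbb{E}\,Y = 0$ and $\operatorname{Var}(Y) = 7 - 2\pi^2/3 \approx 0.4203$:

```python
import math, random

random.seed(7)

def C(u):
    # toll function from (2); C(0) = C(1) = 1 by continuity
    if u <= 0.0 or u >= 1.0:
        return 1.0
    return 1.0 + 2.0 * u * math.log(u) + 2.0 * (1.0 - u) * math.log(1.0 - u)

m = 50_000           # population size
pop = [0.0] * m      # start from the point mass at 0
for _ in range(20):  # iterate Y <- U*Y + (1-U)*Y' + C(U) on the population
    pop = [
        (u := random.random()) * random.choice(pop)
        + (1.0 - u) * random.choice(pop)
        + C(u)
        for _ in range(m)
    ]

mean = sum(pop) / m
var = sum((y - mean) ** 2 for y in pop) / m
sigma2 = 7.0 - 2.0 * math.pi ** 2 / 3.0
print(mean, var, sigma2)  # mean near 0, var near sigma^2
```

Note that $\mathbb{E}\,C(U) = 0$, so the mean is preserved by the iteration, while the variance converges to $\sigma^2$.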

2 The proof

In the following lemma we state two simple bounds for the Zolotarev metric $\zeta_3$, for which we do not claim originality. The upper bound involves the minimal $L_3$-metric $\ell_3$ given by

$$\ell_p(\mathcal{L}(V), \mathcal{L}(W)) := \ell_p(V, W) := \inf\left\{\|\tilde{V} - \tilde{W}\|_p : \tilde{V} \sim V,\ \tilde{W} \sim W\right\}, \qquad p \ge 1. \tag{3}$$

Lemma 2.1 For $V, W$ with identical first and second moment and $\|V\|_3, \|W\|_3 < \infty$ it holds that

$$\frac{1}{6}\left|\mathbb{E}\,V^3 - \mathbb{E}\,W^3\right| \le \zeta_3(V, W) \le \frac{1}{6}\left(\|V\|_3^2 + \|V\|_3\|W\|_3 + \|W\|_3^2\right) \ell_3(V, W).$$

Proof: The left inequality follows from the fact that we have $f \in \mathcal{F}_3$ for $f(x) := x^3/6$, $x \in \mathbb{R}$. For the right inequality we use the estimate $\zeta_3(V, W) \le (1/6)\,\kappa_3(V, W)$, see Rachev [5, p. 268], where $\kappa_3$ denotes the third difference pseudomoment, which has the representation

$$\kappa_3(V, W) = \inf\left\{\mathbb{E}\left|\tilde{V}^3 - \tilde{W}^3\right| : \tilde{V} \sim V,\ \tilde{W} \sim W\right\}.$$

With $V^3 - W^3 = (V^2 + VW + W^2)(V - W)$ and Hölder's inequality we obtain

$$\mathbb{E}\left|V^3 - W^3\right| \le \left\|V^2 + VW + W^2\right\|_{3/2} \left\|V - W\right\|_3 \le \left(\|V\|_3^2 + \|V\|_3\|W\|_3 + \|W\|_3^2\right) \|V - W\|_3.$$

Taking the infima we obtain the assertion. □
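The two bounds of Lemma 2.1 can be sanity-checked on a toy pair of distributions; for real-valued random variables the infimum in (3) is attained by the quantile coupling $(F_V^{-1}(U), F_W^{-1}(U))$, which makes $\ell_3$ explicitly computable for the discrete example below (the example is ours, not from the paper):

```python
# V takes 2 w.p. 0.2 and -0.5 w.p. 0.8; W takes +1/-1 w.p. 1/2 each.
# Both have mean 0 and variance 1, as Lemma 2.1 requires.
segments = [  # (length, F_V^{-1}, F_W^{-1}) on subintervals of (0,1)
    (0.5, -0.5, -1.0),
    (0.3, -0.5,  1.0),
    (0.2,  2.0,  1.0),
]

EV3 = 0.2 * 2.0**3 + 0.8 * (-0.5)**3               # E V^3 = 1.5
EW3 = 0.5 * 1.0 + 0.5 * (-1.0)                     # E W^3 = 0
l3 = sum(w * abs(v - x)**3 for w, v, x in segments) ** (1.0 / 3.0)
nV3 = (0.2 * 2.0**3 + 0.8 * 0.5**3) ** (1.0 / 3.0)  # ||V||_3
nW3 = 1.0                                           # ||W||_3
lower = abs(EV3 - EW3) / 6.0
upper = (nV3**2 + nV3 * nW3 + nW3**2) * l3 / 6.0
print(lower, upper)  # lower <= zeta_3(V, W) <= upper
```

Here the lower bound is $0.25$ and the upper bound is roughly $0.65$, so the two bounds of the lemma are compatible on this pair.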

Proof of Theorem 1.1: First we prove the easier lower bound, where only information on the moments of $(X_n)$ is needed. Throughout we use constants $\sigma(n) \ge 0$ defined by

$$\sigma^2(n) := \operatorname{Var}(Y_n) = \sigma^2 - 2\,\frac{\ln(n)}{n} + O\!\left(\frac{1}{n}\right). \tag{4}$$

Lower bound: By Lemma 2.1 we have the basic estimate

$$\zeta_3\!\left(\frac{X_n - \mathbb{E}\,X_n}{\sqrt{\operatorname{Var}(X_n)}},\, X\right) \ge \frac{1}{6}\left|\mathbb{E}\!\left(\frac{1}{\sigma(n)}\,Y_n\right)^{\!3} - \mathbb{E}\!\left(\frac{1}{\sigma}\,Y\right)^{\!3}\right|.$$

The third moment of $Y_n$ satisfies

$$\mathbb{E}\,Y_n^3 = \frac{1}{n^3}\,\mathbb{E}(X_n - \mathbb{E}\,X_n)^3 = \frac{1}{n^3}\,\varkappa_3(X_n) = M_3 + O\!\left(\frac{1}{n}\right),$$

with $M_3 = \mathbb{E}\,Y^3 = 16\zeta(3) - 19 > 0$, where we use the expansion of the third cumulant $\varkappa_3(X_n)$ of $X_n$ given by Hennequin [4, p. 136]. From (4) we obtain

$$\frac{1}{\sigma^3(n)} = \frac{1}{\sigma^3} + \frac{3\ln(n)}{\sigma^5 n} + O\!\left(\frac{1}{n}\right),$$

thus

$$\frac{1}{6}\left|\mathbb{E}\!\left(\frac{1}{\sigma(n)}\,Y_n\right)^{\!3} - \mathbb{E}\!\left(\frac{1}{\sigma}\,Y\right)^{\!3}\right| = \frac{M_3}{2\sigma^5}\,\frac{\ln(n)}{n} + O\!\left(\frac{1}{n}\right),$$

which gives the lower estimate of the theorem.
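The leading constant $M_3/(2\sigma^5)$ of the lower bound can be evaluated numerically; a small sketch (with $\zeta(3)$ obtained by direct summation):

```python
import math

zeta3 = sum(1.0 / k**3 for k in range(1, 100_001))  # zeta(3) ~ 1.2020569...
M3 = 16.0 * zeta3 - 19.0                  # M3 = E Y^3, Hennequin [4]
sigma2 = 7.0 - 2.0 * math.pi**2 / 3.0     # sigma^2 = Var(Y)
const = M3 / (2.0 * sigma2**2.5)          # leading constant M3 / (2 sigma^5)
print(M3, sigma2, const)
```

Numerically $M_3 \approx 0.233$ and $M_3/(2\sigma^5) \approx 1.017$, so the lower bound is asymptotically about $1.017\,\ln(n)/n$.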

Upper bound: The scaled variates $Y_n$ satisfy the modified recursion

$$Y_n \stackrel{D}{=} \frac{I_n}{n}\,Y_{I_n} + \frac{n-1-I_n}{n}\,Y'_{n-1-I_n} + C_n(I_n), \qquad n \ge 1, \tag{5}$$

where, as in (1), $(Y_n), (Y'_n), I_n$ are independent, $Y_k \sim Y'_k$ for all $k \ge 0$, and

$$C_n(k) := \frac{1}{n}\big(\mu(k) + \mu(n-1-k) - \mu(n) + n - 1\big),$$

with $\mu(n) := \mathbb{E}\,X_n$, $n \ge 0$. Furthermore, we define $Z_0 := Z'_0 := 0$ and

$$Z_n := \frac{\sigma(n)}{\sigma}\,Y, \qquad Z'_n := \frac{\sigma(n)}{\sigma}\,Y', \qquad n \ge 1,$$

where $Y, Y'$ are independent copies of the limit distribution, also independent of $I_n$, and define the accompanying sequence $(Z_n^*)$ by $Z_0^* := 0$ and

$$Z_n^* \stackrel{D}{=} \frac{I_n}{n}\,Z_{I_n} + \frac{n-1-I_n}{n}\,Z'_{n-1-I_n} + C_n(I_n), \qquad n \ge 1. \tag{6}$$

Note that $Y_n, Z_n, Z_n^*$ have identical first and second moment and finite third absolute moment for all $n \ge 0$; thus $\zeta_3$-distances between these quantities are finite. We will show

$$\zeta_3(Y_n, Z_n) = O\!\left(\frac{\ln(n)}{n}\right). \tag{7}$$

From this estimate the upper bound follows immediately, since we have $(X_n - \mathbb{E}\,X_n)/\sqrt{\operatorname{Var}(X_n)} = Y_n/\sigma(n)$, $X \sim Z_n/\sigma(n)$, and therefore

$$\zeta_3\!\left(\frac{X_n - \mathbb{E}\,X_n}{\sqrt{\operatorname{Var}(X_n)}},\, X\right) = \frac{1}{\sigma^3(n)}\,\zeta_3(Y_n, Z_n) = O\!\left(\frac{\ln(n)}{n}\right),$$

since $(\sigma(n))$ has a nonzero limit. For the proof of (7) we obtain from the triangle inequality

$$\zeta_3(Y_n, Z_n) \le \zeta_3(Y_n, Z_n^*) + \zeta_3(Z_n^*, Z_n). \tag{8}$$

The first summand satisfies, using (5), (6), conditioning on $I_n$, and the ideality of order 3 of $\zeta_3$,

$$\begin{aligned}
\zeta_3(Y_n, Z_n^*) &\le \frac{1}{n} \sum_{k=0}^{n-1} \zeta_3\!\left(\frac{k}{n}\,Y_k + \frac{n-1-k}{n}\,Y'_{n-1-k} + C_n(k),\; \frac{k}{n}\,Z_k + \frac{n-1-k}{n}\,Z'_{n-1-k} + C_n(k)\right) \\
&\le \frac{1}{n} \sum_{k=0}^{n-1} \left[\zeta_3\!\left(\frac{k}{n}\,Y_k,\, \frac{k}{n}\,Z_k\right) + \zeta_3\!\left(\frac{n-1-k}{n}\,Y'_{n-1-k},\, \frac{n-1-k}{n}\,Z'_{n-1-k}\right)\right] \\
&= \frac{1}{n} \sum_{k=0}^{n-1} \left[\left(\frac{k}{n}\right)^{\!3} \zeta_3(Y_k, Z_k) + \left(\frac{n-1-k}{n}\right)^{\!3} \zeta_3(Y_{n-1-k}, Z_{n-1-k})\right] \\
&= \frac{2}{n} \sum_{k=1}^{n-1} \left(\frac{k}{n}\right)^{\!3} \zeta_3(Y_k, Z_k). 
\end{aligned} \tag{9}$$

We will show below that $\zeta_3(Z_n^*, Z_n) = O(\ln(n)/n)$. Thus (noting that $\zeta_3(Z_1^*, Z_1) = 0$) there exists a constant $c > 0$ with

$$\zeta_3(Z_n^*, Z_n) \le c\,\frac{\ln(n)}{n}, \qquad n \ge 1. \tag{10}$$

Then we prove (7) by induction, using the constant $c$ from (10):

$$\zeta_3(Y_n, Z_n) \le 3c\,\frac{\ln(n)}{n}, \qquad n \ge 1. \tag{11}$$

Assertion (11) holds for $n = 1$. With (8), (9), (10) and the induction hypothesis we obtain

$$\zeta_3(Y_n, Z_n) \le \frac{2}{n}\sum_{k=1}^{n-1}\left(\frac{k}{n}\right)^{\!3} 3c\,\frac{\ln(k)}{k} + c\,\frac{\ln(n)}{n} \le 6c\,\frac{\ln(n)}{n}\,\frac{1}{n}\sum_{k=1}^{n-1}\left(\frac{k}{n}\right)^{\!2} + c\,\frac{\ln(n)}{n} \le \frac{\ln(n)}{n}\left(6c \cdot \frac{1}{3} + c\right) = 3c\,\frac{\ln(n)}{n}.$$

The proof is completed by showing (10): Since $Y$ has a finite third absolute moment and $(\sigma(n))$ is bounded, the third absolute moments of $(Z_n), (Z_n^*)$ are uniformly bounded; thus by Lemma 2.1 there exists a constant $M > 0$ with

$$\zeta_3(Z_n^*, Z_n) \le M\,\ell_3(Z_n^*, Z_n), \qquad n \ge 1. \tag{12}$$

By the definition of $Z_n$ and the fixed-point property of $Y$ we obtain the relation

$$Z_n \stackrel{D}{=} U Z_n + (1 - U) Z'_n + \frac{\sigma(n)}{\sigma}\,C(U), \tag{13}$$

with $U$ independent of $(Z_n, Z'_n)$ and $U$ uniform $[0,1]$ distributed. We may choose $I_n = \lfloor nU \rfloor$; hence it holds that $|I_n/n - U| \le 1/n$ pointwise. Replacing $Z_n^*$, $Z_n$ by their representations (6) and (13), respectively, we have

$$\begin{aligned}
\ell_3(Z_n^*, Z_n) &\le \left\|\frac{I_n}{n}\,Z_{I_n} + \frac{n-1-I_n}{n}\,Z'_{n-1-I_n} + C_n(I_n) - \left(U Z_n + (1-U) Z'_n + \frac{\sigma(n)}{\sigma}\,C(U)\right)\right\|_3 \\
&\le \left\|\frac{I_n}{n}\,Z_{I_n} - U Z_n\right\|_3 + \left\|\frac{n-1-I_n}{n}\,Z'_{n-1-I_n} - (1-U) Z'_n\right\|_3 + \left\|C_n(I_n) - \frac{\sigma(n)}{\sigma}\,C(U)\right\|_3. 
\end{aligned} \tag{14}$$

The first and second summand are identical. We have

$$\left\|\frac{I_n}{n}\,Z_{I_n} - U Z_n\right\|_3 = \left\|\frac{I_n}{n}\,\frac{\sigma(I_n)}{\sigma}\,Y - U\,\frac{\sigma(n)}{\sigma}\,Y\right\|_3 = \frac{\|Y\|_3}{\sigma}\,\left\|\sigma(I_n)\,\frac{I_n}{n} - \sigma(n)\,U\right\|_3$$

and

$$\left\|\sigma(I_n)\,\frac{I_n}{n} - \sigma(n)\,U\right\|_3 \le \left\|\big(\sigma(I_n) - \sigma(n)\big)\,\frac{I_n}{n}\right\|_3 + \sigma(n)\,\left\|\frac{I_n}{n} - U\right\|_3. \tag{15}$$

The second summand in (15) is $O(1/n)$ since $(\sigma(n))$ is bounded and $|I_n/n - U| \le 1/n$. For the estimate of the first summand we use

$$\sigma^2(n) = \sigma^2 + R(n), \qquad R(n) = O\!\left(\frac{\ln(n)}{n}\right),$$

and obtain, for $n$ sufficiently large such that $\sigma(n) \ge \sigma/2 > 0$,

$$\left\|\big(\sigma(I_n) - \sigma(n)\big)\,\frac{I_n}{n}\right\|_3 = \left\|\frac{\sigma^2(I_n) - \sigma^2(n)}{\sigma(n) + \sigma(I_n)}\,\frac{I_n}{n}\right\|_3 \le \frac{2}{\sigma}\,\left\|\big(\sigma^2 + R(I_n) - \sigma^2 - R(n)\big)\,\frac{I_n}{n}\right\|_3 = \frac{2}{\sigma}\,O\!\left(\frac{\ln(n)}{n}\right) = O\!\left(\frac{\ln(n)}{n}\right),$$

which gives the $O(\ln(n)/n)$ bounds for the first and the second summand in (14). The third summand in (14) is estimated by

$$\left\|C_n(I_n) - \frac{\sigma(n)}{\sigma}\,C(U)\right\|_3 \le \left\|C_n(I_n) - C(U)\right\|_3 + \left|1 - \frac{\sigma(n)}{\sigma}\right| \left\|C(U)\right\|_3.$$

We have $\|C_n(I_n) - C(U)\|_3 = O(\ln(n)/n)$ since the maximum norm satisfies $\|C_n(I_n) - C(U)\|_\infty = O(\ln(n)/n)$; see, e.g., Rösler [9, Prop. 3.2]. Finally, $\|C(U)\|_3 < \infty$ since $C(U)$ is bounded, and

$$1 - \frac{\sigma^2(n)}{\sigma^2} = 2\,\frac{\ln(n)}{\sigma^2 n} + O\!\left(\frac{1}{n}\right),$$

hence also $|1 - \sigma(n)/\sigma| = O(\ln(n)/n)$. Thus we have $\ell_3(Z_n^*, Z_n) = O(\ln(n)/n)$, which by (12) implies $\zeta_3(Z_n^*, Z_n) = O(\ln(n)/n)$. □
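The key analytic input $\|C_n(I_n) - C(U)\|_\infty = O(\ln(n)/n)$ for the coupling $I_n = \lfloor nU \rfloor$ can be checked numerically; the following sketch (names are ours) evaluates $\mu(n) = \mathbb{E}\,X_n$ exactly via the recurrence obtained from (1) and estimates the sup-distance on a fine grid:

```python
import math

def mu_table(nmax):
    # mu(n) = E X_n from mu(n) = n - 1 + (2/n) * sum_{k<n} mu(k)
    mu = [0.0] * (nmax + 1)
    s = 0.0
    for n in range(1, nmax + 1):
        mu[n] = n - 1 + 2.0 * s / n
        s += mu[n]
    return mu

def C(u):
    if u <= 0.0 or u >= 1.0:
        return 1.0
    return 1.0 + 2.0 * u * math.log(u) + 2.0 * (1.0 - u) * math.log(1.0 - u)

nmax = 4096
mu = mu_table(nmax)
ratios = []
for n in (512, 1024, 2048, 4096):
    Cn = [(mu[k] + mu[n - 1 - k] - mu[n] + n - 1) / n for k in range(n)]
    # sup over u of |C_n(floor(n u)) - C(u)|, on a fine grid of u in (0,1)
    sup = 0.0
    for j in range(8 * n):
        u = (j + 0.5) / (8 * n)
        sup = max(sup, abs(Cn[int(n * u)] - C(u)))
    ratios.append(sup / (math.log(n) / n))
print(ratios)  # stays bounded as n grows, consistent with O(ln(n)/n)
```

The ratios remain of order 2 to 4 across the tested sizes, consistent with the claimed bound.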

3 Related distances

In the following we compare several further distances to $\zeta_3$ and obtain similar convergence rates for these distances. We denote the normalized version of $X_n$ by

$$\tilde{X}_n := \frac{X_n - \mathbb{E}\,X_n}{\sqrt{\operatorname{Var}(X_n)}}, \qquad n \ge 3,$$

and $X$ as in Theorem 1.1. Furthermore, let $\tilde{C}$ be a constant such that, by Theorem 1.1, $\zeta_3(\tilde{X}_n, X) \le \tilde{C}\,\ln(n)/n$ for $n \ge 3$.

3.1 Density approximation

Let $\vartheta$ be a random variable with support $[0,1]$ or $[-1/2, 1/2]$ and with a density $f_\vartheta$ being three times differentiable and

$$C_{\vartheta,3} := \sup_{x \in \mathbb{R}} \left|f_\vartheta^{(3)}(x)\right| < \infty.$$

Let, for random variables $V, W$ with densities $f_V, f_W$, the sup-metric $\ell_\infty$ of the densities be denoted by

$$\ell_\infty(V, W) := \operatorname*{ess\,sup}_{x \in \mathbb{R}} |f_V(x) - f_W(x)|.$$

Then the smoothed sup-metric

$$\mu_{\vartheta,4}(V, W) := \sup_{h \in \mathbb{R}} |h|^4\, \ell_\infty(V + h\vartheta, W + h\vartheta),$$

with $\vartheta$ independent of $V, W$, is ideal of order 3 and

$$\mu_{\vartheta,4}(V, W) \le C_{\vartheta,3}\, \zeta_3(V, W),$$

see Rachev [5, p. 269]. Therefore, from Theorem 1.1 we obtain the estimate

$$\mu_{\vartheta,4}(\tilde{X}_n, X) \le C_{\vartheta,3}\, \tilde{C}\, \frac{\ln(n)}{n}, \qquad n \ge 3.$$

This implies the following local approximation result for the densities of the smoothed random variates:

Corollary 3.1 For any sequence $(h_n)$ of positive numbers it holds that

$$\operatorname*{ess\,sup}_{x \in \mathbb{R}} \left|f_{\tilde{X}_n + h_n\vartheta}(x) - f_{X + h_n\vartheta}(x)\right| \le \frac{\tilde{C}\,C_{\vartheta,3}\,\ln(n)}{n h_n^4}, \qquad n \ge 3.$$

In particular, for $h_n = 1$ we obtain a $\ln(n)/n$ approximation bound. For a related approximation result for the density $f_X$ see Theorem 6.1 in Fill and Janson [3].

A global density approximation result holds in the following form. Assume

$$C_{\vartheta,2} := \left\|f_\vartheta^{(2)}\right\|_1 := \int_{-\infty}^{\infty} \left|f_\vartheta^{(2)}(x)\right| dx < \infty \tag{16}$$

for some random variable $\vartheta$ with twice differentiable density $f_\vartheta$ and support of length one, which is independent of $\tilde{X}_n, X$. Then it holds:

Corollary 3.2 For any sequence $(h_n)$ of positive numbers it holds that

$$\left\|f_{\tilde{X}_n + h_n\vartheta} - f_{X + h_n\vartheta}\right\|_1 \le \frac{\tilde{C}\,C_{\vartheta,2}\,\ln(n)}{n h_n^3}, \qquad n \ge 3. \tag{17}$$

Proof: Consider the smoothed total variation metric

$$\mu_{\vartheta,3}(V, W) := \sup_{h \in \mathbb{R}} |h|^3 \left\|f_{V + h\vartheta} - f_{W + h\vartheta}\right\|_1,$$

with $\vartheta$ independent of $V, W$, which is a probability metric, ideal of order 3, satisfying $\mu_{\vartheta,3}(V, W) \le C_{\vartheta,2}\, \zeta_3(V, W)$, see Rachev [5, p. 269]. Therefore, Theorem 1.1 implies the estimate (17). □

In particular, we obtain a $\ln(n)/n$ convergence rate for $h_n = 1$. Note that the left hand side of (17) is the total variation distance of the smoothed variables $\tilde{X}_n + h_n\vartheta$, $X + h_n\vartheta$.

3.2 Characteristic function distances

Denote for a random variable $V$ by $\varphi_V(t) := \mathbb{E}\exp(itV)$, $t \in \mathbb{R}$, its characteristic function and by

$$\chi(V, W) := \sup_{t \in \mathbb{R}} |\varphi_V(t) - \varphi_W(t)|$$

the uniform distance between characteristic functions. We obtain the following approximation result.

Corollary 3.3 For all $t \in \mathbb{R}$ it holds that

$$\left|\varphi_{\tilde{X}_n}(t) - \varphi_X(t)\right| \le \tilde{C}\,|t|^3\,\frac{\ln(n)}{n}, \qquad n \ge 3. \tag{18}$$

Proof: We define the weighted $\chi$-metric $\chi_3$ by

$$\chi_3(V, W) := \sup_{t \in \mathbb{R}} |t|^{-3}\,|\varphi_V(t) - \varphi_W(t)|.$$

Then $\chi_3$ is a probability metric, ideal of order 3, satisfying $\chi_3 \le \zeta_3$, see Rachev [5, p. 279]. Therefore, (18) follows from Theorem 1.1. □

3.3 Approximation of distribution functions

In this section we consider the local and global approximation of the (smoothed) distribution functions. We denote by $F_V$ the distribution function of a random variable $V$. Note that for integrable $V, W$ we have

$$\ell_1(V, W) = \|F_V - F_W\|_1,$$

with $\ell_1$ given in (3). Moreover, we use the Kolmogorov metric

$$\varrho(V, W) := \sup_{x \in \mathbb{R}} |F_V(x) - F_W(x)|.$$

Let $\vartheta$ be a random variate, independent of $\tilde{X}_n, X$, with twice continuously differentiable density $f_\vartheta$, support of length one, and $C_{\vartheta,2}$ as in (16). It is known that $X$ has a bounded density, see Fill and Janson [2]. We obtain:

Corollary 3.4 For any sequence $(h_n)$ of positive numbers it holds that

$$\left\|F_{\tilde{X}_n + h_n\vartheta} - F_{X + h_n\vartheta}\right\|_1 \le \frac{\tilde{C}\,C_{\vartheta,2}\,\ln(n)}{n h_n^2}, \qquad n \ge 3, \tag{19}$$

$$\sup_{x \in \mathbb{R}} \left|F_{\tilde{X}_n + h_n\vartheta}(x) - F_{X + h_n\vartheta}(x)\right| \le \frac{\tilde{C}\,C_{\vartheta,2}\,(1 + \|f_X\|_\infty)\,\ln(n)}{n h_n^2}, \qquad n \ge 3. \tag{20}$$

Proof: Note that between $\zeta_1 = \ell_1$ and $\zeta_3$ the relation

$$\zeta_1(V + \vartheta, W + \vartheta) \le C_{\vartheta,2}\,\zeta_3(V, W)$$

holds, see Zolotarev [10, Theorem 5], if $V, W$ have identical first and second moments. This implies that for all $h \in \mathbb{R}$

$$\ell_1(V + h\vartheta, W + h\vartheta) \le C_{h\vartheta,2}\,\zeta_3(V, W) = \frac{C_{\vartheta,2}}{h^2}\,\zeta_3(V, W). \tag{21}$$

The inequality in (21) implies that the smoothed $\ell_1$ metric

$$\ell_1^{(2)}(V, W) := \sup_{h \in \mathbb{R}} |h|^2\,\ell_1(V + h\vartheta, W + h\vartheta)$$

is bounded from above by $\ell_1^{(2)}(V, W) \le C_{\vartheta,2}\,\zeta_3(V, W)$. With Theorem 1.1 this implies (19).

For the proof of (20) first note that $\|f_{X + h_n\vartheta}\|_\infty \le \|f_X\|_\infty < \infty$. With the stop-loss metric

$$d_1(V, W) := \sup_{t \in \mathbb{R}} \left|\mathbb{E}(V - t)^+ - \mathbb{E}(W - t)^+\right|$$

we obtain from Rachev and Rüschendorf [6, (2.30), (2.26)] and Rachev [5, p. 325]

$$\varrho(\tilde{X}_n + h_n\vartheta, X + h_n\vartheta) \le (1 + \|f_X\|_\infty)\,d_1(\tilde{X}_n + h_n\vartheta, X + h_n\vartheta) \le C_{h_n\vartheta,2}\,(1 + \|f_X\|_\infty)\,\zeta_3(\tilde{X}_n, X) = \frac{C_{\vartheta,2}}{h_n^2}\,(1 + \|f_X\|_\infty)\,\zeta_3(\tilde{X}_n, X),$$

which implies the assertion. □

Concluding remark

Our results indicate that $\ln(n)/n$ is the relevant rate for the convergence $Y_n \to Y$ for several natural distances. We have, however, no argument to decide the order of the rate of convergence in the Kolmogorov metric $\varrho(Y_n, Y)$ (without smoothing), as considered in Fill and Janson [3].
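As a purely empirical illustration (our sketch, not a result of the paper), one can compare samples of $Y_n$, drawn by running the recursion (1) directly, with approximate samples of $Y$ from the fixed-point iteration of (2), via the two-sample Kolmogorov statistic; the observed distances are small, although such an experiment of course cannot settle the order of $\varrho(Y_n, Y)$:

```python
import math, random

random.seed(11)

def comparisons(n):
    # total comparisons of Quicksort on a random permutation of size n,
    # sampled via recursion (1) with an explicit stack
    total, stack = 0, [n]
    while stack:
        m = stack.pop()
        if m < 2:
            continue
        total += m - 1
        i = random.randrange(m)  # rank of the pivot
        stack.append(i)
        stack.append(m - 1 - i)
    return total

def C(u):
    if u <= 0.0 or u >= 1.0:
        return 1.0
    return 1.0 + 2.0 * u * math.log(u) + 2.0 * (1.0 - u) * math.log(1.0 - u)

# exact E X_n via the recurrence obtained from (1)
n = 300
mu, s = [0.0] * (n + 1), 0.0
for j in range(1, n + 1):
    mu[j] = j - 1 + 2.0 * s / j
    s += mu[j]

m = 4000
yn = sorted((comparisons(n) - mu[n]) / n for _ in range(m))

# approximate samples of Y by iterating the fixed-point map (2)
pop = [0.0] * m
for _ in range(20):
    pop = [
        (u := random.random()) * random.choice(pop)
        + (1.0 - u) * random.choice(pop)
        + C(u)
        for _ in range(m)
    ]
y = sorted(pop)

# two-sample Kolmogorov statistic between the two empirical distributions
d, i, j = 0.0, 0, 0
while i < m and j < m:
    if yn[i] <= y[j]:
        i += 1
    else:
        j += 1
    d = max(d, abs(i - j) / m)
print(d)  # small: the two samples are close in distribution
```

For these sample sizes the statistic is dominated by Monte Carlo noise of order $m^{-1/2}$, which is why the experiment is only illustrative.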

References

[1] Cramer, M. and L. Rüschendorf (1995). Analysis of recursive algorithms by the contraction method. Athens Conference on Applied Probability and Time Series Analysis, Vol. I (1995), 18-33. Springer, New York.

[2] Fill, J. A. and S. Janson (2000). Smoothness and decay properties of the limiting Quicksort density function. Mathematics and Computer Science (Versailles, 2000), 53-64. Birkhäuser, Basel.

[3] Fill, J. A. and S. Janson (2001). Quicksort asymptotics. Technical Report #597, Department of Mathematical Sciences, The Johns Hopkins University. Available at http://www.mts.jhu.edu/fill/papers/quick asy.ps

[4] Hennequin, P. (1991). Analyse en moyenne d'algorithmes, tri rapide et arbres de recherche. Ph.D. Thesis, École Polytechnique, 1991. Available at http://pauillac.inria.fr/algo/AofA/Research/src/Hennequin.These.ps

[5] Rachev, S. T. (1991). Probability Metrics and the Stability of Stochastic Models. John Wiley & Sons Ltd., Chichester.

[6] Rachev, S. T. and L. Rüschendorf (1990). Approximation of sums by compound Poisson distributions with respect to stop-loss distances. Adv. in Appl. Probab. 22, 350-374.

[7] Rachev, S. T. and L. Rüschendorf (1995). Probability metrics and recursive algorithms. Adv. in Appl. Probab. 27, 770-799.

[8] Régnier, M. (1989). A limiting distribution for quicksort. RAIRO Inform. Théor. Appl. 23, 335-343.

[9] Rösler, U. (1991). A limit theorem for "Quicksort". RAIRO Inform. Théor. Appl. 25, 85-100.

[10] Zolotarev, V. M. (1977). Ideal metrics in the problem of approximating distributions of sums of independent random variables. Theory Probab. Appl. 22, 433-449.
