The Stochastic Structure of Images

Jan-Mark Geusebroek

ISLA, Informatics Institute, Faculty of Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands; [email protected]

Abstract. As resolving power increases, image features evolve in an iterative fashion: large-scale features persist, while finer and finer scale features are resolved at each increase in resolution. As such, the observation process is shown to dominate natural image statistics. Observation by a linear imaging process forces natural image statistics to be of a random multiplicative, rather than additive, nature. The scaling behavior of the random process is driven by the gradient structure in the scene irradiance. From the general structure of multiplicative processes, image statistics are proven to follow a sequential fragmentation process. From these theoretical results, analytical forms for the distributions of image derivative filter responses and of the gradient magnitude are derived.
1 Introduction
The observation of scenes is dominated by coincidence. Coincidence stems from the fact that we perceive a two-dimensional projection of the physical world, the projection being affected by occlusion, reflection, and clutter. There is coincidence due to the viewpoint causing an accidental background, and coincidence due to the lighting conditions and accidental reflection characteristics. Even if the viewpoint, view, light, and background are arbitrarily chosen, this does not imply that the resulting image has completely arbitrary statistics. Objects having uniform visual characteristics impose structure on the scene. Shadow and shading effects, although accidental, are spatially correlated. Composition of the scene, by human and nature alike, brings structure to the scene. All these effects cause images to be governed by the laws of correlated spatial disorder.

Axiom 1. The spatial structure of the irradiance from natural scenes is dominated by correlated disorder.

From our experiments, we have reason to believe that a large fraction of recorded images is covered by the axiom [1]. Hence, natural scenes are largely non-regular. Repetitions of structures are considered a different phenomenon.

The observation of the accidental scene by camera sensors boils down to the integration of energy over a certain area, spectral bandwidth, and time. Responses may be grouped by linear filtering, allowing the extraction of structural features from the observed light field. In human perception, an equivalent linear system is present, effectuated by the simple receptive fields in the primary visual cortex.
Axiom 2. The world is observed by an instrument with some (stochastic, regular) structure; for the observation of light, the instrument performs linear integration over space, wavelength, and time.

Previous research provides insight into the statistical properties of observed images, either from empirical studies [1–8] or from theoretical modelling [9–13]. However, the effects imposed by the observation instrument on the statistics of the irradiance from natural scenes are not evident. Observation implies diffusion over microscopic fluctuations in the projected irradiance to obtain the final integrated response of a pixel on a camera. For human vision, diffusion extends spatially over the multi-scale receptive fields on the retina. In both cases, the enormous diffusion span is likely to have a significant effect on the observed statistics. Diffusion of the numerous small structures will result in fewer large structures [14]. Inversely, increasing magnification at large structures will reveal many smaller structures. As resolving power increases, image features evolve in an iterative fashion: large-scale features persist, while finer and finer scale features are resolved. Recently, such hierarchical scaling processes in the presence of correlated spatial disorder have been shown to be of random multiplicative origin [15–17]. Consequently, as I will demonstrate, the probability density of splitting into a given number of fragments of given contrast and size follows the laws of sequential fragmentation [18].
2 Stochastic Properties of Linear Diffusion
Consider the observation of light to be governed by linear response theory. Hence, observation boils down to linear diffusion of the incoming energy distribution, characterized by the diffusion equation [14]

∂E(x, t)/∂t = D ∇²E(x, t) ,   (1)

where D is the diffusion coefficient, ∇ the spatial gradient operator, and t the scale of observation. The diffusion equation proportionally relates a decrease in resolution t to the spatial Laplacian of the energy density E(x, t). Diffusion may be considered as averaging the initial intensity distribution E(x, 0) to its equilibrium state. That is, the diffusion process of Eq. (1) smooths the energy distribution until it reaches its average value ⟨E⟩.

Natural scenes may be characterized by the probability density describing the random nature of the energy fluctuations, and by the spatial correlation function describing how a localized fluctuation influences the local, regional, or total energy density. Consider the autocorrelation between a given arbitrary origin in scale and place and any other point at (x, t), t being scale,

C(x, t) = ⟨Ẽ(x, t) Ẽ(0, 0)⟩ .   (2)

Here, ⟨·⟩ represents the average operator, and Ẽ(x, t) = E(x, t) − ⟨E⟩ is the deviation of E(x, t) from its average ⟨E⟩.
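To make Eq. (1) concrete, the following is a minimal numerical sketch (not from the paper; it assumes NumPy and SciPy, and all variable names are illustrative) of the fact that diffusing for a time t amounts to Gaussian smoothing with standard deviation √(2Dt), and that diffusion drives the fluctuations Ẽ toward zero:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Random "scene irradiance" with correlated disorder: white noise given a
# short correlation length by slight smoothing.
rng = np.random.default_rng(0)
E0 = gaussian_filter(rng.normal(size=(256, 256)), sigma=1.5)

# The solution of dE/dt = D * laplacian(E) at time t is convolution of
# E(x, 0) with a Gaussian of standard deviation sqrt(2 * D * t).
D = 1.0
for t in (2.0, 8.0, 32.0):
    Et = gaussian_filter(E0, sigma=np.sqrt(2 * D * t))
    # Diffusion averages E toward <E>: the fluctuation E~ = E - <E> decays.
    print(f"t = {t:4.0f}   var(E~) = {np.var(Et - Et.mean()):.6f}")
```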
Lemma 1. For a linear system, the autocorrelation function follows the same diffusion process as the energy density,

∂C(x, t)/∂t = D ∇²C(x, t) .   (3)

Proof. The lemma follows from linearity.

Theorem 1. Linear diffusion of E(x, 0) causes the autocorrelation function C(x, t) of Ẽ(x, t) to diffuse proportionally to the autocorrelation of the spatial gradient ∇E,

∂C(x, t)/∂t = D ⟨∇E(x, t) ∇E(0, 0)⟩ .

Proof. Consider the definition of the autocorrelation function of a stochastic process f(x, t),

C(x_1, x_2; t_1, t_2) = ⟨f(x_1, t_1) f(x_2, t_2)⟩ .

Due to the linearity of the derivative operator, differentiation with respect to x_1 and x_2, respectively, yields

∂²C(x_1, x_2; t_1, t_2)/∂x_1∂x_2 = ⟨ ∂f(x_1, t_1)/∂x_1 · ∂f(x_2, t_2)/∂x_2 ⟩ .   (4)

When f(x, t) is a wide-sense stationary stochastic process, that is, its average is constant and its autocorrelation depends only on x = x_1 − x_2 and t = t_1 − t_2 [19, pp. 402–403], the autocorrelation is independent of the choice of origin. Fixing the origin for (x, t) at (0, 0), we get

∇²C(x, t) = ⟨∇E(x, t) ∇E(0, 0)⟩ .   (5)

Hence, the second-order spatial derivative of the correlation function equals the autocorrelation of the spatial gradient of E(x, t). The theorem then follows directly from substitution of Eq. (5) into Lemma 1.
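The key step, Eq. (4) and its stationary reduction Eq. (5), can be checked numerically in one dimension. The sketch below (assuming NumPy; all names are illustrative) uses spectral derivatives on a periodic signal; note that expressing the mixed derivative in the lag x = x_1 − x_2 produces a sign relative to C''(x), which the proportionality in Theorem 1 absorbs:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4096
# Stationary random signal: white noise smoothed in the Fourier domain.
w = 2 * np.pi * np.fft.fftfreq(n)              # angular frequencies
F = np.fft.fft(rng.normal(size=n)) * np.exp(-8.0 * w**2)
f = np.fft.ifft(F).real

def autocorr(g):
    G = np.fft.fft(g - g.mean())
    return np.fft.ifft(G * np.conj(G)).real / len(g)

fprime = np.fft.ifft(1j * w * np.fft.fft(f)).real   # spectral derivative
C = autocorr(f)                                     # C(x) = <f~(y+x) f~(y)>
Cg = autocorr(fprime)                               # <f'(y+x) f'(y)>
Cxx = np.fft.ifft(-(w**2) * np.fft.fft(C)).real     # C''(x), spectrally

# With x = x1 - x2, the mixed derivative of Eq. (4) equals -C''(x):
print(np.allclose(Cg, -Cxx, atol=1e-10 * np.abs(Cg).max()))
```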
3 The Multiplicative Nature of Linear Observation
The observation of an image is obtained by solving Eq. (1), which boils down to convolution with a Gaussian kernel G(x, t), where t denotes the resolution of observation. Such a scale-space kernel satisfies a decomposition law.

Lemma 2. A Gaussian scale-space kernel can be decomposed into an arbitrary number of steps n,

G(x, t) = G(x, t_1) ⊗ G(x, t_2) ⊗ … ⊗ G(x, t_n) ,   Σ_i t_i² = t² .

A numerical check of the lemma is sketched below.
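The check (a sketch assuming SciPy; here the scale parameter of G is taken as the standard deviation of the kernel, so the squared scales add):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
img = rng.normal(size=(128, 128))

t1, t2 = 1.2, 2.3
t = np.sqrt(t1**2 + t2**2)      # decomposition law: t1^2 + t2^2 = t^2

once = gaussian_filter(img, sigma=t, mode="wrap", truncate=8.0)
cascade = gaussian_filter(
    gaussian_filter(img, sigma=t1, mode="wrap", truncate=8.0),
    sigma=t2, mode="wrap", truncate=8.0)

# The single coarse observation and the two-step cascade agree.
print(np.abs(once - cascade).max())   # tiny (limited by kernel truncation)
```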
The proof is trivial and omitted for brevity. The Gaussian kernel is a special case satisfying the decomposition law of Lemma 2. As any linear kernel may be decomposed into (possibly varying) small kernels, the decomposition law applies far more widely than to Gaussian kernels alone. Hence, the theory provided in this section extends to arbitrary linear observation kernels. The decomposition law defines a kind of cascade with an arbitrary number of steps n [16], where each kernel G(x, t_i) transports energy within a resolution interval t_i.

Corollary 1. One cannot distinguish between the observation E(x, t) obtained at a single coarse resolution t, and the same observation derived from n arbitrary finer resolution steps in a multi-scale approach yielding the same effective resolution t.

As an important consequence, any property derived from the single coarse resolution image can also be derived from the multi-scale cascade. This nontrivial notion will be used extensively in the remainder of this paper. The stochastic properties of the coarse resolution observation may be initiated at finer resolutions in the cascade.

At this point we need a result from statistical mechanics, specifically the theory of correlated random fluctuations in diffusion processes [15]. Consider a random fluctuation at a fine scale t = t_0. According to [20], the fluctuation will propagate through a multiplicative cascade, as illustrated in Fig. 1. Hence, an increment in intensity ∇E at a coarse scale t_c results from a random cascade, initiated by a correlated fluctuation at fine scale t_0 [17].

Theorem 2. For a linear diffusion process, correlated fluctuations at coarse resolution t_c initiated at a fine resolution t_0 are propagated by a random multiplicative cascade

∇E_{t_c} = ∇E_{t_0} ∏_{i=1}^{n} α_i ,

where the α_i are taken from the coefficients in the multi-scale Gaussian smoothing kernels G(x, t_i).
Proof. Consider an intensity fluctuation δ_1 at a fine resolution t_0, see Fig. 1. Coarsening resolution by convolving with a Gaussian kernel can be considered a stepwise process, as indicated by the decomposition law of Lemma 2. Each decrease in resolution will cause the fluctuation δ_1 to be transported to a coarser resolution t_{i+1}, proportional to a convolution weight a_i. By energy conservation and the positivity of the Gaussian kernel, the weights satisfy 0 < a_i < 1. Hence, at the observation resolution t_c the initial fluctuation yields an increment in intensity proportional to δ_1 ∏_i a_i.

Consider a second fluctuation δ_2 at, for simplicity, the same fine resolution t_0, positioned relative to δ_1 such that both are captured within the effective extent of the Gaussian kernel at coarse resolution t_c. Consider the fluctuation caused by δ_2 to propagate through a different branch of the resolution cascade, yielding
[Figure 1: a resolution cascade drawn as a tree with edge weights a_0 … a_6, connecting the coarse scale t_c to the fine scale t_0, with fluctuations δ_1 and δ_2 at the leaves.]

Fig. 1. The cascade imposed by scale-space convolution. A fluctuation δ_1 at fine scale t_0 will result in an increment in intensity at the coarser observation scale t_c. The fluctuation propagates through the multiplicative process indicated by the dashed edges in the tree, resulting in a final fluctuation ∇E_{t_c} = α_0 α_1 α_4 δ_1. If the fluctuation is random in intensity, position, or resolution level, the resulting process will be a random multiplicative process. Due to distributivity, any correlated fluctuation δ_2 will yield a multiplicative process.
a final fluctuation δ_2 ∏_i b_i, which is to be combined with the fluctuation caused by δ_1 at the top-level resolution t_c. The combined response is given by

∇E = δ_1 ∏_{i=1}^{n} a_i + δ_2 ∏_{i=1}^{n} b_i .   (6)

The fluctuations are correlated when δ_2 = c δ_1 = c ∇E_{t_0}, and the coarse level contrast is given by

∇E_{t_c} = ∇E_{t_0} ( ∏_{i=1}^{n} a_i + c ∏_{i=1}^{n} b_i ) .   (7)

The convolution coefficients a_i and b_i are given by the Gaussian kernel, hence they are correlated, b_i = c_i a_i. Furthermore, if the correlated fluctuations combine at a resolution t′, t_0 < t′ < t_c, additional multiplicative coefficients propagate the combined fluctuation to the observation resolution t_c. Consequently, we may write

∇E_{t_c} = ∇E_{t_0} ∏_{i=1}^{n} α_i ,   (8)

where α_i combines a_i and b_i, and n includes any extra resolution steps necessary. To complete the proof, consider that the initial positions of the fluctuations δ_1 and δ_2 are random, i.e. they are not aligned with the Gaussian smoothing kernel. Furthermore, the number of steps n in which the Gaussian kernel is decomposed
is arbitrary, as is the level at which fluctuations occur, propagate, and combine. The exact cascade coefficients α_i are of random origin, constrained by the decomposition law for the Gaussian kernel. Hence, diffusion of correlated fluctuations is described by a random multiplicative process, where the random coefficients α_i are distributed according to the law of G(x, t_i).

Consequently, diffusion of spatially disordered correlated structures is governed by the laws of random multiplicative processes [15, 21]. The exact details of the initial distribution of the scene statistics, given by Axiom 1, are not important for obtaining a power-law for the observed statistics. Hence, the properties derived from a multi-scale cascade hold for the direct observation of the statistics at a single resolution scale t.

Corollary 2. One cannot distinguish between the statistics of the observation E(x, t) of a natural image obtained at a single coarse resolution t, and the same statistics derived from n arbitrary finer resolution steps in a multi-scale approach yielding the same effective resolution t.

As an important consequence, the statistical structure of natural images derived from the multi-scale cascade is equivalent to the statistical structure of a direct observation at a single, fixed resolution t.
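Theorem 2 and Corollary 2 invite a simple Monte Carlo illustration. The sketch below (assuming NumPy; the uniform weight distribution is an arbitrary stand-in for the kernel-derived law of the α_i) propagates fine-scale fluctuations through a random multiplicative cascade and shows that the coarse-scale increments become strongly heavy-tailed compared to the Gaussian input:

```python
import numpy as np

rng = np.random.default_rng(2)
n_fluct, n_steps = 100_000, 12

# Each fine-scale fluctuation is carried to the coarse scale by a product
# of random cascade weights alpha_i with 0 < alpha_i < 1.
alpha = rng.uniform(0.05, 1.0, size=(n_fluct, n_steps))
dE0 = rng.normal(size=n_fluct)        # fluctuations at fine scale t0
dEc = dE0 * alpha.prod(axis=1)        # increments at coarse scale tc, Eq. (8)

# In the log domain the cascade is an additive random walk with drift
# <log alpha> < 0; in the linear domain the result is heavy-tailed.
kurt_in = ((dE0 - dE0.mean()) ** 4).mean() / np.var(dE0) ** 2
kurt_out = ((dEc - dEc.mean()) ** 4).mean() / np.var(dEc) ** 2
print(f"excess kurtosis: input {kurt_in - 3:.2f}, cascade {kurt_out - 3:.2f}")
print(f"P(|dEc| > 5 sd): {np.mean(np.abs(dEc) > 5 * dEc.std()):.4f}")
```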
4 The Sequential Fragmentation Laws of Image Statistics
An image may be considered to be composed of several correlated components, the individual components being mutually uncorrelated. The stochastic properties of the correlated components follow from the theory of random multiplicative processes [22]. As proven by Levy and Solomon [23], the boundaries 0 < α_i < 1 force the constrained, converging multiplicative process to a power-law distribution in the resulting variable |∇E(x, t)|. The spatial composition of many of these correlated components is governed by additive statistical laws. As such, natural images follow the laws of fragmentation processes, as will be derived in this section. As a starting point, consider the observation of a single correlated component.

Theorem 3. Diffusion of a correlated structure yields a gradient magnitude |∇E| that follows a decaying, yet inverse, power-law probability density function p_c,

p_c(|∇E(x, t)|) = ( |∇E(x, t)| / β )^{γ−1} .
Proof. Following [24], we write l_i = log α_i and y_0 = log |∇E_{t_0}|. Then we may rewrite Eq. (8) as a recurrence relation,

y_{i+1} = y_i + l_i .   (9)

This is a log transform of the multiplicative cascade of Theorem 2. In the transformed domain, the process describes a random walk with drift ⟨l⟩ < 0. The
strict lower boundary α_i > 0 ensures convergence of the process rather than escape to −∞. The random walk is described by the master equation [23]

P(y, i+1) = ∫_{−∞}^{∞} π(l) P(y − l, i) dl ,   (10)

where π(l) denotes the transformed distribution of the original probability density Π of the α_i. The solution of the master equation is obtained in [25] using renewal theory, and is given by [24]

P_max(y_max) ≈ e^{−µ y_max} ,   (11)

with µ implicitly given by

∫_0^∞ Π(α) α^µ dα = 1 .   (12)

The probability is controlled by the extreme values of the random field, as argued in [24] and elaborated upon in [22]. Substituting the original variables for the transformed y and l yields a power-law,

p_c(|∇E|) = c |∇E|^{−µ} ,   (13)

c representing a scaling constant. Details of the derivation and alternative proofs are given by Sornette and Cont [24]. The theorem follows from taking µ = 1 − γ and c = 1/β^{γ−1}, with β indicating the width of the distribution.

Note that the exact probability distribution π from which the α_i are drawn is not important for arriving at a power-law distribution. Further note that no assumptions about scale invariance or self-similarity are made. Hence, the derived power-law is not the result of fractal behavior of the intensity distribution of natural images. As shown by Levy and Solomon [23], power-law behavior is as natural for multiplicative processes as Boltzmann laws are for additive random processes.

An image may be considered to be composed of several correlated structures. Components further apart than the correlation length are assumed to be uncorrelated. Consequently, statistical properties involve integration over several power-laws, as many as there are correlated patches in the image. Some of these patches may be associated with one specific object, others with another texture or object. This viewpoint is effectively similar to the theory of sequential fragmentation [18]. The probability of encountering a gradient response p(|∇E|) between r and r + dr, given the probability distribution p_c(r) of Theorem 3 for a single contrasting structure, is given by

p(r) = c_1 ∫_r^∞ p(r′) p_c(r) dr′ ,   (14)

that is, the accumulation of the contributions of all contrasts |∇E| > r.
Solving the sequential fragmentation equation Eq. (14) for a power-law distribution yields

p(r) = c_1 (r/β)^{γ−1} ∫_r^∞ p(r′) dr′ ,   (15)

which is solved by

p(r) = c_1 (r/β)^{γ−1} e^{−(1/γ)(r/β)^γ} ,   (16)

where c_1 normalizes p(r) to yield a Weibull probability density function,

c_1 = 1 / ∫_0^∞ (r/β)^{γ−1} e^{−(1/γ)(r/β)^γ} dr = 1/β .   (17)
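The fixed-point property can be verified numerically: substituting the Weibull form (16) back into the right-hand side of Eq. (15) reproduces p(r). A minimal sketch (assuming NumPy and SciPy; parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

beta, gam = 0.1, 0.7            # arbitrary width and shape parameters

def p(r):
    # Candidate solution, Eq. (16), with c1 = 1/beta from Eq. (17).
    return (1 / beta) * (r / beta) ** (gam - 1) \
           * np.exp(-(1 / gam) * (r / beta) ** gam)

# Right-hand side of Eq. (15): c1 * (r/beta)^(gam-1) * tail integral of p.
for r in (0.02, 0.1, 0.5):
    tail, _ = quad(p, r, np.inf)
    rhs = (1 / beta) * (r / beta) ** (gam - 1) * tail
    print(f"r = {r}:  p(r) = {p(r):.6f}   rhs = {rhs:.6f}")
```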
The value of the shape parameter γ ranges from 0 to 2 for the distribution to be positive semi-definite [26] (see Fig. 2).

Corollary 3. The gradient magnitude |∇E| in natural images follows a Weibull probability density,

p(r) = (1/β) (r/β)^{γ−1} e^{−(1/γ)(r/β)^γ} .
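In practice, Corollary 3 can be tested by fitting the gradient magnitude histogram of an image. The sketch below is illustrative rather than the paper's procedure (it assumes SciPy; the synthetic random field is a stand-in, to be replaced by a real luminance image; note that SciPy's weibull_min uses the standard parameterization, in which the paper's 1/γ factor in the exponent is absorbed into the width β):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel
from scipy.stats import weibull_min

# Stand-in for a natural image: a correlated random field (cf. Axiom 1).
rng = np.random.default_rng(3)
img = gaussian_filter(rng.normal(size=(256, 256)), sigma=3.0)

# Gradient magnitude |grad E| from Sobel derivatives.
grad = np.hypot(sobel(img, axis=0), sobel(img, axis=1)).ravel()
grad = grad[grad > 0]

# Fit a two-parameter Weibull (location fixed at zero).
gam_hat, _, beta_hat = weibull_min.fit(grad, floc=0)
print(f"estimated shape gamma = {gam_hat:.2f}, width beta = {beta_hat:.4f}")
```

For this Gaussian stand-in the fitted shape comes out near γ ≈ 2 (the gradient magnitude of a Gaussian field is Rayleigh, the upper end of the admissible range); natural images typically yield smaller γ.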