Gabor Filter Analysis for Texture Segmentation - CiteSeerX

Report 1 Downloads 188 Views
Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Gabor Filter Analysis for Texture Segmentation Michael Lindenbaum, Roman Sandler Abstract Gabor features are a common choice for texture analysis. There are several popular sets of Gabor filters. These sets are usually designed based on representation considerations. We propose here an alternative criterion for designing the filters set. We consider a set of filters and their responses to a pairs of harmonic signals. Two signals are considered separable if the corresponding two sets of responses are disjoint in at least one of the responses. We look for the set of Gabor filters maximizing the fraction of separable harmonic signals. The proposed semi-analytical algorithm calculates filters parameters for the optimal set, given the desired number of filters and the frequency range of possible signals. The resulting filters are significantly different from those traditionally used. We tested the proposed filters both in texture segmentation and texture recognition aspects with commonly used discrimination algorithms for each of the tasks. We show that, as expected, the resulting filters perform better than the traditional ones in discriminating synthetic and real textures. An important side effect of using the proposed filters with the popular features distribution based methods, considering a feature vector composed of the filters’ responses, is the possibility to use a more compact (a lower number of feature vector prototypes) representation of the texture classes than using the common filters.

1

Introduction

Texture is a basic visual cue, helping the human visual system in segmentation and recognition tasks. Its usage in computer vision has been a very active topic in the past three decades. Texture is defined in dictionary as ”the characteristic appearance of a surface having a tactile quality”. However, in computer vision, there is no universally accepted definition for texture [4]. The following images (Figures 1, 2) are some examples to what is referred as texture and textured objects. A texture is not specified by the intensity (or color) in a single point. Therefore, texture descriptors are always based on some neighborhood. It is common to characterize the texture using a vector of scalar descriptors, which may be simply gray levels in some neighborhood [29] or responses of different filters [12, 19, 16, 24, 10]. One of the important applications of texture analysis is image segmentation: the given image is divided into parts with contain homogeneous texture. Texture segmentation may 1

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Figure 1: Some example of textures

Figure 2: Some example of textured objects be done in several ways. A common approach is to associate image units (pixels or larger regions, referred to as patches) with feature vectors describing texture in these locations, and consider the distances between the vectors as ”distances” between the textures. Then the algorithm search for a segmentation, for which the feature vector based distance between points in the same segment is small and the feature based distance between points in different regions is large. Such partition may be found, for example, by graph algorithms [19]. Consider for example figure 3. A illustration of the desirable distances between feature vectors: The neighborhood of point P1 belongs to T exture1 , while those of P2 and P3 belong to T exture2 . Let d(Pi, Pj ) be a distance between associated feature vectors at Pi and Pj . Then ideally, d(P2 , P3 ) < d(P1 , P2 ) and d(P2 , P3 ) < d(P1 , P3 ). The validity of this relation depends on the type of metric, the feature vector characterizing the texture patch etc. Some features and comparison methods are described in section 2. This work considers the influence of the feature vector (and the associated distance measure) on texture discrimination. We restrict ourselves to Gabor features (a common choice for texture features, (see section 2.2) and propose a method for designing a Gabor filter set to maximize texture discrimination. The design is made over a simple set of synthetic textures, for which the discrimination is optimized by analytic and numerical methods. Then, we show that the derived filters work well for realistic textures. Unlike the common modern approach, we characterize a texture by a vector whose components are the responses of (Gabor) filters and not histograms of local texton cluster identities over a region. This way, the decision whether two points belong to the same texture may be made without additional clustering stage, which results is a faster answer to such queries. 2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

50

100

150

200

250

Texture 1 50

Texture 2 100

150

200

250

300

350

400

Figure 3: Texture segmentation problem example. We also show that using the proposed feature in the context of a clustering based approach is advantageous as well. Our approach is different from previous ones also by the emphasis we put on translation invariance. To be a good texture characterization, the descriptor should not take significantly different values in different locations of the same texture. Common methods do not pay much attention to this demand and use filters which are optimized for texture representation with minimal distortion (measured, say, by MSE error). This causes completely different feature values in different location on the same texture. Although the texture can be correctly restored from this features, several representative features should be taken to describe each texture patch. Here, on the other hand, we directly optimize the filter to achieve maximal discrimination using a single texture feature vector, which, not surprisingly indeed results in superior discrimination. The paper is organized as follows: Section 2 reviews some commonly used texture features, and then focuses on Gabor functions as a method for texture features extraction. It described some examples of their usage, explains the motivation for the common design of Gabor filter banks, suggests that these filter banks are not optimal for texture discrimination, and propose an alternative design approach. Section 4 describes the framework and suggests an intuitive (semi-analytic) derivation of the proposed filters in a simple setting. Section 5 describes numerical procedures for optimizing one dimensional filter bank(s), describes the resulting filters, and compare them to the traditionally used filter sets. Section 6 extends the derived filter sets into 2D and suggests a related adaptive distance measure. Section 7 describe experiments, done on both synthetic and real textures, showing the advantage of the proposed texture features on the traditional ones. The work is concluded by drawing some relations on the obtained filter sets and suggests future directions of the research continuation (section 8).

3

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

2

Background: Texture discrimination methods

Texture discrimination is needed for two main applications: classification and segmentation. The aims of these applications are different but they often use the same texture characterization methods (See 2.1.5 for a short description of feature application methods).

2.1

Texture features

A texture feature associated with some texture patch, is a local descriptor describing some texture-relevant characteristic of this patch. The patch is thus represented by a vector of such descriptors. Many texture features were proposed. A simplistic (but in some context, also effective) method to use the actual grey-levels of pixel’s neighborhood as texture features. Some other approaches, described below, project the image onto another representation space. We shall discriminate between patch descriptors, soft neighborhood descriptors and indirect descriptors 2.1.1

Texture Patch Descriptors

Some of methods are traditionally classified as model based methods, because the pixel’s neighborhood is considered to be a result of some known parametric process. Other methods do not use such models and are called non-model based methods. Some of the popular Model Based Features are 1. Fractal Models [23, 1]. Pentland [23] proved that the fractal dimensions [1] of any object and its image are equal. From this proof he derived that images may be segmented into homogenous parts using a segmentation of their fractal dimension histograms. This approach actually claimed that textured areas may be distinguished by a single parameter. When opposite examples were provided, additional fractal parameters such as Lacunarity [1] and directional fractal measurement were proposed. 2. Autoregressive Models [21]. Autoregressive (AR) models consider the pixel’s gray level value as a result of some random process. This way each pixels’ value can be represented as a weighted sum of surrounding pixels. Region classification is done by comparing the models built for them. Rotation invariant versions were proposed as well. AR is a particular case of more general MRF approach. 3. Markov Random Fields [3, 29]. A texture is considered to be a realization of a stochastic, two dimensional Markov Random field. In a MRF model, the gray level in a pixel is a random variable randomized depending on its neighbors. Therefore, some typical neighborhoods characterizing the texture appear more frequently and may represent the texture. Alternatively, similar to AR models, the parameters specifying the dependency may be used as a representation. Some of the Non-model-based Features are

4

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

1. Grey-level co-occurrence [13]. These methods actually calculate two dimensional histograms of the occurrence of the pairs of grey-levels for a given displacement vector (distance and direction). That is, the resulting matrix contains the number of the occurrences for each possible pair of gray-levels separated by distance d in direction θ for a given texture patch. 2. Frequency domain methods Commonly used features consist of sums of coefficients within wedges, rings or sectors of two dimensional power spectra. Additional features include the characteristics of ”spectral peaks”. 2.1.2

Soft neighborhood descriptors

One often needs to characterize every pixel in the image with respect to the texture it lies on. One straightforward method would be to define a patch around every such pixel and to characterize the pixel using the patch texture features described above. A better representation would be one which puts more weight on the image properties closer to the pixel (relative to other locations in the patch). Features based on Gabor filters (see section 2.2), wavelet and wavelet packet decomposition [2] and other characterizations based on mixed space-frequency descriptors are especially suitable for such tasks. One example of the later is Laws’ texture energy filters [15] (which are sets of seven bandpass and high-pass directional filters, implemented as 5 × 5 masks). A comparative study which consider most of the common texture features and suggests new sets of filters for specific texture classes is described [27]. 2.1.3

Indirect, distribution based, descriptors

It often happens that textures contain several typical sub-patterns, which appear more frequently than others. These are commonly named texons. Several texture characterization methods rely on the distribution of textons: One popular method is to cluster the (vector) pixel (soft neighborhood) or patch descriptors vectors into several (K) groups (or clusters). This way, every pixel is assigned to some cluster and is given an index. The distribution of such indexes in the neighborhood of the pixel is used as a texture characterization of this pixel. This method turned out to be very successful but note that it is not local: the characterization depends, in principle, on all image gray levels. The feature vectors that are clustered may be either Gabor filter responses [19] or even the gray levels in a small neighborhood themselves [29]. Mean shift clustering is a more modern method of clustering which may be used for indirect texture features generation as well. Here, the number of groups is found in an iterative convergence process. The distribution of the groups in neighborhood of the pixel is used for textures comparison [10]. 2.1.4

Comparing texture descriptors

The feature vector descriptions may be compared by several methods. One way would be to use some metric distance d (satisfying symmetry, minimality, triangular inequality and self-similarity). However there are strong evidences that metric distance measurement is 5

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

not fully consistent with human perception system. For example, there is no proof that a property of triangular inequality exists for human perception [14]. Instead of a metric distance it is possible to compare texture features using some dissimilarity measure, which does not require the Triangle inequality. Practically, many measures are used. Some common methods are Euclidean distance Ln , Mahalanobis distances, correlation, χ2 and Earth Movers distance (EMD) [24]. Some papers suggest building rulebased systems or neural network classifiers [11]. 2.1.5

Application of the features

As mentioned earlier, there are two main tasks for texture discrimination systems: recognition and segmentation. A texture recognition task is divided into two stages: training and recognition. In training many patches of different textures are given together with their classifications. A representation of every texture is constructed. A modern example is the distribution of feature vectors. Then, in the recognition stage, a query patch of some texture is given and its classification is required. This may be carried out by finding the distribution of features in the given patch and comparing it to every one of the distributions obtained in training. In this task, much of the work is done in the feature extraction and in the distribution estimation. Texture recognition results are usually characterized by a relatively high performance, because a lot of features, for relatively large image areas are considered for a single decision on texture kind [29]. Texture based segmentation usually starts by extracting a large matrix of similarities associated with every pair of pixels (or some rougher image quantization). For an N × N image, this matrix is of size N 2 × N 2 . Then, the segmentation method based on this matrix may be of different complexities. A simple method could, for example, build a graph where every nodes corresponds to a unique pixel. Then, graph edges are added only where the similarity between the corresponding feature vectors is high. Then, connected components algorithm partitions the graph (and the corresponding image) into groups. A more complex algorithm could use less trivial graph partitioning methods such as Normalized Cut [26], edge flow [18], Multi-grid aggregation [25], Active contours and different Byesian criteria based methods.

2.2

Gabor features

Gabor kernels are commonly used for texture features extraction. Their popularity is motivated by the mathematical and the biological properties of Gabor functions. In 1946, D.Gabor [9] proved that a signal’s specificity simultaneously in time and frequency is fundamentally limited by a lower bound on the product of its bandwidth and duration (analogous to indeterminacy relations of quantum mechanics). The bound is t2

1 (∆x)(∆ω) ≥ 4π . He also proved that signals of the form s(t) = e− α2 +iωt achieve the theoretical limit he found. Computationally, the Gabor functions form a complete but non-orthogonal basis set and any given function can be expanded in terms of these basic functions.

6

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

In 1985 Daugman generalized the Gabor function to the following 2D form to model the receptive-field profiles of simple cells in the striate cortex[6]: 1 −π G(x, y) = e 2πσβ



(x−x0 )2 (y−y0 )2 + σ2 β2

 −2πi[u0 (x−x0 )+v0 (y−y0 )]

(2.1)

These functions, which are spatial bandpass filters with central frequency (u0 , v0 ), achieve the theoretical limit for conjoint resolution of information in the 2D spatial and 2D Fourier domains. The specified properties of 2D Gabor functions inspired a search for texture discrimination methods based on these functions. Porat and Zeevi [22] reviewed several Gabor features and suggested a texture discrimination method based on two of them. Fogel and Sagi [8] proposed a texture discrimination method, which is based on calculation of the dissimilarity between Gabor power spectrum of two texture elements. In other words they compute the absolute value of the complex Gabor-filtering result and compare the received values. Fogel and Sagi were the first to notice that texture features received by this method are locally shift invariant. Malik and Perona [20] explained that Gabor features should be non-linear, to make their recognition properties consistent with human perception. They also mentioned the Fogel and Sagi method as one of the methods which implement their findings. 2.2.1

Representing texture with oriented Gabor power filters

As it clear from eq. (2.1), Gabor kernel significantly responds to a limited range of signals which form a repeating structure in some direction (which is usually denoted as filter’s orientation or direction) and are associated with some frequencies (filters band). The obvious question is which filters should be taken to represent all possible textures. Textured images are commonly represented with responses to filters from some filter bank, which usually covers the frequency-direction domain (see figure 4). Therefore each image pixel is represented with a two dimensional vector of values specifying coarse frequency decomposition of the texture in several directions. Greenspan and Perona shown [12] numerically that taking complex filters in four directions is enough to represent most (97%) of the image energy. Still six directions are usually used. The number of scales (number of considered filter frequency bands) is usually three or four. It is generally believed that the number of filters in the bank and their parameters are not so important and that all reasonable choices provide almost the same quality of representation. The commonly used filters are either filter-set designed for general image representation or filter-set design for discriminating a set of known textures. We argue that these approaches are not optimal for texture segmentation of general images. We shall further show that some significant difference in discrimination performance may be achieved as a function of filter bank choice. 2.2.2

Filter-sets for general image representation

Commonly used 2D Gabor basis functions are spatially localized, oriented, and roughly one octave in bandwidth (see eq. 2.1). Such decompositions are inspired by biological visual 7

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

(a)

(b)

Figure 4: (a) The cumulative frequency responses of a bank of Gabor filters, plotted in a 2D frequency plane. Each bright spot represents a range of frequencies for which some filter strongly respond. Note that each ring of spots corresponds to filters with the same radial frequency. Spot with different distance from the origin but same directions correspond to different scales. This filter bank contains 5 scales and 4 directions. (b) The features calculated in every pixel of the given texture. Each greyscale in the right 18 squares equals to the response magnitude of the filter associated with this square in the pixel with the same coordinates on the original image (the most left image). The features are calculated using filter set of three filters oriented in six different directions. For example the value at position (14, 46) on the third square in the second row is associated with the response to the filter with the second frequency of the set in the third direction at pixel (14,46) of the original texture. processing [6] and by signal representation considerations. Specifically, Field [7] claimed that such filters are optimal for natural scene images coding: he found that the most compact representation of (a small set of) natural images is achieved when the representation is done with log-Gabor filters associated with 1.0 octave frequency bands. Lee [16] showed that 2D Gabor filters with octave steps form a frame. Family of functions forms a frame when any original signal may be reconstructed in a numerically stable way from its decomposition coefficients, and the difference between the energy of the reconstructed signal and the original energy is bounded. A tightness of a frame is specified as a measure of non-variability of the difference between the original and the reconstructed energies. Lee found that a 1.5 octave filters form a tighter frame and that they can be treated as though it were an orthonormal base. 2.2.3

Multi-textured images

In this approach [30, 11] all the possible textures are preliminarily known, as well as the bank of all possible Gabor filters. The issue is to select the best filters for the segmentation task. The general method in this case is to test all the filters on textures’ samples and to select the most valuable ones into the filter-set.

8

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

2.2.4

Some sample implementation of Gabor feature based texture discrimination systems

Many vision systems use texture discrimination by Gabor filters. To illustrate the usage of these feature, we now briefly describe three representative examples. Greenspan et al. [11] The method is based on extraction of texture features for each 8 × 8 block of the image and consecutive recognition from database with learning based algorithms. In the feature extraction stage the image processed in three scales. In each scale 4 absolute values of the responses to different orientations of the same complex logGabor filter are obtained. The Fourier transform of theses values, done relating to the orientation axis, provides rotation invariant representation of the texture on this scale. The transform also allows to find some ”texture’s direction” which is used to distinguish different orientations of the texture of the same kind. An additional feature for each scale is the response to Laplacian filter of the same size as the Gabor filter. The result of the feature extracting stage is a 15-dimensional feature vector. The scales (the filters frequencies) are distributed in octave steps, such that the filters bandwidth doubles for each next high-frequency band. In the second (recognition) stage the features were clustered along each dimension using K-means algorithm and the results were classified using different methods. The association of the patches with texture classes is used for image segmentation and texture classes recognition. EMD based image retrieval Rubner [24] uses Gabor features in his database image retrieval algorithm for searching an image that includes a given texture. In the preprocessing stage of the algorithm, every image passes a feature extraction stage, the features are clustered using K-means algorithm and K resulting features are stored as a representation of the textures which are present in the image. In the retrieval stage some image including only the requested texture is presented to the system, its features are averaged and the resulting feature vector is EMD-compared (see below) to every feature vector stored in the database in the preprocessing stage. The images that one of their feature vectors is close to the query are added to the answer as possible candidate. The EMD (Earth Mover’s Distance) is defined as the minimum amount of work needed to change one feature vector into the other. The ”work” may be defined differently for features of different nature (for example, moving energy between the features of the same scale may be defined as cheaper then moving it to other scales). In the case of Gabor features the resulting distance measure is more justified than the commonly used metrics that minimize geometrical distance. Rubner used 24-dimensional feature vectors, constructed from absolute value Gabor responses to 4 filters frequencies in 6 directions. He also used filters organized in octavelike way and used Fourier-Mellin transform for providing rotation and scale invariance to the resulting feature vectors. 9

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Normalized cut method [26, 19] In [26] the authors acted in a more or less common way - they calculated Gabor-like features of the input image and passed these features to the segmentation mechanism. They used 18 DOG absolute value features, received using 3 filters in 6 directions. DOG - Difference of non-isotropic Gaussians are complex filters which are similar to the Gabor filters with one cycle of the sine inside the meaningful area of the filter. In the segmentation stage a connected full graph is built, while each graph node representing a pixel of the image and every edge represents a ”likelihood” for the two pixels to be a part of the same object. The edge value is set as an exponential function of (minus) the scalar product between the pixels’ feature vectors. This way, pixels with similar feature vectors are getting a high edge value. The goal of graph segmentation methods is to find a minimal cut which is a combination of edges having minimal sum of edge values (i.e. find the least alike pairs of pixels), and removing these edges divides the graph into two unconnected subgraphs. The advantage of the normalized cut method is that it considers two aspects of graph segmentation: minimal cut (i.e. better separation) and preferring segments of large size. The normalized cut process divides the weighted graph into two subgraphs, such that the cut between the subgraphs, normalized by the total ”connectivity”, is minimal. In a second paper [19] the texture features are analyzed using the distribution based texture descriptors (see 2.1.3). The Gabor feature vectors are not directly compared as in the first article, but divided into K clusters and each image pixel is assigned to one of these clusters. Using the Delaunay triangulation a local scale is found for every pixel, and the histogram of clusters distribution in the neighborhood of this local scale size is attached to each pixel as a descriptor. A ”likelihood” for the pixels to be a part of the same object (denoted as texture cue) is calculated by χ2 test on the descriptor histograms of the pixels. In this method in addition to the texture cues, additional, edge detection based, contour cues are also considered. The segmentation is done using ”normalized cut” method, while the edge value is set to be the exponent of the product of contour and texture cues for the pixel pair.

10

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

3

Our approach

This work is done in the context of the following common framework for texture discrimination: the texture image is filtered by several (complex) Gabor filters and a feature vector is constructed with the absolute values of the responses as components. The discrimination between textures uses only these feature vectors and relies on simple procedures to decide whether two vectors are associated with the same texture. Specifically, such a procedure decides that the texture in two points is of the same type, if the Euclidean (or Mahalanobis) distances between the corresponding feature vectors is lower than some threshold. This work focuses on the design of the Gabor filters set. As mentioned above, the common choices are made for optimizing the filter design with respect to image representation (in MSE sense). We argue that the best filters for texture discrimination are different than those used for representation and shall indeed propose a set of Gabor filter, which is different that the commonly used sets. The response of a filter to some texture may take a range of values. We shall say that a filter (or a filter set) discriminate between two textures if the ranges of values associated with these two textures, are disjoint. We shall look for filters maximizing the discrimination between different textures, in a well defined sense described below; see section 4.2. To that end, we develop an analytical expression for the range of response values of a 1D Gabor filter to simple signals (e.g. Sine waves). We use these ranges to specify the best filter sets, which separate best between the simple signals. The filter sets are derived by numerical and semi-analytical methods, which lead to similar results. The resulting filter sets are different than the traditional ones. In a sense, the design of filter sets for specific textures [30] is related to our work. One main difference is that we model the signal and estimate the variation analytically. Another difference is that unlike the iterative, heuristic design, suggested in [30], we use a global process for the optimization. Our work can be considered to be an investigation into the translation sensitivity of Gabor filters. While the absolute value of Gabor filtered signal will never be truly shift invariant, its variability may be bounded in relatively small ranges, such that for a relatively far values we would be able to say that they belong to different textures. Essentially, traditional texture discrimination methods consider the filter response in qualitative form. For example, if the texture is simple (e.g. sinusoidal), they are interested in the filter that react maximally to it, and not much in the actual response size. In contrast to these methods but not unlike the modern, distribution based approaches, we model the actual response and therefore can make a finer, “super resolution” like, discrimination between textures. Analogously to super-resolution methods using several measurements corresponding to the same brightness, the resulting filters turn out to be highly overlapping in frequency coverage. This leads, as we shall see, to better discrimination. As described above, modern texture characterization methods rely on distributions of filter response. Such methods were proved to be insensitive to the type of texture features [29], provided that a sufficient number of textons is allowed. We, on the other hand, are interested in finding the best feature and therefore shall evaluate the proposed ones using direct methods (and not using distributions). It is likely that less complex distributions (with a lower number of textons) could be built with better features. 11

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

4

A filter response to harmonic input analysis

We start by considering the range of responses associated with one 1D filter and one simple signal. The simple signals considered as a model of texture are either Sines signals or combination of them. The discrimination study is done in the context of specific sets of signals associated with [fmin , fmax ] frequency range and [Amin , Amax ] amplitudes range. These ranges are latter referred to as the framework. Our goal is to maximize the number of signal pairs which may be distinguished. The desired result is a set of filter’s parameters, depending on the prespecified number of filters and the parameters fmin , fmax , Amin , Amax . We first focus on a few simple cases, allowing intuition to the Gabor filters signal discrimination capabilities and providing computational advantages in the final optimization. For a simple case of a single harmonic signal with known amplitude we can find the optimal single filter analytically. Using the acquired intuition we developed a semi-analytical method for building a filter set for discriminating between single harmonic signals with unspecified amplitudes. The resulting filter set is very similar to the one obtained from the full numerical search on the filter parameters domain. We also provide an expression for Gabor filter response bounds to dual-harmonic signals and multi-harmonic signals. Estimates based on these expressions allows us to understand the responses behavior and imply that the filters, obtained for single harmonic signals, perform well for the multi harmonic case too. As we shall see later, these filters perform well for distinguishing between real textures as well.

4.1

A single filter response to Sine signal −x2

1 e 2σ2 +ikx , and a signal gA,m (x) = Asin(mx). Consider a complex Gabor filter hk,σ2 (x) = √2πσ 2 The (complex) filter response is highly variable depending on location but its absolute value

rA,m,k,σ2 (x) = |hk,σ2 (x) ∗ gA,m (x)| −(k2 +m2 )σ 2 p 2 ch2 (kmσ 2 ) − cos2 (mx) = Ae

(4.1)

is more stable (see Appendix A for derivation). Here we consider only the absolute value of the response and refer to it simply as the response. This response varies with location (x coordinate) and depends on the relative phase of the Sine input signal with respect to the filter location (x = 0). This change is undesirable, but still unavoidable. It can be shown that the range of responses is bounded by     −(k−m)2 σ 2 −(k−m)2 σ 2 −(k+m)2 σ 2 −(k+m)2 σ 2 A A 2 2 2 2 e e > rA,m,k,σ2 (x) > . (4.2) +e −e 2 2 These bounds are tight. The response varies around a middle value, rA,m,k,σ2 =

A −(k−m)2 σ2 2 e , 2

with a maximal deviation ∆rA,m,k,σ2 = Ae 12

−(k+m)2 σ 2 2

(4.3)

.

(4.4)

0.4 0.35 0.3 r1,ω,k,σ2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

0.5 0.45

0.25 r2’

0.2 r1’ 0.15

r2

0.1 r1

0.05 0 −2.5

−2

−1.5 log(ω1)

−1 −0.5 log(ω2) log(ω3)

0

0.5 1 1.5 log(k) log(ω4)log(ω ) 5

Figure 5: The upper and lower bounds on the response of a Gabor filter hk,σ2 to a constant amplitude signal, plotted against the signal frequency (in log (natural logarithm) scale). (k = 1.6, σ = 1.51). The signal with m2 frequency is indistinguishable from signals with frequencies in [m1 , m3 ] and [m4 , m5 ] ranges. This deviation is referred to as the filter’s variability. It increases (exponentially) as the frequency decreases, implying that the response to lower frequencies is more space varying; see Figure 5.

4.2

The optimality measure - number of separated harmonic signal pairs

As described above we intend to specify filters which maximize the separation of signals. Here we formally define the optimality measure for this separation. We intend to use it to find the filter sets with maximal discrimination power, given the parameter of the framework. The variables declared here are described more intuitively later. The goal of these formal definitions is to draw guidelines for derivations in the following sections. −x2

1 e 2σ2 +ikx and an harmonic signal 1. Range of responses For a filter hki ,σi2 = √2πσ 2 gA,m (x) = Asin(mx), the range of possible responses is: h i Rki ,σi2 (m, A) = min{rA,ki ,m,σi2 (x)}, max{rA,ki ,m,σi2 (x)} . (4.5)

To reduce the sensitivity to measurement noises we require that the response is not too small and is al least Rmin (see more on this in section 4.3.1) and redefine: h i ′ ′ (x)}, max{r (x)} (4.6) Rki ,σi2 (m, A) = min{rA,k 2 2 A,ki ,m,σ i ,m,σ i

where

′ rA,k 2 (x) i ,m,σi

=



i

rA,ki ,m,σi2 (x) rA,ki ,m,σi2 (x) ≥ Rmin , 0 rA,ki ,m,σi2 (x) < Rmin

13

(4.7)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

2. Signal separability. Two signals gAi ,mi (x) and gAj ,mj (x) are separated using the filter set {hkl ,σl2 |l ∈ 1, .., L}, where L is the number of filters, if their responses for some filter are disjoint. The binary variable  1 ∃l s.t. Rkl ,σl2 (mi , Ai ) ∩ Rkl ,σl2 (mj , Aj ) = ∅ SAA (mi , mj , Ai , Aj ) = (4.8) 0 else is used to describe signals separability. 3. Frequency separability. Two frequencies mi , mj are separated if for all amplitudes in the allowed range [Amin ; Amax ] the associated signals are separated. The binary variable  1 ∀Ai ∀Aj SAA (mi , mj , Ai , Aj ) = 1 s(mi , mj ) = (4.9) 0 else is used to describe frequencies separability.

4. The measure of filter set performance. The variable Q ∈ [0, 1] specifies the relative fraction of separable frequencies and is a proposed optimality measure for filters hk1 ,σ1 , hk2 ,σ2 , ...:  Q (k1 , σ12 ); (k2 , σ22 ); ... =

R fmax R fmax fmin

s(mi , mj )dmi dmj fmin R fmax R fmax dmi dmj fmin fmin

(4.10)

All the following derivation are made in the context of maximizing Q. In the numerical experiments section (5) we consider a discrete form of this measure.

4.3

The design of a single filter for discriminating equal amplitude sine signals

As described above we intend to specify filters which maximize the separation of signals. Considering, first, the simpler, equal amplitudes, signals, we say that a filter separates two signals associated with frequencies m1 and m2 , if the set of possible responses are disjoint, \ (4.11) {rA1 =1,m,k1 ,σ2 (x), x ∈ R} {rA2 =1,m,k2 ,σ2 (x), x ∈ R} = ∅ In other words, if a value r is the response to a signal with frequency m1 at some location x1 , it cannot be the response to a signal with frequency m2 for any x. Figure 5 illustrates such a separation: The filter’s response to a signal with frequency m1 varies between R1 and R1′ . The closest frequency (from above, with the same amplitude) which can be separated from m1 is therefore m2 , for which the minimal response R2 is larger tan R1′ . In the same way the closest frequency which is separable from m2 is m3 . The variability of the responses to m2 is smaller than the variability of the responses to m1 (consistent with eq. 4.4). Therefore m3 is closer to m2 than m2 to m1 . Note the logarithmic scale.

14

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

1

0.5

0.9

0.45

0.8

0.4

0.7

0.35

0.6

0.3

0.5

0.25

0.4

0.2

0.3

0.15

0.2

0.1

0.1

0.05

0 −3

−2.5

−2

−1.5

−1

−0.5

0

0.5

1

0 −3

1.5

(a)

−2.5

−2

−1.5

−1

−0.5

0

0.5

1

1.5

(b)

Figure 6: Specifying the filter’s frequency too low is not advisable because it causes either high variation (with low σ - fig. (a) - the lower bound on the response is slightly above X axis, while the upper one is very high ) or low response for most of the frequencies in the range - fig.(b) 4.3.1

Avoiding noise

The difference in response to high frequencies gets smaller and therefore, theoretically, even very close frequencies may be separated. This is not true in practice because the real signals are not pure sines and have additional ”noise” harmonics. The variability due to the ”noise” is bounded by the magnitude of the noise component response (see Appendix B and section 4.5.1 for details). Therefore, even a small ”noise” signal, with frequency close to the center of the filter (k), makes very low responding signals indistinguishable from all the other signals having a very low response. Therefore, we specify a minimal response threshold Rmin , and shall not consider two signals separated by some filter, if both responses of this filter are lower than Rmin . The threshold Rmin is specified relative to the maximal response of the filter to a signal with amplitude Amin . (This maximal response is obtained for a signal associated with frequency k). Thus, Rmin may be regarded as the minimal relative amplitude of a signal which causes a competing response. A typical value of Rmin chosen in our experiments is in 0.05. The filter’s bandwidth may be redefined as a range of frequencies associated with signals for which the response is at least Rmin . The goal of this section is to find an optimal filter, with given Rmin and optimized central frequency k and width σ, such that the fraction of separated frequency pairs in [fmin , fmax ] is maximal. 4.3.2

A preliminary consideration

Based on the discussion above, one could intuitively argue that the best way to design the filter would be to specify its central frequency k smaller than the minimal frequency fmin . This way, all the frequency pairs are in the right side of the filter where the variability is small, and therefore many pairs may be separated. Unfortunately, this approach is problematic because: 1. A choice of small σ implies a very high variation causing most pairs to be indistinguishable; see figure 6(a). 15

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

∆ rA,m’’,k,σ2 ∆ rA,m,k,σ2

∆ rA,m’,k,σ2

∆m m’ ′

m

m’’

′′

Figure 7: [m , m ] is a minimal interval of frequencies indistinguishable from signal with frequency m. ∆rA,m,k,σ2 are variability ranges for each of the three frequencies. In the analysis we use a linear approximation, to estimate ∆m. 2. On the other hand, a choice of large σ implies that the response is lower than Rmin for most of the frequencies in the range; see figure 6(b)). There is no intermediate value solving these problems, In fact, the graphs of figure 6(a) correspond to the maximal σ satisfying that the middle response to all frequencies is above Rmin = 0.05. The other option of placing k is between fmin and fmax . In the next subsection we derive the optimal values for k and σ for this case. 4.3.3

Optimal single filter derivation ′

′′

Let [m , m ] be a minimal interval of indistinguishable frequencies (using a filter hk,σ (x)), ′ ′′ see Figure 7. Consider an “asymptotic” approximation where m ≈ m ≈ m. Then, ∆rA,m′ ,k,σ2 ≈ ∆rA,m′′ ,k,σ2 ≈ ∆rA,m,k,σ2 Therefore, r A,m′ ,k,σ2 − r A,m′′ ,k,σ2 =

 1 ∆rA,m′ ,k,σ2 + ∆rA,m′′ ,k,σ2 + 2∆rA,m,k,σ2 ≈ 2∆rA,m,k,σ2 2

A linear local approximation implies that

−(k+m)2 σ 2

2

2 2∆rA,m,k,σ2 4Ae 4e−2kmσ = = ∆m ˜ ≈ ∂r −(k−m)2 σ 2 2 |k − m|σ 2 2e 2 A|k − m|σ A,m,k,σ ∂m ′′



where ∆m ˜ = ∆m(m, ˜ k, σ 2 ) = m − m (see eq. (4.3), (4.4)). 16

(4.12)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

A higher discriminability is obtained if ∆m ˜ is smaller. Therefore its value is sometimes ∂r 2 referred to as the filter indiscriminability. Note that near the filter’s frequency k, A,m,k,σ ≈ ∂m 0 and the range of non separability is wider. This problem is latter resolved using additional filters. Note that if a frequency is non separable from any other frequency, then its ∆m ˜ is still fmax − fmin and not infinity. Therefore, the indistinguishability is redefined ∆m = min{∆m, ˜ fmax − fmin }

(4.13)

As mentioned earlier, our goal is to segregate the maximum possible amount of frequency pairs m1 , m2 ∈ [fmin , fmax ]. As it is clear from figure 5 that there are two ranges of signal frequencies which respond similarly to signal m2 - the first is m ∈ [m1 , m3 ] and the second is m ∈ [m4 , m5 ]. For simplicity we ignore the second range of signals and consider every signal in [fmin , fmax ] to be distinguishable from the signal with frequency m if it’s farther than ∆m(m, k, σ)/2 from m. I.e. in case on figure 5 we consider m2 separable from every frequency smaller than m1 or larger than m3 . With this simplification, the fraction of indistinguishable signal pairs is (proportional to) Z fmax ∆m(m, k, σ 2 )dm, (4.14) fmin

and the filter which minimizes this measure, maximizes 4.10 and is thus optimal one. By symmetry, adding the ignored second range of indistinguishable signals, change (4.14) expression by a multiplicative constant, and hence does not matter for the optimal choice. Requiring that the response is above the threshold Rmin , is approximately equivalent to ∀m ∈ [fmin , fmax ],

rA,m,k,σ2 > Rmin ,

(4.15)

which, considering the filter’s unimodal response, is satisfied if r A,k,fmin = rA,k,fmax = Rmin .

(4.16)

Choosing these minimal responses in the range endpoints specifies the filter’s parameters as: k=

fmin + fmax 2

;

σ2 = −

8log(Rmin ) . (fmax − fmin )2

(4.17)

This choice provides filter parameters which are close to the optimal ones, minimizing (4.14) expression. As explained below (section 5), the values fmax = π, fmin = .1 and Rmin = .05 are good choice for practical texture discrimination tasks. These choices lead to a filter associated with k = 1.62 and σ 2 = 2.59, which are indeed close to the, numerically obtained, optimal single filter parameters, which are k = 1.61 and σ 2 = 2.31. See Figure 5 for a plot of this filter response. This filter turns out to be optimal due to the following considerations: Increasing σ beyond the value (4.17) contradicts the constraint (4.16). Decreasing it increases ∆m. Increasing k require to decrease σ to satisfy (4.16) again. In Appendix D it is shown that increasing k and decreasing σ (and preserving r A,k,fmin = Rmin ) causes a bigger ∆m for every m, and therefore the filter can separate less frequency pairs (see 4.14). 17

.05

.1

m2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Discriminability of sin(m1x) from sin(m2x) using a single filter

.15

.2

.25

.3 .05

.1

.15 m1

.2

.25

.3

Figure 8: The discriminability of a single filter. For signal pairs, where the frequency of the first signal is along x axis and the frequency of the second signal is along y axis, white pixel value means that the signals are distinguishable using single filter and black pixel value means they are not. Most of the indistinguishable signal pairs are associated with similar signal frequencies or with signal frequencies which are symmetric around the filter’s frequency (as ranges [m1 , m3 ] and [m4 , m5 ] on figure 5) The conclusion is that any change of the filter specified by (4.17) degrades its discriminability. Thus a single filter for best discrimination of single sine signals with equal amplitudes in an interval [fmin , fmax ] for a preset Rmin response is specified by (4.17). The separating power of a single filter is illustrated in Figure 8 where pairs of indistinguishable frequencies are colored dark. As we shall see latter a combination of filters may improve discriminability.

4.4

The design of a filter-set for signals with a single harmonic and arbitrary amplitudes

Clearly, the discrimination of two arbitrary sine signals using a single Gabor power filter is impossible because for every pair of frequencies m1 and m2 and for every point x, there are always non-zero amplitudes A1 and A2 , such that rA1 ,m1 ,k,σ2 (x) = rA2 ,m2 ,k,σ2 (x). Adding more filters can improve this situation, see Figure 9. A pair of signals indistinguishable by the first filter may be distinguished by the second filter and vice versa. Our goal is to find such organization of filter pairs that separates maximal number of signal pairs. Consider a pair of signals g1 (x), g2 (x) where gi (x) = Ai sin(mi x). To discriminate between the signals we look for two filters, hk1 ,σ12 , hk2 ,σ22 satisfying the following two conditions: 1. |(hkj ,σj2 ∗ gi )(x0 )| > Rmin i ∈ {1, 2}, j ∈ {1, 2}.

(4.18)

2. At least one of the filters (hkj ,σj2 ) satisfies {rA1 ,m1 ,kj ,σj2 (x), x ∈ R}

\ {rA2 ,m2 ,kj ,σj2 (x), x ∈ R} = ∅

(4.19)

The noise consideration (described above, in section 4.3.3), require that the response is above a some threshold. This threshold was specified relatively to the maximal response of 18

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0 −2.5

−2

−1.5 m1

−1

−0.5

0

0.5

1

0 −2.5

1.5

m2 m3

−2

−1.5

−1 m1

(a)

−0.5

0

0.5

1

1.5

m2 m3

(b)

Figure 9: The discrimination of two arbitrary sine signals using a single Gabor power filter is impossible. For example (a) signals m1 and m2 are separated for some amplitude. However slight change in the amplitude makes m2 and even more distant m3 indistinguishable from m1 . Adding another filter (b) resolves this problem for many signals. Although m1 and m2 are still indistinguishable with the first filter they can be separated using the second filter. the minimal signal. Such response is obtained when the signals frequency coincide with the filter center frequency. For mi = kj , the response of hkj ,σj2 to gi is at most Ai /2. Therefore, the maximal response to the minimal signal is Amin /2. Rmin was usually specified as a tenth of this value, i.e. Amin /20. Satisfying the first condition thus implies that the signals are “observable” (by these filters), and are not too sensitive to some small amplitude noise. The second condition implies that at least one of the filters is able to distinguish between the signals based on its response. This condition is satisfied if the relative responses satisfy the (stronger) condition: ( ) ( ) \ rA2 ,m2 ,k1 ,σ2 (x) rA1 ,m1 ,k1,σ12 (x) 1 ,x ∈ R , x ∈ R = ∅; (4.20) rA1 ,m1 ,k2,σ22 (x) rA2 ,m2 ,k2 ,σ22 (x) see Claim 1.1 in Appendix E for the proof. The stronger condition is even more intuitive: suppose that two signals associated with unit amplitudes satisfy 4.11. Increasing one of the amplitudes does not change the ratio and hence leaves the signals separable. Therefore, this condition is independent of the amplitudes. In the rest of this section we shall consider two signals as separable if their frequencies satisfy 4.19, assuming that the minimal response condition (4.18) is satisfied. Later we shall see that this condition is satisfied by specifying the filters properly. Note also that 4.20 is a sufficient condition to 4.19. That is, it may be that two signal have disjoint responses even if the ratio ranges are not disjoint and 4.19 is not satisfied. As we shall see the filters are chosen so that the ranges of both the responses and their ratios is small, implying, for most signals, that 4.20 is equivalent to 4.19 and not only a sufficient condition. 19

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

The following calculations bring this condition into a simpler form, similar to that of the discriminability measure (4.12) and allow relatively simple optimization on {~k, σ~2 }1 parameters. Note, that since the rA,m,k,σ2 (x) = Ar1,m,k,σ2 (x), the ratio expression is amplitude independent, although its derivation is based on signals with various amplitudes. 4.4.1

An expression for indistinguishability

The ratio of two Gabor filters responses to a signal gA,m (x) = Asin(mx), is bounded for any x by the following upper (G+ ) and lower (G− ) expressions, derived using (4.3)(4.4): ch(mk1 σ12 ) sh(mk2 σ22 ) 2 2 +m2 )σ 2 −(k2 +m2 )σ 2 (k1 1 2 2 sh(mk1 σ1 ) 2 G− (m, ~k, σ~2 ) = e− ch(mk2 σ22 ) G+ (m, ~k, σ~2 ) = e−

2 +m2 )σ 2 −(k2 +m2 )σ 2 (k1 2 2 1 2

As done before (see 4.3) and (4.4), it is convenient to describe this (relative) response range ~k, σ~2 ) and its variation ∆G(m, ~k, σ~2 ): ¯ using its middle response G(m, + − ~ ~2 ~ ~2 ~k, σ~2 ) = G (m, k, σ ) + G (m, k, σ ) ¯ G(m, 2 + ~ 2 ~ G (m, k, σ ) − G− (m, ~k, σ~2 ) ∆G(m, ~k, σ~2 ) = 2

The sensitivity of the middle response to frequency may be quantified by its derivative (in analogy to the derivation of (4.12)) 2 +m2 )σ 2 −(k2 +m2 )σ 2 2 2 ~k, σ~2 ) ¯ (k1 ∂ G(m, 1 2 2 m(σ2 − σ1 )sh(2mα2 ) − 2α2 ch(2mα2 ) 2 = e− ch [m(α1 + α2 )] ∂m sh2 (2mα2 )

+

e−

2 +m2 )σ 2 −(k2 +m2 )σ 2 (k1 1 2 2 2

sh(2mα2 )

(α1 + α2 )sh [m(α1 + α2 )] ,

where αi = ki σi2 . The indiscriminability measure quantifying the undistinguishable difference in frequency is: ∆m ˜ = =

∆G(m, ~k, σ~2 ) ~k,σ~2 ) ¯ ∂ G(m, ∂m [m(σ22



σ12 )

(4.21)

ch [m(α1 − α2 )] . − 2α2 coth(2mα2 )] ch [m(α1 + α2 )] + (α1 + α2 )sh [m(α1 + α2 )]

As before (see explanation for (4.13)) ∆m = min{∆m, ˜ fmax − fmin } 1

We denote ~k = (k1 , k2 ) and σ~2 = (σ12 , σ22 )

20

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

4.4.2

Some useful approximations

R We would like to to minimize cumulative indistinguishability (4.14) - ∆m(m, ~k, σ~2 )dm over all filter parameters k1 , k2 , σ12 , σ22 . While we can look for this minimum, numerically, by a dense search over the 4 dimensional parameters space, we prefer to find also some shortcuts, which will be especially useful for the design of larger filter banks. Reducing the search to two parameters - We first observed that specifying ki (i = 1, 2), the cumulative indistinguishability is minimized by choosing the parameters σi2 (i = 1, 2) as the highest satisfying (4.15). This observation is consistent with the result obtained for single amplitude Sine signal analysis and with intuition: Filters with bigger σ 2 have smaller variability. Unfortunately, we were not able to formally prove this claim, but it is verified by our numerical tests. R Using this observation we may plot the cumulative indistinguishability ∆m(m, ~k, σ~2 )dm as a function of only two parameters k1 and k2 . See Figure 10(a). For the chosen frequency range m ∈ [.2, 3/14] the optimal parameters are k1 = 1.61 and k2 = 1.29 (or vise versa due to symmetry). Reducing the search to one parameter - Note that k1 gets the same value obtained for a single optimal filter (see (4.17)). This is somewhat expected due to the following argument. For large α (say, larger than 5, which is common for most considered filters), ch [m(α2 + α1 )] ≈ sh [m(α2 + α1 )] and coth(2mα2 ) ≈ 1 leading to the approximation 1 ch [m(α1 − α2 )] · |m(σ22 − σ12 ) + (α1 − α2 )| ch [m(α1 + α2 )] ∆m = min{∆m, ˜ fmax − fmin } ∆m ˜ ≈

(4.22)

The term ch [m(α1 + α2 )] have a dominant influence on the size of ∆m. ˜ To minimize ∆m, ˜ it should be larger which happens when its argument, m(α1 + α2 ) is maximal. Both α1 and α2 are bounded with (4.17) because of (4.15). They cannot get the same value because then, they will have equal k and σ 2 , making ∆m large due to the first part of the expression. Therefore only one of α1 , α2 can get maximal value. Letting α2 = αmax and setting k2 , σ22 according to (4.17), we are left with one parameter, α1 . Performing a one parameter optimization over α1 which can only take values smaller than α2 while hk1 ,σ12 satisfies (4.15), we varied k1 , and updated σ12 accordingly, looking for minimal value of integral (4.14). The results obtained using this faster, approximation based method (see figure 10(b)) are the same as obtained from the search on k1 , k2 (figure 10(a))and also similar to the results of full search on k1 , k2 , σ12 , σ22 , maximizing explicitly the number of separated signal pairs, described in the next section. 4.4.3

Three filters and more

For more than two filters the analysis seems too complex and we rely on numerical optimization. Essentially, the optimization looks for a set of filters such that for every frequency 21

3.01

2.51

k2

2.01

1.61

1.01

.51

global minimum .51

1.01

1.61

2.01

2.51

3.01

k1

(a) log(∫ffmax ∆ m(m,k,σ)dm)

∆ m optimization by α1 parameter

3

10

2

10

min

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

∆ ω (k1, k2, σ21, σ22)

1

10

0

.5

1

1.5

k*

2

2.5

3

3.5

k

(b) Figure 10: The total discriminability of filter pairs hk1 ,σ12 (x), hk2 ,σ22 (x). (a) varying both α1 and α2 , (b) having k2 σ22 = α2 = αmax , the discriminability is optimized along k1 σ12 = α1 parameter, where k1 ∈ [fmin , fmax ]. Note that k1 = k ∗ = 1.29 is the optimal value and it is equal to one of the optimal values from the first case there is at least two different filters with different significant responses. This optimization thus uses a modification of the conditions 4.18,4.19 and is now described. See Figure 11(c,d) for the resulting filters (satisfying this demand). Note that now, having more than two filters, not every filter has to cover the entire range. Therefore, the constraint (4.16) is somewhat modified: For any pair of signals g1 (x) = A1 sin(m1 x) and g2 (x) = A2 sin(m2 x), there are at least two filters hki1 ,σi2 , hki2 ,σi2 in the filter set satisfying the following two conditions: 1 2   1. hki ,σi2 ∗ gk (x) > Rmin k ∈ 1, 2 i ∈ i1 , i2 2. At least one of the filters (hki ,σi2 ;

i ∈ i1 , i2 ) satisfies

o o\n n rA2 ,m2 ,ki,σi2 (x), x ∈ R = ∅ rA1 ,m1 ,ki ,σi2 (x), x ∈ R

22

(4.23)

0.45

.5

0.4 1

0.35

m2

R(m)

0.3 1.5

0.25

0.2

2

0.15 2.5

0.1

0.05 3 0

.5 0

0.5

1

1.5

2

2.5

3

1

1.5

3.5

(a)

2.5

3

(b)

Three filters set

Five filters set

0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

R(m)

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

2

m1

m

R(m)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Discriminability of A1sin(m1x) from A2sin(m2x) using two optimal filters Two filters set 0.5

0

0.5

1

1.5

2

2.5

3

0

3.5

0

0.5

1

1.5

m

2

2.5

3

3.5

m

(c)

(d)

Figure 11: Single-harmonic non-constraint amplitude optimal filter-sets. (a) 2 filters, (b)the discriminability of a filters pair (c) 3 filters, (d) 5 filters. Note, that the signals associated with low frequencies are the most problematic for discrimination with two filters. Intuitively, one would add a third filter to cover these signals especially. The result of optimization is in full agreement with this intuition. The latter condition is (again) satisfied for hki1 ,σi2 and hki2 ,σi2 if the ratio condition 2

1

(

rA1 ,m1 ,ki1 ,σi2 (x) 1

rA1 ,m1 ,ki2 ,σi2 (x) 2

,x ∈ R

)

\

(

rA2 ,m2 ,ki1 ,σi2 (x) 1

rA2 ,m2 ,ki2 ,σi2 (x) 2

,x ∈ R

)

=∅

(4.24)

holds. These modified conditions allow to use filters with larger σ values, which distinguish more signals pairs (in agreement with (4.12) and (4.22)).

4.5

The response to multi-harmonic signal

P We now consider the response of a single filter to a sum of harmonic signals. i Ai sin(mi x)+ P j Aj cos(mj x). This response is a sum of responses to each harmonic and some additional non-linear terms. The variability of the nonlinear part is of the same order of magnitude as the harmonic response and cannot be neglected. Therefore the previous analysis cannot be directly extended to these signals; see Appendices B and C. We approximate the number of pairs of signals which are undistinguishable by Gabor filter using a variation on the indistinguishability ∆m derivation technique. This approximation shows that the filter sets obtained for single harmonic signals are reasonable for use with multi-harmonic signals. The 23

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

approximations in this section are much rougher then in previous sections. Therefore all the derivations for the multi-harmonic signals should be considered only as supporting evidences for suggested filter set and not as exact calculations. 4.5.1

Single filter response to dual-harmonic signal −x2

1 e 2σ2 +ikx , and a signal gA, Consider first a complex Gabor filter hk (x) = √2πσ ~m 2 ~ (x) = Asin(m1 x) + Bsin(m2 x) or gA, ~m ~ (x) = Asin(m1 x) + Bcos(m2 x). It can be shown (see Appendix B) that the filter response is bounded with:

Ae

2 −(k2 +m2 1 )σ 2

2 −(k2 +m2 2 )σ

2 ch(km1 σ 2 ) + Be ≥ rA, 2 (x) ≥ ~ m,k,σ ~

ch(km2 σ 2 ) (4.25)

2 −(k2 +m21 )σ2 −(k2 +m2 2 )σ 2 2 Ae 2 2 sh(km σ ) − Be sh(km σ ) 1 2 2

For large σ, sh(kmσ 2 ) ≃ ch(kmσ 2 ) ≃ ekmσ the inequality may be rewritten in even simpler form: −(k−m1 )2 σ2 −(k−m2 )2 σ 2 −(k−m2 )2 σ 2 −(k−m1 )2 σ 2 2 2 2 2 + Be ≥ 2rA, − Be (4.26) Ae 2 (x) ≥ Ae ~ m,k,σ ~

and without loss of generality we can assume that for some A, B, m1 , m2 Ae

−(k−m1 )2 σ 2 2

≥ Be

−(k−m2 )2 σ 2 2

(4.27)

and refer to m1 as the signal’s primary frequency. Therefore r A, 2 = ~ m,k,σ ~

A e 2

−(k−m1 )2 σ 2 2

∆rA, 2 = Be ~ m,k,σ ~

−(k−m2 )2 σ 2 2

(4.28)

Note, that the variability is now proportional to the highest response for the single sine case and depends on other variables than the median response magnitude.. Note that according to our approximation, (B, m2 ) specifies only the responses variability, therefore changing them does not make the signal distinguishable in all cases. For example, (A, m1 , B, m2 ) is not distinguishable from (A, m1 , B ′ , m′2 ) for any B ′ , m′2 . Following the derivations above (section 4.4, 4.3.3), the interval of indistinguishable signals with primary frequency m′1 6= m1 is ∆rA,k, ~ m,σ ~ 2 = ∆m ∼ = ∂r ~ m,σ ~ 2 A,k, ∂m1

Be A e 2

−(k−m2 )2 σ 2 2

−(k−m1 )2 σ 2 2

The condition (4.27) is equivalent to

Be

−(k−m2 )2 σ 2 2

Ae

−(k−m1 )2 σ 2 2

24

≤ 1,

|k − m1 | σ 2

.

2 . |k − m1 |σ 2

∆m ≤

(4.29)

Note that this expression relies on the approximation of the response by the approximated bounds center value and variability. Still the expression provide useful intuition: the bound on ∆m decreases when m1 is distant from k, and substantially increases when it becomes closer to k. Therefore, given a pair of signals where both frequencies m1 of both signals are close to filter’s frequency k, it is less likely that the filter would separate these signals (although it still may happen for some values of A, B, m2 ). Optimal Filter Set on Linear Frequency Domain

Optimal Filter Set on Linear Frequency Domain 0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

i

i

i

RA,m,k , σ2

i

RA,m,k , σ2

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

0

0.5

1

1.5

2

2.5

3

0 −1 10

3.5

m

(a)

0

10 m

1

10

(b) Optimal Filter Set on Logarithmic Frequency Domain

Optimal Filter Set on Logarithmic Frequency Domain 0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3 i

RA,m,k ,σ2

0.3 i

i

i

RA,m,k ,σ2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

implying that the indiscriminability measure is roughly bounded with:

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

0

0.5

1

1.5

2

2.5

3

0 −1 10

3.5

m

(c)

0

10 m

1

10

(d)

Figure 12: The optimal three filter sets as received by numerical optimization on linear(a) and logarithmic(c) frequency domains. It is common to consider real signals to be logographically spread on frequency domain. (b) and (d) shows (a) and (c) on log frequency space. Although (a) and (c) are very similar, it is very important for multi-harmonic signals, that in (c) for each filter i there exist another filter, responding low (but above Rmin ) at frequency ki . Therefore, there should be another filter for the separation of such signals. That is, for each filter hki ,σi2 we prefer to have filter hkj ,σj2 , such that the latter filter response to frequencies near ki would be not too low (specifically above Rmin ) and, in the same time, not too high (where ∆m is smaller). We found that the filter set derived for single harmonic case (on log frequency domain; see next section) satisfies the latter demand.

25

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

4.5.2

Multi-harmonic signals

It is shown in Appendix C that the filter design considerations in the case of multi-harmonic signals are similar to those of the dual-harmonic case. The filters are likely to have difficulties with recognition of signals, some of which frequencies are close to the central frequency of the filter, as in the case of dual harmonic signals. Therefore, according to the reasons described for dual harmonic signals, the filters derived for the single harmonic signals are suitable for all harmonic signals. This conclusion provides some justification to apply the derived single harmonic filters to the task of segmentation of general textured signals.

26

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

5

Building an optimal filters-set

In previous section we derived expressions for parameters of optimal single filter for discriminating harmonic signals with equal amplitudes. We also shown a semi analytical way to find optimal filters pair for segmentation of harmonic signals with unconstrained amplitudes. In this section we describe the numerical search we performed on the filters’ parameters to support the theoretical conclusions. Making the full search for many filters is computationally expensive. We show how to extend the semi-analytical method (developed for finding two filters set) so that multi-filter set can be found with relatively small complexity. In order to find the optimal filter-sets we performed a numerical search. For combination of filters parameters ~k, σ~2 taken on some hierarchical grid, the value of a discrete version of Q(~k, σ~2 ) was calculated. In the case of single filter segregating harmonic signals with preset amplitudes we calculated Rk,σ2 using (4.3,4.4) and then inserted it into (4.8). In the case of several (2,3,4,5) filters for segmentation of signals with unconstrained amplitudes, (4.20) was used to for evaluation of Rk,σ2 . Note, that the use of (4.20) allowed to evaluate s(mi , mj ) without considering actual amplitudes (see claim 1.1). The frequencies of signal pairs were taken from a 2D uniformly spaced grid in either a linear: fmax − fmin lin lin filin = fmin + i · fstep , where fstep = N or a logarithmic: s i  fmax log log filog = fmin · fstep , where fstep = N fmin frequency domains. The discrete versions of Q(~k, σ~2 ) are defined as: P lin lin i,j∈[0...N ] s(fi , fj ) lin Q = N2 log

Q

=

log log i,j∈[0...N ] s(fi , fj ) N2

P

(5.1)

(5.2)

Note that the analytically calculated filters correspond to the linear domain. Optimizing over the logarithmic domain is advantageous because the number of separated frequencies is preserved with the scaling process, which presents in every vision system. Therefore, the separation power of the filter set doesn’t depend on the distance from the camera to the object, which is an important property for a filter set. There are also psychophysical evidences that logarithmic domain is consistent with human perception. The receptive-field profiles of simple cells in the striate cortex have an octave-like order of frequencies, which is a way to cover uniformly the logarithmic domain. The results for the logarithmic domain were somewhat different. For each type of frequency domains we tested two [fmin , fmax ] bounds - [0.1, 2.82] and [0.1, 3.14]. 2 2

The considered fmax values are associated with the Nyquist frequency specified by the sampling of the image, and fmin is to be the frequency associated with a wavelength equal to the maximal detectable texture patch size, which we have set to 20π

27

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

5.1

The technique of the numerical search

We searched for optimal filters’ parameters for filter sets of one filter for preset amplitude signals and 2 to 5 filters for unconstrained amplitudes signals. Since each signal is described by its frequency m (2 parameters in each search) and each filter is described by two parameters k and σ 2 (2,4,6,8 or 10 parameters), the complexity of search for the optimal filters set varied between O(n4 ) to O(n12 ), where n is the number of possible values tested for each of the parameters. This may be computationally hard. Therefore we used a hierarchical, grid based, search. We started with a widely spaced grid of k and σ 2 parameters (10 values of each) and examined every combination of them against a rough frequencies m grid (305 × 305 values). For the filters’ parameters with best Q measure we defined a finer grid around each parameter. We took the optimal value and its two neighbors from the coarse grid and added two equally spaced values between them (see figure 13). We repeated the search on the received finer grids and repeated the grid redefinition around the optimal values. We repeated this operation until the resolutions of filter parameter grids where of the same order as the resolution of the frequencies grid.  fine gridX   HH XXX    ? z y  9  j i XX H y y i yXX X :   XX  6 grid X coarse 

Figure 13: Converting coarse grid to fine In the case when several different sets of filter parameters give the same Q value, we increased the resolution of signals grid and repeated the calculations for these sets, trying to find the best one.

5.2

Signals with equal amplitudes

In section 4.2 we derived theoretically a single optimal filter for segregation of harmonic signals with equal amplitudes (4.17). This derivation was numerically confirmed with a search over k parameter in [fmin , fmax ] domains and over σ 2 parameter in [1,100] range. Each signal pair was examined vs. (4.9) test and the k, σ 2 parameters which maximized Q(k, σ 2 ) were chosen as the optimal filter. We considered different values of Rmin parameter between 5% and 20% of Rmax . We found that while the optimal ki was not sensitive to Rmin , the σi2 was. For small Rmin values the optimized parameters are very similar to the theoretical ones; see (4.17). For bigger values of Rmin it is preferred to decrease the filter’s coverage and to give up the discrimination of some signal pairs in order to increase the segregation rate for the remaining ones.

5.3

Extension of the semi-analytical method for filter sets of three and more filters for signals with unconstrained amplitudes

In section 4.3 we proposed a semi-analytic method for receiving the approximately optimal filter parameters for segregation of single harmonic signals with unconstrained amplitudes. 28

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

(a)

(b)

Figure 14: The optimal single filter’s band for segmentation signals between minimal and maximal frequencies for different Rmin . There are vertical lines on both ends of each plot corresponding to the fmin and fmax frequencies. The curves in the center of each plot correspond to the optimal values of k for different Rmin values. The curves on the left and right sides of each plot correspond to the band bounds for the optimal filter width σ and the optimal k. Note, that the left side curves are overlapping the fmin lines. (a) the filters received from search on linearly spaced frequency domain. (b) the filters received from search on log frequency domain. The semi-analytic method is based on a search for the maximum of Q by the minimization of the integral 4.14 which relies on the approximate expression for ∆m (4.22). Numerical optimization of the integral measure gave us the filter set parameters for two filters. The latter method may be extended for bigger filter sets using the following heuristic consideration: it is necessary and sufficient to have two different filters responding above Rmin for every frequency in [fmin , fmax ] for segregating harmonic signals with unconstrained amplitudes. On the other hand we prefer filters with σ 2 as large as possible (which results with small bandwidth) in order to decrease the variability of its responses. Therefore, in the optimal filter sets there will be exactly two filters responding above Rmin for each frequency. Since the filter’s band is symmetric around its central frequency ki and its width is proportional to σi , it is enough to find the optimal frequencies of ”filter switch”. For Nk filters there are Nk − 2 such points, due to the demand of two filters for each frequency. For every choice of Nk , f1 = fmin and fNk = fmax . Suppose now that the switch frequencies f2 , ..., fNk −1 are known. Then, the n filters set (n > 2) is build as follows (see also figure 15): • Construct one filter which covers the range [f1 , f2 ] and another filter which covers the range [f1 , f3 ]. • For i = 2 to Nk − 2 construct a filter which covers the range [fl , fl+2 ]. • Construct one filter which covers the range [fNk −1 , fNk ].          r r r r qqq r r r r-m

f1

f3 f2 f4

fNk −3fNk −1 fNk −2 fNk

Figure 15: Constructing Nk filters set 29

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Between each two filter switch points there are exactly two filters, and their total inefficiency (or non quality), which is integrated indistinguishability for each frequency in this range may be calculated by Z fl+1 ∆m(m, ki , kj , σi2 , σj2 )dm NQl ≃ fl

where l is the index of the range, fl is filter change frequency, ∆m is defined by (4.22) and hki ,σi2 , hkj ,σj2 are the filters responding above Rmin for frequencies fl ≤ m ≤ fl+1 . The total inefficiency of filter set is given by NQ =

N k −1 X

NQl

l=−1

where Nk − 1 is the number of ranges specified by Nk − 2 filter change points in [fmin , fmax ]. Note, that the signals associated with distant frequencies are always distinguished because they are covered by different filters and therefore their responses to these filters are disjoint. The optimal filter set is obtained by minimizing NQ. For example finding optimal three filters set optimization is as follows: 1. Choose f2 ∈ [fmin , fmax ]. 2. Calculate the parameters of the three filters, such that r¯1,k1 ,fmin ,σ12 (x) = r¯1,k1 ,f2 ,σ12 (x) = Rmin r¯1,k2 ,fmin ,σ22 (x) = r¯1,k2 ,fmax ,σ22 (x) = Rmin r¯1,k3 ,f2 ,σ32 (x) = r¯1,k3 ,fmax ,σ32 (x) = Rmin 3. Calculate NQ = NQ1 + NQ2 . 4. Run 1-3 on all possible f2 values and select the one which minimizes NQ. The complexity of the obtained numerical optimization is O(nNk −1 ) where n is the number of tested frequencies. For Nk > 5 a dynamic programming algorithm of optimization would be faster than the proposed one. However, filter sets of 6 and more filters are impractical and because of this we didn’t consider such algorithm. The results of this algorithm for 3 and 4 filters are shown at figures 16(e), 16(g).

5.4

Numerical optimization of signals with unconstrained amplitudes

For the numerical optimization each signal pair was examined vs. (4.9) test and the ~k, σ~2 parameters which maximized Q (4.10) were chosen as the optimal filters. The full numerical search on 2Nk parameters of the Nk filters obtained values which were very similar to those obtained by semi-analytically considerations. 30

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

RA,m,k,σ2

RA,m,k,σ2

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

0 0

0.5

1

1.5

2

2.5

3

3.5

0

0.5

1

1.5

(a) 0.5

0.5

0.45

0.4

0.4

0.35

0.35

3.5

3

3.5

3

3.5

3

3.5

RA,m,k,σ2

0.3

RA,m,k,σ2

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

0.5

1

1.5

2

2.5

3

0

3.5

0

0.5

1

1.5

m

(c)

2.5

(d) Numerically optimal three filters set

0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

RA,m,k,σ2

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

2

m

Optimal theoretical Three filters set

RA,m,k,σ2

3

Numerically optimal two filters set

0.45

0

0.5

1

1.5

2

2.5

3

0

3.5

0

0.5

1

1.5

2

2.5

m

m

(e)

(f)

Four theoretically optimal filters

Numerically optimal four filters set

0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3

RA,m,k,σ2

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05

0

2.5

(b)

Optimal theoretical two filters set

0

2

m

m

RA,m,k,σ2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Numerically optimal single filter

Theoretically optimal single filter 0.5

0

0.5

1

1.5

2

2.5

3

0

3.5

m

0

0.5

1

1.5

2

2.5

m

(g)

(h)

Figure 16: Theoretical (left) vs. numerically optimized (right) filter sets

31

Commonly used octave filters

Commonly used octave filters

0.7

0.6

0.6

0.5

0.5

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

RA,m,k,σ2

0.7

0

0

0.5

1

1.5

2

2.5

3

0 −1 10

3.5

(a)

1

10

(b) 1.5 octave filters

1.5 octave filters 0.5

0.5

0.4

RA,m,k,σ2

0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

0

10

log(m)

m

RA,m,k,σ2

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

We searched for optimal filter sets on signal domains which were both linearly and logarithmically spaced. For both cases the optimal filter sets are very similar. We repeated all the searches on two different [fmin ; fmax ] frequency ranges. The filters’ coverage of the given domains was similar. The results for linearly spaced domain are shown at figure 16(b), 16(d), 16(f) and 16(h) against their semi-analytical estimation. We considered different values of Rmin parameter between 5% and 20% from Rmax . For small Rmin values the ki parameters tend to be more distant one from another and σi2 values are larger. Again, for bigger Rmin values it is preferred to decrease the filter’s coverage and to give up some signal pairs in order to increase the segregation rate in the remaining ones.

0

0.5

1

1.5

2

2.5

0 −1 10

3

m

0

10

log(m)

(c)

(d)

Figure 17: Commonly used filter sets. (a) The most popular octave set. (b) The same set on log frequency domain. Note that the variability for lower frequencies is large. Each filter responds strongly for all the frequencies smaller than its central frequency. (c) More theoretically justified 1.5 octave set and (d) the same set on log domain.

5.5

Comparison with commonly used filter sets

As discussed in the introduction, most texture filters, based on Gabor filters are octave filters set [19], [7], [11]. In addition to their empirical justification [7], these filters are computationally efficient [11]. An alternative approach, suggesting to use 1.5 octave filter, was suggested and justified for signals representation [16]. To build N-octave filter set one starts by choosing the fmin frequency. Then, the central N frequency of the first filter is specified by k1 = 2 2 · fmin . The central frequencies of the 32

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Octave Filters vs. Proposed Filters 0.1

0.1

0.05

0.05

0

0

−0.05

−0.05

−0.1 −10

−5

0

5

−0.1 −10

10

0.2

0.2

0.1

0.1

0

0

−0.1

−0.1

−0.2 −10

−5

0

5

−0.2 −10

10

0.4

0.1

0.2

0.05

0

0

−0.2

−0.05

−0.4 −10

−5

0

5

−0.1 −10

10

−5

0

5

10

−5

0

5

10

−5

0

5

10

Proposed Filters

Octave Filters

Figure 18: Commonly used octave filter set of three filters compared to the proposed filter set of the same size (3 filters) covering the same frequency range. following filters are calculated by ki+1 = 2N · ki . The filter’s band associated with the iN N th filter is [2− 2 · ki ; 2 2 · ki ]. This way, the ratio between the maximal and the minimal frequencies covered by one filter is 2N . The filter response at this cutoff frequencies is 1e of the response to the central frequency. As one can see from figures 17,18 such commonly used filter sets are very different from the sets we propose. Note that the common filters are equally spaced on log frequency space. Note also that for many frequencies, the response of the common filters is associated with high variability. In particular, it cannot be claimed for example that the response to low frequency signals is high only for the first filter. The 1.5 octave filters have smaller variability, and as we shall see later, they indeed perform better in texture discrimination. The proposed filter respond more weakly to most frequencies, but the variability in responses is also much smaller.

33

Extending the filters to 2D

In principle, we could extend our approach to two (or higher) dimensions by trying to find optimal filters for maximally separating signals from some standard set (e.g. 2D harmonic functions). We shall take the easier (and common [6]) approach: The basic 2D filter is constructed as a product between the proposed 1D filter (along the x axis) and a Gaussian along the y axis. The width of the Gaussian is specified as the width σ of the proposed 1D filter. Similarly to other methods, we created the 2D kernels in 6 directions by rotating the 2D filter by π6 . The rotated x axis is denoted the filter’s direction. See Figure 19 for a 2D illustration of a typical 2D filter set (corresponding to the 1D filters described in Figure 16(f).

Figure 19: 2D filters corresponding to 1D filters on figure 16(e-f).

Variavility of Responses for Different Angles 0.055 Traditional Filter Proposed Filter 0.05

0.045

0.04

0.035

R1,.1,k,σ2(θ)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

6

0.03

0.025

0.02

0.015

0.01

0.005

20

40

60

80

100

θ(°)

120

140

160

180

Figure 20: Response variability as a function of 2D filter’s direction applied to a sine signal. Note that for octave filters the magnitude remains of the same order for all directions, while for the proposed filters the responses differ up to 10 times. Note that the response of these filters to a harmonic signal may be more variable than their 1D counterpart. When the signal’s variation is not exactly in the direction of the filter, the filter’s response is some combination of of the response of the proposed filter (which is relatively non-variable) and a Gaussian (low-pass) filter (which varies a lot). For the common (e.g. octave) filters this variability is comparable to their normal 1D variability and therefore doesn’t make a large difference. For the proposed filters, on the other hand, this variability matters and decreases performance. See figure 20 for example of variability values as function of filter’s direction to signals direction. 34

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

6.1

Response normalization

To reduce the variability associated with the low-pass directions, we examine the response of each filter in several nearby image points along the direction of the filter. The set of responses associated with the j-th filter in (xi , yi ) is denoted {Rj }i . µRj and σRj are the i i mean and standard deviation of this set. Specifically, 2⌊σ⌋ + 1 points were sampled in unit spacings along the line segment coinciding with the direction of the filter. A low variance of these responses indicates that the texture direction is close to the filter direction. A high variance indicates, on the other hand, that the texture direction is different and is not expected to yield a informative response, with low spatial variability. The detected variance is used to create two (related) distance measures associated with some filter (denoted the j−th filter): 1. Given a set of filter responses {Rj }1 ( µRj , σRj ) along filter’s direction in some location 1

1

(x1 , y1) and another response R2j in some other location (x2 , y2 ), the distance between the responses is µ j − R2j R d8j ({Rj }1 , R2j ) = 1 (6.1) σRj 1

2. Given two sets of filter responses {Rj }1 , {Rj }2 (to the same filter) along the filter’s direction, corresponding to two locations. Then, the distance between the responses sets is specified as µRj − µRj 88 j j 1 2 (6.2) dj ({R }1 , {R }2 ) = q σR2 j + σR2 j 1

2

For a jet of J = 6·N filters (including N filters rotated in 6 directions), the corresponding distances between the the responses in two locations are 8

D ({R}1 , R2 ) =

J X

d8j ({R}1 , R2 )

j=1

for measure (6.1) and 88

D ({R}1 , {R}2 ) =

J X j=1

d88j ({R}1 , {R}2)

for measure (6.2) Normalizing by the variance, these distance measures are in a sense, similar to Mahalanobis distance using diagonal covariance matrix estimated locally. In the experimental part we used the asymmetric measure (6.1) because in the clustering procedure we try to fit the feature vectors to some known centers. The symmetric measure is used in graph partitioning methods where each two nodes of the graph are of the same importance. 35

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

7 7.1

Experiments Testing texture descriptors

Retrieval test - This test, built around the image retrieval tasks. A (typically large) set of images containing various textures is available. In a query, a texture patch is given, and the images with a similar texture are sought. A query is successful if an image with the (subjectively judge) same texture is found to be one of the k most similar (according to the descriptor) images [24], or if all the images associated with this texture in the database are found [29]. It is sufficient to find the image and no specification where the query texture is present in the image is required. Segmentation test - In this test, an image with several textured regions present in it, is given, and the algorithm needs to correctly specify these regions [19]. The algorithm may involve texture descriptor as well as spatial considerations and is essentially a clustering procedure. The texture descriptor ability to provide an effective distance measure is tested. Simpler segmentations tests involve semi synthetic images composed of tiles where every tile contains some (synthetic or real) texture. Harder tests involve real images, containing textured objects, where the texture is influenced by say 3D structure, or object pose. Focusing on the issue of descriptor stability under translation, which is important mostly for segmentation, we tested the proposed descriptors using the segmentation test. Textures changes associated with 3D structure and pose correspond to scaling and shear and are beyond the scope of this work. Moreover, spatial considerations such as region size, and preference to texture continuity were not used as well. This is done so that these considerations, or a segmentation algorithm relying on them, would not mask the differences associated with the usage of different filter sets. Our experiments basically test the texture distance between different textures (and its stability) compared with a similar distance within the same texture. We tested the discrimination power of the proposed filter-sets as well as several other, commonly used, filter-sets, using both the common L2 metric and the two new distance measures proposed above. The experiments were done with synthetic sine images (biased for our filters) and with the popular Brodatz textures. Our results indicate that the proposed filter sets perform significantly better than the commonly used methods. In aspect of retrieval test, we compare the proposed filters versus common octave filters set using the texture recognition algorithm by Varma and Zisserman [28]. The algorithm is implemented exactly as described by the authors and tested with the original and the proposed filter sets on the original database. The results for the algorithm parameters reported in the article are similar for both filter sets. However the proposed filters obtain very close results using much more compact texture representation (smaller runtime), unlike the octave filters.

36

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

(a)

(b)

(c)

Figure 21: Synthetic sines image (a) and its 16-means segmentation using common (b) and the newly proposed (c) features segmented with L2 distance measure.

7.2

Synthetic sine images

To demonstrate the stability available with the use of the proposed filter, we started with a test image composed of sine textures; see Fig. 21(a). (This example is clearly biased towards favoring our method, constructed with Sine signals as a model, but other example will follow below). This image is segmented using 16-means algorithm using the simplest distance measure: L2 distance metric between the feature vectors. The results corresponding to a traditional one octave filters and the proposed filters (over the same frequency range), are shown in Fig. 21(b)) and Fig. 21(c)), respectively. The high intra-texture variability associated with the traditional filters is clear.

7.3

Brodatz images

We now turned to the standard Brodatz textures and considered the commonly used set of 16 texture patch described in Fig. 22. We tested these textures in a pairwise discrimination task as well as in one-against all segmentation tests.

Figure 22: Patch of 16 Brodatz images we used in experiments

7.3.1

Clustering texture pairs

Here we considered the 120 different pairs of 128 × 128 texture patches; see Fig. 23 for an an example of such a pair. A vector of filter responses was calculated for every point. 37

Texture patches classification success

Texture patches classification success

120

120

100

100

Number of texture pairs

Number of texture pairs

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Figure 23: Example pair of patches of Brodatz textures

80

60

40

20

0 0.55

our [0.1; 2.9] our [0.17; 2.0] 1.5 octave [0.1; 2.9] 1 octave 4 filters [0.17; 2.9] 1 octave [0.17; 2.0] our 2 filters [0.1;2.82] 1.5 octave 2 filters 1 octave 2 filters 0.6

0.65

0.7

0.75

80

60

40

20

0.8

0.85

0.9

0.95

0 0.55

1

Percent of successfully classified pixels

our[0.1; 2.9] our[.17; 2.0] 1.5 octave[0.1; 2.9] 1 octave [0.17; 2.0] our[.17; 2.0] new metric 1 octave[.17; 2.0] new metric 0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

Percent of successfully classified pixels

(a)

(b)

Figure 24: Segmentation success rates for Brodatz patches. (a) shows the results with L2 distances for different filter sets. (b) shows the results for the new distance measure compared to L2 for best filter sets. The vectors were clustered using the K-means (K = 2) algorithm. After convergence, we labelled each cluster as either left of right depending on the origin of the pixels which are the majority in this cluster. The success rate is the fraction of pixel classified into the correct cluster. Note that, according to this definition, the worst segmentation quality is 50%. We performed such tests for several filter-sets developed according to our design, for the popular 1-octave filter-sets [7, 19] and for the 1.5- octave filter-set, claimed to be more mathematically justified in [16]. For the comparison, the proposed filter sets were constructed for the same frequency ranges associated with the traditional filters and contained the same number of filters. The tests were performed for both common L2 distance metric and for the asymmetric adaptive distance measure (6.1) on all the filter-sets. The results, described in Figure 24, show the quality of clustering associated with the various representations. Specifically, for every level of clustering quality, measured by the success rate, it shows the number of texture pairs which are clustered better than this level. Ideally all pairs should be clustered perfectly so that the clusters are pure and the success rate is 1. This never happens but the clusters obtained with the proposed filters approximate this ideal behavior much better than the clusters constructed with 1-octave filters or 1.5-octave filters. The performance difference is large for the L2 distance and is even larger for the proposed new distance. Note that the performance of common filter-sets remained the same with both 38

100

Number of Texture Pairs

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Texture Patches Classification Success 120

80

60

40

20 1 octave 3 filters meanshift L2 [0.1; 2.0] proposed 3 filters meanshift adaptive measure [.17; 2.0] proposed 3 filters meanshift adaptive measure [.1; 3.14] 0 0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

Percent of Successfully Classified Pixels

Figure 25: Segmentation success rates for Brodatz patches with meanshift clustering for three filter sets. distances, while for the proposed filters, the adaptive distance is preferred. Note, for example, that 106 pairs out of the 120 (89%) were clustered better than 80% with the proposed filter set and the adaptive distance. On the other hand, such accuracy was achieved by only 49 of 120 pairs (41%) using the 1-octave filters (both filter sets were developed over the same range and include the same number of filters (3)). These results were obtained with k−means procedure initialized by the average of the texture vectors (one patch for each cluster). This is justified because we do not proposed here the k-means as a clustering algorithm but just test the clustering properties of the various representation. Similar results were obtained with randomly initialized seeds. Figure 25 shows results for similar experiment where meanshift clustering was used instead of K-Means. The filters responses were clustered by meanshift to several groups and these groups were attached to one of the two patches according to majority of its pixels. The success rate is the ratio of the sum of the correctly clustered pixels to the total image size. This experiment also shows the advantage of the proposed method on the traditional one. Another important issue is that the meanshift finds much less (usually half) clusters while using the proposed filters and adaptive distance measure. This implies that the proposed method provides more compact texture representation using textons methods. 7.3.2

Primitive segmentation quality

In this experiment we tested how well the distance between two texture vectors serves as a classifier telling whether they belong to the same texture. In this experiment we considered L2 and the proposed symmetric metric (6.2). To carry out this test, we considered the 4 × 4 patch image (figure 22), chose a sample of 4 × 4 pixels from each patch, calculated the distance from each of these pixels to all the others. This distance was compared to a threshold. The results was considered as positive (same texture) if the distance was below the threshold and as negative otherwise. The ROC graphs in figure 26 show the false negatives as a function of the false positives for different values of the threshold. A ROC curve is better if it is closer to the axes and the origin. It is apparent that the classifiers constructed using the proposed filter set (with the 39

our filters (0.19−2.0) new symmetric distance 1.0 octave L2 metric 1.5 octave L2 metric our filters (0.19−2.0) L2 metric 1.0 octave new symmetric distance

0.9

0.8

0.7

False Negatives

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

RoC 1

0.6

0.5

0.4

0.3

0.2

0.1

0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False Positives

Figure 26: ROC curve for Brodatz patches two distance measures) are better than those built with the common filters. 7.3.3

Stability test

In this test we choose a representative feature vector for each texture patch and attach each pixel in the image to the closest representative among those 16 vectors. This was done with the L2 norm and with the proposed asymetric adaptive distance (6.1). First, the representative feature vectors were the average feature vectors in each patch; see Figure 27 (a,b). For the proposed filter set, this choice yields much better segmentations of the image than that obtained with the common 1-octave filters. In the second case, which results are shown at figure 27(c-h), the representative feature vectors were randomly chosen from each patch. This method is not deterministic because for each run the representative vectors are different. The results show that the proposed method provides much more robust segmentation of the image into correct patch squares.

7.4

Texture recognition test

For this test we implemented the texture recognition algorithm proposed by Varma and Zisserman [28] and partially based on earlier work by Leung and Malik [17]. We compared the recognition results obtained for octave filters (the filters used in the original work) to the results obtained for the proposed filters while preserving all other components of the original work.

40

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Figure 27: Stability test for Brodatz patches. (a-b) Supervised test: each pixel was attached to the closest average response from each patch. (a) the proposed filter set with new distance measure. (b) one octave filter set with L2 distance metric.(c-h) Unsupervised test: each pixel was attached to the closest random response from each patch. In the lower row (f-h) the standard octave filter set (3 frequencies) with L2 distance measure was used. In the upper row (c-e) the proposed filter set (3 filters, the same frequencies range) with new distance measure was used.

41

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Figure 28: Varma-Zisserman algorithm scheme. 7.4.1

Brief description of the algorithm

The scheme of the algorithm is presented in figure 28. The database, which is a subset of CUReT (Columbia-Utrecht Reflectance and Texture Database [5]), contains 20 classes of textures; 92 samples for each class. Each class is divided to 46 training and 46 test images. For each of the images Gabor features are calculated using the tested filter set. Some examples of the database images are shown in figure 29, and it is easy to see that the database is hard for recognition task even with the human eye (the samples of the same texture class have different scales and photographing angles). As one can see there are some examples of the same class which have a very different appearance, while some examples of different classes look similar. In the learning stage, the representative K feature vectors for each class are obtained by K-Means clustering of feature vectors of a subset of the training images. The 20K feature vectors obtained from all classes represent, together, all the textures. Each pixel of each training image is labelled with an index of the representative feature vector which is closest to it. The histogram of features distribution for each training texture sample is stored with the corresponding texture class label. In the recognition stage, every pixel of the test images is labelled with the index of the closest representative feature vector, and the distribution of the features is calculated. The same features as in the learning stage are used in the recognition stage. For each test image its features histogram is χ2 compared to the histograms stored during the learning stage. The class referred by the closest sample histogram is considered as the recognized class. The fraction of the correctly recognized images from 46 · 20 testing samples is considered as the success rate associated with the tested filter set. See [28] for more details.

42

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

(a)

(b) Figure 29: CUReT images examples. The upper image contain four sample images of the same class. The bottom image contain four samples from different classes. As one can see sometimes the images in the same class have a very different appearance, while images in different classes are very similar 7.4.2

Experiments setup

As first stage we repeated the recognition system exactly as it was proposed in the article [28] and tested it on the same database. The obtained results were similar to those reported by the authors, thus we concluded that our implementation is correct. We used a variation of the filter set denoted RFS in the original work, with the omission of the two isotropic filters. The best result reported by the authors for 20 classes recognition is 98.37%; we obtained 96.6% without isotropic filters. In the second stage we replaced the filter set with the filter sets proposed here (sets with three and five filters). We considered six texture recognition methods: 1. The original octave filters, 18 complex responses (36D feature space), L2 distance. 2. The original octave filters, absolute values of complex responses (18D feature space), L2 distance. 3. The proposed filter set with three filters, covering frequencies range equal to the range of the octave filters set (18D feature space), L2 distance. 4. The proposed filter set with three filters, covering frequencies range equal to the range of the octave filters set (18D feature space), adaptive distance measure. 5. The proposed filter set with five filters, covering frequencies range [.1, 3.14] (30D feature space), L2 distance.

43

0.95 0.95 0.9

Success rate (%)

Success rate (%)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

1

0.85

0.8 Octave Complex L2 Octave L2 Our 3 L2 Our 3 AM Our 5 L2 Our 5 AM

0.75

0.7

1

2

3

4

5

6

7

8

9

0.9

0.85

Octave3 complex L2 (38D) Octave3 L2 (20D) MHD3 L2 (20D) MHD3 AM (20D) MHD5 L2 (32D) MHD5 AM (32D)

0.8

0.75

10

Number of features for each class

1

2

3

4

5

6

7

8

9

10

Number of features for each class

(a)

(b)

Figure 30: Recognition tests using Varma-Zisserman algorithm. (a) Recognition rate using 13 random training images for features extraction. (b) Recognition rate using all 46 training images for features extraction. 6. The proposed filter set with five filters, covering frequencies range [.1, 3.14] (30D feature space), adaptive distance measure. In this test we checked the dependence of the recognition rate on the number of representative feature vectors, K, taken for each texture class. Figure 30 (a) shows that for a small number of features the proposed filters provide better recognition rates than the octave filters, but after that, for larger number of representative texture prototypes, the set proposed in [28] (using octave set and complex responses) wins. 7.4.3

Modified representative features extraction

The image database used in these experiments contain images of the same texture taken with different illumination, angle and scale. Therefore the frequency domain signature of different examples of the same texture may change as a function of these three parameters. If the initial learning group doesn’t contain an example of some scale or angle of photographing, the feature set learned for this class will not contain all possible response values, and therefore the learned features will not be representative enough. In the case of octave filters this may be less problematic, since the responses of octave filters for many frequencies are overlapping. Note that the training in [28] requires all 46 images in each class, but uses only some of them for the feature extraction. In the case of the proposed filters this is a serious problem, because the response ranges are mostly disjoint. To overcome this problem we repeated the experiments using all the training images to find the most representative feature vectors for each class. Figure 30 (b) shows that in this case the proposed filters out-perform the octave ones for small number of features and provide the same performance for more features.

44

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

8

Conclusions

A novel method for constructing set of Gabor filters was proposed. Unlike the commonly used filter sets which concentrate on the optimal representation (in MSE sense) of the signals, the main consideration in constructing the proposed set is maximizing the number of distinguished signal pairs. The method is based on analytical and semi-analytical derivations of the optimal filter sets for separating harmonic signals. It was also analytically shown that the proposed filters are reasonable for signals with several harmonics. The proposed filter sets are extended to 2D and a special, based on analytical properties of the filters, Mahalanobis-like distance measure was proposed to compare the resulting features. It was shown in experiments with real-life texture samples that the proposed filters are superior than the commonly used ones in the task of image segmentation, even using such primitive methods as K-means. It was also shown that for a reasonable small set of textures, the proposed method provides a stable recognition of texture members, given representative of each texture. Note, that the proposed filters performed better using a common L2 metric for features comparing. Using a new adaptive distance measure improved the results for the proposed filters and didn’t change the results for the commonly used filters. The difference in filter set building considerations caused that the obtained filter sets are significantly different from the commonly used ones. The representation based filter sets achieve better reconstruction of the texture from the response coefficients in each location on the texture, but this may cause a very different responses in different locations. The response distribution comparing methods deal with this phenomenon, but they are computationally hard. The responses of the proposed filters to textured signals are less varying with translation than the responses to the commonly used filters. This allows to use a single feature vector to describe each texture. In context of segmentation of natural textured images this is a simpler and faster method. The proposed method fits the main goal of the work: the distance between feature vectors of pixels from the same texture is mostly smaller than the distance to pixels from other textures. However, since the goal was to describe each texture by a single feature vector, this method may work worse than the distribution based ones for textures with large variability of responses. The next step in this work seems to be an integration of the proposed filter sets into complex segmentation algorithms, mentioned in the beginning of the work. Another direction may be development of a new fast segmentation algorithm which would use the special properties of the proposed features, such as a good separation using a simple threshold criterion. Considering geometrical properties of compared pixels should increase even more the high distinguishability power of the proposed features. The segmentation tasks usually imply that inter texture distances are small, therefore the stability of responses in the same texture of the proposed features should increase the segmentation quality.

45

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

9

Acknowledgements

We would like to thank the creators of the CUReT database for enabling us to use it, and to Manik Varma, who let us use the processed version of their database as well. We would like to thank Ilan Shimshoni for providing us an implementation of meanshift clustering algorithm. We thank Israeli science foundation for financial support of this research.

46

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

A

A response to single harmonic signal −x2

1 e 2σ2 +ikx response to a signal gm,A (x), The absolute value of the Gabor filter hk,σ2 (x) = √2πσ 2 is q |hk,σ2 (x) ∗ gm,A (x)| = (Re(hk,σ2 (x) ∗ gm,A (x)))2 + (Im(hk,σ2 (x) ∗ gm,A (x)))2

For gm,A (x) = Asin(mx) the Real part response is: =√

Re(hk,σ2 (x) ∗ gm,A (x))

A

=



=



=

−Ae

1

−x2

e 2σ2 sin(kx) ∗ Asin(mx)

2πσZ2 ∞

2πσ 2 Z−∞ ∞ A

−ξ2

e 2σ2 sin(kξ)sin(m(x − ξ)) dξ −ξ2

e 2σ2 sin(kξ) {sin(mx)cos(mξ) − cos(mx)sin(mξ)} dξ

2πσ 2 −∞ Z Asin(mx) ∞ −ξ22 √ = e 2σ sin(kξ)cos(mξ) dξ 2πσ 2 −∞ Z Acos(mx) ∞ −ξ22 − √ e 2σ sin(kξ)sin(mξ) dξ 2πσ 2 −∞ Z Asin(mx) ∞ −ξ22 √ = e 2σ {sin((k + m)ξ) + sin((k − m)ξ)} dξ 8πσ 2 −∞ Z Acos(mx) ∞ −ξ22 e 2σ {cos((k − m)ξ) − cos((k + m)ξ)} dξ − √ 8πσ 2 −∞ Z Acos(mx) ∞ −ξ22 √ = e 2σ {cos((k + m)ξ) − cos((k − m)ξ)} dξ 8πσ 2 −∞   −(k+m)2 σ 2 −(k−m)2 σ 2 A 2 2 e −e cos(mx) =∗ 2 o A −(k2 +m2 )σ2 n −kmσ2 2 2 e e − ekmσ cos(mx) = 2 where * is due to

R∞ 0

2 x2

e−a

−(k2 +m2 )σ 2 2

cos(mx)dx =

sh(kmσ 2 )cos(mx)



2

π − m2 e 4a 2a

47

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

For gm,A (x) = Acos(mx) the Real part response is: =√

Re(hk,σ2 (x) ∗ gm,A (x))

1

−x2

2πσ 2 ∞

e 2σ2 sin(kx) ∗ Acos(mx)

π − m(x − ξ)) dξ 2 −∞ n o π π A ∞ −ξ22 2σ e sin(kξ) sin( )cos(m(x − ξ)) − cos( )sin(m(x − ξ)) dξ = √ 2 2 2πσ 2 −∞ ∞ 2 −ξ A e 2σ2 sin(kξ) {cos(m(x − ξ))} dξ = √ 2πσ 2 −∞ A ∞ −ξ22 e 2σ sin(kξ) {cos(mx)cos(mξ) − sin(mx)sin(mξ)} dξ = √ 2πσ 2 −∞ Acos(mx) ∞ −ξ22 √ = e 2σ sin(kξ)cos(mξ) dξ 2πσ 2 −∞ Asin(mx) ∞ −ξ22 − √ e 2σ sin(kξ)sin(mξ) dξ 2πσ 2 −∞ Acos(mx) ∞ −ξ22 √ = e 2σ {sin((k + m)ξ) + sin((k − m)ξ)} dξ 8πσ 2 −∞ Asin(mx) ∞ −ξ22 − √ e 2σ {cos((k − m)ξ) − cos((k + m)ξ)} dξ 8πσ 2 −∞ Asin(mx) ∞ −ξ22 √ = e 2σ {cos((k + m)ξ) − cos((k − m)ξ)} dξ 8πσ 2 −∞   −(k+m)2 σ 2 −(k−m)2 σ 2 A 2 2 e =∗ −e sin(mx) 2 o A −(k2 +m2 )σ2 n −kmσ2 2 2 e e − ekmσ sin(mx) = 2 where * is due to

R∞ 0

A

=



=

−Ae

2 x2

e−a

2πσ 2

−ξ2

e 2σ2 sin(kξ)sin(

−(k2 +m2 )σ 2 2

cos(mx)dx =

sh(kmσ 2 )sin(mx)



2

π − m2 e 4a 2a

48

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

For gm,A (x) = Asin(mx) the imaginary part response is: Im(hk,σ2 (x) ∗ gm,A (x)) = √ = = =

= = = = =

1

−x2

e 2σ2 cos(kx) ∗ Asin(mx) 2πσ 2 Z ∞ −ξ2 A √ e 2σ2 cos(kξ)sin(m(x − ξ)) dξ 2πσ 2 Z−∞ ∞ −ξ2 A √ e 2σ2 cos(kξ) {sin(mx)cos(mξ) − cos(mx)sin(mξ)} dξ 2πσ 2 −∞ Z Asin(mx) ∞ −ξ22 √ e 2σ cos(kξ)cos(mξ) dξ 2πσ 2 −∞ Z Acos(mx) ∞ −ξ22 − √ e 2σ cos(kξ)sin(mξ)) dξ 2πσ 2 −∞ Z Asin(mx) ∞ −ξ22 √ e 2σ {cos((k + m)ξ) + cos((k − m)ξ)} dξ 8πσ 2 −∞ Z Acos(mx) ∞ −ξ22 − √ e 2σ {sin((k − m)ξ) + sin((k + m)ξ)}cos(mx) dξ 8πσ 2 −∞ Z Asin(mx) ∞ −ξ22 √ e 2σ {cos((k + m)ξ) + cos((k − m)ξ)} dξ 8πσ 2 −∞   −(k−m)2 σ 2 −(k+m)2 σ 2 A 2 2 e +e sin(mx) 2 o A −(k2 +m2 )σ2 n kmσ2 −kmσ2 2 e sin(mx) e +e 2

= Ae

−(k2 +m2 )σ 2 2

ch(kmσ 2 )sin(mx)

49

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

For gm,A (x) = Acos(mx) the imaginary part response is: Im(hk,σ2 (x) ∗ gm,A (x)) = √ = = = =

=

= = =

1

−x2

e 2σ2 cos(kx) ∗ cos(mx) 2πσ 2 Z ∞ −ξ2 π A √ e 2σ2 cos(kξ)sin( − m(x − ξ)) dξ 2 2 2πσ Z−∞ o n ∞ −ξ2 π π A 2 2σ √ e cos(kξ) sin( )cos(m(x − ξ)) − cos( )sin(m(x − ξ)) dξ 2 2 2πσ 2 Z−∞ ∞ −ξ2 A √ e 2σ2 cos(kξ) {cos(mx)cos(mξ) − sin(mx)sin(mξ)} dξ 2πσ 2 −∞ Z Acos(mx) ∞ −ξ22 √ e 2σ cos(kξ)cos(mξ) dξ 2πσ 2 −∞ Z Asin(mx) ∞ −ξ22 − √ e 2σ cos(kξ)sin(mξ) dξ 2πσ 2 −∞ Z Acos(mx) ∞ −ξ22 √ e 2σ {cos((k + m)ξ) + cos((k − m)ξ)} dξ 8πσ 2 −∞ Z Asin(mx) ∞ −ξ22 − √ e 2σ {sin((k − m)ξ) + sin((k + m)ξ)} dξ 8πσ 2 −∞ Z Acos(mx) ∞ −ξ22 √ e 2σ {cos((k + m)ξ) + cos((k − m)ξ)} dξ 8πσ 2 −∞   −(k−m)2 σ 2 −(k+m)2 σ 2 A 2 2 e +e cos(mx) 2 o A −(k2 +m2 )σ2 n kmσ2 −kmσ2 2 cos(mx) e +e e 2

= Ae

−(k2 +m2 )σ 2 2

ch(kmσ 2 )cos(mx)

Thus, for gm,A (x) = Asin(mx) the absolute value of the response is: q = A e−(k2 +m2 )σ2 sh2 (kmσ 2 )cos2 (mx) + e−(k2 +m2 )σ2 ch2 (kmσ 2 )sin2 (mx) |hk,σ2 (x) ∗ gm,A (x)| −(k2 +m2 )σ 2 p 2 = Ae sh2 (kmσ 2 )cos2 (mx) + ch2 (kmσ 2 )sin2 (mx) r −(k2 +m2 )σ 2 1 1 2 = Ae (ch(2kmσ 2 ) − 1)cos2 (mx) + (ch(2kmσ 2 ) + 1)sin2 (mx) 2 2 r 2 2 2 −(k +m )σ 1 1 2 = Ae ch(2kmσ 2 ){cos2 (mx) + sin2 (mx)} + {sin2 (mx) − cos2 (mx)} 2 2 r 2 2 2 −(k +m )σ 1 1 2 = Ae ch(2kmσ 2 ) − cos(2mx) 2 2 −(k2 +m2 )σ 2 p 2 ch2 (kmσ 2 ) − cos2 (mx) = Ae 50

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

and for gm,A (x) = Acos(mx) the absolute value of the response is: q = A e−(k2 +m2 )σ2 sh2 (kmσ 2 )sin2 (mx) + e−(k2 +m2 )σ2 ch2 (kmσ 2 )cos2 (mx) |hk,σ2 (x) ∗ gm,A (x)| −(k2 +m2 )σ 2 p 2 = Ae sh2 (kmσ 2 )sin2 (mx) + ch2 (kmσ 2 )cos2 (mx) r −(k2 +m2 )σ 2 1 1 2 = Ae (ch(2kmσ 2 ) − 1)sin2 (mx) + (ch(2kmσ 2 ) + 1)cos2 (mx) 2 2 r 2 2 2 −(k +m )σ 1 1 2 = Ae ch(2kmσ 2 ){cos2 (mx) + sin2 (mx)} + {cos2 (mx) − sin2 (mx)} 2 2 r 2 2 2 −(k +m )σ 1 1 2 = Ae ch(2kmσ 2 ) + cos(2mx) 2 2 −(k2 +m2 )σ 2 p 2 ch2 (kmσ 2 ) − sin2 (mx) = Ae

B

A response to dual-harmonic signal 2

The absolute value of the Gabor filter hk,σ2 (x) = signal gm, ~ (x) = Asin(m1 x) + Bsin(m2 x), is ~ A

−x +ikx √ 1 2σ 2 e 2 2πσ

response to a dual harmonic

|hk,σ2 (x) ∗ gm, ~ (x)| = |hk,σ2 (x) ∗ Asin(m1 x) + hk,σ2 (x) ∗ Bsin(m2 x)| ~ A ( 2 2 2 (k2 +m2 (k2 +m2 2 )σ 1 )σ 2 − 2 − 2 2 sh(km1 σ )cos(m1 x) + Be sh(km2 σ )cos(m2 x) = Ae

2 ) 21  2 +m2 )σ 2 2 (k (k2 +m2 )σ 2 1 2 2 ch(km1 σ 2 )sin(m1 x) + Be− ch(km2 σ 2 )sin(m2 x) + Ae−

(  2  2 ! 2 2 (k2 +m2 (k2 +m2 1 )σ 1 )σ 2 2 − − 2 2 sh(km1 σ )cos(m1 x) + Ae ch(km1 σ )sin(m1 x) = Ae +





Be

+2Ae− −

2 (k2 +m2 2 )σ 2

2 (k2 +m2 1 )σ 2

2

sh(km2 σ )cos(m2 x)

2

sh(km1 σ 2 )cos(m1 x)Be−

2 (k2 +m2 1 )σ 2

2







+ Be

2 (k2 +m2 2 )σ 2 2 (k2 +m2 2 )σ 2

2 (k2 +m2 2 )σ 2

2

ch(km2 σ )sin(m2 x)

2 !

sh(km2 σ 2 )cos(m2 x) 2

 12

+2Ae ch(km1 σ )sin(m1 x)Be ch(km2 σ )sin(m2 x) n   2 2 2 2 2 2 = A2 e−(k +m1 )σ ch2 (km1 σ 2 ) − cos2 (m1 x) + B 2 e−(k +m2 )σ ch2 (km2 σ 2 ) − cos2 (m2 x) +2Ae−

2 (k2 +m2 1 )σ 2

2

Be−

2 (k2 +m2 2 )σ 2

·

2

sh(km1 σ )cos(m1 x)sh(km2 σ )cos(m2 x) + ch(km1 σ 2 )sin(m1 x)ch(km2 σ 2 )sin(m2 x)

51

 21

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

since ch(x) > 0 for all x-s,we can bound the received result with: n 2 2 2 2 2 2 max{|h ∗ g|} = A2 e−(k +m1 )σ ch2 (km1 σ 2 ) + B 2 e−(k +m2 )σ ch2 (km2 σ 2 ) −

2 (k2 +m2 1 )σ 2



2 (k2 +m2 2 )σ 2

2



2

+Ae Be ch(kσ (m1 + m2 )) + ch(kσ (m1 − m2 )) n 2 2 2 2 2 2 = A2 e−(k +m1 )σ ch2 (km1 σ 2 ) + B 2 e−(k +m2 )σ ch2 (km2 σ 2 )  21 (k2 +m2 )σ 2 (k2 +m2 )σ 2 1 2 2 2 Be− ch(km1 σ 2 )ch(km2 σ 2 ) +2Ae− = Ae−

2 (k2 +m2 1 )σ 2

ch(km1 σ 2 ) + Be−

2 (k2 +m2 2 )σ 2

1  2

ch(km2 σ 2 )

n 2 2 2 2 2 2 min{|h ∗ g|} = A2 e−(k +m1 )σ sh2 (km1 σ 2 ) + B 2 e−(k +m2 )σ sh2 (km2 σ 2 ) 1 2 2 (k2 +m2 (k2 +m2  2 2 )σ  1 )σ − 2 2 − 2 2 Be sh(km1 σ )sh(km2 σ ) −2Ae 2 − (k2 +m21 )σ2 2 (k2 +m2 2 )σ 2 − 2 2 2 2 sh (km1 σ ) − Be sh (km2 σ ) = Ae 2

The absolute value of the Gabor filter hk,σ2 (x) =

52

−x +ikx √ 1 2σ 2 e 2 2πσ

response to a dual har-

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

monic signal gm, ~ (x) = Asin(m1 x) + Bcos(m2 x), is ~ A |hk,σ2 (x) ∗ gm, ~ (x)| = |hk,σ2 (x) ∗ Asin(m1 x) + hk,σ2 (x) ∗ Bcos(m2 x)| ~ A ( 2 2 2 (k2 +m2 (k2 +m2 1 )σ 2 )σ − 2 − 2 2 2 = Ae sh(km1 σ )cos(m1 x) + Be sh(km2 σ )sin(m2 x)

2 ) 12  2 +m2 )σ 2 2 (k (k2 +m2 )σ 2 1 2 2 ch(km1 σ 2 )sin(m1 x) + Be− ch(km2 σ 2 )cos(m2 x) + Ae−

(  2  2 ! 2 2 (k2 +m2 (k2 +m2 1 )σ 1 )σ 2 2 − − 2 2 sh(km1 σ )cos(m1 x) + Ae ch(km1 σ )sin(m1 x) = Ae +





Be

2 (k2 +m2 1 )σ 2

+2Ae− −

+2Ae

2 (k2 +m2 2 )σ 2

2

sh(km2 σ )sin(m2 x)

2

sh(km1 σ 2 )cos(m1 x)Be−

2 (k2 +m2 1 )σ 2

2



2 !  2 (k2 +m2 2 )σ 2 − 2 ch(km2 σ )cos(m2 x) + Be

2 (k2 +m2 2 )σ 2

sh(km2 σ 2 )sin(m2 x)

2 (k2 +m2 2 )σ 2

2

 21

ch(km1 σ )sin(m1 x)Be ch(km2 σ )cos(m2 x) n   2 2 2 2 2 2 = A2 e−(k +m1 )σ ch2 (km1 σ 2 ) − cos2 (m1 x) + B 2 e−(k +m2 )σ ch2 (km2 σ 2 ) − sin2 (m2 x) +2Ae−

2 (k2 +m2 1 )σ 2

Be−

2 (k2 +m2 2 )σ 2

2

·

2

sh(km1 σ )cos(m1 x)sh(km2 σ )sin(m2 x) + ch(km1 σ 2 )sin(m1 x)ch(km2 σ 2 )cos(m2 x)

since ch(x) > 0 for all x-s,we can bound the received result with: n 2 2 2 2 2 2 max{|h ∗ g|} = A2 e−(k +m1 )σ ch2 (km1 σ 2 ) + B 2 e−(k +m2 )σ ch2 (km2 σ 2 )  21 2 2 (k2 +m2 (k2 +m2 1 )σ 2 )σ − − 2 2 2 2 +2Ae Be ch(km1 σ )ch(km2 σ ) = Ae−

2 (k2 +m2 1 )σ 2

ch(km1 σ 2 ) + Be−

2 (k2 +m2 2 )σ 2

ch(km2 σ 2 )

n 2 2 2 2 2 2 A2 e−(k +m1 )σ sh2 (km1 σ 2 ) + B 2 e−(k +m2 )σ sh2 (km2 σ 2 ) 1 2 2 (k2 +m2 (k2 +m2  2 1 )σ 2 )σ  − − 2 2 2 2 −2Ae Be sh(km1 σ )sh(km2 σ ) 2 − (k2 +m21 )σ2 2 (k2 +m2 2 )σ 2 − 2 2 2 2 sh (km1 σ ) − Be sh (km2 σ ) = Ae

min{|h ∗ g|} =

53

 21

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

C

A response to multi-harmonic signal

In analogy to the bounds on the response of a single sine signal to a filter, described in previous appendix, we have developed an expression for a filter response to a signal which is a sum of sines with different amplitudes. P −x2 +ikx 1 2σ 2 Consider a complex Gabor filter hk (x) = √2πσ , and a signal gA, e ~m 2 i Ai sin(mi x). ~ (x) = It can be shown that the filter response: rA,k, ~ m ~ (x) = hk (x) ∗ gA,m (x) ( X  2 2 2  = A2i e−(k +mi )σ ch2 (kmi σ 2 ) − cos2 (mi x)

(C.1)

i

+

X



Ai e

2 (k2 +m2 i )σ 2



Aj e

2 (k2 +m2 j )σ 2

i,j;i6=j

·

 · ch[kσ 2 (mi + mj )]cos[(mi − mj )x]

) 21 − ch[kσ 2 (mi − mj )]cos[(mi + mj )x]

The upper bound of this expression is: ) ( 2 X X (k2 +m2 i )σ 2 max hk (x) ∗ Ai e− Ai sin(mi x) = ch(kmi σ 2 ) i

i

A lower bound depends in a complex way on the combination of mi frequencies and amplitudes Ai . We could not find a useful lower bound and just use the trivial one, which is zero. Thus, N 2 X −(k2 +m2 i )σ 2 Ai e ch(kmi σ 2 ) ≥ rA, (C.2) 2 (x) ≥ 0 ~ m,k,σ ~ i=1

Relating on the upper bound as the actual maximal response and use the trivial lower bound implies that   2 −(k2 +m2 PN i )σ 1 2 2 ch(kmi σ ) rA~1 ,k,m,σ i=1 Ai e ~ 2 ≥ 2 ∆rA,k, ~ m,σ ~ 2 ≤

PN

i=1 Ai e

2 −(k2 +m2 i )σ 2

ch(kmi σ 2 )

Applying ∆m estimation to this approximation: ∆rA, 2 ~ m,k,σ ~

PN

2 −(k2 +m2 i )σ 2

ch(kmi σ 2 ) ≤ ∆m ∼ = 2 2 2 −(k +m )σ PN i ∇rA,k, 2 ~ m,σ |k · sh(kmi σ 2 ) − m1 · ch(kmi σ 2 )| σ 2 ~ 2 i=1 Ai e i=1 Ai e

54

(C.3)

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

Note, that this time the middle response is multidimensional function. To obtain a linear approximation for its change tempo we should consider the direction of maximal change, i.e. the gradient. Note also, that the received expression is an upper bound on ∆m. The dividend may be only smaller. The divisor an absolute value of a gradient of some Gaussianlike function, which tails are always near zero, but the maximal value may only increase. The absolute value of the gradient of such function may only increase. Therefore the total fraction is bounded from above by the given expression. 2 For large σ (sh(kmi σ 2 ) ≃ sh(kmi σ 2 )) ≃ 12 ekmi σ ) , it may be simplified to PN −(k−mi )2 σ 2 2 A e i=1 i . (C.4) ∆m ≤ P −(k−mi )2 σ 2 N 2 2 |k − mi |σ i=1 Ai e The obtained expression gets its maximum at k = mi similarly to the two frequencies case (4.29). Thus, the filters design considerations for dual-harmonic signals are identical to those of the multi-harmonic case.

D

Optimality of the chosen single filter

If a response in fmin have to be equal Rmin , the σ 2 of the filter is given by: −2log(Rmin ) σ2 = (k − fmin )2

If we increase k by δ the new σ 2 would be: −2log(Rmin ) σδ2 = (k + δ − fmin )2 The ∆m measure for fmin and filter k, σ is given by: −

2

(D.2)

−4log(Rmin )kfmin

2(k−fmin )2 e−2kfmin σ e ∆mǫ = = min ) (k − fmin )σ 2 (k − fmin ) −2log(R (k−fmin )2

(D.3)

2

) −(k−fmin ) fmin − k C [(k+fmin 2(k−fmin )2 e C   (k+fmin )2 1 − fmin − k C 2(k−f 2 2 min ) e = C and filter k + δ, σδ is given by:

2]

(D.4)

=

The ∆m measure for fmin

(D.1)



2

(D.5) 

(k+δ+fmin ) 1 fmin − k − δ C 2(k+δ−f 2 −2 min ) = (D.6) ∆mǫδ = e e C Since Rmin is a small number, its log is a negative number. Therefore C is a negative constant. Both k + fmin and k − fmin are above zero, therefore 2 (k+δ+fmin )2 σδ − 2

(k + δ + fmin )2 (k + fmin )2 < (D.7) (k + δ − fmin )2 (k − fmin )2 and thus ∆mǫδ < ∆mǫ because its exponent value is less negative and its non-exponent factor is smaller. Therefore increasing of the filter’s frequency and preserving the given response at fmin would increase ∆m in filter’s responses. 55

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

E

Ratio theorem

Theorem 1.1. (

rA1 ,m1 ,k1 ,σ12 (x) ,x ∈ R rA1 ,m1 ,k2 ,σ22 (x)

)

(

T

⇓ T

{rA1 ,k1 ,m1 ,σ12 (x), x ∈ R}

)

=∅

{rA2 ,k1 ,m2 ,σ12 (x), x ∈ R} = ∅

or T

{rA1 ,k2 ,m1 ,σ22 (x), x ∈ R}

rA2 ,m2 ,k1 ,σ12 (x) ,x ∈ R rA2 ,m2 ,k2 ,σ22 (x)

{rA2 ,k2 ,m2 ,σ22 (x), x ∈ R} = ∅

Proof: Suppose the claim is wrong, then w.l.g. ( ) ( ) rA1 ,m1 ,k1 ,σ12 (x) rA2 ,m2 ,k1 ,σ12 (x) T ,x ∈ R ,x ∈ R = ∅ rA1 ,m1 ,k2 ,σ22 (x) rA2 ,m2 ,k2 ,σ22 (x) {rA1 ,k1 ,m1 ,σ12 (x), x ∈ R} {rA1 ,k2 ,m1 ,σ22 (x), x ∈ R} which is:

and T

and T

min{rA1 ,m1 ,k1 ,σ12 (x)} max{rA1 ,m1 ,k2 ,σ22 (x)}

{rA2 ,k1 ,m2 ,σ12 (x), x ∈ R} = 6 ∅ {rA2 ,k2 ,m2 ,σ22 (x), x ∈ R} = 6 ∅

>

max{rA2 ,m2 ,k1 ,σ12 (x)} min{rA2 ,m2 ,k2 ,σ22 (x)}

and min{rA1 ,m1 ,k1 ,σ12 (x)} < max{rA2 ,m2 ,k1 ,σ12 (x)} and min{rA2 ,m2 ,k2 ,σ22 (x)} < max{rA1 ,m1 ,k2 ,σ22 (x)} combining the latter two expressions: min{rA1 ,m1 ,k1 ,σ12 (x)} max{rA1 ,m1 ,k2 ,σ22 (x)}

>

max{rA2 ,m2 ,k1 ,σ12 (x)} min{rA2 ,m2 ,k2 ,σ22 (x)}

and max{rA2 ,m2 ,k1 ,σ12 (x)} > min{rA2 ,m2 ,k2 ,σ22 (x)}

1

>

min{rA1 ,m1 ,k1 ,σ12 (x)} max{rA1 ,m1 ,k2 ,σ22 (x)}

we reached a contradiction, therefore the assumption was wrong, w.s.p.

56

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

References [1] B.B.Mandelbrot, The fractal geometry of nature, New York: W.H. Freeman, 1983. [2] K. Brady, I.H. Jermyn, and J.Zerubia, Adaptive probabilistic models of wavelet packets for the analysis and segmentation of textured remote sensing images, Proceedings of the British Machine Vision Conference (2003). [3] R. Chellappa and S. Chatterjee, Classification of textures using gaussian markov random field, IEEE Transactions on Acoustics, Speech and Signal Processing 33 (1985), no. 4, 959–963. [4] G.R. Cross and A.K. Jain, Markov random field texture models, TransPAMI 5 (1983), 25–39. [5] K.J. Dana, B. van Ginneken, S.K. Nayar, and J. J. Koenderink, Reflectance and texture of real world surfaces, ACM Transactions on Graphics 18 (1999), no. 1, 1–34. [6] J.G. Daugman, Uncertainty relation for resolution , spatial frequency, and orientation optimized by 2d visual cortical filters, Journal of the Optical Society of America A 2 (1985), 1160–1169. [7] D.J.Field, Relation between the statistics of natural images and the response properties of cortical cell, J.Opt.Soc.Am.A 4 (1987), no. 12, 2379–2394. [8] I. Fogel and D. Sagi, Gabor filters as texture discriminator, BioCyber 61 (1989), 102– 113. [9] D. Gabor, Theory of communication, J.IEE 93 (1946), 429–459. [10] B. Georgescu, I. Shimshoni, and P. Meer, Mean shift based clustering in high dimensions: A texture classification example., 9th International Conference on Computer Vision (2003), 456–463. [11] H. Greenspan, Multi-resolution image processing and learning for texture recognition and image enhancement, Ph.d. thesis, California Institute of Technology, Pasadena,California, May 1994. [12] H. Greenspan, S. Belongie, P. Perona, R. Goodman, S. Rackshit, and C.H.Anderson, Overcomplete steerable pyramid filters and rotation invariance, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (1994). [13] R.M. Haralick, K. Shanmugam, and I. Dinstein, Textural features for image classification, SMC (1973), 611–621. [14] J.M. Jolion, Feature similarity (in principles of visual information retrieval), ch. I.5, Springer, 2000. [15] K.I.Laws, Textured image segmentation, Ph.D. thesis, Dept. Electrical Engineering,University of Southern California, January 1980. 57

Technion - Computer Science Department - Technical Report CIS-2005-05 - 2005

[16] T. S. Lee, Image representation using 2d gabor wavelets, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (1996), no. 10, 959–971. [17] T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision 43 (2001), no. 1, 29–44. [18] W.Y. Ma and B.S. Manjunath, Edgeflow: A technique for boundary detection and image segmentation. [19] J. Malik, S. Belongie, T. K. Leung, and J. Shi, Contour and texture analysis for image segmentation, International Journal of Computer Vision 43 (2001), no. 1, 7–27. [20] J. Malik and P. Perona, Preattentive texture discrimination with early vision mechanism, JOSA-A 7 (1990), no. 5, 923–932. [21] J. Mao and A.K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern recognition 25 (1992), no. 2, 173–188. [22] M.Porat and Y.Y.Zeevi, Localized texture processing in vision: Analysis and synthesis in the gaborian space, IEEE Trans Biomed Eng (1989), 115–29. [23] A.P. Pentland, Fractal-based description of natural scenes, PAMI 6 (1984), no. 6, 661– 674. [24] Y. Rubner, Perceptual metrics for image database navigation, Ph.D. thesis, Stanford University, 1999. [25] E. Sharon, A. Brandt, and R. Basri, Fast multiscale image segmentation, IEEE International Conference in Computer Vision (1999), 70–77. [26] J. Shi and J. Malik Self inducing relational distance and its application to image segmentation, EVVC (1), 1998, pp. 528–543. [27] T.Randen and J.H.Husoy, Filtering for texture classification: A comparative study, PAMI 21 (1999), no. 4, 291–310. [28] M. Varma and A. Zisserman, Classifying images of materials: Achieving viewpoint and illumination independence, Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 3, May 2002, pp. 255–271. [29] M. Varma and A. Zisserman, Texture classification: Are filter banks necessary?, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, June 2003, pp. 691–698. [30] T. P. Weldon, W. E. Higgins, and D. F. Dunn, Efficient Gabor filter design for texture segmentation, Pattern Recognition 29 (1996), no. 12, 2005–2015.

58