Intrinsically Optimal Bayesian Robust Filtering

Lori A. Dalton, Member, IEEE, and Edward R. Dougherty, Fellow, IEEE
Abstract—When designing optimal filters it is often unrealistic to assume that the statistical model is known perfectly. The issue is then to design a robust filter that is optimal relative to an uncertainty class of processes. Robust filter design has been treated from minimax (best worst-case performance) and Bayesian (best average performance) perspectives. Heretofore, the Bayesian approach has involved finding a model-specific optimal filter, one that is optimal for some model in the uncertainty class. Lifting this constraint, we optimize over the full class from which the original optimal filters were obtained, for instance, over all linear filters. By extending the original characteristics determining the filter, such as the power spectral density, to "effective characteristics" that apply across the uncertainty class, we demonstrate, for both linear and morphological filtering, that an "intrinsically optimal" Bayesian robust filter can be represented in the same form as the standard solution to the optimal filter, except via the effective characteristics. Solutions for intrinsic Bayesian robust filters are more transparent and intuitive than solutions for model-specific filters, and also less tedious, because effective characteristics push through the spectral theory into the Bayesian setting, whereas solutions in the model-specific case depend on grinding out the optimization.

Index Terms—Canonical expansion, optimal linear filter, robust filter, Wiener filter, granulometric filter.
I. INTRODUCTION
Signal estimation involves making inferences from observations that are distorted or corrupted in some manner. Canonical examples of optimal signal estimation are the Wiener filter, which provides the solution to a classical optimal linear estimation problem where the observation and signal are jointly wide-sense stationary stochastic processes, and the Kalman filter, which provides the solution to a classical optimal linear estimation problem for linear stochastic systems [1]. Optimal signal estimation has also been solved in the case of morphological granulometric filtering, which provides an optimal solution to the problem of estimating a signal random set from a noisy random set [2]. In all of these theories, the observation and signal are joint random processes, $X = \{X(t)\}_{t \in T}$ lying in some index set $T$ and $Y = \{Y(s)\}_{s \in S}$ lying in some index set $S$, having known or partially known statistical properties. In particular, let $(\Omega, \mathcal{A}, P)$ be a probability space. Then $X$ is a family of complex random variables indexed by $t \in T$, and $Y$ is a family of complex random variables indexed by $s \in S$, each mapping $\Omega$ to a complex number in $\mathbb{C}$.

Optimal signal estimation involves estimating a signal $Y(s)$ at time $s$ via a filter $\psi_s$ given observations $X$. We write $\psi_s$ to emphasize that $\psi_s(X)$ is meant to estimate the signal at time $s$. Optimization is relative to a family of filters, $\mathcal{F}$, where a filter is a mapping $\psi_s : \mathcal{X} \to \mathbb{C}$ and $\mathcal{X}$ is the space of possible observed signals, each signal being a complex-valued function on $T$. Performance measurement is relative to an objective function, $C(Y(s), \psi_s(X))$, quantifying the cost or error in estimating signal $Y(s)$ with $\psi_s(X)$. If an optimal filter (with finite error) exists for a fixed $s$, then it can be expressed as

$$\psi_s^{\mathrm{opt}} = \arg\min_{\psi_s \in \mathcal{F}} C(Y(s), \psi_s(X)), \qquad (1)$$

where the minimum may be achieved by more than a single $\psi_s$. We are particularly interested in the mean-square error (MSE) of a filter $\psi_s$, for which $C(Y(s), \psi_s(X)) = E_P[|Y(s) - \psi_s(X)|^2]$, where $E_P$ denotes an expectation relative to the probability measure $P$, and where an optimal filter is referred to as a minimum-mean-square-error (MMSE) filter. For a constrained class of filters, filter error and optimality often depend on only certain characteristics (such as joint moments) relating the observed and ideal signals, not the full joint distribution. For instance, the MMSE linear filter depends on only second-order moments.

In practice, it is often unrealistic to assume that the statistical model is known perfectly because the filter will be applied in situations where the design model only approximates the state of nature. This may occur because the system generating observations varies in performance over time or because the goal is to have the filter apply to a family of systems. This problem has become prominent in genomic signal processing, where signal characterization depends on network models in which specific pathways can vary substantially among different cells and whose transcriptional logic is influenced by latent variables outside the model. This has led to the use of Bayesian methods for designing optimal therapies based on gene networks [3]. If our intent is to filter genomic signals, then we are motivated to design filters taking model uncertainty into account. The issue is then to design a filter that is optimal relative to an uncertainty class of processes, where optimality must be defined relative to the uncertainty class as a whole.

While it would take us too far afield to discuss robust filtering in the context of transcriptional logic, it is not difficult to illustrate the problem for genomic imaging. We consider comparative genomic hybridization (CGH) analysis. In CGH, a test genome and a reference genome are simultaneously hybridized to normal metaphase target chromosomes. In [4] it is discussed how a gray-scale image of these chromosomes is processed to produce a binary image as shown in Fig. 1(a). A granulometric bandpass filter (GBF) is applied to remove noise and overlapping chromosomes to arrive at the filtered image in Fig. 1(b). If we have a full description of the image as a random process, then we can find an optimal GBF; however, this is unlikely. As we will discuss in Section IV, an optimal robust filter can be determined from a certain partial description of the image process.
Fig. 1. Metaphase chromosome image. (a) Boundary-filled image; (b) bandpass-filtered image.

Historically, robust filter design was first treated in minimax frameworks, where the underlying model is a member of an uncertainty class parameterized by $\theta$ with parameter space $\Theta$. Minimax robust filtering aims to find a filter that has best worst-case performance. For a fixed $\theta \in \Theta$, denote the joint observation and signal processes by $(X_\theta, Y_\theta)$. A minimax robust filter is given by

$$\psi_s^{\mathrm{minimax}} = \arg\min_{\psi_s \in \mathcal{F}} \max_{\theta \in \Theta} C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right). \qquad (2)$$

Depending on the filter class and the process, a solution to (2) may or may not exist, and a key part of the minimax theory is to find conditions guaranteeing existence. For instance, one can constrain the space of filters under consideration to only those filters that are optimal for some state in the uncertainty class, i.e., rather than minimizing over $\mathcal{F}$ in (2), one may minimize over the set $\mathcal{F}_\Theta = \{\psi_{s,\theta} : \theta \in \Theta\}$, where $\psi_{s,\theta}$ is a solution to (1) with $X_\theta$ and $Y_\theta$ in place of $X$ and $Y$, respectively. Minimax robust linear filters have been studied in terms of both the signal-to-noise ratio and signal-noise power spectra pairs [5]–[10], and have been applied in other linear optimization frameworks, including matched and Kalman filtering [11]–[14]. A general formulation has been developed in the context of game theory, where it can be proved under some convexity assumptions on the uncertainty class that a minimax robust filter exists [15].

Minimax robustness tends to be overly conservative, especially if $\Theta$ is large. From a probabilistic perspective, a minimax robust filter can be overly influenced by outlier models because it does not take into account the probability mass over $\Theta$. An alternative methodology is Bayesian robust filtering, which assumes a known prior distribution, $\pi$, over $\Theta$. To formulate a tractable problem, Bayesian robust filtering was originally posed relative to the constrained filter class $\mathcal{F}_\Theta$. In particular, a model-constrained Bayesian robust (MCBR) filter is the optimal operator in $\mathcal{F}_\Theta$ corresponding to

$$\psi_s^{\mathrm{MCBR}} = \arg\min_{\psi_s \in \mathcal{F}_\Theta} E_\pi\left[C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right)\right], \qquad (3)$$

where $E_\pi$ denotes expectation relative to $\pi$. A state $\theta^\circ$ for which $\psi_{s,\theta^\circ}$ achieves the minimum in (3) is called a maximally robust state. MCBR filters have been developed for digital binary [16], linear [17] and morphological granulometric [18] filtering. In these works, typically a great effort goes towards evaluating the expected cost in closed form, while a solution to the minimization problem itself is found by brute force. Rather than constrain the search to operators in $\mathcal{F}_\Theta$, that is, operators optimal for some fixed model, in this work we address intrinsic Bayesian robust (IBR) filters that minimize over all operators in $\mathcal{F}$, namely,
$$\psi_s^{\mathrm{IBR}} = \arg\min_{\psi_s \in \mathcal{F}} E_\pi\left[C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right)\right]. \qquad (4)$$

Since $\mathcal{F}_\Theta \subseteq \mathcal{F}$, the expected cost associated with the intrinsic solution is at least as small as that of the model-constrained solution, making the MCBR filter suboptimal relative to the IBR filter. Until now, closed-form solutions for IBR filtering have remained an open problem. Our main contribution is in developing a theory to solve (4), in the same way one may solve (1), by introducing the notion of effective characteristics. IBR filters can not only outperform MCBR filters, but can now be found fully in closed form for many filtering problems.

Robust Wiener filtering will be solved in Section III. Wiener filtering considers MSE optimization over integral linear operators of the form

$$\psi_s(X) = \int_T g(s,t)\, X(t)\, dt. \qquad (5)$$

We assume that $X$ and $Y$ are zero-mean complex-valued random processes. Optimization involves finding a weighting function, $g$, to minimize the MSE, and depends on second-order statistics: the auto-correlation of the observed signal and the cross-correlation between the observed and desired signals. The formulation can be reduced in particular cases, such as for linear degradation models and wide-sense stationary (WSS) processes. For WSS processes, the optimal filter becomes a function of the power spectral densities. Under an uncertainty class of states, our objective is to minimize the average MSE across all states, $E_\pi[C_\theta]$. For instance, consider a linear model with a random parameter $\theta$, where the signal process $Y$ is convolved with a blurring function $h_\theta$ plus an uncorrelated zero-mean noise process $N_\theta$,

$$X_\theta(t) = \int h_\theta(t - u)\, Y(u)\, du + N_\theta(t). \qquad (6)$$

We will show that optimal Bayesian robust linear filtering depends on only an effective auto-correlation and cross-correlation, or the corresponding effective power spectral densities. Simulations also confirm superior performance for the IBR filter for a Gaussian blurring and uncorrelated additive white Gaussian noise model. For example, in the lower left plot of Fig. 4 we observe that with a beta prior the IBR filter has MSE 1.043, while the MCBR filter has MSE 1.074 and the minimax filter has MSE 1.181.

Were one to actually know the true state of nature, $\theta$, then the optimal choice would be to use $\psi_{s,\theta}$, which is optimal for $\theta$. The use of $\psi_s^{\mathrm{IBR}}$, while optimal given model uncertainty, reflects that uncertainty.
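To make the three design criteria concrete, the following sketch (illustrative only, not part of the original development) compares minimax, MCBR and IBR selections for a hypothetical scalar model $X_\theta = Y + N_\theta$ with $Y \sim N(0,1)$, $N_\theta \sim N(0,\theta)$, and gain filters $\psi_a(x) = ax$, for which the MSE is $(1-a)^2 + a^2\theta$; the grids and the prior are assumptions.

```python
import numpy as np

# Toy uncertainty class: X_theta = Y + N_theta, Y ~ N(0,1), N_theta ~ N(0,theta).
# A gain filter psi_a(x) = a*x has MSE(a, theta) = (1 - a)^2 + a^2 * theta.
thetas = np.linspace(0.1, 2.0, 40)              # discretized parameter space Theta
prior = np.exp(-thetas)
prior /= prior.sum()                            # assumed prior pi over Theta

a_grid = np.linspace(0.0, 1.0, 2001)            # filter family F (all gains)
mse = (1 - a_grid[None, :])**2 + a_grid[None, :]**2 * thetas[:, None]

# Minimax (2): best worst-case performance over Theta.
a_minimax = a_grid[np.argmin(mse.max(axis=0))]

# MCBR (3): search restricted to model-specific optimal gains a_theta = 1/(1+theta).
a_opt = 1.0 / (1.0 + thetas)
avg_of = lambda a: prior @ ((1 - a)**2 + a**2 * thetas)   # average MSE of gain a
a_mcbr = a_opt[np.argmin([avg_of(a) for a in a_opt])]

# IBR (4): minimize average MSE over the full family F.
a_ibr = a_grid[np.argmin(prior @ mse)]

# The IBR gain equals the optimal gain for the effective noise variance E[theta].
print(a_minimax, a_mcbr, a_ibr, 1.0 / (1.0 + prior @ thetas))
```

Because the average MSE depends on $\theta$ only through $E_\pi[\theta]$, the IBR gain coincides with the model-specific optimal gain for an "effective" noise variance, previewing the effective-characteristics idea developed below.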
With this in mind, the objective cost of uncertainty (OCU) of using $\psi_s^{\mathrm{IBR}}$ instead of $\psi_{s,\theta}$ is defined as the differential cost [19]:

$$U(\theta) = C_\theta\left(Y_\theta(s), \psi_s^{\mathrm{IBR}}(X_\theta)\right) - C_\theta\left(Y_\theta(s), \psi_{s,\theta}(X_\theta)\right). \qquad (7)$$

The mean objective cost of uncertainty (MOCU) is given by

$$M(\Theta) = E_\pi[U(\theta)]. \qquad (8)$$

The objective cost of uncertainty is relative to the filter family $\mathcal{F}$, the cost $C$, and the uncertainty distribution $\pi$. A more general definition for OCU is given in [19]. Here we confine ourselves to filtering. The MOCU differs from entropy as a measure of uncertainty in that it takes the objective into account, which in our case is signal estimation.

Bayesian uncertainty models are not new in the literature [20]–[22]; for instance, there has been extensive work in Bayesian model selection [23]–[25], which attempts to select the most likely model in an uncertainty class, and Bayesian model averaging [25]–[28], which attempts to find an average model to represent the uncertainty class. While the above approaches are more focused on model inference in the presence of uncertainty, our objective is not necessarily to infer a model but to evaluate and optimize the performance of an operator over the uncertainty class. We do not start with the assumption that it is even necessary to infer a model, but instead we begin with a basic definition of performance and prove that closed-form solutions to many optimal filtering problems can be found via a kind of average model, which we call the effective process or effective characteristics. Our approach does not necessarily require the effective process to be a member of the uncertainty class, or even for it to exist as a valid process.

II. EFFECTIVE CHARACTERISTICS

Extending the notion of an effective feature-label distribution used to derive an intrinsically optimal Bayesian robust classifier [29], [30], the following definitions and theorems show how an IBR filter can be expressed in exactly the same closed form as a model-specific optimal filter with the original characteristics replaced by "effective" characteristics.

Definition 1: Let $C(\cdot, \cdot)$ be a measurable cost function of two variables mapping into $[0, \infty)$, where $[0, \infty)$ is the set of non-negative real numbers. Let $\mathcal{F}$ be a function class, where each $\psi_s \in \mathcal{F}$ maps from a signal space, $\mathcal{X}$, to $\mathbb{C}$. An observation and signal pair, $(X, Y)$, is solvable under cost $C$ and function class $\mathcal{F}$ if there exists a solution to (1) under the processes, that is, if there exists a function $\psi_s \in \mathcal{F}$ minimizing $C(Y(s), \psi_s(X))$ over all $\psi_s \in \mathcal{F}$.

Definition 2: Let $\{(X_\theta, Y_\theta)\}_{\theta \in \Theta}$ be an uncertainty class of process pairs having prior, $\pi$. An observation and signal pair $(X_\Theta, Y_\Theta)$ is an effective process under cost $C$, function class $\mathcal{F}$ and uncertainty class $\Theta$ if for all $\psi_s \in \mathcal{F}$, both $C(Y_\Theta(s), \psi_s(X_\Theta))$ and $E_\pi[C_\theta(Y_\theta(s), \psi_s(X_\theta))]$ exist and

$$C\left(Y_\Theta(s), \psi_s(X_\Theta)\right) = E_\pi\left[C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right)\right]. \qquad (9)$$

Theorem 1: Let $\{(X_\theta, Y_\theta)\}_{\theta \in \Theta}$ be an uncertainty class of process pairs having prior, $\pi$. If there exists a solvable effective process, $(X_\Theta, Y_\Theta)$, with optimal filter $\hat\psi_s$, then $\psi_s^{\mathrm{IBR}} = \hat\psi_s$.

Proof: The proof is immediate from (4):

$$\psi_s^{\mathrm{IBR}} = \arg\min_{\psi_s \in \mathcal{F}} E_\pi\left[C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right)\right] = \arg\min_{\psi_s \in \mathcal{F}} C\left(Y_\Theta(s), \psi_s(X_\Theta)\right) = \hat\psi_s,$$

where the second equality follows from the definition of effective processes.

The effective process may or may not belong to the uncertainty class, but should be solvable. The general procedure is to establish a class, $\mathcal{C}$, of solvable processes possessing known optimal filters and find a process in $\mathcal{C}$ for which (9) is satisfied. Owing to Theorem 1, an IBR filter then falls out as the optimal filter corresponding to the effective process.

We next develop a general approach to finding IBR filters based upon the fact that ordinary optimal filters are often found in terms of characteristics of the joint random process $(X, Y)$, not the process itself, where by a characteristic we mean a deterministic function derived from the process. The problem becomes much more general in that it only requires finding effective characteristics, not necessarily an effective process. Further, given a joint random process $(X, Y)$, the error of a filter can often be expressed in the form $\zeta(\xi, g)$, where $\xi$ refers to the characteristics of $(X, Y)$ and $g$ refers to the filter parameters; for instance, in the case of linear filtering $\xi$ corresponds to the auto- and cross-correlation functions, and $g$ to the filter weighting function.

Definition 3: A class, $\mathcal{C}$, of process pairs, $(X, Y)$, is reducible under cost $C$ and function class $\mathcal{F}$ if there exists a functional $\zeta$ such that for each $(X, Y) \in \mathcal{C}$ and $\psi_s \in \mathcal{F}$,

$$C\left(Y(s), \psi_s(X)\right) = \zeta(\xi, g), \qquad (10)$$

where $\xi$ is a collection of process characteristics and $g$ represents parameters for filter $\psi_s$. In this context, we refer to $\zeta$ as a cost functional.

Definition 4: A collection of characteristics, $\xi$, is solvable in the weak sense under cost functional $\zeta$ and function class $\mathcal{F}$ if there exists a solution to

$$\hat g = \arg\min_{g} \zeta(\xi, g). \qquad (11)$$

Given a set of characteristics, $\xi$, that are solvable in the weak sense, there is an optimal filter, $\hat\psi_s$, that possesses the minimizing parameters, $\hat g$.

Definition 5: Let $\{(X_\theta, Y_\theta)\}_{\theta \in \Theta}$ be an uncertainty class of process pairs contained in a reducible class. The characteristic $\xi_\Theta$ is an effective characteristic in the weak sense under cost functional $\zeta$, function class $\mathcal{F}$ and uncertainty class $\Theta$ if for all $\psi_s \in \mathcal{F}$, both $\zeta(\xi_\Theta, g)$ and $E_\pi[\zeta(\xi_\theta, g)]$ exist and

$$\zeta(\xi_\Theta, g) = E_\pi\left[\zeta(\xi_\theta, g)\right]. \qquad (12)$$

Theorem 2: Let $\{(X_\theta, Y_\theta)\}_{\theta \in \Theta}$ be an uncertainty class of process pairs contained in a reducible class. If there exist weak-sense solvable weak-sense effective characteristics, $\xi_\Theta$, with optimal filter $\hat\psi_s$, then $\psi_s^{\mathrm{IBR}} = \hat\psi_s$.
Proof: The proof is immediate from (4):

$$\psi_s^{\mathrm{IBR}} = \arg\min_{\psi_s \in \mathcal{F}} E_\pi\left[C_\theta\left(Y_\theta(s), \psi_s(X_\theta)\right)\right] = \arg\min_{\psi_s \in \mathcal{F}} E_\pi\left[\zeta(\xi_\theta, g)\right] = \arg\min_{\psi_s \in \mathcal{F}} \zeta(\xi_\Theta, g) = \hat\psi_s,$$

where the second equality follows from reducibility, the third from the definition of weak-sense effective characteristics, and the fourth from the weak-sense solvability of the effective characteristics.

If there exists an effective process providing the effective characteristics, we say that these are effective in the strong sense; otherwise, we say they are effective in the weak sense. While it might be theoretically useful to have an effective process, since filter optimization is achieved via the characteristics, from a practical perspective weak-sense effective characteristics are sufficient. Indeed, the procedure for finding an effective process typically involves putting the error expression into a desired form so that effective characteristics of the process emerge, and then determining the existence of a process possessing the effective characteristics.

We close this section by noting that effective processes/characteristics for a function class are also effective processes/characteristics for any smaller function class under the same cost and uncertainty class. Hence, if one's interest is in IBR filtering and an effective process is found under a function class $\mathcal{F}$, then the effective process under the function class $\mathcal{F}_\Theta$ used in MCBR filtering is exactly the same. Note that Theorems 1 and 2 theoretically apply for both IBR and MCBR (using $\mathcal{F}_\Theta$) filtering, although MCBR filtering typically requires the additional computationally intensive steps of finding $\mathcal{F}_\Theta$ and solving the constrained optimization problem in (1) under $\mathcal{F}_\Theta$ and the effective process.

III. WIENER FILTERING

In this section, we return to the classical optimal linear filtering problem discussed in the Introduction. If $\mathcal{F}$ is a linear function space on $T$ and, for any $g$, the stochastic integral of (5) gives a random variable having a finite second moment, then $g$ yields the optimal MSE linear estimator of $Y(s)$ based on $X$ if and only if it satisfies the Wiener-Hopf equation,

$$\int_T g(s,t)\, r_X(t,u)\, dt = r_{YX}(s,u) \quad \text{for all } u \in T, \qquad (13)$$

where $r_X(t,u) = E[X(t)X^*(u)]$ is the auto-correlation function for $X$ and $r_{YX}(s,u) = E[Y(s)X^*(u)]$ is the cross-correlation function between $Y$ and $X$. Under the specified conditions, the optimal filter minimizes

$$C(g) = E\left[|Y(s)|^2\right] - 2\,\mathrm{Re}\int_T g^*(s,u)\, r_{YX}(s,u)\, du + \int_T\int_T g(s,t)\, g^*(s,u)\, r_X(t,u)\, dt\, du, \qquad (14)$$

where the right-hand side results from expanding the square, bringing the expectation inside the integrals, and applying the definitions of the auto- and cross-correlation functions.

Consider an uncertain signal model $\{(X_\theta, Y_\theta)\}_{\theta \in \Theta}$ parameterized by $\theta$ with the MSE cost function and function class

$$\mathcal{F} = \left\{\psi_s : \psi_s(X) = \int_T g(s,t)\, X(t)\, dt\right\}. \qquad (15)$$

The solvable class $\mathcal{C}$ consists of all process pairs, $(X, Y)$, such that $\psi_s(X)$ has a finite second moment for any $\psi_s \in \mathcal{F}$ and there exists $g$ for which the Wiener-Hopf equation is satisfied. In the robust model the error is given by

$$E_\pi[C_\theta(g)] = E_\pi\left[E\left[|Y_\theta(s)|^2\right]\right] - 2\,\mathrm{Re}\int_T g^*(s,u)\, E_\pi\left[r_{Y_\theta X_\theta}(s,u)\right] du + \int_T\int_T g(s,t)\, g^*(s,u)\, E_\pi\left[r_{X_\theta}(t,u)\right] dt\, du. \qquad (16)$$

Define the effective auto-correlation function

$$R_X^\Theta(t,u) = E_\pi\left[r_{X_\theta}(t,u)\right] \qquad (17)$$

and the effective cross-correlation function

$$R_{YX}^\Theta(s,u) = E_\pi\left[r_{Y_\theta X_\theta}(s,u)\right]. \qquad (18)$$

As an auto-correlation function, $r_{X_\theta}$ is conjugate symmetric for all $\theta$ and non-negative definite relative to all functions $g$ for which the stochastic integral exists, for each $\theta$. It is straightforward to show that $R_X^\Theta$ has the same properties and is therefore also a valid auto-correlation function. Thus, there exists an effective zero-mean Gaussian process, $X_\Theta$, with auto-correlation function $R_X^\Theta$. Likewise, there exists a zero-mean Gaussian process pair $(X_\Theta, Y_\Theta)$ with auto-correlation $R_X^\Theta$ and cross-correlation $R_{YX}^\Theta$. According to (16), the robust model error equals the error for the process $(X_\Theta, Y_\Theta)$. Consequently, if $(X_\Theta, Y_\Theta) \in \mathcal{C}$, meaning there exists a solution of the Wiener-Hopf equation relative to the effective process, then by Theorem 1 the optimal robust filter is found via the solution, $\hat g$, to

$$\int_T \hat g(s,t)\, R_X^\Theta(t,u)\, dt = R_{YX}^\Theta(s,u) \quad \text{for all } u \in T. \qquad (19)$$

Moreover, an MCBR filter agrees with an IBR filter if and only if there exists a maximally robust state $\theta^\circ$ such that the solution to the Wiener-Hopf equation (13) under auto-correlation $r_{X_{\theta^\circ}}$ and cross-correlation $r_{Y_{\theta^\circ} X_{\theta^\circ}}$ also happens to be a solution to (19). In the present circumstances, the MSE can be written in the form $\zeta(\xi_\theta, g)$, where $\xi_\theta = \{r_{X_\theta}, r_{Y_\theta X_\theta}\}$ and $g$ is the weighting function. In the robust setting, $\xi_\Theta = \{R_X^\Theta, R_{YX}^\Theta\}$ are the effective characteristics, so that $R_X^\Theta$ and $R_{YX}^\Theta$ are effective auto- and cross-correlations relative to uncertainty class $\Theta$ in the strong sense.
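In a finite-dimensional (discrete-time) setting, (16)–(19) reduce to matrix normal equations, which the following sketch illustrates for a hypothetical model $X_\theta = \theta Y + N$; the covariance kernel, noise level and prior are assumptions made for illustration.

```python
import numpy as np

# Finite-dimensional analogue of (16)-(19) for a hypothetical model X_theta = theta*Y + N:
# the Wiener-Hopf equation becomes G @ R_X = R_YX with prior-averaged correlation matrices.
rng = np.random.default_rng(1)
n = 64
t = np.arange(n)
r_Y = np.exp(-np.abs(t[:, None] - t[None, :]) / 8.0)   # assumed signal auto-correlation

thetas = rng.uniform(0.5, 1.5, size=2000)              # samples from an assumed prior
R_X_eff = np.mean(thetas**2) * r_Y + 0.5 * np.eye(n)   # effective auto-correlation, cf. (17)
R_YX_eff = np.mean(thetas) * r_Y                       # effective cross-correlation, cf. (18)

# IBR linear filter: solve G @ R_X_eff = R_YX_eff (each row holds filter weights).
G_ibr = np.linalg.solve(R_X_eff.T, R_YX_eff.T).T

# Average MSE of a filter G over the prior depends on the model only through the
# effective correlations, mirroring (16).
def avg_mse(G):
    return np.trace(r_Y - 2 * G @ R_YX_eff.T + G @ R_X_eff @ G.T) / n

print(avg_mse(G_ibr))
```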
A. Canonical Expansions

Consider the design of optimal linear filters in the context of an integral canonical expansion of the observed signal process. This facilitates closed-form expressions for the error of a robust filter in terms of second-order statistics for the ideal signal and the white-noise process determining the canonical expansion. An integral canonical expansion of a random function $X(t)$ takes the form

$$X(t) = \mu_X(t) + \int_\Xi Z(\xi)\, x(t,\xi)\, d\xi, \qquad (20)$$

where $\mu_X(t)$ is the mean of $X(t)$, $Z(\xi)$ is white noise over $\Xi$ (the domain of $\xi$), and the coordinate functions $x(t,\xi)$ are deterministic. The covariance function of continuous white noise is the generalized function $K_Z(\xi,\xi') = I(\xi)\,\delta(\xi' - \xi)$, where $I(\xi)$ is the intensity of the white noise, and the theory of integral representation is interpreted in the generalized sense. Based on (20), $X(t)$ has covariance function

$$K_X(t,u) = \int_\Xi I(\xi)\, x(t,\xi)\, x^*(u,\xi)\, d\xi \qquad (21)$$

and cross-covariance $K_{XZ}(t,\xi) = I(\xi)\, x(t,\xi)$, where $K_{XZ}$ is the cross-covariance between $X(t)$ and $Z(\xi)$. Integral canonical expansions are formed via a kernel $a(t,\xi)$ by defining

$$Z(\xi) = \int_T a(t,\xi)\left[X(t) - \mu_X(t)\right] dt. \qquad (22)$$

Three conditions, (23), (24) and (25), are necessary and sufficient for a canonical expansion to result [31], [32]; together they guarantee that $Z(\xi)$ defined by (22) is white and that the expansion (20) built from it reproduces $X(t)$. The intensity of the white noise is

$$I(\xi) = \int_T \int_T a(t,\xi)\, K_X(t,u)\, a^*(u,\xi)\, dt\, du. \qquad (26)$$

Assuming $X(t)$ is a zero-mean random function, we can drop $\mu_X(t)$ from (20) and work with the process itself. If the optimal filter exists, then it can be shown via the Wiener-Hopf equation that it is given by

$$\hat Y(s) = \int_\Xi \hat g(s,\xi)\, Z(\xi)\, d\xi, \qquad (27)$$

where

$$\hat g(s,\xi) = \frac{K_{YZ}(s,\xi)}{I(\xi)}. \qquad (28)$$

Since

$$K_{YZ}(s,\xi) = \int_T K_{YX}(s,t)\, a^*(t,\xi)\, dt, \qquad (29)$$

we can also write the optimal filter as

$$\hat Y(s) = \int_\Xi \frac{Z(\xi)}{I(\xi)} \int_T K_{YX}(s,t)\, a^*(t,\xi)\, dt\, d\xi \qquad (30)$$

and the error of the optimal filter as

$$\varepsilon^2(s) = K_Y(s,s) - \int_\Xi \frac{|K_{YZ}(s,\xi)|^2}{I(\xi)}\, d\xi. \qquad (31)$$

In the presence of uncertainty, (22) through (28) continue to hold for the effective process, except that (23), (26) and (28) now take the forms (32), (33) and (34), obtained by replacing the model covariances with the effective covariances $E_\pi[K_{X_\theta}]$ and $E_\pi[K_{Y_\theta X_\theta}]$. In particular, the effective intensity is

$$I_\Theta(\xi) = \int_T \int_T a(t,\xi)\, E_\pi\left[K_{X_\theta}(t,u)\right] a^*(u,\xi)\, dt\, du \qquad (33)$$

and the effective filter weights are

$$\hat g_\Theta(s,\xi) = \frac{E_\pi\left[K_{Y_\theta Z_\theta}(s,\xi)\right]}{I_\Theta(\xi)}. \qquad (34)$$

The coordinate functions and the intensity of the white noise now depend on expectations, relative to $\pi$, of the second moments, and the intrinsic Bayesian robust filter is defined by (34). We may also give the IBR filter by

$$\hat Y_\Theta(s) = \int_\Xi \frac{Z(\xi)}{I_\Theta(\xi)} \int_T E_\pi\left[K_{Y_\theta X_\theta}(s,t)\right] a^*(t,\xi)\, dt\, d\xi, \qquad (35)$$

with error

$$\varepsilon_\Theta^2(s) = E_\pi\left[K_{Y_\theta}(s,s)\right] - \int_\Xi \frac{\left|E_\pi\left[K_{Y_\theta Z_\theta}(s,\xi)\right]\right|^2}{I_\Theta(\xi)}\, d\xi. \qquad (36)$$

1) Discrete Karhunen-Loeve Transform: Suppose that the canonical representation takes the discrete form

$$X(t) = \sum_k Z_k\, x_k(t).$$

The necessary and sufficient conditions for the integral canonical representation, (23), (24) and (25), are replaced by orthogonality conditions, the integral representation for the optimal linear filter, (30), is replaced by the corresponding discrete representation, the integral in the error of the optimal filter, (31), is replaced by a summation over $k$, and the white-noise intensity $I(\xi)$ is replaced by the variance of $Z_k$ [31]. This all holds analogously for robust filter design, with all auto-correlation functions replaced by the effective auto-correlation functions.
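The discrete case can be exercised numerically. The following sketch uses an assumed exponential signal covariance and an assumed gamma prior on an uncertain white-noise variance, anticipating the signal-plus-noise example treated next; the IBR weights simply use the prior-mean noise variance.

```python
import numpy as np

# Discrete KL sketch: eigendecompose an assumed signal covariance; the observation
# noise is white with an uncertain variance drawn from an assumed gamma prior.
rng = np.random.default_rng(0)
n = 128
t = np.arange(n)
K_Y = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)   # assumed signal covariance
lam, U = np.linalg.eigh(K_Y)                            # eigenvalues, coordinate functions

sigma2 = rng.gamma(shape=4.0, scale=0.25, size=5000)    # prior samples of noise variance
sigma2_eff = sigma2.mean()                              # effective white-noise variance

w_ibr = lam / (lam + sigma2_eff)                        # IBR weights per KL coordinate

# Apply to one realization: Y with covariance K_Y, X = Y + N under one state.
Y = U @ (np.sqrt(np.maximum(lam, 0.0)) * rng.standard_normal(n))
X = Y + np.sqrt(sigma2[0]) * rng.standard_normal(n)
Z = U.T @ X                                             # KL coefficients of the observation
Y_hat = U @ (w_ibr * Z)                                 # IBR estimate of the signal
print(np.mean((Y - Y_hat)**2))
```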
To illustrate the theory, consider a signal plus noise model, $X(t) = Y(t) + N(t)$, in which the noise is white with constant variance $\sigma^2$ and is uncorrelated with the signal, $Y(t)$. Let $\{u_k(t)\}$ be the Karhunen-Loeve coordinate system for the signal auto-correlation function, with corresponding eigenvalue sequence $\{\lambda_k\}$, i.e.,

$$\int_T r_Y(t,u)\, u_k(u)\, du = \lambda_k\, u_k(t).$$

Then $\{u_k(t)\}$ forms an orthonormal system over $T$, and $r_Y(t,u) = \sum_k \lambda_k\, u_k(t)\, u_k^*(u)$. It is well known that the optimal linear filter for noise state $\sigma^2$ is given by

$$\hat Y(s) = \sum_k \frac{\lambda_k}{\lambda_k + \sigma^2}\, Z_k\, u_k(s),$$

where $Z_k$ is the $k$th Karhunen-Loeve coefficient for $X$. Note that $r_X$ has the same coordinate functions as $r_Y$, but that the eigenvalues for $r_X$ are $\lambda_k + \sigma^2$. The optimal MSE can also be found:

$$\varepsilon^2(s) = \sum_k \frac{\lambda_k\, \sigma^2}{\lambda_k + \sigma^2}\, |u_k(s)|^2.$$

Robustifying the model with a prior, $\pi$, over the noise variance, the effective auto-correlation can be reduced to the form

$$R_X^\Theta(t,u) = r_Y(t,u) + E_\pi\left[\sigma_\theta^2\right]\delta(t - u).$$

Effective cross-correlation functions can be found similarly. The optimal filter essentially considers an effective white noise process with variance $E_\pi[\sigma_\theta^2]$, resulting in the IBR filter

$$\hat Y_\Theta(s) = \sum_k \frac{\lambda_k}{\lambda_k + E_\pi[\sigma_\theta^2]}\, Z_k\, u_k(s),$$

with optimal error

$$\varepsilon_\Theta^2(s) = \sum_k \frac{\lambda_k\, E_\pi[\sigma_\theta^2]}{\lambda_k + E_\pi[\sigma_\theta^2]}\, |u_k(s)|^2.$$

B. Wide-Sense Stationary Processes

For the classical setting involving the power spectral density without uncertainty, suppose $X(t)$ is WSS with auto-correlation function $r_X(\tau)$ and power spectral density $S_X(\omega)$, and suppose $Y(t)$ is WSS with auto-correlation $r_Y(\tau)$, power spectral density $S_Y(\omega)$ and cross-correlation function $r_{YX}(\tau)$ having Fourier transform $S_{YX}(\omega)$. Letting $\Xi = (-\infty, \infty)$ and $a(t, \omega) = \frac{1}{2\pi} e^{-i\omega t}$, (22) becomes

$$Z(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(t)\, e^{-i\omega t}\, dt. \qquad (37)$$

Equations (23), (24) and (25) are satisfied and $X(t)$ has the integral canonical expansion

$$X(t) = \int_{-\infty}^{\infty} Z(\omega)\, e^{i\omega t}\, d\omega. \qquad (38)$$

From (26), $Z(\omega)$ has intensity

$$I(\omega) = \frac{1}{2\pi}\, S_X(\omega), \qquad (39)$$

where $S_X = \mathcal{F}[r_X]$ and $\mathcal{F}$ denotes the Fourier transform. Substitution into (28), applying (29), yields the spectral form of the optimal filter,

$$G(\omega) = \frac{S_{YX}(\omega)}{S_X(\omega)}. \qquad (40)$$

The error of the Wiener filter is then given by

$$\varepsilon^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[S_Y(\omega) - \frac{|S_{YX}(\omega)|^2}{S_X(\omega)}\right] d\omega. \qquad (41)$$

In the presence of uncertainty, the intensity of the white noise comes from (33) and involves the Fourier transform of $R_X^\Theta$. In addition, we now apply (34), involving the Fourier transform of $R_{YX}^\Theta$. We refer to the Fourier transforms of $R_X^\Theta$ and $R_{YX}^\Theta$ as the effective power spectra and denote them by $S_X^\Theta$ and $S_{YX}^\Theta$, respectively. Then (37) through (40) hold with the effective power spectra in place of the model-specific power spectra. In particular, the IBR filter is defined by (40) with $S_X^\Theta$ and $S_{YX}^\Theta$ in place of $S_X$ and $S_{YX}$, respectively. We refer to the resulting filter as the IBR Wiener filter.

1) Convolved Signal Plus Noise Model: Consider the blurred signal plus noise model from the Introduction, with system equation given by (6). For fixed $\theta$, assume $h_\theta$, $Y$ and $N_\theta$ are all uncorrelated processes, and $Y$ and $N_\theta$ are zero-mean processes. Assuming $Y$ and $N_\theta$ are both WSS, the Wiener filter is

$$G_\theta(\omega) = \frac{E\left[H_\theta^*(\omega)\right] S_Y(\omega)}{E\left[|H_\theta(\omega)|^2\right] S_Y(\omega) + S_{N_\theta}(\omega)}, \qquad (42)$$

where $S_Y(\omega)$ is the power spectral density of $Y$, $S_{N_\theta}(\omega)$ is the power spectral density of $N_\theta$, $H_\theta = \mathcal{F}[h_\theta]$ so that the auto-correlation function of $h_\theta$ has Fourier transform $E[|H_\theta(\omega)|^2]$, and interchanging the Fourier transform and expectation integrals is assumed to be justified. If $h_\theta$ is a fixed non-random function, then the Wiener filter is

$$G_\theta(\omega) = \frac{H_\theta^*(\omega)\, S_Y(\omega)}{|H_\theta(\omega)|^2\, S_Y(\omega) + S_{N_\theta}(\omega)}. \qquad (43)$$
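A discrete frequency-domain sketch of the fixed-blur Wiener filter (43) follows; the signal spectrum, blur parameter and noise level are assumed for illustration.

```python
import numpy as np

# Frequency-domain sketch of the fixed-blur Wiener filter (43) on a discrete grid;
# the spectra below are assumptions, with theta = 2 playing the role of blur variance.
n = 256
omega = 2 * np.pi * np.fft.fftfreq(n)

S_Y = 1.0 / (1.0 + 4.0 * omega**2)            # assumed signal power spectral density
H = np.exp(-2.0 * omega**2 / 2.0)             # Gaussian blur transfer function
S_N = 0.1 * np.ones(n)                        # white-noise power spectral density

G = np.conj(H) * S_Y / (np.abs(H)**2 * S_Y + S_N)      # Wiener filter (43)

# Average of the per-frequency error integrand of (41) with S_YX = H* S_Y:
mse = np.mean(S_Y - (np.abs(H)**2 * S_Y**2) / (np.abs(H)**2 * S_Y + S_N))
print(mse)
```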
In the robust scenario, the auto-correlation and cross-correlation become random functions of $\theta$. We assume the blur and noise to be uncorrelated, meaning that the random variables $H_\theta(\omega)$ and $S_{N_\theta}(\omega)$ are uncorrelated relative to $\pi$ for any $\omega$. Under this assumption, the effective auto-correlation is

$$R_X^\Theta(t,u) = \int\int E_\pi\left[h_\theta(t-a)\, h_\theta^*(u-b)\right] r_Y(a,b)\, da\, db + E_\pi\left[r_{N_\theta}(t,u)\right], \qquad (44)$$

where we used the uncorrelatedness of $Y$ and $N_\theta$ to obtain the final integral, and the effective cross-correlation function is

$$R_{YX}^\Theta(t,u) = \int E_\pi\left[h_\theta^*(u-a)\right] r_Y(t,a)\, da. \qquad (45)$$

Assuming $Y$ and $N_\theta$ are both WSS for every $\theta$, we can write $r_Y(a,b) = r_Y(a-b)$ and $r_{N_\theta}(t,u) = r_{N_\theta}(t-u)$. We can also write $R_X^\Theta$ and $R_{YX}^\Theta$ as functions of $\tau = t - u$, where applying a change of variables,

$$R_X^\Theta(\tau) = \int\int E_\pi\left[h_\theta(a)\, h_\theta^*(b)\right] r_Y(\tau - a + b)\, da\, db + E_\pi\left[r_{N_\theta}(\tau)\right] \qquad (46)$$

and

$$R_{YX}^\Theta(\tau) = \int E_\pi\left[h_\theta^*(a)\right] r_Y(\tau + a)\, da. \qquad (47)$$

Letting $S_Y(\omega)$ and $S_N^\Theta(\omega)$ be the Fourier transforms of $r_Y(\tau)$ and $E_\pi[r_{N_\theta}(\tau)]$, the auto-correlation functions for the signal and the effective noise, respectively, the effective power spectra are given by

$$S_X^\Theta(\omega) = E_\pi\left[|H_\theta(\omega)|^2\right] S_Y(\omega) + S_N^\Theta(\omega) \qquad (48)$$

and

$$S_{YX}^\Theta(\omega) = E_\pi\left[H_\theta^*(\omega)\right] S_Y(\omega), \qquad (49)$$

where interchanging the Fourier transform and expectation integrals is assumed to be justified. The IBR Wiener filter is given by

$$G_\Theta(\omega) = \frac{S_{YX}^\Theta(\omega)}{S_X^\Theta(\omega)} = \frac{E_\pi\left[H_\theta^*(\omega)\right] S_Y(\omega)}{E_\pi\left[|H_\theta(\omega)|^2\right] S_Y(\omega) + S_N^\Theta(\omega)}. \qquad (50)$$

The MSE of the Wiener filter is

$$\varepsilon_\Theta^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[S_Y(\omega) - \frac{\left|S_{YX}^\Theta(\omega)\right|^2}{S_X^\Theta(\omega)}\right] d\omega. \qquad (51)$$

If $h_\theta = h$ is a fixed non-random function, then the IBR Wiener filter is given by

$$G_\Theta(\omega) = \frac{H^*(\omega)\, S_Y(\omega)}{|H(\omega)|^2\, S_Y(\omega) + S_N^\Theta(\omega)}, \qquad (52)$$

with MSE

$$\varepsilon_\Theta^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_Y(\omega)\, S_N^\Theta(\omega)}{|H(\omega)|^2\, S_Y(\omega) + S_N^\Theta(\omega)}\, d\omega. \qquad (53)$$

For just convolution (no noise), the inverse filter is optimal, which has zero MSE. In the case of just additive noise, where the effective noise is white with variance $\sigma_\Theta^2 = E_\pi[\sigma_\theta^2]$, the IBR Wiener filter is

$$G_\Theta(\omega) = \frac{S_Y(\omega)}{S_Y(\omega) + \sigma_\Theta^2} \qquad (54)$$

and the corresponding MSE is

$$\varepsilon_\Theta^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_Y(\omega)\, \sigma_\Theta^2}{S_Y(\omega) + \sigma_\Theta^2}\, d\omega. \qquad (55)$$

2) Gaussian Blurring With Uncorrelated Additive White Noise: Here we consider the WSS case with Gaussian blurring plus uncorrelated white noise. That is, consider the model in (6), where $\theta$ parameterizes a Gaussian blurring function,

$$h_\theta(t) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-t^2/(2\theta)},$$

with Fourier transform

$$H_\theta(\omega) = e^{-\theta\omega^2/2},$$

and $S_{N_\theta}(\omega) = \sigma_\theta^2$ is the power spectral density of the white noise process.
The effective noise power spectral density is the expected noise variance, $E_\pi[\sigma_\theta^2]$. Given a conjugate prior for $\theta$ having a gamma distribution with hyperparameters $\alpha$ and $\beta$,

$$\pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha - 1}\, e^{-\beta\theta}, \qquad (56)$$

we also have

$$E_\pi\left[H_\theta^*(\omega)\right] = \left(1 + \frac{\omega^2}{2\beta}\right)^{-\alpha} \qquad (57)$$

and

$$E_\pi\left[|H_\theta(\omega)|^2\right] = \left(1 + \frac{\omega^2}{\beta}\right)^{-\alpha}. \qquad (58)$$

Hence, the IBR Wiener filter is given by (50) as

$$G_\Theta(\omega) = \frac{\left(1 + \frac{\omega^2}{2\beta}\right)^{-\alpha} S_Y(\omega)}{\left(1 + \frac{\omega^2}{\beta}\right)^{-\alpha} S_Y(\omega) + E_\pi\left[\sigma_\theta^2\right]},$$

with MSE given by (51).

C. Example

We illustrate optimal Bayesian robustness in the context of a convolved signal with uncorrelated additive white noise model. We utilize a random texture process represented by a random Boolean function formed as the maximum of random primary functions (in this case a digitized triangle) [17]. Random Boolean functions are stationary, and so long as they are degraded by an operator (such as a blurring function) that maintains stationarity, the preceding power spectral density theory applies with the minor modification that we work in a quantized discrete 2D spatial domain. We assume the original image undergoes a Gaussian blurring operation with noise, resulting in a distorted signal. The blurring function is the transform of a discrete Gaussian kernel over integer-valued pixel indices. The power spectral density of the original random texture process, $S_Y$, is estimated from realizations, and we assume that independent Gaussian distributed noise is added to each pixel. The final result is quantized to integer values between 0 and 15. A realization of the original texture process and an example degraded image are shown in Fig. 2. The RMS reported in the figure is averaged over pixels in this particular realization.

A degraded image passed through a filter, $G$, produces the restored image. The MSE for an arbitrary filter $G$ applied under true state $\theta$ is given in [33], namely,

$$\varepsilon^2(G, \theta) = \frac{1}{(2\pi)^2}\iint \left[\left|1 - G(\omega)\, H_\theta(\omega)\right|^2 S_Y(\omega) + |G(\omega)|^2\, \sigma_\theta^2\right] d\omega, \qquad (59)$$

where the integral is over the 2D spectral domain.

For a Bayesian approach, we will employ a uniform prior over the blur parameter, a beta (0.1, 0.1) prior (scaled over the same interval) and a gamma (100, 100) prior, as shown in Fig. 3. Results are presented in Fig. 4, which in the upper left shows the MSE surface with respect to the state for which the filter is optimal and the true state, computed from (59). The upper right and both bottom plots show performance of several robust filters with respect to the true state for all three priors. The solid curved line gives the average MSE of every model-specific optimal filter relative to the corresponding prior. Hence, the performance of the MCBR filter is the minimum point in this line (the corresponding maximally robust state is marked with an 'o' on the x-axis). The other solid line gives the worst-case performance of every model-specific optimal filter. The minimax filter corresponds to the minimum point in this line, which does not depend on the prior (the corresponding state is marked with an 'x' on the x-axis and is the same in all plots). The average MSE for the minimax filter can be found for each prior by evaluating the average-MSE curve at the state marked with an 'x'.

We also evaluate two "model-averaged" global filters from [17] for each prior: the mean-response global (MRG) filter, which is essentially the average Wiener filter, and the mean-characteristic global (MCG) filter, which is essentially the optimal filter designed for the average signal and noise PSDs. The MRG and MCG filters are not designed for a specific state in the uncertainty class, and so we only present their average MSE as constants. In all cases shown, performances for these model-averaged global filters are comparable to, or slightly worse than, that of the MCBR filter. Finally, the IBR filter is found based on (50), where $E_\pi[H_\theta^*]$ and $E_\pi[|H_\theta|^2]$ are approximated by Monte-Carlo simulation for the uniform and beta priors and found exactly for the gamma prior via (57) and (58). The MSE of the IBR filter averaged over the uncertainty class is also shown in Fig. 4 as a constant. In the lower left graph, we see an example where the IBR filter is not among the set of optimal filters for the uncertainty class and significantly outperforms any other filter with respect to the assumed (beta) prior.
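The gamma-prior closed forms (57) and (58) and the resulting IBR Wiener filter (50) can be sketched numerically as follows; the signal and noise spectra are assumed, and the Monte Carlo line merely checks the assumed closed-form moments.

```python
import numpy as np

# IBR Wiener filter (50) for H_theta(w) = exp(-theta w^2 / 2) with a gamma(alpha, beta)
# prior on theta, using the closed-form moments (57) and (58); spectra are assumed.
rng = np.random.default_rng(2)
n = 256
w2 = (2 * np.pi * np.fft.fftfreq(n))**2

alpha, beta = 100.0, 100.0
EH = (1 + w2 / (2 * beta))**(-alpha)            # E[H_theta*], cf. (57)
EH2 = (1 + w2 / beta)**(-alpha)                 # E[|H_theta|^2], cf. (58)

theta = rng.gamma(alpha, 1.0 / beta, size=5000) # Monte Carlo check of the moments
print(np.max(np.abs(np.exp(-np.outer(theta, w2) / 2).mean(axis=0) - EH)))

S_Y = 1.0 / (1.0 + 4.0 * w2)                    # assumed signal PSD
S_N = 0.1                                       # effective white-noise PSD (a constant)
G_ibr = EH * S_Y / (EH2 * S_Y + S_N)            # IBR Wiener filter (50)

# Average MSE of a real transfer function G across states, in the spirit of (59):
def avg_mse(G):
    return np.mean((1 - 2 * G * EH + G**2 * EH2) * S_Y + G**2 * S_N)

print(avg_mse(G_ibr))
```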
Fig. 2. Image realizations. (a) Original image; (b) degraded image with Gaussian blurring and additive white Gaussian noise; (c) optimally filtered image; (d) minimax filtered image; (e)-(h) images resulting from filtering by various Bayesian robust filters designed assuming a beta prior on the state of the blurring function.

Fig. 3. Priors over the variance of the blur function.

We show performance of each filter designed with each prior over the uncertainty class of states in Fig. 5. The Wiener filter for each state (model) is shown as a solid black line, representing a lower bound in achievable performance. Note that no robust filter is optimal over the entire uncertainty class, but that the IBR filter is optimal on average with respect to the assumed prior. Fig. 2 presents an example of a state (model) that is highly likely under the beta prior, in the sense that the beta prior peaks around this state, and as is typical the IBR filter performs better than any other robust filter for this particular realization of the process.

IV. OPTIMAL GRANULOMETRIC FILTERING

To illustrate the Bayesian theory for morphological filters, we consider families of set operators of the form

$$\Psi_t(S) = \bigcup_{B \in \mathcal{B}} S \circ tB, \qquad t > 0, \qquad (60)$$

where $\mathcal{B}$, called the generator, is a family of convex compact sets and $S \circ tB$ is the usual morphological opening of $S$ by the structuring element $tB$, namely, $S \circ tB = \bigcup\{tB + x : tB + x \subseteq S\}$. The family $\{\Psi_t\}$ belongs to a class of operator families called granulometries. The general theory of optimal granulometric filtering applies to a more general class of granulometries; however, to keep to our main theme and avoid getting side-tracked into offering numerous definitions, we will restrict our attention to the class of granulometries represented by (60), which constitutes the main class used in applications [2].

A. Model-Specific Optimal Granulometric Filters

If $S$ is a compact set and $\nu$ denotes the Lebesgue measure (volume), then the size distribution $\Omega(t) = \nu(S) - \nu(\Psi_t(S))$ is increasing and continuous from the left. Assuming that at least one generating set contains more than a single point, $\Psi_t(S) = \emptyset$ for sufficiently large $t$. The mean size distribution (MSD) is defined by $\Phi(t) = E[\Omega(t)]$, expectation being taken with respect to the random set.
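The MSD can be estimated by Monte Carlo: average, over realizations of a random set, the measure removed by opening with a growing disk. The following sketch (a hypothetical disk-grain model, not the paper's experiment) does this with binary morphology.

```python
import numpy as np
from scipy import ndimage

# Monte Carlo estimate of a mean size distribution: average, over realizations of a
# hypothetical disk-grain random set, the measure removed by opening with a disk of
# growing radius t.
rng = np.random.default_rng(3)

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

def realization(size=256, n_grains=12):
    img = np.zeros((size, size), dtype=bool)
    yy, xx = np.ogrid[:size, :size]
    for _ in range(n_grains):
        r = max(1, int(rng.normal(8, 2)))                  # random grain radius
        cy, cx = rng.integers(low=r, high=size - r, size=2)
        img |= (yy - cy)**2 + (xx - cx)**2 <= r**2
    return img

radii = range(1, 16)
msd = np.zeros(len(radii))
n_real = 20
for _ in range(n_real):
    S = realization()
    area = S.sum()
    msd += [area - ndimage.binary_opening(S, structure=disk(t)).sum() for t in radii]
msd /= n_real
print(np.round(msd).astype(int))   # increasing in t, saturating at the mean grain area
```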
Fig. 4. Performance of robust filters. In the upper right and bottom plots, one solid line is the worst-case performance of every model-specific optimal filter; the minimax filter finds the minimum of this line (the corresponding state is marked with an 'x' on the x-axis). The solid curved line is the average performance of every model-specific optimal filter for the specified prior; the MCBR filter finds the minimum of this line (the corresponding maximally robust state is marked with an 'o' on the x-axis). The average MSE for the IBR, mean-response global (MRG) and mean-characteristic global (MCG) filters are presented as constants. Note the optimality of the IBR filter. (upper left) MSE surface with respect to the state for which the filter is optimal and the true state; (upper right) MSE averaged over the uniform prior; (lower left) MSE averaged over the beta prior; (lower right) MSE averaged over the gamma prior.
The general granulometric spectral theory rests on the Lebesgue decomposition of the MSD into absolutely continuous and singular parts [2]. Here, we make the practical assumption that $\Phi(t)$ is differentiable in the ordinary sense. In fact, while it is common for $\Omega(t)$ to possess jump discontinuities, the governing distribution for the random set is typically continuous, in which case the expectation has a smoothing effect so that $\Phi(t)$ is continuously differentiable. The derivative of the MSD, $H(t) = \Phi'(t)$ (here taken in the ordinary sense but, in the general theory, taken in the generalized sense), is called the granulometric size density (GSD). The granulometric spectrum of $S$ relative to the granulometry $\{\Psi_t\}$ is defined by

$$\Pi(t) = \bigcap_{r < t} \Psi_r(S)\ \setminus\ \bigcup_{r > t} \Psi_r(S), \qquad (61)$$

the portion of $S$ passed at all scales below $t$ but at no scale above $t$. The collection of spectral components $\Pi(t)$ partitions $S$ for $t \geq 0$. A subset of the nonnegative real axis is called a countable-interval subset if it can be represented as a countable union of disjoint intervals, $B = \bigcup_k B_k$, where singleton point sets are considered to be intervals of length zero. Without loss of generality we assume that $j < k$ implies, for all $r \in B_j$ and $t \in B_k$, that $r < t$ and that there exists $u \notin B$ such that $r < u < t$. This assumption means that $B_j$ is to the left of $B_k$ and that $B_j$ and $B_k$ are separated by the complement of $B$. The granulometric bandpass filter (GBF) corresponding to $B$, relative to the spectrum $\Pi$, is defined by

$$\Psi_B(S) = \bigcup_{t \in B} \Pi(t). \qquad (62)$$

$B$ and $B^c$ are called the pass and fail sets for the filter, respectively. Now consider a random set composed as a disjoint union, $S \cup N$, of signal and noise random sets and the estimation of $S$ from $S \cup N$ via a GBF, relative to the symmetric-difference error $\varepsilon[\Psi_B] = E[\nu(\Psi_B(S \cup N)\, \triangle\, S)]$.
Fig. 5. RMS performance of various filters over the uncertainty class of states. (a) Uniform prior; (b) beta prior; (c) gamma prior.

The error can be expressed in terms of the GSDs by

$$\varepsilon[\Psi_B] = \int_{B^c} H_S(t)\, dt + \int_B H_N(t)\, dt, \qquad (63)$$

where $H_S$ and $H_N$ are the GSDs for the signal and noise, respectively. Filter optimization involves finding an optimal pass set relative to the spectral decomposition. Given a granulometry, an optimal pass set and corresponding GBF are denoted by $B^{\mathrm{opt}}$ and $\Psi_{B^{\mathrm{opt}}}$, respectively. From (63) we see that an optimal pass set is given by

$$B^{\mathrm{opt}} = \{t \geq 0 : H_S(t) \geq H_N(t)\} \qquad (64)$$

(where absent the assumption that the MSD is differentiable, the GSDs might involve delta functions and the inequality would then be interpreted appropriately). Designing a GBF relative to the granulometry involves finding the MSDs of the signal and noise processes and then solving the differential inequality. This is analogous to Wiener filtering characterized by power-spectral densities. Having found a pass set, it is straightforward to construct a filter that passes the appropriate spectral components. Suppose there is a sequence of points $t_1 < t_2 < \cdots$ such that $B^{\mathrm{opt}} = \bigcup_i [t_{2i-1}, t_{2i})$. Then

$$\Psi_{B^{\mathrm{opt}}}(S) = \bigcup_i \left[\Psi_{t_{2i-1}}(S) \setminus \Psi_{t_{2i}}(S)\right]. \qquad (65)$$

While, in general, it can be very difficult to find an MSD, a convenient analytic form can be obtained if we consider reconstructive granulometries. The filter family $\{\Lambda_t\}$ is derived from the granulometry $\{\Psi_t\}$ by reconstruction if, for each connected component $C$ in the image, $\Lambda_t(C) = C$ if $\Psi_t(C) \neq \emptyset$ and $\Lambda_t(C) = \emptyset$ if $\Psi_t(C) = \emptyset$. The reconstructive filter fully passes each component (grain) not eliminated by $\Psi_t$ and eliminates any component eliminated by $\Psi_t$. The preceding theory applies to reconstructive granulometries with the advantage that there is a convenient representation for the MSD. Given a reconstructive granulometry $\{\Lambda_t\}$ and a random compact and connected set $S_0$, define the granulometric measure ($\gamma$-measure) of $S_0$ by $\gamma(S_0) = \sup\{t : \Lambda_t(S_0) \neq \emptyset\}$. Then $\gamma(S_0)$ is a random variable whose distribution depends on the distribution of $S_0$. Consider a disjoint random grain model

$$S = \bigcup_{i=1}^{M} (S_i + x_i), \qquad (66)$$

where $S_1, S_2, \ldots$ are identically distributed to the primary grain $S_0$, $M$ is a random positive integer independent of the grains, and $x_1, x_2, \ldots$ are locations randomized up to the constraint that the union forming $S$ is disjoint. If $S_0$ depends on the random vector $W$ possessing density $f_W(w)$, then $\nu(S_0)$ and $\gamma(S_0)$ depend on $W$ and

$$\Phi(t) = E[M]\int_{\{w\, :\, \gamma(S_0(w)) < t\}} \nu(S_0(w))\, f_W(w)\, dw. \qquad (67)$$

If a grain model is appropriately parameterized, the following conditions typically hold: $W = (R, V)$, with $R$ a single parameter; $\nu(S_0)$ is a function of $n + 1$ variables, $(R, V)$, where $V$ is an $n$-dimensional vector; $V$ is independent of $R$; and $\gamma(S_0) = \rho(R)$ is strictly increasing and continuously differentiable. Applying these assumptions to (67) and differentiating yields

$$H(t) = E[M]\, \frac{E_V\left[\nu(S_0)\left(\rho^{-1}(t), V\right)\right] f_R\left(\rho^{-1}(t)\right)}{\rho'\left(\rho^{-1}(t)\right)}, \qquad (68)$$

where

$$E_V\left[\nu(S_0)(r, V)\right] = \int \nu(S_0)(r, v)\, f_V(v)\, dv \qquad (69)$$

and $E[M]$ is the expected number of grains. The result is intuitive: the numerator of (68) is the expected area of the primary grain evaluated at $\rho^{-1}(t)$, weighted by the infinitesimal probability mass of $R$ at $\rho^{-1}(t)$. Regarding the infinitesimal probability mass: for a single grain there is a change in the GSD between $t$ and $t + dt$, and the probability of this happening (at the infinitesimal scale) is inversely proportional to the derivative of the granulometric size at $\rho^{-1}(t)$.
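Numerically, (63), (64) and (68) suggest a simple design recipe: tabulate the signal and noise GSDs on a size grid and read off the pass set and its error. The sketch below uses hypothetical grain counts and size densities of the form (68) with $\rho$ the identity.

```python
import numpy as np

# GBF design via (64) on a size grid, with GSDs of the form (68) for rho = identity:
# H(t) = E[M] * nu(t) * f_R(t). All model constants here are hypothetical.
t = np.linspace(0.01, 60, 6000)
phi = lambda x, mu, var: np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

H_S = 50 * (4 * t**2) * phi(t, 20, 9)          # squares: nu = 4r^2, R ~ N(20, 9)
H_N = 80 * (np.pi * t * 25) * phi(t, 10, 4)    # ellipses: nu = pi*a*E[A'], A ~ N(10, 4)

B = H_S >= H_N                                  # optimal pass set (64) as a boolean mask
edges = t[1:][np.diff(B.astype(int)) != 0]      # passband endpoints (sign changes)
err = np.trapz(np.where(B, H_N, H_S), t)        # optimal error (63): failed signal + passed noise
print(edges, err)
```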
B. Intrinsic Bayesian Robust Granulometric Filters

Let us now suppose that the random set belongs to an uncertainty class $\Theta$, in which case we write $\Phi_\theta$ and $H_\theta$ for the MSD and GSD for the random set under state $\theta$. The error, $\varepsilon_\theta[\Psi_B]$, now depends on $\theta$ and using (63) we obtain for a GBF $\Psi_B$,

$$E_\pi\left[\varepsilon_\theta[\Psi_B]\right] = \int_{B^c} E_\pi\left[H_{S,\theta}(t)\right] dt + \int_B E_\pi\left[H_{N,\theta}(t)\right] dt. \qquad (70)$$

The class of compact sets is reducible under the expected volume of the symmetric difference cost and the class of granulometric bandpass filters, since this error can be expressed in the form $\zeta(\xi_\theta, B)$ where $\xi_\theta = \{H_{S,\theta}, H_{N,\theta}\}$. We have effective GSDs

$$H_S^\Theta(t) = E_\pi\left[H_{S,\theta}(t)\right], \qquad H_N^\Theta(t) = E_\pi\left[H_{N,\theta}(t)\right] \qquad (71)$$

in the weak sense that are solvable in the weak sense via (64). By Theorem 2, it is immediate that the pass set

$$B_\Theta = \left\{t \geq 0 : H_S^\Theta(t) \geq H_N^\Theta(t)\right\} \qquad (72)$$

defines an IBR granulometric filter. Since $H_\theta = \Phi_\theta'$, we would like to interchange expectation with differentiation to obtain effective MSDs, but here one must be careful. To this point, for any realization of the random set, we have found the MSD and differentiated it to get the GSD. More is required to exchange expectation and differentiation. To wit, $\Phi_\theta(t)$ must be mean-square differentiable with mean-square derivative $H_\theta(t)$. This, of course, will depend on the distribution of the random set, not simply its realizations. There are two necessary and sufficient conditions for mean-square differentiability: (1) differentiability of the mean of the process and (2) existence of the generalized second mixed derivative of its correlation function along the diagonal [31]. Assuming these conditions hold, (70) becomes

$$E_\pi\left[\varepsilon_\theta[\Psi_B]\right] = \int_{B^c} \frac{d}{dt} E_\pi\left[\Phi_{S,\theta}(t)\right] dt + \int_B \frac{d}{dt} E_\pi\left[\Phi_{N,\theta}(t)\right] dt, \qquad (73)$$

we obtain weak-sense effective MSDs

$$\Phi_S^\Theta(t) = E_\pi\left[\Phi_{S,\theta}(t)\right], \qquad \Phi_N^\Theta(t) = E_\pi\left[\Phi_{N,\theta}(t)\right], \qquad (74)$$

and the pass set becomes

$$B_\Theta = \left\{t \geq 0 : \frac{d}{dt}\Phi_S^\Theta(t) \geq \frac{d}{dt}\Phi_N^\Theta(t)\right\}. \qquad (75)$$

In the case of Wiener filters, to apply Theorem 1 and have effective auto- and cross-correlation functions in the strong sense, we simply applied the well-known theorem that there exists a Gaussian process possessing the second-order moments. In the present setting the issue is whether, given $\Phi^\Theta$, there exists a random set having MSD equal to $\Phi^\Theta$. If so, then we can select effective signal and noise processes possessing the preceding effective MSDs and GSDs and then the latter would be effective in the strong sense. Specifically, does there exist a theorem for the class of granulometries given in (60) of the form: If $\Phi$ is a monotonically increasing differentiable function on $[0, \infty)$ with $\Phi(0) = 0$, then there exists a random set having MSD $\Phi$. No such proposition is known; however, one can obtain effective random set processes as special cases. For a geometrically intuitive example, consider a granulometry consisting of a single opening whose structuring element is a horizontal line segment. We construct a random set consisting of a single grain. If we define a primary random grain and a granulometric filter for which $\rho$ is the identity, $\nu(S_0)$ depends only on $R$, and the following reduction of (67) is satisfied,

$$\Phi^\Theta(t) = \int_0^t \nu(S_0)(r)\, f_R(r)\, dr, \qquad (76)$$

then we have an effective random set. This is accomplished by defining the grain at $r$ to be a rectangle with base parallel to the $x$-axis, base length $r$, and height $\nu(S_0)(r)/r$, where base length 0 means that the grain is a vertical line.

C. Example

Let $\Psi_t$ be an opening by a disk of radius $t$, and $\Lambda_t$ be the induced reconstructive filter. Consider a signal consisting of squares with random angle of rotation and random radius, $R$ (half the side length), so that $\nu(S_0) = 4R^2$ and $\rho(R) = R$. In this case,

$$H_S(t) = 4\alpha\, t^2 f_R(t),$$

where $\alpha$ is the expected number of signal squares and $f_R$ is the density of $R$. Let the noise consist of ellipses having random angle of rotation, random minor semi-axis $A$, and random major semi-axis $A'$. With $\Psi_t$ being an opening by a disk of radius $t$,

$$H_N(t) = \pi\beta\, t\, E[A']\, f_A(t),$$

where $\beta$ is the expected number of noise ellipses and $f_A$ is the density of $A$. Throughout, we assume the major semi-axis exceeds the minor semi-axis. We assume an uncertainty class of models parameterized by $\theta$, where $R$ is normally distributed with mean 20 and variance 9 and $A$ is normally distributed with mean $\theta$ and variance 4. Figs. 6(a) and (e) show realizations of $S \cup N$ for two different states. Between the two states the optimal model-specific pass sets differ markedly. We assume $\theta$ is a mixture of two Gaussian distributions, one having mean 10, variance 5 and mass 3/4 and the other having mean 30, variance 5 and mass 1/4. Letting $\phi(t; \mu, \sigma^2)$ be a normal density with mean $\mu$ and variance $\sigma^2$, we have the effective characteristics $H_S^\Theta(t) = H_S(t) = 4\alpha\, t^2\, \phi(t; 20, 9)$ and

$$H_N^\Theta(t) = \pi\beta\, t\, E[A']\left[\frac{3}{4}\,\phi(t; 10, 4 + 5) + \frac{1}{4}\,\phi(t; 30, 4 + 5)\right]. \qquad (77)$$

Scaled effective GSDs are depicted in Fig. 7. The optimal robust pass set is determined by $H_S^\Theta(t) \geq H_N^\Theta(t)$. Referring to the figure, the pass set consists of all $t$ in a single interval between the two crossing points of the scaled effective GSDs. The optimal robust granulometric bandpass filter is

$$\Psi_{B_\Theta}(S) = \Psi_{t_1}(S) \setminus \Psi_{t_2}(S), \qquad (78)$$

where $t_1 < t_2$ are the crossing points. The optimal robust filter has a single passband.
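The effective noise GSD in (77) can be checked numerically: averaging the state-conditional density $\phi(t; \theta, 4)$ over the mixture prior inflates each component's variance by the prior variance, $4 + 5 = 9$. The sketch below, with the same hypothetical grain counts as the previous sketch, recovers a single robust passband.

```python
import numpy as np

# Effective noise GSD via the mixture prior: E_theta[phi(t; theta, 4)] with
# theta ~ (3/4) N(10,5) + (1/4) N(30,5) gives mixture components of variance 4 + 5 = 9,
# cf. (77). Constants (grain counts, E[A'] = 25) are hypothetical, as above.
rng = np.random.default_rng(4)
t = np.linspace(0.01, 60, 2000)
phi = lambda x, mu, var: np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

f_A_eff = 0.75 * phi(t, 10, 9) + 0.25 * phi(t, 30, 9)    # closed-form effective density

mu = np.where(rng.random(2000) < 0.75, 10.0, 30.0)       # Monte Carlo check
theta = rng.normal(mu, np.sqrt(5.0))
f_mc = phi(t[None, :], theta[:, None], 4.0).mean(axis=0)
print(np.max(np.abs(f_mc - f_A_eff)))                    # small sampling discrepancy

H_S_eff = 50 * 4 * t**2 * phi(t, 20, 9)                  # signal GSD (theta-free)
H_N_eff = 80 * np.pi * t * 25 * f_A_eff                  # effective noise GSD, cf. (77)
B = H_S_eff >= H_N_eff                                   # robust pass set (72)
print(t[1:][np.diff(B.astype(int)) != 0])                # two edges: a single passband
```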
Fig. 6. Original and filtered images consisting of signal grains (squares) and noise grains (ellipses). Black grains are passed, light gray grains are removed. (a) Original image 1; (b) robust filter applied to image 1; (c) optimal filter for image 1 applied to image 1; (d) optimal filter for image 2 applied to image 1; (e) original image 2; (f) robust filter applied to image 2; (g) optimal filter for image 1 applied to image 2; (h) optimal filter for image 2 applied to image 2.
Fig. 7. Scaled effective GSDs for signal (squares) and noise (ellipses).

Fig. 6 shows filtered versions of images 1 (Fig. 6(a)) and 2 (Fig. 6(e)), where black grains are passed by the filter and grey grains are removed. Fig. 6(b) shows image 1 filtered using the optimal robust filter, having pass set $B_\Theta$, Fig. 6(c) shows image 1 filtered using the model-specific optimal filter for this image, and Fig. 6(d) shows image 1 filtered using the model-specific optimal filter for image 2. Image 2 is filtered analogously in Fig. 6(f) through (h). We observe that model-specific filters perform very poorly in non-design conditions, but the robust filter performs quite well in both cases.

V. CONCLUSION
By introducing the notion of effective characteristics we have arrived at a general methodology for finding intrinsic Bayesian robust filters. Of course, one has to have a standard optimization solution involving characteristics, but this approach has strong historical roots in spectral theory. Its extension to morphological filtering shows that optimization via characteristics can be applied whenever the signal model, objective function, and filter representation can be brought into a suitable coherent structure. Indeed, this point was made by Pugachev more than half a century ago when he was developing the general theory of canonical representations. Once an optimization problem is solved via characteristics, finding effective characteristics, either in the strong or weak sense, extends the solution to the robust setting. It is interesting to note that the solutions for the intrinsic Bayesian robust filters developed in the current paper, both linear and morphological, are mathematically more straightforward than the previous solutions for model-specific filters. The reason for this is that effective characteristics push through the spectral theory, be it linear or granulometric, to the Bayesian setting, whereas the solutions in the model-constrained case depend on grinding out the optimization. Not only is the latter approach more tedious, it also lacks the transparency and intuition of the extended characteristic theory.

REFERENCES

[1] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York, NY, USA: Springer, 1994.
[2] E. R. Dougherty, "Optimal binary morphological bandpass filters induced by granulometric spectral representation," J. Math. Imag. Vision, vol. 7, no. 2, pp. 175–192, 1997.
[3] R. Pal, A. Datta, and E. R. Dougherty, "Bayesian robustness in the control of gene regulatory networks," IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3667–3678, 2009.
[4] Y. Chen and E. R. Dougherty, "Optimal and adaptive reconstructive granulometric bandpass filters," Signal Process., vol. 61, pp. 65–81, 1997.
[5] V. P. Kuznetsov, "Stable detection when the signal and spectrum of normal noise are inaccurately known," Telecommun. Radio Eng., vol. 30–31, pp. 58–64, 1976.
[6] S. A. Kassam and T. I. Lim, "Robust Wiener filters," J. Franklin Inst., vol. 304, pp. 171–185, 1977.
[7] H. V. Poor, "On robust Wiener filtering," IEEE Trans. Autom. Control, vol. 25, no. 4, pp. 531–536, 1980.
[8] H. V. Poor and D. P. Looze, "Minimax state estimation for linear stochastic systems with noise uncertainty," IEEE Trans. Autom. Control, vol. 26, no. 4, pp. 902–906, 1981.
[9] K. S. Vastola and H. V. Poor, "Robust Wiener-Kolmogorov theory," IEEE Trans. Inf. Theory, vol. 30, no. 2, pp. 316–327, 1984.
[10] S. A. Kassam and H. V. Poor, "Robust techniques for signal processing: A survey," Proc. IEEE, vol. 73, no. 3, pp. 433–481, 1985.
[11] H. V. Poor, "Robust matched filters," IEEE Trans. Inf. Theory, vol. 29, no. 5, pp. 677–687, 1983.
[12] S. Verdu and H. V. Poor, "Minimax linear observers and regulators for stochastic systems with uncertain second-order statistics," IEEE Trans. Autom. Control, vol. 29, no. 6, pp. 499–510, 1984.
[13] C.-T. Chen and S. A. Kassam, "Robust multiple-input matched filtering: Frequency and time-domain results," IEEE Trans. Inf. Theory, vol. 31, no. 6, pp. 812–821, 1985.
[14] W. X. Zheng, A. Cantoni, and K. L. Teo, "Robust design of envelope-constrained filters in the presence of input uncertainty," IEEE Trans. Signal Process., vol. 44, no. 8, pp. 1872–1877, 1996.
[15] S. Verdu and H. V. Poor, "On minimax robustness: A general approach and applications," IEEE Trans. Inf. Theory, vol. 30, no. 2, pp. 328–340, 1984.
[16] A. M. Grigoryan and E. R. Dougherty, "Design and analysis of robust binary filters in the context of a prior distribution for the states of nature," J. Math. Imag. Vision, vol. 11, no. 3, pp. 239–254, 1999.
[17] A. M. Grigoryan and E. R. Dougherty, "Bayesian robust optimal linear filters," Signal Process., vol. 81, no. 12, pp. 2503–2521, 2001.
[18] E. R. Dougherty and Y. Chen, "Robust optimal granulometric bandpass filters," Signal Process., vol. 81, pp. 1357–1372, 2001.
[19] B.-J. Yoon, X. Qian, and E. R. Dougherty, "Quantifying the objective cost of uncertainty in complex dynamical systems," IEEE Trans. Signal Process., vol. 61, no. 9, pp. 2256–2266, 2013.
[20] J. Bernardo and A. Smith, "Bayesian theory," Meas. Sci. Technol., vol. 12, no. 2, p. 221, 2001.
[21] M. Clyde and E. George, "Model uncertainty," Statist. Sci., pp. 81–94, 2004.
[22] D. Madigan and A. Raftery, "Model selection and accounting for model uncertainty in graphical models using Occam's window," J. Amer. Statist. Assoc., vol. 89, no. 428, pp. 1535–1546, 1994.
[23] M. Barbieri and J. Berger, "Optimal predictive model selection," Ann. Statist., vol. 32, no. 3, pp. 870–897, 2004.
[24] M. Chen, J. Ibrahim, Q. Shao, and R. Weiss, "Prior elicitation for model selection and estimation in generalized linear mixed models," J. Statist. Plann. Inference, vol. 111, no. 1, pp. 57–76, 2003.
[25] L. Wasserman, "Bayesian model selection and model averaging," J. Math. Psychol., vol. 44, no. 1, pp. 92–107, 2000.
[26] J. Hoeting, D. Madigan, A. Raftery, and C. Volinsky, "Bayesian model averaging: A tutorial," Statist. Sci., vol. 14, no. 4, pp. 382–417, 1999.
[27] A. Raftery, D. Madigan, and J. Hoeting, "Bayesian model averaging for linear regression models," J. Amer. Statist. Assoc., vol. 92, no. 437, pp. 179–191, 1997.
[28] M. Clyde, "Bayesian model averaging and model search strategies," in Bayesian Statistics 6, J. M. Bernardo, A. P. Dawid, J. O. Berger, and A. F. M. Smith, Eds. Oxford, U.K.: Oxford Univ. Press, 1999, p. 157.
[29] L. A. Dalton and E. R. Dougherty, "Optimal classifiers with minimum expected error within a Bayesian framework—Part I: Discrete and Gaussian models," Pattern Recogn., vol. 46, no. 5, pp. 1288–1300, 2013.
[30] L. A. Dalton and E. R. Dougherty, "Optimal classifiers with minimum expected error within a Bayesian framework—Part II: Properties and performance analysis," Pattern Recogn., vol. 46, no. 5, pp. 1301–1314, 2013.
[31] E. R. Dougherty, Random Processes for Image and Signal Processing, ser. SPIE/IEEE Series on Imaging Science and Engineering. Bellingham, WA, USA: SPIE, 1999.
[32] V. S. Pugachev, Theory of Random Functions and Its Applications to Control Problems. Oxford, U.K.: Pergamon, 1965 (distributed by Addison-Wesley, Reading, MA, USA).
[33] A. M. Grigoryan and E. R. Dougherty, "Optimization of linear filters under power-spectral-density stabilization," IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2292–2300, 2001.
Lori A. Dalton (S’10-M’13) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Texas A&M University, College Station, TX, USA, in 2001, 2002, and 2012, respectively. She is currently an Assistant Professor in the Department of Electrical and Computer Engineering and the Department of Biomedical Informatics at The Ohio State University, Columbus, OH, USA. Her current research interests include pattern recognition, classification, clustering, and error estimation. Dr. Dalton was awarded an NSF Graduate Research Fellowship in 2001 and the Association of Former Students Distinguished Graduate Student Masters Research Award in 2003.
Edward R. Dougherty (M’05-SM’09-F’12) received the M.S. degree in computer science from Stevens Institute of Technology, the Ph.D. degree in mathematics from Rutgers University, and has been awarded the Doctor Honoris Causa by the Tampere University of Technology, Finland. He is a Distinguished Professor in the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA, where he holds the Robert M. Kennedy ’26 Chair in electrical engineering and is Director of the Center for Bioinformatics and Genomic Systems Engineering. He is the author of 16 books, the editor of 5 others, and author of more than 300 journal papers. Dr. Dougherty is a Fellow of SPIE, has received the SPIE President’s Award, and served as the Editor of the SPIE/IS&T Journal of Electronic Imaging. At Texas A&M University, he received the Association of Former Students Distinguished Achievement Award in Research, and was named Fellow of the Texas Engineering Experiment Station, and Halliburton Professor of the Dwight Look College of Engineering.