VISUAL FEATURES FOR IMAGE QUALITY ASSESSMENT WITH REDUCED REFERENCE

Mathieu CARNEC*, Patrick LE CALLET and Dominique BARBA
IRCCyN/IVC, Rue Christian Pauc, La Chantrerie, NANTES, France
* since September 2004: L2TI, Institut Galilée, Université Paris 13, 99, avenue Jean-Baptiste Clément, VILLETANEUSE, France

ABSTRACT
This paper focuses on the definition and selection of features for the design of a reduced-reference image quality assessment method. For such methods, the main problem is the choice of the features which constitute the reduced description. The best features are those which yield the highest correlation between the produced quality scores and subjective ones. We therefore test several feature types and measure their impact on image quality assessment. We show that using structural information and combining different feature types leads to high performance.

1. INTRODUCTION

The quality of a distorted image is often computed with respect to its reference (original) image. In many cases, the full reference (FR) image is unavailable at the time of quality assessment, but the reference image can be briefly described by a set of parameters. In such a case, a reduced reference (RR) quality criterion is employed. Many FR criteria in the literature take advantage of a human visual system (HVS) model. Extending such an approach to an RR criterion is not obvious. In [1], we presented a way to include HVS properties in an RR criterion in order to overcome this problem. Reduced descriptions of images are built by extracting local features in a perceptual space. This principle is quite general but leads to a key question: which features best capture distortions in order to assess image quality? In this paper, we present a study that provides guidelines for choosing the appropriate features. We first briefly review the HVS model used to design our RR criterion and describe its overall operating scheme. Then, the different types of features we tested are presented, and their performances in terms of visual quality prediction are given and discussed. Finally, we present a set of applications, freely available on the Internet, that assess image quality in a transmission context.

2. IMAGE QUALITY CRITERION

We have set up the HVS model presented in figure 1. For further explanations, the reader is invited to refer to [1].
[Figure: diagram of the HVS model, from the eyes and retina through the LGN to cortical areas V1-V5 along the dorsal and ventral pathways]
Fig. 1. Human Visual System model
Based on this model, we designed and developed an image quality criterion that simulates important stages of vision: perceptual representation, feature extraction and comparison between descriptions. First, a reduced description is built from an image by transforming it and extracting features, as described in figure 2.

[Figure: processing chain from image to reduced description: display device gamma function, perceptual colorspace, luminance range normalization, CSF, subband decomposition, masking effect model, feature extraction; the stages correspond roughly to the screen, the retina and eye, and cortical areas V1 and V2]
Fig. 2. The steps used to build the reduced description of an image (and the corresponding stages in the HVS)
Then, the reduced description RD_Distorted of the distorted image is compared to the reduced description of its reference image (the reduced reference, RR) to produce the quality score of the distorted image. This processing is shown in figure 3. Reduced descriptions are generic (not tied to a specific image distortion system).
[Figure: the reference and distorted images each undergo reduced description construction; the resulting RR and RD_Distorted are compared (similarity measurement) to produce a quality score, whose correlation with the Mean Opinion Score obtained from subjective quality assessment tests with human observers measures performance]
Fig. 3. Block diagram of the quality criterion
2.1. Building a reduced description of an image

First, RGB pixel values are converted into luminances using the gamma function of the screen, in order to obtain the luminances reaching the eye. These luminances are then converted to ACr1Cr2 components using Krauskopf's colorimetric space, validated as a perceptual space in [2]. The A (achromatic) component contains a lot of structural information, whereas the Cr1 and Cr2 (chromatic) components mainly contain low spatial frequencies. The A component is therefore the most important for quality assessment and is processed on its own from this point on (the Cr1 and Cr2 components undergo no further processing, since the feature types we extract from them do not require it).

The A component is normalized so that its values correspond to a range of 0.7-70 cd/m2 (to simulate the HVS light adaptation in front of a standard CRT TV monitor). The A component is then divided by its mean value to produce the contrast component Co. Daly's contrast sensitivity function (CSF) [3] is applied to this contrast component in order to weight its values with respect to perception. Since psychophysics has demonstrated that the perception of a visual stimulus depends on its spatial frequency and orientation, we use a subband decomposition resulting from experiments made in our lab (IRCCyN/IVC). This decomposition uses 17 filters tuned in frequency and orientation. After the subband decomposition, Daly's masking effect model [3] is also employed in order to take into account the influence of the neighborhood on the perception of a localized stimulus.
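To make this chain concrete, here is a minimal Python sketch of the first stages of the front-end. It is an illustration under explicit assumptions, not the authors' implementation: the opponent-space matrix is a placeholder (Krauskopf's directions are actually defined from LMS cone responses), the Mannos-Sakrison CSF stands in for Daly's, and the pixels-per-degree value encodes an assumed viewing distance; the subband decomposition and the masking model are omitted.

```python
# Sketch of the perceptual front-end: gamma, opponent colorspace,
# luminance normalization, contrast and CSF weighting.
import numpy as np

def gamma_to_luminance(rgb, gamma=2.2, peak_cd_m2=70.0):
    """Map 8-bit RGB values to screen luminances (display gamma model)."""
    return peak_cd_m2 * (rgb.astype(np.float64) / 255.0) ** gamma

def to_ACr1Cr2(rgb_lum):
    """Convert to an ACr1Cr2-like opponent space. PLACEHOLDER matrix:
    Krauskopf's directions are actually defined on LMS cone responses."""
    M = np.array([[0.33,  0.33,  0.33],    # A  : achromatic
                  [0.50, -0.50,  0.00],    # Cr1: red-green opponent
                  [0.25,  0.25, -0.50]])   # Cr2: blue-yellow opponent
    return np.einsum('ij,hwj->hwi', M, rgb_lum)

def normalize_and_contrast(A, lo=0.7, hi=70.0):
    """Rescale A to the 0.7-70 cd/m2 range, then divide by its mean
    to obtain the contrast component Co."""
    A = lo + (hi - lo) * (A - A.min()) / (A.max() - A.min())
    return A / A.mean()

def apply_csf(Co, pix_per_deg=32.0):
    """Weight the contrast spectrum by a CSF. The Mannos-Sakrison model
    below is a stand-in for Daly's CSF used in the paper."""
    fy = np.fft.fftfreq(Co.shape[0])[:, None]
    fx = np.fft.fftfreq(Co.shape[1])[None, :]
    f = np.hypot(fx, fy) * pix_per_deg       # spatial frequency, cycles/degree
    csf = 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)
    return np.real(np.fft.ifft2(np.fft.fft2(Co) * csf))
```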
Once the perceptual representation has been built (17 subband images for the A component and one image each for the Cr1 and Cr2 components), we select locations (called "characteristic points") in the subband representation at which features are extracted. Each characteristic point has coordinates (x, y, sb), where (x, y) are spatial coordinates and sb is the extraction subband index. These points are located on concentric ellipses centered on the image center. We choose a fixed number of points per ellipse, so the points are more concentrated in the image center (which generally gathers the objects of interest). This fixes the (x, y) coordinates. For the sb coordinate, we choose the subband presenting the extremal value at location (x, y) (the low-frequency subband is excluded from this search since we are looking for contrasts).

Finally, we extract several features at each characteristic point i. First, a linear structure is extracted by a "stick growing" algorithm (as described in figure 4). This structure is described by its orientation O_i (in radians), its length L_i, its width W_i and its maximum magnitude M_i (located at the center of the linear structure). Basically, this algorithm tries to build a centered segment in every direction and keeps the longest one. Each segment must represent the structural information it lies on. This processing is faster than more common methods such as filter banks. A similar processing is used in [4]. A sketch of both steps is given after figure 4.
Fig. 4. left: characteristic point locations on image "lighthouse1" (22 ellipses of 16 points each); right: linear structure extracted by a stick growing algorithm in a subband
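The sketch below illustrates the two steps just described: placing characteristic points on concentric ellipses (with subband selection by extremal magnitude) and growing a centered stick. The ellipse radii, the number of sampled directions, the homogeneity tolerance and the circular-mean helper are our assumptions, and the width estimation W_i is omitted for brevity.

```python
# Sketch of characteristic point placement and stick growing.
import numpy as np

def characteristic_points(h, w, n_ellipses=22, pts_per_ellipse=16):
    """(x, y) locations on concentric ellipses centered on the image."""
    cx, cy = w / 2.0, h / 2.0
    pts = []
    for e in range(1, n_ellipses + 1):
        rx, ry = e * cx / (n_ellipses + 1), e * cy / (n_ellipses + 1)
        for p in range(pts_per_ellipse):
            t = 2 * np.pi * p / pts_per_ellipse
            pts.append((int(cx + rx * np.cos(t)), int(cy + ry * np.sin(t))))
    return pts

def pick_subband(subbands, x, y):
    """Choose the subband with the extremal magnitude at (x, y),
    skipping the low-frequency subband (index 0)."""
    return 1 + int(np.argmax([abs(sb[y, x]) for sb in subbands[1:]]))

def grow_stick(img, x, y, n_dirs=16, tol=0.5):
    """Try to grow a centered segment in n_dirs directions and keep the
    longest one whose samples stay close to the center magnitude."""
    m0 = img[y, x]
    best = (0.0, 0, m0)                       # (O_i, L_i, M_i); W_i omitted
    for k in range(n_dirs):
        theta = np.pi * k / n_dirs            # orientations in [0, pi)
        dx, dy = np.cos(theta), np.sin(theta)
        length = 0
        for s in (1, -1):                     # grow on both sides of the center
            step = 1
            while True:
                px = int(round(x + s * step * dx))
                py = int(round(y + s * step * dy))
                if not (0 <= px < img.shape[1] and 0 <= py < img.shape[0]):
                    break
                if abs(img[py, px] - m0) > tol * abs(m0):
                    break                     # homogeneity test (assumed rule)
                length += 1
                step += 1
        if length > best[1]:
            best = (theta, length, m0)
    return best

def local_mean(img, x, y, radius=5):
    """Mean over a circular neighborhood (used for Co_i, Cr1_i, Cr2_i)."""
    yy, xx = np.ogrid[:img.shape[0], :img.shape[1]]
    return float(img[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2].mean())
```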
We also extract the mean local values of the components Co, Cr1 and Cr2 (noted Co_i, Cr1_i and Cr2_i), computed over a circular neighborhood of radius 5 pixels around characteristic point i. All these extracted features constitute the reduced description of the image.

2.2. Using reduced descriptions to assess image quality
To produce the quality score of a distorted image, the reduced description RD_Distorted of the image under test is compared to the reduced description RR of its reference. The computation of an image quality score involves several steps. First, we compute a correspondence coefficient, noted C(F_{R,i}, F_{D,i}), which indicates the similarity between the feature F_{R,i} (from RR) and the feature F_{D,i} (from RD_Distorted). The features F_{R,i} and F_{D,i} must belong to the same feature type: for example, we compute the correspondence coefficient between two lengths (in this case F stands for L) or between two local contrast values (F stands for Co). They also have to be extracted from the same location in the image plane (reference or distorted), which means from the same characteristic point i. The correspondence coefficient is computed from the absolute difference between the two features, normalized by the magnitude of the feature from the reference image:

$$C(F_{R,i}, F_{D,i}) = \max\left(0,\ 1 - \frac{|F_{R,i} - F_{D,i}|}{|F_{R,i}|}\right) \quad (1)$$

For the feature O (orientation of the linear structure), it would make no sense to normalize by the orientation from the reference image, so the correspondence coefficient computation is modified for this feature type. Since the largest possible difference between two orientations is π/2, we use the π-periodic function shown in figure 5 (the orientations θ and θ + π are identical).

Then, for each characteristic point i, several correspondence coefficients C(F_{R,i}, F_{D,i}), and therefore several feature types, are combined to produce a local similarity measure LS_k(i). We define the local similarity as the arithmetic or geometric average of correspondence coefficients. We tested 13 local similarity measures, built from the features listed in table 1, each leading to a particular RR criterion noted C_k.

k  | Features used in LS_k(i)
---|--------------------------------------------------------------
1  | O_i (linear structure orientation)
2  | L_i (linear structure length)
3  | W_i (linear structure width)
4  | M_i (linear structure maximum magnitude)
5  | Co_i (contrast mean local value)
6  | Cr1_i (Cr1 mean local value)
7  | Cr2_i (Cr2 mean local value)
8  | (1/3)(O_i + L_i + W_i), arithmetic average
9  | (O_i · L_i · W_i)^(1/3), geometric average
10 | (1/4)(O_i + L_i + W_i + M_i), arithmetic average
11 | (O_i · L_i · W_i · M_i)^(1/4), geometric average
12 | (1/7)(O_i + L_i + W_i + M_i + Co_i + Cr1_i + Cr2_i), arithmetic average
13 | (O_i · L_i · W_i · M_i · Co_i · Cr1_i · Cr2_i)^(1/7), geometric average
Table 1. Features used in the 13 tested local similarity measures (combined by arithmetic or geometric average)
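As a direct transcription, equation (1) and the orientation rule of figure 5 read as follows; only the handling of a zero-valued reference feature is our own guard, since the paper does not specify that case.

```python
# Correspondence coefficients between one reference and one distorted feature.
import numpy as np

def correspondence(f_ref, f_dist):
    """Eq. (1): absolute difference normalized by the reference magnitude."""
    if f_ref == 0:
        return 1.0 if f_dist == 0 else 0.0   # guard: case not covered by Eq. (1)
    return max(0.0, 1.0 - abs(f_ref - f_dist) / abs(f_ref))

def correspondence_orientation(o_ref, o_dist):
    """Fig. 5: triangular, pi-periodic function of |O_R - O_D|, equal to 1
    for identical orientations and to 0 when they differ by pi/2."""
    d = abs(o_ref - o_dist) % np.pi
    d = min(d, np.pi - d)                    # theta and theta + pi coincide
    return 1.0 - d / (np.pi / 2.0)
```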
For example, we have the following relations for LS_1(i) and LS_8(i):

$$LS_1(i) = C(O_{R,i}, O_{D,i}) \quad (2)$$

$$LS_8(i) = \frac{C(O_{R,i}, O_{D,i}) + C(L_{R,i}, L_{D,i}) + C(W_{R,i}, W_{D,i})}{3} \quad (3)$$

Then, we define the global similarity S_k as the mean value of the local similarities LS_k(i) computed at every characteristic point i (i = 1..N):

$$S_k = \frac{1}{N} \sum_{i=1}^{N} LS_k(i) \quad (4)$$

Correspondence coefficients, local similarities and global similarities range from 0 to 1. To produce an objective quality score Nobj, we use a linear transformation of S_k:

$$Nobj = \lambda_k \cdot S_k + \gamma_k \quad (5)$$
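Reusing the correspondence helpers sketched earlier, equations (2) to (5) translate directly into code; the per-point feature dictionaries are merely an illustrative data layout.

```python
# Sketch of Eqs. (2)-(5) for criterion C8 (arithmetic mean over O, L, W).
import numpy as np

def local_similarity_8(ref, dist):
    """LS_8(i), Eq. (3): arithmetic mean of the O, L and W correspondences."""
    return (correspondence_orientation(ref['O'], dist['O'])
            + correspondence(ref['L'], dist['L'])
            + correspondence(ref['W'], dist['W'])) / 3.0

def global_similarity(ref_desc, dist_desc, local_similarity=local_similarity_8):
    """S_k, Eq. (4): mean local similarity over the N characteristic points."""
    return float(np.mean([local_similarity(r, d)
                          for r, d in zip(ref_desc, dist_desc)]))

def objective_score(s_k, lam, gamma):
    """Nobj, Eq. (5): linear mapping of the similarity onto the MOS scale."""
    return lam * s_k + gamma
```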
The parameters λ_k and γ_k are optimized so as to minimize the RMSE between Nobj and the Mean Opinion Score (MOS) given by human observers during subjective tests, ranging from 1 (lowest quality) to 5 (best quality). The training scored image base is composed of images (and corresponding MOS) which are different from the images of the scored image bases used to measure the performances.

[Figure: C(O_{R,i}, O_{D,i}) plotted against |O_{R,i} − O_{D,i}|: a triangular function of period π, equal to 1 at 0 and π and to 0 at π/2]
Fig. 5. Function used to compute the correspondence coefficient for the orientation feature O_i
3. PERFORMANCES VS FEATURES

To measure the performance of our quality criteria, we compute the correlation coefficient CC between Nobj and MOS (to study monotony). We also compute the standard deviation σ of the quality prediction error, the error being the difference between Nobj and MOS (indicating accuracy). Table 2 presents the performances of each of the 13 tested RR criteria. These results were obtained with N = 352 points and on three scored image bases of 100 images each: the IRCCyN/IVC base (multiple degradations), the JPEG images and the JPEG2000 images of the LIVE base [5]. These bases are respectively noted a, b and c. The parameters λ_k and γ_k were optimized using respectively 50, 104 and 98 training images.
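A short sketch of this evaluation protocol: λ_k and γ_k are fitted by least squares on the training images (which minimizes the RMSE of the linear model), and CC and σ are then computed on a disjoint test set. The authors describe their fitting only as RMSE minimization, so the least-squares fit is an assumption.

```python
# Fitting the linear mapping and measuring CC and sigma against MOS.
import numpy as np

def fit_linear(s_train, mos_train):
    """Least-squares fit of Nobj = lambda * S + gamma on training scores."""
    lam, gamma = np.polyfit(s_train, mos_train, 1)
    return lam, gamma

def evaluate(s_test, mos_test, lam, gamma):
    """Correlation coefficient (monotony) and error deviation (accuracy)."""
    nobj = lam * np.asarray(s_test) + gamma
    cc = np.corrcoef(nobj, mos_test)[0, 1]
    sigma = float(np.std(nobj - np.asarray(mos_test)))
    return cc, sigma
```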
k  | CCa   | CCb   | CCc   | σa    | σb    | σc
---|-------|-------|-------|-------|-------|------
1  | 0.883 | 0.934 | 0.930 | 0.629 | 0.331 | 0.317
2  | 0.886 | 0.957 | 0.937 | 0.642 | 0.263 | 0.299
3  | 0.886 | 0.949 | 0.931 | 0.656 | 0.320 | 0.340
4  | 0.908 | 0.928 | 0.937 | 0.556 | 0.354 | 0.305
5  | 0.903 | 0.893 | 0.860 | 0.631 | 0.384 | 0.452
6  | 0.474 | 0.876 | 0.804 | 1.075 | 0.424 | 0.479
7  | 0.351 | 0.567 | 0.501 | 1.201 | 0.739 | 0.739
8  | 0.899 | 0.953 | 0.940 | 0.617 | 0.289 | 0.305
9  | 0.892 | 0.958 | 0.942 | 0.603 | 0.274 | 0.300
10 | 0.905 | 0.948 | 0.940 | 0.588 | 0.305 | 0.302
11 | 0.898 | 0.952 | 0.943 | 0.590 | 0.290 | 0.297
12 | 0.918 | 0.961 | 0.950 | 0.504 | 0.243 | 0.253
13 | 0.885 | 0.962 | 0.948 | 0.574 | 0.237 | 0.259
Table 2. Performances of each criterion on the three image bases
The results show that criteria C1 to C5 produce quite good quality scores (CC ranges from 0.883 to 0.957), so the linear structure parameters and the mean value of the contrast component can be important features for quality assessment. Conversely, criteria C6 and C7 produce objective quality scores that are insufficiently correlated with subjective ones; the mean local values of Cr1 and Cr2 are thus of lesser interest in our criteria. Criteria C8 to C13 use a combination of different feature types. The choice between a geometric and an arithmetic average is not crucial since both provide equivalent results. Combining feature types can improve performance, and criterion C12 obtains very good results (CC ranging from 0.918 to 0.961). The standard deviations of the prediction error show that our criterion performs better when it is limited to a single type of degradation (JPEG or JPEG2000) and when the training scored image base is large. Finally, all criteria (except C6 and C7) produce a quality prediction error which is very close to the standard deviation of the observers' quality scores for a given image. Our quality criteria thus show the same performance as a real human observer chosen at random among the observers.
[Figure: on the image server, RRmake builds the reduced description of the reference image (the reduced reference) and RRpack compresses it; the compressed image (distorted image) and the compressed reduced reference travel over a network (such as the Internet); on the image client, RRunpack restores the reduced reference, RRmake builds the reduced description of the distorted image, and RRuse compares the two to produce the quality score of the displayed distorted image]
Fig. 6. ”RR tools” deployment in a transmission context
4. FREELY AVAILABLE APPLICATIONS

Our quality metric has been implemented in four applications called "RRmake", "RRuse", "RRpack" and "RRunpack". These tools are freely available on the Internet at the URL http://www.dcapplications.t2u.com. They are gathered under the name "RR tools" and should be deployed as described in figure 6. These applications use the criterion C12 with 176 characteristic points (11 ellipses of 16 points each). Each feature is quantized on a single byte, which leads to an RR size of 2655 bytes (for any image size). This configuration gives CCa = 0.877, CCb = 0.950, CCc = 0.945, σa = 0.587, σb = 0.279 and σc = 0.284.
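A plausible reading of the one-byte-per-feature packing used by RRpack is sketched below. The quantization ranges and byte layout are assumptions of ours (the paper only states that each feature is quantized on a single byte), and the reported 2655-byte total suggests additional data, such as point coordinates or a header, not shown here.

```python
# Hypothetical packing of one characteristic point's features into bytes.
import struct

FEATURE_ORDER = ('O', 'L', 'W', 'M', 'Co', 'Cr1', 'Cr2')

def quantize(value, lo, hi):
    """Map a feature value in [lo, hi] to a single byte (0-255)."""
    v = min(max(value, lo), hi)
    return int(round(255 * (v - lo) / (hi - lo)))

def pack_point(feats, ranges):
    """Pack the 7 features of one point into 7 bytes, one byte each."""
    return struct.pack('7B', *(quantize(feats[k], *ranges[k])
                               for k in FEATURE_ORDER))
```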
5. CONCLUSION

We have presented several visual feature types and their performances for image quality assessment. On three different scored image bases, we have shown that structural information can be useful to assess image quality precisely. Combining several feature types yields high correlation coefficients between the produced quality scores and subjective ones (from 0.918 to 0.961 depending on the image base). Performances can also be improved by training and by limiting the distortions to one given type. Finally, we introduced freely available software to assess image quality in a transmission context.
6. REFERENCES

[1] Mathieu Carnec, Patrick Le Callet, and Dominique Barba, "An image quality assessment method based on perception of structural information," in Proceedings of ICIP (IEEE International Conference on Image Processing), Barcelona, Spain, September 14-17, 2003.

[2] Abdelhakim Saadane, Laurent Bedat, and Dominique Barba, "Perceptual quantization of chromatic components," in Human Vision and Electronic Imaging III, Proc. of SPIE, vol. 3299, January 1998.

[3] Scott Daly, The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity, chapter 14, pp. 179-206, MIT Press, 1993.

[4] Galen C. Hunt and Randal C. Nelson, "Lineal feature extraction by parallel stick growing," in Proceedings of the Third International Workshop on Parallel Algorithms for Irregularly Structured Problems, Santa Barbara, CA, August 1996.

[5] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, "LIVE image quality assessment database," http://live.ece.utexas.edu/research/quality.