Real-time Image Recovery Using Temporal Image Fusion

André Mora, José Manuel Fonseca, Rita Ribeiro
Center of Technology and Systems, Uninova, Monte da Caparica, Portugal
Dept. of Electrotechnical Engineering, Faculty of Sciences and Technology, UNL, Monte da Caparica, Portugal
[email protected], [email protected], [email protected]

Abstract— In computer vision systems, unpredictable image corruption can have a significant impact on usability. Image recovery methods for partial image damage, particularly in moving scenarios, can be crucial for recovering corrupted images. In these situations, image fusion techniques can be successfully applied to congregate information taken at different instants and from different points of view to recover damaged parts. In this article we propose a technique for temporal and spatial image fusion, based on fuzzy classification, which allows partial image recovery upon unexpected defects without user intervention. The method uses image alignment techniques and duplicated information from previous images to create fuzzy confidence maps. These maps are then used to detect damaged pixels and recover them using information from previous frames.

Keywords—Image fusion, fuzzy confidence, spatial-temporal fusion, image registration
I. OVERVIEW
In image processing scenarios where real-time information extraction is required and unpredictable image corruption can occur, image fusion techniques can be of significant importance, especially for the guidance of unmanned vehicles such as aerial vehicles [1], [2], mobile robots [3], [4], or planetary landers [5–7]. For example, in the image corruption scenario shown in Figure 2, image fusion techniques can be applied to gather information from previous instants and recover the currently damaged pixels. Image corruption may be caused by three main factors: dust on the image sensor, causing sharp dark spots at fixed positions in the image; a dirty lens, causing blurred regions in the image; or light reflections inside the lens (also known as lens flares), which produce rounded light spots that may move across the resulting image. The main objective of image fusion is to reduce uncertainty and redundancy while maximizing relevant information, by combining different image representations of the same scene [8]. The general procedure is to use several images of the same scene, provided by different sensors, and combine them to obtain a complete view of the scene, not only in terms of position and geometry, but also in terms of semantic interpretation. In this work, we propose a spatial-temporal image fusion architecture aimed at recovering corrupted images. Image fusion algorithms can be divided into pixel, feature, and symbolic levels. However, feature- and symbolic-level algorithms have not received the same level of attention as pixel-level algorithms. Pixel-level algorithms are the most common; they work either in the spatial domain or in the transform domain. Although pixel-level fusion is a local operation, transform-domain algorithms create the fused images globally. Feature-based (symbolic) algorithms typically segment the images into regions and fuse the regions using the images' intrinsic properties [9], [10]. This study focuses on pixel-level methods and techniques. This paper discusses a general architecture for a spatial-temporal image fusion technique, which gathers information over several consecutive images to produce pixel-level confidence maps that then enable fixing damaged pixels. The technique is divided into three main steps: the image registration process; the temporal fuzzy confidence map generation; and the image recovery process.
II. IMAGE FUSION ARCHITECTURE
The proposed image fusion architecture can be considered a spatial-temporal image fusion technique, since it gathers information over several consecutive images to produce pixel-level confidence maps and, afterwards, fix damaged pixels. The diagram in Figure 1 shows the overall architecture, divided into three main steps. The first step is image alignment, which is responsible for adjusting zoom, rotation and panning of the new image using image registration techniques or navigation information (altitude, gyros, etc.) that provides the current camera viewpoint. The objective is to have the last k images aligned with the current image. The second step is the fuzzy confidence map generation. Its inputs are the registered image, the previous images and their respective confidence maps. These inputs are analyzed to detect damaged pixels and produce the current confidence map as output. The pixel's confidence reflects, qualitatively, the probability of it being a damaged pixel.
Figure 1. Image spatial-temporal fusion architecture.
Figure 2. Sequential image acquisition during an unmanned helicopter landing operation (images from Google Earth). In this example the captured images are corrupted by fixed sensor noise.
Finally, in the last step, the pixels from the registered image with low confidence are fused with information from previous images. The inputs are the previous and current images and their corresponding confidence maps, and the output is the fused, recovered image.
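To make the data flow between these three steps concrete, the following Python sketch outlines one possible per-frame loop. It is only an illustration of the architecture in Figure 1: the helper functions align_to_current, update_confidence_map and recover_low_confidence are hypothetical placeholders for the operations detailed in Sections III to V, and the history size of four frames is an assumed value.

from collections import deque

K = 4  # number of archived frames kept in memory (assumed value)

# ring buffers holding the last K recovered images and their confidence maps
history_images = deque(maxlen=K)
history_confidence = deque(maxlen=K)

def process_frame(raw_image, navigation_info=None):
    # One iteration of the spatial-temporal fusion pipeline (sketch).
    # align_to_current, update_confidence_map and recover_low_confidence are
    # hypothetical placeholders for the steps of Sections III to V.

    # Step 1: align the archived frames with the newly acquired image
    aligned = [align_to_current(img, raw_image, navigation_info)
               for img in history_images]

    # Step 2: build the fuzzy confidence map for the current image
    confidence = update_confidence_map(raw_image, aligned, history_confidence)

    # Step 3: fuse low-confidence pixels with information from previous frames
    recovered, recovered_conf = recover_low_confidence(
        raw_image, confidence, aligned, history_confidence)

    # archive the result so it can support the next iterations
    history_images.append(recovered)
    history_confidence.append(recovered_conf)
    return recovered, recovered_conf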
III. IMAGE REGISTRATION
For the image alignment, a pre-processing step involving image registration is required to achieve spatial correspondence between the images. This image registration process is a crucial step in estimating the geometrical transformations needed to align two or more images. Four groups of image registration techniques have been identified, depending on the image acquisition process [11]: multiview analysis (images taken from different viewpoints), multitemporal analysis (images taken at different times), multimodal analysis (images acquired by different sensors) and scene-to-model registration (the image is aligned to a previous model). Multitemporal analysis is the one applicable to our scenario. For a better alignment, the camera distortion model should also be available for the geometrical projection of previous images onto the current field-of-view.
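As a minimal sketch of such a multitemporal alignment, the following example uses OpenCV feature matching and a homography model; the choice of the ORB detector, the RANSAC threshold and the projective motion model are assumptions made for illustration, not requirements of the proposed architecture.

import cv2
import numpy as np

def register_to_current(previous_gray, current_gray):
    # Estimate a homography mapping previous_gray onto current_gray and
    # return the warped (aligned) previous image.
    orb = cv2.ORB_create(1000)                      # feature detector/descriptor
    kp1, des1 = orb.detectAndCompute(previous_gray, None)
    kp2, des2 = orb.detectAndCompute(current_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # robust estimation copes with wrong matches caused by damaged pixels
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = current_gray.shape
    return cv2.warpPerspective(previous_gray, H, (w, h))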
IV. FUZZY CONFIDENCE
The detection of incoherent situations through pixel confidence level evaluation is an important feature of any autonomous computer vision system. Fuzzy logic techniques [12], [13] are well suited for measuring this confidence level qualitatively, since it can be derived from the pixel intensity, its neighbors' intensities and/or its previous confidence levels. Its computation takes temporal fusion, using previous images, into consideration (Figure 1). The algorithm parameters are defined with fuzzy membership functions that identify what constitutes a low, medium or high confidence level. These functions can be obtained either empirically or by using learning techniques such as neural networks [10], [12].
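A minimal sketch of how the low, medium and high confidence terms could be encoded as membership functions over a confidence score normalised to [0, 1] is shown below; the trapezoidal shapes and breakpoints are illustrative assumptions, since, as noted above, these functions can be defined empirically or learned.

import numpy as np

def trapezoid(x, a, b, c, d):
    # Trapezoidal membership function with feet a, d and shoulders b, c.
    x = np.asarray(x, dtype=float)
    rising = np.clip((x - a) / max(b - a, 1e-9), 0.0, 1.0)
    falling = np.clip((d - x) / max(d - c, 1e-9), 0.0, 1.0)
    return np.minimum(rising, falling)

# illustrative breakpoints for a confidence score normalised to [0, 1]
def mu_low(x):    return trapezoid(x, -0.1, 0.0, 0.2, 0.4)
def mu_medium(x): return trapezoid(x, 0.2, 0.4, 0.6, 0.8)
def mu_high(x):   return trapezoid(x, 0.6, 0.8, 1.0, 1.1)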
Predicting the next image based on Kalman filters is also envisaged as a topic for study, in order to improve the fuzzy confidence classifier. However, the trade-off between the advantages of using more complex techniques and the computational overhead should be considered. The inputs to calculate the fuzzy confidence Ct(x,y) for the pixel It(x,y) are:
• It(x,y) intensity;
• It-n(x,y)..It-1(x,y) intensities and the correspondent Ct-n(x,y)..Ct-1(x,y) confidence levels;
• It(x,y) neighbors' average intensity;
• It+1(x,y) predicted intensity.
These inputs represent both temporal information (by using the previous and the next images) and spatial information (through the use of the neighbors' intensities). Spatial information is important to achieve better noise independence [13]. Figure 3 presents an example of the noise detection algorithm and the corresponding pixel confidence level generation. As can be seen, the system uses a set of archived images (a1 to a3) that in each cycle are projected to the coordinates of the newly acquired image, obtaining the aligned images (b1 to b3), which should match each other. However, when significant noise is present, images are substantially different from their projected counterparts, raising suspicion about some areas (c1 to c3). The evaluation of the different images is consolidated into a single difference image (d1), which is the basis for the creation of the pixel confidence levels (d2). In the pixel confidence image (Figure 3.d2) the colors white, yellow and red represent high, medium and low confidence pixels, respectively.
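The consolidation of differences into a confidence map could be sketched as follows, assuming grey-level images that were already aligned with the current frame; the normalisation and the thresholds separating high, medium and low confidence are illustrative assumptions and would, in practice, be replaced by the fuzzy membership functions of Section IV.

import numpy as np

def confidence_from_differences(current, aligned_previous):
    # Consolidate the absolute differences against the aligned previous
    # frames into a single per-pixel confidence map in [0, 1].
    current = current.astype(float)
    total_diff = np.zeros_like(current)
    for prev in aligned_previous:               # difference images (c1..c3)
        total_diff += np.abs(current - prev.astype(float))

    # normalise the consolidated difference image (d1) to [0, 1]
    norm = total_diff / max(total_diff.max(), 1e-9)

    # larger differences mean more suspicion, hence lower confidence (d2)
    return 1.0 - norm

def classify_confidence(conf, low_th=0.3, high_th=0.7):
    # Map confidence values to the labels used in Figure 3 (red/yellow/white).
    labels = np.full(conf.shape, "medium", dtype=object)
    labels[conf < low_th] = "low"
    labels[conf >= high_th] = "high"
    return labels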
Figure 3. Pixel confidence level generation; a1 to a3) images obtained in t-2, t-1 and the current cycle; b1) image equivalent to a1 obtained from the projection of the image t-3; b2) idem with a2 and t-3; b3) idem with a3 and t-3; c1 to c3) differences between a1 and b1, a2 and b2, a3 and b3; d1) total image differences (sum of c1 to c3); d2) final pixel confidence levels (white is high confidence, yellow medium confidence and red low confidence).
V. IMAGE RECOVERY: ILLUSTRATIVE EXAMPLE
The image corruption described can produce dark spots on the image, generating shadows that do not exist in the real scene (Figure 2). In a moving application such as UAV landing, these shadows remain fixed in the image layout, but the real view of the terrain can be updated as it moves out of the dark areas. As shown in the proposed architecture (Figure 1), in the recovery process a small subset of historic images (typically 3 to 4 images and their confidence levels) is combined (fused) with the current one into a "recovered", improved image. Having access to the past images and their respective confidence information provides the means for recovering low confidence regions using either classical interpolation techniques or novel approaches such as specialized aggregation operators [14]. We now illustrate a fusion recovery process using a simple interpolation technique. For the recovery algorithm, a small subset of images, typically 3 to 4, is stored in memory together with their corresponding confidence levels. In Figure 4, the presence of low confidence levels (b2) in the newly acquired image (b1) triggers a recovery algorithm that begins by projecting the archived image (a1), which also has low confidence pixels (a2), to the coordinates of the new image, resulting in image (c1) and corresponding confidence levels (c2). The confidence levels of (a1) are adjusted (in this case downgraded from high to medium) because the image is interpolated to produce an image with the same size as (b1), and the same transformation is applied to its confidence levels.
Figure 4. Image recovery based on confidence levels. a1) Archived image with noise; a2) Confidence levels of image a1; b1) Newly acquired image; b2) Confidence levels of image b1; c1) Image a1 projected to fit b1; c2) Projected confidence levels of image c1; d1) Recovered image from b1 and c1; d2) Confidence levels of the recovered image (in the confidence level images, white means high confidence, yellow medium confidence and red low confidence).
The two images are compared, and the low confidence pixels of (b1) are replaced by the corresponding pixels of (c1) if those have a medium or high confidence level. The recovered image is presented in (d1) with its confidence levels in (d2). As can be seen, a reconstructed image can be obtained through this process, with automatic detection of noisy areas. It must be stressed that the level of reconstruction will depend on the type of noise, the quality of the alignment between the images and the number of archived images considered for the reconstruction process. In this example the recovered image (d1) keeps some noisy areas because they coincide in both the captured and projected images, and the confidence levels (d2) show limited trust in the reconstructed pixels. Other approaches could be used in the recovery process (step 3). An interesting novel approach for fusing images and information is based on upward and downward reinforcement operators [14]. These specialized operators have already proven effective for data fusion of hazard maps in spacecraft safe landing [14], [15], and they will be considered in future developments of this work. Finally, the reconstructed/fused image, together with its pixel confidence levels, is ready to be passed to the following stages of any computer vision system.
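The pixel replacement rule illustrated in Figure 4 could be sketched as follows, assuming the archived image and its confidence map have already been projected onto the geometry of the new image (Section III); the thresholds and the downgrade factor applied to interpolated pixels are assumptions chosen to mimic the high-to-medium adjustment of the example.

import numpy as np

def recover(current, current_conf, projected, projected_conf,
            low_th=0.3, medium_th=0.5, downgrade=0.8):
    # Replace low confidence pixels of the current image with the
    # corresponding pixels of the projected previous image, when those
    # projected pixels are trusted enough.

    # interpolation during projection reduces trust in the archived pixels
    projected_conf = projected_conf * downgrade

    # a pixel is replaced if it has low confidence in the current image and
    # at least medium confidence in the projected previous image
    replace = (current_conf < low_th) & (projected_conf >= medium_th)

    recovered = np.where(replace, projected, current)
    recovered_conf = np.where(replace, projected_conf, current_conf)
    return recovered, recovered_conf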
VI. CONCLUSIONS
In this paper, we presented an architecture for temporal image fusion that allows the detection and correction of faulty pixels/regions. In some real-time applications, repairing faulty image acquisition areas is not possible, either because the equipment is difficult to reach or because it is in critical operation. Therefore, real-time fusion processes that can detect and correct or overcome such defects are important. The detection of faulty pixels is not a trivial operation: it requires the evaluation of several conditions, such as whether a pixel value remains unchanged over several iterations, which may also be caused by noise. Therefore, we discussed a novel image fusion architecture, which includes a fuzzy decision classifier to evaluate the pixels' confidence levels. The recovery step (fusion of images) calculates the confidence level from the current and the previous images. The low confidence pixels in each image are replaced by pixels in the same position in previous images whose confidence is high. Using this technique, damaged pixels can be recovered, providing an improved image, hopefully free of damaged pixels, to the computer vision system. In future work we will consider image fusion using specialized reinforcement aggregation operators for the recovery process, as discussed in Section V, because they have been successfully applied in spacecraft safe landing site selection with hazard avoidance.
REFERENCES
[1] A. E. R. Shabayek, C. Demonceaux, O. Morel, and D. Fofi, "Vision Based UAV Attitude Estimation: Progress and Insights," Journal of Intelligent & Robotic Systems, vol. 65, no. 1–4, pp. 295–308, Aug. 2011.
[2] F. Lin, X. Dong, B. M. Chen, K.-Y. Lum, and T. H. Lee, "A Robust Real-Time Embedded Vision System on an Unmanned Rotorcraft for Ground Target Following," IEEE Trans. Ind. Electron., vol. 59, no. 2, pp. 1038–1049, Feb. 2012.
[3] J. Xue and L. Xu, "Autonomous Agricultural Robot and its Row Guidance," in 2010 International Conference on Measuring Technology and Mechatronics Automation, 2010, pp. 725–729.
[4] T. Lee, W. Bahn, B. Jang, H.-J. Song, and D. Dan Cho, "A new localization method for mobile robot by data fusion of vision sensor data and motion sensor data," in 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2012, pp. 723–728.
[5] A. Da Costa, A. Davighi, S. Bernardi, and A. Finzi, "Hazard avoidance during planetary landing by on-line neural network images analysis," in 28th Annual AAS Guidance and Control Conference, 2005.
[6] J. Feng, C. Hutao, and C. Pingyuan, "Autonomous hazard detection and landing point selecting for planetary landing," in 2010 3rd International Symposium on Systems and Control in Aeronautics and Astronautics, 2010, pp. 1292–1296.
[7] L. Simoes, C. Bourdarias, and R. Ribeiro, "Real-time planetary landing site selection—a non-exhaustive approach," Acta Futura, vol. 5, pp. 39–52, 2012.
[8] A. Ardeshir Goshtasby and S. Nikolov, "Image fusion: Advances in the state of the art," Information Fusion, vol. 8, no. 2, pp. 114–118, Apr. 2007.
[9] G. Piella, "A general framework for multiresolution image fusion: from pixels to regions," Information Fusion, vol. 4, pp. 259–280, 2003.
[10] S. S. Hsu, P. P. Gau, I. I. Wu, and J. J. Jeng, "Region-Based Image Fusion with Artificial Neural Network," in World Academy of Science, …, 2009, vol. 53, pp. 156–159.
[11] B. Zitová and J. Flusser, "Image registration methods: a survey," Image Vision Comput., vol. 21, no. 11, pp. 977–1000, Oct. 2003.
[12] T. J. Ross, Fuzzy Logic with Engineering Applications, Third Edition. Wiley, 2010, p. 606.
[13] K.-S. Chuang, H.-L. Tzeng, S. Chen, J. Wu, and T.-J. Chen, "Fuzzy c-means clustering with spatial information for image segmentation," Comput. Med. Imag. Grap., vol. 30, no. 1, pp. 9–15, Jan. 2006.
[14] R. A. Ribeiro, T. C. Pais, and L. F. Simões, "Benefits of Full-Reinforcement Operators for Spacecraft Target Landing," in Preferences and Decisions, vol. 257, S. Greco, R. A. Marques Pereira, M. Squillante, R. R. Yager, and J. Kacprzyk, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 353–367.
[15] C. Bourdarias, P. Da-Cunha, R. Drai, L. F. Simões, and R. A. Ribeiro, "Optimized and flexible multi-criteria decision making for hazard avoidance," in Proceedings of the 33rd Annual AAS Rocky Mountain Guidance and Control Conference, 2010.