Neurocomputing 137 (2014) 71–78
3D scene reconstruction enhancement method based on automatic context analysis and convex optimization

My-Ha Le, Andrey Vavilin, Kang-Hyun Jo

Graduate School of Electrical Engineering, University of Ulsan, Ulsan, Republic of Korea
Article history: Received 2 January 2013; received in revised form 2 April 2013; accepted 8 May 2013; available online 3 March 2014.

Abstract
This paper proposes a method for increasing the accuracy of scene model generation. Reconstruction of a 3D scene model is strongly affected by moving objects, both artificial and natural. In the proposed method, context analysis is applied as a pre-filtering operation to detect and remove objects which could negatively affect the reconstruction process. Additionally, robust global rotation constraints are computed from the correspondences of image pairs and fed to the model generation procedure. Finally, in contrast with using only canonical bundle adjustment, which yields unstable structures in critical configurations and can become trapped in local minima, the proposed method utilizes the known-rotation framework to compute the initial guess for the bundle adjustment process, which overcomes the drawback above. Moreover, patch-based multi-view stereopsis is applied to upgrade the reconstructed structure. The simulation results demonstrate the clean structures reconstructed by this method from scene images in outdoor environments. © 2014 Elsevier B.V. All rights reserved.
Keywords: Context analysis; SIFT; Correspondence; RANSAC; Convex optimization; PMVS
1. Introduction

3D reconstruction or modeling of scenes is an important issue in various applications such as virtual environments, scene planning, and navigation of autonomous mobile robots. Although some progress has been made in the field of 3D reconstruction during the last few decades, there are still no methods that satisfy the requirement of highly accurate and stable results for different kinds of datasets. Moreover, some of these methods require a large amount of manual work or special apparatus, such as laser, radar, and airborne light detection and ranging; these are usually expensive and require much more time for data acquisition. In recent years many algorithms have been developed for 3D reconstruction and motion estimation, which can roughly be divided into several categories: methods using bundle adjustment (BA) [1]; methods based on factorization [2–4]; and hierarchical methods [5,6]. In the first group, multi-view structure from motion starts by estimating the geometry of two views, and this structure is used to estimate the pose of each adjacent camera. The quality of the reconstruction strongly depends on the initial structure of the first camera pairs [7,8]. Another disadvantage of this approach is the drift problem [9]. It also has high computational cost and suffers from accumulated errors as the
number of images increases. In the second group, missing data and sensitivity to outliers are the significant drawbacks; these have been well studied by several authors, e.g., [4]. In the third group, the input images must be arranged in a hierarchical tree and processed from the root to the top. Without using any additional electro-magnetic device beyond a monocular camera, the proposed method overcomes the disadvantages mentioned above. The flow chart of the proposed method is shown in Fig. 1. Using a perspective camera, images are acquired of outdoor scenes containing complicated object structures. The problems of reconstruction from scene images are how to detect and remove outliers, and how to optimize the results with the highest accuracy. One of the outlier problems is caused by unnecessary objects appearing in the images and by object distortion, so a pre-processing step, called scene analysis, should be performed. Some typical methods using neural networks, as presented in [10–12], can be applied to the object classification and recognition problem. Note that in most outdoor images the sky and clouds often appear as image background, while humans, cars, etc. appear as moving objects. Here, the proposed solution for removing such objects is similar to that of [13,14]. In the next step, a robust method for global camera rotation estimation based on pair-wise constraints is performed. In order to find robust constraints between images, the SIFT algorithm [15] is applied to find invariant features for each pair of views in combinatorial form. The global camera rotations are computed according to a graph-based sampling scheme [16]. After obtaining the camera rotation matrices in global coordinates, high-accuracy point clouds are generated by
applying the known-rotation framework [17]. Instead of using only canonical BA, which may produce unstable structures in critical geometric configurations and become trapped in local minima, the result of the known-rotation framework is treated as an initial guess for the BA process. This idea is inspired by the original approach proposed in [18]. However, the model is not yet complete at this step; a dense upgrading process is needed. Here, the patch-based multi-view stereopsis (PMVS) method [19] is applied to create the final dense scene model.

This paper is organized as follows. Section 2 describes the pre-processing step: context analysis of scene images. Section 3 presents the global camera rotation estimation as well as the structure and motion recovery method; we also explain how to measure and minimize the residual error under the known-rotation framework combined with BA. The PMVS method is briefly summarized in Section 4. The experiments are presented in Section 5. Finally, Section 6 concludes the paper and points out future work.

Fig. 1. General proposed scheme: input images → context analysis and object filtering → filtered image (classified scene) → global rotation estimation → scene structure and camera motion reconstruction → stereo mapping → dense scene.

2. Context analysis

Fig. 2. Semantic scene structure.

The main purpose of this step is to remove from further processing those objects which may have a negative effect on the scene reconstruction process. Natural objects such as sky and clouds are difficult to reconstruct in a 3D model and may cause inconsistencies in the reconstructed scene due to false matching between points from different frames. To overcome this problem, context analysis is used. Some typical features for classification are presented in [20,21]. In this work, we consider a scene consisting of three semantic layers (see Fig. 2): the first layer contains sky and cloud information; the second layer contains buildings, ground and trees; and the last layer contains moving objects. Objects of the different layers are separated into classes as follows:

Layer 1: sky, clouds.
Layer 2: buildings, grass, trees, roads, pavement, etc.
Layer 3: cars, trucks, humans, etc.
Objects of the first and third layers should be removed in the pre-processing stage of scene reconstruction. The proposed scene analysis method consists of two steps: first, the system is trained on a set of labeled images to select the optimal feature subset for separating buildings from other objects; second, these features are used to remove unwanted objects from the input images. The filtering process is based on recursive segmentation of the image into rectangular blocks, which are classified according to the features selected in the training step. Blocks which cannot be classified with high probability are separated into four sub-blocks, and the process repeats. The process stops once all blocks bigger than a predefined minimum size are classified. Isolated unclassified blocks and blocks belonging to unwanted classes are filtered out. This process is illustrated in Fig. 3 and sketched in code below. More details on the proposed context analysis can be found in [13,14].
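The recursive filtering loop above can be made concrete with a short sketch. This is an illustration only: `classify_block`, the confidence threshold, and the minimum block size are hypothetical stand-ins for the trained classifier and parameters of [13,14], not the authors' implementation.

```python
# Illustrative sketch of the recursive block-filtering loop (cf. Fig. 3).
# `classify_block` stands in for the trained classifier; it is assumed to
# return a (label, probability) pair for an image region.

MIN_BLOCK = 16          # assumed minimum block size in pixels
P_CONFIDENT = 0.9       # assumed classification confidence threshold
UNWANTED = {"sky", "clouds", "car", "truck", "human"}  # layers 1 and 3

def filter_blocks(image, x, y, w, h, mask, classify_block):
    """Recursively classify blocks; zero out unwanted regions in `mask`."""
    label, prob = classify_block(image[y:y+h, x:x+w])
    if prob >= P_CONFIDENT:
        if label in UNWANTED:
            mask[y:y+h, x:x+w] = 0   # filter this region out
        return
    if w // 2 < MIN_BLOCK or h // 2 < MIN_BLOCK:
        # residual unclassified block at minimum size: drop it
        # (the paper filters only *isolated* unclassified blocks;
        # dropping all of them here keeps the sketch short)
        mask[y:y+h, x:x+w] = 0
        return
    # split into four sub-blocks and repeat (odd sizes ignored for brevity)
    hw, hh = w // 2, h // 2
    for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
        filter_blocks(image, x + dx, y + dy, hw, hh, mask, classify_block)
```

In use, the loop would start from the initial 8 × 8 grid of 64 regions shown in Fig. 3, calling `filter_blocks` once per grid cell.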
Fig. 3. Image filtering process: input image → scene condition estimation (scene condition probabilities) → segmenting the image into 64 rectangular regions → classifying each segment according to the selected scene conditions (using sub-region features and a feature database) → dividing unclassified blocks into 4 sub-blocks → removing blocks classified as unwanted → filtered image.
3. Scene reconstruction under critical cases

Critical cases in three-dimensional reconstruction often cause unstable results. This problem was addressed early on in [22,23]. There are two main critical cases: critical configurations and critical motion sequences. Typical critical configurations, e.g., short-baseline problems, are solved here to improve the stability of the structure; these are discussed as a limitation in [19]. According to the proof in [24], camera translations change drastically with respect to small baselines while rotations remain stable. Moreover, the demonstrations of the known-rotation framework [17] point out that only global rotations are needed to generate the 3D structure and translations. So, once robust camera rotations are obtained, the sparse model can also be obtained. The global rotation estimation is described next.
3.1. Global camera rotation estimation

In order to compute the global camera rotations, we first find the essential matrix, which describes the relative position and orientation of each image pair. This matrix is estimated from point correspondences, after outlier removal, together with the intrinsic camera parameters. Second, the global rotation constraints are computed from these local pair-wise constraints by a motion averaging method.

3.1.1. Feature extraction and matching
Many kinds of features have been considered in recent research on feature extraction and matching, including Harris corners [25], SURF [26], GLOH [27], etc. SIFT [15] was first presented by David G. Lowe in 1999 and fully developed for recognition problems in 2004. SIFT is highly invariant and robust for feature matching under scaling, rotation, and affine transformation. Accordingly, we utilize SIFT feature points to find corresponding points between image pairs. The SIFT algorithm consists of these main steps: scale-space extrema detection; accurate keypoint localization; orientation assignment; and keypoint descriptor computation. The resulting point correspondences are used to compute the fundamental matrix described in the next step.

3.1.2. Pair-wise camera motion constraint
The point correspondences from the previous step are used to compute the fundamental matrix: the epipolar constraint is represented by a 3 × 3 matrix called the fundamental matrix. This step is based on two-view geometry theory, which has been studied thoroughly [28]. According to this theory, once the intrinsic parameters of the cameras are known, the epipolar constraint can be represented algebraically by a 3 × 3 matrix called the essential matrix. This matrix is used to recover the rotation constraint.

3.1.3. Global camera rotation estimation
Several approaches for camera registration have been proposed. In [29], the authors used cycles in the camera graph and a Bayesian framework to detect incorrect pair-wise motions. A linear solution based on the least-squares method was presented in [30], whereas in [31] a branch-and-bound search over rotation space was used to determine the camera orientations. In this paper, we apply the robust rotation averaging method proposed in [16], whose results show that a graph-based sampling scheme following the RANSAC idea efficiently removes outliers from the individual relative motions. A short description of the method is as follows: given the relative rotations $R_{ij}$, find a robust way to compute the set of all camera rotations $R_k$ in the global coordinate frame such that

$$R_i = R_{ij} R_j. \tag{1}$$

Following the reference paper, the algorithms of this method may be summarized as below.

Algorithm 1. RANSAC algorithm for robust motion averaging
Input: $\{R_{ij_1}, R_{ij_2}, \ldots, R_{ij_n}\}$ (n relative motions); distance threshold $D_0$ and number of trials $T$
Output: $R_g = \{R_2, R_3, \ldots, R_N\}$ (N global camera motions)
– Set $G$: view-graph of relative motions
– Generate a minimum spanning tree $\mathrm{MST}_e = \mathrm{MST}(G)$
– Solve for the global motion $R_{mst}$ using $\mathrm{MST}_e$
– Count the number of relative motions within distance $D_0$ of $R_{mst}$
– Repeat for $T$ trials and select the MST with the maximal count
– Discard relative motions that are outliers for this MST
– Using the inliers, solve for $R_g$ using Algorithm 2.
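The relative motions fed into Algorithm 1 come from the pair-wise pipeline of Sections 3.1.1 and 3.1.2. A minimal OpenCV sketch of that pipeline is given below, assuming the intrinsic matrix K is known; the ratio-test threshold and RANSAC parameters are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def relative_rotation(img1, img2, K):
    """Estimate the relative rotation R_ij of an image pair from SIFT
    correspondences and the essential matrix (intrinsics K assumed known)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # match descriptors and keep matches passing Lowe's ratio test
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.8 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # essential matrix with RANSAC outlier rejection (Section 3.1.2)
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    # decompose E into a rotation and a unit-norm translation
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```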
Algorithm 2. Relative motion averaging
Input: $\{R_{ij_1}, R_{ij_2}, \ldots, R_{ij_n}\}$ (n relative motions)
Output: $R_g = \{R_2, R_3, \ldots, R_N\}$ (N global camera motions)
Set $R_g$ to an initial guess
Repeat
  $\Delta R_{ij} = R_i^{-1} R_{ij} R_j$
  $\Delta r_{ij} = \log(\Delta R_{ij})$
  $\Delta V_{ij} = \mathrm{vec}(\Delta r_{ij})$
  $\Delta\delta = D^{\dagger} \Delta V_{ij}$
  $\forall k \in [2, N]: \; R_k = R_k \exp(\Delta\delta_k)$
Until $\|\Delta\delta\| < \varepsilon$

where $D^{\dagger}$ is the pseudo-inverse of the matrix relating the global motion updates to the relative motion residuals.
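A compact numerical sketch of the Algorithm 2 iteration follows, using scipy's rotation utilities for the log and exp maps. It gauge-fixes the first camera and uses the first-order relation $\Delta v_i - \Delta v_j \approx \log(R_i^{-1} R_{ij} R_j)$; this simplified D matrix and update rule are our reading of [16], not the authors' exact formulation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def average_rotations(R_global, rel, n_iter=20, tol=1e-6):
    """One possible realization of the Algorithm 2 iteration.
    R_global: list of N 3x3 global rotations (R_global[0] held fixed).
    rel: dict mapping edge (i, j) -> relative rotation R_ij, with R_i ~ R_ij R_j.
    """
    N = len(R_global)
    for _ in range(n_iter):
        # Build the linear system D * dv = delta from all edges, using the
        # first-order model  dv_i - dv_j ~ log(R_i^T R_ij R_j).
        rows, rhs = [], []
        for (i, j), R_ij in rel.items():
            delta = Rotation.from_matrix(
                R_global[i].T @ R_ij @ R_global[j]).as_rotvec()
            row = np.zeros((3, 3 * (N - 1)))       # camera 0 is gauge-fixed
            if i > 0:
                row[:, 3*(i-1):3*i] = np.eye(3)
            if j > 0:
                row[:, 3*(j-1):3*j] = -np.eye(3)
            rows.append(row)
            rhs.append(delta)
        D = np.vstack(rows)
        b = np.concatenate(rhs)
        dv = np.linalg.pinv(D) @ b                 # D† ΔV, as in Algorithm 2
        for k in range(1, N):
            R_global[k] = R_global[k] @ Rotation.from_rotvec(
                dv[3*(k-1):3*k]).as_matrix()
        if np.linalg.norm(dv) < tol:               # ||Δδ|| < ε
            break
    return R_global
```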
3.2. Structure and motion recovery
In this section, triangulation with known-rotation consistency is recast as a quasiconvex optimization problem. Some authors have proposed methods using the $L_\infty$-norm [32] or the $L_1$-norm [33] instead of the $L_2$-norm for minimizing the residual error between measured features and the back-projections of 3D points. It is easy to see that solving the $L_2$-norm problem for more than two cameras is a hard non-convex problem which can yield local minima, whereas minimizing the error in the $L_\infty$-norm yields a single global minimum. Here, a solution similar to that of [17] is applied; however, BA should be performed afterwards to improve the final structure and motion result. The next subsection formulates and solves the triangulation problem using a bisection-based convex optimization method.

3.2.1. Problem formulation
Let $P_i$, $i = 1, 2, \ldots, m$, be the $m$ known cameras and $u_i$ the projections of a point $U$ in 3D space (both expressed in homogeneous coordinates). The problem of finding $U$ given the camera matrices and image points is triangulation. In the ideal case (absence of noise), triangulation is straightforward. In the noisy case, the back-projection of the point $U$ onto the image plane does not coincide with $u_i$. Thus, we must find the point $U$ whose projection is nearest to $u_i$, i.e., which minimizes the cost function

$$\sum_{i=1}^{m} d(u_i, P_i U)^2, \tag{2}$$

where $d(\cdot,\cdot)$ represents the geometric distance between two points in the image. Arguments presented in [17] point out that the $L_2$-norm error of this cost function in three-view triangulation creates three local minima, whereas the $L_\infty$-norm creates a single minimum. The known-rotation problem is described in detail as follows: given the camera matrices $P_i$, we solve the minimization problem

$$\min_x \max_i \; d(u_i, P_i U(x)) \quad \text{subject to } \lambda_i(x) > 0, \; i = 1, 2, \ldots, m, \tag{3}$$

where $\lambda_i(x)$ is the depth of the point in image $i$. It is easy to verify that the squared image distance is a rational function of $x$:

$$d(u, P U(x))^2 = \frac{f_1(x)^2 + f_2(x)^2}{\lambda(x)^2}, \tag{4}$$

where $f_1(x)$, $f_2(x)$ and $\lambda(x)$ are affine functions of $x$ with coefficients determined by $u$ and $P$.

Remark 1. The problem $\min_x \max_i d(u_i, P_i U(x))$ has quasiconvexity properties; thus it can be solved by quasiconvex optimization methods.

3.2.2. Bisection-based quasiconvex optimization solver
Suppose that $\gamma$ is an upper bound on the objective function in problem (3). According to the theory in [34], the problem can be reformulated as

$$\min \; \gamma \quad \text{subject to } \|(f_{1i}(x), f_{2i}(x))\| \le \gamma \lambda_i(x), \quad \lambda_i(x) > 0, \; i = 1, 2, \ldots, m. \tag{5}$$

If $\gamma$ is instead held fixed, problem (5) reduces to a second-order cone program (SOCP) feasibility problem of the form

$$\text{find } x \quad \text{subject to } \|(f_{1i}(x), f_{2i}(x))\| \le \gamma \lambda_i(x), \quad \lambda_i(x) > 0, \; i = 1, 2, \ldots, m. \tag{6}$$

Assume that the optimal value $\gamma^*$ is lower than some threshold of $\gamma_u$ pixels; then evidently $\gamma^* \in [0, \gamma_u]$. The typical bisection approach to convex feasibility problems is applied. The algorithm is presented briefly below.

Algorithm 3. Bisection-based quasiconvex optimization solver
Given: optimal value $f_0 \in [\gamma_l, \gamma_u]$ and tolerance $\varepsilon > 0$
Repeat
  1. $\gamma := (\gamma_l + \gamma_u)/2$
  2. Solve the convex feasibility problem (6)
  3. If feasible, $\gamma_u := \gamma$; else $\gamma_l := \gamma$
Until $\gamma_u - \gamma_l \le \varepsilon$
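A sketch of Algorithm 3 for the triangulation case is shown below, assuming cvxpy with an SOCP-capable solver (e.g., ECOS). The affine expressions follow Eqs. (4)–(6); the strict constraint $\lambda_i(x) > 0$ is replaced by a small margin, and the parameter values are illustrative, not the authors' settings.

```python
import numpy as np
import cvxpy as cp

def triangulate_linf(P_list, u_list, gamma_u=5.0, eps=1e-3):
    """Bisection over gamma (Algorithm 3) for L_inf triangulation.
    P_list: list of 3x4 camera matrices; u_list: list of 2D image points."""
    X = cp.Variable(3)
    Xh = cp.hstack([X, 1])                     # homogeneous point U(x)
    lo, hi = 0.0, gamma_u
    best = None
    while hi - lo > eps:
        gamma = 0.5 * (lo + hi)
        cons = []
        for P, u in zip(P_list, u_list):
            a = P[0] - u[0] * P[2]             # f_1i coefficients (affine in x)
            b = P[1] - u[1] * P[2]             # f_2i coefficients (affine in x)
            lam = P[2] @ Xh                    # depth lambda_i(x)
            cons.append(cp.norm(cp.hstack([a @ Xh, b @ Xh])) <= gamma * lam)
            cons.append(lam >= 1e-6)           # lambda_i(x) > 0 (cheirality)
        prob = cp.Problem(cp.Minimize(0), cons)   # feasibility problem (6)
        prob.solve(solver=cp.ECOS)
        if prob.status in ("optimal", "optimal_inaccurate"):
            hi, best = gamma, X.value          # feasible: shrink upper bound
        else:
            lo = gamma                         # infeasible: raise lower bound
    return best, hi
```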
4. Stereo mapping

In recent years, PMVS has been considered the state of the art for dense modeling. In this algorithm, the seed points, i.e., the initial sparse point cloud, are generated from BA, which is optimized based on the L2-norm. The problem is that the step preceding PMVS, i.e., BA, cannot find the global minimum of the back-projection error; it reaches only a local minimum.
Fig. 4. Point clouds of the “Sangjing statue”: (a) and (b) some views of the dataset and point clouds; (c) back-projection of the point cloud into the image (marked by red “o”) and original image points (marked by blue “*”). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Moreover, the BA method depends strongly on the initial constraints from the first image pair and performs incremental reconstruction. For some special critical configurations of the image data, such as pure translation or pure rotation, the initial geometry constraints may not be stable, so the final scene model may differ greatly from the ground truth. In this work, dense scene models are generated by the same process, but the seed points and camera poses are fed from the preceding global optimal computation.
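Feeding PMVS in this way amounts to exporting, for each view, the projection matrix assembled from the globally estimated rotation and translation. A minimal sketch follows, under the assumption that K, R and t are available per view; the camera text format shown (the word CONTOUR followed by the 3 × 4 matrix) matches the PMVS-2 distribution, but should be checked against the version actually used.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Assemble P = K [R | t] for one view from the globally
    optimized rotation R (3x3) and translation t (3,)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def write_pmvs_camera(path, P):
    """Write one camera file in the PMVS-2 text format:
    the word CONTOUR followed by the 3x4 projection matrix."""
    with open(path, "w") as f:
        f.write("CONTOUR\n")
        for row in P:
            f.write(" ".join(f"{v:.8g}" for v in row) + "\n")
```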
5. Experiments

In this section, simulations are presented to evaluate the effectiveness of the proposed method. The main objects are medium- and large-scale scenes in outdoor environments. The images were acquired by a digital perspective camera (Casio EX-Z90). In the first experiment, we reconstruct a medium-scale scene
which has three statues standing in front of three pillars located on a campus square. This dataset is named “Sangjing statue”, and 30 images of size 2048 × 1536 were used. For an accurate demonstration of the method, the pre-filtering operation is evaluated first. The performance of context analysis and scene object classification was tested separately. A dataset composed of 100 manually labeled images was used. Each image could contain the following classes: buildings; road; sky; trees; grass; cars; clouds; and pavement. An object was considered correctly detected if the number of non-overlapping pixels between the detected and labeled objects was less than 5% of its area. The total number of labeled objects in the dataset was 472; among them, 438 (92.7%) were correctly recognized. The robustness in critical cases and the high accuracy of the point clouds are evaluated in the second stage. Fig. 4(a) and (b) show one of the views of the dataset images as well as the point clouds of the “Sangjing” statue. It is easy to see that the back-projections of the 3D points lie quite near the original image points, as in Fig. 4(c), i.e., the
Fig. 5. Back-projection error: (a) zoomed views of the back-projection error and (b) the background rectangles are image pixels.
Fig. 6. Dense model: (a) one of 30 input images; (b) the model without pre-processing and (c) the model including context analysis in pre-processing.
Fig. 7. Dense model: (a) one of 36 input images; (b) background removal; (c) side view of scene model and (d) top view of scene model.
proposed method demonstrates high accuracy in point cloud triangulation. The zoomed view of this error can be seen clearly in Fig. 5; according to this figure, the error distance is less than one pixel. Thirdly, the effectiveness of applying context analysis to dense scene modeling is shown. Based on the optimal sparse point clouds, the dense upgrading is performed according to the PMVS algorithm (cf. Fig. 6). Here, Fig. 6(b) shows the dense upgrading without context analysis in the pre-processing step; outliers (points from the sky region) clearly appear in this model. In contrast, the result that includes pre-filtering (cf. Fig. 6(c)) shows a better model without outliers. In the second experiment, the images were taken over larger views. The numbers of images in these simulations are 36, 41, and 322, and the resulting dense models are presented in Figs. 7, 8, and 9, respectively. In Fig. 7, we focus on the center statue; the reconstructed structure is clean. In Figs. 8 and 9, the number of input
images also affects the density of the 3D patches: as the number of images increases, the structure becomes denser.
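The sub-pixel back-projection error reported for Fig. 5 corresponds to the per-view pixel residuals of the triangulated points; a minimal sketch for computing them is given below, assuming cameras, points, and measurements are stored as numpy arrays.

```python
import numpy as np

def reprojection_errors(P_list, X, u_list):
    """Pixel distances between measured points u_i and the projections
    of a triangulated 3D point X through cameras P_i (cf. Fig. 5)."""
    Xh = np.append(X, 1.0)                     # homogeneous 3D point
    errs = []
    for P, u in zip(P_list, u_list):
        x = P @ Xh
        errs.append(np.linalg.norm(x[:2] / x[2] - u))
    return np.array(errs)   # the paper reports max error below one pixel
```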
6. Conclusions

Outdoor scene reconstruction from multiple views based on context analysis and the known-rotation framework combined with the BA method was presented in this paper. Several advantages were pointed out through our experiments. Firstly, pre-processing was performed to remove unnecessary objects, which often cause outliers. Secondly, the method avoids using BA alone, which generates unstable structures in critical geometric configurations; moreover, the BA algorithm uses the L2-norm, which may become trapped in local minima. Instead, convex optimization combined with BA utilizing the known-rotation
Fig. 8. Dense model: (a) one of 41 input images and (b–d) several viewing angles of the scene model.
Fig. 9. Dense model: (a) one of 322 input images and (b–d) several viewing angles of the scene model.
framework creates a robust initial guess for the structure generation process. Thirdly, in the global camera rotation estimation, the graph-based sampling scheme following RANSAC yields robust estimation results. Our future work focuses on comparing this method with the L1-norm approach, and we expect to extend the method to omni-directional cameras in outdoor scenes. The long-term target of this work is the construction of a real-time scene understanding and visual SLAM system.
Acknowledgment

This work was supported by the 2013 Research Fund of the University of Ulsan.
References

[1] B. Triggs, P.F. McLauchlan, R. Hartley, A. Fitzgibbon, Bundle adjustment – a modern synthesis, in: Vision Algorithms: Theory and Practice, 2000.
[2] C. Tomasi, T. Kanade, Shape and motion from image streams under orthography: a factorization method, in: Proceedings of the European Conference on Computer Vision, 1992.
[3] P.F. Sturm, B. Triggs, A factorization based algorithm for multi-image projective structure and motion, in: Proceedings of the 4th European Conference on Computer Vision – Volume II, Springer-Verlag, 1996, pp. 709–720.
[4] J.-P. Tardif, A. Bartoli, M. Trudeau, N. Guilbert, S. Roy, Algorithms for batch matrix factorization with application to structure from motion, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 2007.
[5] D. Nister, Reconstruction from uncalibrated sequences with a hierarchy of trifocal tensors, in: Proceedings of the European Conference on Computer Vision, 2000.
[6] R. Gherardi, M. Farenzena, A. Fusiello, Improving the efficiency of hierarchical structure and motion, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 2010.
[7] T. Thormaehlen, H. Broszio, A. Weissenfeld, Keyframe selection for camera motion and structure estimation from multiple views, in: Proceedings of the European Conference on Computer Vision, 2004.
[8] P. Torr, A. Fitzgibbon, A. Zisserman, The problem of degeneracy in structure and motion recovery from uncalibrated image sequences, Int. J. Comput. Vision 32 (1) (1999) 27–44.
[9] K. Cornelis, F. Verbiest, L. Van Gool, Drift detection and removal for sequential structure from motion algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 26 (10) (2004) 1249–1259.
[10] D.S. Huang, Radial basis probabilistic neural networks: model and application, Int. J. Pattern Recognit. Artif. Intell. 13 (7) (1999) 1083–1101.
[11] D.S. Huang, J.-X. Du, A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks, IEEE Trans. Neural Netw. 19 (12) (2008) 2099–2115.
[12] D.S. Huang, W.-B. Zhao, Determining the centers of radial basis probabilistic neural networks by recursive orthogonal least square algorithms, Appl. Math. Comput. 162 (1) (2005) 461–473.
[13] A. Vavilin, M.-H. Le, K.-H. Jo, Optimal feature subset selection for urban scenes understanding, in: Proceedings of URAI, 2010.
[14] A. Vavilin, K.-H. Jo, M.-H. Jeong, J.-E. Ha, D.J. Kang, Automatic context analysis for image classification and retrieval, Lect. Notes Comput. Sci. 6838 (2011) 377–382.
[15] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2004) 91–110.
[16] V. Govindu, Robustness in motion averaging, in: Proceedings of the Asian Conference on Computer Vision, 2006.
[17] R. Hartley, F. Kahl, Multiple view geometry under the L∞-norm, IEEE Trans. Pattern Anal. Mach. Intell. 30 (9) (2008) 1603–1617.
[18] C. Olsson, A. Eriksson, R. Hartley, Outlier removal using duality, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[19] Y. Furukawa, J. Ponce, Accurate, dense, and robust multi-view stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. 32 (8) (2010) 1362–1376.
[20] Z.-Q. Zhao, D.S. Huang, B.Y. Sun, Human face recognition based on multiple features using neural networks committee, Pattern Recognit. Lett. 25 (12) (2004) 1351–1358.
[21] X.-F. Wang, D.S. Huang, H. Xu, An efficient local Chan–Vese model for image segmentation, Pattern Recognit. 43 (3) (2010) 603–618.
[22] P. Sturm, Critical motion sequences for monocular self-calibration and uncalibrated Euclidean reconstruction, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 1997.
[23] F. Kahl, R. Hartley, K. Åström, Critical configurations for n-view projective reconstruction, Comput. Vision Pattern Recognit. 2 (2001) 158–163.
[24] O. Enqvist, F. Kahl, C. Olsson, Non-sequential structure from motion, in: Proceedings of the Workshop on Omnidirectional Vision, Camera Networks and Non-Classical Cameras, 2011.
[25] C. Harris, M. Stephens, A combined corner and edge detector, in: Proceedings of the Alvey Vision Conference, 1988, pp. 147–151.
[26] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, Eur. Conf. Comput. Vision 3951 (2006) 404–417.
[27] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005) 1615–1630.
[28] R.I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, second ed., Cambridge University Press, 2004.
[29] C. Zach, M. Klopschitz, M. Pollefeys, Disambiguating visual relations using loop constraints, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 2010.
[30] D. Martinec, T. Pajdla, Robust rotation and translation estimation in multiview reconstruction, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[31] F. Kahl, R. Hartley, Global optimization through rotation space search, Int. J. Comput. Vision 82 (2009) 64–79.
[32] K. Sim, R. Hartley, Removing outliers using the L∞-norm, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, 2006, pp. 485–492.
[33] A. Dalalyan, R. Keriven, L1-penalized robust estimation for a class of inverse problems arising in multiview geometry, in: Advances in Neural Information Processing Systems, 2009.
[34] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, USA, 2004.
My-Ha Le received his B.E. and M.E. degrees from the Department of Electrical and Electronic Engineering of Ho Chi Minh University of Technology, Viet Nam, in 2005 and 2008, respectively. Since 2007, he has been serving as a faculty member in the Department of Electrical and Electronic Engineering, Ho Chi Minh University of Technology and Education, Ho Chi Minh City, Viet Nam. He is currently a Ph.D. candidate at the Graduate School of Electrical Engineering and Information Systems, University of Ulsan, Ulsan, Korea. His research interests include 3D computer vision, pattern recognition, and vision based robotics.
Andrey Vavilin obtained his bachelor's degree in Applied Mathematics from Novosibirsk State Technical University (Russia) in 2004. He received his M.S. and Ph.D. degrees in Electrical Engineering from the University of Ulsan (Korea) in 2007 and 2011, respectively. Since 2011 he has been with the Intelligent Systems Laboratory at the University of Ulsan as a Post-Doctoral Fellow. His research interests include computer vision, pattern recognition, 3D scene analysis, intelligent transportation systems and super resolution.
Kang-Hyun Jo received his B.E. degree in Mechanical and Precision Engineering from Busan National University, Korea, and his M.E. and Ph.D. degrees in Computer Controlled Machinery Engineering from Osaka University, Japan, in 1989, 1993, and 1997, respectively. He is currently a Professor at the Faculty of Electrical Engineering and Information Systems, University of Ulsan, Korea. His research interests include computer vision, human–computer interaction, robot applications in town and health care, and intensive intelligent systems. He actively participates in various professional research societies, including IEEE, IEEK, ICROS, KRS, KIPS, KIISE and KSAE.