Crowdsourcing for Cropping

Sally Ahn
Computer Science Department, UC Berkeley
[email protected]

Abstract
The composition of a photograph plays a vital role in determining its aesthetic quality, and cropping is a relatively simple way to improve a photograph's composition. In contrast to the simplicity of this task, however, the underlying theory of visual aesthetics is complex and fuzzy. What causes humans to prefer one composition over another remains an open question in psychology.

Introduction
The composition of a photograph plays a vital role in determining its aesthetic quality, and one of the simplest and most effective ways of improving a photograph is cropping. Although cropping is a relatively simple task, the underlying theory of visual aesthetics is complex and fuzzy. It is easy to chop off some edges of a photograph, but it takes considerable skill and practice to know exactly where and how much to cut to yield a more pleasing photograph. Meanwhile, with the growing use of mobile devices, the number of digital photographs, many of them amateur photographs, is growing faster than ever. Thus, the motivation is high for an automated system that can efficiently improve the quality of these photographs through simple means like cropping. Although there has been much previous work on automated cropping algorithms, very few base their approach on human psychology. What causes humans to prefer one composition over another is still an open question in psychology, and recent studies on spatial aesthetics suggest that general principles like the "Golden Rule of Thirds" are unfounded and fail to describe the outcomes of experiments with real humans [ref]. Thus, we are interested in the following question: can we obtain better measures for evaluating the composition of a photograph by analyzing real humans' cropping patterns? The emergence of a marketplace in crowdsourcing provides us with a platform on which we can gather a large

amount of data quickly and cheaply. This enables us to let real people show us what they consider a "good" crop. This differs significantly from presenting an automatically generated crop to humans for approval, a common approach to evaluating prior work on automated cropping. One of the great challenges for automated cropping algorithms is their failure to detect significant semantics within a photograph that a human would instantly recognize. For example, a dead soldier on a barren landscape carries significant meaning within a photograph, but to a computer vision program, a horizontal gray figure may easily be overlooked as the main subject. The purpose of this paper is to describe our approach for gathering data to help us formulate a more robust theory of composition aesthetics.

Related Work

Figure 1. These are sample photos from our experiment. (a) is a landscape photograph without a single subject. The subject of (b) is small and dark, and could easily be missed by computer vision algorithms.

There has been much previous work on automated cropping algorithms. Suh et al. use low-level saliency detection to automatically create thumbnail crops [ref]. Such methods evaluate regions of a photograph based on low-level features such as brightness and color. As we mentioned earlier, such features often miss regions of the photograph that are of semantic importance. In a different

approach, Santella et al. find "regions of interest" in a photograph by tracking users' gaze over that photograph [ref]. This overcomes the loss of semantic information, but it places a burden on the user by requiring gaze input. Luo presents a method for detecting the main subject, which is used to build a belief map of the photograph's content, and then finds the optimal window for that subject [ref]. However, this method requires a distinct subject, and many photographs, such as landscapes, lack a single subject; in such cases, the photograph as a whole is the subject [Figure 1(a)].

Method
This section describes the design of our experiment. We use Mechanical Turk to post our cropping tasks. For the cropping interface, we use JCrop [ref], a simple, effective, and free web-based tool that allows users to freely reposition and resize their crop before submitting their final decision. After the cropper has submitted a crop, we also present a short survey to gather demographic information: age, gender, photography experience, and art experience (for the last two, we provide a 1-5 scale, with 1 meaning "novice" and 5 meaning "expert"). Such information about the croppers provides additional data that we can analyze for patterns correlated with their cropping decisions.

Crowdsourcing Task Design
Although the crowdsourcing platform provides the advantages of relatively fast and cheap recruitment, the lack of control over our croppers in a marketplace environment raises the concern of "garbage" data. There have been many papers and studies on methods for gathering reliable data from Mechanical Turk [refs]. We adopt the "Fix-Verify" idea of Soylent's Find-Fix-Verify paradigm; that is, after a crop is submitted, a HIT is created that asks a new worker (who cannot be the same as the cropper) to vote between the original and the cropped version. This helps us eliminate nonsensical submissions from lazy workers. For higher quality, we also follow [ref]'s majority-vote paradigm by requiring at least two matching votes before declaring a crop approved or rejected. Furthermore, we ask the voters the same demographic survey questions that we ask the croppers.

We varied the photographs we asked the workers to crop. Specifically, each photograph was tagged with its number of subjects and with whether it belonged to one of the following categories: artistic, documentary, commercial, or landscape. We chose these categories because they provide a wide variety of photographs with high semantic content. TODO: describe the data in detail (how many photographs total per category; how many results from each, etc.).

Figure 2. Amazon Mechanical Turk is the crowdsourcing platform we use to recruit our croppers. This figure shows the JCrop cropping interface and the survey question that follows.

Evaluating Data
Crowdsourcing enabled us to gather a massive amount of data, but the challenge remained to interpret the hundreds[?] of crops in a way that reveals general aesthetic preferences. We therefore employ "pixel voting" on each distinct photograph to aggregate its crops. Each original photograph of width w and height h is mapped to a w × h score matrix whose elements are scalar values s_i, one per pixel p_i in the original photograph: for each crop, we increment s_i if p_i lies inside the crop and decrement s_i if it lies outside.

Results

TODO: This section will describe the results of the aggregation algorithm described above. What do the visualizations reveal about the crops? Common aspect ratios? Patterns across demographic information? Patterns across photograph categories?
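As a concrete illustration, the two quality-control and aggregation steps described in this paper, the majority-vote verification and the pixel-voting score matrix, can be sketched in Python. The function names and data layout (crops as (x, y, width, height) rectangles, as a tool like JCrop might report) are our own assumptions for illustration, not the deployed system:

```python
import numpy as np

def crop_decision(votes):
    """Majority-vote verification: approve or reject a crop once at
    least two voters agree; return None while more votes are needed.
    `votes` is a list of booleans (True = voter preferred the crop).
    Hypothetical data layout, for illustration only."""
    if votes.count(True) >= 2:
        return "approved"
    if votes.count(False) >= 2:
        return "rejected"
    return None

def pixel_vote(width, height, crops):
    """Pixel-voting aggregation: map the crops of one photograph onto
    a height-by-width score matrix. Each crop is assumed to be an
    axis-aligned rectangle (x, y, w, h) in pixel coordinates. For
    every crop, pixels inside gain +1 and pixels outside lose 1."""
    scores = np.zeros((height, width), dtype=int)
    for (x, y, w, h) in crops:
        scores -= 1                     # decrement every pixel for this crop ...
        scores[y:y + h, x:x + w] += 2   # ... then net +1 for pixels inside it
    return scores

# Example: two workers crop the same 4x4 photograph.
s = pixel_vote(4, 4, [(1, 1, 2, 2), (0, 0, 3, 3)])
# Pixels covered by both crops score highest; pixels covered by
# neither score lowest.
```

Regions with the highest scores are those most workers chose to keep, which is what makes the aggregated matrix a candidate measure of compositional preference.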

References

Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R. C., Miller, R., Tatarowicz, A., White, B., White, S., and Yeh, T. VizWiz: Nearly real-time answers to visual questions. UIST (2010).

Liu, L., Chen, R., Wolf, L., and Cohen-Or, D. Optimizing photo composition. Computer Graphics Forum (Proceedings of Eurographics) 29, 2 (2010), 469–478.

Luo, J. Subject content-based intelligent cropping of digital photos. Proc. ICME (2007), 2218–2221.

Nishiyama, M., Okabe, T., Sato, Y., and Sato, I. Sensation-based photo cropping. Proc. ACM International Conference on Multimedia (2009).

Palmer, S., and Gardner, J. Aesthetic issues in spatial composition: Effects of position and direction on framing single objects. Spatial Vision (2008).

Quinn, A. J., and Bederson, B. B. Human computation: A survey and taxonomy of a growing field. CHI (2011).

Santella, A., Agrawala, M., DeCarlo, D., Salesin, D., and Cohen, M. Gaze-based interaction for semi-automatic photo cropping. Proc. SIGCHI Conference on Human Factors in Computing Systems (2006), 771–780.

Suh, B., Ling, H., Bederson, B., and Jacobs, D. Automatic thumbnail cropping and its effectiveness. Proc. ACM UIST (2003), 95–104.

von Ahn, L. Human Computation. PhD thesis, Carnegie Mellon University (2005).

Zhang, M., Zhang, L., Sun, Y., Feng, L., and Ma, W. Auto cropping for digital photographs. IEEE Conference on Multimedia and Expo (2005).

Figure 3. One of our original photographs (top) and a cropped photograph from a Mechanical Turk worker (bottom).