
SAP Solutions for Analytics

Big Data Analytics Guide

Better technology, more insight for the next generation of business applications

Big Data Analytics Guide 2012


Big Data Analytics Guide: 2012 Published by SAP © 2012 SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company. Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase Inc. Sybase is an SAP company. Crossgate, m@gic EDDY, B2B 360°, and B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
Library of Congress Cataloging-in-Publication Data SAP Big Data Analytics Guide 2012: How to prosper amid big data, market volatility and changing regulations / Edited by Don Marzetta. p. cm. ISBN 978-0-9851539-6-0 1. Big data. 2. Analytics. 3. Databases


Welcome to Big Data Analytics Guide 2012

Big Data Means Big Business
By Steve Lucas, Executive Vice President and General Manager, Database and Technology, SAP

With more and more people spending much of their existence in the digital world—whether it’s for work, play, learning, or socializing—the amount of data being generated is truly astounding. Just think about the number of SMS messages and emails sent, phone calls placed, and Facebook updates made every minute: it boggles the mind how much data is traversing networks around the world.

At SAP, we believe this confluence of events is a golden opportunity for enterprises to rethink how they do business, and our goal is to help them do their jobs better than ever. As co-chair of the Big Data commission sponsored by TechAmerica, I’m hearing firsthand how government and private enterprise are evaluating how to embrace a Big Data world. But this requires organizations to take a new approach to data. Innovations in in-memory computing are turning the whole idea of data management on its head by allowing enterprises to get rid of the complexity that has been encroaching on their systems. That’s why we’ve built the real-time data platform with SAP HANA at its core, giving enterprises the foundation they need to embrace Big Data.

But we recognize a platform is not enough. Together, we need to dream up new ways to do business by leveraging Big Data insights. And at SAP we’re serious about making this a reality: by investing in new businesses through the $155 million venture start-up fund dubbed the ‘SAP HANA Real-Time Fund,’ through the HANA Start-up program, through innovation labs around the world, and most importantly by co-innovating with our customers.

To further our path toward better, smarter ways of working with data, we’ve put together a series of articles in the Big Data Analytics Guide. Within these pages you’ll find real solutions, real ways of operating, real results, and perhaps most importantly real technology that can be used in your business today.

The guide outlines the opportunity and business case for Big Data in the first chapter; subsequent chapters look at SAP technology innovations, real-world examples, and insights from analytics leaders on the forefront of the Big Data market. In the last chapter, you’ll find a set of interesting market research statistics that highlight how C-level executives are using Big Data now and their plans for using it to their advantage in the future. So join in the conversation and co-innovate with us to re-invent business.


Table of Contents

03: Welcome to Big Data Analytics Guide 2012
Big Data Means Big Business
By Steve Lucas, Executive Vice President and General Manager, Database and Technology, SAP

Big Data Opportunity
06: Measuring the Value and Potential Yield of Big Data Projects
By Dan Lahl, Director of Analytics, SAP
08: The Numbers are In: Early Stage ROI and Proof of Concept
By David Jonker, Product Marketing Director, SAP
10: Analytics in the Cloud: Traversing a Legal Minefield
By Dr. Brian Bandey, Principal, Patronus
13: Big Data Analytics Earns High Scores in the Field
16: Big Data Is Only a Small Part of the Opportunity
By Mike Upchurch, Chief Operating Officer, Fuzzy Logix

Business Analytics Roadmap
18: Business Value through Operational BI
By Claudia Imhoff, President of Intelligent Solutions, Inc. and Founder of the Boulder BI Brain Trust
21: Real-time Data Platform for a Real-time World
By Amit Sinha, Head, Database and Technology Innovation, SAP
23: How HANA Changes the Database Market
By Ken Tsai, Vice President HANA Solution Marketing, SAP
25: DBTA: Data Marts Can’t Dance to Data’s New Groove
By John Schitka, Senior Product Marketing Manager, Sybase IQ
28: In-Database Analytics: Reducing Travel Time
By Courtney Claussen, Sybase IQ Product Manager, SAP

Analytics Advantage
30: Data Variety Is the Spice of Analytics
By Amr Awadallah, CTO, Cloudera
33: Text Analytics for Speed Reading—Do You Mean What You Say?
By Seth Grimes, Strategy Consultant and Industry Analyst, Alta Plana
36: Image Recognition, Pattern Identification, and the New Memory Game
By Joydeep Das, Director, Data Warehousing and Analytics Product Management, SAP
38: Technology Alone is Not the Answer
By Byron Banks, Vice President of Business Analytics Marketing, SAP

Analytics Innovations
40: What’s All the Hadoop-la About?
By Wayne Eckerson, Principal, BI Leader Consulting
43: Fast Flowing Decisions Through Streams of Data
By Irfan Khan, Senior Vice President and Chief Technology Officer, SAP Database and Technology
45: Age of Influence: Making the Most of Social Networks
By Bruno Delahaye, Senior Vice President Worldwide Business Development, KXEN
48: Embracing a Standard for Predictive Analytics
By Michael Zeller, Ph.D., CEO, Zementis
51: How Modern Analytics “R” Done
By Jeff Erhardt, Chief Operations Officer, Revolution Analytics
54: Navigating a 4G World
By Greg Dunn, Vice President, Sybase 365
57: Increasing the IQ of Everyone with Analytics
By Jürgen Hirsch, CEO, Qyte GmbH

Market Data
60: The Big Deal with Big Data
By IDC

68: Company Index

Big Data Opportunity

Measuring the Value and Potential Yield of Big Data Projects

Where should companies look for the return on investment that determines whether, or how much, Big Data projects pay off?

By Dan Lahl, Director of Analytics, SAP

Data analytics has traditionally been expensive and inefficient, but new analytical platforms optimized for Big Data are heralding a brave new world. Hadoop, an open-source Apache project, and Not Only SQL (NoSQL) databases don’t require the significant upfront license costs of traditional systems, and that is making setting up an analytics platform—and seeing a return on the investment (ROI)—more accessible than ever before. Costs are coming down, but this is no free lunch. Crunching Big Data analytics still requires hardware, database administrators, developers to build the models, business intelligence tools, and training for the people who will use it to make decisions.

Where to Look for ROI

Companies can look for ROI in three ways: doing what they’re already doing better, doing more of what they’re already doing, and doing things they’ve never thought about before. Big Data provides new solutions for current problems, converts incomprehensible data into actionable business recommendations, and makes previously impossible business models possible.

The first, and often most compelling, benefit of upgrading to a Big Data platform is the gain in speed and cost savings. The new systems allow organizations to do what they are already doing faster, better, and cheaper. What used to take hours or days suddenly takes only minutes. As data volume, velocity, and variety have grown, legacy data warehouse systems have bogged down—unable to handle bigger data, more users, and increasingly complex queries. Results take longer. Users get frustrated and stop using the data because it takes too long. Many data warehouses are in this predicament today, having hit a performance and scalability wall. Big Data technologies break through these roadblocks, delivering faster performance and higher availability.

The second benefit is the ability to do more. Not only will a Big Data system not bog down at current usage rates, but it can also handle more users, more data, and more complex queries. For example, instead of storing a year’s worth of loan default records, a lender can store 30 years’ worth and perform more detailed analyses, with more accurate results. Additionally, a Big Data system can handle a mixed workload. Instead of processing a few queries from a handful of power users, it can handle many short queries from a large front-line service team, such as a customer lookup that identifies cross-sell and up-sell opportunities in real time.

The third piece of Big Data ROI is that it opens up new opportunities. Once the data has been liberated and employees can get at it, they find all kinds of new ways to use it. For example, if a utility company had previously outsourced the analytics from its self-service Web pages and analyzed that data separately from its other customer service channels, Big Data technology can bring all the analysis together, in-house. That opens the possibility of tracking customer behavior in multiple channels at the same time. This kind of 360-degree visibility provides a more accurate picture of the customer experience and customer satisfaction, providing new and deeper insight into any business.

Show Me the Metrics

When establishing metrics for any new analytics project, focus on two areas: employee usage of the resulting intelligence, and key performance indicators for the processes where analytics will be used.

First, create metrics and track data for the project itself. Think of employees as the data team’s customers, and establish measures that show the total amount of information consumed in the organization: who’s using the data, how much are they using it, and what are they using it for? The higher the usage rates, the better.

Second, wherever data becomes part of a business process, such as in customer support or sales, companies can measure customer satisfaction, sales figures, and other established metrics. Comparing numbers before and after the Big Data implementation should clearly show what the organization has gained as a result—as well as other opportunities to collect data and put it to work. Used this way, analytics often delivers a huge ROI, helping a company identify problems early on (or before they happen) and take steps to fix (or prevent) them. Predictive analytics can recognize customer care or sales opportunities at the right moment, offering discounts or related products that increase satisfaction and boost sales.

Success for the Long Term

Once an enterprise decides that its analytics project has legs, there are some well-established tactics to help assure its long-term success. Foremost is communication. Keep everyone in the loop from strategy to deployment—and beyond. Key individuals should never be blindsided by hiccups or new developments in the process.

New technology built to handle today’s deluge of data is bringing down the cost of analytics, delivering better performance, and helping companies put data to work in new ways. Measuring the ROI of a Big Data project can be accomplished by establishing company metrics that highlight analytics usage, encouraging data teams and employees to define actionable reports, and incorporating analytics into more decision making throughout the enterprise.
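The usage-side metrics described above (who is using the data, how much, and for what) can be computed from a simple query log. Here is a minimal sketch in Python with SQLite, assuming a hypothetical query_log audit table; real BI platforms expose similar audit views, so the table name and columns are illustrative only:

```python
import sqlite3

# Hypothetical audit table: one row per report execution.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query_log (user TEXT, department TEXT, report TEXT, ran_at TEXT)")
conn.executemany(
    "INSERT INTO query_log VALUES (?, ?, ?, ?)",
    [
        ("alice", "sales",   "cross_sell",  "2012-05-01"),
        ("alice", "sales",   "cross_sell",  "2012-05-02"),
        ("bob",   "support", "case_volume", "2012-05-01"),
    ],
)

# Who is using the data, how much are they using it, and what for?
usage = conn.execute(
    """SELECT department, user, COUNT(*) AS runs,
              GROUP_CONCAT(DISTINCT report) AS reports
       FROM query_log
       GROUP BY department, user
       ORDER BY runs DESC"""
).fetchall()

for row in usage:
    print(row)
```

Tracking this aggregate over time gives the "total information consumed" trend the article recommends; a declining run count is an early warning that users have stopped trusting or waiting for the data.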

Dan Lahl has been in high tech for almost 30 years. In addition to bringing to market SAP Sybase SQL Server, SAP Sybase ASE, and SAP Sybase IQ, Lahl has evaluated multiple emerging technology areas leading to EII, ETL, and GRID technology purchases for the company.


ENTERPRISES AND GOVERNMENTS ARE SEEING SIZABLE RETURNS ON THEIR INVESTMENTS IN BIG DATA ANALYTICS PROJECTS.

The Numbers are In: Early Stage ROI and Proof of Concept
By David Jonker, Product Marketing Director, SAP

As new technology helps organizations put today’s massive data sets to use, real-world examples and research are proving that analytics helps cut costs, increase revenue, eliminate waste, and otherwise boost the bottom line.

Real Returns

Recent reports confirm that organizations around the world are finding an edge by incorporating advanced analytics into business processes, leading to informed decisions around customer satisfaction, new service success rates, and competitive analysis (see Table 1).

American Airlines has saved roughly $1 million annually in fraud detection, helping the company identify forms of fraud it never knew existed and eliminating the loopholes that criminals were exploiting.

The Sao Paulo State Treasury Department has so far identified $100 million in untaxed earnings, which allowed tax inspectors to adopt a proactive approach to tax evaders, investigating telltale behavior patterns in its data and taking corrective measures early on, rather than punitive ones after the fact.

AOK Hessen, a public health insurance organization in Germany, uses pattern analysis to detect fraudulent invoicing, and it has reclaimed US $3.2 million of unjustified charges.

Lawson HMV Entertainment, a leading retailer in Japan of DVDs, Blu-ray discs, books, and games, integrates in-store and Website data into one massive marketing database that the company uses to fuel its targeted email campaigns to customers. The result has been 3 to 15 times higher purchase rates and double-digit revenue growth.

Table 1. Big Data Analytics ROI

Who                          What
American Airlines            $1 million annual fraud detection
State of Sao Paulo, Brazil   $100 million untaxed earnings
Cell C                       $20 million saved on one project
AOK Hessen                   $3.2 million fraud detection
HMV Japan                    3 to 15 times higher purchase rates

Research Backs Up the Numbers

After studying 179 large public companies that use what they call “data-driven decision making,” authors from MIT and the Wharton School concluded that organizations using analytics in their decision-making processes derived 5 to 6% better “output and productivity” than if they had not used analytics.

Researchers at the University of Texas studied how analytics affected the finances, customer activities, and operations of 150 Fortune 1000 firms. According to the research, the product development area alone justifies deploying analytics for a typical Fortune 1000 enterprise. The study states, “Revenue due to a company’s ability to innovate new products and services increases with data accessibility and specialty products and services, which, in turn, is positively affected by data intelligence.” How positive is that impact? According to the analysis, a $16.8 billion company might see an extra $64 million in top-line revenue over five years if analytics are put into the hands of “more authorized employees” who “can make use of information…to better spot trends, demand patterns, improve recommendations for decision making and profile match.” All of those benefits could contribute to new-product revenue. Such firms could also add $14 million in new customer sales annually.

The University of Texas report also discovered that the comprehensive use of analytics inside a company improved results in the operational areas of asset utilization, forecasting and planning, and on-time delivery of products or services. For example, wider use of analytics can lead to an 18.5% improvement in planning and forecasting for a typical firm studied.

The sources of these significant ROI metrics vary by company and industry. To examine an example in depth, consider how using analytics for real-time applications has impacted the telecommunications market (see Table 2). You can see how a modern analytics environment leaves legacy decision-support systems in the proverbial dust.

Table 2. Legacy vs. Next-Generation Real-Time Analytics in Telecommunications

Metric                        Legacy Analytics Infrastructure   Next-Generation Real-Time Analytics Infrastructure
Storage cost                  High                              Low
Analytics                     Offline                           Real time
Data loading speed            Low                               High
Data loading time             Long                              Average 50% faster
Administration time           Long                              Average 60% faster
Complex query response time   Hours/days                        Minutes
Data compression technique    Not mature                        Average 40 to 50% more data compression
Support cost                  High                              Low

David Jonker is focused on market strategy for the SAP Data Management and Analytics product lines, including SAP Sybase IQ, ASE, Replication Server, and SQL Anywhere. Jonker’s career includes more than 10 years in software engineering and product management roles before leading the SAP product marketing teams for data management and analytics.


TO AVOID LEGAL LIABILITY, ORGANIZATIONS THAT WANT TO REAP THE BENEFITS OF CLOUD-BASED BIG DATA ANALYTICS MUST CAREFULLY VET PARTNER TECHNOLOGY.

Analytics in the Cloud: Traversing a Legal Minefield
By Dr. Brian Bandey, Doctor of Law

When a corporation mines the Big Data within its IT infrastructure, a number of laws will automatically be in play. However, if that corporation wants to analyze the same Big Data in the cloud, a new tier of legal obligations and restrictions arises, some of them quite foreign to a management previously accustomed to dealing with its own data within its own infrastructure.

A corporation holding Big Data will possess different types of data, which the law will automatically classify and to which it will attach law-based obligations. Some of that data may not be owned by the corporation. It may be a third party’s data which it holds pursuant to a Confidentiality Agreement. Such agreements may not only produce obligations that go to nondisclosure, but may also restrict the uses to which the data can be put and define what level of security is to be employed.

Other data might be owned by the corporation, but identify living individuals (whether directly or indirectly). Data Protection Law (as it’s generally known) is concerned with the access, use, and movement of Personal Identifying Information (PII), and with the technological safeguards that prevent its disclosure.

A corporation will also own secrets about itself which, if disclosed, might cause irreparable damage. Officers owe stakeholders a legally binding ‘duty of care’ to take all reasonable precautions to ensure the security of such information.

Due to restrictions on processing, both from Data Protection and Confidentiality Laws, care will need to be taken when building the data warehouse to be analyzed. Certain classes of data may need to be excluded. All of these different types of law intersect over the area of Big Data storage, security, and processing. They produce a matrix of law-based obligations which, in many areas, cannot be delegated or avoided—only met.

Often law-based security obligations cannot be delegated to the cloud services provider. Legal responsibility may remain with the data controller.

Security

But what happens when we translate that matrix into the cloud? The first matter is that of security. Breaches occasioning the loss of data can cause an abundance of law-based difficulties: breach of contract, fines under Data Protection Law, uncapped damages due to the release of third-party secrets, and so on. But why is this the “first matter”? The corporation cedes actual security to its cloud services provider. Instead of the corporation implementing its own security directly, that role is handed to the cloud services provider.

A great deal is said about service level agreements (SLAs) on this subject—their utility and importance. Frankly, I don’t see it that way. What remedies are available to the corporation under an SLA other than contractual remedies? Usually none. In my opinion, it is highly likely that money damages will not put the corporation back in the position it would have been in but for the security or contractual breach. No. What is needed is the choice of a correct cloud security architecture of sufficient robustness.

One may ask why that is a legal topic. Surely it is strictly an IT matter? I take the view that one must look to the propensity of the cloud technology itself to cause the corporation legal exposure. The Duty of Care owed by officers to their stakeholders, the corporation’s duty to those persons whose PII it holds, and the contractual obligations it owes with respect to third-party confidential information all compel the corporation to exercise expertise, care, and prudence in the selection of a technologically secure cloud computing environment. This means that they must look beyond the cloud services provider per se, and discharge the Duty of Care through due diligence on the technology underpinning it. How capable is the architecture of securing the data? Is the architecture built to be secure and resistant to the correct range of security threats? How robust and secure is it against measurable benchmarks?

Secondly, there are significant technological differences between a cloud computing environment and a corporation’s ‘owned’ infrastructure. I am referring especially to the integrity of multi-tenancy architectures. A leaky multi-tenancy system creates a significant probability that the corporation will be in breach of its obligations to many prospective litigants. Thus real attention will need to be given to the architecture that isolates one ‘data set’ from another and keeps it isolated.


These are not matters of academic technical interest, but go to the ability of the corporation to discharge what are often non-delegable, unavoidable legal duties.

Personal Identifying Information

Moving on from security, there is the matter generally referred to as the trans-border movement of PII. Many countries either restrict or prohibit the exporting of PII. To do so can even be a corporate crime, certainly exposing the wrongful exporter to the likelihood of a hefty fine, adverse publicity, and reputational loss. Thus the problem for our conceptual corporation is the nature of cloud computing itself: the advantages of scalability, flexibility, and economies of scale are accessed through the technological advantage of distributing data across a number of servers, which may not all be in the same country. Thus PII may be automatically exported illegally.

There are two avenues open to the corporation to obviate this ‘unlawfulness.’ The first is to choose a Big Data warehousing and analytics architecture which, with certainty, can confine data storage and processing to servers residing in nominated legal jurisdictions. The cloud computing architecture must be able to identify what data is in which jurisdiction and, if necessary, keep it there.

The second is to transform the PII so that it no longer constitutes, in law, PII. Data which is not PII cannot be subject to data protection law. For some time now, medical researchers have shared patient information internationally through a process of either anonymization or pseudonymization. Anonymization is a process whereby the identifier sub-data is removed prior to export, thus enabling any type of processing, anywhere. The data needs to be of a configuration that can still be effectively processed in the absence of identifier sub-data. Where the presence of a form of identifier sub-data is required for processing (or analysis), pseudonymization is used. The aim of these two forms of de-identification is to obscure the identifier sub-data items within the patient records sufficiently that the risk of identifying the subject of a patient record is minimized to acceptable and permissible legal levels. Although the risk of identification cannot be fully removed, it can often be minimized so as to fall below the defining threshold.

Analytics

There is no reason, in law, why Big Data analytics cannot be performed lawfully in the cloud. However, in order to do so, significant attention needs to be directed to the actual software and hardware architectures to be employed, and those must be matched to the matrix of laws which operate over the storage, use, processing, and movement of data. It may seem strange that I am advocating an almost technology-centric solution to what is clearly (and perhaps solely) a law-based problem. But as I said before, money damages in these scenarios will never, in my opinion, be sufficient compensation for the owners of Big Data. Rather, the requirements of the law need to be soundly and accurately matched and, indeed, mapped onto the cloud computing technology at hand. Only then can the minefield of Big Data analytics in the cloud be successfully traversed without an explosion.
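The two forms of de-identification described above can be sketched in a few lines of Python. This is an illustrative sketch only, with hypothetical field names: whether the result legally ceases to be PII depends on the jurisdiction's threshold, not on the code.

```python
import hashlib
import hmac

# Held by the data controller and never exported with the data;
# possession of the key is what makes re-identification possible.
SECRET_KEY = b"keep-this-out-of-the-exported-data"

def pseudonymize(record: dict, identifiers=("name", "patient_id")) -> dict:
    """Replace identifier sub-data with a keyed hash (pseudonymization).

    The same input always maps to the same token, so records remain
    linkable for analysis, but reversing the token requires the key.
    """
    out = dict(record)
    for field in identifiers:
        if field in out:
            token = hmac.new(SECRET_KEY, str(out[field]).encode(), hashlib.sha256)
            out[field] = token.hexdigest()[:16]
    return out

def anonymize(record: dict, identifiers=("name", "patient_id")) -> dict:
    """Remove identifier sub-data entirely (anonymization)."""
    return {k: v for k, v in record.items() if k not in identifiers}

record = {"patient_id": "P-1001", "name": "J. Doe", "diagnosis": "J45", "age": 54}
print(anonymize(record))     # identifiers dropped; no linkage possible
print(pseudonymize(record))  # identifiers replaced by stable, linkable tokens
```

The design choice mirrors the legal distinction: anonymization is irreversible and frees the data entirely, while pseudonymization preserves the ability to join records across data sets at the price of a residual, key-protected re-identification risk.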

Dr. Brian Bandey is acknowledged as one of the leading experts on Computer Law and the international application of Intellectual Property Law to computer and Internet programming technologies. His experience in the global computer law environment spans more than three decades. He is the author of a definitive legal practitioners’ textbook, and his commentaries on contemporary IT legal issues are regularly published throughout the world. Dr. Bandey is now well advanced on the unique route of studying for a second Doctorate of Law, advancing the state of the art in Intellectual Property in Internet and cloud technologies, with St. Peter’s College at the University of Oxford in England.



INTERNET METRICS, TELECOMMUNICATIONS, AND FINANCIAL SERVICES PROVIDERS ARE USING BIG DATA ANALYTICS TO BOOST PROFITS AND ADD CUSTOMERS.

Big Data Analytics Earns High Scores in the Field

While industries vary greatly in what they need from their data, and even companies within the same industry are unalike, virtually every organization in every market has these two related problems: what to do with all the information pouring into data centers every second, and how to serve the growing number of users who want to analyze that data.

Massive data sets, even those including variable data types like unstructured data, can be analyzed. Not only are they ready for analysis, but modern Big Data analytic software also performs faster and readily scales to handle as many users and as much data as needed. By seeking out Big Data inside and outside the organization and using it to push intelligence deep into the enterprise, organizations can be more responsive, more competitive, and more profitable. At the heart of these endeavors are columnar-based databases and in-database analytics.

Companies from many industries are already taking advantage of technological advances in storing and analyzing Big Data to gain business insight and provide better service to customers. These real-world examples prove the value of Big Data analytics.
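The columnar idea is simple to illustrate: storing each attribute contiguously exposes long runs of repeated values, which is why column stores compress repetitive Big Data so well. Below is a toy sketch in Python; run-length encoding stands in for the more sophisticated compression schemes real column stores use.

```python
from itertools import groupby

# Row-oriented storage keeps whole records together; a column store
# keeps each attribute contiguous instead.
rows = [
    ("2012-05-01", "DE", "web"),
    ("2012-05-01", "DE", "web"),
    ("2012-05-01", "US", "web"),
    ("2012-05-02", "US", "store"),
]

# Pivot rows into columns: the core layout change behind a columnar database.
columns = list(zip(*rows))

def run_length_encode(column):
    """Compress one column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

for col in columns:
    print(run_length_encode(col))
```

Four rows collapse to a handful of (value, count) pairs per column, and the effect compounds with scale: low-cardinality columns such as dates, country codes, and channels dominate analytic workloads, which is how compression ratios like those cited below become possible.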

comScore Stops Counting Visitors and Starts Counting Profits

comScore, a cloud-based provider of analytics services and solutions for the eCommerce marketplace, realized when it began operations that the focus of Internet marketing was shifting from visitor counts to profitability. comScore’s Customer Knowledge Platform provides a 360-degree view of customer behavior and preferences as customers visit sites throughout the Internet. The service monitors surfing and buying behavior at every site visited by consumers who have opted in to having their Internet behavior analyzed.

With millions of Web users signing up to be monitored, the data collected was enormous. comScore applies its analytics to more than 40 terabytes of compressed data, while adding close to 150 gigabytes every week. Despite this volume of data, query-response time is exceptional. “We are able to mine the data and produce results for our customers much more quickly. That helps them market more effectively and generate more business,” says Ric Elert, vice president of engineering at comScore.

The company achieves 40% compression ratios using column-store technology. Had it used a traditional approach, comScore says, its storage costs would have been much higher. “The compression is extremely important to us because we have fire hoses of data,” says Scott Smith, vice president of data warehousing. “We have a tremendous amount of data. More data than most people will ever see.”

Suntel Introduces Customized Service Offerings to Sri Lanka

As Sri Lanka’s fastest-growing telecommunications company, Suntel has 500,000 customers. The company puts the latest technology, innovative thinking, and an unprecedented service commitment into customizing telecommunications solutions for its subscribers.



Suntel’s traditional relational database was unacceptably sluggish. “We reached a point,” explains Tariq Marikar, director of information technology and solutions delivery, “at which we were seeing a 20% overload on our production database, which was unacceptable to us.” “Additionally, we wanted to be able to run reports and queries against years of historical data rather than just a few months. We knew we needed to create a separate repository—a data warehouse—specifically designated and designed for reporting and analytics in order to solve this problem.” Suntel achieved its goal by adopting a column-store data warehouse designed for advanced analytics. “It’s very important to our business to be able to view large volumes of historical data,” says Marikar. As with comScore, the “compression capability has meant that the data residing on our production database only requires about one-third the space.” The new platform can scale as Suntel increases the numbers of users and explores other strategies for tapping into this valuable data. “We’re exploring ways to exploit this data trove to develop ways to customize the customer experience across different sized customers and to implement programs to cross-sell and upsell our services,” says Markar. Airtel Vodafone Makes Better Decisions with Business Intelligence In Spain, Airtel Vodafone created a data warehouse to help it accurately analyze and predict customer activity. The company developed a data warehouse with information generated from multiple departments and organized the data according to the company’s business map. The data warehouse allows Airtel Vodafone to convert data into valuable business intelligence. Query demands on the company’s data warehouse are intense. More than 1,000 employees use the data warehouse for multidimensional analysis. The information has specifically designed structures for the data concerning customers, infrastructures, and company processes. 
This structure allows users to extract the data to create modeling and simulation processes. Data-mining techniques extract information on customer behavior patterns. Airtel Vodafone’s customer-facing personnel are able to input the information they collect on a daily basis, so that it is integrated with data already stored in the warehouse. This data is subsequently combined and converted into information structures for inquiries. The data warehouse environment comprises marketing databases, call systems, customer service, GSM network statistical data, invoicing systems, collections and retrievals, and all logistics information. The marketing team uses the same information as those in finance, although they look at it from different angles and use it for different analyses. Having this structured data allows Airtel Vodafone to provide both detailed and summarized information about the company’s activities directly from the data warehouse. These advantages are helping Airtel Vodafone make informed business decisions based on customer activity.
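Suntel’s two-thirds storage saving is typical of column stores: storing each column contiguously groups repetitive values together, so simple schemes such as run-length encoding collapse them dramatically. A minimal sketch of the idea (the data is invented for illustration, not Suntel’s actual schema):

```python
# Run-length encode one column of repetitive values -- the kind of
# redundancy a column store compresses away. Illustrative only.
def rle_encode(column):
    """Collapse runs of equal values into (value, count) pairs."""
    encoded = []
    for value in column:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded

# A status column from call-detail records: few distinct values, long runs.
status = ["OK"] * 6 + ["DROPPED"] * 2 + ["OK"] * 4
compressed = rle_encode(status)
print(compressed)  # [('OK', 6), ('DROPPED', 2), ('OK', 4)]
print(len(compressed), "pairs instead of", len(status), "values")
```

In a row store the same values would be interleaved with every other field of each record, so runs like these never form; that is why the compression ratios quoted here are characteristic of columnar layouts.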

Vertical Industries Reap the Rewards of Data

Other industries are waking up and taking advantage of Big Data analytics. In healthcare, the move to electronic medical records and the data analysis of patient information are being spurred by estimated annual savings to providers in the tens of billions of dollars. In the manufacturing sector, outsourcing a supply chain may save money, according to McKinsey & Company, but it has made it even more critical for executives, business analysts, and forecasters to acquire, access, retain, and analyze as much information as possible about everything from availability of raw materials to a partner’s inventory levels. In these and other industries, users are lining up outside the CIO’s door, asking to analyze the incoming flood of data. And when they get access, users want query-response times comparable to what they experience using search engines such as Google and Bing. In some markets, response times that satisfy humans aren’t fast enough. These enterprises demand machine-to-machine speeds for analytics.

According to the publication Wall Street & Technology, financial services companies are under increasing pressure to accelerate decision making from “microseconds to milliseconds to nanoseconds.” To speed decision making, firms are applying analytics to business processes for financial transactions executed by computers. Humans once were solely responsible for the decisions, but now only computers can work as fast as the data is moving. The pricey technologies necessary to achieve these feats make it possible for only the largest players in the industry to continuously throw faster hardware and network gear at the problem. For everyone else, says Larry Tabb, CEO of the Tabb Group, a financial services technology consultant, you need to be significantly smarter. To compete, Tabb says, “you need to raise the analytical barriers.”

In the face of these Big Data challenges, some in the analytics industry caution that companies might be smarter to take a “less is more” approach to analyzing data sets. Sometimes you might hear arguments that applying analytics to smaller data sets is, in effect, “good enough.” More often than not, this argument is made by those who can’t analyze large data sets. As Google Chief Economist Hal Varian observes, analyzing a small, random slice of data can indeed yield valid results. But to get a truly random data set, that sliver of information needs to come from a massive amount of information. Without a large enough pool of data to draw from, the validity of your analytics processes can be called into question. In other words, Big Data yields the most valid results.
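Varian’s observation is easy to demonstrate: a genuinely random sample drawn from a large pool estimates the population statistic closely. This sketch uses synthetic transaction data (all figures invented) to show a 1,000-record random slice tracking the mean of a 100,000-record pool:

```python
import random

random.seed(42)

# A large "population" of transaction amounts: mostly small, a few large.
population = [random.expovariate(1 / 50.0) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# A truly random slice of the pool tracks the true mean closely.
sample = random.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

# Within 10% of the population mean -- well inside sampling error here.
assert abs(sample_mean - true_mean) / true_mean < 0.1
print(f"true mean {true_mean:.1f}, random-sample mean {sample_mean:.1f}")
```

The catch, as the article notes, is the word *random*: a convenient slice (the newest records, one region, one channel) carries whatever bias produced it, and only a large enough pool lets you draw a sample that is genuinely representative.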



CONSIDERING BIG DATA ALONE IS INSUFFICIENT; ANALYTICS MUST ALSO BECOME PERVASIVE ACROSS THE ENTERPRISE IN ORDER TO TRULY LEVERAGE THE OPPORTUNITY.

Big Data Is Only a Small Part of the Opportunity By Mike Upchurch, Chief Operating Officer, Fuzzy Logix

The opportunity Big Data represents goes beyond the data and the related new technologies that capture and store it. The real benefit is that organizations can derive better business intelligence from far more sources than ever before, and make it available to decision makers at every level. The keys to success are designing your Big Data analytics to support business goals and enabling decision makers to take action. It’s easy to collect large amounts of data. Knowing what to do with it all—and making changes based on what you learn—is the challenge. We can liken the task to searching for diamonds in a giant pile of sand. Storing the sand is easy, but sifting through it requires a special set of tools, as well as a sufficient understanding of what you’re looking for, why, and what you’re going to do when you find it. Historically, data analysis has been a story of complexity, limited capacity, elaborate tools, cryptic results, and poor distribution. Special equipment was required; only a small number of people knew how to use it; and the demand on their time was high. Analysis also required moving data from a database to an analytics server, processing it, and pushing it back. Just moving the data was 80% of the work—akin to trucking our pile of sand 10 miles to sift it.

Today, new, powerful data warehouse systems using in-database analytics can quickly ingest and process Big Data wherever it resides. What’s more, business users can now sift through data using familiar reporting tools, gaining easy access to powerful on-demand analytics and allowing data scientists to focus on building models instead of running reports. Best of all, these new solutions generally cost around 20% less to build than traditional platforms and perform more than ten times faster.

Start With “Why”

Data analysis is more accessible than ever, and it can solve many problems—but not all of them. The key to identifying which problems to tackle is to start with “why.” Why are we analyzing Big Data? First, assess your strategic goals. These could be growing market share, controlling cost and risk, or understanding customer behavior. Then, determine if using analytics will deliver value. There are two important questions to answer: Can the company use data models to derive insight, and can it act on the results? Working through this process will help determine where your organization can realize value from Big Data analytics.

Changing Company Culture

Companies need a focused plan, great execution, the right technical platform, and the ability to operationalize the results of analysis. Without accompanying cultural change, however, those things will only deliver a fraction of the potential value of Big Data analysis. Let’s go back to the diamond mine one more time. The miners have new sifting equipment that tells them where the highest-value diamonds are, but they aren’t authorized to react to the information. The best equipment can’t make up for a broken culture.
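The “trucking the sand” cost is visible with any SQL engine: shipping raw rows to the client moves far more data than asking the database for the finished answer. A small sketch using SQLite as a stand-in for a warehouse (table and figures invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 75.0), ("west", 25.0)],
)

# Old style: pull every row out of the database, then aggregate on the client.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# In-database style: only the small, finished result set crosses the wire.
db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

assert client_totals == db_totals  # same answer, a fraction of the data moved
```

With four rows the difference is invisible; with billions, the first approach is the 80% data-movement tax the author describes, and the second is what in-database analytics eliminates.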
Employees should be able to run analytics and see actionable answers on demand: a forecast of how close the sales team is to meeting this month’s numbers, a customer’s credit score, or a report of which advertising keywords to buy today. Armed with information, employees must also be comfortable and confident taking action before the value of the insight diminishes. As a company incorporates the use of analytics, employees will have ideas about how to improve on the original models. Building a culture that encourages constant testing and learning—as well as providing access to a flexible platform that can accommodate new ideas—will greatly improve the value companies can reap from Big Data. It’s crucial to create a culture that rewards decisions and encourages analytics innovation, which may require modifying incentive and bonus structures. Not allowing employees to act is the most common point of failure for analytics projects—don’t make that mistake. It’s rarely mentioned in discussions of Big Data, but it can make or break an analytics initiative.

Maximizing Results

Many companies are succeeding at their search for value in Big Data. They have the systems and infrastructure to capture and analyze Big Data; they have operational processes in place; and their employees have permission to act on the results. For these companies, the payoff can be dramatic. For example, equity traders may need to buy or sell assets during the trading day to balance their portfolios, but one day’s OPRA feed can contain data for 500,000 to 1 million trades. If portfolio risk can only be calculated overnight, then institutions are exposed to an unquantifiable amount of risk during each trading day. With Big Data analytics, traders can get real-time pricing and calculate risk throughout the day. The result is that they can rebalance their portfolios at the expense of less agile traders. Millions of dollars can be won and lost by having better information than the institution on the other side of the trade.
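The intraday risk example can be pictured as keeping a running exposure per position as trades stream in, flagging limit breaches on every fill instead of in an overnight batch. A toy sketch (symbols, prices, and the limit are invented):

```python
from collections import defaultdict

def stream_exposure(trades, limit):
    """Update net exposure per symbol on every trade and flag limit
    breaches immediately, rather than in an overnight batch. Toy sketch."""
    position = defaultdict(float)
    breaches = []
    for symbol, qty, price in trades:
        position[symbol] += qty * price
        if abs(position[symbol]) > limit:
            breaches.append((symbol, position[symbol]))
    return dict(position), breaches

# Three fills during the day; the second pushes ACME past its limit.
trades = [("ACME", 100, 50.0), ("ACME", 200, 51.0), ("BETA", -50, 20.0)]
position, breaches = stream_exposure(trades, limit=10_000)
print(position)  # {'ACME': 15200.0, 'BETA': -1000.0}
print(breaches)  # [('ACME', 15200.0)]
```

The point of the real systems is that this check happens at feed speed across hundreds of thousands of fills, so a desk learns about the breach in time to rebalance, not the next morning.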
Other examples of capitalizing on Big Data include modeling loan default risk on demand, and stress-testing entire portfolios in a fraction of the time required by traditional solutions.

Call centers use analytics to better serve customers, reduce churn, and cross-sell new products. By analyzing a customer’s history and the actions of customers with similar histories, an analytics engine can recommend actions that will reduce churn, or suggest products or services that will be the customer’s next likely purchase. One call center leveraged Big Data analytics and saw a 10% reduction in churn, an 8% increase in per-call revenue, and a 12% improvement in cross-sale revenue.

Health care organizations are using Big Data analytics when evaluating care quality and efficiency. Using traditional methods, analyzing more than 700 million lines of claims data can take six weeks and a dedicated team of analysts, and only produce reports twice a year. With Big Data solutions, risk management teams can now run the models in 22 minutes and take immediate action to improve quality of care, reducing the window during which risk can go unnoticed from six months to less than a week.

Big Data analytics are ushering in a new era of predictive insight that is changing how companies operate and engage with their customers, suppliers, and employees. To take advantage of the opportunity, companies must start with the “whys,” align analytics projects with business needs, and quantify the value that can be created. To realize the value, employees must have access to powerful, innovative, and proven technology, participate in the process, understand the results, and be empowered to act. Get all of this right, and your diamonds will shine bright, creating competitive advantage and financial gain.

Mike Upchurch is responsible for customer acquisition, partnerships, global operations, and corporate culture at Fuzzy Logix. Previously he worked at Bank of America, leveraging trading instruments to create consumer products, mining consumer data to identify trading opportunities, and building and implementing a strategy that grew telephone mortgage lending from $9 billion to $22 billion in four years. He has also held a number of strategy and operational roles at global technology companies.



Business Analytics Roadmap

Business Value through Operational BI

Align operational and real-time business analytics and analytics technology with true business requirements and capabilities to ensure greater success in reaching business and IT goals.

By Claudia Imhoff, President of Intelligent Solutions, Inc. and Founder of the Boulder BI Brain Trust
Excerpted from TDWI Checklist Report, “Delivering Higher Business Value with Operational Business Intelligence and Real-Time Information”

Operational BI (OBI) is a popular topic in most business intelligence (BI) shops these days, and rightfully so. OBI enables more informed business decisions by directly supporting specific business processes and activities. OBI has had a dramatic impact on traditional BI environments and has brought BI to a new audience of users. These users now have immediate access to the insights they need when making decisions about customers, products, and even campaigns while these business activities are happening.


This Checklist Report helps you determine how to align the implementation of operational and real-time BI and analytics technology with true business requirements and capabilities to ensure greater success in reaching business and IT goals.

1. RECOGNIZE THAT NOT ALL ANALYTICS MUST COME FROM THE DATA WAREHOUSE ENVIRONMENT.

The data warehouse (DW) is a key supplier of data analytics, but it’s not the sole supplier of analytics. Other forms of analytics are needed for a fully functioning OBI environment. Because many analytics used in OBI require low-latency or real-time data, organizations try to speed up the overall processes of the DW—trickle-feeding the data, automating analyses, and so on—in an effort to make it the sole supplier of analytics. Although this approach works for some low-latency analytics, at some point the DW team must turn to other analytical techniques to complete the OBI picture. One of these techniques is event analytics. Event data is created by business activities (generated by banking transactions [ATM], retail operations [POS, RFID], market trades, and Web interactions) or by system events (generated by sensors, security devices, or system hardware or software). Event analytics applications often perform their analyses even before the transactional data is stored in an operational


system. For example, many fraud-detection applications analyze transactions for fraudulent characteristics first and then store them in transactional systems for further processing. Obviously, the DW contributes to the overall OBI environment by generating the fraud models used by the event analytics software. Another technique is to make BI analytics (or its results) available as callable services within an operational workflow. Embedded BI services can be external to the workflow (as part of a service-oriented architecture) or included within the workflow itself. These services come in two flavors. The first calls a stored analysis or model, uses it dynamically during the workflow, and receives the results before invoking the next activity—for example, calling a stored analysis to dynamically determine a loan applicant’s creditworthiness. The second type retrieves the static results from an earlier analysis; for example, a customer service representative (CSR) retrieves a customer’s lifetime value score or segment ID stored in a DW. Both types are employed by a business process or person to support real-time or near-real-time business decisions and actions. The combination of traditional data analytics, embedded BI services, and event analytics forms the foundation of OBI. All three must come together at appropriate points in the workflow to provide a mature and effective operational decision-making environment.

2. MATCH REAL-TIME CAPABILITIES FOR INCREASING BI AGILITY TO ACTUAL BUSINESS NEEDS.

There is a lag between the time an event happens and the time a company responds to it. This lag is caused by several factors, such as preparing the data for analysis, running the analysis, and determining the best course of action based on the results—for example, taking action when a campaign sells a product that is about to run out of stock.
Clearly, the ability to reduce the time to action here (stopping the campaign or changing the featured product) can have significant impact on a company’s revenues and reputation. This is BI agility. It requires that the action time match the business need. However, there is a trade-off. Is it timely enough for the business or is it actually too fast? Even if the business requires reduced latency, can the business users correctly process the inputs that quickly? Can the operating procedures handle the time frame appropriately to ensure a correct reaction? There are many moving parts in an OBI environment, and any that are out of sync or incomplete can cause an erroneous decision to be made. In this situation, the cost of creating such a low-latency BI environment may be more than the actual benefit the company receives.
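One way to make this trade-off concrete is a simple latency budget: add up the lag components and compare the total against the action time the business actually requires. A sketch with invented figures:

```python
def action_time(latencies):
    """Total time from event to action: each stage of the OBI lag."""
    return sum(latencies.values())

# Invented figures for one OBI workflow, in seconds.
lag = {"capture": 5, "prepare": 30, "analyze": 20, "decide": 60}
required = 120  # suppose the business needs action within two minutes

total = action_time(lag)
print(total, "seconds")  # 115 seconds
print("meets the business need" if total <= required else "too slow")
```

The budget also shows where not to spend: if the human “decide” step dominates, shaving the data-capture stage from five seconds to five milliseconds buys an expensive, fragile architecture and almost no reduction in action time.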


Another trade-off is the soundness and flexibility of the architectural infrastructure in terms of allowing for delivery of information in different latency time frames (more on this later). Building an OBI solution that is inflexible or fragile just to meet an arbitrary time frame may spell disaster. If the action time requirement changes (and it almost certainly will) from two hours to one hour, you don’t want to have to rebuild the entire architecture. To avoid this situation, the BI implementers must understand how the business community interacts with OBI, from event occurrence to action taken. Interactions must include the impact of the growing usage of tablets and mobile devices. OBI must reach its audience with the appropriate information formatted for the myriad mobile devices available today.

3. DETERMINE THE PROPER INFRASTRUCTURE FOR BUSINESS-CRITICAL OPERATIONAL BI.

Although traditional BI processing is often critical to business operations, a temporary failure of the BI system will not typically affect short-term business operations. Also, because the BI system is separated from operational processing, BI processing has little effect on operational performance except during the capturing of operational data. The situation with OBI is different from traditional BI because it is closely tied to the daily operations of the business. A failure in an OBI system could severely impact business operations. This risk is especially relevant for OBI applications that support close to real-time decision making, such as fraud detection. There are several approaches to supporting OBI, including embedding BI in operational processes, accessing live operational data, and capturing operational data events and trickle-feeding them to a DW. All of these approaches have the ability to affect the performance of operational systems. It is very important, therefore, that the infrastructure of the BI system, its underlying DW environment, and related operational systems be capable of providing the performance, scalability, availability, and reliability to meet OBI service levels. The cost of providing such an infrastructure increases as these service levels approach real time, and these costs must be balanced against the business benefits achieved and the ability of the organization to exploit a more agile decision-making environment.

4. UNDERSTAND THAT OPERATIONAL BI IS NOT JUST A TECHNOLOGY SOLUTION.

It’s critical that BI implementers be able to tie BI applications to operational applications and, even more important, to operational processes. Yes, technology is important, but perhaps just as important are the standard operating procedures (SOPs) that must be followed by business personnel. Many BI implementers do not realize that their OBI solution impacts how people perform their jobs. Without understanding how SOPs will be affected, the OBI team can cause severe problems with operations or, worse, find their solutions being ignored or circumvented. As a first step, the BI team should study, understand, and document the full business workflow using the new BI application. OBI applications can cause big changes to processes and procedures. When they do, the team must determine how the SOPs must change. For instance, will they need to be rewritten or enhanced to include the new OBI application? What impact will this have on the workforce? Who will create and maintain the new SOP? The team must also determine which personnel will be affected by the new procedures and what training they will need. The team must study how these personnel make decisions, how they access and use information, and how they monitor the impact of their decisions on the company. Training must be ongoing and flexible to accommodate the inevitable turnover in operational personnel. Some of the workforce may immediately grasp this new paradigm; others may not.


5. UNDERSTAND THAT OPERATIONAL BI IS MORE THAN SIMPLY CAPTURING MORE TIMELY DATA.

It is often assumed (incorrectly) that OBI simply involves capturing more timely data. Certainly data consolidation (ETL), data replication, and data federation (enterprise information integration [EII]) technologies have advanced to the point that we can capture data and make it available in a far more timely fashion than ever before. For example, using log-based changed data capture (CDC) has distinct advantages for speeding up data integration and processing for a DW. Without doubt, real-time or low-latency data is an important feature of OBI processing. In addition, there are other factors that need to be considered when improving BI agility and supporting faster decision making. Once operational data has been captured, it needs to be analyzed and the results delivered to the BI consumer, which may be a business user or another application. The time it takes to analyze the data increases the time (the action time) it takes for a business user or an application to make a decision. It is important, therefore, that the actual queries used in the analysis are optimized for good performance. It is also important that the underlying query processing engine is optimized for efficient analytical processing. In some instances, the analytical results may be precalculated to reduce action times (customer lifetime value scores, for example). The efficient delivery of results to the BI consumer is also important for OBI success. The delivery medium used (dashboard, portal, mobile device, action message) must be selected to match the action time requirements of the business. The availability of automated decision-making features such as alerts, recommendations, and decision workflows can help business users make faster decisions. In near-real-time decision-making situations (fraud detection, for example), fully automated decision-making features may be employed.
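Log-based changed data capture can be pictured as replaying the operational change log into the warehouse copy instead of re-extracting whole tables, which is what keeps the feed low-latency. A toy sketch (the log format here is invented, not any specific CDC product’s):

```python
def apply_change_log(warehouse, log):
    """Replay insert/update/delete entries from an operational change
    log into the warehouse copy of a table. Toy sketch of log-based CDC."""
    for op, key, row in log:
        if op in ("insert", "update"):
            warehouse[key] = row
        elif op == "delete":
            warehouse.pop(key, None)
    return warehouse

# The warehouse copy, keyed by order ID, and a morning's worth of changes.
warehouse = {1: {"status": "new"}}
log = [
    ("update", 1, {"status": "shipped"}),
    ("insert", 2, {"status": "new"}),
    ("delete", 1, None),
]
print(apply_change_log(warehouse, log))  # {2: {'status': 'new'}}
```

Because only the changed rows travel, the same mechanism scales from nightly batches down to the trickle-feeds the checklist describes; the cost moves from bulk extraction to keeping the replay continuously caught up.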
This contribution was extracted from “Delivering Higher Business Value with Operational Business Intelligence and Real-Time Information.” To read the entire document, go to: http://tdwi.org/research/2011/11/tdwi-checklist-report-delivering-higher-business-value-with-operational-bi-and-real-time-information.aspx

Claudia Imhoff, Ph.D. is an analyst and speaker on business intelligence and the infrastructure to support these initiatives. She is the president of Intelligent Solutions, Inc. and founder of the Boulder BI Brain Trust. She has co-authored five books on these topics and writes articles and research papers for technical and business magazines.

A NEW APPROACH IS NECESSARY IN TODAY’S ALWAYS-ON WORLD. SAP IS DELIVERING A PORTFOLIO FOR THE REAL-TIME BUSINESS.

Real-time Data Platform for a Real-time World By Amit Sinha, Head of Database and Technology Innovation, SAP

While business is happening faster, many IT departments are still using traditional data management tools designed in the 1980s when the pace of life and business was slower, and the amount of data was much smaller.

Professor Richard Wiseman, author of Quirkology, compared the ‘pace of life’ in 31 countries by studying how fast people walk. The study definitely fits the title of his book! More interesting is that the overall pace of life increased by 10% over a 10-year period, and it’s only getting faster. Smartphones, wireless networks, and an ‘always on’ lifestyle are further accelerating the pace of people’s lives and of business, and generating vastly more data at the same time.

The challenge is that today enterprises are looking to analyze terabytes or petabytes of data in the moment, instead of days or weeks after the fact. Yet the underlying infrastructure has remained status quo, with enterprises forced to spend time ‘shoehorning’ old technology into their data centers to address new problems. Many of them have reached the breaking point.

Instead, a new approach is required that can not only mine the information and make sense of it, but do it in real time.


To empower organizations to remain competitive in today’s constantly evolving market, SAP has committed to helping them unleash the value of Big Data through a new approach to data management. It all starts with a foundation based on the SAP HANA database, a state-of-the-art in-memory platform, which allows enterprises to cut out the complexity that’s crept into IT environments. SAP HANA’s extreme performance and innovation for the next generation of applications is redefining the database market by helping customers access and deliver information at speeds up to 100,000 times faster than previously available. Surrounding HANA, the centerpiece of SAP’s real-time data platform, are several components that bring the best of database innovation forward. Sybase IQ, the #1 column database on the market, offers enterprises the best overall total cost of ownership by reducing administration by 75% and reducing data storage volumes by more than 70% through advanced data compression algorithms. SAP Sybase Adaptive Server Enterprise (ASE) is the #1 transactional database and is in use by most Wall Street firms. SAP Sybase ASE delivers top performance for enterprises, reduces risk due to security breaches or system failures, and increases efficiency by simplifying administration and efficiently using hardware and storage. Another piece of the real-time database puzzle is SAP Sybase SQL Anywhere, the #1 mobile and embedded database, which supports advanced synchronization, out-of-the-box performance with little to no DBA support, and the ability to enable applications in remote locations. Lastly, enterprise information management (EIM) solutions from SAP enable enterprises to rapidly ingest, cleanse, model, and govern data in order to improve the effectiveness of Big Data across operational, analytical, and governance initiatives.

These solutions together are just the beginning of SAP’s goal of providing customers with a single, logical, real-time platform for all transactions and workloads. By leveraging the industry-leading SAP Sybase data management products, customers will be able to transact, move, store, process, and analyze data in real time while reducing overall costs. The old way of doing things is no longer acceptable. The new world of data needs a new data platform, and SAP is committed to helping enterprise IT departments evolve from complex, slow-moving entities into a more simplified architecture that enables Big Data, cloud services, and analytic, transactional, and mobile applications, while preserving investment in existing applications in a non-disruptive way.

Amit Sinha leads marketing for SAP’s technology platform, data management, and real-time applications. Prior to this role, he led the market introduction of SAP HANA. Previously, as Vice President of Business Network Transformation, Amit was responsible for driving the phenomenon of collaboration across business boundaries through innovations in SAP’s portfolio. He has worked with customers on new cloud-based collaborative applications that empower people and communities to collaborate, leverage information for collaborative decision making, and ultimately enhance the company’s business model. Amit is a graduate of the Indian Institute of Technology (IIT) Bombay and the Haas School of Business at the University of California, Berkeley.




ADVANCEMENTS TO IN-MEMORY DATABASES, LOWER MEMORY COSTS, AND THE COMBINATION OF TRANSACTIONS AND ANALYTICS MOVE HANA INTO A CLEAR LEADERSHIP POSITION.

How HANA Changes the Database Market By Ken Tsai, Vice President of HANA Solution Marketing, SAP

In the 2011 spy thriller Page Eight, the director general of Britain’s domestic intelligence agency, MI5, played by veteran actor Michael Gambon, utters a lament expressed by many a corporate CEO. “This building is swimming in information,” he complains. “We have information coming out of our ears.” What’s difficult, he adds, is to determine whether something is important or not.



Decision-makers “swimming in information” need a database designed to navigate data’s deep waters. While some databases can be a life raft, helping an organization stay afloat, the SAP HANA in-memory database gives a company fleet command over oceans of Big Data. This advanced, high-performance database is dramatically changing the market. The cost of memory is one sign of market change. When traditional databases were first designed, memory was extremely expensive. The big database vendors traded the speed of memory for more cost-efficient storage on disk. But that has changed, and dramatically. In 1990 a terabyte of memory cost more than $100 million. Today a terabyte of memory costs under $5,000. In three years, it’s estimated, that price will fall to one quarter of that, and by 2018 users will pay one-thirtieth. Given that CPU memory is at least 50,000 times faster than accessing data on a mechanical disk drive, with memory that cheap the reasons not to use in-memory databases have vanished.

Combining analytics and transactions is another change upending the market. HANA provides the power needed in both analytics and transactions to streamline business activities. In fact, SAP HANA portends the end of the separation of online analytical processing (OLAP) and online transaction processing (OLTP) database functions in large organizations, providing instead a single, massive data store for both transactional and analytical database activity, with performance levels previously unimaginable by decision makers. Such a combination of business functions will be revelatory for corporate leaders. Hasso Plattner, in his 2009 paper “A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database,” concludes that when the merging of the two processes occurs, “the impact on management of companies will be huge, probably like the impact of Internet search engines on all of us.”

Change for the Better

SAP HANA is a 100% in-memory database software appliance designed to run on Intel processors and optimized for specific advances in chip design such as multi-core processors, shared memory, and multi-socket topology. According to Intel, “SAP HANA enables real-time decision making by bringing all the data in your enterprise within the reach of decision makers in seconds, not weeks or months, in an easy-to-use format so your company can run smarter and perform better.” The company concludes that SAP HANA delivers “an unprecedented robustness in real-time business analysis.” Cisco, for example, has applied HANA to its seasonality analysis of customer purchase sentiment, cutting the analysis to a mere five seconds no matter what filters it applies to the report. Lockheed Martin has improved the responsiveness of its labor utilization report by 1,131x. And fashion and fragrance leader PUIG, with iconic brands such as Prada and Paco Rabanne, is now able to predict sales trends in real time for new products and markets, with a 400x boost in report execution.
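The memory prices quoted earlier imply a steady compound decline, which is easy to check: $100 million per terabyte in 1990 falling to $5,000 by 2012 works out to roughly a 36% drop per year, and the “one quarter in three years” projection matches that rate almost exactly (the one-thirtieth-by-2018 figure assumes the decline accelerates somewhat):

```python
# Sanity-check the compound decline implied by the article's memory prices.
cost_1990 = 100e6   # $ per TB in 1990 (article's figure)
cost_2012 = 5_000   # $ per TB "today" (2012)

annual = (cost_2012 / cost_1990) ** (1 / (2012 - 1990))
print(f"historical annual price factor: {annual:.2f}")  # ~0.64 (~36%/yr drop)

# "One quarter of that in three years" implies nearly the same rate...
print(f"implied by the 3-yr projection: {0.25 ** (1/3):.2f}")  # ~0.63

# ...while "one-thirtieth by 2018" (six years out) implies a faster decline.
print(f"implied by the 6-yr projection: {(1/30) ** (1/6):.2f}")  # ~0.57
```

Either way, the direction of the argument holds: at a sustained 35 to 40% annual price drop, the economics that once justified trading memory speed for disk capacity disappear within a hardware generation.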

SAP HANA’s columnar architecture is data agnostic, ideally suited for the variety of Big Data pouring into organizations today. There’s no practical limit to the capacity of SAP HANA.


SAP HANA scales linearly along with the growth in the volume and velocity of a company’s information sources. SAP HANA’s columnar architecture is data agnostic, ideally suited for the variety of Big Data pouring into organizations today. There’s no practical limit to the capacity of SAP HANA.

Most important, SAP HANA is fast. Not just whiteboard-theory fast, but real-world business fast. Take Liechtenstein-based Hilti Corp., a global provider of value-added products to the construction and building maintenance industry. Its deployment of the SAP HANA database merged transactional and analytic functions to improve the sales and support process by many orders of magnitude; in one case, the response time for analyzing 53 million customer data records fell from two or three hours to two or three seconds. In Japan, Mitsui & Co. Ltd.’s retail operations experienced a stunning 400,000-times performance improvement in its inventory management application with SAP HANA over the prior database’s performance. And Germany’s T-Mobile implemented SAP HANA to analyze huge data volumes in seconds, up to 1 billion rows and a 300 trillion record set in as little as 16 seconds, dynamically modifying its marketing and promotions vehicles to deliver more effective results.

Leading through Innovation

The arrival of SAP HANA has already changed the market landscape. Competitors are following SAP’s lead and announcing in-memory databases in an attempt to stay in the performance game. But because SAP began its development years ago, it has a long head start and will be able to stay in the lead for the foreseeable future as it continues to innovate. The biggest opportunity SAP HANA creates, however, will be for business. It will unleash powerful and innovative applications that exploit the wealth of knowledge within a company’s trove of Big Data. It will improve the capabilities and responsiveness of operations, finance, marketing, engineering, and virtually all areas of business. No longer will CEOs feel like they are swimming in information. Rather, they will be sailing across it, fully in control and charting new opportunities for increased growth and profitability.

Ken Tsai is the head of the SAP HANA product marketing team at SAP and is responsible for driving marketing, communication, and adoption of the SAP HANA in-memory data platform worldwide. Tsai has 17 years of experience with application development, middleware, databases, and enterprise applications. He has been with SAP for the past 7 years and is a graduate of the University of California, Berkeley.

Big Data Analytics Guide

TO PROVIDE BUSINESS INTELLIGENCE FOR EVERYONE IN AN ENTERPRISE, DATA DELIVERY AND ANALYSIS MUST BECOME MORE NIMBLE THAN DATA MARTS CAN BE.

DBTA: Data Marts Can’t Dance to Data’s New Groove By John Schitka, Senior Product Marketing Manager, Sybase IQ

Limitations in scalability and business demand for analytics are causing IT departments to rethink the traditional data warehouse/data mart strategy in favor of a powerful, centralized business analytics information grid.

Few things in the world are changing as dramatically as data. Data has tapped out a powerful rhythm to keep time with technology’s bleeding edge, leaving many technologies struggling to keep up. It should come as no surprise, then, that many of the data strategies that IT departments developed—

Business leaders are looking for ways to gain deeper insights from data, to enable more business users to search for these deep insights, and to directly embed these insights into core business processes.


and still widely rely upon—are no longer sufficient for today’s needs. You can put data marts near the top of that list.

Data marts were a reaction to the extreme performance limitations of traditional enterprise data warehouses. The data warehouse itself, which came of age in the 1990s, represented a tremendously enticing vision—offering to virtually every department across the enterprise an opportunity to see its performance metrics and find out what’s working and why. That is, data warehouses would have answered all of those questions, if only users could get to the data. Most organizations quickly discovered that data warehouses—with their centralized, brittle architecture—performed abysmally under unpredictable workload demands. Even the load of just a few users could degrade performance precipitously. It quickly became clear that if they wanted to scale the data warehouse, organizations would need to replicate and distribute the data locally. Thus, data marts were deployed.

The Power of Prediction

While data marts were never a perfect solution, they adequately addressed businesses’ urgent need to let stakeholders from across the organization explore the data and uncover the insights they hold. But while data mart deployments have largely continued unabated for the past decade, business has


changed dramatically: Global competition, mobility, social media, and the accelerating pace of business are forcing enterprises to re-evaluate how they think about data. In this fast-paced business climate, it’s no longer enough to use the data warehouse to find out what happened in the past; today’s businesses need real-time data—data capable of supporting credible predictions about what will happen in the future. Business leaders are looking for ways to gain deeper insights from data, to enable more business users to search for these deep insights, and to directly embed these insights into core business processes.

Such predictive analytics can have a tremendously uplifting effect on business, especially when they are embedded into the workflows and applications that power key business processes. For example, analytics can be used on the fly to determine the likelihood of fraud for any transaction, to identify cross-sell opportunities, or to single out particularly influential customers. Imagine the power of being able to alert call-center operators or branch agents to such conditions at the earliest moments in the customer contact. AOK Hessen saved $3.2 million by using predictive analytics to identify fraudulent insurance claims. HMV Japan used it to better predict the interests of its customers and increased per-transaction revenue by 300 percent as a result.

Intelligence for Everyone

Thus, at many organizations today, IT is under mounting pressure to abandon these wallflowers—traditional data warehouses (and data marts)—in favor of a quick-footed, modernized architecture that:
• Can answer complex questions using massive volumes of data
• Can scale massively to support the analytics needs of all enterprise users
• Can embed advanced analytical models into business processes to help increase revenue and limit risk.

These new business demands are driving recognition of a number of critical technology challenges:
1. Big Data: Today’s businesses are deluged with a massive volume of data, created in part by the recognition that all the data available to an enterprise can be analyzed.
2. Data type diversity: There are dozens of structured and unstructured data types that must be included in the data warehousing effort, including numeric, text, audio, video, high-resolution imagery, SMS, RFID, clickstream, and more.
3. Complex questions: The requirement for in-depth knowledge discovery means the solution must be capable of recognizing and adapting to data anomalies, recognizing data clusters and trends, identifying influencing factors, and making reliable assumptions.
4. Decision velocity: Enterprises are looking to make decisions in seconds and minutes, not days or weeks. The solution must be able to answer user questions at the speed of thought, and in some cases remove the user from the equation entirely.
5. Many users: Today’s business analytics environment must support decision-making at all levels of the organization: tactical, operational, and strategic. Furthermore, enterprises are increasingly looking to incorporate analytics directly into business operations.

According to a University of Texas study, product development alone justifies deploying analytics for a typical Fortune 1000 enterprise.

While each is undoubtedly a critical requirement of the new architecture, the fifth challenge, servicing many users, is perhaps the one that will most definitively set apart successful solutions from those that fail to live up to their potential. After all, even the most insightful conclusions are of limited value if the data isn’t seen by the right people at the right time. For this reason, the next frontier in analytics is delivering intelligence for everyone.

The secret to delivering business analytics to the whole organization is to harness smart parallelization techniques. Whereas traditional data warehouses use a shared-nothing architecture, forcing users to wait in queues while resources are locked by other queries, a high-performance business analytics information grid will instead employ a shared-everything architecture. This will make it possible to:
• Share resources, making all data accessible to any server or group of servers, allowing many simultaneous users with diverse workloads
• Scale out independently and heterogeneously across resources, with or without private clouds
• Provide a “self-service” methodology that supports data analysis from anywhere, including from specialized applications, through the Web, or on mobile devices.

The Death of the Data Mart

Building an enterprise data warehouse is generally viewed as a long-term investment. And yet, traditional solutions have proved to be surprisingly brittle—inadequate for the business needs of tomorrow and unable to learn the steps to data’s new groove. Tomorrow’s massively scalable, grid-style architectures provide an opportunity to create truly flexible and predictive business analytics while solving the very problem data marts were invented to address in the first place: a central place for all business users to access and analyze all enterprise data. An analytics-optimized information grid is the right dance partner for today’s data. It will not only usurp the departmental data mart but will take the inflexible, flat-footed data warehouses of the past along with it.

John Schitka currently works in Solution Marketing at SAP, focusing on database and technology products. A graduate of McMaster University, he also holds an MBA from the University of Windsor. He has worked in product marketing and product management in the high-tech arena for a number of years, has taught at a local college, and has co-authored a number of published textbooks. He has a true love of technology and all that it has to offer the world.



IN-DATABASE ANALYTICS ELIMINATES THE DATABASE-TO-ANALYTICS SERVER TRAVEL OF TRADITIONAL METHODS, PROVIDING FASTER, MORE SECURE, AND MORE ECONOMICAL RESULTS.

In-Database Analytics: Reducing Travel Time By Courtney Claussen, Sybase IQ Product Manager, SAP

Traditionally, data analysis has required data to commute from home to work and back again. When a business asked a question of its data, someone in the IT department had to move a data set out of the database where it resided, into a separate analytics environment for processing, and then back. This “commute” comprised the bulk of the time and the work of the analysis, often causing frustration and delays on the business side as it waited for results. It doesn’t have to be this way, however. It’s now possible to build analytic logic into the database itself, or automatically

Just as working from home saves travel time and fuel costs, performing analysis where the data resides saves time and money.

port models to the database. Called in-database analytics, this technology eliminates the need to move data and significantly reduces the time and effort required for processing.

In-database analytical capabilities aren’t new. They have been available commercially for nearly 20 years, but only recently have they begun to gain popularity. To run in a database, an analytics model first must be translated from the “native language” of its development environment to something its destination database can understand. Until recently, the way to do that was to recode the model from scratch, which could take weeks, months, or longer—rendering the final results so late that they might no longer be useful. For this reason, in-database analytics simply didn’t deliver much of a benefit over traditional methods. But under the triple threat of burgeoning data volumes, sped-up business transactions, and wider data use in the enterprise, the back-and-forth analytical process has become unbearable for many organizations.

The PMML Catalyst

Predictive Model Markup Language (PMML) is a big reason why in-database analytics has become a viable option. PMML, a flavor of XML and an industry standard, makes it easy to transfer complex analytic models between different environments. In practical terms, it means that the “translation” process that was once measured in days or weeks can now be completed in minutes or seconds, which dramatically reduces model implementation time.

Predictive Model Markup Language (PMML) is a big reason why in-database analytics has become a viable option.

Just as working from home saves travel time and fuel costs, performing analysis where the data resides saves time and money. In-database analytics provide accurate results up to 10 times faster than traditional methods, for roughly 20 percent less cost. Another advantage is security: Corporate information never leaves the protection of the data warehouse. Further, when reporting tools have the capability to run analytic models inside the database, business users can use familiar tools to get the answers they need. These types of systems give decision makers easy access to powerful analytics on demand.

In-Database In Action

In-database processing makes data analysis more accessible and relevant for high-throughput, real-time applications including fraud detection, credit scoring, risk management, trend and pattern recognition, predictive analytics, and ad hoc analysis, which allows business users to drill deeper into existing reports or create new ones on the fly. Predictive analytics applications use in-database processing to fuel behavior-based ad targeting and recommendation engines, such as those used by retail Web sites to encourage upsell and cross-sell (How about some batteries to go with that flashlight?) and by customer service organizations to determine next-best actions. The largest mortgage database in the United States uses in-database analytics to assemble ad hoc reports from billions of records, delivering fast results over the Web—around the clock. Customers receive information to make buy decisions on mortgage securities 5 to 10 times faster than before.
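The “no commute” idea is easy to sketch. The example below is purely illustrative, using SQLite’s user-defined-function facility as a stand-in for a warehouse’s own in-database scoring; the table, columns, and toy fraud rule are all invented, but the point stands: the rows are scored where they live rather than traveling to a separate analytics server.

```python
import sqlite3

# Illustrative stand-in for in-database scoring: SQLite lets Python
# register a function that runs inside the SQL engine itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER, amount REAL, home_state TEXT, txn_state TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", [
    (1, 25.0,   "CA", "CA"),
    (2, 9800.0, "CA", "NV"),   # large amount, out of home state
    (3, 40.0,   "NY", "NY"),
])

def fraud_score(amount, home_state, txn_state):
    """Toy scoring rule standing in for a real analytic model."""
    score = 0.0
    if amount > 5000:
        score += 0.6
    if home_state != txn_state:
        score += 0.3
    return score

# Register the scoring logic so it executes inside the database engine.
conn.create_function("fraud_score", 3, fraud_score)
flagged = conn.execute(
    "SELECT id FROM txn WHERE fraud_score(amount, home_state, txn_state) > 0.5"
).fetchall()
print(flagged)  # [(2,)]
```

Only the flagged IDs ever leave the database; the full transaction rows stay put.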

Financial institutions use real-time analytics solutions to continuously assess risk positions and market opportunities so they can stay competitive. In an industry where fractions of a second can mean success or failure, in-database analysis of historical data and streaming feeds provides fast query execution and immediate risk understanding across multiple business units, so traders can make split-second decisions. A private stock exchange in Asia uses in-database analytics to run a comprehensive system that detects abusive trading patterns and fraud. Credit card companies rely on the speed and accuracy of in-database analytics to identify possibly fraudulent transactions. By storing years’ worth of usage data, they can flag atypical amounts, locations, and retailers, and follow up with cardholders before authorizing suspicious activity.

For enterprises around the world, in many industries, in-database analytics are providing a competitive advantage. When data doesn’t have to commute to work and back, it can deliver faster insights that help businesspeople make informed decisions in real time—for less expense than traditional data analysis tools.
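The PMML transfer step described earlier can also be sketched. The fragment below is a hypothetical, heavily simplified PMML-style document (a real PMML model carries a namespace, a MiningSchema, and far more detail); the sketch parses it with the standard library and scores a record with the extracted coefficients instead of recoding the model by hand.

```python
import xml.etree.ElementTree as ET

# Toy PMML-style fragment (simplified; not a complete, valid PMML model)
# describing a one-variable linear regression.
PMML_DOC = """\
<PMML version="4.2">
  <RegressionModel functionName="regression" targetFieldName="risk">
    <RegressionTable intercept="0.1">
      <NumericPredictor name="age" coefficient="0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# "Translate" the model: pull the intercept and coefficients out of
# the markup rather than reimplementing the model from scratch.
table = ET.fromstring(PMML_DOC).find("RegressionModel/RegressionTable")
intercept = float(table.get("intercept"))
coeffs = {p.get("name"): float(p.get("coefficient"))
          for p in table.findall("NumericPredictor")}

def score(row):
    """Score one record with the transferred model."""
    return intercept + sum(coeffs[name] * value for name, value in row.items())

print(score({"age": 45}))  # 0.1 + 0.02 * 45, i.e. about 1.0
```

This is the minutes-not-months translation in miniature: the markup, not hand-written code, carries the model.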

Courtney Claussen is a product manager at SAP, concentrating on SAP’s data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support, and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.

Financial institutions use real-time analytics solutions to continuously assess risk positions and market opportunities



Analytics Advantage

Data Variety Is the Spice of Analytics Today’s data is as varied and diverse as the entities that create it. Organizations that learn to roll with the dynamic nature of complex data open a new world of business opportunity.

By Amr Awadallah, CTO, Cloudera

Great decisions are usually the result of lots of data. But for many years, businesses that wanted to leverage unstructured or complex data to aid their decision making were limited to what they could glean from extract, transform, load (ETL) processes. Basically, if you couldn’t store it in a structured database, it wasn’t a decision resource.

New heights of profitability and dozens of new business models are based entirely on insights that were previously inaccessible.


That is changing rapidly. Thanks to technologies such as Hadoop, businesses today can store raw data in a boggling array of formats and combine it all together in comprehensive analysis. Making sense of this variety is certainly an IT challenge, but it is also the source of great opportunity. When organizations properly instrument their analytics infrastructure to combine varied sources of data and react on the fly to changes in data attributes and schemas, the results can be game-changing. The insights made possible by this dynamic approach to complex, aggregated data are unprecedented in business. For example, the data can tell you—with a high degree of accuracy—what to sell, where to advertise, or when to try something new. They are responsible for new heights of profitability and dozens of new business models based entirely on insights that were previously inaccessible.

The New Data Reality

We are surrounded by data and yet, until recently, most of it was of little use in its raw form. Each data type is typically so unlike the next in its syntax and structure that comparing— or even storing—such records side-by-side in a relational


paradigm was impossible. Some examples of nonrelational data types include: • sensor output • mobile device data • machine logs • images and video • social media

Today, our most powerful tool for leveraging the potential of data variety is Hadoop. Hadoop makes no attempt to understand data as it is copied. Rather it uses a schema-on-read methodology: It parses data and extracts the required schema only when data is read for processing. Because no development cycle is required to accommodate new values or columns, agility and flexibility are maximized.
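A minimal sketch of schema-on-read, in plain Python with invented sensor records (Hadoop itself is far more involved, but the principle is the same): raw records are stored untouched, and fields are extracted only at read time, so a record that adds a new attribute requires no schema migration.

```python
import json

# Raw records stored as-is; the second one carries a field the first
# never declared, which a schema-on-write system would have rejected.
raw_records = [
    '{"device": "sensor-1", "temp_c": 21.5}',
    '{"device": "sensor-2", "temp_c": 22.1, "humidity": 0.4}',
    '{"device": "sensor-1", "temp_c": 23.0}',
]

def read_with_schema(records, fields):
    """Parse each record and project the requested fields at read
    time, tolerating attributes that are absent."""
    for line in records:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}

rows = list(read_with_schema(raw_records, ["device", "humidity"]))
print(rows[1])  # {'device': 'sensor-2', 'humidity': 0.4}
```

The schema lives in the query, not the store: asking tomorrow for a field that appears only in tomorrow’s records needs no development cycle.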

These data are very different in character and structure than any data that was previously suitable for analysis. They are complex. They lack schemas, fixed names, or fixed types. They have nested structures rather than tables.

Solving New Problems

Processing data in this way is a bit like alchemy. While individually inert, these data combine to become something far larger than the sum of the parts. At Cloudera we have helped dozens of customers create powerful competitive advantages simply by seizing the opportunity that lies dormant in their complex data. These businesses span the gamut of industries and include agriculture, finance, manufacturing, and many more.

But what is most notably different about these data types compared to traditional structured data is that these data frequently change form. The data characteristics that we care about today may not be the same ones that we value tomorrow; thus, individual attributes are often added, dropped, or modified. This dynamic property of new data types is a 180-degree shift from data processing in the past. Traditional structured data doesn’t change form very often, and the set of analytical procedures performed on this data are very well defined. Therefore, when building out relational database systems, organizations could afford to develop a static schema and react to infrequent changes on an ad hoc basis. But in the new data reality, organizations need instrumentation that adapts quickly to change, enabling them to answer questions they didn’t anticipate or build into their data models.

Consumer Goods. A maker of consumer products collects consumer preference and purchasing data extracted from surveys, purchases, web logs, product reviews from online retailers, phone conversations with customer call centers, even raw text picked up from around the Web. Their ambitious goal: to collect everything being said and communicated publicly about their products and extract meaning from it. By doing this, the company develops a nuanced understanding of why certain products succeed and why others fail. They can spot trends that can help them feature the right products in the right marketing media.

Figure 1. Hadoop Use Cases: Two Core Use Cases Applied Across Verticals

Vertical         Advanced Analytics                Data Processing
Web              Social networking analysis        Clickstream sessionization
Media            Content optimization              Engagement
Telco            Network analytics                 Mediation
Retail           Loyalty and promotions analysis   Data factory
Financial        Fraud analysis                    Trade reconciliation
Federal          Entity analysis                   Signals intelligence (SIGINT)
Bioinformatics   Sequencing analysis               Genome mapping

Source: Cloudera

Where to Use Hadoop: In every vertical there are data tasks with which Hadoop can assist. These tasks have different terms depending on the industry, but they all come down to either advanced analytics or data processing.



Agriculture. A biotechnology firm uses sensor data to optimize crop efficiency. It plants test crops and runs simulations to measure how plants react to various changes in condition. Its data environment constantly adjusts to changes in the attributes of the various data it collects, including temperature, water levels, soil composition, growth, output, and gene sequencing of each plant in the test bed. These simulations allow it to discover the optimal environmental conditions for specific gene types.

Finance. A major financial institution grew wary of using third-party credit scoring when evaluating new credit applications. Today the bank performs its own credit score analysis for existing customers using a wide range of data, including checking, savings, credit card, mortgage, and investment data.

A Multifaceted Advantage

While a nonrelational approach is great for encapsulating and drawing inferences from multivariate data, there are other advantages as well:

Scale: Hadoop is the only technology proven to scale to 80 petabytes in a single instance, making the size of the data challenge moot for most organizations.

Economy: Designed from the ground up to deal intelligently with commodity hardware, Hadoop can help organizations transition to low-cost servers.

Conservation: Keeping data in a merged, isolated system provides business intelligence benefits and is both financially and ecologically sound.

These are compelling advantages, but for many organizations that have been wishing for years to perform analytics on corporate data that can’t be normalized, they are icing on the cake. By transforming data of every type from a cost and management burden into a critical asset, a nonrelational approach to data is raising the efficiency of business around the globe.

Amr Awadallah is co-founder and CTO of Cloudera, where he is responsible for all engineering efforts from product development to release, for both open source projects and Cloudera’s proprietary software. Prior to Cloudera, Amr served as Vice President of Engineering at Yahoo, where he led a team that used Hadoop extensively for data analysis and business intelligence across Yahoo’s online services. Amr holds bachelor’s and master’s degrees in electrical engineering from Cairo University, Egypt, and a doctorate in electrical engineering from Stanford University.

In the new data reality, organizations need instrumentation that adapts quickly to change, enabling them to answer questions they didn’t anticipate or build into their data models.



TEXT SEARCH, TEXT ANALYTICS, SEMANTICS, AND OTHER ANALYTICS STRATEGIES HELP MACHINES UNDERSTAND PEOPLE AND EXPOSE MEANINGFUL BUSINESS INFORMATION.

Text Analytics for Speed Reading—Do You Mean What You Say? By Seth Grimes, Strategy Consultant and Industry Analyst, Alta Plana

Text analytics sounds abstruse, but the central idea is simple. Text analytics turns human communications—news articles, blogs, social status updates and online reviews, corporate filings, e-mail, and survey responses—into data that can be crunched to support fast, accurate, optimal business decision making.

Text analytics seeks answers, not the links and documents retrieved by most search systems. Answers are found in the information content of documents.


Potential applications in customer service and support, market research and competitive intelligence, life sciences and clinical medicine, financial services and capital markets, law enforcement and intelligence, and other business tasks and industries can readily leverage text analytics. Businesses that do not apply these new strategies are missing out on valuable insight. The algorithms are quite interesting: They apply statistics, linguistics, and machine learning to discern and exploit information captured in an array of textual sources. Automated solutions take adopters far beyond the capacity and speed, and sometimes the pattern-recognition acuity, achievable via human analysis. Text analytics is part of every comprehensive business intelligence (BI) program, which is itself a component of any complete analytics strategy.

Why Text Analytics?

Text conveys both quantitative and qualitative information; it records and communicates events, data, facts, and opinions.


Think of all the text that individuals, businesses, governments, and communities of all stripes generate:

• A corporate 10-K includes financial data tables and extensive narrative describing products and services, market conditions, operations, and outlook.

• A news article or opinion piece covers and provides context for events—it may describe people, organizations, and relationships, as well as locations, products, and actions—in narrative form, intended to inform or inspire.

• A warranty or insurance claim details product or service defects and damage, with significant quality, liability, customer-relationship, and reputation implications; a close reading may also expose fraud.

• E-mail traffic is part of corporate decision-making processes, but e-mail use creates risk of inadvertent (or intentional) exposure of sensitive or proprietary information, with compliance repercussions.

• A hotel or restaurant visitor posts likes and dislikes, visible to the world, to an online forum such as TripAdvisor or Yelp. These reviews, and similar examples such as Amazon-posted product reviews, expose preferences and flaws and influence consumers’ choices.

The richness and diversity of text sources create both opportunities and challenges. How do you systematically get the information you need and filter the noise? How do you turn text-sourced information into insights that can drive better decision making?

From Information to Insight

Text analytics involves a few basic, transformative steps:

1) Collect and select source material, whether via Web retrieval or via hooks into an e-mail system, document repository, database, or file system.

2) Extract the business-relevant information you need.

3) Apply BI and data-mining techniques that automate text processing and help you generate insights.

Information sourcing often starts with search, with a news feed, or via a software connector or application programming interface (API) into an external system. Search alone, it should be noted, is rarely sufficient in today’s fast-paced business environments. Text analytics seeks answers, not the links and documents retrieved by most search systems. Answers are found in the information content of documents, especially in the ensemble of linked documents and databases.

Text analytics extracts features such as:

• Entities, such as people, places, companies, products and brands, and ticker symbols

• Patterned information, such as telephone numbers, e-mail addresses, and dates

• Topics, themes, and concepts

• Associations, facts, relationships, and events, such as a person’s job title, a stock closing price, or a vote in Congress

• Opinions, attitudes, and emotions, such as sentiment about the spectrum of entities and topics
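A toy sketch of the simplest of these feature types, patterned information, using regular expressions (the sample text and patterns are invented; production text analytics layers linguistics and machine learning on top of this kind of extraction):

```python
import re

TEXT = "Contact jane.doe@example.com or call 555-867-5309 before 2012-06-30."

# Deliberately simple patterns for three kinds of "patterned information."
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
    "date":  r"\b\d{4}-\d{2}-\d{2}\b",
}

# Run every pattern over the text and collect the matches per feature type.
features = {name: re.findall(rx, TEXT) for name, rx in PATTERNS.items()}
print(features["email"])  # ['jane.doe@example.com']
print(features["phone"])  # ['555-867-5309']
```

Entities, topics, and sentiment demand far more machinery than this, but the output has the same shape: structured records extracted from unstructured text.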

Semantic computing overlays meaning on data objects and enables content enrichment, advanced categorization, and classification.



Semantic information is of particular interest because this information helps us interrelate text-sourced entities and link them to database records and into the emerging semantic Web. Semantic computing overlays meaning on data objects. It enables content enrichment, advanced categorization and classification, dynamic data integration, and semantic search that extends beyond keywords to cover concepts, patterns, and relationships. Text analytics generates the semantic information that fuels the linked-data Web as well as next-generation BI and data mining. There’s huge analytical lift in adding semantic, text-sourced information to the enterprise analytics mix, enabling integrated analysis of text and data.

Text as Big Data

A text-analytics solution can systematically, accurately, and quickly extract whatever information content interests a business. Text’s volume, velocity, and variety—the three Vs of Big Data—do pose a challenge, however. Text is produced 24 hours a day: online, via social media, within the enterprise, in informal chatter and formal settings (such as science labs, courts, and corporations), and in dozens of business-relevant languages around the globe. Fortunately, text-analytics software can handle large data volumes via parallelized, distributed computing frameworks such as Apache Hadoop. The software runs in-memory to cope with data velocity and semantics, and the solutions are tailored to text’s many forms and languages to help discover opportunity in data variety.


Users benefit from a choice of installed, hosted, and cloud implementations. They can access the data via data-analysis workbenches, as a service (via APIs), and embedded in line-of-business applications. Options available are both commercial and free, with open source distributions. Challenges still exist, but the benefits are huge and the barriers to getting started are low. Text analytics is here-and-now, meeting a spectrum of business needs. The idea is simple: Text analytics helps machines understand people. It erases enterprise data boundaries and is a key source of competitive advantage in 2012 and beyond.

Text-analytics solutions are tailored to text’s many forms and languages to help discover opportunity in data variety.

Seth Grimes is an analytics strategy consultant with Alta Plana Corporation, located near Washington, DC, and a leading IT industry observer focusing on business intelligence, text analytics, and decision support. Grimes is a longtime InformationWeek contributing editor and founding chair of the Sentiment Analysis Symposium and the Text Analytics Summit.


WITH FACIAL RECOGNITION SOFTWARE, ANYONE CAN SORT AND IDENTIFY A SINGLE FACE FROM AMONG HUNDREDS, EVEN THOUSANDS, OF POSSIBILITIES.

Image Recognition, Pattern Identification, and the New Memory Game By Joydeep Das, Director, Data Warehousing and Analytics Product Management, SAP

Luckily for society, image-recognition technology has become widely available to automate image analysis quickly. Without automated systems, the torrents of image data encountered daily would overwhelm us. Detecting subtle differences among thousands, even millions of images within reasonable time constraints is beyond human abilities. Simply put, professionals in a variety of enterprises could not get their jobs done without computer-aided image recognition. Take facial recognition technology. As with any image recognition system, it relies on compute-intensive algorithms to determine a person’s unique features—eyes, mouth, nose, and more—to positively identify him or her. But it’s no longer the rarefied task of highly skilled professionals, using banks of servers to crunch enormous amounts of data, to identify a single face. It’s become an everyday tool for consumers, business, and government to improve their personal lives, target customers better, and deliver services to citizens more efficiently.



I See You

Today, consumers with off-the-shelf desktop computers use facial recognition software embedded in their photo-management applications to sort through thousands of digital photographs. With it, they can quickly and easily organize images of family members and friends. Popular social networking sites, such as Facebook and Google+, include automated “tagging” services that can detect individuals within uploaded photographs by comparing their faces to previously identified people in a member’s image portfolio.

Perhaps more impressive, people now carry facial recognition technology in their pockets. Users of iPhone and Android smartphones have applications at their fingertips that use facial recognition for various tasks. For example, Android users with the remembAR app can snap a photo of someone, then bring up stored information about that person based on their image when their own memory lets them down—a potential boon for salespeople. iPhone users can unlock their device with RecognizeMe, an app that uses facial recognition in lieu of a password. Deployed across a large enterprise, such an app could save an average of $2.5 million a year in help-desk costs for handling forgotten passwords.

Marketers have begun to use facial recognition software to learn how well their advertising succeeds or fails at stimulating interest in their products. A recent study published in the Harvard Business Review looked at what kinds of advertisements compelled viewers to continue watching and what turned viewers off. Among the researchers’ tools was “a system that analyzes facial expressions to reveal what viewers are feeling.” The research was designed to discover what kinds of promotions induced watchers to share the ads with their social networks, helping marketers create ads most likely to “go viral” and improve sales.


Air passengers with ePassports in Australia and New Zealand can get first-class service through customs thanks to a facial recognition deployment called SmartGate. Instead of waiting in line for a border control officer, eligible travelers use a kiosk that combines facial recognition software with data held on the ePassport to process the person through customs. More than 1 million travelers have used SmartGate, and officials have deemed it “an unqualified success.”

State of the Art

SAP Sybase IQ is a platform for facial recognition technology in a variety of applications. It stores the image data and executes the image-processing functions inside the database through the User Defined Function interface.

In one implementation, every new image is represented by numeric data, called an “Image DNA,” which can be compared with the DNA of stored images. First, an IQ database is loaded with a set of training images having particular characteristics. Then a batch of new images is compared against the set of training images to filter out those with similar characteristics, resulting in a much smaller image set to be analyzed further.

A filtered and processed image is shown with people’s faces outlined with colored boxes. Users select a particular face, click a “Search” button, and within a few seconds all images in the database that include that specific person appear. Such a tool can be used for everything from law enforcement to deleting unwanted photos of yourself on Facebook.

Face the Facts

However, the ubiquity of facial recognition technology has raised some thorny social and legal issues, especially when combined with the widespread adoption of social networks. For example, in what has been called “digital vigilantism,” after the London riots in summer 2011 a Google group called London Riots Facial Recognition attempted to identify lawbreakers by matching their images caught on CCTV cameras with those on Facebook pages. The ad hoc group abandoned the effort when tests proved disappointing. In Canada, authorities attempted to combine Facebook information and ICBC image data with photos taken during the riots in Vancouver after the local team failed to win the Stanley Cup. But a court ruled that the authorities would need a warrant to use facial recognition tools in this instance.

Despite the murky legal and privacy ramifications of facial recognition, the technology is now widely available and will not vanish. It will be an increasingly vital tool to help consumers manage their digital selves while helping business and government deliver improved products and services more efficiently and securely. n

Joydeep Das has worked in the field of enterprise databases for over 20 years in leadership roles in engineering and product management. As an engineer, he led several research and development projects at leading DBMS firms. In his product management role, Das has been a strong advocate of SAP’s data warehousing product line, setting its product and business strategy and managing its day-to-day operations. He frequently speaks at tradeshows, user conferences, and webcasts.
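The “Image DNA” comparison described in this article reduces each image to a numeric feature vector and keeps only the new images that land close to some training image. A minimal sketch of that filtering step, with invented three-component vectors and an arbitrary distance threshold (not SAP Sybase IQ’s actual representation):

```python
import math

# Sketch of Image-DNA-style filtering: keep a candidate image only if
# its feature vector falls within a threshold of a training vector.
# The vectors and threshold are invented for illustration.

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_similar(candidates, training_set, threshold):
    """Keep candidates within `threshold` of any training vector."""
    return [c for c in candidates
            if any(distance(c, t) <= threshold for t in training_set)]

training = [(0.9, 0.1, 0.4), (0.2, 0.8, 0.5)]
new_images = [(0.88, 0.12, 0.41), (0.1, 0.1, 0.9)]
print(filter_similar(new_images, training, threshold=0.1))
# [(0.88, 0.12, 0.41)]
```

The result is the “much smaller image set” the article mentions, which can then be handed to more expensive analysis.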



WHEN IT COMES TO BIG DATA, PEOPLE UP AND DOWN HIGHWAY 101 IN SILICON VALLEY TALK ABOUT TECHNOLOGIES SUCH AS NOSQL, HADOOP, AND MAPREDUCE AS THOUGH THEY’LL SOLVE ALL OF OUR PROBLEMS. WHILE THESE ARE CERTAINLY EXCITING NEW CAPABILITIES, TECHNOLOGY ALONE IS ABOUT 10% OF THE ANSWER TO BIG DATA.

Technology Alone is Not the Answer By Byron Banks, Vice President, Business Analytics, SAP

I’m Byron Banks, and I have more than 20 years of experience with enterprise applications. I manage a solution marketing team at SAP focused on enterprise information management (EIM) and data warehousing (DW) solutions, so I am no stranger to the challenges organizations face in improving business results by providing integrated, accurate, and trusted information throughout the enterprise. I do not want to discount the contribution of technologies to solving the Big Data dilemma. Hadoop, MapReduce, and other recent innovations are helping companies deal with ever-increasing amounts of data, whether they’re working with traditional rows of transactional data inside enterprise applications or information in documents, images, video, and the whole universe of social media out on the Web.

From a technology point of view, we’ve made a lot of progress in giving people better tools to solve the technical challenge of dealing with this massive amount of ever-changing data. What we also need to do is help companies leverage this data to support business objectives—be it to improve the efficiency and operations of a business area or to make better business decisions of almost any kind.

For example, technology has allowed people to spend entire days reading e-mail, scanning Facebook, and staying current with what’s happening on the Web. But does this make them more productive at work? Yes and no. No, in the sense that access to more information doesn’t by itself make you a better employee, and access to everything at once can actually be overwhelming. Yes, in the sense that insight into the “right” information can help you close a deal or deliver better customer service if it reaches you in a way that informs the task at hand.

Big Data Equals More Business Insight

At SAP, we are focusing not only on the technology dimension of Big Data but also on how to integrate these new innovations into business solutions that help individual lines of business and industries identify which pieces of the data stream people need access to, and how to turn the data into actionable information that the business can understand and use. The real opportunity with Big Data is that it gives business users more sources of knowledge to tap into and combine with the sales and inventory data stored in traditional data warehouses, and thereby get a better, more complete understanding of how their customers perceive them and their brand, what products and services are most appealing, and perhaps what the competition is up to. This better, more complete picture of the market then informs business users on what they should do next—such as when to run promotions, adjust pricing, or plan new product enhancements.

For example, let’s say you are a product manager in athletic footwear trying to decide what the next update of your product should be. You’re designing next season’s running shoe, and you need to figure out when would be the right time to introduce the next version and what design changes you may want to incorporate. Part of that decision will be based on how well the current version is selling, the inventory level, and the cost and profitability of the current version. A lot of that information is easily accessible in enterprise systems you already have. But once you conclude that inventory is low, or that price discounting is increasing due to competition, then maybe it’s time to start planning a product update. What will that update entail?

Big Data on the Run

With running shoes, one popular trend for the past few years has been the very minimal, lightweight running shoe. Based on past sales data alone, you could see the recent strong demand and decide to create another minimalist shoe with a new color and pattern, and that’s your update. Good product managers would also go to industry events, read the relevant press and magazines, maybe work with a consultant for more knowledge about industry trends, or even conduct a focus group or two. That’s how you would have proceeded in the past. But wouldn’t it be better to augment that with more insight and analysis based on hard data? By leveraging these new Big Data technologies and integrating them with existing business solutions and processes, innovative organizations can now give that product manager a lot more insight to validate the decisions being contemplated. In the realm of athletic footwear, there’s a huge amount of discussion occurring in online communities, blogs, expert commentary, and online magazines.
The challenge is that there is so much discussion going on, it is more than a person, or team of people, can read and analyze on their own.
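The automated aggregation that makes this tractable can be sketched, at toy scale, as keyword counting over a stream of posts. The posts and keywords below are invented; real systems apply full text analytics to millions of documents rather than a hand-picked word list.

```python
from collections import Counter

# Toy trend detection over social posts: count mentions of terms of
# interest. Posts and keywords are invented for illustration.
posts = [
    "Love my minimalist shoes but my achilles is sore",
    "achilles pain again after the weekend run",
    "new colorway looks great",
    "minimalist shoe gave me achilles trouble",
]

KEYWORDS = {"achilles", "minimalist"}

def keyword_trends(posts):
    """Tally how often each tracked keyword appears across posts."""
    counts = Counter()
    for post in posts:
        for word in post.lower().split():
            if word in KEYWORDS:
                counts[word] += 1
    return counts

print(keyword_trends(posts).most_common())
# [('achilles', 3), ('minimalist', 2)]
```

Even this crude tally surfaces the pattern a person could not spot by reading posts one at a time, which is the point the article goes on to make.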


However, by using these new technologies to monitor, aggregate, and analyze these numerous communities, tracking running publications and even following influential runners and coaches on their blogs and Twitter feeds, an application can detect patterns and highlight trends across millions of individual postings. One trend discovered could be that a segment of the market—maybe the “weekend warrior” runner—is encountering Achilles tendon injuries, a serious problem for runners, when wearing the minimalist type of running shoe. With this type of information in hand, the product manager can make more informed business decisions about how to plan the full product line and associated marketing campaigns: products and advertising that appeal to the type of runner who will do well with minimalist footwear, alongside traditional running footwear styles and promotional spending aimed at the people not suited to the “barefoot” trend. Used effectively, Big Data combined with your existing enterprise data can help you get closer to your market and your business, shifting traditional conversations around pricing and profitability to ones that consider a holistic view of not only what happened yesterday but also what is happening now, next week, and next month. By doing this, you don’t replace your current best practices for, say, product planning; you augment them with additional information sources so that you can ask new questions and discover trends and insights you may not have realized about your business. n

Byron Banks has more than 20 years of experience with enterprise applications. He currently manages a solution marketing team that is focused on enterprise information management (EIM) and data warehousing (DW) solutions that enable organizations to improve business results by making integrated, accurate, and trusted information available throughout the enterprise.


Analytics Innovations

What’s All the Hadoop-la About? Hadoop can bring value to Big Data analysis projects, but it’s not the solution to every need.

By Wayne Eckerson, Principal, BI Leader Consulting

There are two types of Big Data in the market today. There is open source software, focused largely around Hadoop, which eliminates upfront licensing costs for managing and processing large volumes of data. And then there are new analytical engines, including appliances and column stores, which provide significantly higher price-performance than the general-purpose relational databases that have dominated the market for three decades. Both sets of Big Data software deliver higher returns on investment than previous generations of data management technology, but in vastly different ways.



Hadoop and NoSQL

Free Software. Hadoop is an open-source framework, built around a distributed file system and the MapReduce programming model, that can store and process large volumes of data in parallel across a grid of commodity servers. Hadoop emanated from Web companies such as Google and Yahoo!, which needed a cost-effective way to build search indexes. Engineers at these companies knew that traditional relational databases would be prohibitively expensive and technically unwieldy, so they came up with an alternative that they built themselves. Eventually, the software was given to the Apache Software Foundation so others could benefit from the innovations. Today, many companies are implementing Hadoop software from Apache as well as from third-party providers such as Cloudera, Hortonworks, EMC, and IBM.

Developers see Hadoop as a cost-effective way to get their arms around large volumes of data. Companies are using Hadoop to store, process, and analyze large volumes of Web log data so they can get a better feel for the browsing and shopping behavior of their customers. Previously, most companies outsourced the analysis of their clickstream data or simply let it “fall on the floor” since they couldn’t process it in a timely and cost-effective way.

Data Agnostic. Besides being free, the other major advantage of Hadoop software is that it can handle any type of data.

Big Data Analytics Guide

Unlike a data warehouse or traditional relational database, Hadoop doesn’t require administrators to model or transform data before they load it. With Hadoop, you simply load and go. This significantly reduces the cost compared to a data warehouse. Most experts assert that 60 to 80% of the cost of building a data warehouse, which can run into the tens of millions of dollars, involves extracting, transforming, and loading (ETL) data. Hadoop virtually eliminates this cost.
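“Load and go” means structure is imposed when the data is read, not when it is loaded (often called schema-on-read). A minimal sketch, assuming an invented space-delimited log format; notice how anomalies surface at query time, which is the trade-off the next section describes:

```python
# Schema-on-read sketch: raw lines are stored as-is, and structure is
# imposed only when a query runs. The log format is invented.
raw_lines = [
    "2012-05-01 alice /products 200",
    "2012-05-01 bob /checkout 500",
    "malformed line",
]

def parse(line):
    """Impose structure at read time; skip records that don't fit."""
    parts = line.split()
    if len(parts) != 4:
        return None  # anomalies surface here, at query time
    date, user, path, status = parts
    return {"date": date, "user": user, "path": path,
            "status": int(status)}

records = [r for r in (parse(l) for l in raw_lines) if r]
errors = [r for r in records if r["status"] >= 500]
print(len(records), len(errors))  # prints "2 1"
```

Nothing was modeled or transformed up front, but every query now pays the parsing and validation cost that a data warehouse would have paid once at load time.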

As a result, many companies are starting to use Hadoop as a general-purpose staging area and archive for all their data. A telecommunications company, for example, can store 12 months of call detail records instead of aggregating that data in the data warehouse and rolling the details to offline storage. With Hadoop, it can keep all its data online, eliminate the cost of data archival systems, and feed the data warehouse with the subsets or aggregates that a majority of users want to view. It can also let power users query Hadoop data directly if they want to access all the details or can’t wait for the aggregates to be loaded into the data warehouse.

Hidden Costs. Of course, nothing in technology is ever free. When it comes to processing data, you either “pay the piper” upfront, as in the data warehousing world, or at query time, as in the Hadoop world. Before querying Hadoop data, a developer needs to understand the structure of the data and all of its anomalies. With a clean, well-understood, homogeneous data set, this is not difficult. But few corporate data sets fit this description. So a Hadoop developer ends up playing the role of a data warehousing developer at query time, interrogating the data and making sure its format and contents match expectations. Querying Hadoop today is a “buyer beware” environment. Moreover, to run Big Data software, you still need to purchase, install, and manage commodity servers (unless you run your Big Data environment in the cloud, say through Amazon Web Services). While each server may not cost a lot, collectively the price adds up. What’s more costly is the expertise and software required to administer Hadoop and manage grids of commodity servers. Hadoop is still bleeding-edge technology, and few people have the skills or experience to run it efficiently in a production environment. These specialists are hard to find, and they don’t come cheap. Members of the Apache Software Foundation admit that Hadoop’s latest release is equivalent to version 1.0 software, so even the experts have a lot to learn as the technology evolves at a rapid pace. Nonetheless, Hadoop and its NoSQL brethren have opened up a vast new frontier for organizations to profit from their data.

Analytic Platforms

The other type of Big Data predates Hadoop and the NoSQL variants by several years. This version of Big Data is less a “movement” than an extension of existing relational database technology optimized for query processing. These analytical platforms span a range of technology, from appliances and columnar databases to shared-nothing, massively parallel processing (MPP) databases. The common thread among them is that most are read-only environments that deliver exceptional price-performance compared to general-purpose relational databases originally designed to run transaction-processing applications.

Sybase (now SAP) laid the groundwork for the analytical platform market when it launched the first columnar database in 1995. Teradata was also an early forerunner, shipping the first analytical appliance in the early 1980s. Netezza kicked the current market into high gear in 2003 when it unveiled a popular analytical appliance and was soon followed by dozens of startups. Recognizing the opportunity, all the big names in software and hardware—Oracle, IBM, HP, and SAP—subsequently jumped into the market, either by building or buying technology, to provide purpose-built analytical systems to new and existing customers.

Although the price tag of these systems often exceeds a million dollars, customers find that the exceptional price-performance delivers significant business value, in both tangible and intangible form. XO Communications recovered $3 million in lost revenue from a new revenue assurance application it built on an analytical appliance, even before it had paid for the system. It subsequently built or migrated a dozen applications to run on the new purpose-built system, testifying to its value.
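The price-performance edge of columnar databases comes largely from layout: an aggregate query reads only the columns it touches instead of every field of every row. A toy illustration of the two layouts, with invented data and no pretense of modeling any vendor’s engine:

```python
# Contrast row-store and column-store layouts for one aggregate query.
# The sales data is invented for illustration.
rows = [
    {"region": "east", "units": 120, "price": 9.5},
    {"region": "west", "units": 80,  "price": 9.5},
    {"region": "east", "units": 50,  "price": 10.0},
]

# Row store: every whole record is touched to sum one field.
total_row = sum(r["units"] for r in rows)

# Column store: the same data kept as per-column arrays, so the
# aggregate reads just the one column it needs.
columns = {
    "region": ["east", "west", "east"],
    "units":  [120, 80, 50],
    "price":  [9.5, 9.5, 10.0],
}
total_col = sum(columns["units"])

print(total_row, total_col)  # prints "250 250"
```

Both layouts return the same answer; at millions of rows and wide tables, reading one column instead of all of them is where the speedup comes from.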




To gain a competitive edge for its online automobile valuations, Kelley Blue Book purchased an analytical appliance to run its data warehouse, which was experiencing performance issues. The new system reduces the time needed to process hundreds of millions of automobile valuations from one week to one day. Kelley Blue Book now uses the system to analyze its Web advertising business and deliver dynamic pricing for its Web ads. Challenges. Given the upfront costs of analytical platforms, organizations usually undertake a thorough evaluation of these systems before jumping on board. First, companies must assess whether an analytical platform sufficiently outperforms their existing data warehouse database, which requires testing the systems in their own data centers using their own data across a range of queries. The good news is that the new analytical platforms usually deliver jaw-dropping performance for most queries tested. In fact, many customers don’t believe the initial results and rerun the queries to make sure that the results are valid. Second, companies must choose from more than two dozen analytical platforms on the market today. They must decide whether to purchase an appliance or a software-only system, a columnar database or an MPP database, or an on-premise system or a Web service. Finally, companies must decide what role an analytical platform will play in their data warehousing architectures. Should it serve as the data warehousing platform? If so, does it handle multiple workloads easily or is it a one-trick pony? If the latter, which applications and data sets should be offloaded to the new system? How do you rationalize having two data warehousing environments instead of one?

Today, we find that companies that have tapped out their SQL Server or MySQL data warehouses often replace them with analytical platforms to get better performance. However, companies that have implemented an enterprise data warehouse on Oracle, Teradata, or IBM database systems often find that the best use of analytical platforms is to sit alongside the data warehouse and offload existing analytical workloads or handle new applications. By offloading work, the analytical platform can help organizations avoid a costly upgrade to their data-warehousing platform, which might easily exceed the cost of purchasing an analytical platform.

Summary

The Big Data movement consists of two separate, but interrelated, markets: one for Hadoop and open source data management software and the other for purpose-built, analytical engines with SQL databases optimized for query processing. Hadoop avoids most of the upfront licensing and loading costs endemic to traditional relational database systems. However, since the technology is still immature, there are hidden costs that have thus far kept many Hadoop implementations experimental in nature. On the other hand, analytical platforms are a more proven technology, but impose significant upfront licensing fees and potential migration costs. Companies wading into the waters of the Big Data stream need to evaluate their options carefully. n

Wayne Eckerson has been a thought leader in the business intelligence field since 1995. He has led numerous research studies and is a noted speaker, blogger, and consultant. He is the author of the best-selling book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, and is currently working on a new book that profiles analytical leaders. He is principal consultant at BI Leader Consulting and director of research at TechTarget.




IN SOME COMPANIES, BIG DATA IS CAUSING INFORMATION OVERLOAD FOR DECISION MAKERS. FOR OTHERS THAT LEVERAGE CEP TECHNOLOGY, IT’S OFFERING A COMPETITIVE ADVANTAGE.

Fast Flowing Decisions Through Streams of Data By Irfan Khan, Senior Vice President and Chief Technology Officer, SAP Database and Technology

Today, Big Data is swarming over the Internet at punishing volumes and velocities. And the smartest enterprise executives in a variety of markets are pushing their organizations to embrace Complex Event Processing (CEP) as well as in-memory database technologies to analyze and act upon vast amounts of data in the blink of an eye. They do so knowing that for their companies to gain or retain competitive advantage it is imperative that they be able to make fast-flowing decisions through the onrushing streams of data.


When Real Time Is Real Money

Eventually, nearly every large organization will confront Big Data in its own way. But some businesses will face it sooner. Take the wireless telecommunications sector. Carriers are already drowning in data. Yet, with the arrival of 4G LTE networks, their current data deluge may very well look like a trickle. According to Cisco, mobile data traffic will grow 26-fold between 2010 and 2015, reaching 6.3 exabytes per month by 2015. That’s a stunning compound annual growth rate of 92%. Mobile IP traffic will grow 300% faster than fixed broadband IP traffic and will account for 8% of total IP traffic by 2015. But even those amazing growth figures may be too low. Researchers at the New Paradigm Resources Group say Cisco underestimated the impact of 4G LTE on IP traffic levels by a factor of 10.

Crunching this data is critical to wireless operators. Real-time CEP analytics and in-memory database technologies can be used to predict how these huge traffic volumes, with their attendant massive capacity spikes, will affect resource consumption. They can be used to allocate bandwidth on the fly and to target the deployment of resources to avoid network meltdown. Carriers that cannot detect critical changes to their infrastructure in real time risk rising call-failure rates, increased customer churn, and unfulfilled corporate service-level agreements that inevitably lead to nasty surprises on their balance sheets. In the end, it’s all about money.
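A CEP engine evaluates conditions continuously over windows of arriving events. A minimal sketch of that pattern, a sliding-window threshold check, with an invented window size and capacity limit (real engines express such rules declaratively over high-rate streams):

```python
from collections import deque

# Sketch of a CEP-style sliding-window rule: flag when total events
# over the last N ticks exceed a capacity threshold. The window size,
# threshold, and traffic numbers are invented for illustration.
class SpikeDetector:
    def __init__(self, window=3, threshold=250):
        self.window = deque(maxlen=window)  # keeps only last N ticks
        self.threshold = threshold

    def observe(self, events_per_tick):
        """Add a new reading; return True if the window is in spike."""
        self.window.append(events_per_tick)
        return sum(self.window) > self.threshold

detector = SpikeDetector()
traffic = [60, 70, 80, 150, 90]
alerts = [detector.observe(t) for t in traffic]
print(alerts)  # [False, False, False, True, True]
```

The point is the shape of the computation: each event is evaluated as it arrives, against recent history only, so the decision can be made while it still matters.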


And speaking of money: in the financial services sector, the growth in data and its speed of arrival are equally staggering. This sector has long been a leader in pushing the technology envelope to analyze huge amounts of data arriving at machine speed. Still, in the Big Data era, the information pours in at faster rates from numerous sources, potentially overloading conventional business intelligence systems. For example, one of many such data sources, the Options Price Reporting Authority (OPRA), provides details on completed trades and current options, among other vital aspects of the daily markets critical to many financial enterprises. It produces a vast amount of data every second, and its volume of information is growing at a phenomenal rate. In 1995 OPRA delivered 500 peak messages per second (MPS) to its clients. By 2005 that had increased to a whopping 83,000 peak MPS. And by 2010, only five years later, the OPRA feed had ballooned to 2,200,000 peak MPS. Each message averages 120 bytes, translating to 264 megabytes per second from OPRA’s data fire hose alone.
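The 264-megabyte figure follows directly from the message rate and the average message size:

```python
# Back-of-the-envelope check of the OPRA throughput figure cited above.
peak_mps = 2_200_000   # peak messages per second (2010)
avg_bytes = 120        # average bytes per message

bytes_per_second = peak_mps * avg_bytes
print(bytes_per_second / 1_000_000)  # 264.0 megabytes per second
```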

IRONIC CONVERGENCE

Among the ironic convergences in technology history, the year 1962 stands out. It was then, according to the Oxford English Dictionary, that the phrase “information overload” found its way into the language. It is also the year that the Computer History Museum assigns to the origin of the Internet. Of course, 1962 was notable for being neither a time of information overload nor of the Internet. For that convergence we had to wait until our day, the era of Big Data.

The 1960s were, however, when computer-generated data began to be used in formal “decision support” systems, the precursors to our modern analytics platforms. Basic quantitative models were developed to analyze information that, at the time, was considered too much for business managers to sift through. It’s amusing to consider that in those days data sets were measured in kilobytes, and 300-baud modems were the high-speed interconnects. Yet even back then, with relatively paltry amounts of information to evaluate over leisurely time frames, savvy business leaders understood that there was value to be gleaned from the data.

There’s no point in wishing we were back in simpler times when “information overload” was a printed green-bar report. Nostalgia has no place in our Big Data era. Luckily, CEP does.

And, as noted, OPRA is merely one of dozens of sources pumping out huge quantities of data. Yet, accepting this data is merely the cost of doing business for most financial services companies. Being able to apply CEP technology and instantly and effectively analyze and intelligently act on the data is the difference between a profit and a loss on each of the millions of daily transactions.

Blaming the volume and velocity of the data for failing to meet customer needs is not a valid excuse. CEP and in-memory database tools exist to conduct real-time analytics in a variety of business environments. Companies that use them will prosper. Companies that don’t will muddle along or fail. n

Prospering With CEP Organizations that are unprepared for the data deluge descending upon them are ripe for disaster. If they are unable to glean insight from all the information they are gathering, they will be battered in the marketplace. Consider retailers during the past holiday season that were unable to predict product demand, took orders they could not deliver, or prematurely changed customer service plans, resulting in widespread customer dissatisfaction and hits to their balance sheets and public images.

As Senior Vice President and Chief Technology Officer, Irfan Khan oversees all technology offices in each of Sybase’s business units, ensuring market needs and customer aspirations are reflected within the company’s innovation and product development. Khan is also responsible for setting the architecture and technology direction for the worldwide technical sales organization.




FROM RETAIL TO LAW ENFORCEMENT, SAVVY COMPANIES ARE FIGURING OUT THAT THE CONNECTIONS WE MAKE TO ONE ANOTHER ARE A VALUABLE SOURCE OF MARKETING INTELLIGENCE.

Age of Influence: Making the Most of Social Networks By Bruno Delahaye, Senior Vice President Worldwide Business Development, KXEN

In a world where media channels are saturated with noise, it’s more difficult than ever to be heard through traditional marketing. Marketers have learned that shouting louder is not effective. Reaching a smaller number of highly influential people—and letting them distribute your message—can be far more effective at a lower cost.
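The simplest starting point for finding those highly influential people is degree centrality: counting each person’s connections. A minimal sketch with an invented edge list; real social network analysis uses far richer measures over vastly larger graphs.

```python
from collections import Counter

# Toy degree centrality: influence approximated by connection count.
# The edge list is invented for illustration.
edges = [("ann", "bob"), ("ann", "cara"), ("ann", "dev"),
         ("bob", "cara"), ("dev", "eli")]

def degree_centrality(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

print(degree_centrality(edges).most_common(1))  # [('ann', 3)]
```

Degree counting scales trivially, which is one reason it is usually the first pass before more expensive measures of influence are computed.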



Welcome, then, to the age of influence. Whether it’s a movie recommendation, fashion advice, or simply directing others’ attention to an unusual YouTube video, influence is a hot commodity. But finding influential people is not as easy as it sounds. It requires advanced mathematical modeling techniques and high-powered computer networks. Despite these challenges, organizations are beginning to learn that social network analysis (SNA)—the process of modeling the relationships between connected people to find patterns—is a powerful source of customer insight.

A Sizable Challenge
The value of influence is hardly a well-kept secret. Ever since researcher Stanley Milgram published his study on the “six degrees of separation” in 1967, organizations have been trying to figure out how to reach the most influential people. But finding hubs of social influence is a complicated mathematical problem requiring advanced network analytics technology. Historically, such analysis has been expensive and complicated to support.

By their nature social networks are inclusive, involving individuals within and outside of the organization’s customer base. They also attempt to describe the nature of the relationship between individuals (nodes) in the model. Thus, the resulting data sets are very large; it’s not unusual to have billions of records in a social network. Just a few years ago, the computational time frame to run such a model would take weeks or months, to say nothing of the cost to store the data. Today the falling cost of storage and computer processing power has finally made SNA an achievable—even practical—undertaking. State-of-the-art in-memory technologies allow a virtually unlimited amount of information to be held in memory, providing a truly systematic solution for the massive analytic requirements of SNA.

Figure: A Social Network Model. SNA provides useful visualizations of the interconnectedness of people.
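As a minimal illustration of the modeling idea, the sketch below builds a toy social graph and ranks nodes by degree centrality, the simplest proxy for influence. Production SNA uses far richer measures over billions of edges; the names and call records here are invented for the example.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count each node's direct connections, the simplest proxy
    for influence in a social graph."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

# Hypothetical call-detail records: (caller, callee) pairs.
calls = [("ann", "bob"), ("ann", "carl"), ("ann", "dana"),
         ("bob", "carl"), ("eve", "dana")]

influence = degree_centrality(calls)
hub = max(influence, key=influence.get)  # the best-connected subscriber
```

A real analysis would weight edges by call frequency and duration and use measures such as betweenness or eigenvector centrality, but the shape of the problem (a very large edge list reduced to per-node scores) is the same.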


At the same time, new computational techniques have emerged to handle large volumes of social network data more efficiently. With less expensive infrastructure and better mathematical principles impelling it forward, SNA is poised to make big improvements in business intelligence quality for a wide variety of companies.

Tuning Predictive Power
There’s no such thing as a universal network. Every individual belongs to a range of different networks—for example, one for work, one for home; or perhaps weekday versus weekend behaviors—and within each of those networks the individual plays a different role. A person’s amount of influence can vary wildly from network to network. For example, he or she may be very influential in a technology-related network but be an advice seeker in a network related to fashion or fine art, having little or no influence on others. For marketers the trick is to leverage these differences by optimizing their analyses on a case-by-case basis. Social networks are especially useful when the results are used to amend existing predictive analytic efforts, thereby improving the results.


One of the most valuable uses of SNA is in predicting purchasing behaviors. SNA can show that communities of users tend to change their loyalties in a somewhat predictable manner. For example, in a close community of phone users, it frequently takes only one high-volume phone user to change carriers in order for other friends and family members to follow suit. This is powerful knowledge for a marketing organization: marketers can offer targeted incentives to prevent the exodus before it happens.

SNA can also help organizations make sense of the way that viral messages or campaigns diffuse throughout a community. While these messages are often passed among direct friends and acquaintances, many bloggers and Twitter users possess sufficient influence to spur a viral event without direct, person-to-person contact. By adding social network interactions to predictive analytics, you can extract results that tell about user behavior based on who users are connected with, not just about the individual in isolation.

Birds of a Feather
Not all social networks consist of individuals who know one another. That’s a misconception fueled by the popularity of social networking Web sites such as Facebook and LinkedIn, where individuals establish friends or follow one another directly, which should not be confused with SNA. Direct social networks—where the individuals in a given community all know one another—are generally only available to a few companies. These include phone network operators and transportation companies, which have unique access to telephone call data and travel itineraries.

But social networks don’t have to be direct communities of friends and acquaintances. Plenty of companies are getting high value out of indirect social networks—ones where the connections are measured in interest or similarity rather than familiarity. Some examples:

Retail. SNA can create communities based on product interests and buying patterns. For example, a community of strangers who buy designer handbags can be mined for other preferences and commonalities. Using predictive analytics, you may be able to figure out what purchases certain customers are likely to make soon.

Fraud/risk. Individuals or companies that engage in fraud are likely to have social connections to other fraudulent entities. Once you’ve identified one customer as a fraudster, SNA can help you identify other potential frauds through transactions, such as wire transfers or medical claims. Assessing risk of nonpayment or bankruptcy can be done in a similar fashion.

Law enforcement. Social networks can help investigators locate dangerous individuals through the people they are connected to. In fact, SNA was used by the U.S. government to track, and ultimately locate, Osama bin Laden.

Powerful Returns
SNA isn’t just a cool idea riding the coattails of today’s most popular online pastime. Regardless of how you use it, the results can be meaningful. A European telecom company improved the accuracy of its predictive analytics by 50%. An online auction site used SNA to offer users recommendations based on past interests and purchases, and saw a 30% lift in clicks as a result.

Interconnectedness of people is an old idea that is finally making the technology marriage it needs to reach its business potential. Whether the payoff is higher sales, preventing losses, or keeping customers more engaged and loyal, SNA is a breakthrough technology with broad applicability.

Bruno Delahaye is responsible for managing and developing strategic partnerships worldwide. Delahaye brings extensive management and technical experience to partner relationships, with over 10 years of experience in delivering high return on investment from data mining with partners in sectors such as telecommunications, finance, and retail.




PREDICTIVE MODEL MARKUP LANGUAGE PROVIDES A STANDARD LANGUAGE TO INTEGRATE WITH MANY DATA MINING TOOLS, AND TO AUTOMATE DEPLOYMENT AND EXECUTION OF PREDICTIVE MODELS INSIDE AN ANALYTICS SERVER.

Embracing a Standard for Predictive Analytics By Michael Zeller, Ph.D., CEO, Zementis

Data—its growing volume, velocity, and variety—is driving the rapid adoption of data mining and predictive analytics across all industries. While collecting and using data to make better decisions and understand customer behavior has historically been complex and expensive, it is becoming more standardized and affordable as the market matures. The Predictive Model Markup Language standard, or PMML, is one of the key reasons.

PMML is an XML-based language used to define data mining and statistical models. Just as HTML is the standard language for the Web, and SQL is the standard for databases, PMML provides one common framework to address data mining and statistical analysis, making it easy to transfer models between systems and solutions from different vendors. PMML also reduces the time it takes to implement and deploy operational data-mining models. Supported in all major commercial and open-source data-mining tools, PMML also extends to business intelligence, database platforms, and Hadoop.

Predictive Analytics
Predictive analytics employs a variety of statistical techniques to analyze current and historical data to predict future events. When faced with large data volumes that may elicit complex structures and dependencies, it allows an organization to embed real-time intelligent decisions into many mission-critical business processes.

Predictive solutions have historically been very specialized and costly to implement. Used frequently for credit-card fraud detection, for example, predictive models can identify unusual patterns (such as an unusually large amount charged in a foreign city), deny the charge, and recommend a follow-up call to the merchant. Until recently, the application of predictive analytics only made sense in scenarios where the potential damages were big enough to justify the investment in related systems and processes. However, as Big Data brings about a huge increase in the amount and use of data, interest, applications, and implementations are growing.

Driven by lower cost of data storage and processing, combined with a standards-based solution stack, the total cost of ownership for predictive solutions is rapidly decreasing. As this trend continues, it will open new opportunities for applications that optimize business processes through smarter decisions. We are only at the beginning of this revolution.

The Importance of Standards
The Data Mining Group (DMG), an independent, vendor-led consortium, first released the PMML standard in 1999.
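To make the idea concrete, here is a hedged sketch: a radically simplified, PMML-flavored tree model and a tiny Python scorer for it. Real PMML documents carry namespaces, a DataDictionary, a MiningSchema, and many model types; this fragment only mimics the TreeModel/SimplePredicate shape to show why an XML model description is both portable and directly executable.

```python
import xml.etree.ElementTree as ET

# A radically simplified fragment in the spirit of a PMML TreeModel
# (real PMML adds namespaces, a DataDictionary, a MiningSchema, etc.).
PMML_DOC = """
<PMML version="4.2">
  <TreeModel functionName="classification">
    <Node score="legitimate">
      <Node score="suspicious">
        <SimplePredicate field="amount" operator="greaterThan" value="5000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

def score(pmml_text, record):
    """Walk the tree, descending into any child node whose predicate
    matches the record, and return the score of the deepest match."""
    node = ET.fromstring(pmml_text).find("TreeModel/Node")
    matched = True
    while matched:
        matched = False
        for child in node.findall("Node"):
            pred = child.find("SimplePredicate")
            if (pred is not None
                    and pred.get("operator") == "greaterThan"
                    and record[pred.get("field")] > float(pred.get("value"))):
                node, matched = child, True
                break
    return node.get("score")

print(score(PMML_DOC, {"amount": 9200}))  # suspicious
print(score(PMML_DOC, {"amount": 120}))   # legitimate
```

Because the model is plain XML, the same document a data-mining tool exports can be parsed and executed by any consumer, which is exactly the vendor-independence the standard is designed to provide.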



Companies have been using data and business intelligence applications to make better decisions for well over a decade. So why is PMML suddenly interesting? Why should you embrace it?

The industry is rapidly transitioning from a stage in which each company developed its own custom solution or was content with a single vendor to a point where a standardized framework is increasingly considered a best practice, and vendor-independent, best-of-breed solutions are required to stay competitive. In recent years there has been a surge of interest in predictive analytics, and as more organizations use it, there is a greater need to adopt the standard.

PMML provides a common process that results in an immediately deployable predictive model. Through the standard, everyone can speak the same language across the enterprise, external service providers, partners, and vendors. There is no more worrying about custom code or incompatible formats. Documentation of complex statistical models becomes much easier, which is an additional benefit to industries subject to regulatory requirements.

That is especially important given today’s typical multivendor, cross-platform data center environment. Database administrators need to leverage existing architecture, skill sets, and software and hardware from different vendors, and they need to deploy analytic models across a heterogeneous infrastructure. PMML shines in that regard, alleviating various friction points: not only allowing models to move easily between different IT systems but also facilitating communication between the various project stakeholders.

Having the capability to execute predictive models within existing infrastructure helps organizations quickly capitalize on opportunity while minimizing risk and cost. This is where the PMML standard makes a big difference in accelerating time to market and time to value.


IN-DATABASE PREDICTIVE ANALYTICS: PMML IN ACTION

Without a common standard, there is often a disconnect between a scientist’s completed predictive analytics model and its intended use case in the business context. The time required to move predictive models from a development environment to operational deployment can result in costly and frustrating delays, forcing some business decisions into limbo until the transition is complete.

With the PMML standard, predictive models can instantly be deployed directly inside a database platform. No custom code or manual transition is required. PMML enables all major commercial and open-source data-mining tools to export models as a standard XML file, which can be efficiently executed inside the database on large data volumes. The advantages of in-database predictive analytics based on the PMML standard include:

• Direct integration of advanced analytical algorithms for high-performance scoring
• Minimization of data movement to enable efficient processing of very large data sets
• Instant execution of predictive models from all major commercial and open-source data-mining tools
• Lower total cost of ownership from streamlined, vendor-neutral, platform-independent data-mining processes


Big Data Opportunities
Data is poised to become an organization’s new competitive advantage. Rather than treating data as an afterthought, it is important to recognize the value it can provide. Create awareness in an organization by first systematically capturing essential data and then consistently analyzing stored data to identify patterns and other knowledge it may contain. Data can tell you how to optimize business processes and make more informed, timely decisions.

Predictive analytics uses algorithms that can “learn” and detect complex patterns in data that a human may never see, uncovering hidden value that would otherwise have gone undiscovered. With predictive analytics, many day-to-day decisions can be fully automated. Rather than creating more reports that still require the business user to review and decide, a more intelligent system minimizes manual tasks, allowing the executive to focus on the important decisions that truly require human intervention. Smarter decisions make for better customer experiences, because systems can remedy problems before they occur or recommend the right products at the right time. This can be seen as the next logical step in the evolution of business intelligence.

Organizations have the data—often more data than they know what to do with—and the good news is that the tools are already available to turn this data into action right inside your existing database infrastructure. The closer to “real time” that a business scores customers for cross-sell or upsell recommendations, the more accurate and valuable such recommendations will be. For example, say a customer goes to a Web site and looks at a DVD player one week. The next week, the same customer returns to buy diapers. Real-time scoring will help that Web site identify the customer, recommend the right products at the right time, and take into account new information that was not part of the customer’s profile before, all while balancing the recommendation with underlying business goals to maximize revenue. This is how predictive analytics leverages data in real time to identify new trends and deliver better customer experiences.

More companies are taking advantage of the opportunity to automate processes and become more efficient. The industry is awakening to the tremendous benefits of a common language and process. Once businesses see predictive analytics in action, they immediately recognize its value. Most important, the operational deployment and integration of predictive analytics, which used to be a monumental task, is rapidly becoming easier and more affordable, thanks in part to the PMML standard.

Michael Zeller, Ph.D., is the CEO and co-founder of Zementis, a software company focused on predictive analytics and advanced enterprise decision-management technology. He has extensive experience in strategic technology implementation, business process improvement, and system integration. Previously he served as CEO of OTW Software and director of engineering for an aerospace firm.


RECENT ADVANCES IN ANALYTICS APPLICATIONS DELIVER BETTER PERFORMANCE IN CRUNCHING MASSIVE DATA SETS.

How Modern Analytics “R” Done By Jeff Erhardt, Chief Operations Officer, Revolution Analytics

For companies across a broad spectrum of industries, data represents a new form of capital. Not surprisingly, only a fraction of the valuable data available is ever put to use, because most of the tools built to analyze large amounts of data are slow, expensive, and old. Moreover, they were designed for use almost exclusively by specialists with advanced degrees in statistical analysis.



Legacy analytic tools were designed to run on legacy hardware. Now, newer techniques enable companies to substitute multiple commodity servers for expensive legacy processors. These “commodity cores” are much less expensive to acquire and operate than single “superprocessors.” The newer multiprocessor methodology is faster, more flexible, and more in step with today’s real-world technology architectures than traditional solutions.

New Technologies for a New Era
The era of legacy analytic tools is ending, and a new era is beginning. This new era offers solutions that are faster, more cost-effective, more user friendly, and more extensible. These modern analytic technologies can handle very large volumes of data at very high speeds. Processes that used to take days to perform can now be accomplished in minutes.

While most organizations have invested heavily in first-generation analytics applications, recent advances in second-generation “Big Analytics” platforms have improved both analytical and organizational performance. Big Analytics platforms are optimized for Big Data and utilize today’s massive data sets, thanks to recent performance advancements coupled with innovative analytic techniques. In addition to the analytic routines themselves, data visualization techniques and the way the analytics are executed on various hardware platforms have drastically improved and increased capabilities.


Welcome to the World of R
The newer, faster, and more powerful technologies that make it possible to find needles of insight in haystacks of data are based on an open-source programming language called R. With more than 2 million users, R has become the de facto standard platform for statistical analysis in the academic, scientific, and analytic communities. If you are part of the data management team at a large global organization, chances are good that you are already developing programs using R.

The adoption of R as the lingua franca of analytic statistics is creating a deep pool of fresh talent. Among students, scientists, programmers, and data managers, R is the accepted standard. R represents both the present and the future of statistical analytics. Unlike other programming languages used to crunch large data sets, R is not inextricably tied to any single proprietary system or solution. Because the R programming language is an open-source project, it evolves continually through the contributions of a global community.

The Trend Toward Adopting R
A “perfect storm” of events is now pushing R beyond its original core audience of students, scientists, and quantitative analysts, and transforming the analytics industry. Two conditions are driving this widespread adoption. The first driver is the data deluge, and the consensus that the companies that most effectively gain insight and predictions from their data will have a competitive edge.

The second driver is the fact that applying predictive models to data is no longer a “secret art.” In universities and colleges worldwide, a new generation of data analysts has been trained in the analytic methods that offer competitive advantage. And the training tool of choice for the majority of those students is the R language.

Finally, the economic opportunity is unmistakable: The market for data management and analytic technologies currently generates about $100 billion and is growing at a pace of 10% annually. The market leaders in data analysis software today are based on decades-old technology unable to meet current demands for analysis of huge data sets within an easy-to-use user interface.

Benefits and Challenges
Open-source software development models offer many benefits—and pose many challenges. The benefits include faster development cycles and lower development costs; the challenges include lack of controls, lack of clear accountability, and lack of support. For many businesses, especially those operating in complex or highly regulated markets, open-source software can be impractical or threatening.

The commercial potential of R, however, has led to a surge of interest in developing enhanced “enterprise-grade” versions of R software. These newer applications address the key issues that have prevented R from realizing its full potential as a mainstream enterprise technology. The two primary obstacles facing many R users today involve capacity and performance. For example, most R software cannot currently handle the kind of enormous data sets that are generated routinely by large multichannel retailers, consumer packaged goods marketers, pharmaceutical companies, global finance organizations, and national government agencies.

The capacity of R-based solutions is limited by the requirement that all the data has to fit in memory in order to be processed. The algorithms simply won’t scale to accommodate Big Data. This capacity limitation then forces analysts to use smaller samples of data, which can lead to inaccurate or suboptimal results.

The second issue involves the inability of many R applications to read data quickly from files or other sources. Speed is critical in all areas of modern life, and it seems unreasonable to wait weeks or months for a computer to crunch through larger sets of data. Although some software packages claim to address these issues, what’s usually missing is an overarching framework for analyzing Big Data easily and efficiently. Typically, analysts find themselves struggling with a collection of software tools that can create more problems than they solve.

The capacity problem can be overcome by using an external memory framework that enables extremely fast chunking of data from large data sets, which typically include billions of rows and thousands of columns. But even the fastest data processing can take hours if it is performed sequentially. Overcoming this performance obstacle requires the capability to distribute computations automatically among multiple cores and multiple computers through the use of parallel external memory algorithms.

For example, a computer with four cores can perform analytic calculations very quickly because one core reads the data while the other three cores process the data. Performance can be improved even more dramatically by distributing the work across a network of computers, reducing processing time from hours to minutes or mere seconds.

R You Ready?
The R revolution is just beginning. As it spreads, it will become common practice for business leaders to rely on knowledge generated through rigorous numerical analysis of large data sets. Fact-based decision-making will become the norm instead of the exception.

Jeff Erhardt, Chief Operations Officer at Revolution Analytics, is an executive with extensive and diverse experience at Fortune 400 companies in technology, operations, finance, strategy, and M&A. He began his career at Advanced Micro Devices, where he was responsible for the development and commercialization of leading-edge semiconductor devices. Erhardt graduated with a B.S. in Engineering, Cum Laude, from Cornell University, and an M.B.A. with honors from The Wharton School.
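The external-memory approach this article describes can be sketched in a few lines of Python. This is an illustrative sketch, not Revolution Analytics' implementation: it streams data in memory-sized chunks and combines per-chunk partial results, which is the same shape a parallel external memory algorithm takes.

```python
from itertools import islice

def chunk_stats(chunk):
    """Per-chunk partial result: (sum, count). Each chunk fits in memory
    even when the full data set does not."""
    return sum(chunk), len(chunk)

def external_mean(stream, chunk_size=4096):
    """External-memory mean: read the data in memory-sized chunks and
    combine per-chunk partial results, so the full data set never has
    to fit in RAM. The chunk_stats step is embarrassingly parallel;
    a process pool could farm chunks out across cores or machines."""
    it = iter(stream)
    total, count = 0.0, 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        s, c = chunk_stats(chunk)
        total += s
        count += c
    return total / count

# A generator stands in for a billion-row file read chunk by chunk.
mean = external_mean(x * 0.5 for x in range(1_000_000))
```

The key design point is that the per-chunk reduction produces small, combinable partials, which is what lets the work be distributed without any chunk ever needing to see the whole data set.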



MOBILE PROVIDERS ARE TURNING TO ADVANCED ANALYTICS TO CONTROL COSTS AND ENSURE QUALITY OF SERVICE IN THE FACE OF VORACIOUS CUSTOMER DEMAND.

Navigating a 4G World By Greg Dunn, Vice President, Sybase 365

In a recent survey of 100 mobile service providers by Heavy Reading, almost half the respondents (46%) expected a fifth of their customers (20%) to be using a 4G LTE device by 2014. Three-quarters of the respondents expected that proportion of their subscribers to be on 4G LTE by 2016. Clearly, mobile service providers see a strong trend toward 4G in their industry. In the meantime, the 80% of their users with older devices will require providers to keep managing multistandard, multiband, and even multimode networks. This requirement, combined with growth in subscribers and bandwidth use, makes managing 4G LTE networks inordinately complex. That complexity can threaten a network’s quality-of-service levels.

The shift to 4G LTE networks will be an obvious boon to mobile customers. And it has the potential to create new revenue opportunities for network operators.


In addition, these highly intricate networks will increase operational costs while competitive pressures continue to decrease average revenue per user (ARPU) for service providers. With these factors in place, operators will be scrambling to employ high levels of automation within their networks to offset the rising operational costs.

One of the most effective tools for bolstering existing network management systems will be the use of advanced analytics. Used strategically, advanced analytics can give operators the ability to more accurately plan network capacity while streamlining resource optimization—both vital to effectively managing operating expenses. In the 4G LTE era, advanced analytics will be critical for service providers to identify and make informed decisions in near-real-time about subscribers’ usage, and to act to offset anomalies or exploit opportunities.

Opportunity and technology are converging to transform the telecommunications industry, and Big Data, with its attendant advances in analytics, is driving the change. Nowhere is the impact of Big Data greater than in telecommunications, because nowhere is Big Data bigger than in this industry—a status that will increase by orders of magnitude with the ongoing rollout of 4G LTE.

Opportunity and Challenge
To accommodate the needs of smartphone users, wireless carriers are rapidly making the move from a voice-oriented, circuit-switched design to a full-fledged, IP-based, data-oriented architecture. The data-consumption habits of these users are pushing the old 3G architectures to the breaking point. One study of 42 countries—encompassing nearly three-quarters of the world’s mobile subscribers—shows that some locales have surpassed 50% smartphone penetration, and another 20 nations already have 30% of mobile subscribers using smartphones.

The telcos have virtually no choice but to embrace 4G LTE, because its higher performance is essential to sustaining business growth. ARPU is on the decline for 3G voice and short message service (SMS), while revenue from data use by smartphone users is on the rise. This pressures carriers to deploy 4G LTE as fast as possible to remain competitive.

Optimizing Operations
Rolling out a 4G LTE system is not only expensive to implement, it is complex to manage. One promising solution is to create multimode networks using much more complex—but much more effective—small cell stations. This type of environment has greater potential for service interruptions than a “pure” architecture. Savvy mobile providers are using intelligent tools to forecast everything from anticipated traffic loads to catastrophic equipment failures. Advanced analytics software, once considered primarily a front-office investment, is becoming an operational necessity. IDC estimates that carriers will get a 277% ROI boost in operational efficiency from applying analytics to operations.

CASE STUDY: BIG SAVINGS WITH BIG DATA

Facing a flood of Big Data, a major wireless telecommunications carrier in the EMEA region reviewed its options and adopted a purpose-built analytics engine. Company policies dictated that the telco retain six months of data capable of being analyzed through hundreds of standardized reports, as well as countless daily ad hoc queries. Its analytics data warehouse would contain more than 600 TB of uncompressed data on average. With the indexes and summaries of the raw data, the total amounted to more than a petabyte. But the analytics engine’s columnar approach to storing information dramatically reduced the size of the data to be stored, to just under 105 TB, delivering immediate and dramatic savings.

On average, 400 users access 95 TB of information daily. At least 10 billion rows of data are processed every day, and 60 data streams pour information into the data center from the carrier’s network infrastructure. Within 30 minutes this binary deluge goes through an extract, transform, and load (ETL) process, making it ready for real-time queries.

As with any large wireless carrier, innumerable changes to the data occur every second of every day. The analytics solution can immediately isolate and flag problems and inconsistencies in the network to assure, for example, that every call, message, or data transfer is charged appropriately. In this area alone, the company estimates it saves millions of dollars a year.
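Why a columnar engine compresses so well can be shown with a toy dictionary-encoding sketch in Python. This is illustrative only (the carrier's actual engine is proprietary), and the "tariff plan" column and its values are invented for the example.

```python
def dictionary_encode(column):
    """Columnar stores exploit low cardinality: replace each repeated
    value with a small integer index into a value dictionary."""
    dictionary, codes = [], []
    index = {}
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

# A call-record "tariff plan" column: many rows, a handful of values.
column = ["gold", "basic", "basic", "gold", "premium", "basic"] * 100_000
dictionary, codes = dictionary_encode(column)
# 600,000 strings shrink to a 3-entry dictionary plus 600,000 small
# integers, which a bit-packed columnar store could hold in 2 bits per row.
```

Because each column is stored contiguously and tends to repeat, per-column encodings like this (plus run-length and bit-packing schemes) are what turn hundreds of terabytes of raw data into a far smaller physical footprint.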

Telcos are taking advantage of the latest in analytics technology—such as in-database analysis, MapReduce, Hadoop, columnar architectures, and event stream processing (ESP). With these capabilities, sensor and machine data are analyzed under real-time conditions.



Figure 1. Higher performance 4G LTE networks are operationally more complex to run than previous generation mobile networks. [Diagram: radio access elements (macro, micro, and pico cells; WiFi APs; multimode base transceiver stations; LTE eNodeBs; base station controllers/RNCs) connect over the access/packet backhaul to the mobile packet core and to data center service enablers (authentication/authorization/accounting, domain name system, app and policy servers, content delivery network, cloud servers, GRX roaming, Internet offload).]

Source: IDC

For example, using ESP lets carriers process multiple streams of data in real time, while filtering the data with custom business logic. This offers continuous insight with millisecond latency on hundreds of thousands of events per second. Alerts can be created when conditions warrant, and automated responses can be applied to predefined situations.

Big Data Is No Big Deal

There has been much handwringing about how Big Data is overwhelming some organizations. But it does not have to be so. Even telecommunication carriers—awash in data storms arising from new technologies like 4G LTE or emerging regulations—have little to fear from Big Data deluges. With a purpose-built analytics engine, Big Data is truly no big deal. It is merely the ongoing business environment, albeit a challenging one.
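The ESP pattern described in this article, filtering event streams with custom business logic and raising alerts on predefined conditions, can be sketched as follows. The threshold, field names, and callback are illustrative assumptions, not any actual carrier deployment or ESP product API.

```python
# Hedged sketch of event stream processing: events flow through a filter
# expressing custom business logic, and an alert callback fires when a
# predefined condition is met. All names and values are hypothetical.

ALERT_THRESHOLD_MS = 500  # hypothetical latency ceiling per network event

def process_stream(events, on_alert):
    """Filter a stream and invoke the alert callback on matching events."""
    for event in events:
        if event["latency_ms"] > ALERT_THRESHOLD_MS:
            on_alert(event)  # hook for an automated response

alerts = []
stream = [
    {"node": "eNodeB-1", "latency_ms": 40},
    {"node": "eNodeB-2", "latency_ms": 900},  # should trigger an alert
    {"node": "eNodeB-1", "latency_ms": 35},
]
process_stream(stream, alerts.append)
print(alerts)  # [{'node': 'eNodeB-2', 'latency_ms': 900}]
```

A production ESP engine applies the same idea continuously over unbounded streams, with windowing and stateful operators rather than a simple loop.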


In fact, the era of Big Data is an opportunity for carriers to exploit competitive advantages. The knowledge within it can be used to develop new revenue-generating services, identify unnecessary costs, improve operations, predict subscriber activity, and more. By taking control of Big Data, telecommunications operators will gain a firmer grip on their market and their future. And that is a very big deal. n

Greg Dunn manages the global product management group at SAP, leading efforts for hosted and business solutions focusing on B2C services within the mCommerce, telco, and enterprise-related verticals. He is specifically responsible for driving roadmaps, strategy, and product delivery to position SAP as a leader in the mobility sector.


EVERY DEPARTMENT AND EMPLOYEE CAN BENEFIT FROM THE INSIGHTS GENERATED BY ANALYZING BIG DATA, AND SOLUTIONS ARE AVAILABLE TO MAKE THOSE INSIGHTS PREVALENT THROUGHOUT THE ENTERPRISE.

Increasing the IQ of Everyone with Analytics By Jürgen Hirsch, CEO, Qyte GmbH

In our personal lives, we take for granted the ability to query mass data via the Web and receive results on-screen within seconds. Appointments and contacts are stored on laptops and mobile devices and in the cloud. Social contacts and news feeds, as well as media data ranging into the hundreds of gigabytes, are available on several devices for permanent use. Engaging with this vast amount of data is a daily routine.



In contrast, mass-data-querying options are rare for most companies. They lack high-security enterprise solutions that protect customer data and processes against unauthorized access and ensure data integrity. This void, in turn, limits employee potential and capabilities, as employees miss crucial information that could help them in their jobs.

This challenge is especially notable in the area of data analysis and business intelligence applications. Here, a process has been established in which a user specifies the analysis requirements for the IT department and—after an unspecified waiting period—receives a report, which is often nothing like what was originally requested. The delivered analysis can be unrecognizable and unhelpful because the analytic tools are inflexible and impose strict rule restrictions.

Gaining Access to Data

Why do expert users, who know and monitor relevant processes and methods, not get direct


access to the data that they know and are in a position to evaluate? A number of common reasons and objections are often cited for not developing and supporting easier, more direct access to data:

1. The performance of the data warehouse systems is at risk of breaking down if too many users run ad hoc queries across corporate data. Existing data warehouse solutions offer only short windows of time for direct queries; professional analyses require the creation of a data mart first.
2. Nobody actually understands the data structure of Big Data instances, and nobody is able to define correct queries.
3. Most users do not have the know-how it takes to run such queries on their own.

While all of these objections are valid, how can these obstacles be overcome so that the business can benefit from its data?

With regard to the first obstacle, the data warehouse market is still dominated by traditional relational database management systems (DBMSes). These were never designed for fast responses to ad hoc queries on mass data. To fulfill the increasing analytics demands, numerous IT managers opt for one of two paths. In one approach, the DBMSes are upgraded to the limits of what is feasible and possible—with the hope

of increasing performance by adding more hardware. The systems are further normalized in the database design (one table becomes X tables that are interconnected). In the second approach, data is aggregated. Instead of single records, sums are built for product groups and time ranges, which reduces the amount of data for the query but also reduces the amount of information returned.

As an alternative, column-oriented database systems are strongly recommended. These systems store immense numbers of single data entries cost-efficiently and offer them for further deployment at very high performance. Data marts and data aggregates are thus made redundant. Access is controlled via user rights and roles, and giving most users read-only access to the data prevents manipulation. Proven and fast replication systems help to keep the data pools up to date.

Regarding the second obstacle, unwieldy data structures and inexact queries, one way to increase performance in a traditional DBMS is to transfer the data into as high a normal form as possible. This method reduces redundant information to a single entry. Yet it also reduces the clarity and comprehensibility of the table models. Often the person who requested the information will not even recognize the result, as the database optimizer has transformed it into 15 separate tables with eight different connector keys. Such measures are no longer necessary with column-based DBMSes. Columnar databases store redundant information only once within a column and offer all other information via an appropriate index. Thereby, tables and views can



be processed in such a way that the end user can still recognize and use the data.

As for the third obstacle, workers who lack querying expertise, this can be described best with the saying, “When the only tool you have is a hammer, every problem looks like a nail.” Until now, the use of data for analytic purposes was determined by the limited access and capabilities of the tools. With intelligent, robust tools available, more users can take on these queries and be successful in finding answers.

Querying Power to the People: A Case Study

Enterprise-wide access to data is gaining momentum. One early supporter, a large German health insurer, wanted to make data available to its end users for analytical purposes. This data was stored in an Oracle database, and access was granted via Discoverer. As soon as end users finished creating scripts with their query tool, their queries vanished into a query batch—and few returned. To get access to more analytic features, the health insurer deployed Qyte’s analytics tool RayQ and connected it to the Oracle database. To further improve query performance, Qyte and the customer jointly implemented SAP Sybase IQ and fed it with the source data. Response times have been reduced to a fraction of their previous length, even though the number of users has grown considerably.

Further improvements are possible by integrating SAP HANA. Since more and more customer data is generated in SAP systems, a direct replication of the data could be realized in SAP Sybase IQ via PBS/NLS. As more data from an SAP HANA environment is


available, RayQ can connect to SAP HANA and SAP Sybase IQ at the same time, using the advantages of both systems simultaneously and making the data available to end users.

It is definitely possible to grant a broad group of expert users direct access to mass data and make the data available for professional analysis. It simply requires the application of suitable technologies and tools, which are available and proven in numerous tests. n
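The dual-source pattern described above, a front end querying SAP HANA and SAP Sybase IQ at once, can be sketched as a simple federated query: run the same aggregation against both sources and merge the results. This is an illustrative sketch only; the table, column names, and the use of in-memory SQLite databases as stand-ins for both engines are assumptions, not the actual RayQ implementation.

```python
# Sketch of federating two data sources. Real deployments use vendor
# drivers for SAP HANA and SAP Sybase IQ; two in-memory SQLite databases
# stand in here so the sketch is self-contained. Schema is hypothetical.
import sqlite3

def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO claims VALUES (?, ?)", rows)
    return conn

hot_store = load([("north", 100.0), ("south", 250.0)])   # stand-in for HANA
cold_store = load([("north", 40.0), ("south", 10.0)])    # stand-in for IQ

# Federate: run the same query against both sources, merge by key.
query = "SELECT region, SUM(amount) FROM claims GROUP BY region"
merged = {}
for conn in (hot_store, cold_store):
    for region, total in conn.execute(query):
        merged[region] = merged.get(region, 0.0) + total
print(merged)  # {'north': 140.0, 'south': 260.0}
```

The design point is that the merge key and aggregation must be identical on both sides; anything source-specific (freshness, history depth) is handled by which rows live in which store.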

Jürgen Hirsch is CEO of Qyte GmbH. He has followed an entrepreneurial path since the age of 17. In addition to managing the company, he is responsible for strategic sales and partner management. He coordinates projects to open up new topics and application fields for the product RayQ. For more than 10 years, Hirsch has worked in fraud detection using data analytics, focusing on fraud prevention in healthcare and stock trading.


Market Data

The Big Deal with Big Data

The volume, velocity, and variety of data coursing into organizations today are continually increasing. Organizations must find a way to ride the Big Data wave or risk being pulled under water.

Today, organizations of all types and sizes are inundated with data from various internal and external sources, from transactional data to unstructured data from social media and other sources. Organizations can struggle to get ahead of—or out from under—the increasing piles of data flooding into their businesses, or they can leverage the data to gain competitive advantage, to fight fraud, to ease regulatory compliance, or to boost operational efficiencies.



While there are many definitions of Big Data, it’s generally agreed that Big Data comprises enormous data sets and the technologies that are now available to help organizations successfully deal with and use the data deluge. What’s clear is that companies need to come up with a plan to manage, store, and take advantage of the potential benefits of Big Data.

The good news is that the vast majority of organizations are at least exploring their Big Data options, according to findings from a survey of 154 C-suite executives at multinational companies performed online in the United States in April 2012 by Harris Interactive® on behalf of Bite Communications and its client, SAP. In general, organizations see the opportunities Big Data presents, as opposed to seeing only the challenges of wrestling with huge amounts of data, and most respondents identified an array of competitive and business benefits of successfully managing and using Big Data.

Hopefully, this market data will provide useful insight for your upcoming plans for Big Data. ■


Figure MR_O1_Q705. Which definition of big data most closely identifies your company’s definition?

Big Data Definitions

Massive growth of transaction data, including data from customers and the supply chain 28%
New technologies designed to address the volume, variety, and velocity challenges of Big Data 24%
Requirement to store and archive data for regulatory and compliance 19%
Explosion of new data sources (social media, mobile device, and machine-generated devices) 18%
Some other definition 11%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Defining Data: About a quarter of C-suite executives say their company believes Big Data has to do with the growth of transaction data. Another quarter of top-level executives say their organization defines Big Data as the technologies created to address the volume, variety, and velocity challenges Big Data presents.

Figure MR_O2_Q711. Do you view big data as more of a challenge, or more of an opportunity for your company?

Big Data—Challenge or Opportunity

More of an opportunity 76%
More of a challenge 24%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Challenge vs. Opportunity: A vast majority (76%) of C-suite executives believe that Big Data presents opportunities for their companies, while only a quarter see Big Data as creating challenges.

Figure MR_O3_Q716. What type of data sets, either social/external or existing, does your company prioritize?

Data Set Prioritizations (Company)

Prioritize existing data sets 73%
Prioritize social/external data sets 27%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Existing Data Sets: Nearly three-fourths (73%) of C-suite executives said their organizations prioritize existing data sets, while only about one-quarter felt their companies put more importance on social and external data sets.


Figure MR_O4_Q721. What type of data sets, either social/external or existing, do you personally prioritize?

Data Set Prioritizations (Personal Preference)

Prioritize existing data sets 70%
Prioritize social/external data sets 30%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Focusing on What’s Within: A majority (70%) of C-suite executives personally prioritize existing data sets over social and external data sets, like their companies. Only three in 10 say they personally prioritize social and external data sets.

Figure MR_O5_Q730. What infrastructure solution does your company use to store and manage its big data?

Infrastructure Solutions for Storing and Managing Big Data

Data warehouse (storage equipment, data localization) 33%
Private cloud or off-premise server farms used only by my company 27%
A hybrid of data warehouse and cloud technology 26%
Public cloud (Rackspace, Amazon) 11%
Other 3%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Looking Up to the Clouds: A majority of companies have adopted the cloud in some form. More than one-third (38%) of C-suite executives report that their company solely uses cloud technology to store and manage Big Data, 27% say their firm uses private cloud or off-premise server farms, and 11% use a public cloud.


Figure MR_O6_Q736. What percentage of the model you deploy for big data is comprised of data warehouse vs. cloud?

Deployed Big Data Models

Data warehouse 53%
Private cloud 24.4%
Public cloud 22.6%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Big Data Storage: Among C-level executives who say their company uses a hybrid of data warehouse and cloud technology to store Big Data, data warehousing comprises 53% of all hybrid solutions, on average.

Figure MR_O7_Q740. Which areas of your business, if any, do you see receiving significant growth/benefits from the utilization of big data? Please select all that apply.

Big Data Benefits and Growth Areas

Information Technology (IT)/MIS 58%
Sales 57%
Marketing 54%
Customer service 54%
Production/Operations 46%
Research and development 43%
Finance/Billing 37%
Distribution/Warehousing/Shipping and receiving 37%
Administration 33%
Human Resources 31%
Advertising/Public relations 24%
Facilities 17%
Other 3%
None 6%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Big Data, Big Benefits: Nearly all (94%) C-level executives were able to identify business areas that would benefit from Big Data. More than half of those surveyed said that information technology (IT)/MIS, sales, marketing, and customer service were areas that would benefit from utilizing Big Data.


Figure MR_O8_Q745. Which of the following competitive advantages, if any, do you anticipate your company would gain by utilizing big data? Please select all that apply.

Competitive Advantages of Big Data

Improving efficiency in business operations 59%
Increased sales 54%
Lowering IT costs 50%
Increasing business agility 48%
Attracting and retaining customers 46%
Ensuring compliance 41%
Increased savings and cutting of spending 39%
Increased brand exposure 36%
Lowering risk 34%
Introducing new products/services 32%
Developing new channels to market 29%
Outsourcing of non-core functions 27%
Mergers, acquisitions, and divestitures 21%
Expanding partner ecosystem 20%
Other 1%
None 7%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Gaining a Competitive Edge: The vast majority (93%) of C-suite executives surveyed were able to identify areas in which their company could potentially gain competitive advantage by using Big Data. The top five anticipated areas include improving business operations efficiencies, boosting sales, lowering IT costs, increasing business agility, and attracting and retaining customers.

Figure MR_O9_Q750. When do you expect your company to see a return on big data investments?

Big Data ROI Timeline

Within a year (Net) 70%
Within three months 17%
Within three to six months 12%
Within six months to a year 41%
More than one year 19%
Not sure 11%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Rapid Returns: About seven in 10 C-level executives surveyed anticipate their organization will see a return on their Big Data investments within one year.


Figure MR_10_Q755. What is the total amount your company has spent/plans to spend on investing in a solution to store and manage its big data?

Big Data Investments

Less than $10K 12%
$10K to $24.9K 4%
$25K to $49.9K 7%
$50K to $99.9K 10%
$100K to $249.9K 11%
$250K to $499.9K 6%
$500K to $749.9K 5%
$750K to $999.9K 8%
$1,000K to $2,499.9K 8%
$2,500K to $4,999.9K 6%
$5,000K or more 9%
Not sure 14%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Investing Big: More than half of C-level executives (53%) say their company has spent or plans to spend $100,000 or more on solutions to store and manage Big Data. Nearly a quarter (23%) plans to spend or has spent at least $1 million on Big Data management and storage.

Figure MR_11_Q760. What is the amount your company typically spends on monthly maintenance costs for storing and managing its big data?

Maintenance Costs for Storing and Managing Big Data

Less than $5,000 28%
$5,000-$9,999 7%
$10,000-$24,999 11%
$25,000-$49,999 5%
$50,000 or more 12%
Not sure of monthly cost 37%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Monthly Maintenance: The median amount that C-suite executives say their company spends to store and manage Big Data is $5,000 a month. More than one-fourth (28%) of survey respondents report that their firm spends at least $10,000 each month on Big Data management and storage.


Figure MR_12_Q765. What is the average size of data that your company manages in a typical big data project? If you are not sure, please give your best estimate.

Average Size of Data in Big Data Projects

1TB or smaller 20%
10TB 24%
100TB 20%
500TB 16%
1000TB 10%
1PB or larger 10%

500TB or less (Net) 80%
100TB or larger (Net) 56%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Great Big Data Projects: Collectively, more than half (56%) of C-suite executives report that the average size of their company’s Big Data projects is 100 TB or larger.

Figure MR_13_Q770. How important is having instant access to data in mobile BI/real-time analytics?

Importance of Instant Access to Data in Mobile Business Intelligence and Real-Time Analytics

At least somewhat important (Net) 90%
Absolutely essential 16%
Very important 41%
Somewhat important 33%
Not at all important 10%

Source: Harris Interactive, Inc., J41673-Bite Communications C-Suite Study, April 10–23, 2012

Quick and Easy Access: Nine in 10 C-level executives surveyed consider having instant access to data in mobile business intelligence and real-time analytics at least somewhat important.

Database Approaches and Methodologies Needing Support (All respondents, Senior IT, Mid-level IT)

Agile business intelligence methodologies 36%, 39%, 32%
Row-based data structure 35%, 35%, 34%
Columnar data structure 28%, 22%, 34%
Massively parallel processing grid 16%, 15%, 16%
MapReduce/Hadoop data structure 14%, 15%, 13%
Kimball method 7%, 9%, 5%
Inmon data warehouse principles 6%, 7%, 4%
None of the above 29%, 24%, 33%

Companies managing 500-plus TB of data are more likely to have plans to support Agile BI as well as MapReduce/Hadoop data structures over the next year.

Source: IDG Research Services

Database Needs: Both senior and mid-level IT management report a need to support agile business intelligence and row-based data structure approaches over the next 12 months.


Figure MR_14. Data Management Challenges

Data Management Challenges and Limitations (All respondents, Senior IT, Mid-level IT)

Cost and budget constraints 54%, 55%, 53%
Increasing data volumes 45%, 39%, 50%
Integrating and managing siloed data and applications 33%, 33%, 32%
Inadequate staffing for database management and maintenance 29%, 25%, 32%
Scalability 29%, 26%, 31%
Data redundancy 28%, 25%, 31%
Too many tools/interfaces 28%, 28%, 27%
Data quality 26%, 30%, 22%
Complex to use or administer 26%, 26%, 26%
Slow querying/reporting speed 25%, 18%, 31%
Difficult to maintain 23%, 24%, 22%
Data latency 17%, 16%, 18%
Inability to support diverse data sources 15%, 20%, 9%
Inability to handle complex queries 8%, 8%, 8%
Current database solution can’t load data fast enough 7%, 4%, 10%
Inability to support enough concurrent users 7%, 8%, 5%
Other 4%, 4%, 3%
None of the above 7%, 5%, 8%

Senior IT respondents are more likely than mid-level IT to cite an inability to support diverse data sources as a challenge. Additionally, data redundancy is more likely to be identified as a challenge at companies managing 500-plus TB of data (47% vs. 25% among those managing less than 500 TB of data).

Source: IDG Research Services

Data Management Headaches: Cost and budget constraints, as well as increasing data volumes, are the top limitations companies are experiencing with respect to data management.


Company Index

Seth Grimes founded Alta Plana Corporation in 1997 to deliver business analytics strategy consulting and implementation services with a focus on advanced analytics (business intelligence, text mining, data visualization, analytical databases, complex event processing), as well as management, analysis, and dissemination of governmental statistics. Via Alta Plana, Grimes consults, presents, writes, and trains, bridging the concerns of end-users and solution providers. The company delivers fresh, insightful, and actionable perspectives on critical challenges that face enterprises in today’s rapidly evolving information technology market. Visit altaplana.com

Dr. Brian Bandey is acknowledged as one of the leading experts on computer law and the international application of intellectual property law to computer and Internet programming technologies. His experience in the global computer law environment spans more than three decades. He is the author of a definitive legal practitioners textbook, and his commentaries on contemporary IT legal issues are regularly published throughout the world. Visit drbandey.com

BI Leader Consulting provides advisory services to user and vendor organizations in the areas of data warehousing, business intelligence, performance management, and business analytics. Wayne Eckerson, the company principal, is a veteran thought leader in the business intelligence field as well as a noted speaker, blogger, consultant, and author of several books and many in-depth reports. He also founded BI Leadership Forum, which promotes best practices and knowledge sharing among business intelligence directors worldwide. Visit bileader.com

Cloudera, a leader in Apache Hadoop-based software and services, enables data driven enterprises to easily derive business value from all their structured and unstructured data. Cloudera’s Distribution including Apache Hadoop (CDH) is a comprehensive, tested, stable, and widely deployed distribution of Hadoop in commercial and non-commercial environments. For the fastest path to reliably using this open source technology in production for Big Data analytics and answering previously unanswerable, big questions, organizations can subscribe to Cloudera Enterprise, comprised of Cloudera Manager Software and Cloudera Support. Cloudera also offers training and certification on Apache technologies as well as consulting services. As a top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, Web, advertising, retail, energy, bioinformatics, pharmaceuticals/healthcare, university research, oil and gas, and gaming, Cloudera has a depth of experience and commitment to sharing expertise. Visit cloudera.com


Fuzzy Logix, an analytics software and professional services company, provides a new generation of in-database analytic solutions that help companies make smarter decisions and improve effectiveness and performance. Clients can embed analytics directly in their business processes, enterprise applications, mobile devices, and Web services using in-database analytics that run inside the data warehouse. Visit www.fuzzyl.com

Qyte GmbH was founded in 1999 as a subsidiary of Hirsch & Sachs GmbH. While Hirsch & Sachs GmbH provided high-quality IT services, Qyte GmbH worked on programming the data mining software RayQ. The strategic aim was to transform from a pure IT services company into a software house with consulting competencies in all questions regarding data. In May 2004 the transformation was successfully concluded and the operative business was restructured. Since then, the activities of Qyte GmbH have focused on the continuous development and distribution of the leading-edge data mining and business intelligence solution RayQ, as well as on the provisioning of services and consulting with regard to all aspects of clients’ data. The company has established a network of reliable and highly competent partners, who help market the RayQ solution and enable Qyte GmbH to draw on a large competence pool when staffing project teams. Visit qyte.com

KXEN is helping companies use predictive analytics to make better decisions. Based on patented innovations, the company’s InfiniteInsight™ delivers the speed and agility to optimize every step in the customer lifecycle, including acquisition, cross-sell, up-sell, retention, and next best activity. Proven with more than 400 deployments at companies such as Bank of America, Barclays, Wells Fargo, Lowe’s, Meredith Corporation, Rogers, and Vodafone, KXEN solutions deliver predictive power and infinite insight™. KXEN is headquartered in San Francisco, Calif., with field offices in the United States, Paris, and London. Visit kxen.com

Revolution Analytics is a leading commercial provider of software and services based on the open source R project for statistical computing. The company brings high performance, productivity, and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing, and media. Used by more than 2 million analysts in academia and at cutting-edge companies such as Google, Bank of America, and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups, and offering free licenses of Revolution R Enterprise to everyone in academia. Visit revolutionanalytics.com



TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on business intelligence and data warehouse issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations. Visit TDWI.org

Zementis is a software company focused on predictive analytics and advanced enterprise decision management technology. It combines science and software to create superior business and industrial solutions for clients. Scientific expertise includes statistical algorithms, machine learning, neural networks, and intelligent systems. Zementis scientists have a proven record in producing effective predictive models to extract hidden patterns from a variety of data types. This experience is complemented by the product offering ADAPA®, a decision engine framework for real-time execution of predictive models and business rules. Visit zementis.com

Acknowledgments

Editorial Team Editors: Lori Cleary, Becca Freed, Elke Peterson, BaySide Media Executive Producer: Don Marzetta, SAP Co-Producer: David Jonker, SAP Graphic Designer: Margaret Anderson, BaySide Media Developed and produced with help from BaySide Media, 201 4th St., Ste 305, Oakland, CA 94607 BaySideMedia.com


www.sap.com/contactsap

Material # 2012/07

