
Big Data Machine Learning and Graph Analytics: Current State and Future Challenges

NIST Big Data Public Working Group
IEEE Big Data Workshop, October 27, 2014
Prof. Howie Huang, The George Washington University

Overview
• Objectives
• Approach
• Progress
• Next Steps


Howie Huang, NIST Panel of Big Data Future Trends

Big Data Call For Big Ideas


Lambda Architecture: Combining History and Real-Time Data

[Figure: incoming streaming data feeds stream processing (fast, small data), while big historical data feeds batch processing (slow); queries draw on both.]

• Batch Processing for Historical Data
• Stream Processing for Real-Time Data
• Answer each query by merging the insights of both
• Example: Trends on Twitter
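The query step above merges batch and real-time insights. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions:

- The "insight" is a word count, as in the WordCount example later in the talk.
- The posts are hypothetical.
- `word_count` and `merge_views` are illustrative names, not part of any Lambda Architecture framework.

In a real system the two views would come from separate batch and stream engines; here both are in-memory counters.

```python
from collections import Counter

def word_count(docs):
    """Count whitespace-separated words across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

def merge_views(batch_view, realtime_view):
    """Merge key-value pairs by adding the values of matching keys."""
    merged = Counter(batch_view)   # copy so the batch view is left untouched
    merged.update(realtime_view)   # Counter.update adds counts instead of replacing them
    return merged

# Hypothetical inputs: older posts go to the batch layer,
# recent posts go to the stream (speed) layer.
historical = ["ebola outbreak", "ebola vaccine"]
recent = ["ebola news", "apple event"]

batch = word_count(historical)     # recomputed periodically over all history
realtime = word_count(recent)      # updated as new data streams in
answer = merge_views(batch, realtime)
print(answer["ebola"])             # 3
```

Because the merge only adds values per key, a key seen in just one view (here "apple") passes through unchanged.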


Big Data Call For Next-Generation Computer Systems
• Big Data needs computer systems that achieve: High Performance, Scalability, Reliability, Security, Energy Efficiency…
• Software Challenges
  – Cloud Operating Systems
    • File and storage systems
    • Virtualization/Hypervisor
  – Algorithms
• Hardware Challenges
  – Multicore processors and Graphics Processing Units (GPUs)
  – Emerging memory technology: Flash, Phase-Change Memory, etc.


Building High-Performance Storage Systems in Data Centers
• Flash-based Solid-State Drive (SSD)
  – Amdahl Blades [HotPower’09]
    • Energy-efficient scalable server architecture
    • SC’09 High Perf Storage Challenge Finalist
  – Performance Modeling [MSST’11]
  – Flashy Prefetching [MSST’12]
  – Hypervisor-managed Non-Volatile Memory in Cloud data centers [CLOUD’14, VEE’14]
• Phase-Change Memory
  – RePRAM: Recycling PCM [DSN’12]
  – Lifetime enhancement [PACT’11], Best Poster Award
  – Energy-aware writes [HotPower’11]


Developing High-Performance Algorithms
• Merging Insights is Challenging
• WordCount
  – Merging key-value pairs by adding the values of corresponding keys, e.g., ISIS: 700 + 30 = 730

  First input        Second input       Merged result
  Key     Value      Key     Value      Key     Value
  ISIS    700        Russia  10         Ebola   800
  Ebola   200        Ebola   600        ISIS    730
                     ISIS    30         Russia  10
                     Apple   200        Apple   200

• Breadth-First Search (BFS)
  – A single newly inserted vertex can change the levels of all vertices connected to it
    1. Depth changes, e.g., vertex 5
    2. Parent and child swapping, e.g., vertices 6 and 7
    3. Complexity

[Figure: example BFS tree over vertices 0 to 8 illustrating these cases]
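The effect of inserting a vertex can be illustrated with a small incremental BFS repair in Python. This is a generic sketch, not the authors' system. It rests on these assumptions:

- The graph is undirected and unweighted, stored as adjacency lists.
- `bfs_levels` and `insert_vertex` are hypothetical names.
- The example graph is a path 0 to 5 plus a new vertex 9, not the graph in the slide's figure.

An insertion can only shorten paths. So the repair runs BFS outward from the new vertex and continues only through vertices whose level actually improves.

```python
from collections import deque

def bfs_levels(adj, source):
    """Queue-based BFS: map each reachable vertex to (level, parent)."""
    info = {source: (0, None)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in info:
                info[v] = (info[u][0] + 1, u)
                queue.append(v)
    return info

def insert_vertex(adj, info, new, neighbors):
    """Add vertex `new` with undirected edges to `neighbors`; repair `info` in place.

    Insertions can only shorten paths, so we search outward from `new`
    and continue only through vertices whose level strictly improves.
    """
    adj[new] = list(neighbors)
    for n in neighbors:
        adj.setdefault(n, []).append(new)
    reached = [n for n in neighbors if n in info]
    if not reached:
        return info  # `new` is not connected to the source's BFS tree
    parent = min(reached, key=lambda n: info[n][0])  # shallowest neighbour
    info[new] = (info[parent][0] + 1, parent)
    queue = deque([new])
    while queue:
        u = queue.popleft()
        candidate = info[u][0] + 1
        for v in adj[u]:
            if v not in info or candidate < info[v][0]:
                info[v] = (candidate, u)  # smaller depth and a new parent
                queue.append(v)
    return info

# Hypothetical graph: a path 0-1-2-3-4-5, BFS rooted at vertex 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
info = bfs_levels(adj, 0)
print(info[4], info[5])  # (4, 3) (5, 4)

# Insert vertex 9 adjacent to 0 and 5, creating a shortcut.
insert_vertex(adj, info, 9, [0, 5])
print(info[5], info[4])  # (2, 9) (3, 5)
```

After the insertion, vertex 5 drops from depth 5 to 2. Vertices 4 and 5 swap parent and child roles (4 was the parent of 5, now 5 is the parent of 4), mirroring cases 1 and 2 above. In the worst case the repair reaches every vertex connected to the new one.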

Pursuing Hardware and Software Innovations
• Graph traversal is central to cybersecurity, social networks, medical informatics, and other domains
• Our GPU-based BFS system achieves
  – No. 1 on the Green Graph 500 list (Small Data)
  – No. 43 on the Graph 500 list
• Pursue both GPU and algorithmic innovations
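The Graph 500 ranks systems by BFS speed in traversed edges per second (TEPS). The Green Graph 500 ranks them by TEPS per watt. The sketch below shows the level-synchronous, frontier-at-a-time BFS pattern that parallel implementations commonly build on, plus a rough TEPS figure. It is a sequential Python illustration, not the authors' GPU code, and it assumes the following:

- The graph is undirected and stored as adjacency lists.
- `frontier_bfs`, `component_edges` and the five-vertex graph are hypothetical.
- The edge count only approximates the benchmark's official counting rules.

```python
import time

def frontier_bfs(adj, source):
    """Level-synchronous BFS: expand the entire current frontier each step.

    The loop over the frontier is the work a GPU implementation would
    typically spread across threads; this sketch runs it sequentially.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for u in frontier:              # parallelizable across the frontier
            for v in adj.get(u, ()):
                if v not in level:      # first visit fixes v's level
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level

def component_edges(adj, level):
    """Undirected edges inside the traversed component (each stored twice in adj)."""
    return sum(len(adj.get(u, ())) for u in level) // 2

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
start = time.perf_counter()
level = frontier_bfs(adj, 0)
elapsed = time.perf_counter() - start
edges = component_edges(adj, level)
print(level)   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
print(edges)   # 5
print(f"{edges / max(elapsed, 1e-9):.0f} TEPS (toy graph, sequential)")
```

The timing on a toy graph is meaningless as a benchmark. The point is the structure: each step touches only the current frontier, and the metric divides traversed edges by search time.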


Conclusion
• Big data computing systems strive to address
  – Volume, through batch processing
  – Velocity, through stream processing
  – Variety and Veracity, in both batch and stream processing
• Lambda Architecture, which combines batch and stream processing, is a step towards this goal
• Challenges and opportunities:
  – Leverage hardware advances, e.g., computation accelerators and nonvolatile memory
  – Sustain software innovations


Acknowledgements
• NSF CAREER Award 2014 and grants 1350766, 1124813, 0937875
• NVIDIA Academic Partnership Award 2011
• IBM Real Time Innovation Faculty Award 2008

