Howie Huang, GWU, US - NIST Big Data Working Group

Big Data Machine Learning and Graph Analytics: Current State and Future Challenges

NIST Big Data Public Working Group, IEEE Big Data Workshop, October 27, 2014
Prof. Howie Huang, The George Washington University

Overview
• Objectives
• Approach
• Progress
• Next Steps


Howie Huang, NIST Panel of Big Data Future Trends

Big Data Call For Big Ideas


Lambda Architecture: Combining History and Real-Time Data

[Figure: incoming streaming data (small, recent) goes through fast stream processing, while big historical data goes through slow batch processing; queries draw on both paths.]

• Batch Processing for Historical Data
• Stream Processing for Real-Time Data
• Answer each query by merging the insights of both
• Example: Trends on Twitter
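The merge step can be illustrated with a minimal Python sketch. It assumes that both the batch path and the stream path emit per-key counts, as a WordCount job over tweets would. `merge_views` and `top_trends` are hypothetical helper names, not part of any Lambda Architecture framework, and the counts are illustrative.

```python
from __future__ import annotations

from collections import Counter


def merge_views(batch_view: dict[str, int], realtime_view: dict[str, int]) -> Counter:
    """Merge a batch view and a real-time view by summing the counts per key."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update() adds counts rather than replacing them
    return merged


def top_trends(batch_view: dict[str, int], realtime_view: dict[str, int], k: int) -> list[tuple[str, int]]:
    """Answer a hypothetical 'top-k trending keys' query from the merged view."""
    return merge_views(batch_view, realtime_view).most_common(k)


if __name__ == "__main__":
    batch = {"ISIS": 700, "Ebola": 200}  # illustrative counts from historical data
    realtime = {"Russia": 10, "Ebola": 600, "ISIS": 30, "Apple": 200}  # illustrative recent counts
    print(top_trends(batch, realtime, 2))  # [('Ebola', 800), ('ISIS', 730)]
```

Summing works here because counting is associative and commutative. Queries whose partial answers cannot simply be added, such as distinct counts, would need a different merge rule.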


Big Data Call For Next-Generation Computer Systems
• Big Data needs computer systems that achieve high performance, scalability, reliability, security, energy efficiency, etc.
• Software Challenges
  – Cloud Operating Systems
    • File and storage systems
    • Virtualization/Hypervisor
  – Algorithms
• Hardware Challenges
  – Multicore processors and Graphics Processing Units (GPUs)
  – Emerging memory technologies: Flash, Phase-Change Memory, etc.


Building High-Performance Storage Systems in Data Centers
• Flash-based Solid-State Drive (SSD)
  – Amdahl Blades [HotPower’09]: energy-efficient, scalable server architecture; SC’09 High Perf Storage Challenge Finalist
  – Performance Modeling [MSST’11]
  – Flashy Prefetching [MSST’12]
  – Hypervisor-managed Non-Volatile Memory in Cloud data centers [CLOUD’14, VEE’14]
• Phase-Change Memory (PCM)
  – RePRAM: Recycling PCM [DSN’12]
  – Lifetime enhancement [PACT’11], Best Poster Award
  – Energy-aware writes [HotPower’11]


Developing High-Performance Algorithms
• Merging insights is challenging
• WordCount
  – Merge key-value pairs by adding the values of corresponding keys

    Partial result 1     Partial result 2     Merged result
    Key     Value        Key     Value        Key     Value
    ISIS    700          Russia  10           Ebola   800
    Ebola   200          Ebola   600          ISIS    730
                         ISIS    30           Russia  10
                         Apple   200          Apple   200

• Breadth-First Search (BFS)
  – One newly inserted vertex may change the levels of all vertices connected to it:
    1. Depth changes, e.g., vertex 5
    2. Parent and child swapping, e.g., vertices 6 and 7
    3. Complexity

  [Figure: example BFS graph with vertices 0 to 8]
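The BFS update problem can be sketched in Python under stated assumptions: an undirected, unweighted graph stored as adjacency sets, with levels measured from a single source. Because inserting edges can only shorten such distances, the hypothetical `insert_vertex` helper relaxes levels outward from the new vertex's neighbors instead of rerunning BFS. The path graph in the demo is made up for illustration and is not the graph from the slide.

```python
from __future__ import annotations

import math
from collections import deque


def bfs(adj: dict[int, set[int]], source: int) -> tuple[dict[int, int], dict[int, int | None]]:
    """Full BFS from `source`; returns (level, parent) for every reachable vertex."""
    level = {source: 0}
    parent: dict[int, int | None] = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                parent[w] = u
                queue.append(w)
    return level, parent


def insert_vertex(adj: dict[int, set[int]], level: dict[int, int],
                  parent: dict[int, int | None], new: int, neighbors: list[int]) -> None:
    """Hypothetical helper: add `new` with undirected edges to existing `neighbors`,
    then repair `level` and `parent` in place instead of rerunning BFS."""
    adj[new] = set(neighbors)
    for n in neighbors:
        adj[n].add(new)
    # New edges can only shorten unweighted distances, so relax outward from
    # the already-reached neighbors until no level decreases any further.
    queue = deque(n for n in neighbors if n in level)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if level[u] + 1 < level.get(w, math.inf):
                level[w] = level[u] + 1  # depth change
                parent[w] = u            # may reverse an old parent -> child edge
                queue.append(w)


if __name__ == "__main__":
    # Made-up path graph 0-1-2-3-4-5, BFS rooted at vertex 0.
    adj: dict[int, set[int]] = {v: set() for v in range(6)}
    for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
        adj[a].add(b)
        adj[b].add(a)
    level, parent = bfs(adj, 0)
    print(level[5], parent[5])                       # 5 4
    insert_vertex(adj, level, parent, 6, [0, 5])     # new vertex 6 links vertices 0 and 5
    print(level[5], parent[5], level[4], parent[4])  # 2 6 3 5
```

In the demo, vertex 4 is the parent of vertex 5 before the insertion and becomes its child afterward, which is the parent/child swap case from the BFS bullets. In the worst case the repair still visits every vertex whose level drops, so it does not remove the complexity concern; it only avoids work on vertices that are unaffected.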

Pursuing Hardware and Software Innovations
• Graph traversal is of great importance to cybersecurity, social networks, medical informatics, etc.
• Our GPU-based BFS system achieves:
  – No. 1 on the Green Graph 500 list (Small Data)
  – No. 43 on the Graph 500 list
• Combine GPU and algorithmic innovations
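GPU BFS systems commonly use a level-synchronous (frontier-based) traversal, in which each iteration expands every vertex of the current frontier in parallel. The Python sketch below shows that general structure sequentially; it is an illustration of the technique, not the authors' GPU implementation. It also estimates traversed edges per second (TEPS), the rate Graph 500 reports (the Green Graph 500 further normalizes by power, i.e., TEPS per watt). The helper names are hypothetical, and the edge count is a simplification that assumes each undirected edge is listed in both directions with no self-loops, rather than the official benchmark rules.

```python
from __future__ import annotations

import time


def frontier_bfs(adj: dict[int, list[int]], source: int) -> dict[int, int]:
    """Level-synchronous BFS: expand the whole frontier once per level."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier: list[int] = []
        for u in frontier:  # on a GPU, this loop runs in parallel (e.g., one thread per vertex)
            for w in adj[u]:
                # Parallel threads may discover w simultaneously; GPU code handles
                # this with atomic operations or by tolerating duplicate entries.
                if w not in level:
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level


def traversed_edges(adj: dict[int, list[int]], level: dict[int, int]) -> int:
    """Undirected edges in the traversed component (each edge stored twice, no self-loops)."""
    return sum(len(adj[v]) for v in level) // 2


def teps(adj: dict[int, list[int]], source: int) -> float:
    """Traversed edges per second for one BFS run (simplified, not the official harness)."""
    start = time.perf_counter()
    level = frontier_bfs(adj, source)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length timing
    return traversed_edges(adj, level) / elapsed


if __name__ == "__main__":
    example = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
    level = frontier_bfs(example, 0)
    print(level)                            # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
    print(traversed_edges(example, level))  # 5
    print(f"{teps(example, 0):.0f} TEPS")   # timing-dependent
```

Processing one whole level per iteration is what makes the structure GPU-friendly: all vertices in a frontier are independent, so they can be expanded by many threads at once, with a synchronization point only between levels.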


Conclusion
• Big data computing systems strive to handle:
  – Batch processing: Volume
  – Stream processing: Velocity
  – Both also need to address Variety and Veracity
• The Lambda Architecture is a step toward this goal
• Challenges and opportunities:
  – Leverage hardware advances, e.g., computation accelerators and nonvolatile memory
  – Sustain software innovations


Acknowledgements
• NSF CAREER Award 2014 and grants 1350766, 1124813, 0937875
• NVIDIA Academic Partnership Award 2011
• IBM Real Time Innovation Faculty Award 2008
