Big Data Machine Learning and Graph Analytics: Current State and Future Challenges
NIST Big Data Public Working Group
IEEE Big Data Workshop, October 27, 2014
Prof. Howie Huang
The George Washington University
Overview
• Objectives
• Approach
• Progress
• Next Steps
2 | 10/27/14
Howie Huang, NIST Panel of Big Data Future Trends
Big Data Call For Big Ideas
Lambda Architecture: Combining History and Real-Time Data
[Figure: incoming data feeds a fast stream-processing path (small, streaming data) and a slow batch-processing path (big historical data); queries draw on both.]
• Batch Processing for Historical Data
• Stream Processing for Real-Time Data
• Answer each query by merging the insights of both
• Example: Trends on Twitter
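The merge step can be illustrated with a minimal Python sketch that uses hashtag counting as a stand-in for Twitter trends. It assumes the standard Lambda-architecture split: a batch layer recomputes a view from all historical events, a speed layer incrementally counts events that arrived since the last batch run, and the serving side answers a query by adding the two views. The names `batch_view`, `SpeedLayer`, and `query`, and the use of simple per-term counts, are hypothetical illustrations, not the system described in this talk.

```python
from collections import Counter

def batch_view(historical_events):
    """Batch layer (hypothetical): recompute counts over all historical events."""
    return Counter(historical_events)

class SpeedLayer:
    """Speed layer (hypothetical): count events that arrived after the last batch run.

    In a full Lambda design this state is discarded once the batch layer
    has absorbed the same events; that step is omitted here.
    """
    def __init__(self):
        self.counts = Counter()

    def on_event(self, term):
        self.counts[term] += 1

def query(batch, speed, term):
    """Serving side (hypothetical): merge both views; missing terms count as 0."""
    return batch[term] + speed.counts[term]

# Usage with made-up events.
batch = batch_view(["#ebola", "#isis", "#ebola"])
speed = SpeedLayer()
for tag in ["#ebola", "#apple"]:
    speed.on_event(tag)
print(query(batch, speed, "#ebola"))  # 3
```

Counting merges cleanly because addition is associative and commutative; the next slide shows that not every result merges this easily.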
Big Data Call For Next-Generation Computer Systems
• Big Data need computer systems that achieve: High Performance, Scalability, Reliability, Security, Energy efficiency…
• Software Challenges
  – Cloud Operating Systems
    • File and storage systems
    • Virtualization/Hypervisor
  – Algorithms
• Hardware Challenges
  – Multicore processors and Graphics Processing Units (GPU)
  – Emerging memory technology: Flash, Phase-Change Memory, etc.
Building High-Performance Storage Systems in Data Centers
• Flash-based Solid-State Drive (SSD)
  – Amdahl Blades [HotPower’09]
    • Energy-efficient scalable server architecture
    • SC’09 High Perf Storage Challenge Finalist
  – Performance Modeling [MSST’11]
  – Flashy Prefetching [MSST’12]
  – Hypervisor-managed Non-Volatile Memory in Cloud data centers [CLOUD’14, VEE’14]
• Phase-Change Memory
  – RePRAM: Recycling PCM [DSN’12]
  – Lifetime enhancement [PACT’11]
    • Best Poster Award
  – Energy-aware writes [HotPower’11]
Developing High-Performance Algorithms
• Merging Insights is Challenging
• WordCount
  – Merging key-value pairs by adding the values of corresponding keys
• Breadth-First Search (BFS)
WordCount example (values of matching keys are added):

Table 1
Key      Value
ISIS     700
Ebola    200

Table 2
Key      Value
Russia   10
Ebola    600
ISIS     30
Apple    200

Merged
Key      Value
Ebola    800
ISIS     730
Russia   10
Apple    200
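For WordCount, merging is a per-key sum, so partial results can be combined in any order. A minimal Python sketch using the example values on this slide; the function name `merge_counts` is hypothetical:

```python
def merge_counts(*tables):
    """Merge key-value tables by adding the values of matching keys."""
    merged = {}
    for table in tables:
        for key, value in table.items():
            # A key missing from earlier tables starts at 0.
            merged[key] = merged.get(key, 0) + value
    return merged

table1 = {"ISIS": 700, "Ebola": 200}
table2 = {"Russia": 10, "Ebola": 600, "ISIS": 30, "Apple": 200}
merged = merge_counts(table1, table2)
# merged == {"ISIS": 730, "Ebola": 800, "Russia": 10, "Apple": 200}
```

Python's `collections.Counter` supports the same merge with `+`, but that operator drops keys whose summed count is zero or negative, which is why the sketch spells the loop out.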
BFS example:
– One newly inserted vertex may completely change the levels of all vertices connected to it:
  1. Depth changes, e.g., vertex 5
  2. Parent and child swapping, e.g., vertices 6 and 7
  3. Complexity
[Figure: example graph with vertices 0 through 8 illustrating these cases]
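BFS results cannot be merged by simple addition: inserting a vertex can lower the levels of many other vertices and reverse parent-child relationships. The Python sketch below assumes an undirected, unweighted graph stored as a dict of adjacency sets and handles insertions only (no deletions). Because insertions can only shorten paths, it repairs levels by propagating decreases outward from the new edge instead of rerunning BFS. The function names `bfs_levels` and `insert_edge` and the example graph are hypothetical; this is an illustration, not the method from the talk.

```python
from collections import deque

def bfs_levels(adj, source):
    """Full BFS from source: return {vertex: (level, parent)}."""
    info = {source: (0, None)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in info:
                info[v] = (info[u][0] + 1, u)
                queue.append(v)
    return info

def insert_edge(adj, info, u, v):
    """Add undirected edge (u, v) and repair levels/parents in place.

    Insertions only shorten paths, so levels only decrease; each vertex
    whose level drops is re-queued so its neighbors are re-checked.
    """
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    queue = deque()
    # Seed the repair from whichever endpoint now offers a shorter path.
    for a, b in ((u, v), (v, u)):
        if a in info and (b not in info or info[a][0] + 1 < info[b][0]):
            info[b] = (info[a][0] + 1, a)
            queue.append(b)
    # Propagate level decreases outward.
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in info or info[x][0] + 1 < info[y][0]:
                info[y] = (info[x][0] + 1, x)
                queue.append(y)
    return info

# Hypothetical example: path 0-1-2-3-4-5, BFS rooted at vertex 0.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
info = bfs_levels(adj, 0)      # vertex 4 is the parent of vertex 5
# Insert a new vertex 6 adjacent to vertices 0 and 5.
insert_edge(adj, info, 0, 6)
insert_edge(adj, info, 6, 5)
print(info[5], info[4])        # (2, 6) (3, 5): vertex 5 is now the parent of 4
```

In this example vertex 5 drops from level 5 to level 2 and vertex 4 from level 4 to level 3, and vertex 5 becomes the parent of its former parent, mirroring the depth-change and parent-child-swap cases above. The repair touches only vertices whose level decreases and their neighbors.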
Pursuing Hardware and Software Innovations
• Graph traversal is of great importance to cybersecurity, social networks, medical informatics, etc.
• Our GPU-based BFS system achieves
  – No. 1 on the Green Graph 500 list (Small Data)
  – No. 43 on the Graph 500 list
• Pursue both GPU and algorithmic innovations
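Level-synchronous (frontier-based) BFS is a common structure for GPU graph traversal: each step expands the entire current frontier, and on a GPU the frontier's vertices are typically spread across many threads. The Python sketch below runs sequentially and only shows that frontier structure; it is an illustrative assumption, not the GPU system whose rankings are reported above. The name `frontier_bfs` is hypothetical.

```python
def frontier_bfs(adj, source):
    """Level-synchronous BFS: expand the whole current frontier per step.

    Returns {vertex: level} for vertices reachable from source.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # A GPU implementation would distribute this loop across threads.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

print(frontier_bfs({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))
```

Processing a frontier as a batch is what exposes parallelism: all vertices at one level can be expanded independently, with only the "already visited" check needing coordination.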
Conclusion
• Big data computing systems strive to achieve
  – Batch processing: Volume
  – Stream processing: Velocity
  – Both also need to address Variety and Veracity
• Lambda Architecture is a step towards this goal
• Challenges and opportunities:
  – Leverage hardware advances, e.g., computation accelerators and nonvolatile memory
  – Sustain software innovations
Acknowledgements
• NSF CAREER Award 2014 and grants 1350766, 1124813, 0937875
• NVIDIA Academic Partnership Award 2011
• IBM Real Time Innovation Faculty Award 2008