
The Evolution Path of Spark/Hadoop on the Cloud

Track: Hit-Product Architecture / Data Platform / Engineering Practice

 

Case source:

Speaker

姚依非

Senior Software Engineer, Google

A graduate of Carnegie Mellon University, Yifei has worked on multiple large-scale services and platforms at Amazon, Apple, and Google.
Yifei is currently a senior software engineer on the Google Cloud Dataproc team. Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open-source data tools for batch processing, querying, streaming, and machine learning on Google Cloud Platform. Prior to joining Google, he worked on a centralized machine learning platform at Apple and a marketplace platform at Amazon.


 

Case Summary

 

In this talk, I will introduce how Google integrates open-source data processing frameworks such as Hadoop and Spark across its entire cloud platform. We will go through the details of the service components and the best ways to integrate these frameworks into the cloud ecosystem. We will also discuss improvements and features that combine the best of Hadoop/Spark and the cloud to achieve optimal performance.

 

Case Objectives

 

The amount of data generated every day by users and services has grown exponentially over the years. As a result, huge amounts of data need to be processed in the cloud, which puts great pressure on cloud analytics systems. The goal of this talk is to showcase the integration of Hadoop/Spark on the cloud platform and to discuss ways to evolve the architecture to provide better performance and a better user experience.

 

Key Takeaways (Successes or Lessons Learned)

 

Improved performance and reliability through autoscaling, high availability, and storage connectors. Better user experience through resizable clusters, metrics tools, and easy ML integrations.

 

ROI Analysis

 

By successfully integrating Hadoop/Spark clusters on Google Cloud, we enabled many large customers to migrate their data processing pipelines and workloads to the cloud. The improved performance and tight integration with other cloud products gave us an edge in performance and user experience over other solutions.

 

Case Insights

 

In this talk, I will dive deep into the following topics:
1. Overview of Google's solution for large-volume data sets

  • Mission, history, and features

2. Platform Architecture

  • Service components and open-source software
  • Clusters and Jobs
  • Integration with Cloud
    > Storage, Metrics

3. Challenges

  • Improvements and Features
  • High Availability
  • Workflow
  • Autoscaling

4. Machine Learning Capabilities

5. Summary

 

Significance of the Case for the Team

 

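Share how to effectively combine big data processing systems such as Hadoop/Spark with cloud systems, and how to make that combination as efficient and high-performing as possible.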

 
