Tech Talk | Integrating Google Cloud Dataproc with Alluxio for faster performance in the cloud


Google Cloud Dataproc is a widely used, fully managed Spark and Hadoop service for running big data analytics and compute workloads in the cloud. Services like Dataproc reduce hardware spend, eliminate the need to overbuy capacity, and provide business agility. Yet users still face challenges with performance-sensitive workloads and workloads that run on remote data.


Alluxio is an open source cloud data orchestration platform that increases the performance of analytic workloads running on Dataproc by intelligently caching data and restoring lost data locality. Alluxio also enables users to run compute workloads against on-prem storage such as Hadoop HDFS without any application changes.


Join us for this tech talk where Chris Crosbie and Roderick Yao from the Google Dataproc team and Dipti Borkar of Alluxio will demo how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. They’ll also show how to run Dataproc Spark against a remote HDFS cluster. 


Get access to the on-demand tech talk!

Speaker: Roderick Yao

Strategic Cloud Engineer, Google

Speaker: Christopher Crosbie

Product Manager, Google

Speaker: Dipti Borkar

VP, Product and Marketing, Alluxio

Dipti Borkar is the VP of Product & Marketing at Alluxio, with over 15 years of experience in data and database technology across relational and non-relational systems. Prior to Alluxio, Dipti was VP of Product Marketing at Kinetica and Couchbase. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.

Roderick Yao is a Strategic Cloud Engineer at Google. He focuses on designing innovative solutions that help Google Cloud customers build and manage data pipelines and migrate data to Google. Prior to Google, he was a Senior Solutions Consultant at Cloudera, where he drove solution architecture and helped Fortune 500 companies with their Hadoop deployments. Roderick has a Bachelor’s degree from South China University of Technology and a Master’s degree from Bentley College - Elkin B. McCallum Graduate School of Business.

Chris Crosbie has been building and deploying data and analytics applications for more than 15 years and is currently a Product Manager at Google, focused on building open source data and analytics tools for Google Cloud Platform. Chris came to Google from Amazon, where he held two positions. The first was as a solutions architect for AWS, where he was awarded the 2015 solutions architect of the year distinction. The second, more recent position was as a Data Engineering Manager for an R&D group known as “Grand Challenges”. Before joining Amazon, he headed the data science team at Memorial Sloan Kettering Cancer Center, where he managed a team of statisticians and software developers. He started his career as a software engineer at the NSABP, a not-for-profit clinical trials cooperative group supported by the National Cancer Institute. He holds an MPH in Biostatistics and an MS in Information Science.