Community Office Hours

Running Presto with Alluxio on Amazon EMR

Many organizations are leveraging Amazon EMR to run big data analytics on the public cloud. However, reading data from and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud. In this use case, it caches S3 data, delivering high and predictable performance and reducing network traffic.


In this Office Hour I'll go over:

  • How to set up Alluxio with the EMR stack so that Presto jobs can seamlessly read from and write to S3
  • A performance comparison of Presto on EMR versus Presto with Alluxio on EMR
  • An open discussion session on any topic, such as addressing the separation of compute and storage, and more



Speaker: Alex Ma

Director of Solutions Engineering at Alluxio

Alex Ma is an open source veteran. Prior to Alluxio, he worked for Couchbase, where he was Director of Solutions Engineering and a Principal Architect.

Speaker: Nakkul Sreenivas

Software Engineer at Alluxio

Prior to Alluxio, Nakkul worked as a consultant, building and supporting an entirely open source Hadoop platform for financial services clients.

Alluxio is...

...a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.


Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.