Community Office Hours

Running Spark & Alluxio in Kubernetes

Kubernetes is widely used to orchestrate computation, offering flexibility and portability across infrastructure providers in public and hybrid cloud environments. However, running data-intensive workloads introduces challenges such as efficiently moving data to compute frameworks, accessing data from multiple or remote clouds, and co-locating data with compute. Alluxio addresses these problems as a data orchestration layer: it brings data closer to compute for better performance, keeps data accessible to analytics workloads in Kubernetes, and enables portability across storage providers.

In this Office Hour, we'll go over:

  • Overview of Alluxio and the cloud use case with Spark in Kubernetes
  • How to set up Alluxio and Spark to run in Kubernetes
  • Open session for discussion on any topic, such as solving the problem of separated compute and storage


Speaker: Adit Madan

Distributed Systems Engineer at Alluxio

Adit Madan is a core engineer at Alluxio. His experience is in distributed systems, storage systems, and large-scale data analytics. He has an M.S. from Carnegie Mellon University and a B.S. from IIT.

Speaker: Bin Fan

Evangelist and Founding Member at Alluxio

Bin Fan is the founding engineer of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google, where he won the Technical Infrastructure Award. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on distributed systems.

Alluxio is...

...a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.

Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.