Community Online Office Hour
Traditional data lakes were built on-premises and have complex workflows spanning different business units. On-premises infrastructure is under strain, and its total cost is rising. At the same time, new and unanticipated workloads are rapidly being onboarded for both data analytics and AI. A public cloud promises fully managed, elastic infrastructure, but costs quickly add up there as well once workloads scale. A hybrid data lake approach lets organizations get the best of both worlds while reducing infrastructure costs.
In this talk, we describe an architecture for incrementally migrating analytics workloads to any public cloud (AWS, Google Cloud Platform, or Microsoft Azure), running them directly against on-prem data without copying the data to cloud storage.
In this Office Hour, we will go over:
- An architecture for running elastic compute clusters in the cloud using on-prem HDFS
- The impact of data locality on performance and operating costs
- The use of policies to lower infrastructure costs
Hybrid Data Lake Architecture with Presto & Spark in the cloud accessing on-prem storage
Tuesday, September 29
Adit Madan is a product manager at Alluxio. He is also a core maintainer and PMC member of the Alluxio Open Source project. Before joining Alluxio, he was a research engineer at Hewlett-Packard Laboratories. His experience is in distributed systems, storage systems, and large-scale data analytics. He has an M.S. from Carnegie Mellon University and a B.S. from IIT.
Co-head of Architect & Founding Engineer, Alluxio
Speaker: Adit Madan
...a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.
Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.