Community Online Office Hour

Hands-on with Alluxio Structured Data Management

Users deploy Alluxio in a wide range of use cases, from analytics to AI platforms, for its unified access to data and its transparent caching, which accelerates workloads. However, many of the frameworks involved are SQL engines, such as Presto, Apache Spark SQL, and Apache Hive, which consume data structured as tables of rows and columns. Since Alluxio is commonly used as a filesystem of files and directories, there is a mismatch between how Alluxio exposes data (files and directories) and how SQL engines deal with data (tables, rows, and columns). This gap creates various challenges and inefficiencies.

Therefore, in the Alluxio 2.1 release, we introduce Alluxio Structured Data Management, a new set of services that enables structured data applications to interact with data more efficiently. The new services include a catalog service and a transformation service, which work together to bridge the gap between storage and SQL engines and to enable physical data independence.

In this office hour, we introduce the concepts and components of Alluxio Structured Data Management, and go through a demo with Presto.

In this office hour, we will go over:

  • Introduction and motivation of Alluxio Structured Data Management
  • Overview of the different services of Alluxio Structured Data Management in Alluxio 2.1
  • A demo of using Alluxio Structured Data Management with Presto

Interested in learning more? 

Get access to the on-demand video

Speaker: Bin Fan

Evangelist and Founding Member at Alluxio

Bin Fan is the founding engineer of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google, where he won the Technical Infrastructure Award. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on distributed systems.

Speaker: Gene Pang

Founding Member, Software Engineer at Alluxio

Gene Pang is a PMC maintainer of the Alluxio open source project and a founding member of Alluxio, Inc. He received his Ph.D. from the AMPLab at UC Berkeley, working on distributed database systems. Before starting at Berkeley, he worked at Google. He also holds an M.S. from Stanford University and a B.S. from Cornell University.

Alluxio is...

...a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.

Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.