Online Meetup | Speeding Up I/O for Machine Learning 

Get access to the on-demand video

feat. Apple Case Study using TensorFlow, NFS, DC/OS, and Alluxio

Data scientists and platform engineers often face a common challenge when the input data for machine learning jobs is stored in remote storage such as NFS or in cloud storage such as S3. Accessing the data directly is slow, unstable, and expensive, while manually copying it to the training clusters adds significant overhead, complicates data curation, and often requires engineers to build ETL pipelines.

This talk shows how Alluxio can greatly simplify the data preparation phase when working with remote, and possibly multiple, data sources. We will share lessons and benchmarks from Bill Zhao, a tech lead at Apple, drawn from building a machine learning platform using TensorFlow, NFS, DC/OS, and Alluxio.

In this online meetup, you will learn about:

  • When Alluxio can help a machine learning platform;
  • How to set up a POSIX endpoint for the Alluxio service to unify file system access to S3, HDFS, and Azure Blob Storage;
  • How to run TensorFlow to train models on remote input data as if it were on the local file system.
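
As a rough illustration of the last two points, the sketch below assumes an Alluxio POSIX endpoint has already been mounted at a local directory. The mount path `/mnt/alluxio`, the `.tfrecord` suffix, and the helper names are hypothetical and are not taken from the talk. The point it shows is that once such a mount exists, training code can discover and read input files with ordinary file APIs, without a storage-specific client.

```python
import os
from typing import Iterator, List

# Hypothetical mount point: assumes an Alluxio POSIX (FUSE) endpoint is
# already mounted here. The path is illustrative, not from the talk.
MOUNT_POINT = "/mnt/alluxio"


def list_training_files(root: str, suffix: str = ".tfrecord") -> List[str]:
    """Return sorted paths under root whose file names end with suffix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def read_in_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's bytes in fixed-size chunks using ordinary file I/O."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def total_bytes(paths: List[str]) -> int:
    """Read every file once and return the total number of bytes seen."""
    return sum(len(chunk) for p in paths for chunk in read_in_chunks(p))


if __name__ == "__main__":
    # If the mount point does not exist, os.walk yields nothing and the
    # list is simply empty.
    files = list_training_files(MOUNT_POINT)
    print(f"found {len(files)} files, {total_bytes(files)} bytes")
```

In a TensorFlow job, the same list of paths could be handed to an input pipeline such as `tf.data.TFRecordDataset`, since the files look like local files to the process.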

Interested in learning more? 

Speaker: Bill Zhao

Tech Lead, Deep Reinforcement Learning Researcher at Apple

Bill Zhao is a technical leader in large-scale workloads on general-purpose GPUs, such as distributed deep learning, deep reinforcement learning, and big data analytics. Prior to Apple, he was a big data researcher at the UC Berkeley AMPLab under the supervision of David Patterson, Ion Stoica, and Anthony Joseph, and an early contributor to widely used datacenter software such as Apache Mesos, Spark, and Alluxio. He also contributes to Stanford DAWNBench, an ML performance benchmark, and was an early contributor to the industry-standard MLPerf benchmark. Bill holds MS and BA degrees in Computer Science from the University of California at Berkeley.

Speaker: Bin Fan

Founding Engineer & VP of Open Source, Alluxio

Bin Fan is the founding engineer and VP of Open Source at Alluxio, Inc. Prior to Alluxio, he worked at Google building its next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems.

Alluxio is...

...a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.

Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.