Virtual Community Event  |  Alluxio Day IV

Save your spot

Thursday, June 24  |  9AM PT

Join fellow Alluxio community users for the 4th Alluxio Community Day virtual event. This event features speakers from Facebook, TikTok, Tencent, and Intel.


 We're looking forward to these great talks! 


  • 9:00 am - Alluxio for Machine Learning Workloads (Alluxio)
  • 9:30 am - Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory (Intel)
  • 10:00 am - Building a 10X Faster Presto with Hierarchical Cache (Facebook)
  • 10:30 am - Improving Presto performance with Alluxio at TikTok
  • 11:00 am - Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 minutes (Tencent)


Interested in learning more? 


Speakers

  • Ginger Gilsdorf - Software Engineer (Intel)
  • Rohit Jain - Software Engineer (Facebook)
  • Pan Liu - Big Data Engineer (Tencent)
  • Bin Fan - Founding Engineer and VP of Open Source (Alluxio)
  • Frank Hu - Tech Lead, Data Platform (TikTok)
  • Lu Qiu - Software Engineer (Alluxio)

Talk Information

Alluxio for Machine Learning Workloads (Alluxio)

The core Alluxio team has been redesigning an efficient and transparent way for users to leverage data orchestration through the POSIX interface. This effort has made significant progress in collaboration with engineers from Microsoft, Alibaba, and Tencent. In particular, we have introduced a new JNI-based FUSE implementation for POSIX data access, built a more efficient integration between Alluxio and the FUSE service, and made many improvements to related data operations, such as a more efficient distributedLoad and optimizations for listing and calculating over directories containing a massive number of files, which are common in model training.
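
For readers curious what this looks like in practice, below is a minimal sketch of mounting Alluxio through FUSE and pre-loading a dataset. The mount point, Alluxio path, and the alluxio.fuse.jnifuse.enabled property are illustrative assumptions; check the Alluxio documentation for the exact commands and property names in your release.

    # conf/alluxio-site.properties: opt in to the JNI-based FUSE implementation
    # (assumed property name; newer releases may enable it by default)
    alluxio.fuse.jnifuse.enabled=true

    # Mount the Alluxio namespace locally so training jobs read it as ordinary files
    $ ${ALLUXIO_HOME}/integration/fuse/bin/alluxio-fuse mount /mnt/alluxio-fuse /

    # Warm the cache for a training dataset across workers
    $ ${ALLUXIO_HOME}/bin/alluxio fs distributedLoad /training-data

    # Training code can now read /mnt/alluxio-fuse/training-data via plain POSIX calls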


Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory (Intel)

Today’s analytics workloads demand real-time access to vast and growing amounts of data. This session demonstrates how Alluxio’s data orchestration platform, running on Intel Optane persistent memory, accelerates access to this data and uncovers valuable business insights faster.


Building a 10X Faster Presto with Hierarchical Cache (Facebook)

RaptorX is an internal project aimed at improving query latency significantly beyond what vanilla Presto is capable of. In this session, we introduce the hierarchical cache work, including the Alluxio data cache, the fragment result cache, and more. Caching is the key building block of RaptorX; with it, we are able to boost query performance by 10X. This new architecture can beat performance-oriented connectors like Raptor, with the added benefit of continuing to work with disaggregated storage.
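
As a rough illustration of the data-cache piece, the Hive catalog snippet below shows how a PrestoDB deployment might enable an Alluxio-backed local cache on SSD. The property names and values are assumptions based on the PrestoDB Hive connector; verify them against the RaptorX/PrestoDB documentation before use.

    # etc/catalog/hive.properties (illustrative values, assumed property names)
    hive.metastore.uri=thrift://metastore:9083
    cache.enabled=true
    cache.type=ALLUXIO
    cache.base-directory=file:///mnt/flash/cache
    cache.alluxio.max-cache-size=400GB
    # Route splits for the same file to the same worker so its cache stays hot
    hive.node-selection-strategy=SOFT_AFFINITY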


Improving Presto performance with Alluxio at TikTok

Today it is not straightforward to integrate Alluxio with popular query engines like Presto on top of existing Hive data. Community solutions such as the Alluxio Catalog Service or Transparent URI put unnecessary pressure on the Alluxio masters when queries touch files that should not be cached. This talk covers TikTok’s approach to adopting Alluxio as the cache layer without introducing additional services.


Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 minutes (Tencent)

Alluxio has an excellent metrics system and supports various kinds of metrics. This talk focuses on how the Alluxio metrics system works, how to implement a custom metrics sink for Alluxio, and how to quickly set up Alluxio monitoring with Prometheus and Grafana.
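
To give a flavor of the setup, here is a minimal sketch that assumes Alluxio's Prometheus servlet sink is enabled and that the master serves metrics on its web port (19999 by default); the sink class name and endpoint path are assumptions to verify against the Alluxio metrics documentation.

    # conf/metrics.properties on the Alluxio master and workers (assumed sink class)
    sink.prometheus.class=alluxio.metrics.sink.PrometheusMetricsServlet

    # prometheus.yml: scrape the master's metrics endpoint
    scrape_configs:
      - job_name: "alluxio-master"
        metrics_path: "/metrics/prometheus/"
        static_configs:
          - targets: ["alluxio-master:19999"]

Grafana can then add this Prometheus server as a data source and build Alluxio dashboards from the scraped metrics.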

Alluxio is...

...a data orchestration layer for compute in any cloud. It unifies data silos on-premise and across any cloud to give you data locality, accessibility, and elasticity.


Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.