Virtual Community Event | Alluxio Day IV
Save your spot
Thursday, June 24, 9:00 AM PT
Join us for our 4th Alluxio Day community virtual event featuring speakers from Facebook, TikTok, Tencent, and Intel.
We're looking forward to these great talks!
- 9:00 am - Alluxio for Machine Learning Workloads (Alluxio)
- 9:30 am - Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory (Intel)
- 10:00 am - Building a 10X Faster Presto with Hierarchical Cache (Facebook)
- 10:30 am - Improving Presto performance with Alluxio at TikTok
- 11:00 am - Setting up a monitoring system for Alluxio with Prometheus and Grafana in 10 minutes (Tencent)
Alluxio for Machine Learning Workloads (Alluxio)
The core team of Alluxio set out to redesign an efficient and transparent way for users to leverage data orchestration through the POSIX interface. This effort has made significant progress in collaboration with engineers from Microsoft, Alibaba, and Tencent. In particular, we have introduced a new JNI-based FUSE implementation to support POSIX data access, created a more efficient way to integrate Alluxio with the FUSE service, and delivered many improvements to related data operations, such as a more efficient distributedLoad and optimizations for listing or traversing directories containing massive numbers of files, which are common in model training.
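As a rough sketch of the POSIX access pattern described above (the command paths assume a standard Alluxio installation, and the dataset path is purely illustrative):

```shell
# Mount the Alluxio namespace root at a local POSIX path (illustrative paths)
$ALLUXIO_HOME/integration/fuse/bin/alluxio-fuse mount /mnt/alluxio /

# Pre-load a training dataset across the cluster with distributedLoad
$ALLUXIO_HOME/bin/alluxio fs distributedLoad /training-data

# Training jobs can now read the data through ordinary file I/O
ls /mnt/alluxio/training-data
```

With a mount like this, training frameworks can read data as local files, with no Alluxio-specific client code in the training job itself.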
Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory (Intel)
Today’s analytics workloads demand real-time access to vast amounts of data. This session demonstrates how Alluxio’s data orchestration platform, running on Intel Optane persistent memory, accelerates access to this data and uncovers its valuable business insights faster.
Building a 10X Faster Presto with Hierarchical Cache (Facebook)
RaptorX is an internal project aimed at reducing query latency significantly beyond what vanilla Presto is capable of. In this session, we introduce the hierarchical caching work, including the Alluxio data cache, the fragment result cache, and more. Caching is the key building block of RaptorX: with its support, we are able to boost query performance by 10X. This new architecture can beat performance-oriented connectors like Raptor, with the added benefit of continuing to work with disaggregated storage.
Improving Presto performance with Alluxio at TikTok
Today it is not straightforward to integrate Alluxio with popular query engines like Presto on existing Hive data. Solutions proposed by the community, such as the Alluxio Catalog Service or Transparent URI, bring unnecessary pressure on Alluxio masters when querying files that should not be cached. This talk covers TikTok’s approach to adopting Alluxio as the cache layer without introducing additional services.
Setting up a monitoring system for Alluxio with Prometheus and Grafana in 10 minutes (Tencent)
Alluxio has an excellent metrics system and supports various kinds of metrics. This talk will focus on: how the Alluxio metrics system works; how to implement a custom metrics sink for Alluxio; and how to quickly set up Alluxio monitoring based on Prometheus and Grafana.
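A minimal sketch of the kind of Prometheus setup the talk covers. The sink class, metrics path, and default ports below reflect the Alluxio 2.x documentation as best we recall; verify them against your Alluxio version before use:

```yaml
# prometheus.yml -- scrape Alluxio's Prometheus metrics endpoints.
# Assumes the Prometheus sink is enabled in conf/metrics.properties, e.g.:
#   sink.prometheus.class=alluxio.metrics.sink.PrometheusMetricsServlet
scrape_configs:
  - job_name: "alluxio-master"
    metrics_path: "/metrics/prometheus/"
    static_configs:
      - targets: ["alluxio-master:19999"]   # default master web port
  - job_name: "alluxio-worker"
    metrics_path: "/metrics/prometheus/"
    static_configs:
      - targets: ["alluxio-worker:30000"]   # default worker web port
```

Pointing Grafana at this Prometheus instance then lets you build dashboards from the exported master and worker metric families.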
Alluxio is a data orchestration layer for compute in any cloud. It unifies data silos on-premises and across any cloud to give you data locality, accessibility, and elasticity.
Whether it’s accelerating big data frameworks on the public cloud, running big data workloads in hybrid cloud environments, or enabling big data on object stores or multiple clouds, Alluxio reduces the complexities associated with orchestrating data for today’s big data and AI/ML workloads.