Berkeley Data Analytics Stack: Introducing Tachyon
Abstract
The Berkeley Data Analytics Stack (BDAS) is an open source software stack
that integrates software components being built by the AMPLab to make sense
of Big Data. Many systems in the stack deliver orders of magnitude better
performance than other big data analytics tools, such as Hadoop. Today,
BDAS components are used by numerous companies and institutions. In
this talk, I will present an overview of BDAS with a focus on Tachyon, a
distributed file system that provides reliable file sharing at memory speed
across cluster frameworks.
Bio
Haoyuan Li is a Computer Science Ph.D. student in the AMPLab at UC Berkeley,
where he works with Prof. Scott Shenker and Prof. Ion Stoica on computer
systems and big data research. During his Ph.D. studies, he has worked on
various components of BDAS. In particular, he leads the Tachyon project and
co-created DStream (Spark Streaming). He is also a founding committer of
Apache Spark. Before Berkeley, he worked at Conviva and Google on big data
processing. His earlier work, the PFP algorithm, has been adopted by Apache
Mahout. Haoyuan holds an M.S. from Cornell University and a B.S. from Peking
University.