Video: Dan Stair’s ODSC East 2017 Presentation, “Building a near-real-time Data Pipeline in the Cloud”

In this video from the 2017 Open Data Science Conference (ODSC East), Cazena Senior Engineer Dan Stair gives a technical talk on building near-real-time data pipelines. For more recent insight, please visit the Cazena blog.

Abstract from Dan’s 2017 talk: Learn how we built, tested, and delivered a near-real-time data pipeline using Apache Spark in the cloud in two weeks, and still saw our families. We faced a looming deadline and real-time analytics requirements. Using a cloud-based platform with Spark and Impala running on Microsoft Azure, and armed with a few hundred lines of Python code, we designed, tested, and deployed an end-to-end data pipeline and analytics infrastructure in two weeks. The project had its challenges, both technical and operational; hear what we learned and our tips for success.

