By Prat Moghe, Cazena Founder
Data Lakes are too hard for most enterprises.
Data Lakes are a foundational capability to get fast access to data and enable AI/ML, data engineering, and all analytics, but they require specialized expertise that is hard to find and retain.
Enterprises are discovering that Cloud data lakes in production require Architects, Cloud Ops, Data Ops and SecOps (see my DevOps Drag blog to learn more about this). Data lakes also take a long time to build: typically 6+ months of upfront development, followed by ongoing management, support and optimization as the stack keeps evolving.
Early adopters, such as digital natives and sophisticated enterprises, have been able to build and manage data lakes, yet the majority of Global 8000 enterprises lack these skills. Worse, the gap is widening because of the dearth of available DevOps skills.
We’re finally making data lakes easy.
For the past five years, the Cazena team has been developing the first SaaS Data Lake. Cazena’s Data Lake “as a Service” aims to simplify data lakes, so that they can be used by all enterprises, big or small, without requiring specialized skills.
Enterprises that have been using the SaaS Data Lake have radically simplified their operations and accelerated results. They have reached production-ready data lakes and analytic outcomes within 2-4 weeks, a 10X acceleration over their DIY efforts. They have also been able to expand and scale their deployments without adding new DevOps expertise.
As an example, one enterprise has launched multiple digital products, driven by 100+ globally distributed data scientists and data engineers using a variety of data engineering and ML tools, all powered by the Data Lake as a Service and supported by one data architect. Cazena’s Data Lake as a Service is now deployed for many enterprises across multiple regions on AWS and Microsoft Azure. Cazena monitors and manages over one million analytic workloads for enterprises each month. (See our Engineering blog for metrics.)
Data Lake Evolution: On-Prem, Cloud PaaS to SaaS
The SaaS model is a major leap forward for data lakes. This infographic shows the evolution of Data Lakes from on-prem clusters, to cloud PaaS, and now to SaaS Data Lakes. The three generations of data lakes have had distinct characteristics.
1. On-premises data lakes were notoriously hard to deploy and manage, as the technology stack was still maturing. Few enterprises could get sufficient value from them – often because access by data scientists and business users was too hard. These DIY data lakes needed large administrative teams for ongoing management, typically required 9-12 months for deployment and had significant operating expenses.
2. With the maturing of Cloud PaaS (platform-as-a-service), cloud data lakes have emerged as an alternative over the past 3-4 years. While these offerings reduce the complexity of the compute infrastructure, cloud data lakes require significant DevOps expertise around cloud, security and modern data ops, skills that the majority of enterprises lack.
3. The SaaS Data Lake is a third-generation offering that addresses the skills shortage with a SaaS model requiring zero DevOps effort for deployment and ongoing operations. The SaaS Data Lake is typically 10X faster to deploy in production, with typical outcomes in 2-4 weeks vs. 6+ months for cloud DIY. SaaS data lakes are also a strategic use of resources, since they embed the best innovation across the IaaS and PaaS stack, require no additional skills, reduce costs and help existing teams scale with automation.
For a good definition of SaaS vs. PaaS vs. Managed Services, refer to research and insight from Matt Aslett at 451 Research.
To experience Cazena’s Instant Cloud Data Lake, please contact us, check out options to try it now, or explore the website to learn more. Next up, we’re working on a blog to describe our SaaS Data Lake Architecture. Stay tuned!