Earlier this week, our founder Prat authored an article on InsideBigData about planning for data lakes. He shared five big questions to consider. Experience tells us this kind of planning is necessary. Far too many people jump into data lakes hastily, and not just as a justification to overuse water-related puns. It’s more often an excuse to splash around with Hadoop, which is an understandable impulse given all the hype. But three months in, teams are often drowning and, worse, not delivering value. That’s why, earlier this year, an eyebrow-raising research note on data lakes from Gartner, Inc. stated:
“Through 2018, 90% of deployed data lakes will become useless as they are overwhelmed with information assets captured for uncertain use cases.”
Hence Prat’s emphasis on the importance of planning. But even those who plan have faced a consistent water hazard, outlined succinctly by Gartner:
“Build a technology pipeline and organizational structure for moving discoveries out of the data lake. Discoveries made in the data lake often can’t stay there.”
This is a critical point. Long before big data and data lakes, analytics suffered from silos. As a tech reporter over a decade ago, I heard repeatedly that the biggest challenge for business intelligence and analytics programs was not the software or data (though those are hard). It was getting organizations to effectively use the insights uncovered by analytics projects. In many enterprises, the problem persists.
Part of the difficulty may be operational or cultural. But companies also face major technical barriers. It’s hard to move workloads from discovery-oriented data lakes into production-grade systems. Subsets of data may need to flow to your data warehouse, or new workloads may need an operational data mart. Teams must construct data lakes with the expectation that data discoveries will regularly move to other systems. That’s a good thing! Start with the assumption that you will make many useful discoveries. If you don’t think that…why are you building a data lake in the first place?
Architecting a data lake is challenging, but Cazena has you covered in two ways – whether you plan to build or buy.
Build: If you want to DIY with Hadoop, Prat will be giving a tutorial at Strata Hadoop World next week. “Building a Production-ready Data Lake in the Cloud” includes lots of useful information, including a section on data movement. We’ve gained lots of experience here. Our techies will be there, and we won’t hold back.
Buy: Consider Data Lake as a Service from Cazena, which includes data movement and tool connectors, and have the project done in days, leaving plenty of time to go fishing, swimming, or whatever other water-related recreational activity you prefer!