Data lakes may be overhyped, but they clearly represent a new opportunity for enterprise analytics.
The danger is that:
“By its definition, a data lake accepts any data, without oversight or governance. Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a data swamp.”
There are real advantages to data lakes. For example, you don’t have to define constraining data structures up front. The job of data alignment can then be done as and when needed, and pushed to the people who know it best: the business people who want to do the analysis. IT can then concentrate on making as much data as possible available at a reasonable price.
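To make the idea of deferring structure concrete, here is a minimal sketch of what is often called schema-on-read. Everything in it is an assumption for illustration: the sample records, the field names (`customer`, `cust_id`, `amount`, `total`) and the helper functions are hypothetical and not taken from any particular data lake product. The raw records are stored exactly as they arrived, and structure is imposed only when an analyst reads them for a specific question.

```python
import json

# Hypothetical raw records, kept as-is: nothing enforced a schema on write,
# so field names and types differ from one source to the next.
RAW_EVENTS = [
    '{"customer": "C-1", "amount": "19.99", "country": "DE"}',
    '{"cust_id": "C-2", "amount": 5, "channel": "web"}',
    '{"customer": "C-1", "total": "3.50"}',
]


def read_sales(raw_lines):
    """Apply one analyst's view of the data at read time (schema-on-read)."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Align differently named fields to the names this analysis expects.
        customer = record.get("customer") or record.get("cust_id")
        amount = record.get("amount", record.get("total"))
        if customer is None or amount is None:
            continue  # skip records this particular view cannot interpret
        rows.append({"customer": customer, "amount": float(amount)})
    return rows


def total_by_customer(rows):
    """Sum the aligned amounts per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_by_customer(read_sales(RAW_EVENTS)))
```

Note that the alignment rules live in `read_sales`, which is code the analyst owns. A different analysis could read the same raw records with different rules, which is both the flexibility and the governance risk.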
However, some proponents of data lakes make them sound like a magical solution that fixes all known analytic ills. The hardest problem in analytics has always been data quality and integration across many different business uses of information. Data lakes help, but there’s no such thing as a magic bullet. In particular, they don’t eliminate the need for “data warehousing”, since that is ultimately a business problem, not only a technical one.
As ever, it’s not just about the technology. Businesses that want to succeed in analytics also need to work hard on the organizational structures, processes and cultures that produce correct data in the first place.