ETL and Information Governance Discussions From DSLayer


If you’re interested in SAP, BusinessObjects, and everything to do with analytics, I encourage you to check out the Diversified Semantic Layer site (tagline: “unprofessional journalism at its finest”!). It’s run by a bunch of analytics veterans from around the English-speaking globe and features quirky (and sometimes completely off-topic) discussions of the latest and greatest (and somewhat geeky) events in the analytics world.

I subscribe to the interviews as an audio podcast and listen to them on my morning run or evening shopping errands (full disclosure: they bought my goodwill with a DSLayered Bowling Shirt). I often find myself, somewhat inappropriately, arguing with them out loud and startling passers-by.

In the event you are passionate enough about analytics to hear even more about it in your spare time, here are a couple of recent shows that I found interesting:

Is ETL Dead?

The DSLayer crew (Eric Vallo, Greg Myers, Jamie Oswald, Clint Vosloo, and Josh Fletcher; Dallas Marks couldn’t make it) discussed the question “doesn’t there seem to be less need for ETL these days?” Sadly, Clint Vosloo had the most to say, and the worst bandwidth, so his points kept getting cut off.

Topics covered:

  • ETL is by far the most arduous, lengthy, and costly part of business intelligence.
  • With new technologies like Hadoop and in-memory databases, there’s less need to move data around.
  • But you’ll never find a company that has just one system, so some data movement is always required.
  • ETL is not “dying” – but instead “evolving” [or transforming?]
  • SAP Data Services is very flexible and fast – any delays are because of the source or target system.
  • In the future, the “transform” can be pushed into in-memory systems like HANA. Faster, but you could still end up with a very complex, unmanageable HANA view that only one person understands.
  • Real change will come when apps are built from the ground up to optimize HANA use, with no data copying needed. For example, if you have a brand new ERP on HANA, maybe you don’t need BW or a data warehouse (and the associated data movements), at least initially.
  • The problem is always the quality of the source system. If that’s fine, then replication may be enough.
  • SAP seems to be the only vendor pitching in-memory apps for both operations and analytics (as opposed to in-memory for analytics only).
  • One thing is clearly going to change: the notion of a once-a-day ETL load. Loads can now run on a more continuous basis.
  • Federation (e.g. Smart Data Access in HANA) is a great option for flexibility – but you’ll want to make a physical copy sometimes.
  • Conclusion: ETL isn’t dead – but it’s going to change a lot, especially for SAP customers.
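To make the move away from a once-a-day full reload a little more concrete, here is a minimal Python sketch of one common incremental pattern, a high-water-mark (watermark) delta load. It is not how SAP Data Services or HANA implement this; the `Row` type, the `incremental_load` function, and the assumption that every source row carries a reliable `changed_at` timestamp are all hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical source row: a business key, a value, and a last-changed timestamp.
@dataclass
class Row:
    key: int
    value: str
    changed_at: datetime


def incremental_load(source, target, watermark):
    """Copy only rows changed after `watermark` into `target`; return the new watermark."""
    changed = [r for r in source if r.changed_at > watermark]
    for r in changed:
        target[r.key] = r.value  # upsert by business key
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    return max((r.changed_at for r in changed), default=watermark)


# First run picks up everything; the second run copies only the newer change.
source = [Row(1, "a", datetime(2014, 1, 1, 9)), Row(2, "b", datetime(2014, 1, 1, 10))]
target = {}
wm = incremental_load(source, target, datetime.min)
source.append(Row(1, "a2", datetime(2014, 1, 1, 11)))
wm = incremental_load(source, target, wm)
print(target, wm)
```

Because each run only touches what changed since the previous one, it can be scheduled every few minutes instead of nightly. The catch is the assumed timestamp: rows that arrive late with an older `changed_at` would be missed, which is one reason change-data-capture or log-based replication is often preferred.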

The Need for Information Governance

This show featured “I see Data Quality People” Clint Vosloo and “I’m all about the Quality” Josh Fletcher, following up the ETL discussion with a look at data quality and information governance. Warning: you will need a high tolerance for kids-in-the-background noise.

Topics covered:

  • Data quality is a business issue – IT can only help show the problem
  • Funding is often the biggest issue in a data quality project. Using fear can be a useful tactic – reminding people that the financial results are based on dubious figures, or that figures reported to government may be incorrect. If there are doubts about a figure, copying the CFO on the discussion may help (but may lose you some friends)
  • Using a tool like SAP Explorer to quickly show incorrect or null values – and how much money is attached to them – can be powerful.
  • Business people tend to be oblivious to data quality issues, because they only see a small subset of the data, unlike IT.
  • Humans create bad data. One way to get better data is to improve incentives: pay people for the quality of data, not just the quantity.
  • A real information governance strategy takes experts
  • Organizations like retailers are realizing that the ability to cross-sell requires better quality information
  • SAP Information Steward makes it easy to track data quality metrics over time, identify areas where there are big business benefits – which makes it much easier to get the momentum to fix the problem.
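The point about showing incorrect or null values “and how much money is attached to them” is easy to illustrate. Below is a minimal Python sketch of that kind of profiling, not how SAP Explorer or Information Steward work internally. The `Invoice` record, the `VALID_COUNTRIES` list, the three validation rules, and the sample data are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference list of valid country codes.
VALID_COUNTRIES = {"US", "GB", "AU", "ZA", "CA"}


# Hypothetical invoice record; field names are illustrative only.
@dataclass
class Invoice:
    invoice_id: str
    customer_id: Optional[str]
    country: Optional[str]
    amount: float


def profile(invoices):
    """For each rule, count failing records and total the money attached to them."""
    rules = {
        "missing customer_id": lambda r: not r.customer_id,
        "missing or unknown country": lambda r: r.country not in VALID_COUNTRIES,
        "negative amount": lambda r: r.amount < 0,
    }
    report = {}
    for name, failed in rules.items():
        bad = [r for r in invoices if failed(r)]
        # abs() so that negative amounts still count as money at stake.
        report[name] = (len(bad), sum(abs(r.amount) for r in bad))
    return report


sample = [
    Invoice("1", "C1", "US", 100.0),
    Invoice("2", None, "US", 250.0),
    Invoice("3", "C3", None, 75.5),
    Invoice("4", "C4", "XX", -20.0),
]
for rule, (count, amount) in profile(sample).items():
    print(f"{rule}: {count} record(s), {amount:,.2f} at stake")
```

Putting a currency figure next to each rule, rather than just a count of bad rows, is what tends to get a business audience’s attention.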

I hope you enjoy the shows. You can follow DSLayer on Twitter and ask questions using the #AskDSL hashtag.