SAP HANA and Hadoop in the Cloud: Big Data At The Globe and Mail


Like most media organizations around the world, Toronto-based The Globe and Mail has struggled to make a profitable transition from physical newspapers to online journalism. But now a combination of Hadoop and SAP HANA in the cloud is helping make critical decisions about how and when to charge readers for online access to articles.

In print for 167 years, The Globe is Canada’s largest newspaper, with over 300 journalists covering national, international, business, technology, arts, entertainment and lifestyle news for around 3.5 million readers a week across the country. Over the last decade, the company has invested in comprehensive data gathering and analysis systems, starting with SAP ERP in 2002 and a full enterprise data warehouse using SAP BW in 2007.

In early 2012, data analysis became an urgent business priority because of the company’s paywall project. The company knew casual readers were coming to the web site and needed to work out how many articles the company should allow them to read before asking them to pay.

Sandy Yang, a functional analyst at The Globe and Mail, explained the problem: “If we set the bar too high, we won’t have enough people to pay for our content, but if too low, they might never come back, and then we might lose a big chunk of our advertising revenue.” The ideal solution would be one where “some people shouldn’t even know we have a pay wall, but some should think that their money is well spent. We need to find the right balance by analyzing user behavior.”

The company uses Omniture to get insight into which articles readers are interested in and key statistics such as the number of page views per period or unique visits per period per section. But answering more complex – and important – questions required further analysis of the raw clickstream data. The internal IT teams first tried to import the web data from Omniture into a traditional relational database. But the data was complicated: it was stored in tab-delimited text files with millions of lines, each having around 500 fields, and was growing at a rate of several gigabytes a day. The company turned to Hadoop to process the web data, but wasn’t ready to buy and maintain its own servers, so it used Amazon’s Elastic MapReduce (EMR) service and stored the results in Amazon S3.
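To make the preprocessing step concrete, here is a minimal, hypothetical sketch of the kind of map/reduce logic an EMR job might run over those tab-delimited exports – counting article page views per visitor, the raw input for a paywall-threshold analysis. The column positions and the `/article` URL filter are illustrative assumptions, not The Globe and Mail’s actual Omniture schema.

```python
# Hypothetical Hadoop Streaming-style mapper/reducer for tab-delimited
# clickstream rows. Field positions are assumptions for illustration.
from collections import Counter

VISITOR_COL = 0   # assumed column holding the visitor ID
URL_COL = 12      # assumed column holding the page URL

def map_lines(lines):
    """Emit (visitor_id, url) pairs for article pages in raw rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > URL_COL and "/article" in fields[URL_COL]:
            yield fields[VISITOR_COL], fields[URL_COL]

def reduce_pairs(pairs):
    """Count article page views per visitor."""
    counts = Counter()
    for visitor, _url in pairs:
        counts[visitor] += 1
    return counts
```

In a real EMR Streaming job, the map and reduce phases would run as separate processes reading stdin and writing tab-separated output; the functions above just show the shape of the transformation.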

But that didn’t solve all the analysts’ problems. Yang explains: “The result is a whole lot of numbers. Every time a job finished, I had to add column headers and reformat the data to explain what it meant. And as soon as I handed it over to the business, they said ‘OK, it looks good, but what if….?’ I had to explain that it was a batch process, and that I couldn’t drill down and give the answer immediately. I hated to have to answer like that, but I didn’t have better options.”
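The manual reformatting Yang describes – taking headerless batch output and labeling it so the business can read it – might look something like the following sketch. The header names are invented for illustration; the actual result files would have their own schema.

```python
# Minimal sketch of attaching column headers to headerless tab-delimited
# batch output. HEADERS is an assumed schema, not the real one.
import csv
import io

HEADERS = ["visitor_id", "article_views"]  # illustrative column names

def add_headers(raw_tsv: str) -> str:
    """Prepend column names to a headerless tab-delimited result file."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    for row in csv.reader(io.StringIO(raw_tsv), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()
```

The pain point in the article is not that this step is hard, but that it had to be repeated after every batch run – exactly the gap an interactive tool closes.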


Figure 1: The Globe and Mail paywall project architecture, featuring Hadoop on Amazon AWS and SAP HANA

Then Yang discovered SAP HANA One, a version of the company’s new in-memory platform that runs in the Amazon Cloud. “It was so simple we didn’t think it was an SAP product!  HANA One bridged the gap between our inexplicable big data and our incredibly creative business people”. The speed of the product met expectations: “the real-time aspect instead of batch processing was delivered as advertised.” And HANA One came at the right price: “The total cost was just $3.50 an hour for visualization of data, instant response from user requests and more.”

Setting up the system took less than four part-time days. Yang demonstrated it to the marketing teams who were instantly impressed with the big leap in data transparency and how easy it was to use SAP HANA Studio to visualize the data: “with its user-friendly client interface and fast processing, people saw numbers and charts within seconds, so big data was no longer formidable to them.”


Figure 2: An example of a correlation analysis using SAP HANA Studio on the Globe and Mail clickstream data preprocessed in Hadoop

Yang found that restarting the server with the previously loaded data takes only 15 minutes: “I can use it whenever I want, and all I pay for is the time we use it, nothing more. For small businesses, and companies with no budgets, that’s extremely important. In December, I spent less than CA$100 – $25 for HANA One and $63 for AWS cluster servers – that’s all!” This helped make the implementation an easy business decision: “Usually, you build a business case and then buy the products and then implement it. With HANA One, business can make its own case and we don’t need to buy the product upfront – we pay as we go.”

For more details about the Globe and Mail’s Hadoop and SAP HANA One project on Amazon AWS, watch the on-demand web seminar or read an extended interview with Sandy Yang by the InsiderProfiles web site. And find out how easy it is to set up your own SAP HANA One solution in the cloud.

To hear more about SAP’s plans to combine the best of in-memory databases, traditional data warehousing, open-source “NoSQL” technology and more, join us for the Big Data SAP Chat on Wednesday August 21st, 8am PT / 11am ET / 5pm CET.
