{"id":12349,"date":"2013-08-19T09:53:45","date_gmt":"2013-08-19T08:53:45","guid":{"rendered":"http:\/\/timoelliott.com\/blog\/?p=5632"},"modified":"2013-08-19T09:53:45","modified_gmt":"2013-08-19T08:53:45","slug":"saphana-and-hadoop-in-the-clou","status":"publish","type":"post","link":"https:\/\/timoelliott.com\/blog\/2013\/08\/saphana-and-hadoop-in-the-clou.html","title":{"rendered":"SAP HANA and Hadoop in the Cloud: Big Data At The Globe and Mail"},"content":{"rendered":"<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" style=\"background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;\" title=\"hana-and-hadoop\" alt=\"hana-and-hadoop\" src=\"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/hana-and-hadoop.jpg?resize=690%2C310&#038;ssl=1\" width=\"690\" height=\"310\" border=\"0\" \/><\/p>\n<p>Like most media organizations around the world, the Toronto-headquartered <a href=\"http:\/\/www.theglobeandmail.com\/\" target=\"_blank\">The Globe and Mail<\/a> has struggled to make a <a href=\"http:\/\/business.financialpost.com\/2012\/05\/10\/globe-and-mail-to-begin-charging-readers-for-online-content-this-fall\/\">profitable transition<\/a> from physical newspapers to online journalism. But now a combination of Hadoop and SAP HANA in the cloud is helping make critical decisions about how and when to charge readers for online access to articles.<\/p>\n<p>In print for 167 years, The Globe is Canada\u2019s largest newspaper, with over 300 journalists covering national, international, business, technology, arts, entertainment and lifestyle news for around 3.5 million readers a week across the country. 
Over the last decade, the company has invested in comprehensive data gathering and analysis systems, starting with SAP ERP in 2002 and a full <a href=\"http:\/\/www54.sap.com\/bin\/sapcom\/downloadasset.spread-the-news-pdf.html\" target=\"_blank\">enterprise data warehouse using SAP BW<\/a> in 2007.<\/p>\n<p>In early 2012, data analysis became an urgent business priority because of the company\u2019s <a href=\"http:\/\/www.huffingtonpost.ca\/tag\/globe-and-mail-paywall\">paywall project<\/a>. The company knew casual readers were coming to the web site and needed to work out how many articles it should allow them to read before asking them to pay.<\/p>\n<p>Sandy Yang, a functional analyst at the Globe and Mail, explained the problem: \u201cIf we set the bar too high, we won\u2019t have enough people to pay for our content, but if too low, they might never come back, and then we might lose a big chunk of our advertising revenue.\u201d The ideal solution would be one where \u201csome people shouldn\u2019t even know we have a pay wall, but some should think that their money is well spent. We need to find the right balance by analyzing user behavior.\u201d<\/p>\n<p>The company uses Omniture to get insight into which articles readers are interested in and key statistics such as the number of page views per period or unique visits per period per section. But answering more complex \u2013 and important \u2013 questions required further analysis of the raw clickstream data. The internal IT teams first tried to import the web data from Omniture into a traditional relational database. But the data was complicated, stored in tab-delimited text files with millions of lines, each having around 500 fields, and was growing at a rate of several gigabytes a day. 
The company turned to Hadoop to process the web data, but wasn\u2019t ready to buy and maintain its own servers, so it used Amazon\u2019s Elastic MapReduce service and stored the results in Amazon S3.<\/p>\n<p>But that didn\u2019t solve all the analysts\u2019 problems. Yang explains: \u201cThe result is a whole lot of numbers. Every time a job finished, I had to add column headers and reformat the data to explain what it meant. And as soon as I handed it over to the business, they said \u2018OK, it looks good, but what if\u2026.?\u2019 I had to explain that it was a batch process, and that I couldn\u2019t drill down and give the answer immediately. I hated to have to answer like that, but I didn\u2019t have better options.\u201d<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/globeandmail5.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" style=\"background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;\" title=\"globeandmail5\" alt=\"globeandmail5\" src=\"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/globeandmail5_thumb.jpg?resize=632%2C384&#038;ssl=1\" width=\"632\" height=\"384\" border=\"0\" \/><\/a><\/p>\n<p><em>Figure 1: The Globe and Mail paywall project architecture, featuring Hadoop on Amazon AWS and SAP HANA<\/em><\/p>\n<p>Then Yang discovered SAP HANA One, a version of SAP\u2019s new in-memory platform that runs in the Amazon Cloud. \u201cIt was so simple we didn\u2019t think it was an SAP product! HANA One bridged the gap between our inexplicable big data and our incredibly creative business people.\u201d 
The speed of the product met expectations: \u201cthe real-time aspect instead of batch processing was delivered as advertised.\u201d And HANA One came at the right price: \u201cThe total cost was just $3.50 an hour for visualization of data, instant response from user requests and more.\u201d<\/p>\n<p>Setting up the system took less than four part-time days. Yang demonstrated it to the marketing teams, who were instantly impressed with the big leap in data transparency and how easy it was to use SAP HANA Studio to visualize the data: \u201cwith its user-friendly client interface and fast processing, people saw numbers and charts within seconds, so big data was no longer formidable to them.\u201d<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/globeandmail83.jpg?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" style=\"background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;\" title=\"globeandmail83\" alt=\"globeandmail83\" src=\"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/globeandmail83_thumb.jpg?resize=619%2C367&#038;ssl=1\" width=\"619\" height=\"367\" border=\"0\" \/><\/a><\/p>\n<p><em>Figure 2: An example of a correlation analysis using SAP HANA Studio on the Globe and Mail clickstream data preprocessed in Hadoop<\/em><\/p>\n<p>Yang found that starting up the server with the previous data takes only 15 minutes: \u201cI can use it whenever I want, and all I pay for is the time we use it, nothing more. For small businesses, and companies with no budgets, that\u2019s extremely important. In December, I spent less than $100 Canadian dollars \u2013 $25 for HANA One and $63 for AWS cluster servers \u2013 that\u2019s all!\u201d This helped make the implementation an easy business decision: \u201cUsually, you build a business case and then buy the products and then implement it. 
With HANA One, business can make its own case and we don\u2019t need to buy the product upfront \u2013 we pay as we go.\u201d<\/p>\n<p>For more details about the Globe and Mail\u2019s Hadoop and SAP HANA One project on Amazon AWS, watch the <a href=\"https:\/\/event.on24.com\/eventRegistration\/EventLobbyServlet?target=registration.jsp&amp;eventid=587800&amp;sessionid=1&amp;key=80B4D8245261F94E532F6CD6DE19BA13&amp;sourcepage=register\" target=\"_blank\">on-demand web seminar<\/a> or read an <a href=\"http:\/\/insiderprofiles.wispubs.com\/article.aspx?iArticleId=7001\" target=\"_blank\">extended interview with Sandy Yang<\/a> on the InsiderProfiles web site. And find out how easy it is to set up your own <a href=\"http:\/\/www.sap.com\/pc\/tech\/cloud\/software\/hana-cloud-platform-as-a-service\/index.html\" target=\"_blank\">SAP HANA One solution in the cloud<\/a>.<\/p>\n<p>To hear more about SAP\u2019s plans to combine the best of in-memory databases, traditional data warehousing, open-source \u201cNoSQL\u201d technology and more, join us for the <a href=\"http:\/\/www.saphana.com\/community\/blogs\/blog\/2013\/08\/08\/sap-big-data-chat\" target=\"_blank\">Big Data SAP Chat on Wednesday, August 21st, 8am PT \/ 11am ET \/ 5pm CET<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Globe and Mail newspaper used a combination of Hadoop and SAP HANA in the cloud to help answer critical business 
questions.<\/p>\n","protected":false},"author":2,"featured_media":5627,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2,3,6],"tags":[27,100,346,556,560,835,911,931,1031],"class_list":["post-12349","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-best-practice","category-bi-20","category-deployments","tag-bigdata","tag-analytics","tag-data-scientists","tag-hadoop","tag-hana","tag-predictive","tag-sap","tag-saphana","tag-studio"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/timoelliott.com\/blog\/wp-content\/uploads\/2013\/08\/hana-and-hadoop-1.jpg?fit=690%2C310&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3X9RF-3db","_links":{"self":[{"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/posts\/12349","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/comments?post=12349"}],"version-history":[{"count":0,"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/posts\/12349\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\
/timoelliott.com\/blog\/wp-json\/wp\/v2\/media\/5627"}],"wp:attachment":[{"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/media?parent=12349"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/categories?post=12349"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/timoelliott.com\/blog\/wp-json\/wp\/v2\/tags?post=12349"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}