I was recently asked some questions by a journalist for Ziff Davis Media for an article on Big Data Trends. Here are my answers.
What are the four biggest trends in big data use this year?
1. The rise of Big Data Discovery, which combines three of the hottest trends of the last few years in analytics: Big Data, Data Discovery, and Data Science. The emerging Big Data Discovery tools will be simpler to use than existing data science products, more accessible to a wider range of users, and offer more powerful manipulation of a wider variety of data sources. The tools will allow the rise of a new type of user: a “citizen data scientist”.
2. Self-service data preparation or “data wrangling”. Getting reliable, consistent data has always been the Achilles’ heel of Big Data projects. New tools such as SAP’s new Agile Data Preparation product make it easier for individuals to grab, merge, cleanse, and collect different data sets. They extend and complement traditional ETL (extract, transform, load) tools and hand-coded data movement scripts.
3. Large enterprises are realizing that Big Data is here to stay, and now needs to be integrated with their existing corporate systems — there’s no point in analyzing social data, say, unless you can connect it directly to your marketing and finance applications. This means a lot more focus on governance, with projects such as Apache Atlas, and new ways of organizing corporate analytics communities.
4. Big Data Ecosystems. So far, most big data systems have been used to support the activities of an individual organization. But I’m starting to see the rise of big data applications that span multiple organizations and allow optimizations across an entire industry. Two examples: first, the Smart Port Logistics platform created by the Hamburg Port Authority, which is designed to connect all of the participants in the port, including the shipping companies, trucking companies, customs officials, and even the truck car parks and retail outlets. By collecting, analyzing, and feeding back information in real time, the Port Authority helps all the participants become more efficient. Second, the cooperation of Volkswagen, Shell, and SAP on a connected car ecosystem.
(Choosing just four is tough! The other huge area is data privacy and “algorithm ethics”.)
Who is winning in big data architecture? What vendors and architectures are becoming standards?
So far, most of the examples of big data implementations have been in silos, separate from the rest of the business. But to truly succeed with big data, organizations will need a “multi-polar” strategy that combines new big data platforms like Hadoop and Spark with traditional data warehousing and new HTAP (hybrid transactional/analytical processing) systems.
For now, everybody is “winning”, because it’s hard to find a vendor connected to analytics that isn’t thriving. As big data becomes a normal part of business process, vendors that have strong strategic ties with their customers will have an advantage.
How can data brokers bring context to big data?
According to research by Forrester, on average 45% of the data that business people use resides outside enterprise BI environments. This clearly means that there is a big and growing opportunity for traditional data brokers, but they are starting to get competition from new sectors.
For example, more organizations are realizing that the data in their systems is valuable, and they are starting to investigate becoming data brokers themselves. In addition, organizations like Ariba, which runs the largest business network in the world, already provide services such as procurement benchmarking. Since the acquisition of the company by SAP, there are plans to expand these services into new areas.
How should businesses adjust to take advantage of the multi-polar analytics trend?
Big Data architectures are going to remain complex for the foreseeable future, since both technology and business needs will continue to evolve rapidly. It’s essential that organizations go beyond just having a “data architecture” plan and create a full information community that includes the business units and data scientists.
The days of top-down, command-and-control information infrastructures are over. A community-based approach is essential to find the right pragmatic tradeoffs that will have the best long-term benefits for the company.
What are three ways that big data can improve IoT initiatives?
1. Big Data is inextricably linked with IoT, as part of the overall datafication trend: exposing previously invisible business processes that can then be analyzed. Sensors generate massive amounts of information that may be in “poly-structured” formats.
2. Simple sensors plus sophisticated predictive algorithms can transform existing business processes and create new opportunities. One of my favorites is the creation of “magic carpets” that allow the elderly to live longer in their own homes. They use simple sensors coupled with predictive analytics to discover differences between “normal” and “abnormal” behavior and gait.
3. Big Data infrastructures are required for both real-time IoT response (e.g. when an elderly person has fallen over and needs immediate help, or when an oil well is about to blow) and longer-term analysis (e.g. the predictive algorithms detect an increasing problem with gait, or predict when industrial equipment requires maintenance).
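To make the difference between the real-time and longer-term cases a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any actual “magic carpet” product works: the step-interval readings, the function names, and the thresholds are all hypothetical. It assumes a sensor reports the time between footsteps, flags a single reading that is far from a person's normal baseline (the immediate alert), and separately flags a small but sustained shift in the recent average (the slowly developing problem).

```python
from statistics import mean, stdev


def zscore(value, baseline):
    """Number of standard deviations between `value` and the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_abnormal(value, baseline, threshold=3.0):
    """Real-time check: is this single reading far outside normal behavior?"""
    return abs(zscore(value, baseline)) > threshold


def is_drifting(recent, baseline, threshold=1.0):
    """Longer-term check: has the recent average moved away from the baseline?"""
    return abs(zscore(mean(recent), baseline)) > threshold


# Hypothetical "normal" step intervals in seconds for one person.
baseline = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 0.49]

print(is_abnormal(0.51, baseline))   # an ordinary step
print(is_abnormal(0.90, baseline))   # a sudden, very different reading
print(is_drifting([0.52, 0.53, 0.52, 0.53], baseline))  # gradual slowdown
```

Note that in the last line no individual reading would trip the real-time check on its own; only the sustained shift in the average is flagged. A real system would use far richer models, but the split between a fast per-event check and a slower trend check is the point of the sketch.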