A Big Cryptographic Boost for On-Demand BI and Extranets?


A recent cryptographic breakthrough by Craig Gentry potentially has huge consequences for business intelligence.

One of the big problems plaguing the use of on-demand business intelligence has been concerns over data security.

As Forbes and others have reported, the new techniques could reassure companies that want to take advantage of cloud computing for analysis (at the cost of massive computing power).

As IBM Research VP Charles Lickel notes in a blog post:

Using the solution could help strengthen the business model of “cloud computing,” where a computer vendor is entrusted to host the confidential data of others in a ubiquitous Internet presence. It might better enable a cloud computing vendor to perform computations on clients’ data at their request, such as analyzing sales patterns, without exposing the original data.
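To make the idea of computing on data "without exposing the original data" a little more concrete, here is a minimal sketch in Python. It is not Gentry's scheme. It uses the much older and simpler Paillier cryptosystem, which only supports addition on encrypted values, whereas the new work aims at arbitrary computations. The toy primes, the function names, and the sales figures are all assumptions chosen for illustration, and the key size is far too small for real use.

```python
import math
import secrets

# Toy primes for illustration only (both are Mersenne primes).
# A real deployment would need much larger, randomly generated primes.
P = 2**31 - 1
Q = 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)) for toy Paillier."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding value in [1, n)
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    """Recover the plaintext using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

# The client encrypts its sales figures before uploading them.
n, priv = keygen()
sales = [1200, 450, 980]
ciphertexts = [encrypt(n, s) for s in sales]

# The vendor totals the figures while seeing only ciphertexts.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

# Only the client, who holds the private key, can read the result.
total = decrypt(n, priv, encrypted_total)
print(total)  # 2630
```

The vendor in this sketch never holds the private key, so it can produce the encrypted total without being able to read any individual figure.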

The new techniques might also help in another area: extranet business intelligence. In today’s connected ecosystems of customers and partners, it’s essential to be able to share information for the good of all. But although extranets have been around since the earliest web-based BI tools, they are still not as widely used as one might expect.

There are several reasons for this – not least the ubiquitous data quality problems that companies are reluctant to expose outside the company’s firewall. But one big reason is that companies don’t necessarily trust their partners to look after the information.

Today, there aren’t great mechanisms in place for both sharing and protecting information. It’s easy to share information outside the firewall using today’s web-based BI systems, or an on-demand platform like that of BusinessObjects. But how can you stop people from taking the information and using it elsewhere?

In many ways, this is analogous to letting people buy a CD, but not letting them make a digital copy of it. Although the music industry has tried to introduce and enforce digital rights management, it has proven woefully inadequate.

The health industry is a great example of the “data governance” issues that this problem creates. Individual health records are sacrosanct – nobody wants to have their colonoscopy results freely available on the web. But at the same time, being able to analyze the detailed data of large numbers of patient treatments and outcomes is essential if we’re going to improve medicine as a whole. So we have to be able to protect individual records while sharing the data in aggregate, and various methods have been proposed to try to achieve this in bodies such as the UK’s National Health Service (which handily has a lot of data under its control, unlike the US health system).

Another example is industry data. Each participant in an industry benefits by having aggregate information about the industry as a whole, and a lot of middlemen and agencies have sprung up to collect information from companies (retailers, broadcasters, software companies, etc.) and then sell it back to them at a markup. In theory, these middlemen could be made obsolete, with companies volunteering data to a central body that enforces strict controls on what data is made available to each participant. But too often, there isn’t enough trust, and there is no mechanism for associating payments with data value (this is what killed the “net market” phenomenon of the 90s).

Could encryption be part of the answer? I’m no expert, but so far, it seems like this particular breakthrough only applies directly to large-scale data mining issues, where you can do the analysis in “black box” fashion. The next step would be to allow companies to upload encrypted data sets that could still be linked at the raw data level (using some sort of public-key encryption?) – since this is what would be required for meaningful data analysis.
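As a rough illustration of what linking protected data sets at the raw level could look like, here is a small Python sketch. It does not use public-key or homomorphic encryption. It swaps in a far simpler technique, a keyed hash (HMAC) over the shared identifier, so that two parties can match records without uploading the identifiers themselves. The shared key, the product codes, and the field names are all hypothetical, and anyone who holds the key can recompute the tokens, so this only shifts the trust problem rather than solving it.

```python
import hashlib
import hmac

# Hypothetical secret agreed between the participants (an assumption for this sketch).
SHARED_KEY = b"hypothetical-shared-secret"

def link_token(identifier: str, key: bytes = SHARED_KEY) -> str:
    """Replace a raw identifier with a keyed hash that can still be matched."""
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Each company tokenizes its own records before uploading them.
retailer = {
    link_token("SKU-1001"): {"units_sold": 40},
    link_token("SKU-2002"): {"units_sold": 15},
}
supplier = {
    link_token("sku-1001 "): {"units_shipped": 50},
    link_token("SKU-3003"): {"units_shipped": 20},
}

# The central body joins on tokens and never sees the raw product codes.
linked = {
    token: {**retailer[token], **supplier[token]}
    for token in retailer.keys() & supplier.keys()
}
print(list(linked.values()))
```

Only the product that both companies hold ends up in the linked result, and the rows are keyed by opaque tokens rather than by the original codes.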

I’m convinced that we’re going to spend the next couple of decades on these problems – building the equivalent of the world’s monetary system (and look at the problems that has generated recently!) for data. Anything that can help, such as this encryption breakthrough, is a step along the way…