Will BI 2.0 Sparql Thanks to the Semantic Web?

The Tech Sanity Check blog has a great ZDNet interview with Tim Berners-Lee, inventor of the World Wide Web, on the standards underlying the “semantic web”.

As I’ve noted on this blog before, I’m not a fan of using the phrase BI 2.0 to simply cover a laundry list of features that enterprise BI vendors were already planning to implement.

If BI 2.0 means anything, it should be about how BI will adapt to a web 2.0 world and unlock the value stored across the web as a whole, not just corporate databases. Personal or consumer BI will flourish in a web 2.0 world, and this will in turn impact how organizations deal with the same issues. Given this definition, the semantic web is clearly a big part of BI 2.0.

I’ve paraphrased below some of the key parts of the interview (it’s hard to speak in perfectly coherent sentences when you’re accosted by a journalist in a hallway on your way to receive a lifetime achievement award — I’ve tried to keep the meaning the same).

On the meaning of the “semantic web”:

It’s a rather overcomplicated word for the data web. We have documents on the web, but we don’t have data yet. The data standards are now in place, so you can free the data that is currently locked in application silos, take data from one application and another, pull it into a spreadsheet, graph, or map, and analyze it together. It’s now taking off, and it’s very exciting.

On where we are today:

We started off with the RDF data standard and the OWL schema/ontology standard. We didn’t have a query language. Today, it’s as if we have a web-based relational database system; it’s like taking databases and spreading them out and connecting them together across the web. It was like relational databases without SQL, without a standard query language. Now we have an equivalent for querying web databases, a web protocol called SPARQL, and it’s just being polished off. And we’re just starting on a rules language. But because we have the data standard, the ontology standard, and SPARQL, we’ve crossed the chasm, and you can now start to create large applications, and we’re starting to see this taking off.
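To make the “relational databases plus SQL” analogy a little more concrete, here is a minimal sketch of my own (not from the interview) of what RDF data and a SPARQL-style query look like. It is a toy in plain Python rather than a real RDF library, and every URI under http://example.org/ and the `match` helper are made up for illustration.

```python
# Toy illustration only: a real system would use an RDF store and SPARQL.
# Every URI under http://example.org/ is invented for this sketch.
EX = "http://example.org/"

# An RDF graph is just a set of (subject, predicate, object) triples.
triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "worksFor", EX + "globex"),
    (EX + "acme", EX + "sells", EX + "widget"),
}


def match(graph, pattern):
    """Yield variable bindings for a single triple pattern.

    Terms starting with '?' are variables (as in SPARQL); any other term
    must equal the corresponding term of the triple.
    """
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly the SPARQL query:
#   SELECT ?person ?company WHERE { ?person ex:worksFor ?company }
rows = sorted(
    (b["?person"], b["?company"])
    for b in match(triples, ("?person", EX + "worksFor", "?company"))
)
```

Here `rows` ends up holding one (person, company) pair for each “worksFor” triple, which is the kind of result set a SPARQL SELECT would return.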

On the first implementations:

There’s a lot of buzz about linked data, i.e. you pull in some data and create a graph in which each item has a URI associated with it, and you can click on a relationship to pull in new data about that item. It’s like the web, but pulling in data instead of documents: it could be a relationship (a person, the company they work for…), a product, a competing product… and you can bring that information in and compare it, etc. It involves real data processing. There’s a lot of excitement about public data sources.
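As a rough sketch of that “click to pull in more data” idea (again my own illustration, not something from the interview), the snippet below fakes the web with an in-memory dictionary. The `WEB` table, the `dereference` and `expand` helpers, and all the example.org URIs are hypothetical; a real client would fetch each URI over HTTP and parse the RDF it gets back.

```python
# Hypothetical in-memory stand-in for the web: looking up a URI returns the
# triples published about it. A real client would issue an HTTP GET instead.
EX = "http://example.org/"
WEB = {
    EX + "alice": [(EX + "alice", EX + "worksFor", EX + "acme")],
    EX + "acme": [
        (EX + "acme", EX + "sells", EX + "widget"),
        (EX + "acme", EX + "competesWith", EX + "globex"),
    ],
}


def dereference(uri):
    """Return the triples published at a URI (empty if nothing is there)."""
    return WEB.get(uri, [])


def expand(graph, uri):
    """Simulate a click on an item: merge its triples into the local graph.

    Returns the URIs the item points to, i.e. what could be clicked next.
    """
    graph.update(dereference(uri))
    return sorted({obj for subj, _pred, obj in graph if subj == uri})


graph = set()
next_links = expand(graph, EX + "alice")    # what alice links to
more_links = expand(graph, next_links[0])   # follow the first link
```

Each call to `expand` grows the local graph, so data about the person, the company, and its products ends up side by side and can be compared.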

The prototypes have been done by academics, but there are some semantic web products out there, and companies are attaching semantic web capabilities to existing products. A lot of the applications that exist at the moment already use data, so all you have to do is import/export standard RDF to enable them to interoperate with other things.

On how it compares to mashups:

The mashup folks have understood the value of bringing together interesting data sources, either by reverse engineering sites or using APIs. But you have to do it separately for every application. With the semantic web, somebody does a one-time mashup to turn it into RDF, and then that acts as a bus connecting to everything else. Nobody else has to do the back-end part of the mashup. So you can pull in data from completely different data sources and compare it and put it together. Mashups are taking two specific stove pipes and putting them together. Semantic web is about connecting everything, using a single standard. So you can just query data, graph it, explore it, swim through it.
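The “one-time mashup, then a common bus” point can be sketched in a few lines. This is only my own toy example: the two “silos”, their field names, the converter functions, and the example.org URIs are all invented. Each source is converted to triples once, and after that any question can be asked across both without source-specific glue code.

```python
EX = "http://example.org/"

# Two hypothetical silos with incompatible shapes.
crm_rows = [{"name": "alice", "employer": "acme"}]
catalog_csv = "acme,widget\nglobex,gadget"


def crm_to_triples(rows):
    """One-time conversion of the (made-up) CRM records to triples."""
    return {(EX + r["name"], EX + "worksFor", EX + r["employer"]) for r in rows}


def catalog_to_triples(text):
    """One-time conversion of the (made-up) product catalog to triples."""
    return {
        (EX + company, EX + "sells", EX + product)
        for company, product in (line.split(",") for line in text.splitlines())
    }


# Once both sources are triples, combining them is just a set union.
graph = crm_to_triples(crm_rows) | catalog_to_triples(catalog_csv)

# A question that spans both sources: which products does each person's
# employer sell? No source-specific code is needed at this point.
sold_by_employer = sorted(
    (person, product)
    for person, p1, company in graph if p1 == EX + "worksFor"
    for seller, p2, product in graph if p2 == EX + "sells" and seller == company
)
```

Adding a third source would mean writing one more converter, not one more mashup per pair of applications.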

The role of open standards:

We must keep the same openness that we had before: common and royalty-free. The excitement is not about competing technologies, it’s about what you can do with them.

Does connecting information across application stovepipes sound familiar? So what does the semantic web mean for business intelligence?

The question isn’t new. For example, Neil Raden wrote about semantics for Intelligent Enterprise back in 2005 (“Start Making Sense: Get From Data to Semantic Integration”), but it hasn’t received a lot of attention in the BI community, despite the obvious overlaps.

Now that the SPARQL standard is being finished off and applications are starting to emerge, the links between the semantic web and BI may be starting to come to the fore.

Where we are today is covered well in a W3.org talk by Ivan Herman on the “State of the Semantic Web”. Particularly interesting (for me at least) are the sections on how SPARQL may be a unifying point, and that “some of the messaging on semantic web has gone horribly wrong”, i.e. that it’s been made to sound a lot more complicated than it really is, and so adoption has suffered. Ivan predicts that the semantic web is poised for growth, at level 4 on a 5-point scale towards the same type of adoption as the World Wide Web itself, and that the next step is “adoption by big business”.

Ivan points out that Gartner now includes the “corporate semantic web” on their emerging technology hype cycle. As you can see from the graphic, it’s a long way from the “plateau of productivity”, but far enough advanced to be making a real impact on corporate data access.

Figure: Gartner’s emerging technology hype cycle, 2006, as reported in ZDNet.

So what do you think? The semantic web is clearly an extension of BI’s existing vision statements (especially now that unstructured data is coming into the picture).

Will BI vendors be the ones to take the semantic web into prime time? Will they be able to deliver?


More on “Sparql: A Query Platform for Web 2.0 and the Semantic Web”