A quick, fun example of data quality and “garbage in, garbage out”…*
I once helped organize a user conference where a suspiciously large percentage of the audience came from Afghanistan, because people lazily clicked the first country in the list on the registration page.
It turns out that even web giants like Facebook and MySpace have the same problem. In the June 8th episode of the NPR show On The Media, during a segment that discussed children’s use of Facebook despite the age limit restrictions, researcher Danah Boyd said:
“One of the funny things that you will find on these sites is that a huge number of kids actually say that they are from Afghanistan or Zimbabwe, which are the countries alphabetically at the top and the bottom of the possible countries you can be from… so, based on the stats of Facebook and MySpace, there are more people online in Afghanistan and Zimbabwe than there are living there” (@39’00’’)
What’s NOT funny is to realize that most of the time it’s not as easy as this to recognize you have a data quality problem. What if the figures had only been 10% or 15% out? Would anybody have noticed the problem then? Or would they have happily accepted the figures and used them for (bad) decision making?
Poor data quality is the number one technical problem preventing the successful deployment of analytic solutions. Every business should invest in good data quality solutions that help detect and fix bad data.
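One simple way to catch the "first country in the list" effect is to compare how often the first and last dropdown options are chosen against a rough baseline. The following is a minimal sketch under stated assumptions, not a definitive data quality check: the function name `flag_edge_option_bias`, the `tolerance` factor, and all the sample numbers are invented for illustration, and the baseline shares would have to come from a source you trust.

```python
from collections import Counter


def flag_edge_option_bias(responses, options, expected_share, tolerance=3.0):
    """Flag the first/last options of a list when they are picked far more
    often than a baseline suggests.

    responses      -- list of selected option names, one per registration
    options        -- the options in the order shown to the user
    expected_share -- hypothetical baseline: option name -> expected fraction
    tolerance      -- assumed factor by which observed may exceed expected
    """
    flagged = {}
    total = len(responses)
    if total == 0 or not options:
        return flagged

    counts = Counter(responses)
    # Only the options at the top and bottom of the list are checked,
    # since those are the ones lazy clicking tends to inflate.
    for option in (options[0], options[-1]):
        observed = counts[option] / total
        expected = expected_share.get(option, 0.0)
        if observed > tolerance * expected:
            flagged[option] = (observed, expected)
    return flagged


# Invented illustrative data: 100 registrations across four countries.
options = ["Afghanistan", "Canada", "France", "Zimbabwe"]
responses = (
    ["Afghanistan"] * 30 + ["Canada"] * 40 + ["France"] * 28 + ["Zimbabwe"] * 2
)
expected = {"Afghanistan": 0.01, "Canada": 0.45, "France": 0.53, "Zimbabwe": 0.01}

flagged = flag_edge_option_bias(responses, options, expected)
for name, (observed, baseline) in flagged.items():
    print(f"{name}: observed {observed:.0%}, expected about {baseline:.0%}")
```

With these made-up numbers only Afghanistan is flagged. A check like this catches only blatant cases; a figure that is 10% or 15% out would pass straight through, which is exactly the harder problem described above.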
* Yes, I am well aware 3D charts are not “best practice” – this is deliberate, to annoy information purists whose attachment to visual information dogma distracts from more important topics – like bad data!
Comments
4 responses to “Facebook is HUGE in Afghanistan and Zimbabwe. Or Maybe Not.”
[…] Elliot's article. A good point about how lazy web users are when it comes to answering. […]
Helpful article, with a great illustrative example of crappy data at its most unhelpful.
But how does one assure quality data?
In this example, I suppose, by incorporating code that pings the user’s IP, identifies the general region, and forces them to enter a valid country before they can proceed. But that equals lost business, because internet users are characteristically lazy… and because your competitor might not be so stringent.
Where do you draw the line between quality data, and a kick in the bottom line?
Must we all accept that sometimes we’ll have to sacrifice one for the other?
Epic pie chart. Probably the best dashboard I’ve ever seen. #legendary 😉
Professor Elliott — beyond using a browser “share” plugin — unless I’m looking right past it, do you have a way to easily share your articles on G+ ? It’s odd because I clearly see the G+ icon below your various “share” buttons & “Related Posts”, but that just takes me to your Google+ profile and/or allows me to add you to additional circles, etc. No share button?
Cheers,
Jay
Jay, thanks for reminding me… I need to find a new “sharing” plugin (the “tweet this” one doesn’t include a Google Plus button), but I haven’t had time…