For centuries, we have only collected and crunched a sliver of information because of the cost and complexity of processing larger amounts. We relied on data of the cleanest, highest quality possible, since we only tapped a little of it. And we tried to uncover the reasons behind how the world worked, to generalize. Yet all this was actually a function of a small-data world, when we never had enough information. Change that, and a lot of other things need to change as well.
Think of a car engine. Breakdowns rarely happen all at once. Instead, one hears strange noises or the driving “feels funny” a few days in advance. Many vehicles are fitted with sensors that can measure the heat and vibration from the engine. By capturing these readings as data, one can know what a healthy engine’s “data signature” looks like, as well as how it changes prior to a breakdown. That way, one can identify when a part is about to fail before it actually breaks. The car can alert the driver to visit a service station to get it repaired, as if it were clairvoyant. But to do this, we need lots of data, need to accept messy data, and have to give up knowing why the engine is about to break in exchange for the practical knowledge that it is.
That’s big data. It ushers in three big shifts: more, messy and correlations (the book’s chapters 2, 3 and 4). First, more. We can finally harness a vast quantity of information, and in some cases, we can analyze all the data about a phenomenon. This lets us drill down into details we could never see before. Second, messy. When we harness more data, we can shed our preference for data that’s only of the best calibre, and let in some imperfections. The benefits of using more data outweigh those of using cleaner but less data. Third, correlations. Instead of trying to uncover causality, the reasons behind things, it is often sufficient to simply uncover practical answers. So if some combination of aspirin and orange juice puts a deadly disease into remission, it is less important to know what the biological mechanism is than to just drink the potion. For many things, with big data it is faster, cheaper and good enough to learn “what,” not “why.”
One reason we can do these things is that we have so much more data, and one reason for that is that we are taking more aspects of society and rendering them in data form (discussed in chapter 5). With so much data around, and the ability to process it, big data is the bedrock of new companies.
The value of data is in its secondary uses, not simply in the primary purpose for which it was initially collected, which is the way we tended to value it in the past (noted in chapter 6). Hence, a big delivery company can reuse data on who sends packages to whom to make economic forecasts. A travel site crunches billions of old flight-price records from airlines to predict whether a given airfare is a good one, or whether the price is likely to increase or decrease. These extraordinary data services require three things: the data, the skills, and a big data mindset (examined in chapter 7). Today, the skills are lacking and few have the mindset, even though the data seems abundant. But over time, the skills and creativity will become commonplace, and the most prized part will be the data itself.
Big data also has a dark side (chapter 8). Privacy is harder to protect because the traditional legal and technical mechanisms don’t work well with big data. And a new problem emerges: propensity, that is, penalizing people based on what they are predicted to do, not what they have done. At the same time, there will be an increasing need to stay vigilant so that we don’t fall victim to the “dictatorship of data,” in which we shut off our reasoned judgment and invest more trust in data-driven decisions than they deserve.
Solutions to these thorny problems (raised in chapter 9) include a fundamental rethink of privacy law and of the technology to protect personal information. They also include a new class of professional, the “algorithmist,” who will do for the big data age what accountants and auditors did for an era 100 years ago, when the cornucopia of information swamping society was in the form of financial data.
What role is left for humanity? For intuition, experience and acting in defiance of what the data suggests? Big data is set to change not only how we interact with the world, but also ourselves. Read the book, and please tell us what you think.
If you are interested in finding out more about the book, take a look at the Q&A with the authors.
What people are saying
“This brilliant book cuts through the hype surrounding big data. A must-read for anyone in business, information technology, public policy…. And anyone else who is just plain curious about the future.” —John Seely Brown, Former director of Xerox PARC
“Just as water is wet in a way that individual water molecules aren’t, big data can reveal information in a way that individual bits of data can’t. The authors show us the surprising ways that enormous, complex, and messy collections of data can be used to predict everything from shopping patterns to flu outbreaks.” —Clay Shirky, author of Cognitive Surplus and Here Comes Everybody
“An optimistic and practical look at the Big Data revolution — just the thing to get your head around the big changes already underway and the bigger changes to come.” —Cory Doctorow, boingboing.com
© Viktor Mayer-Schönberger & Kenneth Cukier