Yesterday and today I attended the Big Data Future conference at Ohio State University. Ohio State is now partnering with IBM to create a Big Data center here and is emerging as a major academic power in this field. This conference brought together experts from around the country and one from the UK to provide an excellent overview of current work, trends, and issues in the field. I unfortunately couldn’t stay for this afternoon’s presentations, but you can read more on the conference website and in an upcoming publication in I/S: A Journal of Law and Policy for the Information Society.
Big Data and Open Data are not the same, but they’re closely related (as I wrote about in my post on my keynote talk). Some of the trends and issues we’re seeing with Big Data are relevant to Open Data as well. In that vein, here are nine Things to Know about Big Data that I learned on my trip to Columbus. Bios of the speakers I cite are available on the conference website.
- Get ready for the Big Data backlash. A number of speakers have addressed the issue of “Big Data hype,” noting that Big Data has been discussed so enthusiastically that we may now enter a reactive cycle. Mike Nelson decried what he saw as an emerging “bad datatude” in the public, and even suggested that we should rename Big Data, which may have a Big Brother-like connotation for many people. His suggestion: BFFMUDD, for Big, Fat, Fast, Messy, Unstructured, Distributed Data.
- Beware of “big data hubris.” Several speakers quoted a new report showing that Google Flu Trends – one of the first big examples of Big Data’s predictive value – has turned out to be highly inaccurate. Apparently Google may have outsmarted itself by tweaking its algorithms in the wrong way. Whatever went wrong, it’s an example that shows that hoping to find truth by crunching data without looking at a broader picture is not likely to work.
- Data can’t replace judgment. Data, especially Big Data, is a tool that can aid human decision-making, but it can’t replace it. Ray Harishankar put it this way: “Data plus analysis is information. Information plus context gives you insight. And insight has to lead to the right actions that lead to outcomes that add value.”
- Correlation doesn’t trump theory. Some Big Data advocates have suggested that Big Data almost makes theory unnecessary: With enough data, they say, we can find patterns and trends that are important and informative even if there’s no theory as to why. It’s true that simple correlations can fuel accurate predictions, up to a point. But even the ability to do predictive analytics doesn’t mean you really understand how the systems you’re studying are working. Eytan Adar suggests that we look at Big Data efforts on a spectrum from predictive to explanatory, and pay more attention to the need to understand what we’re seeing, not just see patterns that may predict future patterns.
- Big Data is tracking a mobile society – with a risk. Around the world, mobile devices are the primary online human connectors. Farnam Jahanian points out that there will be twice as many mobile devices as people in the world by 2015, all capable of sending location and other data to companies that can collect it. This will be one of the major sources of Big Data about society in the future. But Kate Crawford notes the privacy risk here: Since human mobility patterns are unique, you can identify an individual with only three or four cellphone-generated data points.
- Big Data can help cities – or damage urban democracy. The ability to track city activity through cellphone data, remote environmental sensors, laser-generated aerial maps, and more, can give us “hypercoordinated cities” with a “higher urban metabolism,” in the words of Harvey Miller. (I unfortunately had to leave before Michael Batty gave a featured talk on urban analytics, but he is making it available on his website.) Again, though, Kate Crawford noted the cautions here. If we’re not careful, urban data-gathering can disproportionately help the well-to-do and hurt the poor. Boston’s StreetBump app, for example, collects data on potholes by tracking vibrations in volunteers’ smartphones that can reflect a bumpy roadway. But most smartphone owners are well off, so potholes were initially detected and fixed in wealthier areas – a problem that StreetBump is now moving to fix. On the flip side, “predictive policing” is being used to focus police surveillance on areas that are predicted to have high crime rates, which can lead to discriminatory law enforcement.
- Privacy still matters. Forget about those reports claiming that the public, especially young people, have given up on privacy. We still care about privacy; we just don’t know what to do about it. There are two concerns: We want to know what government agencies or data-tracking companies are finding out about us, and, if we don’t like it, we want them to stop. It’s still not clear how to address these concerns. Some speakers at the conference suggested that the solution is simply to have the government and companies become more transparent about what they’re collecting, an approach some called “mutually assured disclosure.” But Gary Bass, a long-time transparency advocate, said that that proposed solution is “not the real world. For 30 years I have fought tooth and nail to make data available, and the government and companies have fought tooth and nail not to. . . . This is a fight that goes on all the time.” The risk, as others said, is that we may strengthen the growing “power asymmetries” between the data collectors and those whose data they collect.
- Big Data should be beautiful. The rapid advances in data visualization are producing some beautiful results. Take a look, for example, at this video of “experiments in bicycle flow” that gradually resolves data on bike traffic in London to show the clearest patterns, or some of the visualization work being done at the Ohio Supercomputer Center. Data visualizations like these aren’t just about esthetics, they’re about understanding. Angela Shen-Hsieh, an expert in data visualization at IBM, talks about the need to make data “more suitable for human consumption,” and to focus on “the last 18 inches” of the information journey from the computer screen to your brain.
- Big Data will (probably) generate big value. Despite all the caveats, there is a lot of social and economic value to be found in Big Data. McKinsey’s landmark report on Big Data a few years ago predicted that it would generate trillions of dollars in economic value. Angela Byers of Microsoft, who co-authored that study, said today that it might still take five or ten years to generate that value – partly because we still face a major skills gap between the amount of data that’s available and the number of people who know how to work with it. But the economic value is likely to come, and in some surprising ways. Johan Bollen and his colleagues use sentiment analysis on Big Data from Twitter to predict the stock market: their computation of the “calmness” of sentiment on Twitter predicts the Dow’s closing level three days later.
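For readers curious what “predicts the Dow three days later” means mechanically, here is a minimal Python sketch of a lagged correlation: it compares one day’s signal with the market’s value a fixed number of days afterward. The numbers are invented toy values, and the names `calm`, `market`, and `lagged_correlation` are my own for illustration; this is not Bollen’s data or method. The toy market series is deliberately built to follow the toy signal three days later, so the three-day lag should come out as a near-perfect match.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(signal, target, lag):
    """Correlate signal on day t with target on day t + lag."""
    if len(signal) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(signal) - 1:
        raise ValueError("lag must leave at least two overlapping days")
    n = len(signal) - lag
    return pearson(signal[:n], target[lag:lag + n])


# Hypothetical daily "calmness" scores (toy values, not real Twitter data).
calm = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# Hypothetical market index, constructed so that each day's value depends
# on the calmness score from three days earlier. The first three days have
# no earlier score to draw on, so they are set to a flat 100.
market = [100, 100, 100] + [100 + 2 * c for c in calm[:-3]]

same_day = lagged_correlation(calm, market, 0)
three_day = lagged_correlation(calm, market, 3)

print(f"same-day correlation:  {same_day:.2f}")
print(f"three-day correlation: {three_day:.2f}")
```

A strong lagged correlation like this shows only that one series tends to move ahead of the other. As the point about correlation and theory above suggests, it doesn’t explain why, and it can vanish when conditions change.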
Joel Gurin, Founder and Editor, OpenDataNow.com