As I’ve worked on my upcoming book, Open Data Now – to be published by McGraw-Hill on January 10 – I’ve had to think through and explain how Open Data, Big Data, and Open Government are related to each other. Lately I’ve seen a number of others, like the authors of the new McKinsey Open Data report (see page 4), try to map the territory in similar ways. The Open Data community is producing a lot of Venn diagrams these days, with a lot of colorful overlapping circles. (Some also deal with the use of Personal Data, but that’s one circle too many for me.)
For my own contribution to the discussion, I’m proposing the model shown here. To all Open Data wonks: Please take a look, comment, and add your own ideas. We’re at a stage where we need to define more precisely what we’re talking about. This may help.
My starting point is the evolving understanding of these three areas. Big Data essentially describes very large datasets, but that’s a somewhat subjective judgment that depends on technology: today’s Big Data may not seem so big in a few years when data analysis and computing technology improve. Open Government is a combination of ideas: it includes collaborative strategies to engage citizens in government; government releasing data about its own operations, like federal spending data; and government releasing data that it collects on issues of public interest, such as health, environment, and different industries.
While others have produced thoughtful and comprehensive descriptions of Open Data, some refer largely to open government data, which I think of as a category of Open Data (though perhaps the most important one). I’ve described Open Data as accessible public data that people, companies, and organizations can use to launch new ventures, analyze patterns and trends, make data-driven decisions, and solve complex problems. All definitions of Open Data include two basic features: The data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open Data also should be relatively easy to use, although here there are gradations of “openness.” And there’s general agreement that Open Data should be free of charge or cost just a minimal amount.
Starting with those basic descriptions, the intersections of these three concepts define the six subtypes of data shown on the diagram. (There’s no separate category for the intersection of Big Data and Open Government – anything in that category is also Open Data.) Here are characteristic examples of each, keyed to the numbers on the diagram.
1. Big Data that’s not Open Data. A lot of Big Data falls in this category, including some Big Data that has great commercial value. All of the data that large retailers hold on customers’ buying habits, that hospitals hold about their patients, or that banks hold about their credit-card holders falls here. It’s information that the data-holders own and can use for commercial advantage. National security data, like the data collected by the NSA, is also in this category.
2. Open Government work that’s not Open Data. This is the part of Open Government that focuses purely on citizen engagement. For instance, the White House has started a petition website, called We the People, to open itself to citizen input. While the site makes its data available, publishing Open Data – beyond numbers of signatures – is not its main purpose.
3. Big, Open, Non-Governmental Data. Here we find scientific data-sharing and citizen science projects like Zooniverse. Big data from astronomical observations, from large biomedical projects like the Human Genome Project, or from other sources realizes its greatest value through an open, shared approach. While some of this research may be government-funded, it’s not “government data” because it’s not generally held, maintained, or analyzed by government agencies. This category also includes a very different kind of Open Data: the data from Twitter and other forms of social media that is available for analysis.
4. Open Government Data that’s not Big Data. Government data doesn’t have to be Big Data to be valuable. Modest amounts of data from states, cities, and the federal government can have a major impact when they’re released. This kind of data fuels the participatory budgeting movement, where cities around the world invite their residents to look at the city budget and help decide how to spend it. It’s also the fuel for apps that help people use city services like public buses or health clinics.
5. Open Data – not Big, not from Government. This includes the private-sector data that companies choose to share for their own purposes – for example, to satisfy their potential investors or to enhance their reputations. Environmental, social, and governance (ESG) metrics fall here. In addition, reputational data, such as data from consumer complaints, is highly relevant to business and falls in this category.
6. Big, Open, Government Data (the trifecta). These datasets may have the most impact of any category. Government agencies have the capacity and funds to gather very large amounts of data, and making those datasets open can have major economic benefits. National weather data and GPS data are the most often-cited examples. U.S. Census data, and data collected by the Securities and Exchange Commission and the Department of Health and Human Services, are others. With the new Open Data Policy, this category will likely become larger, more robust, and even more significant.
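For readers who think in code, the six categories above can be read as a simple decision rule over three yes/no questions: Is it Big? Is it Open Data? Is it part of Open Government? The Python sketch below is only an illustration of the diagram, not a formal definition. The function name `classify`, its three flags, and the `LABELS` table are hypothetical names chosen for this example, and the sketch assumes each dataset or activity can be cleanly tagged on all three questions.

```python
from typing import Optional

# Hypothetical labels, paraphrasing the six numbered categories above.
LABELS = {
    1: "Big Data that's not Open Data",
    2: "Open Government work that's not Open Data",
    3: "Big, Open, Non-Governmental Data",
    4: "Open Government Data that's not Big Data",
    5: "Open Data, not Big, not from Government",
    6: "Big, Open, Government Data (the trifecta)",
}


def classify(big: bool, open_data: bool, open_gov: bool) -> Optional[int]:
    """Return the category number (1-6) for a combination of the three flags.

    Returns None for combinations the diagram does not number:
    - none of the three circles applies, or
    - Big Data + Open Government without Open Data, which the model
      treats as empty (anything there is also Open Data).
    """
    if open_data:
        if big and open_gov:
            return 6
        if big:
            return 3
        if open_gov:
            return 4
        return 5
    # Not Open Data from here on.
    if big and open_gov:
        return None  # assumed empty, per the parenthetical note in the text
    if big:
        return 1
    if open_gov:
        return 2
    return None  # outside all three circles


if __name__ == "__main__":
    # Illustrative tagging of one example from the list above
    # (the tags are this sketch's reading of the text, not measured facts).
    weather = classify(big=True, open_data=True, open_gov=True)
    print(weather, LABELS[weather])
```

For example, `classify(big=True, open_data=True, open_gov=True)` returns 6, the trifecta, while a combination the diagram leaves empty returns `None`.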
The map is still evolving, but this is a start. Let’s get the discussion going. I look forward to hearing your thoughts.
- Joel Gurin, Founder and Editor, OpenDataNow.com