The Obama Administration has long recognized the importance and power of Open Data. As President Obama said recently, Open Data “is going to help launch more start-ups. It’s going to help launch more businesses. . . . It’s going to help more entrepreneurs come up with products and services that we haven’t even imagined yet.”
On May 9, the Obama Administration issued an Executive Order and Open Data Policy that will revolutionize how we access and use Open Data from the federal government. The Open Data Policy will require federal agencies to:
- Use open licenses – that is, make data public in a way that “there are no restrictions on copying, publishing, distributing, transmitting, adapting, or otherwise using the information for non-commercial or commercial purposes.”
- Use standard kinds of metadata – “data about data” – to tell users where each dataset comes from, when it was collected, what its quality is, and other essential facts.
- Support interoperability (making it possible to analyze one dataset together with another) and information accessibility (making the data usable in the first place).
- Over time, build an inventory of all their datasets and publish a list of all the ones that are or could be open to the public.
- Protect privacy and confidentiality, and keep data secure.
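The metadata and inventory requirements can be pictured with a small sketch. It assumes a hypothetical agency inventory kept as a list of records. Field names loosely follow the common core metadata published on Project Open Data, but this is not the schema's authoritative field list, and the `released` flag and all values are invented. The sketch checks each record for a set of required fields, then writes every dataset marked public (whether already released or not yet released) to a JSON listing.

```python
import json

# Hypothetical internal inventory. Field names loosely follow the Project Open Data
# common core metadata; "released" is an invented flag and all values are made up.
inventory = [
    {"title": "Clinic Locations", "description": "Addresses of funded clinics.",
     "keyword": ["health"], "modified": "2013-05-01", "publisher": "Example Agency",
     "identifier": "ex-001", "accessLevel": "public", "released": True},
    {"title": "Clinic Wait Times", "description": "Monthly average wait times.",
     "keyword": ["health"], "modified": "2013-04-15", "publisher": "Example Agency",
     "identifier": "ex-002", "accessLevel": "public", "released": False},
    {"title": "Patient Case Files", "description": "Individual case records.",
     "keyword": ["health"], "modified": "2013-03-30", "publisher": "Example Agency",
     "identifier": "ex-003", "accessLevel": "non-public", "released": False},
]

# Illustrative set of fields every record must carry before it can be listed.
REQUIRED = ("title", "description", "keyword", "modified",
            "publisher", "identifier", "accessLevel")

def missing_fields(record):
    """List the required metadata fields a record is missing."""
    return [field for field in REQUIRED if field not in record]

def public_listing(records):
    """Serialize every complete record that is, or could be, public as a JSON list."""
    listed = [r for r in records
              if r.get("accessLevel") == "public" and not missing_fields(r)]
    return json.dumps(listed, indent=2)

print(public_listing(inventory))
```

Here the listing includes the not-yet-released "Clinic Wait Times" dataset alongside the released one, which is what makes the public conversation about release priorities possible.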
In addition, the policy sets out seven key criteria for Open Data. They echo some of the principles developed at a 2007 meeting convened by Tim O’Reilly, Carl Malamud, and others and funded by the Sunlight Foundation. O’Reilly and Malamud were among the many advocates who praised the new Policy. Malamud called it “the most significant advance in information policies by the federal government since the passage of the Freedom of Information Act” almost fifty years ago; O’Reilly tweeted that building Open Data in this century is like building the federal highway system in the last one.
A week after the announcement I talked with Nick Sinai, U.S. Deputy Chief Technology Officer at the White House Office of Science and Technology Policy. As part of the team that developed the Open Data Policy, he was pleased with the way it was received and shared his perspective. Here’s our conversation:
Gurin: The announcement of the Open Data Policy and the Executive Order last week was really historic. Tell me what some of the early reaction has been and what you all see as some of the implications.
Sinai: Thank you. It really was a big day for open data, with some landmark steps that the President and the Administration took to make data more available and more accessible to the general public, to researchers, to innovators, and to entrepreneurs. The response across the innovation community, the open government community, and leaders within government has been very positive. We certainly got a great response from folks who have been a bit critical of the Administration when it comes to open government and open data, so it’s great to see their enthusiasm about the policies. The innovation community has embraced our steps as well.
Gurin: What really impressed people, I think, was the combination of the breadth and the specificity of the Policy. There’s a lot of clear direction and clear commitment in ways that I think are going to make a very big difference. What do you think are the aspects that are likely to have the biggest impact most quickly, and what do you see as some of the long-term hopes for where this could all go?
Sinai: Let me start by saying that we wanted to make sure we operationalized this. This is not just a check-the-box policy exercise. It’s something that should really become part of the core DNA of agencies, where they see open data as valuable to achieving their mission objectives. We also wanted to enable and empower agencies to make open data the new default, as required by the Policy. So we did a couple of things. For the first time in US history, we put our policy memo on GitHub, allowing for feedback there. And we also published a series of free open source tools from the office of the CIO and the CTO on Project Open Data that agencies can use to implement open data practices.
These are quite simple, but that’s exactly the point. One tool, for example, takes a simple spreadsheet and turns it into an application programming interface (API), something that previously may have required more resources or technical capability. It’s a free set of tools that not only the federal government can use, but also state, local, and foreign governments can use, improve, and evolve at the pace of technology.
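Project Open Data’s own tools are not reproduced here. As a rough illustration of the spreadsheet-to-API idea, the following standard-library Python sketch loads a CSV export of a hypothetical spreadsheet and serves its rows as JSON through a minimal WSGI app, filtered by query-string parameters. The column names, sample data, port, and exact-match filtering rule are all assumptions.

```python
import csv
import io
import json
from urllib.parse import parse_qs

# Hypothetical spreadsheet exported as CSV; columns and values are invented.
CSV_TEXT = """state,year,clinics
OH,2012,140
OH,2013,152
TX,2013,310
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per spreadsheet row."""
    return list(csv.DictReader(io.StringIO(text)))

ROWS = load_rows(CSV_TEXT)

def app(environ, start_response):
    """Minimal WSGI app: GET /?state=OH returns the matching rows as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Keep rows where every query parameter exactly matches that column's value;
    # with no parameters, every row is returned.
    matches = [
        row for row in ROWS
        if all(row.get(key) in values for key, values in params.items())
    ]
    body = json.dumps(matches).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

def serve(port=8000):
    """Serve the API locally until interrupted."""
    from wsgiref.simple_server import make_server
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `http://localhost:8000/?state=OH` should return the two Ohio rows as JSON. All values come back as strings because CSV carries no types; a real service would also add type conversion, pagination, and error handling, which this sketch omits.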
One of the other really exciting things about the policy is that it requires agencies, within a defined period of time, to inventory their data assets internally and to publish externally a list or catalog of the data assets that are already public and those that can be made public. That opens an exciting dialogue between agencies and the public and other interested parties about which datasets to prioritize for release: ones that could be made public but haven’t been yet because of limited time and resources. So we hope that helps solve some of the asymmetric information challenges in operationalizing open data.
Gurin: It’s a very clear toolkit, and it communicates that there’s a seriousness of purpose here to make this happen. One of the things that’s so important about publishing that inventory, which a lot of people have been wanting, is that it really does set up an opportunity for dialogue. I noticed that another part of the policy specifically says to agencies that you expect them to set up ways of getting feedback from stakeholders and having a dialogue with people who are going to be using the data. That seems like a very positive new step.
Sinai: We think so. And we think there are great in-person mechanisms and great online and hybrid mechanisms where a variety of stakeholders can get engaged, whether they’re entrepreneurs who are building new products and services, or transparency or good government groups who are focused on accountability or identifying government inefficiencies.
As you know, under Todd Park’s leadership, we’ve been convening stakeholders in workshops, these data jams, and in larger celebration events, these Datapaloozas, to celebrate open data from the federal government. But we need to build on that and institutionalize that for these different stakeholder communities. The Executive Order and policy memo and set of tools are our first steps towards that.
Gurin: I’ve signed up to go to the Health Datapalooza in June and I’m looking forward to that. There have been several of those now. Could you talk about the current experience and plans for those large gatherings and what you’ve seen come out of those so far?
Sinai: In health, energy, education, public safety, global development, and finance – the six flagship areas for our open data initiatives – in each of them we have had these convenings, or summits, or Datapaloozas to bring together companies, organizations, and civil society. They celebrate and highlight how innovators are using open data to positively impact the public and address important national challenges, and often as a byproduct are creating great companies and adding jobs.
So in June, a number of federal agencies are partnering with some nonprofits to host the fourth annual Health Datapalooza. I think you’ll have something like 1,800 entrepreneurs, innovators, clinicians, patient advocates, and others for a variety of presentation sessions, speeches, and so forth. Last year so many companies using open government data to transform health care in interesting ways entered the Health Datapalooza that they had to hold an American Idol-style bake-off to choose the ones that would make the main stage with Secretary Sebelius. This year they’re holding a Codeathon, which will be really exciting for people who want to roll up their sleeves and get into it. And in the other areas (energy, education, public safety, global development, and finance) we expect additional Datapaloozas and convenings to become regular occurrences that celebrate innovation.
One of the benefits of these events is they make the upside and the benefits of open data very concrete. When you’re talking about making government information open and public and machine-readable, if you’re not careful it can be perceived in terms of cost and risk. We have to be very careful to protect privacy and national security and sensitive information.
While the questions of cost and risk are really important, the benefits can often seem a little abstract. So these convenings help demonstrate the very concrete products and services that are helping Americans in tangible ways, saving lives, improving health care, helping immeasurably, and that shifts the cost-benefit equation.
Gurin: For somebody who’s just getting into the area of open data, what are the best ways to track what upcoming events are, what government data releases are, and so on?
Sinai: One of the easiest things to do is go to Data.gov, the powerful central hub for open government data. It’s going to launch new services in the coming months that include improved visualization, navigation, mapping tools, better context to understand the datasets, as well as more developer-focused tools like application programming interfaces. Data.gov is designed as a central place for people to start. In addition, you can follow the OSTP blog and engage with your specific area of interest. A number of the open data initiatives have specific sites and hashtags, for example around health data. So there is a Healthdata.gov, for example, that I encourage people to check out.
Gurin: You alluded to the questions that have come up about personal data and privacy. There’s been a lot of discussion about the pros and cons of making data available that might have some personal or private aspect to it. Could you talk about how that’s being seen now and what some of the controls are to be sure that’s being done in the right way?
Sinai: The OMB and OSTP memo that came out with the Executive Order is pretty explicit about the need to protect private information, sensitive information, and national security information, and it is pretty explicit about the mosaic effect: the notion that if we’re not careful, people can be reidentified even if personally identifiable information (PII) is stripped out of the data. So there’s a set of guidelines and requirements around the checks that agencies have to go through when they’re taking that kind of data and delivering it to the public: how to not only strip the PII out of it but also go through the various mosaic effect checks. The Department of Health and Human Services and the Census Bureau do a very good job of thinking about this.
There’s a variety of different techniques, whether it’s masking, or even introducing slight error into certain records, or other statistical techniques to make sure that folks can’t be reidentified. We’re confident that agencies will continue to exercise that kind of caution. It’s an absolute requirement that agencies work fully to protect privacy and we are happy to work with other stakeholders to make sure we do this appropriately.
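Sinai names masking and deliberately introduced error without detailing how agencies apply them. The Python sketch below is a generic illustration under invented assumptions, not HHS or Census Bureau practice. It drops a direct identifier, coarsens ZIP code and age (masking), and perturbs a count by a small random amount (noise). It then runs one common mosaic-effect check, k-anonymity, chosen here as an example rather than something the policy names. The field names, noise scale, and quasi-identifier set are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical source records; every field and value is invented for illustration.
records = [
    {"name": "A. Smith", "zip": "43210", "age": 37, "visits": 4},
    {"name": "B. Jones", "zip": "43215", "age": 62, "visits": 9},
]

def mask(record):
    """Drop the direct identifier (name) and coarsen the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        "zip3": record["zip"][:3] + "**",        # keep only the 3-digit ZIP prefix
        "age_band": f"{decade}-{decade + 9}",    # replace exact age with a 10-year band
        "visits": record["visits"],
    }

def add_noise(value, rng, scale=1):
    """Shift a count by a random integer in [-scale, scale], never below zero."""
    return max(0, value + rng.randint(-scale, scale))

def release(rows, seed=0):
    """Mask each row, then perturb its count."""
    # A fixed seed keeps this example reproducible; a real release would not publish one.
    rng = random.Random(seed)
    out = []
    for row in rows:
        masked = mask(row)
        masked["visits"] = add_noise(masked["visits"], rng)
        out.append(masked)
    return out

def min_group_size(rows, keys=("zip3", "age_band")):
    """k in k-anonymity: size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values())

released = release(records)
for row in released:
    print(row)
print("k =", min_group_size(released))
```

With only two sample records, each released row is unique on its coarsened ZIP and age band, so the check reports k = 1, a sign that the data would need further generalization before release. Masking and noise on their own do not guarantee that records cannot be reidentified.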
Gurin: Any final thoughts?
Sinai: It’s been exciting to be part of the process here, and to be working with such a great team of folks under CTO Todd Park. I think the Open Data Policy is a really historic set of steps that we will look back and see as an inflection point – both around enhancing government efficiency as well as fueling economic growth.
Gurin: Thanks for your time, and congratulations again.
- Joel Gurin, Founder and Editor, OpenDataNow.com