It’s official: An Open Data company has won an award that gives it bragging rights as the hottest startup of the moment. Last month, Enigma.io was chosen out of thirty companies for the top award at TechCrunch Disrupt NY. It’s not only great news for this New York-based company, but a sign of the growing importance of Open Data.
Enigma.io takes its name from the Enigma cipher machine, whose codes computer pioneer Alan Turing helped break during World War II, and from the founders’ belief that too much public data was more enigmatic than it should be. Enigma.io would demystify it. “The idea,” says CEO Jeremy Bronfman, “was to create something like the Internet for public data that really made it easily accessible and digestible.” The vision has proved compelling. When I met with Bronfman in Manhattan three weeks after their victory, he had been fielding calls from a number of companies that wanted to work with Enigma.io, as well as several government agencies that thought the startup could help them get their data into shape.
The company’s strategy, Bronfman explained to me, was to take valuable public data, beginning with data that was “relevant to corporations or economic activity,” and make it much more usable. That posed two challenges. The first was accessibility – the problem you have when you know what you’re looking for but can’t get to it in a usable form. The second was what’s being called data discovery – the ability to search on a topic you’re interested in and discover data that you didn’t know you needed.
“Let’s say you’re an analyst who’s following McDonald’s,” Bronfman says. Every time a new McDonald’s opens, it has to file for an FCC license to operate the drive-through window. That means you can track new franchises by looking at FCC license data. However, you might very well not know that.
FCC data can help track McDonald’s growth.
“We’ve all been trained by Google, at least in my generation,” says Bronfman. “So the approach you take is, ‘I want to learn everything about McDonald’s,’ not, ‘I want to look for FCC licenses.’” It’s Enigma’s job to connect you to the data you need, and to help you discover that the FCC has data that’s relevant to the fast-food business. “We want people to be able to interact with data just by searching based on the topic they’re interested in rather than having to figure out where that information lives,” says Bronfman.
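To make the McDonald’s example concrete, here is a minimal Python sketch of how an analyst might count new license grants per month as a rough proxy for new openings. The records, field names, and the `new_locations_by_month` helper are all invented for illustration; they are not Enigma’s schema or the FCC’s actual data format.

```python
from collections import Counter
from datetime import date

# Hypothetical license records, invented for illustration only.
LICENSES = [
    {"licensee": "McDonald's", "granted": date(2013, 1, 14), "city": "Austin"},
    {"licensee": "McDonald's", "granted": date(2013, 1, 30), "city": "Dallas"},
    {"licensee": "Acme Taxi", "granted": date(2013, 2, 2), "city": "Albany"},
    {"licensee": "McDonald's", "granted": date(2013, 3, 9), "city": "Houston"},
]


def new_locations_by_month(records, licensee):
    """Count license grants per month (YYYY-MM) for one licensee."""
    counts = Counter()
    for record in records:
        # Case-insensitive exact match on the licensee name (an assumption;
        # real filings would need fuzzier name matching).
        if record["licensee"].lower() == licensee.lower():
            counts[record["granted"].strftime("%Y-%m")] += 1
    # Return months in chronological order.
    return dict(sorted(counts.items()))


monthly = new_locations_by_month(LICENSES, "McDonald's")
for month, count in monthly.items():
    print(month, count)
```

The counting itself is trivial. What Bronfman describes is the harder part: knowing that the license table is the place to look at all.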
To build Enigma, the team – now a little over a dozen people, all based in New York – has had to figure out how to take data in all kinds of formats, from data mailed to them on CDs to data scraped from company websites, and put it into an easily usable form. And they’ve had to make it possible to search through the entire data set – not just the metadata used to describe it – at a rapid rate. Maintaining the speed of search gets harder as Enigma.io adds more data; the company already has a hundred thousand data sets (essentially, a hundred thousand large data tables) and more than twenty billion individual data points. On the other hand, figuring out how to incorporate new data sets gets easier over time, because there are only so many different data formats to solve.
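The difference between searching metadata and searching the data itself can be shown with a toy inverted index. This is only a sketch under stated assumptions: the tables, the `build_index` and `search` helpers, and the tokenizer are hypothetical, and nothing here reflects how Enigma actually implements search at its scale.

```python
import re
from collections import defaultdict

# Hypothetical toy tables; names, descriptions, and rows are invented.
TABLES = {
    "fcc_licenses": {
        "description": "Radio licenses issued by the FCC",
        "rows": [
            {"licensee": "McDonald's Corp", "city": "Austin", "state": "TX"},
            {"licensee": "Acme Taxi", "city": "Albany", "state": "NY"},
        ],
    },
    "aircraft_registry": {
        "description": "Registered aircraft and their owners",
        "rows": [{"owner": "Acme Air LLC", "tail_number": "N12345"}],
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(tables, include_rows):
    """Map each token to the set of table names in which it appears."""
    index = defaultdict(set)
    for name, table in tables.items():
        texts = [name, table["description"]]  # metadata only
        if include_rows:
            # Also index every cell value in every row.
            for row in table["rows"]:
                texts.extend(str(value) for value in row.values())
        for text in texts:
            for token in tokenize(text):
                index[token].add(name)
    return index


def search(index, query):
    """Return the tables that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


metadata_index = build_index(TABLES, include_rows=False)
full_index = build_index(TABLES, include_rows=True)
print(search(metadata_index, "McDonald's"))  # no table mentions it in metadata
print(search(full_index, "McDonald's"))      # found inside the license rows
```

In this toy, the metadata-only index never surfaces the license table for a McDonald’s query, because the company name appears only in the rows. Indexing every cell fixes that, but the index then grows with the data, which is one way to read the speed problem described above.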
Today, Enigma charges for access to its data on a sliding scale – more for hedge funds, less for academics, nonprofits, or government agencies. Eventually, though, the company wants to make more of its data and search capability available for free. (If they were to do it today, the traffic would crash their system.) In this model, Enigma’s revenue would come less from access to data and more from analytic or other premium services.
Aircraft registry data predicted Sarah Palin’s nomination.
While the value of many Enigma data sets is apparent, some have a more hidden meaning. I was curious about why the company makes aircraft registry data available. “It was part of the inspiration for Hicham and Marc,” says Bronfman. “They read an article by someone who figured out that Sarah Palin was going to be the 2008 vice presidential nominee a few days before it was announced by using this data set. He took a list of aircraft and went to each of the Web portals of all the top Republican donors, then figured out which companies they were involved with, looked at the planes that they owned, and saw that all these planes that never flew above Chicago suddenly were flying to Wasilla.”
“Another database that is getting an enormous amount of use, though we only have it for a few states now, is sales tax receipt data,” says Bronfman. “Every time a retailer makes a transaction in a state that has sales taxes, there is a filing associated with that. Each state has a different level of granularity with which they report it and a different update schedule. But for example, we can tell you what every restaurant or hotel in the state of Texas’ revenues were for the month of March, location by location. That’s an enormously powerful indicator both macroeconomically as well as microeconomically.”
Even at this relatively early stage, Enigma.io has developed a robust data resource and an impressive set of tools. Bronfman described and showed me a few things you’ll be able to do with Enigma:
- Mash up your own proprietary data (for example, a customer list) with Enigma’s public data behind a secure firewall
- Create alerts for events of interest that are reflected in new data
- Do geographically based searches using data that Enigma will geocode
- Take a company – say, Boeing – and see which government agencies it contracts with, to gauge how it may be affected by cuts in government spending
- Link news articles to Enigma data – enabling you to click on a company mentioned in the Wall Street Journal and read Enigma data about its lobbying, campaign contributions, or other activities
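As a rough illustration of the first and fourth items in the list above, the following Python sketch joins a private customer list against a public contracts table to total each customer’s contract dollars by agency. Everything in it is hypothetical: the customer list, the contract rows and amounts, the field names, and the `agency_exposure` helper are invented, and the sketch says nothing about how Enigma performs such joins or secures the private data.

```python
from collections import defaultdict

# Hypothetical private customer list (the "proprietary" side of the mash-up).
CUSTOMERS = ["Boeing", "Acme Widgets"]

# Hypothetical public contract records; vendors and amounts are invented.
CONTRACTS = [
    {"vendor": "Boeing", "agency": "Department of Defense", "amount": 1_000_000},
    {"vendor": "Boeing", "agency": "NASA", "amount": 250_000},
    {"vendor": "Other Co", "agency": "NASA", "amount": 50_000},
]


def agency_exposure(customers, contracts):
    """Total contract dollars per agency for each customer with contracts."""
    wanted = {name.lower() for name in customers}
    totals = defaultdict(lambda: defaultdict(int))
    for row in contracts:
        # Simple case-insensitive name match; real data would need
        # entity resolution (subsidiaries, spelling variants, and so on).
        if row["vendor"].lower() in wanted:
            totals[row["vendor"]][row["agency"]] += row["amount"]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {vendor: dict(agencies) for vendor, agencies in totals.items()}


exposure = agency_exposure(CUSTOMERS, CONTRACTS)
for vendor, agencies in exposure.items():
    print(vendor, agencies)
```

Customers with no matching contracts simply do not appear in the result, so an analyst can see at a glance which accounts depend on government spending.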
Ultimately, says Bronfman, just as there is one Internet, “we think there is going to be sort of one central ecosystem around which all this stuff happens. We want that to be us. There are a lot of incredible businesses that can be built on top of this data in powerful ways. There’s amazing innovation that can be built when you provide the right ecosystem and tools.”
- Joel Gurin, Founder and Editor, OpenDataNow.com