Open data benefits many, but cost breakdown unclear - Investigative Reporters & Editors

Editor's Note:

This article first ran on July 20, 2017 on the Investigative Reporting Workshop's website.

By Clairissa Baker and Yang Sun, Investigative Reporting Workshop

A new citywide data policy in Washington, D.C., shows there is no simple way for cities to clearly budget open data initiatives.

Meanwhile, as the city works this summer to implement its newly formed data policy and decide what’s releasable, experts say when more data sets are made available online the result will be better access to information, better journalism and more government transparency.

For investigative journalists, the upcoming data release — expected within nine months after the issuance of the order and adding to the 900 sets already available online — means greater access to information on everything from traffic patterns to invaluable health statistics.

“Just knowing the government has that data is a huge step,” said Kate Rabinowitz, founder of the DataLensDC blog that works to help citizens better understand and access the city’s open data.

But Washington officials can’t seem to determine how much the massive data release will cost taxpayers. The work being done now at the Office of the Chief Technology Officer is in response to Mayor Muriel Bowser’s April 27 executive order, which calls for an inventory and classification of all government data.

Open data matters differently to each citizen, but it can matter a lot.

One example is Google Maps. Many use this tool every day, and it is based on open transit data from city governments, allowing people to determine best routes and commute times.

Rabinowitz also acts as one of the co-captains of Code for DC, a civic hacker community that translates data sets into content usable by average citizens.

One project Code for DC works on is called Housing Insights. The city has a massive amount of data on affordable housing, “but they are all over the place and it’s kind of messy,” Rabinowitz said.

A team of coders and data scientists collected all the relevant data sets and created interactive visualizations, allowing policy makers and the public to understand what affordable housing looks like in the District and what the challenges and opportunities are.

When civic data is shared, people will become better informed about city services, journalists will tell better stories about the city and institutes will advance their research, said Stephen Larrick, the open cities director for the Sunlight Foundation.

Quantifying the benefits of open data, however, can be as hard as measuring the costs.

“How can you put a price on an informed public,” Larrick said. “And how can you put a price on people having the facts that they are relevant to the decisions that are being made?”

It’s not the first time the city has made data public, but this new policy is an important step in making the government more transparent and accountable, experts say.

A decade after the debut of the city’s online data portal, there are more than 900 data sets available online on a range of subjects from information about 311 calls to crime reports to others on health care and government spending.

As city leaders work to implement the policy, there’s no clear cost associated with the rollout.

“It can be a real irony,” said Larrick. “Many open data programs are about being transparent about things like cost,” yet the cost of the program is obscure.

A number of aspects contribute to the inability to quantify costs of open data.

“Open data is a new thing, and very often it is the thing that is a part of someone’s job, but it’s not someone’s full-time job,” Larrick said.

Employees might work on data infrastructure or web services, among other assignments. Some of these tasks fall into the costs of open data, but unless employees track exactly how many hours they spend working on open data, it is not clear how much.

Not knowing how much open data programs cost is often a barrier to implementing policies.

The city is not at fault for lacking a concrete budget, Larrick said, but “the government should do a better job” of examining and listing these costs.

An analysis of the 2017 budget and the 2018 proposed budget for the Office of the Chief Technology Officer shows multiple line items related to open data, including “data transparency and accountability” and “data governance and analytics.” Even in those line items, however, it’s unclear what is related to the mayor’s new initiative.

Each agency is responsible for finding and categorizing its own data, so the costs are spread out and vary widely depending on the city, a Sunlight Foundation survey found.

One of the biggest differences between cities is whether they use a contractor — two include Socrata and Junar — to host the websites that contain this data.

Washington creates everything in-house; the city pays its staff to create and maintain a website to host the city’s data.

In January 2016, Bowser announced the Open Data Initiative and created the Chief Data Officer position.

About seven months later, Barney Krucoff filled the position, leading a team of more than two dozen people who will reach out to each agency and coordinate data collection.

Before Bowser’s 2017 executive order, technical teams were already in place. Afterward, those teams were reorganized with staff from Business Intelligence, Geographic Information System and the Citywide Data Warehouse.

“D.C. was the leader of open data in general, but we didn’t have a data policy,” Krucoff said.

He said the District posted its data sets in the early 2000s and built a website hosting data sets published by the government in 2007, both among the earliest in the country. The city even led a hackathon, called Apps for Democracy, in 2008, which Krucoff described as “new and novel of the time.”

The US City Open Data Census, conducted jointly by the Sunlight Foundation, Code for America and Open Knowledge International, ranked D.C.’s data openness 27th out of 100 cities in 2015, while San Francisco, Las Vegas and New York City took the top three positions.

Washington leaders hope to get back to the cutting edge of government transparency with the new data policy. Krucoff said what separates this administrative action from others is that it’s a data policy rather than an open data policy.

The government will not only log and categorize all of the data, but also create a system in which enterprise data is “freely shared among district agencies, with federal and regional governments,” and with the public when the information allows it, according to the data policy.

All city agencies’ data sets will be classified on a scale from zero to four, where level zero data sets have no confidentiality concern and should be completely disclosed to the public.

Significant steps will be taken to ensure the safety of information with privacy and security concerns. The policy includes a host of security protocols for agencies to follow while handling sensitive data sets.

Feedback from the public, such as transparency advocates and civic hackers, helped shape the District’s final version. The drafting team also looked at New York City, San Francisco and the state of Maryland, Krucoff said.

The government will proactively publish a whole class of non-confidential information. This will complement, but not replace, the Freedom of Information Act.

FOIA legally requires the government to respond to individual requesters and covers items ranging from hard-copy documents to videos.

“I think FOIA will always cover a wider set of material, and open data will cover a more specific set of what we can be proactive,” Krucoff said.

However, there is a gap between the technical language of open data and its accessibility to citizens without a computer background.

To bridge the gap, data intermediaries, such as researchers and developers, play an important role. They use the data to make recommendations and tools that the public can understand and use.

Journalists use this resource to find information about their communities. Having information available online makes journalists’ jobs a little easier because the government can post frequently requested data online instead of responding to the same requests every day or every month.

“That transparency leads to, I think, a better relationship between government agencies and the public,” said Charles Minshew, data services director at Investigative Reporters & Editors.

Opening up data is beneficial to both the government and the public, Minshew said, and “it is a true public service.”

Besides the promise of a citywide data inventory, the city also redesigned the online data portal by incorporating more functionality, including visualizations, search functions and interactive tools.

Michael Kalish, Rabinowitz’s counterpart at Code for DC, appreciates the city’s efforts in increasing the website’s usability.

“So I think they’ve very quickly went from something that was not very user friendly to a very approachably useful tool to the community,” Kalish said.

More needs to be done, however, to make data truly open.

Rabinowitz encounters data inconsistencies and missing records while working with the city’s open data.

To improve open data quality, Larrick’s primary suggestion for cities is to reach out to communities and listen to their needs.

“It really makes the benefits of open data a lot more tangible,” Larrick said.

Krucoff hopes that, going forward, the data policy will empower analysts at each agency to explore the value of data and develop a community in which agencies have a common set of tools and data-minded individuals.

“We generally believe that data is sort of an important asset to the city that we’ve never really known,” Krucoff said.