By Charles Minshew, Director of Data Services
Today, the White House released the salaries of Executive Office staff members as a PDF.
The NICAR Data Library has converted this file to a CSV for download, publication and sharing. You can access the file in our GitHub repository.
The database includes salaries for 377 Executive Office employees. Mark S. House, a senior policy advisor, has the highest pay: $187,100. Three employees have no salary: Jared Kushner, Ivanka Trump and Reed Cordish.
Charles Minshew, IRE Director of Data Services
Inspired by our members, IRE is pleased to announce the first release of raw, unprocessed data from the NICAR Database Library.
The contents of the FBI’s Uniform Crime Report (UCR) master file for 2015 are now available for free download on our website. The package contains the original fixed-width files, data dictionaries for the tables as well as the FBI’s UCR user guide. We are planning subsequent releases of other raw data that is not readily available online.
Last week, BuzzFeed released 40 years' worth of federal salary data that it received via the Freedom of Information Act. Its work inspired us to do the same with data held by the NICAR Database Library. The UCR data we’re releasing today is otherwise available only via public records request.
The yearly data from the FBI details arrest and offense numbers for police agencies across the United States. If you download this unprocessed data, expect to do some work to get it into a usable format. The data is fixed-width and spread across multiple tables, and many rows pack several records together. Those records need to be unpacked, and in some cases decoded, before the data can be cleaned and imported into programs like Excel or your favorite database manager. Not up to the task? We do all of this work in the version of the data that we will soon have for sale in the Database Library.
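For readers who want to try the unpacking themselves, here is a minimal Python sketch of the general approach: slice each line by column position, then turn the repeated monthly blocks into one row per month. The field names, column positions and widths below are hypothetical and chosen only for illustration; the real layout has to come from the data dictionaries included in the download.

```python
# Hypothetical layout: (field name, start, end), 0-based with end exclusive.
# Real positions must be taken from the FBI data dictionary in the package.
LAYOUT = [
    ("state_code", 0, 2),
    ("agency_id", 2, 9),
    ("year", 9, 11),
]
MONTH_START = 11  # assumed column where the repeated monthly blocks begin
MONTH_WIDTH = 5   # assumed width of each monthly count
MONTHS = 12


def parse_line(line):
    """Split one fixed-width line into header fields and monthly counts."""
    header = {name: line[start:end].strip() for name, start, end in LAYOUT}
    counts = []
    for m in range(MONTHS):
        start = MONTH_START + m * MONTH_WIDTH
        raw = line[start:start + MONTH_WIDTH].strip()
        # Keep blanks as None so missing reports are not mistaken for zeros.
        counts.append(int(raw) if raw else None)
    return header, counts


def unpack(line):
    """Turn one wide line into twelve rows, one per month."""
    header, counts = parse_line(line)
    return [dict(header, month=m + 1, offenses=c) for m, c in enumerate(counts)]


# A made-up line that follows the hypothetical layout above.
sample = "01" + "AL00100" + "15" + "".join(f"{i:5d}" for i in range(1, 13))
rows = unpack(sample)
```

Each resulting row can then be written out with Python's `csv` module or loaded into a database. Keeping blank counts as `None` rather than zero is a design choice here, since an agency that did not report is different from an agency that reported no offenses.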
The NICAR Database Library at IRE maintains a number of federal datasets and houses decades' worth of data from dozens of federal agencies. We also conduct custom data cleaning and analysis for newsrooms around the world. For information or a quote, email us at email@example.com or call us at (573) 884-7711.
Reporters rushed to download data from the Environmental Protection Agency’s open data portal after a contractor reported the portal would shut down this Friday. But the EPA called reports untrue Monday morning and stated that the site will not close and that data will remain available.
Users visiting the EPA’s open data portal were greeted with the message that “This site will be shut down Friday, April 28, 2017.” At one point on Monday, the website was unavailable due to a high number of visitors. That message has since changed to read: “The data on this Web site will continue to be available on April 28, 2017.”
In March, IRE & NICAR announced plans for a federal data directory that will connect reporters and the general public to at-risk, rescued data sets. The directory currently links to hundreds of datasets, including EPA data rescued by DataRefuge, as well as some saved by IRE members.
The Washington Post’s Steven Rich, an IRE board member, is downloading some of the data currently stored in the EPA open data portal, despite assurances from the agency.
“As we saw with the sudden shuttering of open.gov, open data portals are only available for as long as they’re available,” Rich said. “Even though the EPA’s open data portal may not be shutting down on Friday, there’s no guarantee the site will remain forever.”
During previous government shutdowns, data portals for the U.S. Census Bureau and other agencies became inoperable.
This worry follows a Sunday post from EPA contractor 3 Round Stones, the company that powers the data portal, stating it was notified by the EPA to be prepared to shut down the EPA’s Open Data portal by noon on Friday, April 28. CEO Bernadette Hyland has since updated her post on Medium to read, “The service will remain operational and available.”
In a statement emailed to IRE, EPA spokesperson John Konkus said that Hyland’s original statement was “inappropriate and unauthorized.”
“This is a contractor sending inappropriate and unauthorized communications on EPA's behalf,” Konkus wrote. “The website isn't going anywhere, and this episode has little to nothing to do with contingency plans in case of a shutdown.”
IRE has reached out to Hyland requesting a response to the EPA spokesperson’s statement.
Want to help our data preservation efforts? Let us know if you have collected a dataset, by filling out our federal data survey.
Charles Minshew is IRE's Director of Data Services
Lois Norder, The Atlanta Journal-Constitution
Documents concerning doctors in every state accused of sexual misconduct are now being made available to journalists and other researchers by The Atlanta Journal-Constitution.
The newspaper has created a data portal to share the information, which it gathered in its recent investigation "Doctors & Sex Abuse." That includes public medical disciplinary orders from every state, as well as court records and other documents.
Most of the documents were obtained by scraping medical board websites, and the AJC notes that some websites didn’t include every document on a physician and that scraping may not have obtained every order posted. Those are important caveats to keep in mind.
But the catalog of cases could allow journalists to examine their state’s handling of sexual misconduct cases and identify doctors allowed to continue practice after findings that they abused patients. Documents can be searched by state, by doctor name, or by certain words in the text.
The portal is at http://ajc-data-share.herokuapp.com/
Journalists must agree to ground rules before being granted access. The AJC is requesting that those who use the information for stories also link from their coverage to the doctors.ajc.com website.
More information about individual states and their handling of abuse cases can be found at doctors.ajc.com/states/
That site includes each state’s rating on patient protection laws and other state highlights.
In an effort to promote rescued federal data and prevent duplicated efforts, IRE and NICAR are cataloging rescued data in a public directory.
If you saved a dataset and are willing to share, please take our survey.
There are some datasets of concern that are still publicly available. In the cases of data saved by an IRE member, we provided the link to the original source. If the dataset disappears, we will work with the member holding the data to ensure it's publicly available.
The NICAR Data Library has updated the National Inventory of Dams (NID), a database kept by the U.S. Army Corps of Engineers on dams in all 50 states, Washington, D.C., and Puerto Rico. Details include structure type, purpose, owner, most recent inspection date and inspection frequency. The data also includes latitude and longitude.
The safety of America’s dams has been in the spotlight after the main spillway at the Oroville Dam in California, the country’s tallest, collapsed.
You can use these records to zero in on neglected and potentially dangerous dams in your area. In the past, newsrooms have used the data to produce stories on emergency preparedness, aging infrastructure, and the effect of dams on the environment.
More than 3,000 dams have been added to the inventory since the last update from the NICAR Data Library in 2015. In this most recent update, the USACE has included two previously withheld fields, which list the name of the city closest to each dam and its distance from the dam. The USACE still withholds the hazard ranking of individual dams from the public. The hazard ranking indicates how severe the consequences would be if a dam failed; a dam is ranked “high hazard” if people would likely die in that event. Seventeen percent of the nation’s 90,500 dam structures fall into this category.
While the USACE will not release that information for individual dams, it does provide state-level data. We’ve compiled that into a spreadsheet and included it with the 2016 update. We also include the 2015 dataset and the 2002 version of the database, which includes hazard rankings. This 2002 data is a starting point for further reporting, but note that the rankings might have changed since then.
The team at NICAR has created a guide to getting started with the data and finding the story. You can also download additional documentation from the NID database page.
The IRE Resource Center also has tipsheets and stories to help you get started:
To purchase this data from NICAR, IRE members should visit the NID database page and log in to purchase and download the data online. Non-members should contact the Data Library at firstname.lastname@example.org or (573) 884-7711.
If you have any questions, please feel free to contact us.
Editor's Note: This article first ran on the California Civic Data Coalition's website on Oct. 8. Ben spoke at our San Diego Data Watchdog Workshop, a program funded by the Ethics and Excellence in Journalism Foundation.
By Ben Welsh
Dozens of students and journalists gathered from across Southern California — and even as far as Mexicali, Mexico — to learn advanced online research techniques, receive Microsoft Excel instruction and develop other skills during two days of training.
One of the offerings was a new class created by our team titled “First Python Notebook: Scripting your way to the story.”
Our subject: Contributions to campaigns for and against Proposition 64, a ballot measure asking California voters to decide if recreational marijuana should be legalized.
The data: Drawn from this site’s bulk download service that repackages CAL-ACCESS, the jumbled, dirty and difficult database that tracks money in California politics.
The complete script for the class is now available online on GitHub for anyone to take at home or teach elsewhere. We’re aiming to expand and improve it for future events, so if you give it a try I’d love to hear your feedback. Email me any time at email@example.com.
In the wake of any natural disaster, there’s a seemingly endless number of public service and accountability stories to chase. You want to know when the power is going to come back on. How many people have been displaced, injured and (in the worst storms) killed. Did officials take the right precautions?
We often focus on the toll these disasters take on individuals, but there’s another important frame to consider – an economic one. Businesses are damaged, some never to open again. Workers get laid off. The long-term impact is huge.
Fortunately, the Bureau of Labor Statistics has made it a little easier for reporters to assess the potential economic impact of storms like Hurricane Matthew. The bureau used flood zone maps and geocoded employment data to calculate the number of businesses at risk during a hurricane. The 2012 data also includes the average number of employees working in these zones, as well as the total quarterly wages. You can look at the data on a state and county level, which makes it a great tool for local and national reporters alike.
So, for instance, you could report that:
- In Florida, an average of 1.4 million employees working at 127,433 businesses could have been affected by Hurricane Matthew, a Category 4 storm. Nearly 4,000 of those businesses are in Duval County (Jacksonville), Florida.
- In Charleston, South Carolina, 6,820 businesses were found in the flood zone for a Category 4 storm. The total quarterly wages at stake? $1.2 billion.
Tips for further reporting:
- Broaden your scope. Did water pour into your favorite local restaurant? Maybe a popular small business ended up with serious damage. We see lots of stories about the fates of one or two businesses in the wake of a storm. Use the BLS data to zoom out and add context. That restaurant was just one of X-number of businesses in the flood zone in your county.
- Which industries are at risk? The hurricane zone data doesn’t drill down that far, but there are some workarounds. For instance, you could look up the most prominent industries in a specific county. That could give you some good leads on where to focus your in-person reporting efforts. (Jump to the bottom of this post for more details on how to look up information on industries.)
- Watch for people filing for unemployment. If businesses are damaged or destroyed, it’s likely that many employees will be temporarily, if not permanently, laid off. Keep tabs on the Unemployment Insurance Weekly Claims Data, which is updated every Thursday by the Department of Labor. It might take a couple of weeks or months for the numbers to go up, so check it regularly.
Learn more about the data on the BLS flood zone FAQ page.
Tips for looking up prominent industries:
You might be wondering, what are “location quotients” and why sort on them? Location quotients are ratios that allow us to see if a specific industry has a greater share of employment compared to the nation as a whole. David Hiles of the Bureau of Labor Statistics explains:
If an LQ is equal to 1, then the industry has the same share of its area employment as it does in the nation. An LQ greater than 1 indicates an industry with a greater share of the local area employment than is the case nationwide. For example, Las Vegas will have an LQ greater than 1 in the Leisure and Hospitality industry because this industry makes up a larger share of the Las Vegas employment total than it does for the nation as a whole.
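The ratio Hiles describes can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not BLS code, and every employment figure in it is made up; real numbers would come from the bureau's county and national employment data.

```python
def location_quotient(local_industry, local_total, national_industry, national_total):
    """Industry's share of local employment divided by its share nationally."""
    local_share = local_industry / local_total
    national_share = national_industry / national_total
    return local_share / national_share


# Made-up example: an industry employs 25,000 of a county's 100,000 workers
# (25 percent) and 12.5 million of 100 million workers nationally (12.5 percent).
lq = location_quotient(25_000, 100_000, 12_500_000, 100_000_000)

# Made-up county figures, ranked so the most concentrated industries come first.
county = {"Leisure and Hospitality": 25_000, "Manufacturing": 5_000}
nation = {"Leisure and Hospitality": 12_500_000, "Manufacturing": 10_000_000}
ranked = sorted(
    county,
    key=lambda name: location_quotient(county[name], 100_000, nation[name], 100_000_000),
    reverse=True,
)
```

In the made-up example the local share is twice the national share, so the LQ is 2. Sorting by LQ instead of raw employment surfaces industries that are unusually concentrated in an area, even if they are not its largest employers.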
Student debt is quickly becoming a national crisis. But reporting on student loans and college finances has always been thorny, especially when dealing with complicated bureaucracies and patchwork data.
Earlier this year, for the first time ever, the Obama Administration released a comprehensive intersection of student population, college performance and “outcome” data, measuring with precise detail who gets into what school and what they do after graduation. But the Department of Education’s raw “College Scorecard” is a labyrinth of information covering some 7,800 campuses all over the country, broken down by almost 2,000 different variables – everything from enrollment demographics and SAT scores to repayment rates and post-graduate earnings.
Today, NICAR is offering a simplified, Excel-ready version of the database – College Scorecard Simplified – along with a data dictionary and step-by-step guide on how you can analyze data specific to your beat and start reporting. We’ve cleaned and pared down the original database to include the most usable fields, a step that will save busy reporters a great deal of time. We’re also providing a robust list of caveats, in addition to the DOE’s documentation and other important resources, most notably ProPublica’s Debt by Degrees project.
NICAR’s free College Scorecard Simplified database is accessible for reporters of any experience level to quickly download and analyze. Using the data, newsrooms can track and compare schools’ accessibility across different income levels alongside performance metrics and ultimate outcomes.
Many schools across the country are under budget constraints, so understanding how appropriations have impacted students can paint a vivid picture in your city or state. The Scorecard also provides insight into the world of private for-profit colleges, which have sprung up in cities – and online – across the country. Most importantly, the data covers all federal grant and loan recipients, so reporters can now measure the effectiveness of government aid across different types of students and schools.
This project was prepared by Brett Murphy, the 2016 IRE and NICAR Google NewsLab Fellow. Special thanks to Annie Waldman at ProPublica and Andrea Fuller at the Wall Street Journal.
FBI Uniform Crime Reports (UCR) is one of the best tools for tracking crime trends in communities nationwide; FBI UCR data for 2014 is now available from the NICAR data library.
Law enforcement agencies around the country voluntarily submit reports to the FBI on what are known as "index" crimes: murder, nonnegligent manslaughter, forcible rape, robbery, aggravated assault, burglary, larceny-theft, motor-vehicle theft and arson. These crimes are meant to serve as an index for gauging fluctuations in both the overall volume and rate of crime. The data include the number of crimes by agency and by month. Geographic information includes region, state, county, city, and metropolitan statistical area (MSA).
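Crime rates are conventionally expressed as offenses per 100,000 residents, which makes agencies of different sizes comparable. A minimal Python sketch of that calculation and of a year-over-year change follows; the offense counts and population are invented for illustration, and the helper names are our own rather than anything defined in the UCR files.

```python
def rate_per_100k(offenses, population):
    """Offenses per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return offenses * 100_000 / population


def percent_change(old, new):
    """Percent change from an earlier value to a later one."""
    if old == 0:
        raise ValueError("percent change from zero is undefined")
    return (new - old) * 100 / old


# Invented figures for a city of 150,000 residents in two consecutive years.
rate_before = rate_per_100k(450, 150_000)
rate_after = rate_per_100k(495, 150_000)
change = percent_change(rate_before, rate_after)
```

Comparing rates rather than raw counts matters most when a community's population has shifted between the years being compared.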
UCR data can help you track different crimes in your community, as well as crime rates over time. Peruse IRE tipsheets for help: