Wage theft happens more than you realize, and in some hidden places. Workers in fast food restaurants, big-box stores or hotels aren’t the only ones. We’ve found examples of Colorado’s teachers, doctors and even mortuary workers whom employers have illegally denied wages by skimping on overtime, meal-breaks or just refusing to pay.

Rocky Mountain PBS I-News has started using data from the U.S. Department of Labor to uncover examples of industries and businesses that chronically violate wage laws. The data present fertile grounds for digging across ZIP codes and industries. This guide will help you use the same data for a quick-hit story or an in-depth investigation.

Quantifying the problem

The Labor Department’s Wage and Hour Division (WHD) tracks its investigations in a system called the Wage and Hour Investigative Support and Reporting Database, or WHISARD (pronounced “wizard”). The agency conducts investigations in response to complaints or on its own.

WHISARD includes data about violations of a few laws you might be familiar with, such as the Fair Labor Standards Act, the Family and Medical Leave Act and the Davis-Bacon Act, as well as a few others, including those related to contract, migrant and immigrant workers.

WHISARD contains fields on the subject and outcome of the investigation, for example the business name, address, industry code (NAICS), the law that was violated, the amount the employer agreed to repay employees, the number of employees it agreed to pay, and the amount of any civil penalties the agency assessed.

Your state department of labor may have data to supplement WHISARD, so check how it tracks worker complaints. In Colorado, the state maintains a database of investigations that was created around 2001, but the information in it is confidential under a 100-year-old state law.

Data options

I downloaded Wage and Hour Division enforcement data from the Department of Labor’s Data Catalog, rather than make a Freedom of Information Act request for a more complete data set. The online data were slightly more up-to-date and easier to handle.

For wage investigation data, click on “Wage and Hour Compliance Action Data” at the bottom of the page. This version of the data contains one main table called “whd_whisard,” which is downloadable as a CSV or XML file. You can also download the data dictionary. The department updates the data monthly.

The record count for the November 2014 version was 188,005 rows. You can open the CSV file in a spreadsheet, but you might want to use something a little heftier than that, depending on the kind of analysis you want to perform. I used MySQL with Sequel Pro as a front-end client to perform some basic analysis.
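If you don't have MySQL handy, one lightweight alternative is to load the CSV into SQLite using only Python's standard library. What follows is a minimal sketch under stated assumptions, not the workflow I used: the sample rows are invented, the "business_name" column is a placeholder (check the data dictionary for the real column names), and every column is stored as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, table_name="whd_whisard"):
    """Load an open CSV file into an in-memory SQLite table.

    Every column is created as TEXT; cast values in your queries as needed.
    The table name is interpolated into SQL, so only pass names you control.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    # Rows with the wrong number of fields will raise an error here.
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn


# Invented stand-in for the downloaded file; with the real download you
# would pass open("your_file.csv", newline="") instead.
sample = io.StringIO(
    "case_id,business_name,flsa_bw_atp_amt\n"
    "1,Example Diner,1200.50\n"
    "2,Example Hotel,0\n"
)
conn = load_csv_into_sqlite(sample)
row_count = conn.execute("SELECT COUNT(*) FROM whd_whisard").fetchone()[0]
print(row_count)
```

Comparing the row count against the figure you expect is a quick way to confirm the whole file loaded.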

If you want to generate a customized Microsoft Excel file, you can use the search feature of the Labor Department’s Data Enforcement site to extract a slice of WHISARD data.

You can also obtain the full version of the WHISARD database through a FOIA request or from the IRE and NICAR Database Library, which has data through September 2010.

Unlike the downloads above, the full version is a relational database with eight main and 24 lookup tables that are joined through case_id, the unique identifier. You’ll need a relational database manager, such as MySQL or Microsoft Access, to work with the tables.

Problems with the data

Two columns in the data that I downloaded from the Data Catalog were totally wrong. As of now, do not use the fields “flsa_mw_bw_atp_amt” and “flsa_ot_bw_atp_amt” which should contain the amount of back wages the employer agreed to pay for violations of minimum wage and overtime under the Fair Labor Standards Act. Instead, use “flsa_bw_atp_amt” to calculate the back wages an employer agreed to pay under the act, according to a public information officer. She told me the rest of the data are correct and IT is in the process of correcting the mistake. Still, proceed with caution.
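To make the advice above concrete, here is a hedged sketch of totaling Fair Labor Standards Act back wages from the one field reported to be reliable, “flsa_bw_atp_amt”. The grouping column name "state" and the sample rows are assumptions for illustration only; check the data dictionary for the actual column that holds the state.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_flsa_back_wages(csv_file, group_column, amount_column="flsa_bw_atp_amt"):
    """Sum the back-wage amount column for each value of group_column.

    Blank amounts count as zero. Malformed amounts raise an error on
    purpose, so that dirty values get noticed instead of silently skipped.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(csv_file):
        raw = (row.get(amount_column) or "").strip()
        amount = Decimal(raw) if raw else Decimal("0")
        totals[row[group_column]] += amount
    return dict(totals)


# Invented sample rows; "state" is a hypothetical column name.
sample = io.StringIO(
    "case_id,state,flsa_bw_atp_amt\n"
    "1,CO,1200.50\n"
    "2,CO,300\n"
    "3,WY,\n"
)
totals = total_flsa_back_wages(sample, group_column="state")
for state, amount in sorted(totals.items()):
    print(state, amount)
```

Decimal is used rather than float so that dollar amounts add up without rounding drift.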

The DOL’s Data Enforcement page has incorrect information regarding the date range of the data. I was told it actually covers 2005 to present.

Another caveat: the database only represents cases that have been investigated by the Department of Labor, and in the case of the data available to download online, only closed investigations. Some workplaces are never, or rarely, investigated.

Different divisions of the WHD have their own investigative focus (the ski industry in Colorado, for example), so the data do not truly reflect U.S. wage theft. You can most accurately attribute the data as “the amount the Wage and Hour Division has recovered since 2005,” not “the amount of wages employers withheld.”

As for other dirty data, be careful of variations on the business names and addresses. Sometimes a name might include a comma and “Inc.” at the end, or “LLC,” and sometimes it might not. I used OpenRefine to clean the names and wildcard searches for quick checks.
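For a quick check before a full OpenRefine cleanup, name variations like these can be collapsed with a small normalizing function. This is a rough sketch, not a substitute for proper clustering: the suffix list is an illustrative assumption and will miss many real-world variants.

```python
import re

# Trailing corporate suffixes to strip, with any leading commas or spaces
# and an optional final period. The list is illustrative, not exhaustive.
_SUFFIX = re.compile(r"[,\s]+(inc|llc|corp|ltd)\.?$", re.IGNORECASE)


def normalize_business_name(name):
    """Return an uppercase name with extra whitespace and trailing suffixes removed."""
    cleaned = " ".join(name.split())  # collapse runs of whitespace
    previous = None
    while previous != cleaned:  # repeat in case suffixes are stacked
        previous = cleaned
        cleaned = _SUFFIX.sub("", cleaned)
    return cleaned.rstrip(" ,.").upper()


# Invented example names that should all collapse to the same key.
variants = ["Acme Roofing, Inc.", "Acme Roofing LLC", "acme  roofing inc"]
keys = {normalize_business_name(v) for v in variants}
print(keys)
```

Grouping records by the normalized key, rather than overwriting the original name, keeps the raw values available for fact-checking.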

One final word of caution: if you want to add up the amount withheld per industry, note that industries are expressed as NAICS codes of varying lengths, with a greater number of digits representing a more specific subset of an industry. All construction is “23” and roofing contractors are “238160”. But industry code 56 includes a variety of industries, such as armored car services and office administrative services. Experiment with grouping by different lengths of NAICS codes, depending on how specifically you want to define a local industry.
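The effect of grouping by different NAICS code lengths can be sketched as follows. The dollar amounts are invented for illustration, and the function takes plain (code, amount) pairs rather than assuming any particular column names.

```python
from collections import defaultdict


def total_by_naics_prefix(records, digits):
    """Sum amounts by the first `digits` characters of each NAICS code.

    Codes shorter than `digits` are grouped under their full (shorter) code.
    """
    totals = defaultdict(float)
    for naics_code, amount in records:
        prefix = str(naics_code).strip()[:digits]
        totals[prefix] += amount
    return dict(totals)


# Invented amounts attached to a few NAICS codes.
records = [
    ("238160", 1000.0),  # roofing contractors
    ("236220", 500.0),   # another construction code
    ("561613", 250.0),   # armored car services
    ("561110", 750.0),   # office administrative services
]

broad = total_by_naics_prefix(records, 2)     # sector level, e.g. "23" and "56"
specific = total_by_naics_prefix(records, 6)  # one total per full code
print(broad)
print(specific)
```

At two digits, roofers are folded into all construction and armored cars share a bucket with office administration; at six digits each gets its own total.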

Telling local stories

For our first story, I added up the amount that had been recovered in Colorado to show the minimum amount that workers had been deprived of. I used MySQL to make slices of data available to our partner newspapers. Pat Ferrier at The Coloradoan used the data for a look at violations in Fort Collins.

One employer we focused on had a history of 30 years of alleged wage violations that were detailed in court records. Yet, the DOL data showed very little about the business. As a result of our story, two former employees we interviewed finally received wages due to them years after the fact.

Finally, if you find something interesting in the data, use the “case id” field to file a FOIA request for the “Narrative and Compliance Action Report” for more details on the violations. The data are really just the starting point.


Anna Boiko-Weyrauch is a reporter with I-News, based at Rocky Mountain PBS in Denver. @AnnaBoikoW