By Nakylah Carter, IRE & NICAR

When investigating topics with large data sets, reporters have to be creative and proactive to capture all the information a story needs. In this edition of Data Dive in The IRE Journal, we feature two investigative data projects about the American health crisis.

In The Washington Post series “Dying Early: America’s life expectancy crisis,” the newspaper investigated the decline in U.S. life expectancy, revealing that the crisis is not temporary and is rooted in deeply systemic issues. While opioid overdoses and “deaths of despair” were widely reported, the Post’s findings showed that chronic conditions like heart disease, obesity and liver ailments are the primary drivers of the decline, erasing more years of life than overdoses, suicides and homicides combined.

In “Baltimore’s Overdose Crisis,” The Baltimore Banner’s collaborative series with The New York Times, the organizations analyzed CDC Wonder death data and National Cancer Institute SEER death data and found that Baltimore had the highest overdose rate of any major city in American history. They also found that Black men born between 1951 and 1970 were the most affected.

Learn more about each investigation below, including the steps each team took to carry it out with the data they were able to obtain.

James “Big Ken” Manuel, 66, a heart patient with many comorbidities, goes through his evening medicine ritual at his home on the West End of Louisville on June 5, 2023. He estimates spending at least $800 a month on medicine. Jahi Chikwendiu / The Washington Post

Data reveals that chronic illnesses are the leading factor in America’s life expectancy crisis

America’s life expectancy crisis often gets overlooked because it is so easy to miss. 

Because the decline in American life expectancy is not a new story, The Washington Post wanted to find a unique angle from which to tell it. They did so with “pathbreaking data analysis, revelatory story reporting and framing, and distinctive visual presentation.” Their largest challenge? Confronting the false narrative that America’s declining life expectancy was solely due to opioids and recent despair when, in reality, its causes were far more deeply rooted.

“When I started analyzing it, I realized that it’s absolutely not about overdoses and suicide,” said Dan Keating, The Washington Post data reporter on the project. “I mean, those were bad, but those aren’t what really drove America’s life expectancy crisis.”

After seeing a long-term comparison of life expectancy in the U.S. and peer nations, Keating was determined to find out why U.S. life expectancy was declining and to identify “the causes of the causes” of the deaths driving the change. In the year from conception to publication, the Post did just that.

The Washington Post found that while external causes like overdoses and gun violence make Americans under 40 up to 2.5 times more likely to die than their peers in other wealthy nations — resulting in nearly 60,000 excess deaths annually — chronic diseases such as diabetes, liver disease and obesity have a much greater impact. Among those aged 50 to 64, these conditions cause 1.5 times more deaths than in peer nations, leading to over 200,000 additional deaths per year. The analysis of years of life lost confirmed that chronic diseases are the dominant factor in the U.S. life expectancy gap.  
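Comparisons like “1.5 times more deaths” and “200,000 additional deaths” are usually built from two quantities: a rate ratio and an excess death count. The sketch below shows that arithmetic in Python. It is an illustration only, not the Post’s code (the Post worked in SAS), and the function name and input numbers are made up for the example.

```python
def excess_deaths(us_deaths: float, us_population: float,
                  peer_rate_per_100k: float) -> dict:
    """Compare an observed death count with the count expected at a peer rate.

    All names here are hypothetical; this is not the Post's method.
    """
    # Observed U.S. death rate per 100,000 people in the age group.
    us_rate = us_deaths * 100_000 / us_population
    # Deaths the U.S. would have seen if it matched the peer-nation rate.
    expected = peer_rate_per_100k * us_population / 100_000
    return {
        "rate_ratio": us_rate / peer_rate_per_100k,
        "excess_deaths": us_deaths - expected,
    }


# Made-up inputs for illustration; not figures from the Post's analysis.
result = excess_deaths(us_deaths=150_000,
                       us_population=50_000_000,
                       peer_rate_per_100k=200.0)
print(result)
```

A rate ratio above 1 means the U.S. group dies at a higher rate than its peers, and the excess count converts that gap into a number of deaths.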

The Post also found that the gap in life expectancy between rich and poor Americans has widened dramatically. While income inequality has grown by 39% since 1980, the death rate gap expanded by 570%. In contrast, other wealthy nations showed little to no life expectancy disparity across income levels. 

“We did indeed find that the wealthiest counties in the US have lower life expectancy than the poorest in France and Canada and Japan. So while being poor in this country is really bad, even being wealthy is not great either,” Keating said. 

The investigation also highlighted the role of policy decisions, such as seat belt laws and cigarette taxes, in shaping health outcomes. Additionally, an analysis of $18 billion in food advertising revealed that fast food dominates marketing, while fresh produce is almost entirely absent. The “Dying Early” series garnered over two million page views. Its major findings all came from the Post’s own enterprise analysis, which combined U.S. and international death records, Census demographics, life expectancy estimates and data specific to particular issues and causes of death.

The Post verified its findings through on-the-ground reporting in areas where it found the largest death rate disparities and confirmed them with experts at Harvard, Yale, Syracuse, the University of Michigan, Georgetown and Rice University who have done research in the field. Another data reporter reviewed the SAS analysis code for accuracy and reproducibility. According to reporters on the series, the project was driven entirely by data analysis.

The silhouette of a man appears at the end of a dark hallway in front of a window.
Larnell Robinson, tenant council president, at the Rosemont Tower complex in Baltimore. Fifteen people over the age of 50 have fatally overdosed in the building since 2018. Jessica Gallagher / The Baltimore Banner

Data reveals overdoses are the leading cause of death in Baltimore

When The Baltimore Banner launched in 2022, an investigative series about Baltimore’s opioid crisis was one of the first projects conceived. While many in Baltimore focus on the homicide rate when they think about deaths in the city, The Baltimore Banner, along with the mayor, pointed out that overdoses killed more people than homicides each year. Despite this, no Maryland organization had published anything in the last 20 years that went beneath the surface of the epidemic.

According to reporters, at about the same time The Banner’s reporting was getting underway with the new data in hand, The New York Times accepted reporter Alissa Zhu into its Local Investigations Fellowship. Although the fellowship typically includes only a single reporter, The Banner’s data reporter Nick Thieme and reporter/photographer Jessica Gallagher also worked on the project. Editors from both The Banner and The Times edited the project. The stories took approximately 15 months to report, write and edit.

While obtaining data is often difficult in investigative work, the Banner team was resourceful, drawing on the expertise of its own staff. Thieme, who is also an adjunct journalism professor at Columbia University, used his academic credentials to obtain CDC mortality data, which gave reporters richer, less aggregated data sets to dig into.

“By getting access to this ‘academics only’ data, we were able to get data about every single death in the United States, and that was how we were first able to show that Baltimore was experiencing the worst overdose crisis of any major American city in American history,” said Ryan Little, data editor for The Baltimore Banner. “But then it also allowed us to go this step forward and to do this, you know, age-period-cohort analysis to identify this one generation of Black men who had borne the brunt of overdose crisis for basically their entire lives.”

Unlike many other states, Maryland had not released autopsy data to the public, so reporters there had never been able to study or fully understand death trends in their cities. After a reporter’s public records request was denied, The Banner sued the Maryland Medical Examiner’s Office and won. The data it obtained ultimately became the foundation for its reporting on the Baltimore opioid crisis.

“We were able to get line by line, name by name, address of residence, address of incident, among other information,” said Ryan Little, data editor for The Baltimore Banner. “This is what allowed us to show that of the top 10 places where the most overdoses were happening, that many of them were senior homes, and types of places where black men born in their 50s and 60s would be.”

The Banner conducted an extensive data analysis, compiling a unique dataset of every death in the U.S. from 1969 to 2022, including demographic details and causes of death. This allowed the team to calculate death rates by age, race, sex and location, making it one of the most comprehensive datasets of its kind. Using this data, The Banner found that Baltimore’s overdose crisis is the worst among major U.S. cities.
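Calculating death rates by age, race, sex and location comes down to counting deaths in each group and dividing by that group’s population. A minimal Python sketch of that step follows. The Banner’s own pipeline is not described in detail here, so the record layout, function name and numbers are all assumptions made for illustration.

```python
from collections import Counter


def death_rates_per_100k(deaths, population):
    """Return deaths per 100,000 people for each demographic group.

    deaths: iterable of (age_group, race, sex, place) tuples, one per death.
    population: dict mapping the same tuple to that group's population.
    Both layouts are hypothetical, chosen only for this example.
    """
    counts = Counter(deaths)
    return {
        group: counts[group] * 100_000 / pop
        for group, pop in population.items()
        if pop > 0  # skip empty groups to avoid dividing by zero
    }


# Made-up records; not data from The Banner's analysis.
deaths = [("55-64", "Black", "M", "City A")] * 3 + [("55-64", "White", "M", "City A")]
population = {
    ("55-64", "Black", "M", "City A"): 1_000,
    ("55-64", "White", "M", "City A"): 2_000,
}
rates = death_rates_per_100k(deaths, population)
for group, rate in rates.items():
    print(group, rate)
```

Dividing by population is what makes groups of different sizes comparable, which is why record-level deaths need to be paired with matching census counts.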

The analysis revealed that Black men born between 1951 and 1970 consistently overdosed at the highest rates in Baltimore for decades, disproportionately driving the crisis. To confirm this, The Banner used advanced age-period-cohort modeling, reviewed by experts, to separate age, death year, and birth year effects. Their findings, supported by autopsy records and census data, highlighted the deep impact of inequality on overdose rates, particularly in senior housing and disadvantaged communities. This was investigative science as journalism.
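Age-period-cohort analysis rests on a simple identity: birth year is roughly death year minus age at death. Because of that identity, the three effects cannot be separated by an ordinary regression without extra constraints, which is why specialized modeling is needed. The Python sketch below shows only the bookkeeping step of assigning deaths to birth cohorts. It is not The Banner’s model, and the 10-year bands, function names and records are assumptions for the example.

```python
from collections import defaultdict


def birth_cohort(death_year: int, age_at_death: int,
                 width: int = 10, origin: int = 1951) -> str:
    """Label the birth cohort band for one death (hypothetical banding)."""
    # Approximate birth year; the true value can differ by one year
    # depending on whether the birthday had passed at the time of death.
    birth_year = death_year - age_at_death
    start = origin + ((birth_year - origin) // width) * width
    return f"{start}-{start + width - 1}"


def tabulate(records):
    """Count deaths by (death year, birth cohort band)."""
    table = defaultdict(int)
    for death_year, age in records:
        table[(death_year, birth_cohort(death_year, age))] += 1
    return dict(table)


# Made-up (death_year, age_at_death) pairs for illustration.
records = [(1995, 40), (2005, 50), (2015, 60), (2015, 30)]
table = tabulate(records)
for key in sorted(table):
    print(key, table[key])
```

In this toy input, the first three deaths fall in different decades and at different ages but share one birth cohort, the kind of pattern a cohort analysis is designed to surface.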

“I think this is the type of data story that is really important because it really helps us understand something people really think they know, but maybe they don’t understand,” said Little. “I think when a lot of people think about the overdose crisis, they tend to think of white victims who live in rural areas, but what our reporting showed is that for many places in this country it’s not been that at all. And it’s really helped people re-understand a crisis and sort of see victims who might have been lost or might have been forgotten about otherwise.” 

Every statistic in the published series was checked against CDC Wonder for accuracy. The statistical models used in the articles were checked against best statistical practices and against what reporters saw in the real world, to make sure the inferences matched. The project used a range of statistical tools, including R packages such as DHARMa, to analyze the data and verify the findings.
