How The FBI Estimates Missing Data And Why You Should Keep Ignoring 2021
The FBI's estimation methodology isn't well suited for estimating tons of missing data.
Last week I wrote about how revisions are normal (though the revisions in 2022 and 2023 were abnormally large for unknown reasons) and that the revisions to 2021's estimates were substantially larger than the ones to 2022. Since then I’ve been thinking about one of my favorite Simpsons exchanges. In it, Springfield has added a second area code and Homer is mad that they got no warning about the change. Lenny and Carl try to set him straight, alas…
The same could be said about the 2021 national estimates of reported crime, which have been in the news because they were recently revised downward, giving the appearance that *actually violent crime rose a bit two years ago!*
But nothing has actually changed, because the 2021 national crime estimates were flawed when published two years ago and they’re arguably even more flawed now for reasons I try to describe below. Fortunately, the flaws that occurred in 2021 are specific to that year’s estimates and the substantial decline in participation that occurred due to the NIBRS transition. Our understanding of crime trends in 2023 and 2024 is unimpacted.
The 2021 national crime estimates have never been great. I wrote about this in May 2022 before they were initially published and I wrote about it in October 2022 after they were published. I've written about it many times since, usually while explaining that the very real issues with 2021 data haven't carried on to 2022 and beyond. The problem is that the NIBRS switch meant that the share of the US population covered by an agency reporting crime data fell from roughly 95 percent in an average year to 65 percent in 2021.
To fill that gap, the FBI and BJS undertook an estimation procedure designed to account for the missing agencies. So many agencies did not report data that confidence intervals were needed for the first time. BJS and FBI wrote:
Historically, no confidence intervals have been needed for SRS-based crime data. When the converted data of agencies that submitted to NIBRS was combined with the data of agencies that submitted to SRS, approximately 95 percent of the population was covered. Therefore, even though estimation was used to account for the small portion of missing crime reports, no confidence intervals were needed because the change to the estimates would be negligible if the values for agencies covering the remaining 5 percent of the population were known.
Statistical weights are designed in such a way that reporting agencies represent nonreporting agencies who have similar agency characteristics, such as agency size and agency type. Furthermore, different statistical weights are created for different geographic levels of estimation because the distribution of nonreporting agencies throughout the United States varies by state and region.
The FBI and BJS originally estimated that there were 1,313,200 violent crimes in 2021 with a lower bound of 1,223,400 and an upper bound of 1,402,900. That’s quite a range of possibilities, though the middle of the range suggests violent crime rose in 2021 which matches what city data tells us. I have repeatedly said that the 2021 data requires an asterisk which is fine because uncertainty is a part of life and what happened in 2021 has no impact on our larger understanding of national crime trends.
The 2021 estimates require an asterisk and should not be relied on when analyzing crime trends. Moreover, the 2021 national crime estimates that were revised in 2024 are probably worse now than they were when initially published by FBI & BJS two years ago.
It’s not fully clear how the FBI reached its most recent 2021 national crime estimates given how much reporting remains missing. The FBI's standard estimation procedure for filling in missing data usually is not a big deal for understanding national crime trends because participation is usually nearly complete, but applying that procedure to 2021 would likely produce an undercount as I’ll show.
Fortunately, the problems with producing national crime estimates are specific to 2021 and the NIBRS switch. The 2022 and 2023 estimates don’t suffer from similar issues thanks to improved participation. This is plainly seen in the below table showing how little of recent year reporting relies on estimates outside of 2021.
Estimating Missing Data
Around 19,000 agencies report millions of crimes to the FBI each year. Sometimes an agency won’t report crimes for a plethora of potential reasons, but usually around 95ish percent of the nation’s population is covered by a law enforcement agency that successfully reported data to the FBI that year.
When an agency doesn't report, or doesn't report fully, the FBI employs an estimation procedure that they spell out below:
Because not all law enforcement agencies provide data for complete reporting periods, the FBI includes estimated crime numbers in these presentations. The FBI computes estimates for participating agencies that do not provide 12 months of complete data. For agencies supplying 3 to 11 months of data, the national UCR Program estimates for the missing data by following a standard estimation procedure using the data provided by the agency. If an agency has supplied less than 3 months of data, the FBI computes estimates by using the known crime figures of similar areas within a state and assigning the same proportion of crime volumes to nonreporting agencies.
In other words, if an agency submits between 3 and 11 months of data then the FBI fills in the gaps evenly. If an agency reports 20 crimes in 4 months then that figure is tripled to an estimate of 60, for example. Agencies with 1 or 2 months of data are treated as non-reporters. If an agency does not report then the FBI computes the crime rate for agencies of similar population groups (cities of 250k+, cities of 100 to 250k, etc) within that agency’s state and imputes a total for use in the national estimates from there.
The “within a state” part of that description is important because it differs from how the initial 2021 national estimates were prepared. Those initial national estimates, done by BJS and FBI, imputed the totals of nonreporting agencies for the national estimates by comparing similarly sized cities regardless of the state.
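The stated procedure can be sketched in a few lines of Python. This is only an illustration of the rules as quoted above, not the FBI's actual code: the `Agency` fields, the function names and the three example cities are all hypothetical, and pooling annualized partial reporters into the peer rate is an assumption the quoted text doesn't confirm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Agency:
    name: str
    state: str
    pop_group: str        # e.g. "250k+" or "100k-250k"
    population: int
    months_reported: int  # 0 through 12
    crimes_reported: int  # count over the months actually reported


def annualize(agency):
    """Fill the missing months evenly: 20 crimes in 4 months becomes 60."""
    return agency.crimes_reported * 12 / agency.months_reported


def estimate_totals(agencies):
    """Return {agency name: estimated annual count, or None if no peers reported}."""
    # Pool crimes and population of reporters by (state, population group).
    # Assumption: annualized partial reporters count toward the pooled rate.
    pooled = defaultdict(lambda: [0.0, 0])
    for a in agencies:
        if a.months_reported >= 3:
            cell = pooled[(a.state, a.pop_group)]
            cell[0] += annualize(a)
            cell[1] += a.population

    estimates = {}
    for a in agencies:
        if a.months_reported >= 3:
            estimates[a.name] = annualize(a)
            continue
        # Fewer than 3 months: treated as a non-reporter and imputed
        # from the pooled rate of same-state, same-size reporters.
        crimes, population = pooled.get((a.state, a.pop_group), (0.0, 0))
        estimates[a.name] = a.population * crimes / population if population else None
    return estimates


# Hypothetical agencies, not real data.
agencies = [
    Agency("City A", "CA", "250k+", 1_000_000, 12, 4_000),
    Agency("City B", "CA", "250k+", 500_000, 4, 500),
    Agency("City C", "CA", "250k+", 2_000_000, 0, 0),
]
estimates = estimate_totals(agencies)
```

Keying the pooled rate on `pop_group` alone instead of `(state, pop_group)` would roughly correspond to the "regardless of the state" comparison used for the initial BJS and FBI estimates.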
It’s not really a big issue when Tucson, Arizona and Orlando, Florida are the only big cities of 250,000 or more that didn’t report any data in a year — such as in 2023. Will the estimates be perfect? No. Is perfect the enemy of good enough? Yes. Will the estimates accurately portray crime trends? Also, yes.
But using this methodology to impute crime trends for 2021 will likely lead to substantially lower crime estimates, especially in places like California, Florida and Pennsylvania where only a small share of agencies have submitted 2021 data to the FBI (some other states like Louisiana have largely caught up by submitting 2021 data after the fact).
It’s not clear whether the FBI actually employed its stated methodology when computing the 2021 national crime estimates the last two years. The national violent crime estimate went from 1,313,200 with a wide margin of error when initially published by BJS and FBI to 1,253,716 when the 2022 estimates were published last year to 1,197,930 when the 2023 estimates were published last month.
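To put those revisions in context, here is a quick check of how far each one sits from the initial estimate and whether it still falls inside the interval published with it (1,223,400 to 1,402,900). Every figure is one quoted in this post; only the variable names are mine.

```python
# National violent crime estimates for 2021, as quoted in this post.
initial, lower, upper = 1_313_200, 1_223_400, 1_402_900
revisions = {"2022 report": 1_253_716, "2023 report": 1_197_930}


def pct_change(new, old):
    """Percent change from old to new."""
    return (new - old) / old * 100


results = {}
for label, value in revisions.items():
    inside = lower <= value <= upper
    results[label] = (pct_change(value, initial), inside)
    print(f"{label}: {value:,} ({results[label][0]:+.1f}% vs. initial), "
          f"inside original interval: {inside}")
```

The first revision is a drop of roughly 4.5 percent that stays inside the original interval. The second is a drop of roughly 8.8 percent from the initial figure and falls below the original lower bound.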
The FBI writes in the methodology section of its 2022 and 2023 Crime in the United States reports that: “Because SRS data was not directly collected by the FBI, in order to compile estimates for 2021, the FBI has since gathered a sampling of 2021 SRS data to augment the information collected via NIBRS and compile reliable estimates.”
“Gathered a sampling” and “compile” are somewhat vague terms, though, and a deeper dive shows that more data is not inherently better than just sticking with the initial estimate in this specific situation.
Coverage of 2021 has improved from 65 percent nationally when the 2021 estimates were initially published by FBI and BJS to about 74 percent today. But more cities doesn't mean the estimates are better.
In California, only San Diego County reported anything to the FBI under NIBRS in 2021 while the state’s coverage of 2021 has grown to around 11 million people today (nearly 30 percent of the state’s population) as some cities have added 2021 data over the last two years.
Critically, however, California’s heaviest crime cities like Los Angeles and Oakland have still not reported 2021 data to the FBI. So to impute the Los Angeles count of violent crimes for use in a national estimate you would find the combined violent crime rate of the five cities in California in LA’s population group (250k+) that have reported (San Diego, Fresno, Anaheim, Stockton, and Chula Vista) then multiply by LA’s population (divided by 100k).
This is fine for estimating missing data if you have reporting from nearly all of a state’s population, but only a quarter of the population that lives in Californian cities of 250,000 or more would be determining the rate for the other 75 percent. And the cities that have reported tend to have lower violent crime rates than the rest of California in every population group (things get weird with the cities under 10k group so they aren’t included below).
The actual rate in 2021, based on offenses reported separately by the California DOJ, would be substantially higher for each population group than the rate you’d get in 2021 using the FBI’s stated imputation methodology.
So the imputed violent crime rate for a California city of 150,000 would assume its rate was 255.5 per 100k while its actual violent crime rate, on average, would be much higher. The same problem would apply to other large states like New York (currently at 22 percent participation for 2021), Florida (also 22 percent) and Pennsylvania (49 percent).
Of course we don’t know how many agencies were sampled or when (and whether sampling means applying all SRS agencies that belatedly reported for 2021 or just some). That limits our ability to know the degree to which the standard imputing methodology would systemically hamper any new 2021 estimates and whether different sampling was applied to 2021’s estimates in the 2022 and 2023 reports to produce substantial declines from the initial BJS and FBI estimates.
What we can assess pretty clearly is that the 2021 estimate for violent crime is almost certainly too low. About 920,000 violent crimes for 2021 have been reported to the FBI from agencies covering about 245 million people as of the 2023 report. Chicago reported only 6 months in 2021 so I doubled that agency’s total as the FBI methodology spells out. I didn’t do the same for other partial reporters, but it's sufficient to know that a bunch are out there making the reported total almost certainly an undercount of what the FBI would estimate for those agencies by several thousand violent crimes. Violent crime was up 0.5 percent from 2020 to 2021 in the agencies that followed the rules and reported.
I went out and grabbed 2021 violent crime counts from the state UCR programs for 35 mostly large agencies that made it easy to grab (I would’ve done more but didn’t have the time). Those agencies, which are reproduced below for the sake of transparency, accounted for about 176,000 violent crimes from around 30 million people. Violent crime was up 7 percent from 2020 to 2021 in these agencies which makes sense given how much New York and Los Angeles would dominate this subset (and violent crime rose in both places in 2021).
That leaves about 8,000ish agencies covering about 60 million people left to fill in and about 1.1 million violent crimes already recorded. In order to reach the most recent 2021 estimate supplied by the FBI then those agencies would need to account for a little less than 100,000 violent crimes. In 2020, however, those same 8,000 agencies reported more than 150,000 violent crimes, so violent crime would have to have fallen by 34 percent in this subset of agencies to match the FBI’s 2021 estimate.
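The back-of-the-envelope math above can be reproduced as follows. The inputs are the rounded figures quoted in the text (and 150,000 is a floor, since those agencies reported "more than" that in 2020), so the result only approximates the 34 percent figure rather than matching it exactly.

```python
reported_to_fbi = 920_000      # approx., agencies covering about 245 million people
state_ucr_sample = 176_000     # approx., the 35 agencies pulled from state UCR programs
fbi_2021_estimate = 1_197_930  # 2021 estimate as of the 2023 report
remaining_2020 = 150_000       # 2020 count for the remaining agencies (a floor)

# Violent crimes already accounted for in 2021.
accounted_for = reported_to_fbi + state_ucr_sample

# What the remaining agencies would need to contribute to hit the FBI's number.
needed_from_remaining = fbi_2021_estimate - accounted_for

# Implied 2020-to-2021 change in those agencies, in percent.
implied_change = (needed_from_remaining - remaining_2020) / remaining_2020 * 100

print(f"Accounted for: {accounted_for:,}")
print(f"Needed from remaining agencies: {needed_from_remaining:,}")
print(f"Implied change in remaining agencies: {implied_change:.1f}%")
```

With these rounded inputs the implied decline comes out around 32 percent. Any 2020 count above 150,000 for the remaining agencies pushes the implied decline higher.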
It seems reasonable to assume that violent crime in the remaining 8,000 agencies largely matched the change in the places where we do have data, suggesting reported violent crime was up a bit in 2021 relative to 2020.
Yet the revised 2021 estimate currently shows a very large decline from 2020 to 2021 followed by a slight increase in 2022. Robert VerBruggen laid out nicely how the violent crime rate has changed through the revisions, showing how it primarily reflects changes to 2021. That zig zag almost certainly didn't happen, though it's impossible to know exactly how many violent crimes the current estimate is undercounting because of the vagueness of the methodology and an inability to completely recreate those estimates.
Fortunately, the year is 2024 and the problems that plagued 2021’s estimates are not an issue for our understanding of crime trends in 2022, 2023 and 2024. The 2021 crime estimates are about as relevant as an argument about whether the Yankees were any good in 2021 (doesn't matter, they're in the World Series now).
Normal levels of participation in 2020, 2022 and 2023 help paint a story of US crime trends that doesn’t rely on the faulty 2021 data. The 2021 estimates were deeply flawed and should largely be ignored. That was true in 2022, it was true in 2023, and it’s true in 2024.
It would be great if those estimates were better, and it’d be great if we had more insight into how the revised 2021 estimates were derived over the last two years. Somehow, agencies covering an additional 9 percent of the country’s population reported 2021 data to the FBI (none of which was NYPD or LAPD), and that decreased the violent crime estimate for 2021 from over 1.3 million (with a healthy 95 percent confidence interval) two years ago to under 1.2 million (outside of said confidence interval) now.
It’s certainly valid to want to understand why this change occurred and to hope for better transparency to help communicate the issue. A discussion on methodological transparency to better establish if/why the standard methodology was used for 2021 would be great. It's frustrating! Welcome to the world of analyzing crime data from the outside!
The exact 2021 national crime rates are simply not knowable with any confidence. But the issue only impacts 2021, we were all supposed to ignore 2021’s estimates anyway, it is now 2024, and we have a multitude of sources helping to articulate last year and this year’s crime trends which are fortunately very clear.
To paraphrase Hanlon's razor, never ascribe to malice that which can be adequately explained by poor crime data.
There's no conspiracy, there's no attempt to deceive, there are not unprecedented stealth changes being suddenly made, the FBI didn’t suddenly “find” a ton of crime. There's simply a methodology that’s poorly suited to lots of uncertainty being unclearly applied to a flawed year of crime data producing flawed, frustrating, uncertain results for that year and that year alone.
Hi Jeff,
Thought you might find our recent article on the FBI "stealth edits" hiding a "crime surge," and how this now viral narrative originated from a single source: the discredited John Lott: https://armedwithreason.substack.com/p/the-weaponization-of-crime-data-disinformation
Best,
Devin
Hi Jeff: Sorry for asking questions you probably answered, but what percentage of law enforcement agencies are submitting full year crime statistics through the NIBRS for 2023-2024? How confident are you that their estimates (for those not reporting full year data) are accurate? Do they consider unreported crimes in their estimates (per BJS data on non-reporting)?
There seems to be a huge difference (depending on the source) on the number of crimes recorded via NIBRS compared to the previous SRS. Doesn't it seem inevitable that crime numbers should rise if most law enforcement agencies are now reporting a greater number of crimes via full year data via the NIBRS?
I apologize for asking so many questions but various sources seem to provide conflicting information.
Thanks, Len.