I really like baseball. I find the game calming to watch, with a healthy dose of random chaos sprinkled in for fun every once in a while.
Also there is data. Lots and lots of data.
Baseball has undergone a data revolution in the 21st century, starting with Michael Lewis’ seminal book Moneyball and continuing into what is now sometimes called the Statcast Era (the Athletic piece linked here is an excellent history lesson).
The idea of Statcast was born over a decade ago, spawned from a system that set up a bunch of cameras in every MLB stadium to measure what happens on every pitch. Statcast, which was introduced into every MLB ballpark in 2015, took things a step further by measuring everything that happens to the ball after it is hit as well as every movement of the players on the field.
All of this new knowledge has revolutionized baseball decision-making, but it only came about after a substantial dedication of thought, research, planning, and funding.
Crime data will never reach that level of speed and precision, but the US is in need of a similar direction to better measure, understand, and respond to crime trends.
To be clear, the data collection infrastructure needs of national crime data are substantially more complex than those of MLB. And while a camera and radar can be used to capture the movements of a ball and players (vastly oversimplifying), the reporting of crime data requires an error-prone human's touch with only limited opportunities for automation.
But the infrastructure for reporting crime data could use major advancements, in the spirit of Statcast, which would go a long way. This was the major theme of a recent report from the Council on Criminal Justice's Crime Trends Working Group (full disclosure: I was on the working group).
Originally headed by the wonderful Rick Rosenfeld until his passing (it is hard to overstate just what a giant in the field Rick was and how impactful his insights on all things crime data have been to me and many others), the committee set out to make recommendations for improving the country's crime data infrastructure (John Roman, who is also great, took over for Rick).
You can read the report that came out a few weeks ago here, but I wanted to highlight three standout ideas from the report, specifically as they relate to the feasibility of a Statcast Era for crime data.
Timeliness
Tons of information is available about an MLB at bat almost immediately after the pitch is thrown. Crime data for 2024, by contrast, won’t be formally released by the FBI until October 2025. There are workarounds, such as the Real-Time Crime Index that we’re hopefully launching soon, but anything resembling a Statcast Era for crime data would require faster, more reliable data reporting from more agencies to better understand crime trends as they develop.
Some states do have the infrastructure in place to report data quickly, but others do not. The issue was summed up beautifully in an email I got from someone with a state UCR program explaining why monthly crime data with a short (45-day) lag isn’t possible. They wrote that their state:
"is not in a position at this time to be able to provide data on a monthly basis. We are working towards quarterly publishing (a quarter behind schedule) but the way our system currently works, agencies only submit once per month (the target is by the middle of the following month). I know some states are entirely on XML and receive incidents in close-to real time, but (the state) doesn’t have that type of submission structure in place. After an agency submits their flat monthly file, it processes in our system and then they can see any errors, correct them in their system, and submit the corrections in the following month. For example, a January file would be submitted (at best) by mid-February, and then the errors and additions that need to be made to January wouldn’t get sent until February’s file (at best, submitted by mid-March). I would have major concerns about the accuracy and completeness of data if we were to provide it monthly with only a one-month lag. The timeline of that example is the absolute best-case scenario in terms of agencies submitting by the middle of the following month."
There are two ways for agencies to submit data up the chain, and Record Management System provider CivicEye sums up the differences between flat file and XML data submissions. They wrote:
Another challenge of NIBRS reporting is the use of two different formats: flat file and IEPD/XML. The flat file format is an older version that is being phased out by the newer IEPD/XML format. This is because the XML format provides more data and a better, more flexible, and more organized format compared to the Flat File format which is just rows of letters and numbers.
However, when states decide to upgrade from Flat File to XML, agencies are required to comply and rush to make the transition. While the cost of upgrading from Flat File to XML is not as high as upgrading from SRS to NIBRS, it still requires months of development work for most RMS (Record Management System) providers.
More, faster data reporting would require more states to invest in data reporting infrastructure that mimics that of the states that report quickly. Transitioning all states to XML submission standards would be a critical step in this endeavor. These are not easy investments for states to make and would require significant funding and guidance from the Federal government to implement.
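To make the flat file versus XML contrast concrete, here is a minimal Python sketch. The field positions, element names, and codes are invented for illustration and are not the real NIBRS flat-file segment layout or the NIEM/IEPD schema; the point is simply that a flat file is positional rows of characters whose meaning lives in documentation, while XML is self-describing and can be validated against a schema before a submission is accepted.

```python
# Illustrative only: the field positions and XML element names below are
# invented and do NOT reflect the real NIBRS flat-file layout or NIEM/IEPD schema.
import xml.etree.ElementTree as ET

# A flat-file submission is essentially a fixed-width string of letters and
# numbers; the meaning of each character depends entirely on its position.
flat_record = "02MA0010100 2024011513A"  # a hypothetical incident segment

incident = {
    "segment":       flat_record[0:2],    # which segment type this row is
    "ori":           flat_record[2:11],   # originating agency identifier
    "incident_date": flat_record[12:20],  # YYYYMMDD
    "offense_code":  flat_record[20:23],  # offense code
}

# The same information as XML is self-describing: elements are named, nested,
# and can be checked against a schema before the state system accepts them.
root = ET.Element("IncidentReport")
ET.SubElement(root, "Agency", attrib={"ori": incident["ori"]})
ET.SubElement(root, "IncidentDate").text = incident["incident_date"]
ET.SubElement(root, "Offense", attrib={"code": incident["offense_code"]})

print(incident)
print(ET.tostring(root, encoding="unicode"))
```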
Accuracy
The second finding from the CCJ report concerns the accuracy of crime data and the challenge of quality control. There are all sorts of examples in this newsletter of inaccurate crime data being reported at every stage of the reporting process.
The accuracy challenges get even harder with the NIBRS transition. Agencies must capture substantially more data points for every offense, which means officers need training to ensure proper documentation. The CCJ report says:
Other law enforcement officials shared concerns about the quality of data submitted by patrol officers and the difficulty of securing staff and resources for quality assurance and data infrastructure at a time when department leaders and elected officials were eager to put more officers on the street. Data gleaned from focus groups and a survey conducted with state UCR program managers indicated that the overwhelming majority of programs lacked the staff or resources to provide much quality control assistance to local law enforcement agencies.
Improving the data reporting infrastructure is fairly useless without also improving quality control. Again, this would require that more resources are provided to local agencies and state programs, even beyond what is currently available. Even the allocation of modest resources towards data improvements would go a long way in a lot of places.
In addition, this is an area that may be ripe for machine learning and AI to have an impact. But those tools bring their own challenges and limitations, and the last thing that crime data needs is tools that add even more uncertainty to an already uncertain field.
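To make that concrete without overselling AI, here is a minimal sketch of the simpler, rule-based end of the spectrum: flagging a monthly submission for human review when it departs sharply from the agency's own recent history. The counts, threshold, and workflow are invented for illustration and do not reflect any state program's actual quality-control process.

```python
# Illustrative sketch of an automated quality-control flag, not any agency's
# actual workflow; the monthly counts and threshold here are made up.
from statistics import mean, stdev

def flag_suspect_months(monthly_counts, z_threshold=3.0):
    """Flag months whose reported count deviates sharply from the agency's
    own recent history -- candidates for human review, not auto-correction."""
    flags = []
    for i, count in enumerate(monthly_counts):
        history = monthly_counts[:i]
        if len(history) < 6:                 # not enough history to judge
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue
        z = (count - mu) / sigma
        if abs(z) > z_threshold or count == 0:   # sudden spike, drop, or gap
            flags.append((i, count, round(z, 1)))
    return flags

# A hypothetical agency that skipped a month, then dumped a backlog.
counts = [41, 38, 45, 40, 39, 44, 42, 0, 85, 43]
print(flag_suspect_months(counts))   # flags month 7 (zero) and month 8 (spike)
```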
Completeness
The final part of the report that I’d like to call out concerns the lack of completeness with respect to NIBRS compliance. According to BJS, 82 percent of the country’s population is covered by a NIBRS-reporting agency as of May 2024, up substantially from around 36 percent in 2015. CCJ reports that:
States that successfully transitioned to NIBRS, in general, provided adequate funding to their state UCR programs and used federal grants to provide local agencies with technical assistance and innovative tools such as a low- or no-cost records management system. Many of the states that transitioned to NIBRS also enacted statutes that required local law enforcement agencies to submit incident-level data on a timely basis. States that had been less successful in the transition to NIBRS typically underfunded state UCR programs, failed to help local law enforcement agencies navigate a sometimes confusing software vendor landscape, and lacked statutes mandating NIBRS reporting.
Some states have had tons of success transitioning to NIBRS (Texas is at 99 percent coverage), while others like Florida and Pennsylvania (42 and 43 percent, respectively) have not. Part of the issue going forward is that the NIBRS transition is increasingly becoming a small-agency problem.
Fixing NIBRS compliance to reach that last 18 percent of the US population is critical for a truly effective Statcast Era of crime data, but such a fix would require resources and effort. The report highlights the potential of workarounds, though. A carefully chosen sample of several hundred agencies can be enough to estimate national crime trends:
In 2012, BJS funded the creation of the National Crime Statistics Exchange (NCS-X) to explore whether it was possible to use NIBRS data to produce accurate, national estimates of reported crime that included incident-level details and characteristics. Working closely with the FBI, BJS determined that it was possible to produce accurate results on crime trends from a sample of 400 carefully selected jurisdictions. A joint effort was launched to help these 400 agencies transition to NIBRS with the ultimate goal of constructing a weighted sample that would allow for the generation of nationally representative estimates of crime trends.
So a weighted sample of a few hundred agencies works for understanding national crime trends! This is good to remember as we work towards more complete crime data reporting.
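For intuition on how a sample like that can stand in for the whole country, here is a toy sketch of design-weighted estimation in Python. The agencies, counts, and weights are made up, and the actual NCS-X/BJS estimation methodology is far more involved; the point is only that agencies sampled with known probabilities can be weighted up to a national total.

```python
# Toy illustration: each sampled agency's count is weighted up by the inverse
# of its selection probability. Agencies, counts, and weights are invented;
# this is not the actual BJS/NCS-X estimation methodology.
sampled_agencies = [
    # (agency, reported_offenses, sampling_weight = 1 / selection_probability)
    ("Big City PD",    12_400,  1.0),   # largest agencies sampled with certainty
    ("Mid-Size PD",     2_150,  8.0),   # each stands in for ~8 similar agencies
    ("Small Town PD",      90, 40.0),   # each stands in for ~40 similar agencies
    ("Rural Sheriff",      55, 40.0),
]

national_estimate = sum(count * weight for _, count, weight in sampled_agencies)
print(f"Estimated national total: {national_estimate:,.0f} offenses")
```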
To A Statcast Era
Crime data will never match the precision and promptness of baseball's Statcast system. Max Scherzer throws a fastball and we can measure everything about it immediately. James Wood hits a double and we can immediately know the odds that the left fielder should have caught it. But it may take an officer a while to write a report, or a few days of investigation before a suspicious death 911 call is classified as a murder. Crime data simply can't come anywhere near baseball's immediacy, but that doesn't mean we shouldn't strive for improvements in that direction.
Crime data can strive to be faster, more accurate and more complete, but the process would require substantially more resources than what it cost MLB to implement Statcast.
Unfortunately, crime data funding is wholly inadequate (as discussed at depth in the CCJ report), so achieving better results would require a much bigger commitment at the state and (especially) Federal level.
Some people would argue, though, that understanding why murder is rising or falling and whether crime reduction policies are working is significantly more important than evaluating the movement on Paul Skenes’ fastball or the exit velocity and launch angle of the game-winning homer that James Wood hit yesterday (not me, but some people with better priorities would).
Getting better, faster crime data is possible, but it won't happen without substantially more resources dedicated to the problem. The vision of a Statcast Era of crime data is a worthy goal, and hopefully the recent CCJ report outlines some of the hurdles and steps needed to head in that direction.
Are these recommendations something that can be easily passed over to the crime stat collectors in a big American city? For example, I’m in MA and Boston is having a world-beating year for murders (in a good way! Very few!) but there hasn’t been much about WHY this is happening. It could be good policing, it could be pure dumb luck, it could be that rising property values are pushing crime into lower-property-value suburbs.
Hi Jeff: From my article today on https://crimeinamerica.net:
"The Vast Majority Of Crime Is Not Reported To Law Enforcement
Per the Bureau of Justice Statistics of the US Department of Justice, 42 percent of violent crimes are reported to law enforcement. Thirty-two percent of property crimes are reported to the police.
Twenty-six million Americans were victimized by identity theft in 2016 and only seven percent of victims called the police according to the Bureau of Justice Statistics.
74 percent of violent victimizations against juveniles were not reported to the police per the Office of Juvenile Justice and Delinquency Prevention of the USDOJ.
4,000 police agencies did not participate in crime reporting to the FBI in 2023.
The effort to get law enforcement agencies to fully use the FBI’s new National Incident-Based Reporting System remains a problem.
Murder is down 26 percent in the FBI's preliminary (unofficial) first-quarter statistics for 2024, but that's a reflection of the fact that urban homicides increased by 50 percent in the cities measured (2019-2022) per the Major Cities Chiefs Association. Crime statistics cannot grow by those percentages without substantially declining in subsequent years regardless of interventions. It’s always been that way.
Cops are tired of making arrests and not seeing offenders prosecuted or held in jail pretrial, so they don't.
Arrests and crimes solved have plummeted during recent decades.
Thousands of police officers have left the job and sometimes there are long waits for an officer to arrive at a crime scene and the complainant gives up and disengages. There are multiple reports of cities having hundreds of police officers down from authorized levels. Two US Department of Justice agencies call the lack of cops in cities a crisis.
Crime-ridden Oakland has just 35 officers on patrol across the city at any given moment, the police department has admitted. The admission by the Oakland Police Department came after local news station KTVU asked them how it took cops 48 hours to respond to a July 4 shooting at an apartment complex during which residents reported hearing over 100 gunshots.
When responding to burglar alarms, smaller cities also commonly receive a faster response time, while medium cities “take an average of 40 minutes,” says Don Chon, a professor of criminology at Auburn University in Montgomery, Alabama. In larger cities it could take several hours for police to respond, if they respond at all.
What All This Means
First, this means the overwhelming majority of what we call crime is not reflected in FBI or local crime statistics. There are reasons for not reporting crimes or suspicious activities. That means you would have to have SUBSTANTIAL reductions or increases in crime for the data to be meaningful.
Second, if justice system employees feel that cities are being less than honest as to their crime statistics, some examples (like yours) provide credibility."
All of this doesn't begin to touch the huge increase in violent crime per the National Crime Victimization Survey, a wasted opportunity to investigate why the increase took place, because everyone seems intent on ignoring it.
Finally, we have significant reductions in police staffing. If a major city like Oakland, CA has 35 officers per shift, taking crime reports is not high on the list of priorities. Philadelphia is down over 1,000 officers. Cities want officers to be available for major breaking events, so recording of crimes takes a distant second place.
AI
Getting an accurate and prompt analysis of local crime will take a complex undertaking involving AI taking in all possible variables.
We would never make pronouncements in the medical world based on a fraction of the available data. Businesses invest billions in understanding as many metrics as possible. Why don't we????
It will take an AI understanding of crime survey data (including Gallup), unreported crimes, and police staffing to give us the metrics we need. But at the moment, we are struggling with a sole focus on REPORTED crime, which is a very small subset of total crime and is hampered by police staffing.
As a former police officer, I was told to respond quickly to crimes regardless of what had happened immediately before whenever there was a significant need for me to be somewhere else. Superiors didn't care about the status of my paperwork; they wanted me on the scene of a crime or major accident in progress.
I'm convinced that there will never be a truly accurate account of local or national crimes. Reported crime may be instructive and give us indications but no serious medical researcher would tell us that there has been a five percent increase or decrease in what they are studying without including endless competing variables that sway the data one way or another.
I just read a story on mammograms being insufficient for detecting many forms of breast cancer and the need to switch to MRIs. Yet we continue to push mammograms. A more complete understanding of the data was necessary to move us onto the right path.
AI should be able to analyze all of the independent variables and give us a far more precise understanding of crime. We want cops to complete more reports, and NIBRS requires more attention to detail, which is wonderful "if" police officers weren't madly running from call to call.
BJS has the data they need to concurrently analyze crime for SMSAs.
So the FBI reports 11,000 hate crimes and the BJS reports an average of 250,000. We make pronouncements on hate crimes based on the 11,000 which strikes me as potentially and ethically wrong.
Only AI can analyze all forms of crime and crime reporting and create a somewhat accurate picture of crime both locally and nationally. Yes, it would take a massive undertaking and the funding necessary.
But for the moment, telling me that crime has increased or decreased five percent, while instructive, is open to so many errors that for the sake of policy, it tends to be of limited use.
My opinions. Best, Len.