On AI and Analyzing Crime Data
I’ve been thinking about AI a lot lately and figured it was worth a post on the subject.
To be clear, every single word in this post and every post I’ve ever written was written by me — including all the em dashes and typo’s. The subjects are entirely thought of and chosen by me, the analysis is completely and fully my own, the pieces are organized entirely by me, and I don’t rely on AI for any sort of analytic brainstorming.
This newsletter is free of AI, creatively and analytically speaking. The drawbacks of AI are numerous and I’m exceedingly wary of the tool replacing creative endeavors with soulless slop (not to mention the environmental impacts and numerous concerns surrounding data centers popping up everywhere). I find AI-produced videos to be way more on the creepy than cool side of the ledger.
AI produces work in all fields that is sometimes incorrect, and hallucinations are a terrifying proposition for a data analyst. It takes a level of expertise to understand and always be on the lookout for such mistakes. For me, it is neither an all-encompassing magic machine nor an incompetent mistake factory.
Yet, at the same time, AI has been revolutionary for my analytic work process, especially over the last few months.
To see how I’ve used it, consider the case of Roodhouse, Illinois.
Roodhouse is a tiny town of 1,500 people in southwest Illinois that was founded in 1850 by John Roodhouse. It doesn’t have much crime and rarely reports anything to the FBI. Roodhouse reported 11 thefts, 7 aggravated assaults, 4 rapes, 1 motor vehicle theft, and no robberies or murders in 2018, which just so happens to be the most recent year with complete data reported to the FBI from that town.
A funny thing happened in 2017, though, when Roodhouse reported an astounding 486 murders to the FBI. At least that’s what Jacob Kaplan’s excellent website shows. The thing is, though, that neither the FBI’s old UCR website nor the Crime Data Explorer shows 486 murders in Roodhouse in 2017. What’s more, the state of Illinois records no data for Roodhouse in its 2017 report on Illinois crime.
So, was this an error in Jacob Kaplan’s data? Or did the FBI collect something that didn’t make it into the official reporting?
Enter my good friend Claude.
To unravel the mystery of Roodhouse, I grabbed the FBI’s Return A master file for 2017 from the CDE’s download section. I’m not a coder and the FBI warns on the CDE that “Master files for each collection are fixed-length, ASCII text format, compressed using WinZip software, and require some programming knowledge to extract the data.” To overcome this problem, I fed Claude the FBI’s help file, a scanned-in PDF from 1990 that explains how to properly code each column of the dataset.
As I said, I’m not a coder, but Claude is, and Claude was able to easily decipher the 15-page help file to produce a very usable CSV. From there, I filtered down to Roodhouse and saw very clearly that 486 murders were reported in Roodhouse in November 2017.
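For readers curious what converting a fixed-length file involves, here is a minimal sketch. It is not the code Claude produced, and the column positions in `LAYOUT` are hypothetical; the real offsets come from the FBI's help file. The sample records are made up in that hypothetical layout, apart from the 486 figure for November discussed above.

```python
import csv
import io

# Hypothetical layout: (field name, start, end) as 0-based slice offsets.
# These positions are invented for illustration; the real ones come from
# the FBI's help file for the Return A master file.
LAYOUT = [
    ("state", 0, 2),
    ("agency", 2, 26),
    ("month", 26, 28),
    ("murders", 28, 33),
]

def parse_line(line):
    """Slice one fixed-length record into a dict, stripping padding."""
    row = {name: line[start:end].strip() for name, start, end in LAYOUT}
    row["month"] = int(row["month"])
    row["murders"] = int(row["murders"])
    return row

def to_csv(lines):
    """Convert fixed-length records to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[name for name, _, _ in LAYOUT])
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_line(line))
    return out.getvalue()

# Two made-up records in the hypothetical layout above.
SAMPLE = [
    "IL" + "ROODHOUSE".ljust(24) + "10" + "00000",
    "IL" + "ROODHOUSE".ljust(24) + "11" + "00486",
]

rows = [parse_line(line) for line in SAMPLE]
suspicious = [r for r in rows if r["agency"] == "ROODHOUSE" and r["murders"] > 0]
print(to_csv(SAMPLE))
print(suspicious)
```

The point of the layout table is that the whole job reduces to slicing each line at known offsets, which is why a help file describing those offsets is the key input.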
Obviously Jacob Kaplan’s dataset isn’t responsible for the error and I feel foolish for even considering the idea it might have been. Somewhere along the way there were 486 murders reported to the FBI for Roodhouse for November 2017, though that total doesn’t show up in any of the more easily accessible FBI repositories for unknown reasons. Whether it was audited out or included in the national estimates I can’t say for sure, though my guess is that it’s the latter.
Either way, this datapoint is a mystery that proves again the imperfection of crime data, and it’s a mystery that was solved for me quickly and easily thanks in no small part to AI.
Later, I fed Claude the CDE’s help file for parsing the Return A file and other CDE files, so now I can just ask for any ASCII file on the Crime Data Explorer and it’ll convert it to an easy-to-use CSV.
I’ve used AI in a myriad of other ways that make my analysis more efficient and effective. The shooting tracker dashboard I was talking about a few weeks ago was built by Claude. It goes out and grabs daily, weekly, and monthly shooting data from 30 agencies, making it easy to evaluate shooting trends nationwide.
Shooting data is really poorly kept, so this dashboard is a way of attempting to make it more available and easier to parse. The tracker calls for more than two dozen scrapers that get each agency’s data and can tell when datasets are missing or there are other errors.
Chicago publishes a daily table of victim-level shooting data, which is easy to scrape and count. Detroit publishes an aggregated total of non-fatal shooting victims every week in a PDF that usually has a predictable naming convention. Buffalo’s data comes from GIVE’s Tableau dashboard, which the scraper has to manipulate in order to grab agency-level data.
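One way a tracker might tell when a dataset is missing or has stopped updating is a simple freshness check per agency. The sketch below is an illustration under stated assumptions, not the dashboard's actual code: the `MAX_AGE` thresholds, the `check_feed` helper, Buffalo's cadence, and the scrape results are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical maximum age for each publishing cadence before a feed is
# considered stale. The thresholds are assumptions chosen for illustration.
MAX_AGE = {
    "daily": timedelta(days=3),
    "weekly": timedelta(days=10),
    "monthly": timedelta(days=45),
}

def check_feed(agency, cadence, latest, today):
    """Return a status string for one agency's most recent data point.

    `latest` is the newest date found by the scraper, or None if the
    scrape returned nothing at all.
    """
    if latest is None:
        return f"{agency}: MISSING (scraper returned no data)"
    age = today - latest
    if age > MAX_AGE[cadence]:
        return f"{agency}: STALE ({age.days} days old)"
    return f"{agency}: OK"

today = date(2026, 1, 15)
# Made-up scrape results, not real agency data.
results = [
    ("Chicago", "daily", date(2026, 1, 14)),
    ("Detroit", "weekly", date(2025, 12, 28)),
    ("Buffalo", "monthly", None),
]
report = [check_feed(a, c, latest, today) for a, c, latest in results]
for line in report:
    print(line)
```

Tying the threshold to each agency's publishing cadence keeps a weekly PDF from being flagged just because it is a few days old.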
I am not skilled enough to build these scrapers on my own, but Claude and I did it together in a few days.
AI is also immensely valuable for the back end of the Real-Time Crime Index. What used to take days can now be done in a few seconds as we scrape data from hundreds of disparate data sources, compile it all together, and produce a report that can be manually audited to identify and remove outliers and data anomalies.
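To give a sense of how a report can surface candidates for manual audit, here is a minimal sketch that flags monthly counts far above an agency's typical level. The rule and its `ratio` and `floor` thresholds are arbitrary assumptions for illustration, not the Real-Time Crime Index's actual method, and the monthly series is made up apart from the 486 spike modeled on the Roodhouse example.

```python
from statistics import median

def flag_outliers(counts, ratio=5, floor=10):
    """Return indexes of monthly counts that look anomalous.

    A month is flagged when it exceeds both `floor` and `ratio` times the
    series median. Both thresholds are arbitrary assumptions for this sketch.
    """
    typical = median(counts)
    limit = max(floor, ratio * typical)
    return [i for i, c in enumerate(counts) if c > limit]

# Made-up monthly series for a small agency, with one implausible spike
# modeled on the Roodhouse example (486 in November, index 10).
monthly = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 486, 0]
flagged = flag_outliers(monthly)
print(flagged)  # prints [10]
```

A check like this only nominates months for review; deciding whether a flagged value is an error or a real spike still takes someone who knows the data.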
The AI makes mistakes, of course. I almost certainly don’t need to tell you this, but the AI makes mistakes. The code is not perfect, but I’m also not shipping out software to be used by millions of people, so it doesn’t really need to be for my purposes. The data from scrapers do need to be as close to perfect as possible, which means a lot of back and forth to get there.
My favorite error that Claude made was when I built an app for fun to evaluate residential real estate trends based on data from Realtor.com and Zillow.com.
At first there was a list of each state on the main page that a user could select, but I asked Claude to make that a clickable map. Claude decided to draw the states from memory rather than using an actual pre-existing map of the United States (of which there are a few). It did about as well as I would have drawing each state from memory (not complimentary).
I’ve also used AI on a myriad of other projects to make the analytic process more efficient (including one project that I’m not quite ready to announce yet but will in the next few weeks).
In my opinion, the AI isn’t super useful as a way to replace my expertise, but it is super useful because I have my subject matter expertise. I know what crime data tasks are repetitive and ripe for automation, I know what kind of data can produce impactful graphics if it can be reached effectively, and — arguably most importantly — I know the data well enough to be able to identify and correct mistakes when the AI makes them.
This last point is critical to the responsible use of this technology to improve my analytic workflow. I know how NIBRS data is structured, when a drop in crime may just be due to systemic underreporting, and why you can’t just use aggravated assaults with a firearm as an apples-to-apples substitute for shooting victims.
I know how to write about and analyze datasets that are frequently flawed. I don’t need Claude to do that. It is a skill built on years of hard work, ugly graphics, and draft analyses that didn’t go anywhere, all of which taught me what is and is not analytically useful.
But AI can help with quickly and easily making exceedingly complicated datasets accessible in ways that I simply cannot do. It can easily build good looking visualizations that make communicating that data much easier. The code doesn't need to be flawless if the tools work, can be successfully audited thanks to my expertise, and produce enormous efficiency gains for my process.
Ben Casselman of The New York Times had a great piece recently about the increasing ubiquity of AI among economists, and his thread on the piece on Bluesky had this nugget that I found resonated with my use of AI:
AI is a useful tool for my everyday work, not unlike Microsoft Excel, Gmail, and Slack. It is not some deity-level work thing that will replace my expertise, but rather I see it as a strong complement to my expertise. Nothing I work with in the world of crime data is perfect, but having a tool that dramatically improves my efficiency as an analyst and data visualizer is incredibly helpful and that’s the itch that AI scratches for me.
New on the Jeff-alytics Podcast
Ken Dilanian is a seasoned journalist covering the Justice Department and FBI, and in this episode he shares his insights on the evolving landscape of covering those agencies. We talk about crime data, the challenges of media coverage in 2026, the impact of political shifts on justice institutions, and a whole lot more in this jam-packed conversation.
There are few journalists in the country with a better front row seat to the Justice Department in 2026 and Ken paints a fascinating picture of what it takes to cover it.
And while you’re here, be sure to check out these other recent great episodes:
New Orleans Mayor Helena Moreno
Researcher and Former Crime Analyst Carlee Ruiz
Council on Criminal Justice President Adam Gelb
Baltimore Mayor Brandon Scott

I find some of the biggest AI-haters have simply not updated their views of AI since it first came out. Claude-pro is way better than ChatGPT free, and a whole world away from the free versions of a few years back. And the tool-apps that are built on top of that (Claude-code, for example) are really game-changing.
That's not to deny that there are downsides and risks, but people saying "it can't do X or hallucinations mean it can't do Y" are often just out-of-date or letting their hate of data centers or something else get in their way of assessing the reality of what AI can currently do.