I'm Building A Crime Data Assistant

Give it a look.

Jun 01, 2026

So, I’m building a new thing and you’re invited to test it out, if you want. It’s a chat bot designed to answer crime data questions as well as it can, and it’s available at NIBRSAI.com (for the time being). I’m calling it the Judicious Explorer of Federal Files, or J.E.F.F. (kidding! Please don’t call it that).

A crime data chat bot is something we on the Real-Time Crime Index team have been kicking around for the better part of the last year, with the goal of making crime data accessible through natural-language queries. It has been tested by an ever-growing circle of friends, confidantes, and LinkedIn followers, and now I want to share it with you, my dutiful Substack readers.

The assistant was built with Claude and uses Claude Haiku to generate responses. It has been a labor of love and is still very much a work in progress, but it handles most queries exceptionally well.

Let’s say you want to compare the number of murders per year in Los Angeles with New York City’s murder count since 1980. Normally, that requires grabbing multiple spreadsheets and lining them up yourself. Now you can type ‘LAPD vs NYPD murders since 1980’ and get exactly that data in both chart and graph form.

I think it’s a really cool and natural way to consume crime data, but it needs users to help calibrate and polish it.

Want to know how thefts are trending in Omaha right now? Just ask. Current crime trend data for every agency in the Real-Time Crime Index is discoverable through the assistant. For Omaha, it should return monthly and rolling 12-month theft counts through March 2026.
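For anyone who hasn’t worked with rolling counts: a rolling 12-month figure adds up the most recent 12 months, which smooths out seasonal swings. Here’s a minimal Python sketch of the idea (not the assistant’s actual code). The function name and the monthly numbers are made up for illustration.

```python
def rolling_12_month(monthly_counts):
    """Return the trailing 12-month sum for each month.

    Months without a full year of history yet get None.
    """
    result = []
    for i in range(len(monthly_counts)):
        if i < 11:
            result.append(None)  # fewer than 12 months available so far
        else:
            result.append(sum(monthly_counts[i - 11 : i + 1]))
    return result


# Hypothetical monthly theft counts (13 months), for illustration only.
monthly_thefts = [310, 295, 330, 342, 360, 371, 388, 379, 350, 333, 318, 305, 299]
print(rolling_12_month(monthly_thefts)[-2:])  # → [4081, 4070]
```

Each new month adds its count and drops the month from a year earlier, so a single unusual month moves the rolling figure far less than it moves the monthly one.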

The assistant also has historical crime, clearance, and staffing data, some of which goes all the way back to 1930. Want to know how many NYPD officers there were in the 1950s? Just ask.

We worked with Jacob Kaplan to scrape all of the FBI’s old PDFs of pre-1960 crime data to create the most complete collection of historic crime data available anywhere (a big announcement on this is coming in a week or two!). Agency-level crime, clearance, and staffing data is searchable through 2025, using raw FBI data over that span. There is also national-level data on crime and clearances from 1960 through 2024, which will be updated through 2025 when the FBI formally publishes that data in a few months.

That’s not all though!

I’ve worked to teach NIBRS to the chat bot. This involved training it on the NIBRS manual and lots of discussion about the quirks of the program’s reporting. The whole project actually started as an attempt to build a chat bot that made it easy to query how many carjackings are being reported. Carjacking hasn’t historically been reported as its own category in FBI data, but you can calculate carjackings in NIBRS by counting robbery incidents in which a vehicle was stolen.
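Here’s a rough Python sketch of that carjacking logic. It is not the assistant’s actual query, and it runs on simplified, hypothetical record types. Real NIBRS data keeps offenses and property in separate segments tied together by incident, which is what the two record types loosely mimic. The specific codes (120 for robbery, loss type 7 for stolen, and the set of motor-vehicle property descriptions) are my assumptions and should be checked against the NIBRS manual.

```python
from dataclasses import dataclass

# Assumed NIBRS codes; verify against the NIBRS manual before relying on them.
ROBBERY_CODE = "120"  # assumed offense code for robbery
STOLEN = 7  # assumed property loss type for "stolen"
MOTOR_VEHICLES = {"03", "05", "24", "28", "37"}  # assumed motor-vehicle property descriptions


@dataclass
class Offense:
    incident_id: str  # hypothetical simplified incident key
    offense_code: str


@dataclass
class Property:
    incident_id: str
    loss_type: int
    description: str


def count_carjackings(offenses, properties):
    """Count distinct incidents that include both a robbery and a stolen motor vehicle."""
    robberies = {o.incident_id for o in offenses if o.offense_code == ROBBERY_CODE}
    stolen_vehicles = {
        p.incident_id
        for p in properties
        if p.loss_type == STOLEN and p.description in MOTOR_VEHICLES
    }
    # Incidents appearing in both sets are treated as carjackings.
    return len(robberies & stolen_vehicles)


# Hypothetical records: A is a robbery with a stolen vehicle, B is a robbery
# with only non-vehicle property taken, C has a stolen vehicle but no robbery.
offenses = [Offense("A", "120"), Offense("B", "120"), Offense("C", "240")]
properties = [Property("A", 7, "03"), Property("B", 7, "20"), Property("C", 7, "03")]
print(count_carjackings(offenses, properties))  # → 1
```

Counting distinct incident IDs, rather than property records, keeps an incident with two stolen vehicles from being counted as two carjackings.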

There are more than a billion rows of NIBRS master file data dating back to 1991, and it’s all there for the querying. Of course, you have to be careful when using NIBRS data. NIBRS participation didn’t become widespread until after 2021, so you can’t always compare totals over time, and the master files are not necessarily 100 percent complete (more on that in a sec).

That said, if you want to know which big Texas cities had the highest rate of carjackings in 2025, just ask!

The final piece of data available in the chat bot is the Supplementary Homicide Reports (SHR), which stretch back to 1976 and give detail on victims, offenders, weapons, circumstances, and more. Want to know which 10 cities had the highest share of 2025 murders committed with a firearm (minimum 30 murders)? Just ask!
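To show why that “minimum 30 murders” filter matters, here’s a minimal Python sketch of the ranking. It assumes you already have per-city counts of total murders and firearm murders; the input shape, function name, and numbers are all hypothetical, and this is not how the assistant actually queries the SHR.

```python
def top_firearm_share(city_counts, min_murders=30, top_n=10):
    """Rank cities by the share of murders committed with a firearm.

    city_counts maps a city name to (total_murders, firearm_murders).
    Cities below min_murders are dropped so small counts don't dominate.
    """
    eligible = [
        (city, firearm / total)
        for city, (total, firearm) in city_counts.items()
        if total >= min_murders
    ]
    eligible.sort(key=lambda pair: pair[1], reverse=True)  # highest share first
    return eligible[:top_n]


# Hypothetical counts for illustration only.
counts = {"City A": (40, 36), "City B": (25, 25), "City C": (100, 70)}
for city, share in top_firearm_share(counts):
    print(f"{city}: {share:.0%}")  # City B is excluded (fewer than 30 murders)
```

Without the threshold, City B would top the list at 100 percent on just 25 murders, which says more about small numbers than about firearms.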

What about deeper analysis? Maybe you want to know why murder is falling in the United States. You can ask, but the bot will give you an appropriately noncommittal answer like the one below. The bot is designed to retrieve data, and it is explicitly designed not to be an analyst. It is not there to tell you why trends are happening or to extrapolate what they mean. I trust AI to retrieve data, even from complex datasets, but I don’t trust AI to explain what it all means.

I also don’t want it riffing on what places are becoming “safer” or “more dangerous”, loaded terms with unclear definitions.

It can also only tell you what is available, and sometimes there is simply no data because an agency failed to report proper data to the FBI. This is especially true pre-1960, but there are also large gaps in crime data in recent years. There is a good deal of missingness in the NIBRS data, both because most agencies only joined the program in the last few years and because not every offense gets added to the master files that populate the bot.

An agency may report 42 murders in a given year, and the NIBRS master files may have records for all 42 or only for 38 of them. That’s just a fact of life with crime data and not something that can be fixed. Usually these differences are small, but sometimes they’re not. The SHR is another dataset that has been flawed at times, largely because Florida and Alabama haven’t always reported to the program.

And, finally, the bot can only give data on the subjects it has access to. You’re welcome to ask it about prosecution data or 911 data or immigration enforcement data or fantasy football data, but none of that is in the Uniform Crime Reporting program’s data and none of it is in the crime data assistant.

I’ve tried to use my knowledge of crime data to make it as close to perfect as I can, but the assistant is not perfect. In fact, that’s why I want you using it. It does really well most of the time, but I want to find the edge cases that need strengthening. I know how I want to use a tool like this (and have started to), but I don’t know how other potential users will want to use it.

It is not supposed to guess; it has been told to say it doesn’t know rather than guess or hallucinate an answer. There is always the chance, however, that hallucinations remain in the system, and I’d like to figure out what causes them so they can be eliminated as fully as possible.

The crime data assistant has been trained to query a database and return data. If it fails to find the proper data or is asked a question it can’t answer, it should say so. You may ask it difficult questions I didn’t foresee, which it can learn from. Hopefully the chat bot answers correctly, but any failures will inform an improved crime data assistant long into the future.

Can it handle traffic? Can it handle edge cases? Did I just waste a ton of time? I want to find out!

New on the Jeff-alytics Podcast

Most debates about crime policy are framed as a choice. You’re either tough on crime or you’re not. You focus on enforcement or prevention. And the answers tend to sound simple. But once you move from talking about crime to actually trying to reduce it, things get more complicated, requiring nuanced solutions to complex problems.

My latest guest is Neera Tanden, president and CEO of the Center for American Progress and a longtime policy advisor who has worked across multiple administrations, including serving as domestic policy advisor in the Biden White House.

Apple

Spotify

Amazon

You can also catch it on the Jeff-alytics YouTube page, where I’ll be posting episodes and video clips, so be sure to like and subscribe there if you’re so inclined!

And while you’re here, be sure to check out these other recent great episodes:

Senator Chris Murphy

Manhattan Institute Senior Editor Charles Fein Lehman

Civil rights attorney Jill Collin Jefferson

Law professor Rachel Harmon
