ANOTHER Cautionary Crime Data Tale
Friends don't let friends use NeighborhoodScout crime data.
A piece was recently published by a betting website purporting to show the “most dangerous NBA arenas.” There’s a data component and a survey component to this piece and both are worth a closer look. Taken individually each part is largely nonsense, but taken together the two parts are also largely nonsense.
There are numerous reasons to be extremely wary of any list like this.
The primary reason to be extraordinarily skeptical of a list like this is that the crime data comes from NeighborhoodScout and NeighborhoodScout’s data is made up. We know this because they tell us as much through how they get crime data.
Per NeighborhoodScout, “Our exclusive crime data are developed for each neighborhood using our mathematical algorithms and crime statistics from more than 18,000 local law enforcement agencies.”
Translated into English, that means they most likely take FBI UCR Part I data for every law enforcement agency nationwide and feed it into an algorithm that spits out crime rates based on various factors that likely influence crime levels. Or, in other words, it's a made-up guess.
The FBI’s data does not delve into any geographic level, so NeighborhoodScout has to take the citywide data and guess what the geographic figures might be. The neighborhood and zip code data they produce is not publicly available and there’s no easy way to point out what a terrible approach this is.
They also say that they “offer seamless national coverage” which, duh, anyone can download this data from the FBI, “and up to 98% accuracy” which is nonsensical and unprovable. What does that even mean?
Then there’s this from NeighborhoodScout’s FAQ:
“NeighborhoodScout's crime data are always the most recent 'Final, Non-Preliminary' data available as classified by the FBI. It is the most up-to-date and fully-vetted data that is available, with complete national coverage. We insist on using Final, Non-Preliminary data for our analyses and analytics rather than basing our research on preliminary data that may need to be updated or have errors in it.”
This sounds good and mature but really just highlights the holes in the company’s methodology by confirming that they are decidedly not using real data from police departments and are relying on the FBI’s Uniform Crime Report figures. And all of that is before we get to the enormous hole in NeighborhoodScout relying on 2021 crime data given how underreported NIBRS was that year.
You can probably guesstimate an area’s crime rate using a combination of citywide crime levels and poverty/education/employment factors. But absent actual data from a police department’s record management system, which is virtually impossible to obtain at scale, your crime rate guess is just that: a guess. It's certainly not precise even if the methodology is generally fine for showing where crime rates tend to be higher or lower. And using imprecise guesswork to rank 29 NBA arenas is not a great methodology.
One glaring problem with the piece in question is that assessing crime rates by zip code is a terrible way to evaluate crime. This is especially true when comparing crime rates by zip codes between vastly different cities in vastly different situations. Philadelphia’s NBA arena sits in a zip code with nearly 54,000 people according to the 2020 Census while Washington’s arena sits in a zip code over 30 times smaller. Population size alone explains a decent amount of the disparity in crime rates between these arenas.
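To make the population point concrete, here is a minimal sketch of how the same number of incidents produces wildly different per-1,000 rates depending on how many people live in the zip code. The incident count is purely hypothetical, 54,000 is a rounding of the "nearly 54,000" figure above, and the 1,800 population is an assumed stand-in derived from "over 30 times smaller" (54,000 / 30), not a Census figure.

```python
# Sketch: identical incident counts, very different per-1,000 rates.
# HYPOTHETICAL_INCIDENTS is made up for illustration only.
# The small-zip population is an assumption (54,000 / 30 = 1,800).

def rate_per_1000(incidents: int, population: int) -> float:
    """Return incidents per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents / population * 1000

HYPOTHETICAL_INCIDENTS = 500

populations = {
    "large zip (about 54,000 residents)": 54_000,
    "small zip (assumed 1,800 residents)": 1_800,
}

rates = {
    name: rate_per_1000(HYPOTHETICAL_INCIDENTS, pop)
    for name, pop in populations.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000")
```

With the same hypothetical 500 incidents, the small zip code's rate comes out 30 times higher than the large one's, purely because of the denominator.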
Zip code boundaries are also frequently arbitrary. Houston’s arena is literally split down the middle between two zip codes.
The New Orleans arena sits next to the interstate in a zip code with a largely residential population and a Home Depot on the other side of the interstate. Meanwhile the Superdome, which is attached to the arena by a walkway, is in the city’s Central Business District. Two very different populations live in the two zip codes with two very different types of crime, separated by an arbitrary line. Yet this methodology could be used to show that the Superdome is safer or less safe than the arena.
Taking the piece at face value I decided to test NeighborhoodScout’s figures using real world crime data. I used 2022 data from a handful of cities that publish crime data on their websites.
Milwaukee’s Fiserv Forum sits in the second smallest zip code by population and had a UCR Part I crime rate of 266 per 1,000 in 2022 according to MPD data compared to 60.5 in NeighborhoodScout. Detroit’s Little Caesars Arena had a crime rate of 75 per 1,000 in DPD’s data compared to 59 in NeighborhoodScout. And Houston’s Toyota Center had a crime rate of 139.3 per 1,000 in HPD data compared to 54.7 in NeighborhoodScout’s data.
Incident-level data by zip code is not available for most NBA arenas, but that quick glance has me questioning the 98 percent accuracy figure just a bit.
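The three comparisons above can be put side by side with a few lines of arithmetic. This is only a sketch using the rates quoted in this post. It does not test NeighborhoodScout's "98% accuracy" claim directly, since they do not say how accuracy is defined, and the years may not line up (2022 police data versus whatever year NeighborhoodScout used).

```python
# Rates per 1,000 residents as quoted in the text:
# (rate from published police data, rate from NeighborhoodScout)
comparisons = {
    "Milwaukee (Fiserv Forum)": (266.0, 60.5),
    "Detroit (Little Caesars Arena)": (75.0, 59.0),
    "Houston (Toyota Center)": (139.3, 54.7),
}

def compare(published: float, estimated: float) -> tuple[float, float]:
    """Return (ratio of published to estimated rate,
    percent by which the estimate falls below the published rate)."""
    ratio = published / estimated
    shortfall_pct = (published - estimated) / published * 100
    return ratio, shortfall_pct

results = {
    arena: compare(published, estimated)
    for arena, (published, estimated) in comparisons.items()
}

for arena, (ratio, shortfall_pct) in results.items():
    print(
        f"{arena}: police data is {ratio:.1f}x the estimate "
        f"(estimate is {shortfall_pct:.0f}% lower)"
    )
```

In all three cities the estimate comes in below the police data, by roughly a fifth in Detroit and by more than half in Milwaukee and Houston.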
There's also the fact that comparing NBA arenas by crime per 1,000 people within a zip code makes no sense on an analytic level either. There are 365 days in a year but an NBA team plays only 41 regular season home games (though the Lakers and Clippers share an arena). Then there are the playoffs, which mean extra games in many arenas (depressing myself as a Pelicans fan here).
Plus teams don’t play for 24 hours in a day. Crowds attending a game surge the population of a zip code to many times the zip code’s size during the game which can be doubly true if an NBA arena shares a zip code with another professional sports team (like a WNBA team or an MLB team as in the case of Minneapolis). Yet this analysis accounts for none of these issues to determine which arena is “most dangerous”.
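For a rough sense of scale, here is a back-of-the-envelope sketch of how little of the year a regular season home schedule actually covers. The 41 home games come from the text above. The three hours per game is my own assumption for illustration (arrival, game, and departure), and playoffs, concerts, and other events are ignored.

```python
HOURS_PER_YEAR = 365 * 24       # 8,760 hours
HOME_GAMES = 41                 # regular season home games, from the text
ASSUMED_HOURS_PER_GAME = 3      # assumption, not a figure from the source

def share_of_year(games: int, hours_per_game: float) -> float:
    """Fraction of the year's hours taken up by home games."""
    return games * hours_per_game / HOURS_PER_YEAR

share = share_of_year(HOME_GAMES, ASSUMED_HOURS_PER_GAME)
print(f"Home games cover about {share:.1%} of the hours in a year")
```

Under that assumption, games account for about 1.4 percent of the year, while an annual zip code crime rate counts everything that happens in the other 98-plus percent as well.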
At best the piece shows which zip codes with NBA arenas in them have the highest crime rates according to NeighborhoodScout’s flawed algorithm. Not exactly breaking news and it says nothing about what goes on within or in the immediate vicinity of an arena.
The survey portion of the piece is also nonsense for several reasons.
The average age of survey respondents was 37 years old, so if they have kids they’re probably on the younger side. Yet one of the questions is “Are you comfortable letting your minor children visit your team’s arena without you?” I frequently let my kids (ages 5 and 7) go to NBA arenas by themselves but I’m probably in the minority.
The most common type of “crime” witnessed at an NBA arena was verbal harassment which is not a UCR Part I crime, is poorly defined, and is probably something that is frequently not criminal when witnessed at an NBA arena. The second most common “crime” is physical violence which is also poorly defined and may or may not be criminal in nature (if I saw a minor fight at a game or the home team’s fans trolling a visiting fan then both might count on this survey but neither are inherently criminal).
The third most common type of witnessed crime per the survey is public intoxication which is a crime I guess. I’m not sure I’ve ever been to a New Orleans sporting event and not seen public intoxication, and I certainly wouldn’t point to public intoxication as an element that makes New Orleans sporting events dangerous. The survey as presented also doesn’t give a timeframe for when these crimes were witnessed. If you’ve been going to Knicks games for decades you’ve probably seen some stuff which says nothing about whether a Knicks game is safe to attend now.
There are crimes that occur inside and outside of NBA arenas, some of which become very public, like a string of vehicle burglaries and thefts in New Orleans earlier this year or a shooting outside a Milwaukee game last year. This post is in no way intended to make light of those incidents or suggest that nothing criminal ever happens in or around an NBA arena.
But hopefully the post illuminates how challenging it can be to work with crime data, how careful a writer who is not very familiar with crime data should be when drawing conclusions from potentially faulty data, and how skeptical any reader should be when a gambling website starts talking about crime data.