It got me thinking about the distance formula and a scatterplot by LA Times reporter Jon Schleuss that used FBI crime data. I wrote my first Makefile to grab a similar dataset of state crime rates and make a d3 scatterplot with ProPublica StateFace icons.
When a state is selected, the distance formula is applied to every other point, and the results are sorted to find the state with the “nearest” crime rates, according to the scatterplot x and y coordinates.
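Here is a minimal sketch of that loop-and-sort idea. It is not the code behind the original chart. It assumes each state is a plain object with hypothetical `state`, `x`, and `y` fields holding its scatterplot coordinates, and it uses the ordinary Euclidean distance formula.

```javascript
// Hypothetical record shape: { state: "A", x: <number>, y: <number> }.
// Euclidean distance between two points in the scatterplot's coordinate space.
function distance(a, b) {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Measure the distance from the selected state to every other state,
// then sort ascending so the "nearest" state comes first.
function nearestStates(selected, points) {
  return points
    .filter((p) => p.state !== selected.state)
    .map((p) => ({ state: p.state, dist: distance(selected, p) }))
    .sort((a, b) => a.dist - b.dist);
}

// Usage with placeholder points, not real crime data:
const ranked = nearestStates({ state: "A", x: 0, y: 0 }, [
  { state: "A", x: 0, y: 0 },
  { state: "B", x: 3, y: 4 },
  { state: "C", x: 1, y: 1 },
]);
console.log(ranked[0].state); // "C" is closest
```

Checking every state and sorting is simple and fast enough for about 50 points. With many more points, the cost of comparing against every point starts to matter.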
“Nearest” probably has more statistical meaning than I realize, and I could have used a more efficient d3 structure like a quadtree, which can find the nearest point without necessarily measuring the distance to every state.
I also think having a short distance between x and y coordinates is not the same as being “similar.” For example, Washington’s property crime rate dropped while its violent crime rate rose, but the point closest to Washington is Colorado, which had declines in both crime types. An algorithm for the most “similar” changes in crime rates could consider whether rates increased or decreased in addition to the distance between values.
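One way to sketch a direction-aware version is shown below. The scoring rule is purely hypothetical, not something used in the chart. It assumes each state has hypothetical `dProperty` and `dViolent` fields holding the change in each crime rate (negative means a decline). It also assumes a fixed, untuned `DIRECTION_PENALTY` is added for each crime type that moved in the opposite direction from the selected state.

```javascript
// Hypothetical record shape: { state, dProperty, dViolent }, where each value
// is the change in that crime rate over the period (negative = decline).
// ASSUMPTION: a fixed penalty is added for every crime type whose direction
// of change differs from the selected state's. The weight would need tuning.
const DIRECTION_PENALTY = 10;

// True when both changes have the same sign (zero counts as its own direction).
function sameDirection(a, b) {
  return Math.sign(a) === Math.sign(b);
}

// Plain Euclidean distance between the changes, plus a penalty per mismatched direction.
function similarityScore(selected, other) {
  const dx = selected.dProperty - other.dProperty;
  const dy = selected.dViolent - other.dViolent;
  let score = Math.sqrt(dx * dx + dy * dy);
  if (!sameDirection(selected.dProperty, other.dProperty)) score += DIRECTION_PENALTY;
  if (!sameDirection(selected.dViolent, other.dViolent)) score += DIRECTION_PENALTY;
  return score;
}

// Lower score = more similar; sort so the most similar state comes first.
function mostSimilar(selected, states) {
  return states
    .filter((s) => s.state !== selected.state)
    .map((s) => ({ state: s.state, score: similarityScore(selected, s) }))
    .sort((a, b) => a.score - b.score);
}
```

With placeholder values, a state "A" with property crime down and violent crime up would rank a farther state "C" that moved the same way ahead of a closer state "B" where both rates fell. This mirrors the Washington and Colorado mismatch without using real figures.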
I like scatterplots because they lay out all the data, but I’ll continue exploring interactive algorithms as ways to guide the user.