Area nerd finds cool data difficult to put on map

Interning at the Texas Tribune has been a blast because the newsroom is comfortable working with data. I’ve enjoyed working on a few projects with managing editor and criminal justice reporter Brandi Grissom, and recently she approached me with an awesome dataset.

For the last two years a state law has required Texas county jails to file monthly reports of how many undocumented immigrants are housed with Immigration and Customs Enforcement detainers and how much money is spent housing the immigrants. Brandi filed a request to the Texas Commission on Jail Standards and received PDFs of these ICE detainer reports for each month. With the help of Tabula I had a database of county jails and how much they were spending on undocumented immigrants, which was exciting because immigration is one of my biggest interests.

My first instinct was to use this awesome dataset to make a cartogram, LA Times style, resizing Texas counties by how many undocumented immigrants are housed in their jails.

But the data quickly told me this wouldn’t be the best approach for this story.

Outliers were the problem. Harris County, which includes Houston, housed way more undocumented immigrants than any other county jail. Some quick charts in R demonstrate how crazy it is to compare counties:

It was clear that a visualization with Harris County on the same scale as other Texas counties would do a disservice to the data. A cartogram could show the disproportion, but it would be difficult to read for users interested in smaller counties. Accessibility was a priority because this little-known dataset hadn’t been covered in many other places, and we wanted to show users how much their county jails are spending on undocumented immigrants. The newsworthiness of the data encouraged us to publish quickly, but that meant I didn’t have much time for design.

I was recently admiring the Chicago Tribune medical costs database, which lets users get to the information they want to see in what seems like a huge dataset. With the help of Ryan Murphy’s TableSift.js, I went the sortable table route to let users look up their county jail and see totals for the two-year period we analyzed. Brandi advocated a visualization of prisoners and costs over time, so I added a simple sparkline plugin.
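For anyone curious how the table data comes together, here is a rough Python sketch of rolling monthly rows up into per-county totals, plus a monthly series that a sparkline could draw from. This is not the code behind the published table: the row layout, the county names, the numbers, and the helpers `county_totals` and `sort_by` are all made up for illustration.

```python
from collections import defaultdict

# Placeholder rows, NOT figures from the ICE detainer reports.
# Assumed layout: (county, month, inmates with detainers, housing cost)
rows = [
    ("County A", "2012-02", 7, 1400.0),
    ("County A", "2012-01", 5, 1000.0),
    ("County B", "2012-01", 50, 9000.0),
]

def county_totals(rows):
    """Sum monthly rows per county and keep the monthly series in date order."""
    totals = defaultdict(
        lambda: {"inmates": 0, "cost": 0.0, "monthly_inmates": []}
    )
    # Sort by county, then month, so each series is chronological.
    for county, month, inmates, cost in sorted(rows, key=lambda r: (r[0], r[1])):
        entry = totals[county]
        # Note: summing monthly counts gives inmate-months, not unique people.
        entry["inmates"] += inmates
        entry["cost"] += cost
        entry["monthly_inmates"].append(inmates)
    return dict(totals)

def sort_by(totals, column, descending=True):
    """Return (county, totals) pairs ordered by one column, like a sortable table."""
    return sorted(totals.items(), key=lambda kv: kv[1][column], reverse=descending)

totals = county_totals(rows)
for county, entry in sort_by(totals, "cost"):
    print(county, entry["inmates"], entry["cost"], entry["monthly_inmates"])
```

Keeping the monthly list next to the totals means one pass over the data feeds both the sortable columns and the sparkline.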

Check out Brandi’s story here and the chart here.

Of course I was still curious what a cartogram would look like, so I spent the weekend fiddling with Raphael.js and d3.js. I’m still new to visualizations with SVG, but Anthony Pesce’s shp2svg tool made it really easy to get the data on a web page. The map below shows Texas counties sized by how many undocumented immigrants were housed in their jails in 2012, using d3 linear and logarithmic scales (full-page graphic here).
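To see why the choice of scale matters so much with an outlier like Harris County, here is a small Python sketch of the idea behind a linear and a logarithmic scale. It is not the d3 code from the map: the helpers `linear_scale` and `log_scale`, the county names, the counts, and the size range are all assumptions for illustration.

```python
import math

def linear_scale(domain, size_range):
    """Map a value from domain to size_range proportionally."""
    d0, d1 = domain
    r0, r1 = size_range
    def scale(x):
        t = (x - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)
    return scale

def log_scale(domain, size_range):
    """Map a value from domain to size_range by its logarithm (values must be > 0)."""
    d0, d1 = domain
    r0, r1 = size_range
    log_d0, log_d1 = math.log(d0), math.log(d1)
    def scale(x):
        t = (math.log(x) - log_d0) / (log_d1 - log_d0)
        return r0 + t * (r1 - r0)
    return scale

# Placeholder counts, NOT figures from the ICE detainer reports.
counts = {"Small County": 10, "Medium County": 100, "Outlier County": 10000}
domain = (min(counts.values()), max(counts.values()))
size_range = (0.2, 1.0)  # hypothetical shrink/grow factor for each county shape

linear_size = linear_scale(domain, size_range)
log_size = log_scale(domain, size_range)

for name, n in counts.items():
    print(name, round(linear_size(n), 3), round(log_size(n), 3))
```

With these made-up numbers, the linear scale leaves the small and medium counties at nearly the same size because the outlier eats the whole range, while the log scale spreads them apart. The tradeoff is that a log scale understates just how far ahead the outlier really is.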

I also gave a cartogram program called ScapeToad a whirl:

We made the ICE detainer reports available to download. It’s a really neat dataset and I’d love to see if people have other viz ideas.

PS I’m glad to share Tabula tips for the PDF extraction!