Machine learning helps detect abusive doctors

For its series about sexually abusive doctors across the United States, The Atlanta Journal-Constitution needed to build its own database. No single centralized source collected that information, so reporters scraped state government websites to harvest medical board disciplinary records.

Then reporters applied machine learning to analyze more than 100,000 cases and score each on the probability that sexual abuse had occurred.
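To make "score each on the probability" concrete, here is a minimal sketch of one common text-scoring approach, a multinomial naive Bayes classifier written with only the Python standard library. This is an illustration, not the newspaper's method: the source does not say which model or features the reporters used, and the four training sentences and both function names are invented for the example.

```python
import math
from collections import Counter

# Invented toy examples (label 1 = flagged, 0 = not flagged).
# Real disciplinary orders are far longer and messier.
TRAINING = [
    ("license revoked after sexual misconduct with a patient", 1),
    ("inappropriate sexual contact during examination", 1),
    ("failed to maintain adequate patient records", 0),
    ("prescribed controlled substances without examination", 0),
]

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(examples):
    """Count word frequencies per class for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def score(text, model):
    """Return the estimated probability that a document belongs to class 1."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_prob = {}
    for label in (0, 1):
        lp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            lp += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_prob[label] = lp
    # Turn the two log scores into a probability for class 1.
    top = max(log_prob.values())
    exp0 = math.exp(log_prob[0] - top)
    exp1 = math.exp(log_prob[1] - top)
    return exp1 / (exp0 + exp1)

model = train(TRAINING)
flagged = score("sexual misconduct with a patient", model)
routine = score("failed to maintain records", model)
print(round(flagged, 3), round(routine, 3))
```

In practice, a score like this is only a way to rank documents for human review; reporters would still need to read the high-scoring cases.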

Map resources


Resources for GIS maps and Census data


GIS desktop mapping software

Esri – ArcGIS for Desktop


QGIS – free and open source


QGIS plugins repository


Base maps and data

U.S. Census geography


U.S. Census TIGER shapefiles


IRE 2010 U.S. Census data


Census Reporter


American Community Survey geospatial data


Esri Open Data


ArcGIS Online


National Historical GIS (1700-2014)


Center for International Earth Science Information Network Archive of Census related GIS products and other resources


State GIS data links from University of Arkansas


The National Map


Meeting other mapping experts (a.k.a. nerd bonding)


National States Geographic Information Council


Esri User Groups







Spreadsheets for data journalism

For many students and journalists, spreadsheets are the entry-level computer-assisted reporting tool and it’s easy to see why: They come installed on many computers and are relatively simple to use. Most students learn how to use spreadsheets in middle school, if not before. Another attraction for data journalists is that government agencies release a lot of data in formats that can be easily opened in spreadsheet programs. It’s no wonder that some data journalists have joked that spreadsheets are the “gateway drug” of CAR.

Spreadsheets help us run calculations and create pivot table reports. We can use them to sort and sift through huge data tables, too.
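For readers who want to see what a pivot table does under the hood, the sketch below groups rows by two fields and sums a third. It is a rough Python equivalent, not how any spreadsheet program is implemented; the sample rows and the pivot_sum name are made up for illustration.

```python
from collections import defaultdict

# Invented sample rows standing in for a spreadsheet: (state, year, amount).
rows = [
    ("GA", 2014, 120),
    ("GA", 2015, 80),
    ("MO", 2014, 50),
    ("MO", 2014, 25),
]

def pivot_sum(records):
    """Sum the amount for every (state, year) pair, like a pivot table with
    state as the row label, year as the column label and SUM as the value."""
    table = defaultdict(lambda: defaultdict(int))
    for state, year, amount in records:
        table[state][year] += amount
    return table

pivot = pivot_sum(rows)
for state in sorted(pivot):
    print(state, dict(sorted(pivot[state].items())))
```

The two MO rows for 2014 collapse into a single cell, which is exactly the summarizing step a pivot table performs.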




We’re going to use Microsoft Excel in class, which is installed in our lab and should be on everyone’s personal laptop. Our lab computers run Excel for Windows 2010 and can open spreadsheet files with more than 1 million rows and more than 16,000 columns. Older versions of Excel, 2003 and earlier, can only handle files with up to about 65,000 rows and 256 columns.

More recent (2008 and 2011) versions of Excel for Mac can open the same larger files as their Windows counterparts. Earlier versions share the same limits as in Windows.
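Before opening a large file, it can help to check whether it will fit. The sketch below counts the rows and columns in a CSV file and compares them with the worksheet limits: 1,048,576 rows by 16,384 columns for Excel 2007 and later, and 65,536 rows by 256 columns for Excel 2003 and earlier. The function names and the sample data are illustrative assumptions.

```python
import csv
import io

# Worksheet limits as (max rows, max columns).
MODERN_LIMITS = (1_048_576, 16_384)  # Excel 2007 and later
LEGACY_LIMITS = (65_536, 256)        # Excel 2003 and earlier

def csv_shape(handle):
    """Return (row count, widest row) for an open CSV file."""
    n_rows = 0
    n_cols = 0
    for record in csv.reader(handle):
        n_rows += 1
        n_cols = max(n_cols, len(record))
    return n_rows, n_cols

def fits(shape, limits):
    """True if the sheet has no more rows or columns than the limits allow."""
    return shape[0] <= limits[0] and shape[1] <= limits[1]

# A tiny in-memory file stands in for a real download.
sample = io.StringIO("name,state\nA,GA\nB,MO\n")
shape = csv_shape(sample)
print(shape, fits(shape, MODERN_LIMITS), fits(shape, LEGACY_LIMITS))
```

To check a file on disk, pass an open file handle instead, for example one created with open("data.csv", newline=""), where data.csv is a placeholder name.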

However, if you eventually want to use the Microsoft Power Query add-on to churn through millions of records, you’ll need Excel for Windows 2010 or later. If you’d like to use the NodeXL tool to create network diagrams, you’ll need Excel for Windows 2007 or later.

Another great spreadsheet option is the open-source program Calc. You can download Calc as part of OpenOffice or LibreOffice. Open-source software is available free of cost and of licensing restrictions. Calc runs on Mac or Windows and supports large files (more than 1 million rows x 256 columns).




You can see that Calc is not as polished as Excel, but it will definitely meet your spreadsheet needs.

Google Drive spreadsheets might be a good option for smaller files — there’s a 20 MB limit for files that you’re uploading and converting. Further, you can only have 400,000 cells. So if you have a spreadsheet with 40 columns, you can only have 10,000 rows.
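The row arithmetic above is just the cell limit divided by the number of columns. A small sketch, assuming the 400,000-cell figure quoted in the text (the limit may have changed since this was written) and a hypothetical max_rows helper:

```python
CELL_LIMIT = 400_000  # figure quoted in the text; treat it as an assumption

def max_rows(n_columns, cell_limit=CELL_LIMIT):
    """Largest number of rows that fits under the cell limit."""
    return cell_limit // n_columns

for columns in (8, 40, 100):
    print(columns, "columns:", max_rows(columns), "rows")
```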

Pivot tables in Google Drive spreadsheets are primitive compared to those in Excel or Calc. But Google Drive spreadsheets shine at handling live data feeds from the Internet and at letting you collaborate on the same file with other journalists or the public. You can even build forms for entering data into your spreadsheet.

Practice safe computing, though: Don’t store any sensitive information on Google Drive or any other cloud-based service, where you lack exclusive control of the data.

RazorSQL for SQLite

By Emma Vandelinder and Allison Graves

RazorSQL is a front-end program that provides tools to browse, manage and edit databases, with support for more than 30 database systems. Although RazorSQL is very similar to SQLite Manager, new users will still need time to make the transition.

One of the first differences is in the query box. Like many SQL clients, RazorSQL shows the user two windows: one for writing queries and one for viewing results. However, unlike in SQLite Manager, users type their queries on numbered lines and the SQL is highlighted in blue.




Another major difference with RazorSQL is the drop-down menu that appears when you begin a query. This drop-down menu allows the user to select field and table names, which helps cut down on typing errors.

For example, this drop-down menu appears when the user types a WHERE clause.
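For readers following along without the screenshot, this is the kind of query where table and field names matter. The sketch runs a SELECT with a WHERE clause against an in-memory SQLite database using Python's built-in sqlite3 module; the licenses table, its fields and its rows are invented for illustration and are not from the authors' data.

```python
import sqlite3

# Build a small throwaway database in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licenses (name TEXT, state TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO licenses VALUES (?, ?, ?)",
    [("Smith", "GA", 2012), ("Jones", "MO", 2014), ("Lee", "GA", 2010)],
)

# The WHERE clause keeps only the rows for one state.
query = """
SELECT name, year
FROM licenses
WHERE state = ?
ORDER BY year
"""
results = conn.execute(query, ("GA",)).fetchall()
for name, year in results:
    print(name, year)
```

A misspelled field name such as "sate" would make this query fail, which is the kind of typing error a name-completion menu helps prevent.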




Unlike SQLite Manager, RazorSQL lets users edit data directly in the results. In the second window, users can turn on editing and type straight into a field. This could be useful if the user notices inconsistencies in the data.
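The same kind of fix can also be made with an SQL UPDATE statement, which leaves a record of exactly what was changed. A minimal sketch using Python's built-in sqlite3 module; the licenses table and the inconsistent "Ga." value are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licenses (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO licenses VALUES (?, ?)",
    [("Smith", "GA"), ("Jones", "Ga."), ("Lee", "MO")],
)

# Standardize one inconsistent spelling of the state abbreviation.
cursor = conn.execute("UPDATE licenses SET state = 'GA' WHERE state = 'Ga.'")
conn.commit()
changed = cursor.rowcount  # number of rows the UPDATE modified

states = [row[0] for row in conn.execute(
    "SELECT DISTINCT state FROM licenses ORDER BY state"
)]
print(changed, states)
```

Whichever route you take, work on a copy of the original data so the source file stays untouched.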




RazorSQL also offers more features, as seen by the greater number of toolbar buttons (top), compared to SQLite Manager (bottom).





All in all, RazorSQL is a database manager comparable to SQLite Manager. We believe the easiest client for running SQL is the one you learned first. Still, we think RazorSQL is a worthy alternative.

Users can try RazorSQL free for 30 days. After that, a single-user license costs $99.95. The program is available for Mac, Windows, Linux and Solaris operating systems.