Spreadsheets for data journalism

For many students and journalists, spreadsheets are the entry-level tool for computer-assisted reporting (CAR), and it’s easy to see why: They come installed on many computers and are relatively simple to use. Most students learn how to use spreadsheets in middle school, if not before. Another attraction for data journalists is that government agencies release a lot of data in formats that open easily in spreadsheet programs. It’s no wonder that some data journalists have joked that spreadsheets are the “gateway drug” of CAR.

Spreadsheets help us run calculations and create pivot table reports. We can use them to sort and sift through huge data tables, too.




We’re going to use Microsoft Excel in class, which is installed in our lab and should be on everyone’s personal laptop. Our lab computers run Excel 2010 for Windows, which can open spreadsheet files with more than 1 million rows (1,048,576) and more than 16,000 columns (16,384). Older versions of Excel, 2003 and earlier, can handle no more than 65,536 rows and 256 columns.

More recent versions of Excel for Mac (2008 and 2011) can open the same larger files as their Windows counterparts. Earlier Mac versions share the limits of the older Windows versions.
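The row and column limits above can be checked before you try to open a file. Here is a minimal sketch in Python, assuming the data arrives as a CSV file. The function names, the dictionary labels and the three-line sample are made up for illustration; the limits are the worksheet sizes described above.

```python
import csv
import io

# Worksheet size limits as (rows, columns).
# The keys are illustrative labels, not official product names.
LIMITS = {
    "excel_2007_or_later": (1_048_576, 16_384),
    "excel_2003_or_earlier": (65_536, 256),
}


def csv_dimensions(lines):
    """Return (row_count, widest_row) for an iterable of CSV text lines."""
    rows = 0
    cols = 0
    for record in csv.reader(lines):
        rows += 1
        cols = max(cols, len(record))
    return rows, cols


def fits(rows, cols, limit):
    """True if a table of rows x cols fits inside the given (rows, cols) limit."""
    max_rows, max_cols = limit
    return rows <= max_rows and cols <= max_cols


# Made-up sample standing in for a real data file.
sample = io.StringIO("name,agency,amount\nA,Parks,100\nB,Police,250\n")
rows, cols = csv_dimensions(sample)
for label, limit in LIMITS.items():
    print(label, fits(rows, cols, limit))
```

For a real file, you would pass an open file object instead of the sample, for example `open("data.csv", newline="")`, where `data.csv` is a placeholder name. The header row is counted as a row because it occupies one in the spreadsheet.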

However, if you eventually want to use the Microsoft Power Query add-on to churn through millions of records, you’ll need Excel for Windows 2010. If you’d like to use the NodeXL tool to create network diagrams, you’ll need Excel for Windows 2007 or later.

Another great spreadsheet option is the open-source program Calc. You can download Calc as part of OpenOffice or LibreOffice. Open-source software is available free of cost and of licensing restrictions. Calc runs on Mac or Windows and supports large files (more than 1 million rows x 256 columns).




Calc is not as polished as Excel, but it will definitely meet your spreadsheet needs.

Google Drive spreadsheets might be a good option for smaller files: There’s a 20 MB limit for files that you’re uploading and converting. A spreadsheet is also capped at 400,000 cells, so if yours has 40 columns, it can have no more than 10,000 rows (400,000 divided by 40).
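The cell-limit arithmetic works for any number of columns. Below is a small sketch in Python; the function name is hypothetical, and the 400,000 figure is simply the limit quoted above, which may change over time.

```python
def max_rows(cell_limit, columns):
    """Largest number of full rows that fit under a total-cell cap."""
    if columns <= 0:
        raise ValueError("columns must be a positive number")
    # Integer division: a partial row does not count.
    return cell_limit // columns


# Cell cap quoted in the text above (an assumption that may be out of date).
GOOGLE_CELL_LIMIT = 400_000

print(max_rows(GOOGLE_CELL_LIMIT, 40))  # 10000
```

A wider table leaves room for fewer rows: at 100 columns, the same cap allows 4,000 rows.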

Pivot tables in Google Drive spreadsheets are primitive compared with those in Excel or Calc. But Google Drive spreadsheets shine at handling live data feeds from the Internet and at letting you collaborate on the same file with other journalists or the public. You can even build forms for entering data into your spreadsheet.

Practice safe computing, though: Don’t store any sensitive information on Google Drive or any other cloud-based service, where you lack exclusive control of the data.