Spreadsheets for data journalism

For many students and journalists, spreadsheets are the entry-level computer-assisted reporting tool and it’s easy to see why: They come installed on many computers and are relatively simple to use. Most students learn how to use spreadsheets in middle school, if not before. Another attraction for data journalists is that government agencies release a lot of data in formats that can be easily opened in spreadsheet programs. It’s no wonder that some data journalists have joked that spreadsheets are the “gateway drug” of CAR.

Spreadsheets help us run calculations and create pivot table reports. We can use them to sort and sift through huge data tables, too.

 

xl

 

We’re going to use Microsoft Excel in class, which is installed in our lab and should be on everyone’s personal laptop. Our lab computers run Excel for Windows 2010 and can open spreadsheet files with more than 1 million rows and more than 16,000 columns. Older versions of Excel, 2003 and earlier, can only handle files with more than 65,000 rows and 256 columns.

More recent (2008 and 2011) versions of Excel for Mac can open the same larger files as their Windows counterparts. Earlier versions share the same limits as in Windows.

However, if you eventually want to use the Microsoft Power Query add-on to churn through millions of records, you’ll need Excel for Windows 2010. If you’d like to use the NodeXL tool to create network diagrams, it’s Excel for Windows 2007 or later.

Another great spreadsheet option is the open-source program Calc. You can download Calc as part of OpenOffice or LibreOffice. Open-source software is available free of cost and of licensing restrictions. Calc runs on Mac or Windows and support large files (more than 1 million rows x 256 columns).

 

calc

 

You can see that Calc is not as polished as Excel, but it will definitely meet your spreadsheet needs.

Google Drive spreadsheets might be a good option for smaller files — there’s a 20 MB limit for files that you’re uploading and converting. Further, you can only have 400,000 cells. So if you have a spreadsheet with 40 columns, you can only have 10,000 rows.

Pivot tables in Google Drive spreadsheets are primitive, when compared to Excel or Calc. But Google Drive spreadsheets shine at  handling live data feeds from the Internet and in allowing you to collaborate on the same file with other journalists or the public.  You can even build forms for entering data into your spreadsheet.

Practice safe computing, though: Don’t store any sensitive information on Google Drive or any other cloud-based service, where you lack exclusive control of the data.

RazorSQL for SQLite

By Emma Vandelinder and Allison Graves

RazorSQL is a front-end program that provides tools to browse, manage and edit databases, with more than 30 database managers supported.  Although RazorSQL is very similar to SQLite Manager, new users will still need time to make the transition.

One of the first differences is in the query box. Like many SQL clients, RazorSQL shows the user two windows for viewing. However, unlike SQLite, users are able to type their queries on numbered lines and the SQL is highlighted blue.

 

razor1

 

Another major difference with RazorSQL is the drop-down menu that appears when you begin a query. This drop-down menu allows the user to select field and table names, which helps cut down on typing errors.

For example, this drop-down menu appears when the user types a WHERE statement.

 

razor2

 

Unlike SQLite, RazorSQL offers users the opportunity to edit data. In the second window, users can turn on editing and directly type into the field. This could be useful if the user notices any inconsistencies in the data.

 

razor3

 

RazorSQL also offers more features, a seen by the greater number of toolbar buttons (top), compared to SQLite Manager (bottom).

 

razor4

razor5

 

All in all, RazorSQL is a comparable database manager to SQLite. We believe that the simplest way to execute SQL is to use the client that you first used. However, we think that RazorSQL is a worthy alternative.

Users can try RazorSQL free for 30 days. After that, the cost for a single-use license is $99.95. The program is available for Mac, Windows, Linux and Solaris operating systems.

SQLite client alternatives

SQLite Manager add-on for Firefox browsers has its quirks and shortcomings as a front-end program for SQLite databases. So I asked my students to find alternatives and write short first looks at those programs. They found:

  • RazorSQL, a commercial program for Macs, Windows and Linux machines
  • SQLPro, a commercial program for Macs
  • SQLiteStudio, a free and open source program for Macs, Windows and Linux machines
  • Navicat for SQLite, a commercial program for Macs and Windows machines
  • MesaSQLite, a commercial program for Macs

In the coming days, I’ll post their reviews here.

SQLite test run part 1

This semester in my computer-assisted reporting class, we’re giving a new database manager a look: SQLite Manager. The program is an add-on for Mozilla Firefox browsers and takes just a few seconds to install. Users work with self-contained SQLite database files, which have a .sqlite extension.

SQLite touts itself as the “most widely used and deployed” database engine, running on every Android phone and iOS device and inside many popular web browsers.

sqlitemanager

Before, we used MySQL Community Server and Navicat Essentials client for MySQL, whose installation sometimes consumed an entire 75-minute class, depending on the snags students encountered. An unfortunate few students running Macs were beset by fatal errors that sent them scrambling for tech support outside class.

So, what a relief to set up a database manager in a fraction of the time that it took before.

However, there is a learning curve in switching from other database managers, such as MySQL and Microsoft Access. We’ve discovered some pretty significant differences in how SQLite behaves.

First, SQLite is case sensitive when evaluating WHERE statements for text. For instance, the following statement will only find variations of Columbia that are spelled in proper case. It will miss columbia or COLUMBIA.

WHERE city = "Columbia"

To pick up the other variations, we would need to rewrite WHERE to:

WHERE city = "Columbia" OR city = "columbia" OR city = "COLUMBIA"

or to:

WHERE city IN("Columbia", "columbia", "COLUMBIA")

In the alternative, we could use LIKE to pick up the variations. LIKE is case sensitive by default, but that can toggled off by running the following PRAGMA statement in the SQL editor:

PRAGMA case_sensitive_like = false

Now we can run the following SQL to pick the three different ways Columbia is stored:

WHERE city LIKE “columbia”

As someone who’s primarily a MySQL user, this struck me as odd. In MySQL and Access we usually use LIKE in concert with wildcard characters to match parts of text inside a field. That allows us to get any city field entries like this: Columbia, Mo.

In SQLite, we can do that, too, with this statement:

WHERE city LIKE "columbia%"

The % symbol in SQLite is the wildcard for any number of characters. So we would get all records where the city starts with columbia and ends with anything.

The other wildcard is the underscore ( _ ) and stands for exactly one character.

A couple of other SQLite quirks:

  1. There’s no way to store things that look like dates in date-formatted fields. Unlike MySQL and Access, SQLite lacks a date data type. (Check out item 1.2). However, we can use date and time functions to work with date information that’s stored as text in this manner: yyyy-mm-dd.
  2. The program runs queries where the SELECT and GROUP BY statements are out of whack. Most of the time, these queries will generate inaccurate results that may look OK. Access will not do that and instead throws an error message. MySQL will run those queries, but it does give us the option to change the program to full GROUP BY mode by issuing a SQL command.