Due: Noon, Tues., Jan. 31

 

One of the best ways to dive into the world of computer-assisted reporting is to examine stories that have used data analysis.

So, for this assignment, I would like you to read a CAR story that’s been published in the past two years. Then, I want you to use the commenting system here to write a few paragraphs that summarize the story and how data was used. Also, note questions that you have about the story.

You can find examples of CAR stories a few different places:

  • The Investigative Reporters and Editors (IRE) Extra!Extra! blog of investigative stories, where you can see articles categorized as CAR.
  • Another option is GreatJournalism.net, where you can find stories by data type used. Derek Willis, a newsroom developer at The New York Times, maintains this site.
  • The Times and USA Today regularly run data-driven stories, so you might want to poke around on their sites. Search the sites for phrases like “data analysis”, “analysis of computer records” or “computer-assisted.”

I’m eager to see what you all dig up!

 
  • Alicia Stice

    ProPublica has been covering super PACs for a while now. It wrote an investigative story explaining the implications of the different court rulings on campaign finance.

    Along with the story, they have a page which breaks down how much super PACs are spending, where they’re spending it and what they’re spending it on. The page is updated to include current information.
    http://projects.propublica.org/pactrack/#committee=C00490045

    You can look at it by individual super PAC and see what each is spending and where. It even explains the type of political advertising the money was spent on. For example, on Jan. 27, “Restore Our Future,” a super PAC backing Mitt Romney, spent about $1,500 on an email campaign supporting him and about $30,000 on a phone campaign attacking Newt Gingrich.
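    A tracker like this is essentially a table of expenditures summed along different columns. As a minimal sketch, assuming each expenditure is exported as a row with hypothetical fields (committee, candidate, stance, medium, amount), grouping by candidate and support/oppose looks like this in Python. The two rows reuse the approximate Jan. 27 figures above; the field names are assumptions, not ProPublica's actual data format.

```python
from collections import defaultdict

# Hypothetical rows: the field names are assumptions, and the amounts are the
# approximate Jan. 27 figures quoted in the comment above.
expenditures = [
    {"committee": "Restore Our Future", "candidate": "Romney",
     "stance": "support", "medium": "email", "amount": 1_500},
    {"committee": "Restore Our Future", "candidate": "Gingrich",
     "stance": "oppose", "medium": "phone", "amount": 30_000},
]

def totals_by(rows, *keys):
    """Sum the 'amount' field for every combination of the given keys."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)

by_target = totals_by(expenditures, "candidate", "stance")
by_medium = totals_by(expenditures, "medium")
print(by_target)  # {('Romney', 'support'): 1500, ('Gingrich', 'oppose'): 30000}
print(by_medium)
```

    With a full export, the same function answers "who is this PAC attacking?" and "what kind of advertising is it buying?" just by changing the grouping keys.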

    • http://twitter.com/davidherzog David Herzog

      I’m getting an error when I go to the site, but this sounds like a nice way to make sense of super PAC spending as the election season goes into full swing.

  • James Jobes

    An independent student newspaper in Athens, Ohio, has been publishing a series of graphs and stories on underage drinking. The one I found most compelling was a graphic and accompanying story about where the most underage liquor citations happen. It uses an infographic to show that a majority of the citations occur during events like music festivals and Halloween, when large crowds gather and heavy drinking is expected. It found that only 4.6% of underage drinking arrests occurred at actual bars.

    http://thepost.ohiou.edu/content/under-influence-under-21-bars-safer-bet-those-under-age-21-athens

    This graphic was most likely built from an Excel spreadsheet listing every citation since 2006, including its location, and the reporter most likely sorted the information exactly the way we learned in class Thursday. It’s good to confirm that the processes we’re learning are being used correctly in the field to make strong points.
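    We don't know exactly how the Post reporter worked, but the sort-and-count step described above can be sketched in Python. This assumes a hypothetical table with one row per citation and a location-type column; the records below are invented placeholders, not the Athens data.

```python
from collections import Counter

# Hypothetical citation records: (year, location_type). A real sheet would
# have one row per citation, usually with an address or venue name.
citations = [
    (2010, "festival"), (2010, "house party"), (2011, "festival"),
    (2011, "Halloween"), (2011, "bar"), (2012, "Halloween"),
    (2012, "house party"), (2012, "festival"),
]

counts = Counter(location for _, location in citations)
total = sum(counts.values())

# Sort locations from most to least citations, like sorting a spreadsheet column.
for location, n in counts.most_common():
    print(f"{location:12} {n:3}  {n / total:6.1%}")

# Share of all citations written at bars.
bar_share = counts["bar"] / total
```

    In a real data set the hard part is usually the location column: addresses have to be cleaned and grouped into categories before a count like this means anything.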

    • http://twitter.com/davidherzog David Herzog

      A good example of how the data helps tell you what’s *really* going on.

  • Emilie Stigliani

    I found a simple but interesting computer-assisted reporting story on The Local East Village website, a collaboration between The New York Times and New York University that covers the East Village. Nick Desantis wrote the article “Data shows bars with most noise complaints, but is it sound and fury?” (http://eastvillage.thelocal.nytimes.com/2012/01/23/noise-complaints/?scp=20&sq=data%20analysis&st=cse), published on Jan. 23, 2012.

    The premise of the piece is that Desantis analyzed a data set of noise complaints made to the city’s 311 line (from NYC Open Data, http://nycopendata.socrata.com/) and logged from January 2010 to Oct. 16, 2011, about bars in the East Village. He then looked at the bars with the most noise complaints (the No. 1 offender was Sutra Lounge) and interviewed both the owners and the neighbors.

    He reported on people’s theories and responses. Many of the bar owners felt that only a few people were placing complaint calls repeatedly, meaning the noise was not a problem for the average neighbor. Some of the neighbors complained of sleepless nights.

    What I like about this piece is that it’s not complicated. The numbers seem simple enough to understand. Still, Desantis found a way to turn the data into an interesting story and fleshed it out with real people.

    Here are some lingering questions I had:

    Why did the reporter not interview police officers, who act as intermediaries between the bar owners and the neighbors? It seems like they would have some insight into the issue.

    The data analysis in this story seems pretty simple. Did Desantis do it with a spreadsheet, or did he just count each bar’s complaints by hand?
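    The story doesn't say how Desantis did the count. As one possibility, the per-bar tally is the equivalent of a spreadsheet pivot table, and in Python it takes only a few lines. The sketch assumes a hypothetical CSV export; the column names and the bar names are placeholders, not the real NYC Open Data schema or the bars in the story.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a 311 export; column names are assumptions,
# and the bar names are placeholders.
raw = """created_date,complaint_type,venue_name
2010-03-02,Noise - Commercial,Bar A
2010-03-09,Noise - Commercial,Bar B
2010-04-17,Noise - Commercial,Bar A
2011-06-05,Noise - Commercial,Bar C
2011-07-23,Noise - Commercial,Bar A
2011-08-30,Noise - Commercial,Bar B
"""

rows = list(csv.DictReader(io.StringIO(raw)))
per_bar = Counter(row["venue_name"] for row in rows)

# Rank bars by complaint count, highest first.
ranking = per_bar.most_common()
print(ranking)  # [('Bar A', 3), ('Bar B', 2), ('Bar C', 1)]
```

    Counting by hand works for a handful of rows, but with thousands of complaints a script or pivot table is both faster and easier to double-check.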

    • http://twitter.com/davidherzog David Herzog

      I think this is a good example of how you can use data in hyperlocal reporting to help tell interesting stories that are meaningful. Good find.

  • Yan Lu

    I found a story through the IRE website about the decline of Tucson’s housing market. Nearly 6,400 homes in Pima County, Arizona, sold for under $100,000 in 2011, more than 35% of the 18,000 homes sold that year.

    The article was originally published in the Arizona Daily Star: http://azstarnet.com/real-estate/in-homes-here-sell-for-under-k/article_f957345a-a856-56bc-aa1e-46f8f27c3f29.html

    The data for this article was collected by reporter Rob O’Dell. He first noticed the trend and then explored data to support his idea. He built a database of traditional home sales (mainly sales between two homeowners) and acquired a database of foreclosures (both homes sold back to banks and to third parties at auctions).
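    Combining the two sources and filtering by price can be sketched as follows. This is a minimal illustration, assuming each database has been reduced to a list of sale prices; the prices are invented, and only the 6,400 and 18,000 figures come from the story.

```python
# Hypothetical sale prices in dollars; these stand in for the two sources
# described above and are not the reporter's actual records.
traditional_sales = [145_000, 89_500, 210_000, 99_000, 178_000]
foreclosure_sales = [62_000, 95_000, 120_000, 48_500]

THRESHOLD = 100_000  # "under $100,000"

def share_under(prices, threshold):
    """Return (count below threshold, total count, fraction below threshold)."""
    below = sum(1 for p in prices if p < threshold)
    return below, len(prices), below / len(prices)

# Pool both sources before counting, so foreclosure sales are not missed.
all_sales = traditional_sales + foreclosure_sales
below, total, share = share_under(all_sales, THRESHOLD)

# Sanity check of the published figures: about 6,400 of 18,000 sales.
published_share = 6_400 / 18_000
print(f"{published_share:.1%}")  # 35.6%
```

    The published figures hold up: 6,400 out of 18,000 is about 35.6%, consistent with "more than 35%."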

    Besides using data in the article, the newspaper also offers an interactive map on its website for readers to explore. It is a good way to visualize the data.

    - Yan Lu

    • http://twitter.com/davidherzog David Herzog

      Another good example of how you can find a good story by cross-referencing data. Rob used GIS mapping software to find the neighborhoods with the highest concentrations of foreclosed properties and tax liens.

  • Thomas Koll

    I read an article from August 2011 by FOX-5 in D.C. (http://www.myfoxdc.com/dpp/news/investigative/fox-5-investigates-congress-cars-080111#ixzz1U04u1jAQ) about members of the House who lease vehicles with taxpayer dollars. Using an expense report a few thousand pages long, the station uncovered which members are using tax dollars to make monthly lease payments on vehicles they drive personally but also use for their congressional duties. While the practice is banned for members of the Senate, no such rule exists for the House. The story cited payments ranging from $500 to $1,400 a month in taxpayer money. The data was used mainly to criticize officials who were calling for deeper federal budget cuts in exchange for raising the debt ceiling while they themselves were spending taxpayer money on personal vehicles.

    What struck me as most interesting about this story was that this data-driven, computer-assisted way of reporting resembles the way TV shows portray modern police work to catch criminals. Paper trails (almost) never lie, and those trails are a lot easier to follow now that data is stored digitally and available over the Internet. This especially helps journalists in their role as watchdogs of democracy. In a profession where one continually has to settle for lame quotes from politicians that everyone knows are BS but can’t say so because that’s ‘bias,’ this ability to uncover raw data can help restore a bit of honesty. Reporters who can contradict those quotes with facts from a checkbook will not only make a name for themselves as hard-hitting, but will probably also feel a lot less helpless than reporters without the ability or the drive to do the data mining.

    • http://twitter.com/davidherzog David Herzog

      The reporter on this story, Tisha Thompson, is a Mizzou grad, BTW.

  • Celia Murray

    I found a fascinating article in The Orange County Register about the influence of immigration and immigrants on California. The article combines enormous quantities of data from the last four decades from the U.S. Census Bureau, interviews with experts and immigrants, and detailed analysis of other records.

    The article relies heavily on the data and records that the reporter has managed to obtain and throughout the whole piece there are further links and references to other research and the records themselves. 

    Data is essential to the sections about the mass influx of immigrants in the 1990s, the percentage of immigrants now in California who are from Mexico and South America, and the jobs immigrants held. The reporter also uses data to write about a period in the 1990s when Americans from other states, who had previously flocked to California, left en masse.

    What I liked about this piece is how the statistics and figures helped shape the story and complement the interviews with sources. The numbers bolstered the angle and provided strong evidence without becoming the focus. It would have been easy for the journalist to rely too much on the numbers and flood the piece with them. However, he showed restraint and used them only when necessary.

    One question that stuck with me was why the reporter didn’t investigate the wages and salaries of the immigrants. It would have been very interesting to see whether incomes increased or declined, and how low they were to start with.

    • http://twitter.com/davidherzog David Herzog

      You’re right: the data really helped the reporter frame the story and get beyond anecdotes.

  • Roxanne Foster

    I looked into this story from my hometown newspaper, the Houston Chronicle, titled “Despite fatal crashes, seat belts don’t click with all police” (http://www.chron.com/news/houston-texas/article/Despite-fatal-crashes-seat-belts-don-t-click-1717362.php#page-2). Reporter Moises Mendoza analyzed data from the National Highway Traffic Safety Administration and found that 40% of all officers killed in on-duty automobile and motorcycle accidents across the U.S. weren’t wearing seat belts. The article discussed the police culture and beliefs that, in spite of training and pressure from superiors, lead many officers not to buckle up while on duty.

    The data showed that during the past three years, more Texas officers were killed in auto crashes (18) than were shot and killed in the line of duty (16). In Texas, officers are expected to follow the same seat belt laws as civilians. The article cited the need to get out quickly to pursue suspects and the chance that their weapons could get caught on the belts as reasons officers choose not to obey the law, in spite of the risk involved.

    Some areas were said to be cracking down on officers who didn’t wear seat belts while on duty. Their efforts included citations and GPS units that track officers involved in high-speed pursuits. I found it really interesting that officers self-report whether or not they were wearing a seat belt after incidents, but the reporter said “authorities have no evidence of under-reporting.”

    Questions I had regarding the article included:

    - How can you verify that under-reporting isn’t an issue, especially if officers are breaking the law by not wearing a seat belt and their superiors are cracking down on those who violate the law?

    - What types of tangible repercussions have been put into place for those who violate the seat belt law? Is data available yet that shows how many officers are cited while on duty? (Are their peers even taking this mandate seriously?)

    - How widely used are new technologies that address officers’ concerns about seat belts (e.g., belts that break away or disengage once the car is put in park)?

    • http://twitter.com/davidherzog David Herzog

      Holy moly, an incredible story, especially the contrast between deaths by shooting and in accidents. You’re right to take the under-reporting claim with a grain of salt.

  • Haoyun Su

    I read a story from the Guardian’s datablog, entitled “US plastic surgery statistics: breasts up, chins down” (http://www.guardian.co.uk/news/datablog/2011/jul/22/plastic-surgery-medicine#_)

    The article reported that the plastic surgery business was booming despite the economic recession, based on 2010 data. In 2010, Americans spent $10.1 billion on more than 13 million cosmetic procedures, up 5% from 2009 and 77% from 2000.

    The Guardian provided a graph based on the 2010 data, covering the most common procedures, the cost of treatment, the top five cosmetic procedures for women and for men, and more. The data showed that procedures for men are on the rise, and the number of teenage patients would strike British readers as unusually high.

    It also identified the procedures with the biggest increases and decreases. All of the data came from the American Society of Plastic Surgeons and included reconstructive surgeries, which rose 2% from 2009.

    The author raised the question “what can you do with it?” Basically, the story provided the original data and also did a thorough analysis with graphics and comparisons. It highlighted an issue local readers might care about: the number of teenage surgeries. It didn’t draw any conclusions from the data but left readers with questions to think about.

    After reading it, my questions were:

    - If it had been written by an American newspaper, what would be different from the current version? I think it would be easier for American papers to delve deeper into the question “why.”

    - The original data (http://www.plasticsurgery.org/News-and-Resources/Statistics.html) had already been sorted out and analyzed very thoroughly by the institute. In this case, how can reporters come up with new ways to explore the data?

    • http://twitter.com/davidherzog David Herzog

      Never saw that before; that’s really interesting data. I like, too, how the Guardian let you see the data it used.

  • Mengni Yang

    The New York Times article “Number of Older Inmates Grows, Stressing Prisons” (http://www.nytimes.com/2012/01/27/us/older-prisoners-mean-rising-health-costs-study-finds.html?scp=1&sq=inmate%20aging&st=cse) addressed the rising health care costs legislators face as the growing population of older inmates has become a pressing problem. Citing a Human Rights Watch report, the article said health care for these aging prisoners can cost nine times as much as for younger inmates. The scale of the problem shows in the fact that, from 2007 to 2010, the number of inmates 65 and older grew 90 times as fast as the total prison population. The author used California as an example and described how the state is dealing with its aging prison population and trying to reduce health care costs.

    Several kinds of data are used in different ways. The author compared the growth rate of imprisoned men and women 65 and older with the growth rate of the total prison population from 2007 to 2010; calculating and comparing growth rates like this reveals trends in the data. Then the author offered another comparison: the growth rate of inmates older than 55 versus the growth rate of the rest of the inmates. The reason for this second comparison is a little confusing to me, because I can’t see the point of comparing data that use different baselines (55 and 65 years old) and different comparison groups (65 and older versus the total, and 55 and older versus the rest). If I were the reporter, I would probably dig deeper into one set of criteria, looking at more measures such as population and percentage of the total population, rather than using the same measure (growth rate) across different criteria.

    Perhaps the older population is extremely small compared with the younger population, which might not cause too much trouble for legislators. It might be interesting to put those numbers in the article as well, along with estimates of future health care demand, supply and costs.

    Besides the growth in older inmates, the author used Michigan as an example to show how the cost of health care differs across age groups. The comparison is clear and explains the issue very well.

    Last but not least, the author listed the inpatient costs in
    public prison hospitals and private operators, which indicated a way for the
    government to reduce health care cost in prison.

    More data could make this article more interesting and more in-depth. For example, how does the likelihood of disease transmission among older inmates differ from that among younger inmates? How has the public been debating the question of providing health care to inmates? What is the death rate in prison? Could the death rate offset the growth rate of older prisoners?

    • http://twitter.com/davidherzog David Herzog

      Good catch! It would be good to know the percent of total for those over 55, 65, etc.

  • http://twitter.com/David_Cawthon David Cawthon

    According to one New York Times article, it’s clear someone’s rolling in the dough. http://economix.blogs.nytimes.com/2012/01/17/measuring-the-top-1-by-wealth-not-income/?scp=5&sq=inequality%20income%20data%20analysis&st=cse

    This story, which serves as more of an explanatory, “behind-the-scenes” piece, explains and defends the methods used in an article that profiled the top 1 percent by income rather than wealth.
    The original article: http://www.nytimes.com/2012/01/15/business/the-1-percent-paint-a-more-nuanced-portrait-of-the-rich.html?_r=1

    The New York Times explained that the Census has a measure for income, but not wealth; the Census also yields more recent data and a larger sample size, which offers many statistical advantages, as more data can be observed and compared across more fields.

    The article said that the wealth gap is even more extreme than the income gap, according to the Fed. It illustrated the disparity by comparing each top 1 percent threshold with the corresponding median:

    “The Times had estimated the threshold for being in the top 1 percent in household income at about $380,000, 7.5 times median household income, using census data from 2008 through 2010. But for net worth, the 1 percent threshold for net worth in the Fed data was nearly $8.4 million, or 69 times the median household’s net holdings of $121,000.”
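    The two multiples in that passage can be checked directly from the quoted figures. The median household income is not stated; the sketch derives it from the 7.5 multiple, so treat it as an implied value.

```python
# Figures quoted from the Times passage above.
income_threshold = 380_000    # top 1% household income cutoff ("about")
income_multiple = 7.5         # cutoff divided by median household income
wealth_threshold = 8_400_000  # top 1% net-worth cutoff ("nearly"), Fed data
median_net_worth = 121_000    # median household net worth

# Median income implied by the quoted multiple (not stated in the article).
implied_median_income = income_threshold / income_multiple

# Check the "69 times" figure.
wealth_multiple = wealth_threshold / median_net_worth

print(round(implied_median_income))  # 50667
print(round(wealth_multiple, 1))     # 69.4
```

    Because the wealth cutoff is "nearly" $8.4 million, the true multiple is a bit under 69.4, which matches the article's "69 times."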

    Half of those in the top 1 percent by income, and half of those in the top 1 percent by wealth, fell into both groups; the rest were at least in the top 5 percent.

    The article also divulges other statistical tidbits near its conclusion. Here’s one to tickle your curiosity:

    “Money may not buy happiness, but the Fed survey suggests it buys good health. About 90 percent of the 1 percenters describe themselves as being in excellent or good health, compared with 75 percent of everybody else. About 85 percent expect to live into their 80s, compared with 68 percent of everybody else.”

    The most telling quote in the original article was this one, collected from interviews.
    “Of the 1 percenters interviewed for this article, almost all — conservatives and liberals alike — said the wealthy could and should shoulder more of the country’s financial burden, and almost all said they viewed the current system as unfair. But they may prefer facing cuts to their own benefits like Social Security than paying more taxes.”

    What is the geographic distribution of the residences of the top 1 percent?

    How did the economic downturn affect the top 1 percent compared to the “average” American household?

    At what age do 1 percenters die compared with everyone else?

    What statistics are available for other countries? Are 1 percenters in the U.S. outliers or average?

  • Jon Rehagen

    The Las Vegas Review-Journal article “142 Dead, and Rising” (http://www.lvrj.com/news/deadly-force/142-dead-and-rising/las-vegas-police-rank-high-in-shootings-134255763.html) was the second installment of a five-part series called “Deadly Force.” The story presented statistics on Las Vegas Valley police and the number of deaths that followed officers firing their weapons. The article mainly discussed the different scenarios in which people died after confrontations with Las Vegas police officers. The authors compiled crime and fatality statistics for 16 metropolitan areas from 1990 to 2011 and concluded that Las Vegas was among the top cities for deaths resulting from police shootings. The main kick in the story comes when the reader discovers that the rate of fatalities is rising faster than in other cities with similar issues, and that studies have shown some of the deaths could have been avoided.

    The authors use many of the statistics to build a number of visual aids, which really help the reader understand a volume of information that can be overwhelming. That said, there are so many statistics that as I read, I sometimes felt my eyes glaze over. However, the numbers were VERY informative and put the story into a great context that fit the “Deadly Force” series.

    I found a few statistics very interesting. First, the authors reveal that 42 percent of all officer-caused deaths took place in 7 of the 136 zip codes in Clark County (the Las Vegas metropolitan area). That statistic helps the story emphasize that Las Vegas as a whole isn’t a bad city, because nearly half of these incidents occur in a few particular areas. Next, I found it interesting that the authors reported that since 1990 there had been 33 cases in which an unarmed suspect was shot, and that 7 of them, including one death, happened in a 16-month span from 2009 to 2011. This suggests such incidents are on the rise, since about a fifth of them occurred in that short span.
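    Both statistics reduce to simple ratios. The sketch below uses only the numbers quoted above and assumes the 7 recent shootings of unarmed suspects are included in the 33 since 1990. The "even spread" baseline is crude, since zip codes differ in population.

```python
# Figures quoted from the Review-Journal story as summarized above.
zip_codes_total = 136
zip_codes_hot = 7
share_of_deaths_in_hot_zips = 0.42

unarmed_shootings_since_1990 = 33
unarmed_shootings_recent = 7  # 16-month span, 2009-2011 (assumed part of the 33)

share_of_zip_codes = zip_codes_hot / zip_codes_total
# How over-represented the 7 zip codes are relative to an even spread.
concentration_ratio = share_of_deaths_in_hot_zips / share_of_zip_codes
recent_share = unarmed_shootings_recent / unarmed_shootings_since_1990

print(f"{share_of_zip_codes:.1%} of zip codes")     # 5.1% of zip codes
print(f"{concentration_ratio:.1f}x an even spread")  # 8.2x an even spread
print(f"{recent_share:.0%} of unarmed shootings")    # 21% of unarmed shootings
```

    About 5% of the county's zip codes account for 42% of the deaths, and the recent 16-month span holds about 21% of the unarmed-suspect shootings.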

    A few things could have improved this article. While reading, I wondered why there were so many numbers. Yes, they had a lot of information, but they could have cut some of it for the flow of the story. Also, even with all those statistics, the story remained somewhat vague, so I wonder whether they could have focused on a few key statistics and explored them in depth. One option would have been the concern that a hiring frenzy may have put inadequately trained, unprepared officers on active duty. However, the statistics show that the officers’ average experience is 7 years and their average age is around 35.

    • http://twitter.com/davidherzog David Herzog

      A great, in-depth project. It’s good to see newspapers committed to doing reporting like this. I really like the graphic/database of people killed by the PD. It’s a good way to humanize the victims.

  • Matthew Patane

    The New York Times article “Mexico Updates Death Toll…” (http://www.nytimes.com/2012/01/12/world/americas/mexico-updates-drug-war-death-toll-but-critics-dispute-data.html) is an example of how data analysis was attempted in a story and of how more data analysis could improve the information provided. The article focuses on the Mexican government’s estimate that over 47,000 people have been killed in Mexico in drug violence since 2006.

    As anyone who follows events in Mexico, or has even heard the country mentioned briefly on the news, should know, people living there are terrorized by living in a state essentially run by drug cartels and fueled by drug money. Since 2006, Mexico has quickly become a country where people are murdered daily and the perpetrators are never caught or even investigated, because people fear retribution from the cartels and the police (the article touches on this, but much of this information comes from other articles I have read and this book: http://www.npr.org/templates/story/story.php?storyId=125427225).

    In this story, the reporter explains that the government has released data stating one total death toll, while critics and investigators have cited a different and much higher number. In a situation like Mexico’s, where government officials are corrupt, scared or unsure how to handle and account for what is happening, it is extremely difficult to arrive at a specific number or reliable data. Because this story presents different numbers drawn from data sets covering the same population, it is a good example of how reporters can use data, or the lack of it, to better explain an issue.

    That being said, I have some questions about the article. Mainly, I would ask the reporter why there aren’t more numbers in the story. Even if more numbers would only show how much the official accounts of Mexico’s death toll are disputed, those disputes would help explain how messed up the situation in the country is. I would also have liked to see numbers on the drug trade or, if possible, the ratio of cartel members to police officers and how much money the drug industry brings into the country. Despite my hopes for more information, I understand that the situation in Mexico is difficult enough to get a handle on without attempting a data analysis of the death toll and the impact of the drug industry, but it would be very interesting to see whether such an analysis could be done.

    • http://twitter.com/davidherzog David Herzog

      Nice illustration of why it’s important to Know Thy Data: question the source, how they collect it, etc.

  • Jared Grafman

    This USA Today story, “2012 off to furious start in tornadoes” (http://www.usatoday.com/weather/storms/tornadoes/story/2012-01-30/january-tornado/52893512/1), published this morning, looks at data on January tornadoes in America. According to the article, this January has seen the third-most tornadoes of any January since record-keeping began in 1950.

    The article attributes the high number of tornadoes to unseasonably warm temperatures, which triggered the thunderstorms that preceded them. According to the article, 2,800 record highs were tied or broken in January 2012, compared with 160 record lows. The nation is in a La Niña climate pattern, which stems from cooler tropical Pacific water temperatures and produces large tornado outbreaks from January to April; the same pattern occurred last year, according to the article, which states that last year was the deadliest for tornadoes since 1925.

    I know this story was framed around the month of January, and although the article does tell the reader that April and May are most often the deadliest months, I would like to have seen a comparison with other countries. Or do tornadoes occur only in America? (I could Google the question and pretend I know the answer, but I don’t, and I’m curious.) I’d also like to have read about the areas tornadoes most commonly hit. I know they’re in “Tornado Alley,” but what city has the most tornadoes? What’s the deadliest city on record? These are a few other questions I think could be answered with the data used to create the story.

  • Wendy Wang

    I read the article “California’s addiction to immigrant labor” (http://www.ocregister.com/news/-265578–.html), published in The Orange County Register on Sept. 8, 2010.

    Based on a data analysis of the immigrant population in California, the story explored the history of immigration to the state, the reasons behind it and its major impact on the California economy.

    What impressed me most was the large number of links to the spreadsheets used to analyze four decades of data from the U.S. Census Bureau. For each conclusion, the reporter provided the relevant data immediately, like the references in a research paper. I wonder whether that’s what all CAR stories are like. To me, it seems more like a collection of data analyses on a topic than a news story.

    • http://twitter.com/davidherzog David Herzog

      Yes, the Register really churned through a lot of Census data to help tell this story. A lot of CAR stories do include links to source data, methods and documents. It’s a good way of being transparent about how we do our work.

  • Johanna Somers


    In the article “State wastes millions helping sex predators avoid lockup,” Christine Willmsen describes how state prosecutors and defense attorneys send huge bills to the state of Washington when sex offenders who have finished their prison terms are tried for civil commitment. Willmsen’s database of financial statements and billing invoices revealed that defense teams have had as many as five experts working for them. Willmsen was able to show the public that the state is spending more than $12 million a year on legal bills.

    She also published this article and package at a very important time. One sex offender in civil commitment, David McCuistion, sued the state, “saying that the law violated his constitutional rights.” His argument was that he should have another hearing to introduce new evidence. The case went to the Supreme Court, and the court ruled in McCuistion’s favor. However, the Attorney General’s Office asked the Supreme Court to reconsider, and the court’s new decision is expected “later this year.” If the Supreme Court’s original decision stands, sex offenders could have hearings every year, costing millions more dollars.

    She also collected data on how many of the sex offenders in Washington prisons were actually placed in civil commitment. In a graphic, Mark Nowlin showed that in 2010, 1,150 of 1,187 sex offenders were released back into society and 37 went on to the Sexually Violent Predator Subcommittee. Only a small number of those were actually civilly committed. There are currently 284 sex offenders in civil commitment in Washington. She also collected 2007 data on the number of sex offenders around the country and the states where they were located.
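    A quick consistency check on the graphic's numbers is worth doing before publishing any figure like this. The sketch below uses only the three figures quoted above and confirms that the released and referred counts add up to the total.

```python
released = 1_150
total_reviewed = 1_187
referred = 37  # sent on to the Sexually Violent Predator Subcommittee

# The parts should add up to the whole; if not, something is missing.
assert released + referred == total_reviewed

referral_rate = referred / total_reviewed
print(f"{referral_rate:.1%}")  # 3.1%
```

    Only about 3% of the sex offenders leaving prison in 2010 were even referred for possible civil commitment, and fewer still were committed.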

    This data was accompanied by a vivid account of a woman who was raped after a jury released one of these high-risk sex offenders instead of placing him in civil commitment.

    • http://twitter.com/davidherzog David Herzog

      Great find, a nice example of blending data work into a story to shine a light on a little-known program.

  • Kip Hill

    I read “Fewer Homicides in New York City on Rainy Days, Analysis Shows” in The New York Times. The headline pretty much tells you everything you need to know. Using homicide statistics for 2003 to 2008 from the NYPD and weather reports from the National Weather Service, Times reporters determined that the homicide rate was lower on rainy days than on days with good weather. This effect, they argue, is amplified in the summer: across the whole year, the average number of homicides over 10 days drops by only about 3 on rainy days, from 17 to 14, but during the summer the figure for ten good-weather Saturdays starts at a much higher 24 and drops to around 16 after an inch of rain.

    The reporters even analyze the amount of rain against the homicide rate. Using the weather reports, they determine that between half an inch and a full inch of rain causes the rate to rise from 14 to 15, and that with less than half an inch of rain the difference in the homicide rate is negligible. The reporters temper the predictive power of these statistics by noting that only 1 in 20 summer days has half an inch of rain or more, which points to the statistical significance of the ongoing anomaly in 2009.
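    The core of this kind of analysis is a join of two data sets on date, followed by averages for each weather bucket. The sketch below assumes the join is already done and uses invented daily records (not NYPD or National Weather Service data); the half-inch cutoff mirrors the one discussed above.

```python
from statistics import mean

# Hypothetical joined daily records: (rain_inches, homicides). Invented numbers.
days = [
    (0.0, 2), (0.0, 1), (0.1, 2), (0.0, 3), (0.7, 0),
    (0.0, 2), (1.2, 1), (0.3, 2), (0.0, 1), (0.6, 1),
]

def mean_homicides(records, min_rain, max_rain=float("inf")):
    """Average homicides per day for days with min_rain <= rain < max_rain."""
    values = [h for rain, h in records if min_rain <= rain < max_rain]
    return mean(values) if values else None

dry = mean_homicides(days, 0.0, 0.5)  # under half an inch
wet = mean_homicides(days, 0.5)       # half an inch or more

print(dry, wet)
```

    A gap in averages like this does not by itself show significance or causation; a real analysis would also compare days within the same season, since summer days are both more violent and, as the story notes, rarely very rainy.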

    The article is weakest in terms of persuasive value where it strays from statistical backing, but the anecdotal observations of actual NYPD detectives provide the most interesting and memorable details of the story, emphasizing the need to supplement statistics with a “human face.” For example, the story asserts that homicides are more difficult to solve in the rain but never provides the statistical backing that must exist in the data to support that claim (surely NYPD reports distinguish solved homicides from open cases). Instead, it uses the observations of a retired detective and his recollection of an unsolved shooting during a downpour to argue that, logically, homicides during a deluge are harder to solve.

    You can’t recreate the quote “Everybody’s out partying, people start drinking, old beefs pop up, and people get their beer muscles out and start fighting” using stats. It’s just not possible.

    • http://twitter.com/davidherzog David Herzog

      This is a good example of how you can find a story by cross-referencing different data sets. The Times also has looked at the relationship between heat and homicides. The lead reporter, Andy Lehren, got his master’s degree here.

      • http://www.facebook.com/profile.php?id=1300560063 Kip Hill

        Mr. Lehren did some excellent work on religious nonprofits with Diana Henriques as well.
