Monday, August 1, 2011

Data journalism at the Guardian: what is it and how do we do it?

Data journalism: what is it and how is it changing? Photograph: Alamy

Here's an interesting thing: data journalism is becoming part of the establishment. Not in an Oxbridge elite kind of way (although here's some data on that) but in the way it is becoming the industry standard.

Two years ago, when we launched the Datablog, all this was new. People still asked if getting stories from data was really journalism and not everyone had seen Adrian Holovaty's riposte. But once you've had MPs' expenses and WikiLeaks, the startling thing is that no one asks those questions anymore. Instead, they want to know, "how do we do it?"

Meanwhile, every day brings new and more innovative journalists into the field, and with them new skills and techniques. So not only is data journalism changing in itself, it's changing journalism too.

These are some of the threads from my recent talks that I thought it would be good to put in one place - especially now that we've got an honourable mention in the Knight-Batten awards for journalistic innovation. This is how we do it at the Guardian, in 10 brief points.

Florence Nightingale's 'coxcomb' diagram on mortality in the army

Data journalism has been around as long as there's been data - certainly at least since Florence Nightingale's famous graphics in her 1858 report into the conditions faced by British soldiers. The Guardian's first ever edition was dominated by a large (leaked) table listing every school in Manchester, its costs and pupil numbers.

The big difference? Data was published in books, very expensive books in which graphics were referred to as 'figures'. Now we have spreadsheets and files formatted for computers. Which means we can make the computers ask the questions.

But now statistics have become democratised, no longer the preserve of the few but of everyone who has a spreadsheet package on their laptop, desktop, or even their mobile or tablet. Anyone can take on a fearsome set of data now and wrangle it into shape. Of course, they may not be right, but now you can easily find someone to help you. We are not wandering alone any more.
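
To make that concrete, here's a minimal sketch of "making the computer ask the questions", written in Python with the pandas library (one of many tools that would do the job). The file and column names are invented for illustration, loosely echoing that first Manchester schools table.

    # A hypothetical table of schools: name, pupil numbers and annual cost.
    # The file and column names are made up for this sketch.
    import pandas as pd

    schools = pd.read_csv("manchester_schools.csv")

    # Question 1: which schools spend the most per pupil?
    schools["cost_per_pupil"] = schools["annual_cost"] / schools["pupils"]
    print(schools.sort_values("cost_per_pupil", ascending=False).head(10))

    # Question 2: what share of total spending goes to the ten biggest schools?
    top10 = schools.nlargest(10, "pupils")
    print(top10["annual_cost"].sum() / schools["annual_cost"].sum())

The point is less the code than the habit: every line is a question put to the data.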

Straight Statistics will give you a thousand examples of journalists taking those numbers and running with them in completely the wrong direction, but you don't have to go too far to find decent data journalism taking place. Even if it's not woven into the fabric of many of the oldest newspapers and news organisations, there are plenty of agile independent groups - see ProPublica, Wheredoesmymoneygo? and the Sunlight Foundation - who know what they're doing. Data journalism is all about diverse sources.

At the Guardian, being part of the news process means that we're part of the news desk (news organisations are obsessed with internal geography), go to the key news meetings and try to make sure that data is part of editorial debate.

Sometimes. There's now so much data out there in the world that we try to provide the key facts for each story - and finding the right information can be as much of a lengthy journalistic task as finding the right interviewee for an article. We've started providing searches into world government data and international development data.


The datasets are getting massive - 391,000 records in WikiLeaks' Iraq release, millions in the Treasury's COINS database. The indices of multiple deprivation, which are how the government measures poverty across England, run to 32,482 records. Increasingly, government data comes in big packages about tiny things. Making that data more accessible and easier to do stuff with has become part of the data journalism process.
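
As a hedged sketch of what "making it more accessible" can mean in practice: boil tens of thousands of rows down to a table a reader can actually scan. The file and column names below are hypothetical stand-ins for the deprivation data.

    # Summarise a 32,482-row area-level file into one figure per local authority.
    # File and column names are hypothetical.
    import pandas as pd

    imd = pd.read_csv("indices_of_deprivation.csv")

    by_authority = (imd.groupby("local_authority")["deprivation_score"]
                        .mean()
                        .sort_values(ascending=False))

    print(by_authority.head(10))                          # the ten most deprived authorities
    by_authority.to_csv("deprivation_by_authority.csv")   # a file small enough to publish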

It just is. We spend hours making datasets work, reformatting PDFs, mashing datasets together. You can see from this Prezi how much we go through before we get the data to you. Mostly, we act as the bridge between the data (and those who are pretty much hopeless at explaining it) and the people out there in the real world who want to understand what that story is really about.
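
The "mashing datasets together" step usually amounts to joining tables on a shared key and deriving the figure readers actually care about. Here's a sketch under invented names; the PDF wrangling that often comes first needs separate extraction tools and isn't shown.

    # Join two hypothetical tables on a shared area code and compute a per-head figure.
    import pandas as pd

    spending = pd.read_csv("council_spending.csv")      # columns: area_code, name, spend
    population = pd.read_csv("council_population.csv")  # columns: area_code, population

    merged = spending.merge(population, on="area_code", how="inner")
    merged["spend_per_head"] = merged["spend"] / merged["population"]

    # The number the story needs: who spends most per person?
    print(merged.sort_values("spend_per_head", ascending=False)
                [["name", "spend_per_head"]].head(10))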

Traditionally, some of the worst data journalism involved spending weeks on a single dataset, noodling around and eventually producing something mildly diverting. Some of the best involves weeks of investigative data management before coming up with incredible scoops. But increasingly there's a new short-form of data journalism, which is about swiftly finding the key data, analysing it and guiding readers through it while the story is still in the news. The trick is to produce these news data analyses, using the tech we have, as quickly as we can. And still get it right.

That's made easier by the free tools we use, such as Google Fusion Tables, Many Eyes, Google Charts or Timetric - and you can see some of the stuff our users have produced and posted on our Flickr group.
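
Those are point-and-click web tools rather than code, but the underlying idea is the same; as a rough stand-in, here's how the simplest kind of chart looks when scripted in Python with matplotlib, using made-up figures.

    # A basic bar chart with illustrative, made-up numbers.
    import matplotlib.pyplot as plt

    departments = ["Health", "Education", "Defence", "Transport"]
    spend_bn = [106.0, 89.0, 40.0, 22.0]   # illustrative only, not real figures

    plt.bar(departments, spend_bn)
    plt.ylabel("Spending (£bn)")
    plt.title("Departmental spending (illustrative)")
    plt.tight_layout()
    plt.savefig("spending.png")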

Good design still really matters. Graphics like this guide to the senior civil service (designed by Guardian graphic artist Jenny Ridley), or this guide to who knows who in the News of the World phone-hacking affair (produced by journalist James Ball and designer Paul Scruton), work because they're designed not by machine but by humans who understand the issues involved.

Civil service map: interactive graphic by Jenny Ridley. Photograph: Guardian

You can become a top coder if you want. But the bigger task is to think about the data like a journalist, rather than an analyst. What's interesting about these numbers? What's new? What would happen if I mashed it up with something else? Answering those questions is more important than anything else.

Interactive guide to Nato attacks on Libya

This stuff works best when it's a combination of both. This guide to Nato operations in Libya is dynamically fed from a spreadsheet, which updates from the Nato daily action briefing. It looks good because it's been well-designed; it works because it's easy to update every day.
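
The exact plumbing behind that interactive isn't shown here, but the general pattern is straightforward: publish the spreadsheet as a machine-readable feed and have the page re-read it on each update. A hedged sketch in Python, with a placeholder URL rather than the real feed:

    # Re-read a published spreadsheet feed so the graphic stays current.
    # The URL is a placeholder, not the Guardian's actual feed.
    import pandas as pd

    FEED_URL = "https://docs.google.com/spreadsheets/d/EXAMPLE_KEY/export?format=csv"

    def load_latest():
        """Fetch the current state of the published sheet as a DataFrame."""
        return pd.read_csv(FEED_URL)

    if __name__ == "__main__":
        strikes = load_latest()
        print(strikes.tail())   # the most recently added rows

Because the page rebuilds from the feed, updating the spreadsheet each morning is all it takes to keep the graphic in step with the daily briefing.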

Data journalism is not graphics and visualisations. It's about telling the story in the best way possible. Sometimes that will be a visualisation or a map (see the work of David McCandless or Jonathan Stray).

But sometimes it's a news story. Sometimes, just publishing the number is enough.

If data journalism is about anything, it's the flexibility to search for new ways of storytelling. And more and more reporters are realising that. Suddenly, we have company - and competition. So being a data journalist is no longer unusual.

It's just journalism.

Simon Rogers edits the Guardian Datastore and Datablog (@smfrogers, @datastore)

Tell us in the comment field below

Data journalism and data visualisations from the Guardian

• Search the world's government data with our gateway

• Search the world's global development data with our gateway

• Please post your visualisations and mash-ups on our Flickr group
• Contact us at data@guardian.co.uk

• Get the A-Z of data
• More at the Datastore directory

• Follow us on Twitter
• Like us on Facebook


