Making the most of big data

Big data is coming, and with it the potential for publishers to reconstitute the meaning of quality journalism.


The Guardian’s data map shows local life expectancy mapped against London’s tube stations

The Pew Research Centre recently released a study showing that while there is a great deal of anticipation about the Age of Big Data, there is also a great deal of trepidation.

Among other questions, the survey asks: Who will control all this data? How and where will it be stored? Will the potential positives outweigh the negatives, such as loss of privacy? There are plenty of reasons for concern about how massive volumes of data will be treated.

And as a result, there is a key role for publishers to play – but it relies on publishers getting smarter about how they work within this evolving data landscape.


So firstly, what is Big Data? In essence, it’s the “gathering, analysis and application of enormous amounts of information generated by computers and other technology”. Of course, it is human usage of computers and other technology that generates this data. The biggest outstanding question, from a practical perspective, is how efficiently it can be analysed.

The other key point is that, for now, most of the data collection is about consumer behaviour. While that has obvious value for publishing sales teams, it has the potential to contribute to a consumer backlash. Many respondents to the Pew survey were ambivalent, at best, about the collection of more and more information about their habits, describing it as “creepy”.

For publishers, parsing the ever-increasing volume of data presents a big challenge and an equally big opportunity. Facebook is one example of a platform that uses high volumes of consumer data to present more relevant pathways and information sets to its users, both individuals and advertising customers. Google, of course, does the same.

Publishers need to take a leaf from the tech giants’ book and start building data analysis capability.

And from a journalistic perspective, there is an ever-increasing amount of freely available data that can be used for the public good. Combined with FOI requests and specialist commercial datasets, it is giving rise to powerful new forms of storytelling around the world.

Wikileaks should have opened everyone’s eyes to the power of databases, but in many ways that example was not new. What is new is making data visual and then generating stories that would otherwise be hidden.

The Guardian has been working in this field for a couple of years now. Over that time, the types of stories it has produced and the kinds of datasets it has been able to access have evolved. The news organisation claims that “not only is data journalism changing in itself, it’s changing journalism too”.

The Guardian’s data visualisation blog lists recent examples, one of which is a map showing local life expectancy data mapped against London’s tube stations.

Another recent project shows real-time reactions to the London Olympics, taking Twitter feeds across all events and presenting them in a unique application designed to run on web and Android.

The tools, software applications and datasets are integral, but as The Guardian’s data editor Simon Rogers writes, “data journalism is not graphics and visualisations. It’s about telling the story in the best way possible”.

Across the Atlantic, The Sacramento Bee has also been publishing searchable data and data-driven graphics for several years. It has combined one-off data and visualisations with longitudinal pieces updated over time (e.g. tracking unemployment by region) to produce an archive of rich material.

In Australia, however, we’re still playing catch-up. There are some promising signs, with new data-related appointments in Fairfax newsrooms, but it’s not yet enough to call it a trend. As the tools and sources for data-driven journalism become more prevalent in Australia, publishers have to start thinking harder about how they can take advantage of the new age of data to deliver smarter, more engaging journalism.

Pew Research Centre report:

Hugh Martin is CEO of Crown Content, publisher of Margaret Gee’s Media Guide.
