Delivering Data


Once you’ve had a good look at your data and decided that there’s something interesting to write about, how can you deliver it to the public? This section opens with short anecdotes about how leading data journalists have served their data up to their readers—​from infographics to open data platforms to download links. Then we take a more extended look at how to build news apps, and the ins and outs of data visualization. Finally, we take a look at what you can do to engage your audience around your project.

Presenting Data to the Public

There are lots of different ways to present your data to the public—​from publishing raw datasets with stories, to creating beautiful visualizations and interactive web applications. We asked leading data journalists for tips on how to present data to the public.

To Visualize or Not to Visualize?

There are times when data can tell a story better than words or photos, and this is why terms like "news application" and "data visualization" have attained buzzword status in so many newsrooms of late. Also fueling interest is the bumper crop of (often free) new tools and technologies designed to help even the most technically challenged journalist turn data into a piece of visual storytelling.

Tools like Google Fusion Tables, Many Eyes, Tableau, Dipity, and others make it easier than ever to create maps, charts, graphs, or even full-blown data applications that heretofore were the domain of specialists. But with the barrier to entry now barely a speed bump, the question facing journalists is less whether you can turn your dataset into a visualization than whether you should. Bad data visualization is worse in many respects than none at all.

Aron Pilhofer, New York Times

Using Motion Graphics

With a tight script, well-timed animations, and clear explanations, motion graphics can serve to bring complex numbers or ideas to life, guiding your audience through the story. Hans Rosling’s video lectures are a good example of how data can come to life to tell a story on the screen. Whether or not you agree with their methodology, I also think the Economist’s Shoe-throwers' index is a good example of using video to tell a numbers-based story. You wouldn’t, or shouldn’t, present this graphic as a static image. There’s far too much going on. But having built up to it step by step, you’re left with an understanding of how and why they got to this index. With motion graphics and animated shorts, you can reinforce what your audience is hearing. A voice-over with explanatory visuals provides a very powerful and memorable way of telling a story.

Lulu Pinney, freelance infographic designer

Telling the World

Our workflow usually starts in Excel. It is such an easy way to quickly work out if there’s something interesting in the data. If we have a sense that there is something in it, then we go to the news desk. We’re really lucky as we sit right next to the main news desk at the Guardian. Then we look at how we should visualize it or show it on the page. Then we write the post that goes with it. When I’m writing I usually have a cut-down version of the spreadsheet next to the text editor. Often I’ll do bits of analysis while I’m writing to pick out interesting things. Then I’ll publish the post and spend a bit of time Tweeting about it, writing to different people, and making sure that it is linked to from all the right places.

Half of the traffic to some of our posts comes from Twitter and Facebook. We’re pretty proud that the average amount of time spent on a Datablog article is 6 minutes, compared to an average of 1 minute for the rest of the Guardian website. Six minutes is a pretty good number, and time spent on the page is one of the key metrics we use when analyzing our traffic.

This also helps to convince our colleagues about the value of what we’re doing. That, and the big data-driven stories that we’ve worked on that everyone else in the newsroom knows: COINS, WikiLeaks, and the UK riots. For the COINS spending data, we had five or six specialist reporters at the Guardian working to give their views about the data when it was released by the UK government. We had another team of five or six when the government released its data on spending over £25,000—​including well-known reporters like Polly Curtis. WikiLeaks was also obviously very big, with lots of stories about Iraq and Afghanistan. The riots were also pretty big, with over 550,000 hits in two days.

But it is not just about the short term hits: it is also about being a reliable source of useful information. We try to be the place where you can get good, meaningful information on topics that we cover.

Simon Rogers, the Guardian

Publishing the Data

We often embed our data on our site in a visualization, and in a form that allows for easy download of the dataset. Our readers can explore the data behind the stories by interacting with the visualization, or use the data themselves in other ways. Why is this important? It increases the transparency of The Seattle Times. We are showing the readers the same data that we used to draw powerful conclusions. And who uses it? Our critics for sure, as well as those just interested in the story and all of its ramifications. By making the data available, we can also enlist tips from these same critics and general readers on what we may have missed and what more we could explore—​all valuable in the pursuit of journalism that matters.

Cheryl Phillips, The Seattle Times

Opening Up Your Data

Giving news consumers easy access to the data we use for our work is the right thing to do for several reasons. Readers can assure themselves that we aren’t torturing the data to reach unfair conclusions. Opening up our data is in the social science tradition of allowing researchers to replicate our work. Encouraging readers to study the data can generate tips that may lead to follow-up stories. Finally, engaged readers interested in your data are likely to return again and again.

Steve Doig, Walter Cronkite School of Journalism, Arizona State University

Starting an Open Data Platform

At La Nación, publishing open data is an integral part of our data journalistic activities. In Argentina there is no Freedom of Information Act and no national data portal, so we feel strongly about providing our readers with access to the data that we use in our stories.

Hence we publish raw structured data through our integrated Junar platform as well as in Google Spreadsheets. We explicitly enable and encourage others to reuse our data, and we explain a bit about how to do this with documentation and video tutorials.

Furthermore, we’re presenting some of these datasets and visualizations in our Nación Data blog. We’re doing this in order to evangelize about data and data publishing tools in Argentina, and show others how we gathered our data, how we use it, and how they can reuse it.

Since we opened the platform in February 2012, we’ve received suggestions and ideas for datasets, mostly from academics and researchers, as well as students from universities who are very thankful every time we reply with a solution or a specific dataset. People are also engaging with and commenting on our data on Tableau, and several times we have been the most commented and top viewed item on the service. In 2011, we had 7 out of the top 100 most viewed visualizations.

Angélica Peralta Ramos, La Nación (Argentina)

Making Data Human

As the discussion around big data bounds into the broader consciousness, one important part has been conspicuously missing—​the human element. While many of us think about data as disassociated, free-floating numbers, they are in fact measurements of tangible (and very often human) things. Data are tethered to the real lives of real people, and when we engage with the numbers, we must consider the real-world systems from which they came.

Take, for example, location data, which is being collected right now on hundreds of millions of phones and mobile devices. It’s easy to think of these data (numbers that represent latitude, longitude, and time) as "digital exhaust," but they are in fact distilled moments from our personal narratives. While they may seem dry and clinical when read in a spreadsheet, when we allow people to put their own data on a map and replay them, they experience a kind of memory replay that is powerful and human.

At the moment, location data is used by a lot of third parties—​application developers, big brands, and advertisers. While the second parties (telecoms and device managers) own and hold the data, the first party in this equation—​you—​has neither access to nor control over this information. At the NYTimes R&D group, we have launched a prototype project called OpenPaths to both allow the public to explore their own location data and to experience the concept of data ownership. After all, people should have control of these numbers that are so closely connected to their own lives and experiences.

Journalists have a very important role in bringing this inherent humanity of data to light. By doing so, they have the power to change public understanding—​both of data and of the systems from which the numbers emerged.

Jer Thorp, Data Artist in Residence: New York Times R&D Group

Open Data, Open Source, Open News

2012 may well be the year of open news. It’s at the heart of our editorial ideology and a key message in our current branding. Amidst all this, it’s clear that we need an open process for data-driven journalism. This process must not only be fuelled by open data, but also be enabled by open tools. By the end of the year, we hope to be able to accompany every visualization we publish with access to both the data behind it and the code that powers it.

Many of the tools used in visualization today are closed source. Others come with restrictive licenses that prohibit the use of derivative data. The open source libraries that do exist often solve a single problem well but fail to offer a wider methodology. Altogether, this makes it difficult for people to build on each other’s work. It closes down conversations rather than opening them up. To this end, we are developing a stack of open tools for interactive storytelling—​the Miso Project (@themisoproject).

We are discussing this work with a number of other media organizations. It takes community engagement to realize the full potential of open source software. If we’re successful, it will introduce a fundamentally different dynamic with our readers. Contributions can move beyond commenting to forking our work, fixing bugs, or reusing data in unexpected ways.

Alastair Dant, the Guardian

In the past few years, I’ve worked with a few gigabytes of data for projects or articles, from scans of typewritten tables from the 1960s to the 1.5 gigabytes of cables released by WikiLeaks. It’s always been hard to convince editors to systematically publish source data in an open and accessible format. Bypassing the problem, I added "Download the Data" links within articles, pointing to the archives containing the files or the relevant Google docs. The interest from potential reusers was in line with what we see in government-sponsored programs (i.e., very, very low). However, the few instances of reuse provided new insights or spurred conversations that are well worth the few extra minutes per project!

Nicolas Kayser-Bril, Journalism++

Know Your Scope

Know your scope. There’s a big difference between hacking for fun and engineering for scale and performance. Make sure you’ve partnered with people who have the appropriate skill set for your project. Don’t forget design. Usability, user experience, and presentation design can greatly affect the success of your project.

Chrys Wu, Hacks/Hackers

How to Build a News App

News applications are windows into the data behind a story. They might be searchable databases, sleek visualizations, or something else altogether. But no matter what form they take, news apps encourage readers to interact with data in a context that is meaningful to them: looking up crime trends in their area, checking the safety records of their local doctor, or searching political contributions to their candidate of choice.

More than just high-tech infographics, the best news apps are durable products. They live outside the news cycle, often by helping readers solve real-world problems, or answering questions in such a useful or novel way that they become enduring resources. When journalists at ProPublica wanted to explore the safety of American kidney dialysis clinics, they built an application that helped users check whether their hometown facility was safe. Providing such an important and relevant service creates a relationship with users that reaches far beyond what a narrative story can do alone.

Therein lies both the challenge and the promise of building cutting-edge news apps: creating something of lasting value. Whether you are a developer or a manager, any discussion about how to build a great news app should start with a product development mentality: keep a laser focus on the user, and work to get the most bang for your buck. So before you start building, it helps to ask yourself three questions, discussed in the following sections.

Figure 01. Dialysis Facility Tracker (ProPublica)

Who Is My Audience and What Are Their Needs?

News apps don’t serve the story for its own sake—​they serve the user. Depending on the project, that user might be a dialysis patient who wants to know about the safety record of her clinic, or even a homeowner unaware of earthquake hazards near his home. No matter who it is, any discussion about building a news app, like any good product, should start with the people who are going to use it.

A single app might serve many users. For instance, a project called Curbwise, built by the Omaha (Nebraska) World-Herald, serves homeowners who believe they are being overtaxed; curious residents who are interested in nearby property values; and real estate workers trying to keep track of recent sales. In each of those cases, the app meets a specific need that keeps users coming back.

Homeowners, for instance, might need help gathering information on nearby properties so they can argue that their taxes are unfairly high. Pulling together that information is time-consuming and complicated, a problem Curbwise solves for its users by compiling a user-friendly report of all the information they need to challenge their property taxes to local authorities. Curbwise sells that report for $20, and people pay for it because it solves a real problem in their lives.

Whether your app solves a real-world problem like Curbwise or supplements the narrative of a story with an interesting visualization, always be aware of the people who will be using it. Concentrate on designing and building features based on their needs.

How Much Time Should I Spend on This?

Developers in the newsroom are like water in the desert: highly sought-after and in short supply. Building news apps means balancing the daily needs of a newsroom against the long-term commitments it takes to build truly great products.

Say your editor comes to you with an idea: the City Council is set to have a vote next week about whether to demolish several historic properties in your town. He suggests building a simple application that allows users to see the buildings on a map.

As a developer, you have a few options. You can flex your engineering muscle by building a gorgeous map using custom software. Or you can use existing tools like Google Fusion Tables or open source mapping libraries and finish the job in a couple hours. The first option will give you a better app; but the second might give you more time to build something else with a better chance of having a lasting impact.

Just because a story lends itself to a complex, beautiful news app doesn’t mean you need to build one. Balancing priorities is critical. The trick is to remember that every app you build comes at a cost: namely, another potentially more impactful app you could have been working on instead.

How Can I Take Things to the Next Level?

Building high-end news apps can be time-consuming and expensive. That’s why it always pays to ask about the payoff. How do you elevate a one-hit wonder into something special?

Creating an enduring project that transcends the news cycle is one way. But so is building a tool that saves you time down the road (and open sourcing it!), or applying advanced analytics to your app to learn more about your audience.

Lots of organizations build Census maps to show demographic shifts in their cities. But when the Chicago Tribune news apps team built theirs, they took things to the next level by developing tools and techniques to build those maps quickly, which they then made available for other organizations to use.

At my employer, the Center for Investigative Reporting, we coupled a simple searchable database with a fine-grained event tracking framework that allowed us to learn, among other things, how much users value serendipity and exploration in our news apps.

At the risk of sounding like a bean-counter, always think about return on investment. Solve a generic problem; create a new way to engage users; open source parts of your work; use analytics to learn more about your users; or even find cases like Curbwise where part of your app might generate revenue.

Wrapping Up

News application development has come a long way in a very short time. News Apps 1.0 were a lot like Infographics 2.0—​interactive data visualizations, mixed with searchable databases, designed primarily to advance the narrative of the story. Now, many of those apps can be designed by reporters on deadline using open source tools, freeing up developers to think bigger thoughts.

News Apps 2.0, where the industry is headed, is about combining the storytelling and public service strengths of journalism with the product development discipline and expertise of the technology world. The result, no doubt, will be an explosion of innovation around ways to make data relevant, interesting, and especially useful to our audience—​and at the same time, hopefully helping journalism do the same.

Chase Davis, Center for Investigative Reporting

News Apps at ProPublica

A news application is a big interactive database that tells a news story. Think of it like you would any other piece of journalism. It just uses software instead of words and pictures.

By showing each reader data that is specific to them, a news app can help each reader understand a story in a way that’s personally meaningful to them. It can help a reader understand their personal connection to a broad national phenomenon, and help them attach what they know to what they don’t know, and thereby encourage a deep understanding of abstract concepts.

We tend to build news apps when we have a dataset (or think we can acquire a dataset) that is national in scope yet granular enough to expose meaningful details.

A news app should tell a story, and just like any good news story, it needs a headline, a byline, a lead, and a nut graph. Some of these concepts can be hard to distinguish in a piece of interactive software, but they’re there if you look closely.

Also, a news app should be generative, meaning it should generate more stories and more reporting. ProPublica’s best apps have been used as the basis for local stories.

For instance, take our Dollars for Docs news app. It tracked, for the first time, millions of dollars of payments by drug companies to doctors, for consulting, speaking, and so on. The news app we built lets readers look up their own doctor and see the payments they’ve received. Reporters at other news organizations also used the data. More than 125 local news organizations, including the Boston Globe, Chicago Tribune, and the St. Louis Post-Dispatch did investigative stories on local doctors based on Dollars for Docs data.

A few of these local stories were the result of formal partnerships, but the majority were done quite independently—​in some cases, we didn’t have much, if any, knowledge that the story was being worked on until it came out. As a small but national news organization, this kind of thing is crucial for us. We can’t have local knowledge in 125 cities, but if our data can help reporters who have local knowledge tell stories with impact, we’re fulfilling our mission.

One of my favorite news apps is the Los Angeles Times’s Mapping L.A., which started out as a crowdsourced map of Los Angeles’s many neighborhoods, which until Mapping L.A. launched had no independent, widely accepted set of boundaries. After the initial crowdsourcing project, the Times has been able to use neighborhoods as a framing device for great data reporting—​things like crime rate by neighborhood, school quality by neighborhood, and so on, which they wouldn’t have been able to do before. So not only is Mapping L.A. both broad and specific, it’s generative, and it tells people’s own stories.

The resources necessary to build a news app range pretty widely. The New York Times has dozens of people working on news apps and on interactive graphics. But Talking Points Memo made a cutting edge political poll tracker app with two staffers, neither of whom had computer science degrees.

Like most newsroom-based coders, we follow a modified Agile methodology to build our apps. We iterate quickly and show drafts to the other folks in the newsroom we’re working with. Most importantly, we work really closely with reporters and read their drafts—​even early ones. We work much more like reporters than like traditional programmers. In addition to writing code, we call sources, gather information, and build expertise. It would be pretty difficult to make a good news app using material we don’t understand.

Why should newsrooms be interested in producing data-driven news apps? Three reasons: It’s great journalism, it’s hugely popular—​ProPublica’s most popular features are news apps—​and if we don’t do it, somebody else will. Think of all the scoops we’d miss! Most importantly, newsrooms should know that they can all do this too. It’s easier than it looks.

Scott Klein, ProPublica

Visualization as the Workhorse of Data Journalism

Before you launch into trying to chart or map your data, take a minute to think about the many roles that static and interactive graphic elements play in your journalism.

In the reporting phase, visualizations can:

  • Help you identify themes and questions for the rest of your reporting

  • Identify outliers: good stories, or perhaps errors, in your data

  • Help you find typical examples

  • Show you holes in your reporting

Visualizations also play multiple roles in publishing. They can:

  • Illustrate a point made in a story in a more compelling way

  • Remove unnecessarily technical information from prose

  • Particularly when they are interactive and allow exploration, provide transparency about your reporting process to your readers

These roles suggest you should use visualizations early and often in your reporting, whether or not you start with electronic data or records. Don’t treat them as a separate step, something to be considered after the story is largely written. Let this work help guide your reporting.

Getting started sometimes means just putting the notes you’ve already taken in a visual form. Consider the Farm Subsidies Over Time graphic, which ran in the Washington Post in 2006.

Figure 02. Farm Subsidies Over Time (Washington Post)

It shows the portion of farm income associated with subsidies and key events over the past 45 years, and was built over a series of months. Finding data that could be used over time with similar definitions and similar meanings was a challenge. Investigating all of the peaks and troughs helped us keep context in mind as we did the rest of our reporting. It also meant that one chore was pretty much finished before the stories were written.

Here are some tips for using visualization to start exploring your datasets.

Tip 1: Use small multiples to quickly orient yourself in a large dataset

I used this technique at the Washington Post when we were looking into a tip that the George W. Bush administration was awarding grants on political, not substantive, grounds. Most of these aid programs are done by formula, and others have been funded for years, so we were curious whether we might see the pattern by looking at nearly 1,500 different discretionary streams.

I created a graph for each program, with the red dots indicating a presidential election year and the green dots indicating a congressional election year. The problem: yes, there was a spike in the six months before the presidential election in several of these programs—​the red dots with the peak numbers next to them—​but it was the wrong election year. The pattern consistently showed up during the 2000 presidential election between Al Gore and George W. Bush, not the 2004 election.

Figure 03. HHS Grants: sparklines help in story-spotting (Washington Post)

This was really easy to see in a series of graphs rather than a table of numbers, and an interactive form let us check various types of grants, regions and agencies. Maps in small multiples can be a way to show time and place on a static image that’s easy to compare—​sometimes even easier than an interactive.

This example was created with a short program written in PHP, but it’s now much easier to do with Excel 2007 and 2010’s sparklines. Edward Tufte, the visualization expert, invented these "intense, simple, word-like graphics" to convey information in a glance across a large dataset. You now see them everywhere, from the little graphs under stock market quotations to win-loss records in sports.
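
If you want to try the technique outside of Excel, a few lines of code are enough. The sketch below (in Python rather than the original PHP; the program names and numbers are made up for illustration) renders each series as one of Tufte’s "word-like" strings of block characters, so dozens of series can be scanned at a glance:

```python
# Render a list of numbers as a tiny sparkline string, so many series
# can be compared at a glance in a terminal or a monospaced table.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

# Hypothetical monthly grant totals for three programs (illustrative only).
programs = {
    "Program A": [4, 5, 6, 9, 14, 9, 6, 5, 4, 5, 6, 5],
    "Program B": [7, 7, 8, 7, 7, 8, 7, 7, 8, 7, 7, 8],
    "Program C": [2, 3, 2, 3, 2, 12, 3, 2, 3, 2, 3, 2],
}
for name, totals in programs.items():
    print(f"{name}: {sparkline(totals)}")
```

Printed side by side, the mid-year spike in "Program C" stands out immediately, in a way it never would in a table of raw numbers.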

Tip 2: Look at your data upside down and sideways

When you’re trying to understand a story or a dataset, there’s no wrong way to look at it; try every way you can think of, and you’ll get a different perspective each time. If you’re reporting on crime, one set of charts might show the change in violent crimes in a year; another might show the percent change; another, a comparison to other cities; and another, a change over time. Use raw numbers, percentages, and indexes.

Look at them on different scales. Try following the rule that the x-axis must be zero. Then break that rule and see if you learn more. Try out logarithms and square roots for data with odd distributions.
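
As a quick illustration of what a change of scale can reveal, here is a minimal sketch (the numbers are invented, e.g. donation amounts) printing the same skewed series three ways; on a log scale, multiplicative growth turns into even steps and the largest value stops dominating:

```python
import math

# A hypothetical right-skewed series, e.g. campaign donations in dollars.
values = [120, 340, 560, 1200, 4500, 18000, 95000]

# The same data raw, log-transformed, and square-rooted: each transform
# compresses the top of the range differently, exposing different structure.
for v in values:
    print(f"raw={v:>7,}  log10={math.log10(v):5.2f}  sqrt={math.sqrt(v):7.1f}")
```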

Keep in mind the research done on visual perception. William Cleveland’s experiments showed that the eye sees change in an image when the average slope is about 45 degrees. This suggests ignoring the admonition to always start at zero and instead working toward the most insightful graphic. Other research in epidemiology has suggested finding a target level as a boundary for your chart. Each of these approaches helps you see the data differently. When they’ve stopped telling you anything new, you know you’re done.

Tip 3: Don’t assume

Now that you’ve looked at your data a variety of ways, you’ve probably found records that don’t seem right—​you may not understand what they meant in the first place, or there are some outliers that seem like they are typos, or there are trends that seem backwards.

If you want to publish anything based on your early exploration, whether in a story or a finished visualization, you have to resolve these questions, and you can’t make assumptions. The anomalies are either interesting stories or mistakes; either interesting challenges to common wisdom or misunderstandings.

It’s not unusual for local governments to provide spreadsheets filled with errors, and it’s also easy to misunderstand government jargon in a dataset.

First, walk back your own work. Have you read the documentation and its caveats? Does the problem exist in the original version of the data? If everything on your end seems right, then it’s time to pick up the phone. You’re going to have to get it resolved if you plan to use it, so you might as well get started now.

That said, not every mistake is important. In campaign finance records, it’s common to find several hundred postal codes that don’t exist in a database of 100,000 records. As long as they’re not all in the same city or attached to a single candidate, the occasional bad record just doesn’t matter.
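
A quick check of whether bad records are spread out or clustered can be automated. This is a sketch with invented field names and toy data, where `None` stands for a postal code that failed validation:

```python
from collections import Counter

# Hypothetical campaign-finance records: (candidate, postal_code) pairs,
# with None marking a code that failed validation.
records = [
    ("Smith", "90210"), ("Smith", None), ("Jones", "10001"),
    ("Jones", "10001"), ("Smith", "60614"), ("Jones", None),
]

bad = [candidate for candidate, code in records if code is None]
share = len(bad) / len(records)
by_candidate = Counter(bad)

print(f"{share:.0%} of records have bad postal codes")
print("bad records per candidate:", dict(by_candidate))
# A small share spread evenly across candidates is usually harmless;
# a cluster under one candidate deserves a phone call.
```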

The question to ask yourself is: if I were to use this, would readers have a fundamentally accurate view of what the data says?

Tip 4: Avoid obsessing over precision

The flip side of not asking enough questions is obsessing over precision before it matters. Your exploratory graphics should be generally correct, but don’t worry if you have various levels of rounding, if they don’t add up to exactly 100 percent or if you are missing one or two years' data out of 20. This is part of the exploration process. You’ll still see the big trends and know what you have to collect before it’s time for publication.

In fact, you might consider taking away labeling and scale markers, much like the charts above, to get an even better overall sense of the data.

Tip 5: Create chronologies of cases and events

At the start of any complex story, begin building chronologies of key events and cases. You can use Excel, a Word document, or a special tool like TimeFlow for the task, but at some point you will find a dataset you can layer behind. Reading through it periodically will show you what holes are in your reporting that have to be filled out.

Tip 6: Meet with your graphics department early and often

Brainstorm about possible graphics with the artists and designers in your newsroom. They will have good ways to look at your data, suggestions of how it might work interactively, and know how to connect data and stories. It will make your reporting much easier if you know what you have to collect early on, or if you can alert your team that a graphic isn’t possible when you can’t collect it.

Tips For Publication

You might have spent only a few days or few hours on your exploration, or your story might have taken months to report. But as it becomes time to move to publication, two aspects become more important.

Remember that missing year you had in your early exploration? All of a sudden, you can’t go any further without it. All of that bad data you ignored in your reporting? It’s going to come back to haunt you. The reason is that you can’t write around bad data. For a graphic, you either have everything you need or you don’t, and there’s no middle ground.

Match the effort of the data collection with the interactive graphic

There’s no hiding in an interactive graphic. If you are really going to have your readers explore the data any way they want, then every data element has to be what it claims to be. Users can find any error at any time, and it could haunt you for months or years. If you’re building your own database, it means you should expect to proofread, fact check, and copyedit the entire database. If you’re using government records, you should decide how much spot-checking you’ll do, and what you plan to do when you find the inevitable error.

Design for two types of readers

The graphic—​whether it’s a standalone interactive feature or a static visualization that goes with your story—​should satisfy two different kinds of readers. It should be easy to understand at a glance, but complex enough to offer something interesting to people who want to go further. If you make it interactive, make sure your readers get something more than a single number or name.

Convey one idea, then simplify

Make sure there is one single thing you want people to see. Decide on the overwhelming impression you want a reader to get, and make everything else disappear. In many cases, this means removing information even when the Internet allows you to provide everything. Unless your main purpose is transparency of reporting, most of the details you collected for your timeline and chronology just aren’t very important. In a static graphic, they will be intimidating. In an interactive graphic, they will be boring.

Sarah Cohen, Duke University

Using Visualizations to Tell Stories

Data visualization merits consideration for several reasons. Not only can it be strikingly beautiful and attention getting—​valuable social currency for sharing and attracting readers—​it also leverages a powerful cognitive advantage: fully half of the human brain is devoted to processing visual information. When you present a user with an information graphic, you are reaching them through the mind’s highest-bandwidth pathway. A well-designed data visualization can give viewers an immediate and profound impression, and cut through the clutter of a complex story to get right to the point.

But unlike other visual media—​such as still photography and video—​data visualization is also deeply rooted in measurable facts. While aesthetically engaging, it is less emotionally charged, more concerned with shedding light than heat. In an era of narrowly-focused media that is often tailored towards audiences with a particular point of view, data visualization (and data journalism in general) offers the tantalizing opportunity for storytelling that is above all driven by facts, not fanaticism.

Moreover, like other forms of narrative journalism, data visualization can be effective for both breaking news—​quickly imparting new information like the location of an accident and the number of casualties—​and for feature stories, where it can go deeper into a topic and offer a new perspective, to help you see something familiar in a completely new way.

Seeing the Familiar in a New Way

In fact, data visualization’s ability to test conventional wisdom is exemplified by an interactive graphic published by The New York Times in late 2009, a year after the global economic crisis began. With the United States' national unemployment rate hovering near 9 percent, users could filter the US population by various demographic and educational categories to see how dramatically rates varied. As it turned out, the rate ranged from less than 4% for middle-aged women with advanced degrees to nearly half for young black men who had not finished high school, and moreover this disparity was nothing new—​a fact underscored by fever lines showing the historic values for each of these groups.

06 GG 01
Figure 04. The Jobless Rate for People Like You (New York Times)

Even after you’ve stopped looking at it, a good data visualization gets into your head and leaves a lasting mental model of a fact, trend, or process. How many people saw the animation distributed by tsunami researchers in December 2004, which showed cascading waves radiating outward from an Indonesian earthquake across the Indian Ocean, threatening millions of coastal residents in South Asia and East Africa?

Data visualizations—​and the aesthetic associations they engender—​can even become cultural touchstones, such as the representation of deep political divisions in the United States after the 2000 and 2004 elections, when ``red'' Republican-held states filled the heartland and ``blue'' Democratic states clustered in the Northeast and far West. Never mind that in the US media before 2000, the main broadcast networks had freely switched between red and blue to represent each party, some even choosing to alternate every four years. Thus some Americans' memories of Ronald Reagan’s epic 49-state ``blue'' landslide victory for the Republicans in 1984.

But for every graphic that engenders a visual cliché, another comes along to provide powerful factual testimony, such as The New York Times' 2006 map that used differently sized circles to show where hundreds of thousands of evacuees from New Orleans were now living, strewn across the continent by a mixture of personal connections and relocation programs. Would these "stranded" evacuees ever make it back home?

So now that we’ve discussed the power of data visualization, it’s fair to ask: when should we use it, and when should we not use it? First, we’ll look at some examples of where data visualization might be useful to help tell a story to your readers.

Showing Change Over Time

Perhaps the most common use of data visualization—​as personified by the humble fever chart—​is to show how values have changed over time. The growth of China’s population since 1960 or the spike in unemployment since the economic crash of 2008 are good examples. But data visualization also can very powerfully show change over time through other graphic forms. The Portuguese researcher Pedro M. Cruz used animated circle charts to dramatically show the decline of western European empires since the early 19th century. Sized by total population, Britain, France, Spain, and Portugal pop like bubbles as overseas territories achieve independence. There go Mexico, Brazil, Australia, India, and wait for it…​there go many African colonies in the early sixties, nearly obliterating France.

A graph by the Wall Street Journal shows the number of months it took a hundred entrepreneurs to reach the magic number of $50 million in revenues. Created using the free charting and data analysis tool Tableau Public, the comparison resembles the trails of multiple airplanes taking off, some fast, some slow, some heavy, plotted over each other.

Speaking of airplanes, another interesting graph showing change over time plots the market share of major US airlines during several decades of industry consolidation. After the Carter administration deregulated passenger aviation, a slew of debt-financed acquisitions created national carriers out of smaller regional airlines, as this graphic by The New York Times illustrates.

06 GG 02 b
Figure 05. Converging Flight Paths (New York Times)

Given that almost all casual readers view the horizontal ``x'' axis of a chart as representing time, sometimes it’s easy to think that all visualizations should show change over time.

Comparing Values

06 GG 03
Figure 06. Counting the human cost of war (BBC)

However, data visualization also shines in the area of helping readers compare two or more discrete values. The BBC did this to put in context the tragic loss of servicemen and women in the Iraq and Afghan conflicts, comparing them to the scores of thousands killed in Vietnam and the millions who died in World War II in an animated slideshow accompanying their casualties database. National Geographic did it with a very minimalist chart of the relative odds of dying, showing how much more likely you were to die of heart disease (1 in 5) or stroke (1 in 24) than, say, airplane crashes (1 in 5,051) or a bee sting (1 in 56,789)—​all overshadowed by a huge arc representing the odds of dying overall: 1 in 1!

BBC, in collaboration with the agency Berg Design, also developed the website "Dimensions", which let you overlay the outlines of major world events—​the Deepwater Horizon oil spill or the Pakistan floods, for example—​over a Google map of your own community.

Showing Connections and Flows

France’s introduction of high-speed rail in 1981 didn’t literally make the country smaller, but a clever visual representation shows how much less time it now takes to reach different destinations than by conventional rail. A grid laid over the country appears square in the ``before'' image, but is squashed centrally towards Paris in the ``after'' one, showing not just that outbound destinations are "closer," but that the greatest time gains occur in the first part of the journey, before the trains reach unimproved tracks and have to slow down.

For comparisons between two separate variables, look at Ben Fry’s chart evaluating the performance of Major League Baseball teams relative to their payrolls. In the left column, the teams are ranked by their record to date, while on the right is the total of their player salaries. A line drawn in red (underperforming) or blue (overperforming) connects the two values, providing a handy sense of which team owners are regretting their expensive players gone bust. Moreover, scrubbing across a timeline provides a lively animation of that season’s ``pennant race'' to the finish.

06 GG 04
Figure 07. Salary vs. performance (Ben Fry)

Designing With Data

Similar in a way to graphing connections, flow diagrams also encode information into the connecting lines, usually by thickness and/or color. For example, with the Eurozone in crisis and several members incapable of meeting their debts, The New York Times sought to untangle the web of borrowing that tied EU members to their trading partners across the Atlantic and in Asia. In one ``state'' of the visualization, the width of the line reflects the amount of credit passing from one country to another, where a yellow to orange color ramp indicates how ``worrisome'' it is—​i.e., unlikely to be paid back!

On a happier topic, National Geographic magazine produced a deceptively simple chart showing the connections of three US cities—New York, Chicago and Los Angeles—​to major wine-producing regions, and how the transportation methods bringing product from each of the sources could result in drastically different carbon footprints, making Bordeaux a greener buy for New Yorkers than California wine, for example.

``SourceMap,'' a project started at MIT’s business school, uses flow diagrams to take a rigorous look at global procurement for manufactured products, their components, and raw materials. Thanks to a lot of heavy research, a user can now search for products ranging from Ecco brand shoes to orange juice and find out from what corners of the globe they were sourced, along with their corresponding carbon footprint.

Showing Hierarchy

In 1991, the researcher Ben Shneiderman invented a new visualization form called the "treemap," consisting of multiple boxes nested inside one another. The area of a given box corresponds to the quantity it represents, both in itself and as an aggregate of its contents. Whether visualizing a national budget by agency and subagency, the stock market by sector and company, or a programming language by classes and subclasses, the treemap is a compact and intuitive interface for mapping an entity and its constituent parts. Another effective format is the dendrogram, which looks like a more typical organization chart, where subcategories continue to branch off a single originating trunk.

06 GG 06
Figure 08. OpenSpending.org (Open Knowledge Foundation)
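The treemap’s core rule, area proportional to quantity aggregated up the hierarchy, can be sketched in a few lines of Python. The nested budget figures below are invented purely for illustration:

```python
# Hypothetical nested budget: agencies and their subagencies (amounts in millions).
budget = {
    "Health": {"Hospitals": 50, "Clinics": 30},
    "Education": {"Schools": 80, "Universities": 40},
}

def treemap_areas(tree, total_area=1.0):
    """Give each leaf an area proportional to its share of the grand total."""
    grand_total = sum(sum(sub.values()) for sub in tree.values())
    return {
        (agency, name): total_area * amount / grand_total
        for agency, sub in tree.items()
        for name, amount in sub.items()
    }

areas = treemap_areas(budget)
# "Schools" is 80 of 200 overall, so it fills 40% of the canvas,
# and the "Education" box aggregates to 60% of it.
```

A real treemap layout algorithm (such as Shneiderman’s slice-and-dice or the later squarified variant) then decides how to carve those areas into rectangles, but the proportionality rule above is what makes the display honest.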

Browsing Large Databases

While sometimes data visualization is very effective at taking familiar information and showing it in a whole new light, what happens when you have brand-new information that people want to navigate? The age of data brings with it startling new discoveries almost every day, from Eric Fischer’s brilliant geographic analyses of Flickr snapshots to New York City’s release of thousands of previously confidential teacher evaluations.

These datasets are at their most powerful when users can dig in and drill down to the information that is most relevant to them.

In early 2010, The New York Times was given access to Netflix’s normally private records of what areas rent which movies the most often. While Netflix declined to disclose raw numbers, The Times created an engaging interactive database that let users browse the top 100-ranked rentals in 12 US metro areas, broken down to the postal code level. A color-graded ``heatmap'' overlaid on each community enabled users to quickly scan and see where a particular title was most popular.

Toward the end of that same year, the Times published the results of the United States decennial census—just hours after it was released. The interface, built in Adobe Flash, offered a number of visualization options and allowed users to browse down to every single census block in the nation (out of 8.2 million) to see the distribution of residents by race, income, and education. Such was the resolution of the data that when looking through the dataset in the first hours after publication, you wondered if you might be the first person in the world to explore that corner of the database.

Similar laudable uses of visualization as a database front-end include the BBC’s investigation of traffic deaths, and many of the attempts to quickly index large data dumps like WikiLeaks' release of the Iraq and Afghanistan war logs.

The 65k Rule

Upon receiving the first dump of Afghan war log data from WikiLeaks, the team processing it started talking about how excited they were to have access to 65,000 military records.

This immediately set alarms ringing amongst those who had experience with Microsoft Excel. Thanks to an historic limitation in the way that rows are addressed, the Excel import tool won’t process more than 65,536 records. In this case, it emerged that a mere 25,000 rows were missing!

The moral of this story (aside from avoiding using Excel for such tasks) is to always be suspicious of anyone boasting about 65,000 rows of data.

Alastair Dant, the Guardian
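A quick row count outside of any spreadsheet is the cheapest defense against this kind of silent truncation. Here is a minimal Python sketch; the file name in the final comment is hypothetical:

```python
import csv

EXCEL_ROW_LIMIT = 65_536  # the classic .xls limit: 2^16 rows

def count_rows(path):
    """Count rows in a CSV without loading it into a spreadsheet."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f))

def looks_truncated(n_rows):
    # A count sitting at or suspiciously near the limit is a red flag.
    return n_rows >= EXCEL_ROW_LIMIT

# e.g. looks_truncated(count_rows("warlogs.csv"))
```

If the count lands exactly on a power-of-two boundary like this, assume the tool that produced the file dropped the rest until proven otherwise.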

06 GG 07
Figure 09. Every death on the road in Great Britain 1999-2010 (BBC)

Envisioning Alternate Outcomes

In The New York Times, Amanda Cox’s ``porcupine chart'' of tragically optimistic US deficit projections over the years shows how sometimes what happened is less interesting than what didn’t happen. Cox’s fever line showing the surging budget deficit after a decade of war and tax breaks shows how unrealistic expectations of the future can turn out to be.

06 GG 08
Figure 10. Budget Forecasts, Compared With Reality (New York Times)

Bret Victor, a longtime Apple interface designer (and originator of the "kill math" theory of visualization to communicate quantitative information), has prototyped a kind of reactive document. In his example, energy conservation ideas include editable premises, whereby a simple step like shutting off lights in empty rooms could save Americans the output of anywhere from 2 to 40 coal plants. Changing the percentage referenced in the middle of a paragraph of text causes the text in the rest of the page to update accordingly!

For more examples and suggestions, here is a list of different uses for visualizations, maps and interactive graphics compiled by Matthew Ericson of The New York Times.

When Not To Use Data Visualization

In the end, effective data visualization depends on good, clean, accurate, and meaningful information. Just as many good quotes, facts, and descriptions power good narrative journalism, data visualization is only as good as the data that fuels it.

When your story can be better told through text or multimedia

Sometimes the data alone does not tell the story in the most compelling way. While a simple chart illustrating a trend line or summary statistic can be useful, a narrative relating the real-world consequences of an issue can be more immediate and impactful to a reader.

When you have very few data points

It has been said, ``a number in isolation doesn’t mean anything.'' A common refrain from news editors in response to a cited statistic is, ``compared to what?'' Is the trend going up or down? What is normal?

When you have very little variability in your data, no clear trend, or conclusion

Sometimes you plot your data in Excel or a similar charting app and discover that the information is noisy, has a lot of fluctuation, or has a relatively flat trend. Do you raise the baseline from zero to just below the lowest value, in order to give the line some more shape? No! Sounds like you have ambiguous data and need to do more digging and analysis.

When a map is not a map

Sometimes the spatial element is not meaningful or compelling, or distracts attention from more pertinent numeric trends, like change over time or showing similarities between non-adjacent areas.

When a table would do

If you have relatively few data points but have information that might be of use to some of your readers, consider just laying out the data in tabular form. It’s clean, easy to read and doesn’t create unrealistic expectations of "story." In fact, tables can be a very efficient and elegant layout for basic information.

Geoff McGhee, Stanford University

Different Charts Tell Different Tales

In this digital world, with the promise of immersive 3D experiences, we tend to forget that for a long time we only had ink on paper. We now think of this static, flat medium as a second-class citizen, but in fact over the hundreds of years we’ve been writing and printing, we’ve accumulated an incredible wealth of knowledge and practices for representing data on the page. While interactive charts, data visualizations, and infographics are all the rage, they forgo many of the best practices we’ve learned. Only by looking back through the history of accomplished charts and graphs can we understand that bank of knowledge and bring it forward into new mediums.

Some of the most famous charts and graphs came out of the need to better explain dense tables of data. William Playfair was a Scottish polymath who lived in the late 1700s and early 1800s. He singlehandedly introduced the world to many of the same charts and graphs we still use today. In his 1786 book, Commercial and Political Atlas, Playfair introduced the bar chart to clearly show the import and export quantities of Scotland in a new and visual way.

He then went on to popularize the dreaded pie chart in his 1801 book Statistical Breviary. The need for these new forms of charts and graphs came out of commerce, but as time passed, others appeared and were used to save lives. In 1854 John Snow created his now famous "Cholera Map of London" by adding a small black bar over each address where an incident was reported. Over time, an obvious density of the outbreak could be seen and action taken to curb the problem.

As time passed, practitioners of these new charts and graphs got bolder and experimented further, pushing the medium toward what we know today. André-Michel Guerry was the first to publish the idea of a map where individual regions were different colors based on some variable. In 1829 he created the first choropleth by taking regions in France and shading them to represent crime levels. Today we see such maps used to show political polling regions, who voted for whom, wealth distribution, and many other geographically linked variables. It seems like such a simple idea, but even today, it is difficult to master and understand if not used wisely.

06 TT 01
Figure 11. An early bar chart (William Playfair)
06 TT 02
Figure 12. Cholera map of London (John Snow)
06 TT 03
Figure 13. Choropleth map of France showing crime levels (André-Michel Guerry)

There are many tools a good journalist needs to understand and have in their toolbox for constructing visualizations. Rather than jump right in at the deep end, an excellent grounding in charts and graphs is important. Everything you create needs to originate from a series of atomic charts and graphs. If you can master the basics, then you can move on to constructing more complex visualizations which are made up from these basic units.

Two of the most basic chart types are bar charts and line charts. While they are very similar in their use cases, they can also differ greatly in their meaning. Let’s take, for instance, company sales for each month of the year. We’d get 12 bars representing the amount of money brought in each month (see Figure 14).

06 TT 04
Figure 14. A simple bar chart: useful to represent discrete information

Let’s look into why this should be bars rather than a line graph. Line graphs are ideal for continuous data; our sales figures are monthly sums, not continuous values. As bars, we know that in January the company made $100 and in February it made $120. If we made this a line graph, it would still represent $100 and $120 on the first of each month, but the line would suggest that on the 15th the company had made about $110, which isn’t true. Bars are used for discrete units of measurement, whereas lines are used when a value is continuous, such as temperature.

06 TT 05
Figure 15. Simple line graphs: useful to represent continuous information

We can see that at 8:00 the temperature was 20C and at 9:00 it was 22C. If we look at the line to guess the temperature at 8:30 we’d say 21C, which is a correct estimate since temperature is continuous and every point isn’t a sum of other values; it represents the exact value at that moment or an estimate between two exact measurements.
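The danger of the line’s implied midpoint can be made concrete with a little arithmetic: linear interpolation between two readings is a fair estimate for continuous data like temperature, but produces a fiction for discrete sums like monthly sales. A minimal sketch using the figures above:

```python
def interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate between two measurements (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Temperature is continuous, so a value between readings is a fair estimate:
# halfway between 20C at 8:00 and 22C at 9:00 lands on 21C.
temp_0830 = interpolate(8.0, 20, 9.0, 22, 8.5)

# Monthly sales are discrete sums; the same arithmetic yields $110 for
# "mid-January-to-February," a number that describes nothing real.
fake_mid_month = interpolate(1, 100, 2, 120, 1.5)
```

The arithmetic is identical in both cases; only the nature of the underlying data decides whether the interpolated number means anything, which is exactly why the chart form matters.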

Both the bar and line have a stacked variation (see Figure 17). This is an excellent storytelling tool that can work in different ways. Let’s take, for example, a company that has 3 locations.

For each month we have 3 bars, one for each of the shops—​36 total for the year. When we place them next to each other (see Figure 16), we can quickly see which store was earning the most in which month. This is one interesting and valid story, but there is another hidden within the same data. If we stack the bars, so we only have one for each month, we lose the ability to easily see which store is the biggest earner, but we can now see which months the company does the best business as a whole.

06 TT 06
Figure 16. A grouped bar graph
06 TT 07
Figure 17. A stacked bar graph

Both of these are valid displays of the same information, but they are two different stories using the same starting data. As a journalist, the most important aspect of working with the data is that you first choose the story you are interested in telling. Is it which month is the best for business, or is it which store is the flagship? This is just a simple example, but it is really the whole focus of data journalism—​asking the right question before getting too far. The story will guide the choice of visualization.
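Both stories can be read straight out of the data before anything is drawn; a minimal sketch with invented sales figures for three shops over three months:

```python
# Hypothetical monthly sales for three shops.
sales = {
    "Jan": {"Shop A": 100, "Shop B": 80,  "Shop C": 60},
    "Feb": {"Shop A": 90,  "Shop B": 110, "Shop C": 70},
    "Mar": {"Shop A": 120, "Shop B": 95,  "Shop C": 85},
}

# The grouped-bar story: which shop earned the most in each month?
top_shop = {month: max(shops, key=shops.get) for month, shops in sales.items()}

# The stacked-bar story: which month was best for the company as a whole?
totals = {month: sum(shops.values()) for month, shops in sales.items()}
best_month = max(totals, key=totals.get)
```

With these numbers the flagship changes hands between Shop A and Shop B month to month, while the stacked view says March was the company’s best month overall: two different headlines from one small table.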

The bar chart and line graph are really the bread and butter of any data journalist. From there you can expand into histograms, horizon graphs, sparklines, stream graphs, and others, which all share similar properties and are suited for slightly different situations—​including the amount of data or data sources, and location of the graphic in terms of the text.

In journalism, one of the most commonly used charting features is a map. Time, amount, and geography are the elements maps most often combine. We always want to know how much is in one area versus another, or how the data flows from one area to another. Flow diagrams and choropleths are very useful tools to have in your skill set when dealing with visualizations for journalism. Knowing how to color-code a map properly without misrepresenting or misleading readers is key. Political maps are usually color-coded as all or nothing for certain regions, even if a candidate won only one part of the country by 1%. Coloring does not have to be a binary choice; gradients of color based on groups can be used with care. Understanding maps is a large part of journalism. Maps easily answer the WHERE part of the 5 W’s.
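One way to escape the all-or-nothing trap is to bin each region’s value into a graded palette rather than a binary choice. A minimal sketch, in which the palette and value range are purely illustrative (a real palette might come from a tool like Color Brewer):

```python
# Illustrative five-step palette, light to dark.
PALETTE = ["#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b"]

def shade(value, lo, hi, palette=PALETTE):
    """Map a value in [lo, hi] to one of len(palette) equal-width bins."""
    if hi <= lo:
        raise ValueError("need hi > lo")
    frac = (value - lo) / (hi - lo)
    idx = min(int(frac * len(palette)), len(palette) - 1)
    return palette[max(idx, 0)]

# A region at 51% and one at 90% no longer get the same color:
mid_blue = shade(51, 0, 100)
dark_blue = shade(90, 0, 100)
```

Equal-width bins are only one choice; quantiles or manually chosen breakpoints can tell the story more honestly depending on how the data is distributed, and that choice deserves as much editorial thought as the headline.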

Once you have mastered the basic types of charts and graphs, you can begin to build up fancier data visualizations. If you don’t understand the basics, you are building on a shaky foundation. In much the way you learn how to be a good writer—​keeping sentences short, keeping the audience in mind, and not overcomplicating things to make yourself sound smart, but rather conveying meaning to the reader—​you shouldn’t go overboard with the data either. Starting small is the most effective way to tell the story, slowly building only when needed.

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

— William Strunk Jr.
Elements of Style (1918)

It is OK not to use every piece of data in your story. You shouldn’t have to ask permission to be concise; it should be the rule.

Brian Suda, (optional.is)

Data Visualization DIY: Our Top Tools

What data visualization tools are out there on the Web that are easy to use—​and free? Here on the Datablog and Datastore, we try to do as much as possible using the Internet’s powerful free options.

That may sound a little disingenuous, in that we obviously have access to the Guardian’s amazing graphics and interactive teams for those pieces where we have a little more time—​such as this map of public spending (created using Adobe Illustrator) or this Twitter riots interactive.

But for our day-to-day work, we often use tools that anyone can—​and create graphics that anyone else can too.

So, what do we use?

Google Fusion Tables

This online database and mapping tool has become our default for producing quick and detailed maps, especially those where you need to zoom in. You get all the high resolution of Google Maps but it can open a lot of data—​100 MB of CSV, for instance. The first time you try it, Fusion Tables may seem a little tricky—​but stick with it. We used it to produce maps like the Iraq one in The WikiLeaks war logs (the Guardian) and also border maps like Homelessness interactive map (the Guardian), about homelessness.

06 LL 01
Figure 18. The WikiLeaks war logs (the Guardian)
06 LL 02
Figure 19. Homelessness interactive map (the Guardian)

The main advantage is the flexibility—​you can upload a KML file of regional borders, say—​and then merge that with a data table. It’s also getting a new user interface, which should make it easier to use.

You don’t have to be a coder to make one—​and this Fusion layers tool allows you to bring different maps together or to create search and filter options, which you can then embed on a blog or a site.

Note

Use shpescape to convert official .shp files into Fusion tables for you to use. Also, watch out for overcomplicated maps—​Fusion can’t cope with more than a million points in one cell.

Tableau Public

If you don’t need the unlimited space of the professional edition, Tableau Public is free. With it you can make pretty complex visualizations with up to 100,000 rows simply and easily. We use it when we need to bring different types of charts together, as in this map of top tax rates around the world (which also has a bar chart).

Or you can even use it as a data explorer, which is what we did in 2012 Presidential Campaign Finance (the Guardian) with the US federal elections spending data (although we ran out of space in the free public version…​something to watch out for). Tableau also needs the data formatted in quite specific ways for you to get the most out of it. But get through that and you have something intuitive that works well. La Nación in Argentina has built its entire data journalism operation around Tableau, for instance.

06 LL 03
Figure 20. 2012 Presidential Campaign Finance (the Guardian)

Tableau has some good online tutorials for you to start with, at http://www.tableausoftware.com/learn/training.

Note

Tableau is designed for PCs, although a Mac version is in the works. Use a virtualization tool such as Parallels to make it work.

Google Spreadsheet Charts

You can access this tool at http://www.google.com/google-d-s/spreadsheets/.

06 LL 04
Figure 21. UK government spending and taxation (the Guardian)

If you’re after something simple (like a bar or line chart, or a pie chart), you’ll find that Google spreadsheets (which you create from the documents bit of your Google account) can create some pretty nice charts—​including the animated bubbles used by Hans Rosling’s Gapminder. Unlike the charts API, you don’t need to worry about code; it’s pretty similar to making a chart in Excel, in that you highlight the data and click the chart widget. The customization options are worth exploring too; you can change colors, headings, and scales. They are pretty design-neutral, which is useful in small charts. The line charts have some nice options too, including annotation options.

Note

Spend some time with the chart customization options; you can create your own color palette.

Datamarket

Better known as a data supplier, DataMarket is actually a pretty nifty tool for visualizing numbers too. You can upload your own data or use some of the many datasets they have on offer, but the options do get better if you get the Pro account.

Note

DataMarket works best with time series data, but check out their extensive data range.

Many Eyes

If ever a site needed a bit of TLC, it’s IBM’s Many Eyes. When it launched, created by Fernanda B. Viégas and Martin Wattenberg, it was a unique exercise in allowing people to simply upload datasets and visualize them. Now, with its creators working for Google, the site feels a little unloved with its muted color palettes; it hasn’t seen much new in the way of visualizations for some time.

06 LL 06
Figure 22. Doctor Who villains (the Guardian)
Note

You can’t edit the data once you’ve uploaded it, so make sure it’s right before you upload.

Color Brewer

Not strictly a visualization tool, Color Brewer is really for choosing map colors. You can choose your base color and get the codes for the entire palette.

And Some More

If none of these are for you, it’s also worth checking out this DailyTekk piece, which has even more options. The ones above aren’t the only tools, just those we use most frequently. There are lots of others out there too, including:

  • Chartsbin, a tool for creating clickable world maps

  • iCharts, which specializes in small chart widgets

  • Geocommons, which shares data and boundary data to create global and local maps

  • Oh, and there’s also piktochart.com, which provides templates for those text/numbers visualizations that are popular at the moment.

Simon Rogers, the Guardian

How We Serve Data at Verdens Gang

News journalism is about bringing new information to the reader as quickly as possible. The fastest way may be a video, a photo, a text, a graph, a table, or a combination of these. Concerning visualizations, the purpose should be the same: quick information. New data tools enable journalists to find stories they couldn’t otherwise find, and present stories in new ways. Here are a few examples showing how we serve data at the most read newspaper in Norway, Verdens Gang (VG).

Numbers

This story is based on data from the Norwegian Bureau of Statistics, taxpayer data, and data from the national Lotto monopolist. In this interactive graph, the reader could find different kinds of information for each Norwegian county and municipality. The actual table shows the percentage of income spent on games. It was built using Access, Excel, MySQL, and Flash.

Networks

We used social network analysis to analyze the relations between 157 sons and daughters of the richest people in Norway. Our analysis showed that heirs of the richest persons in Norway also inherited their parents' network. Altogether, there were more than 26,000 connections, and the graphics were all finished manually using Photoshop. We used Access, Excel, Notepad, and the social network analysis tool Ucinet.

06 RR 01
Figure 23. Mapping taxpayers data and Lotto data (Verdens Gang)
06 RR 02
Figure 24. Rich birds of a feather flock together (Verdens Gang)

Maps

In this animated heatmap combined with a simple bar chart, you can see crime incidents occur on a map of downtown Oslo, hour by hour, over the weekend for several months. In the same animated heatmap, you can see the number of police officers working at the same time. When crime really is happening, the number of police officers on duty is at its lowest. It was built using ArcView with Spatial Analyst.

06 RR 03
Figure 25. Animated heat map (Verdens Gang)

Text Mining

For this visualization, we text mined speeches given by the seven Norwegian party leaders during their conventions. All speeches were analyzed, and the analyses supplied angles for some stories. Every story was linked to the graph, and the readers could explore and study the language of politicians. This was built using Excel, Access, Flash, and Illustrator. If this had been built in 2012, we would have made the interactive graph in JavaScript.

06 RR 04
Figure 26. Text mining speeches from party leaders (Verdens Gang)

Concluding Notes

When do we need to visualize a story? Most of the time we do not, but sometimes we want to in order to help our readers. Stories containing a huge amount of data quite often need visualization. However, we have to be quite critical when choosing what kind of data to present. We know all kinds of things when we report on a subject, but what does the reader really need to know for the story? Perhaps a table is enough, or a simple graph showing a development from year A to year C. When working with data journalism, the point is not necessarily to present huge amounts of data. It’s about journalism!

There has been a clear trend in the last two to three years to create interactive graphs and tables that enable the reader to drill down into different themes. A good visualization is like a good picture: you understand what it is about just by looking at it for a moment or two, and the more you look, the more you see. A visualization is bad when the reader does not know where to start or stop, or when it is overloaded with detail. In that case, perhaps a piece of text would be better?

John Bones, Verdens Gang

Public Data Goes Social

Data is invaluable. Access to data has the potential to illuminate issues in a way that triggers results. Nevertheless, poor handling of data can put facts in an opaque structure that communicates nothing. If it doesn’t promote discussion or provide contextual understanding, data may be of limited value to the public.

Nigeria returned to democracy in 1999 after long years of military rule. Probing the facts behind data was taken as an affront to authority, and was seen as an attempt to question the tainted reputation of the junta. The Official Secrets Act compelled civil servants not to share government information. Even thirteen years after the return to democracy, accessing public data can be a difficult task. Data about public expenditure communicates little to the majority of the public, who are not well-versed in financial accounting and complex arithmetic.

With the rise of mobile devices and an increasing number of Nigerians online, with BudgIT we saw a huge opportunity to use data visualization technologies to explain and engage people around public expenditure. To do this, we have had to engage users across all platforms and to reach out to citizens via NGOs. This project is about making public data a social object and building an extensive network that demands change.

06 YY
Figure 27. The BudgIT cut app (BudgIT Nigeria)

To successfully engage with users, we have to understand what they want. What does the Nigerian citizen care about? Where do they feel an information gap? How can we make the data relevant to their lives? BudgIT’s immediate target is the average literate Nigerian connected to online forums and social media. In order to compete for the limited attention of users immersed in a wide variety of interests (gaming, reading, socializing), we need to present the data in a brief and concise manner. After broadcasting a snapshot of the data as a tweet or an infographic, there’s an opportunity for more sustained engagement, with a more interactive experience giving users a bigger picture.

When visualizing data, it is important to understand the level of data literacy of our users. As beautiful and sophisticated as they may be, complex diagrams and interactive applications might not meaningfully communicate to our users based on their previous experiences with interpreting data. A good visualization will speak to the user in a language they can understand, and bring forth a story that they can easily connect with.

We have engaged over 10,000 Nigerians over the budget, and we profile them into three categories to ensure that optimum value is delivered. The categories are briefly explained below:

Occasional users

These are users who want information simply and quickly. They are interested in getting a picture of the data, not detailed analytics. We can engage them via tweets or interactive graphics.

Active users

Users who stimulate discussion, and use the data to increase their knowledge of a given area or challenge the assumptions of the data. For these users, we want to provide feedback mechanisms and the possibility to share insights with their peers via social networks.

Data hogs

These users want raw data for visualization or analysis. We simply give them the data for their purposes.

With BudgIT, our user engagement is based on the following:

Stimulating discussion around current trends

BudgIT keeps track of online and offline discussions and seeks to provide data around these topics. For example, during the fuel strikes in January 2012, there was constant agitation among protesters about the need to reinstate fuel subsidies and to cut extravagant and unnecessary public expenditure. BudgIT tracked the discussion via social media and, in 36 busy hours, built an app that allows citizens to reorganize the Nigerian budget.

Good feedback mechanisms

We engage with users through discussion channels and social media. Many users want to know about stories behind the data and many ask for our opinion. We make sure that our responses only explain the facts behind the data and are not biased by our personal or political views. We need to keep feedback channels open, actively respond to comments, and engage the users creatively to ensure that the community built around the data is sustained.

Make it local

For a dataset targeted at a particular group, BudgIT aims to localize its content and to promote a channel of discussion that connects to the needs and interests of particular groups of users. In particular, we’re interested in engaging users around issues they care about via SMS.

After making expenditure data available on yourbudgit.com, we reach out to citizens through various NGOs. We also plan to develop a participatory framework where citizens and government institutions can meet in town halls to define key items in the budget that need to be prioritized.

The project has received coverage in local and foreign media, from CP-Africa to the BBC. We have undertaken a review of the 2002-2011 budgets for the security sector for an AP journalist, Yinka Ibukun. Most media organizations are "data hogs" and have requested data from us to use for their reportage. We are planning further collaborations with journalists and news organizations in the coming months.

Oluseun Onigbinde, BudgIT Nigeria

Engaging People Around Your Data

Almost as important as publishing the data in the first place is getting a reaction from your audience. You’re human; you’re going to make mistakes, miss things, and get the wrong idea from time to time. Your audience is one of the most useful assets that you’ve got. They can fact-check and point out things that you may not have considered.

Engaging that audience is tricky, though. You’re dealing with a group of people who’ve been conditioned over years of Internet use to hop from site to site, leaving nothing but a sarcastic comment in their wake. Building a level of trust between you and your users is crucial; they need to know what they’re going to get, know how they can react to it and offer feedback, and know that that feedback is going to be listened to.

But first you need to think about what audience you’ve got, or want to get. That will both inform and be informed by the kind of data that you’re working with. If it’s specific to a particular sector, then you’re going to want to explore particular communications with that sector. Are there trade bodies that you can get in touch with that might be willing to publicize the resources that you’ve got and the work that you’ve done to a wider audience? Is there a community website or a forum that you can get in touch with? Are there specialist trade publications that may want to report on some of the stories that you’re finding in the data?

Social media is an important tool, too, though it again depends on the type of data that you’re working with. If you’re looking at global shipping statistics, for example, you’re unlikely to find a group on Facebook or Twitter that’ll be especially interested in your work. On the other hand, if you’re sifting through corruption indices across the world, or local crime statistics, that’s likely to be something that’s going to be of interest to a rather wider audience.

When it comes to Twitter, the best approach tends to be to contact high-profile figures, briefly explaining why your work is important, and including a link. With any luck, they’ll retweet you to their readers, too. That’s a great way to maximize exposure to your work with minimum effort—​though don’t badger people!

Once you’ve got people on the page, you need to think about how your audience is going to interact with your work. Sure, they might read the story that you’ve written and look at the infographics or maps, but giving your users an outlet to respond is immensely valuable. More than anything, it’s likely to give you greater insight into the subject you’re writing about, informing future work on the topic.

Firstly, it goes without saying that you need to publish the raw data alongside your articles. Either host the data in comma-separated plain text, or host it in a third-party service like Google Docs. That way, there’s only one version of the data around, and you can update it as necessary if you find errors in the data that need correcting later. Better still, do both. Make it as easy as possible for people to get hold of your raw materials.
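As a sketch of the first option, Python's standard csv module is enough to write a dataset out as comma-separated plain text for readers to download. The column names and values here are hypothetical:

```python
import csv

# Hypothetical rows behind a story; publishing them lets readers
# check the work and spot errors.
rows = [
    {"municipality": "Oslo", "year": 2011, "spend_pct": 0.75},
    {"municipality": "Bergen", "year": 2011, "spend_pct": 1.20},
]

with open("story-data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["municipality", "year", "spend_pct"])
    writer.writeheader()
    writer.writerows(rows)
```

Keeping one canonical file like this, and updating it in place when corrections come in, means readers are never working from a stale copy.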

Then start to think about if there’s other ways that you can get the audience to interact. Keep an eye on metrics on which parts of your datasets are getting attention—​it’s likely that the most trafficked areas could have something to say that you might have missed. For example, you might not think to look at the poverty statistics in Iceland, but if those cells are getting plenty of attention, then there might be something there worth looking at.

Think beyond the comment box, too. Can you attach comments to particular cells in a spreadsheet? Or to a particular region of an infographic? While most embeddable publishing systems don’t necessarily allow for this, it’s worth taking a look at if you’re creating something a little more bespoke. The benefits that it can bring to your data shouldn’t be underestimated.

Make sure that other users can see those comments too—​they have almost as much value as the original data, in a lot of cases, and if you keep that information to yourself, then you’re depriving your audience of that value.

Finally, other people might want to publish their own infographics and stories based on the same sources of data—​think about how best to link these together and profile their work. You could use a hashtag specific to the dataset, for example, or if it’s highly pictorial, then you could share it in a Flickr group.

Having a route to share information more confidentially could be useful too—​in some cases it might not be safe for people to publicly share their contributions to a dataset, or they might simply not be comfortable doing so. Those people may prefer to submit information through an email address, or even an anonymous comments box.

The most important thing you can do with your data is share it as widely and openly as possible. Enabling your readers to check your work, find your mistakes, and pick out things that you might have missed will make both your journalism—​and the experience for your reader—​infinitely better.

Duncan Geere, Wired.co.uk