Computer Assisted Reporting

Journalists who can identify, refine, and interrogate the information they need within a large data set are in demand, and they are producing some of the most exciting work in the industry. In the Computer Assisted Reporting course at Columbia Journalism School, you spend five days learning the fundamentals: scraping data from the web, digitizing PDFs and using other digital sources to compile your own data sets; working with the Python data analysis library pandas; and building your initial analyses and visualizations.

Prerequisites

This data journalism course is designed for participants with basic analytical skills or experience with Excel and/or databases. All students will need a laptop on which they have administrator rights to install software.

Skills learnt

  • Coding: Python programming (including scraping, regular expressions, pandas, Selenium and BeautifulSoup), SQL, DataScript
  • Data: Identifying and using large datasets to extract stories

Course objectives

  • Find stories within the data, and work within your team and newsroom to report them out
  • Learn the fundamentals of Python and the Python data analysis tool pandas
  • Scrape and clean publicly available data from the web
  • Automatically submit online forms and scrape the results
  • Convert tabular data from PDFs into spreadsheet-compatible formats
  • Extract text from PDF images of documents (OCR)
  • Combine multiple data sets
  • Perform exploratory data analysis and visualization
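To give a flavor of the scraping objective, here is a minimal sketch that turns an HTML table into rows of cell text. The course itself uses BeautifulSoup and Selenium; this version assumes only Python's standard-library `html.parser`, so it runs without installing anything, and it parses an inline HTML string rather than fetching a live page. The `TableScraper` class, the `scrape_table` helper and the sample table are hypothetical illustrations.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page fragment.
SAMPLE = """
<table>
  <tr><th>Agency</th><th>Contracts</th></tr>
  <tr><td>Parks  Dept.</td><td>12</td></tr>
  <tr><td>Transit</td><td>7</td></tr>
</table>
"""

rows = scrape_table(SAMPLE)
header, records = rows[0], rows[1:]
print(header)   # ['Agency', 'Contracts']
print(records)  # [['Parks Dept.', '12'], ['Transit', '7']]
```

Note that every value comes back as a string; converting "12" to a number is a separate cleaning step.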

Schedule

The course runs full days, Monday to Friday. Instruction time will be divided between short lectures (90 to 120 minutes) and exercises done individually and in small teams.

Content outline

Monday

Obtaining data

Overview of data formats, including Excel, CSV, databases, geographic data, PDF, etc.

Techniques for locating and acquiring datasets:

  • Public data laws
  • Open data sources
  • Private/paid/subscription data sources
  • Advanced search engine techniques

Strengths and weaknesses of different data sources

Beginnings of analysis

Common Excel techniques, including:

  • Combining, separating and cleaning columns with formulas
  • Aggregation formulas (summing, means, medians, etc.)
  • Cleaning dates and addresses
  • Pivot tables
  • Combining multiple sheets with INDEX/VLOOKUP
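The two central spreadsheet operations above, looking up values from another sheet and summarizing with a pivot table, carry over directly to Python. The sketch below shows both in plain standard-library Python as a bridge to the programming sessions later in the week; pandas offers its own tools for the same jobs (such as `pivot_table` and `merge`), which this sketch does not use. The sample data and the `vlookup` and `pivot` helpers are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample data: one sheet of payments, one lookup sheet.
payments = [
    {"agency_id": "A1", "amount": 1200.0},
    {"agency_id": "A2", "amount": 300.0},
    {"agency_id": "A1", "amount": 800.0},
    {"agency_id": "A3", "amount": 50.0},
]
agencies = {"A1": "Parks", "A2": "Transit"}  # agency_id -> agency name


def vlookup(key, table, default="NOT FOUND"):
    """Exact-match lookup, like VLOOKUP(..., FALSE); returns default if missing."""
    return table.get(key, default)


def pivot(rows, group_key, value_key):
    """Group rows by one column and summarize another, like a pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: {"count": len(vals), "sum": sum(vals), "median": median(vals)}
        for key, vals in groups.items()
    }


# Attach the agency name to every payment (the lookup step) ...
for p in payments:
    p["agency"] = vlookup(p["agency_id"], agencies)

# ... then summarize payments per agency (the pivot-table step).
summary = pivot(payments, "agency", "amount")
for name, stats in sorted(summary.items()):
    print(name, stats)
```

Keeping unmatched rows under an explicit "NOT FOUND" label, rather than silently dropping them, plays the same role as Excel's #N/A: it makes join failures visible before they distort the totals.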

Assignment 1: Obtaining and analyzing a dataset with Excel

Tuesday

Review of Assignment 1

Continuation of Excel

Review and continuation of analytic topics from the previous day

Processing PDFs and text documents

Extracting text and tables from simple PDFs for use in Excel

Using optical character recognition (OCR) tools to extract text from unfriendly PDFs

Cleaning data

Using non-programming tools to clean and organize your data

Introduction to regular expressions for cleaning data
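As an illustration of regular expressions for cleaning, the sketch below normalizes the kind of messy amounts and dates that often come out of PDFs and web forms. It assumes US-style month-first dates and does not validate month or day ranges; the patterns, helper names and sample values are hypothetical examples, not course material.

```python
import re

# Hypothetical messy values, as they might arrive from a PDF or web form.
raw_amounts = ["$1,200.50", " 300 ", "USD 45", "n/a"]
raw_dates = ["3/4/2019", "03-04-2019", "2019-03-04", "March 4"]

AMOUNT = re.compile(r"(\d[\d,]*(?:\.\d+)?)")                  # first number, commas allowed
US_DATE = re.compile(r"^(\d{1,2})[/-](\d{1,2})[/-](\d{4})$")  # month/day/year
ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")           # year-month-day


def clean_amount(text):
    """Return the first number in text as a float, or None if there is none."""
    match = AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def clean_date(text):
    """Normalize M/D/YYYY, M-D-YYYY or YYYY-MM-DD to YYYY-MM-DD; else None."""
    text = text.strip()
    if ISO_DATE.match(text):
        return text
    match = US_DATE.match(text)
    if match:
        month, day, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return None  # leave unrecognized formats for manual review


print([clean_amount(a) for a in raw_amounts])  # [1200.5, 300.0, 45.0, None]
print([clean_date(d) for d in raw_dates])      # ['2019-03-04', '2019-03-04', '2019-03-04', None]
```

Returning None for values the patterns do not recognize, instead of guessing, keeps questionable records easy to find and check by hand.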

Assignment 2: Process a supplied PDF dataset into a spreadsheet-compatible format and analyze it

Wednesday

Review of Assignment 2

Introduction to programming

Installing and using the Python programming language

Making use of the Jupyter Notebook programming environment

Literate programming and “showing your work”

Basic analysis and visualization using the pandas Python library

Group dinner

Assignment 3: Reproducible data analysis using the pandas Python library

Thursday

Review of Assignment 3

Introduction to data visualization

Overview of static (non-interactive) visualizations

Principles of visualization: Tufte, data ink, chartjunk, etc.

Chart selection and refining

Data visualization tools

Discussion of different types of charting tools

  • Graphics-centric tools: Illustrator, etc.
  • Business intelligence tools: Tableau, etc.
  • Web-based tools: ChartBuilder, Infogram, etc.

Building meaningful graphics

Assignment 4: Building your own series of visualizations

Friday

Review of Assignment 4

Introduction to interactive visualization and the web

Basics of web technologies

  • HTML
  • CSS
  • JavaScript

Introduction to interactive visualization with D3

Leveraging tools for data visualization in “real life”

  • Embedding charts in stories
  • Working within the constraints of your CMS
  • Responsive design and mobile-friendly visualizations

Review: putting it all together

How do all the pieces fit together?

Putting data to work for a story

Advice for operating in your newsroom

  • Supporting yourself as a solo newsroom developer
  • Working with and learning from other technical team members
  • Learning how to learn and solve technical problems outside the classroom

Assignment 5: Take-home templates for building your own projects and working with your new skills

Project presentations and awarding of certificates

Equipment needed

All students will need a laptop on which they have administrator rights to install software.

The School

The Columbia University Graduate School of Journalism is the world's premier institution for the study and practice of journalism. Led by our award-winning faculty of active reporters, editors, filmmakers and digital media specialists, our programs are intensive, rigorous and demanding. Our professional development programs, fellowships and workshops offer opportunities for seasoned practitioners and media executives to advance their knowledge and expertise.

Skill

Coding, Data

Details

3-7 June 2019

Venue

The Columbia University Graduate School of Journalism
2950 Broadway, New York, NY 10027, USA

Registration Deadline

30 May 2019

Registration details

Fees

  • EUR 2670

The price includes tuition, course materials, lunches and coffee breaks. It does not include any other meals, travel or hotel costs. 

Payment and cancellation policy

Accommodation

EBU Academy does not make hotel reservations.
Please contact the hotel of your choice directly.

Working languages

English

Partners and Sponsors

Columbia University Graduate School of Journalism

Contact details

Frederic Frantz
Business Training Manager
+41 22 717 21 48
frantz@ebu.ch