Journalists who can identify, refine, and interrogate the information they need within a large data set are in demand and are publishing the most exciting work in the industry. In the Computer Assisted Reporting course at Columbia Journalism School, you spend five days learning the fundamentals: how to scrape data from the web, digitize PDFs, and use other digital sources to compile your own data sets; a primer on pandas, the Python data analysis library; and how to build your initial analysis and visualizations.
This data journalism course is designed for those with basic analytic skills or experience (Excel and/or databases). All students will need a laptop on which they have administrator rights to install software.
The course runs full days, Monday to Friday. Instruction time is divided between short lectures (90-120 minutes) and exercises done individually and in small teams.
Overview of data formats, including Excel, CSV, databases, geographic, PDF, etc.
Techniques for locating and acquiring datasets:
Strengths and weaknesses of different data sources
Beginnings of analysis
Common Excel techniques
Assignment 1: Obtaining and analyzing a dataset with Excel
Review of Assignment 1
Continuation of Excel
Review and continuation of analytic topics from previous day
Processing PDFs and text documents
Extracting text and tables from simple PDFs for use in Excel
Using optical character recognition (OCR) tools to extract text from unfriendly PDFs
Using non-programming tools to clean and organize your data
Introduction to regular expressions for cleaning data
Assignment 2: Process a provided PDF dataset into a spreadsheet-compatible format and provide analysis
Review of Assignment 2
Introduction to programming
Installing and using the Python programming language
Making use of the Jupyter Notebook programming environment
Literate programming and “showing your work”
Basic analysis and visualization using the pandas Python library
Assignment 3: Reproducible data analysis using the pandas Python library
Review of Assignment 3
Introduction to data visualization
Overview of static (non-interactive) visualizations
Principles of visualization – Tufte, data-ink, chartjunk, etc.
Chart selection and refining
Data visualization tools
Discussion of different types of charting tools
Building meaningful graphics
Assignment 4: Building your own series of visualizations
Review of Assignment 4
Introduction to interactive visualization and the web
Basics of web technologies
Introduction to interactive visualization with D3
Leveraging tools for data visualization in “real life”
Review, and putting it all together
How do all the pieces fit together?
Putting data to work for a story
Advice for operating in your newsroom
Assignment 5: Take-home templates for building your own projects and working with your new skills
Presentation of projects, certificates
The Columbia University Graduate School of Journalism is the premier institution for the study and practice of journalism in the world. Led by our award-winning faculty of active reporters, editors, filmmakers and digital media specialists, our programs are intensive, rigorous, and demanding. Our professional development programs, fellowships and workshops offer opportunities for seasoned practitioners and media executives to advance their knowledge and expertise.
03 - 07 Jun 2019
The Columbia University Graduate School of Journalism
2950 Broadway, New York, NY 10027, USA
30 May 2019
The price includes tuition, course materials, lunches and coffee breaks. It does not include any other meals, travel or hotel costs.
EBU Academy does not make hotel reservations.
Please contact the hotel of your choice directly.