ONLINE COURSE – Reproducible Data Science using RMarkdown, Git, R packages, Docker, Make & Drake, and other tools (RDRP01)

29 June 2020 - 3 July 2020


This course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.

This is a ‘LIVE COURSE’ – the instructor will deliver lectures and coach attendees through the accompanying computer practicals via video link. A good internet connection is essential.

TIME ZONE – Western European Time. However, all sessions will be recorded and made available, allowing attendees in other time zones to follow a day behind, with an additional 1/2 days of support after the official course finish date (please email oliverhooker@psstatistics.com for full details or to discuss how we can accommodate you).

Course Overview:

This course provides a comprehensive introduction to reproducible data analysis, which we define as analysis where the entire workflow or pipeline is as open and transparent as possible, so that others, including our future selves, can exactly reproduce any of its results. We cover this topic through a thorough introduction to a set of R-based and general computing tools, such as RMarkdown, Git & GitHub, R packages, Docker, GNU Make and Drake, and show how they can be used together to do reproducible data analysis that can then be shared with others.

On Day 1, after a general introduction in which we introduce the core concept of a research compendium, we cover RMarkdown, knitr and related tools. These are vital tools for reproducible research that allow us to produce data analysis reports (articles, slides, posters, websites, etc.) by embedding analysis code (R, Python, etc.) within the text of the report. The code is executed, and the results it produces are inserted into the final output document.

On Day 2, we provide a comprehensive introduction to version control using Git, including using GitHub. Git and GitHub are vital tools for the organization, maintenance, and distribution of our code, especially for large-scale and long-term projects involving multiple collaborators.

On Day 3, we cover how to create, maintain, and distribute R packages. R packages are the principal means of distributing reusable R code, and we will also look at how they can be used to create, maintain, and distribute research compendia.

On Day 4, we cover Docker, which is now a very popular means of producing reproducible computing environments across different devices, platforms, and operating systems.

On Day 5, we cover build automation tools, particularly GNU Make and Drake, which are used for automatically running complex analysis code that involves multiple inter-dependencies between files. GNU Make is a general-purpose build automation tool, while Drake is specifically designed for complex data analysis pipelines in R.

On each day, therefore, we aim to provide a comprehensive and thorough introduction to a set of valuable and generally useful computing tools, each of which plays a key role in allowing us to do reproducible data science.
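As a rough illustration of the build-automation idea covered on Day 5, the sketch below works out which outputs of a small pipeline need to be rebuilt after some inputs change. It is not taken from the course materials: the file names, the `DEPENDENCIES` table and the `stale_targets` function are hypothetical, Python is used only for illustration, and modification times are passed in as plain numbers rather than read from disk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each target maps to the files it is built from.
DEPENDENCIES = {
    "clean.csv": ["raw.csv", "clean.R"],
    "model.rds": ["clean.csv", "fit.R"],
    "report.html": ["model.rds", "report.Rmd"],
}


def stale_targets(dependencies, mtimes):
    """Return the targets that need rebuilding, in a valid build order.

    A target is treated as stale if it does not exist yet (no entry in
    `mtimes`), if any of its dependencies is newer than it, or if any
    of its dependencies is itself stale.
    """
    stale = []
    # static_order() yields every file after all of its dependencies.
    for node in TopologicalSorter(dependencies).static_order():
        deps = dependencies.get(node)
        if deps is None:
            continue  # a source file: nothing to build
        missing = node not in mtimes
        outdated = any(
            dep in stale or mtimes.get(dep, 0) > mtimes.get(node, 0)
            for dep in deps
        )
        if missing or outdated:
            stale.append(node)
    return stale


if __name__ == "__main__":
    # clean.R was edited (time 5) after clean.csv was last built (time 3).
    mtimes = {
        "raw.csv": 1, "clean.R": 5, "clean.csv": 3,
        "fit.R": 1, "model.rds": 4,
        "report.Rmd": 1, "report.html": 6,
    }
    print(stale_targets(DEPENDENCIES, mtimes))
```

In this example, editing `clean.R` makes `clean.csv` stale, and that staleness propagates to `model.rds` and `report.html`. If only `report.Rmd` had changed, only `report.html` would be rebuilt. GNU Make compares file modification times in this spirit. Real build tools do considerably more than this sketch.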

Intended Audience

This course is relevant to anyone doing data science, whether in industry or in academic research.

Venue – Delivered remotely

Time zone – Western European Time

Availability – 15 places

Duration – 5 days

Contact hours – Approx. 28 hours

ECTS – Equal to 3 ECTS credits

Language – English

PLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Cancellations later than this may be considered; contact oliverhooker@psstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances, a full refund of the course fees (and accommodation fees if booked through PS statistics) will be credited.

Dr. Mark Andrews





Delivered remotely (United Kingdom)
Western European Time, United Kingdom