ONLINE COURSE – Data visualization using GG plot 2 (R and Rstudio) (DVGG01) This course will be delivered live
20 August 2020 - 21 August 2020£300.00
This course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.
This is a ‘LIVE COURSE’ – the instructor will be delivering lectures and coaching attendees through the accompanying computer practical’s via video link, a good internet connection is essential.
TIME ZONE – Western European Time +1 – however all sessions will be recorded and made available allowing attendees from different time zones to follow a day behind with an additional 1/2 days support after the official course finish date (please email email@example.com for full details or to discuss how we can accommodate you).
In this two day course, we provide a comprehensive introduction to data visualization in R using ggplot. On the first day, we begin by providing a brief overview of the general principles data visualization, and an overview of the general principles behind ggplot. We then proceed to cover the major types of plots for visualizing distributions of univariate data: histograms, density plots, barplots, and Tukey boxplots. In all of these cases, we will consider how to visualize multiple distributions simultaneously on the same plot using different colours and “facet” plots. We then turn to the visualization of bivariate data using scatterplots. Here, we will explore how to apply linear and nonlinear smoothing functions to the data, how to add marginal histograms to the scatterplot, add labels to points, and scale each point by the value of a third variable. On Day 2, we begin by covering some additional plot types that are often related but not identical to those major types covered on Day 1: frequency polygons, area plots, line plots, uncertainty plots, violin plots, and geospatial mapping. We then consider more fine grained control of the plot by changing axis scales, axis labels, axis tick points, colour palettes, and ggplot “themes”. Finally, we consider how to make plots for presentations and publications. Here, we will introduce how to insert plots into documents using RMarkdown, and also how to create labelled grids of subplots of the kind seen in many published articles.
This is one module of a seven module series – you do not need to attend them all but they are designed to complement each other. Please see the links below
“1” June 23rd – 24th Introduction to R for ecologists and evolutionary biologists
“2”July 23rd – 24th Introduction to generalised linear models using R and Rstudio
“3” August 6th – 7th Introduction to mixed models using R an d Rstudio
“4” August 20th – 21st Data visualization using GG plot 2 (R and R studio)
“5” September 3rd – 4th Data wrangling using R and Rstudio
“6” October 1st – 2nd Introduction to Nonlinear Regression using Generalized Additive Models
https://www.psstatistics.com/course/introduction to nonlinear-regression-using-generalized-additive-models-gamr01
“7” Date to be confirmed Introduction to time series analysis using R
This course is aimed at anyone who is interested in doing data visualization using R. Data visualization is a major part of data science and statistical data analysis, and R is the most widely used program for data science and statistics. Data visualization using R is widely used throughout academic scientific research, as well as widely throughout the public and private sectors.
Venue – Delivered remotely
Time zone – Western European Time +1
Availability – 20 places
Duration – 5 days
Contact hours – Approx. 15 hours
ECT’s – Equal to 2 ECT’s
Language – English
PLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered, contact firstname.lastname@example.org. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances a full refund of the course fees (and accommodation fees if booked through PS statistics) will be credited.
Dr. Mark Andrews
This course will be practical, hands-on, and workshop based. At the beginning on the first day, there will be a minimal
amount of lecture style presentation, i.e., using slides, introducing and explaining some key concepts. After that brief
introduction, we will use RStudio and simultaneously write R code to work through all the content of the course. For
example, when covering some plot type, we will load up the same data set and write the ggplot code to make the plot
and then modify that code to produce different variants of that plot. Any code that the instructor produces during these
sessions will be uploaded to a publicly available GitHub site after each session. For the breaks between sessions, and
between days, optional exercises will be provided. Solutions to these exercises and brief discussions of them will take
place after each break.
The course will take place online using Zoom. On each day, the live video broadcasts will occur between (British
Summer Time, UTC+1, timezone) at:
All sessions will be video recorded and made available to all attendees as soon as possible, hopefully soon after each
Attendees in different time zones will be able to join in to some of these live broadcasts, even if all of them are not
convenient times. For example, attendees from North America may be able to join the live sessions at 1pm-3pm and
4pm-6pm, and then catch up with the 10am-12pm recorded session when it is uploaded (which will be soon after 6pm
each day). By joining any live sessions that are possible, this will allow attendees to benefit from asking questions and
having discussions, rather than just watching prerecorded sessions.
At the start of the first day, we will ensure that everyone is comfortable with how Zoom works, and we’ll discuss the
procedure for asking questions and raising comments.
Although not strictly required, using a large monitor or preferably even a second monitor will make the learning
experience better, as you will be able to see my RStudio and your own RStudio simultaneously.
All the sessions will be video recorded, and made available immediately on a private video hosting website. Any
materials, such as slides, data sets, etc., will be shared via GitHub.
Assumed quantitative knowledge
We will assume only a very minimal amount of familiarity with some general statistical concepts. Anyone who has
taken any undergraduate (Bachelor’s) level introductory course on (applied) statistics can be assumed to have
sufficient familiarity with these concepts.
Assumed computer background
Minimal prior experience with R and RStudio is required. Attendees should be familiar with some basic R syntax and
commands, how to write code in the RStudio console and script editor, how to load up data from files, etc.
Equipment and software requirements
Attendees of the course will need to use RStudio. Most people will want to use their own computer on which they install the RStudio desktop software. This can be done Macs, Windows, and Linux, though not on tablets or other mobile devices. Instructions on how to install and configure all the required software, which is all free and open source, will be provided before the start of the course. We will also provide time at the beginning of the workshops to ensure that all software is installed and configured properly. An alternative to using a local installation of RStudio is to use RStudio cloud (https://rstudio.cloud/). This is a free to use and full featured web based RStudio. It is not suitable for computationally intensive work but everything done in this class can be done using RStudio cloud. We will use a number of R packages and installation instructions for these will be posted on GitHub in advance of the course and shared with attendees.
UNSURE ABOUT SUITABLILITY THEN PLEASE ASK email@example.com
Thursday 20th – Classes from 10:00 to 18:00
On each day, we will cover a set of topics. Some of these topics will be very brief, maybe requiring as little as 15 minutes, while others will require a number of hours. Also, at the beginning of this first day, we will deal with some general “housekeeping” before we start.
• Topic 1: What is data visualization. Data visualization is a means to explore and understand our data and should be a major part of any data analysis. Here, we briefly discuss why data visualization is so important and what the
major principles behind it are.
• Topic 2: Introducing ggplot. Though there are many options for visualization in R, ggplot is simply the best. Here, we briefly introduce the major principles behind how ggplot works, namely how it is a layered grammar of
• Topic 3: Visualizing univariate data. Here, we cover a set of major tools for visualizing distributions over single variables: histograms, density plots, barplots, Tukey boxplots. In each case, we will explore how to plot multiple groups of data simultaneously using different colours and also using facet plots.
• Topic 4: Scatterplots. Scatterplots and their variants are used to visualize bivariate data. Here, in addition to
covering how to visualize multiple groups using colours and facets, we will also cover how to provide marginal
plots on the scatterplots, labels to points, and how to obtain linear and nonlinear smoothing of the plots.
Friday 21st – Classes from 10:00 to 18:00
Topic 5: More plot types. Having already covered the most widely used general purpose plots on Day 1, we now
turn to cover a range of other major plot types: frequency polygons, area plots, line plots, uncertainty plots, violin plots, and geospatial mapping. Each of these are important and widely used types of plots, and knowing them will expand your repertoire.
• Topic 6: Fine control of plots. Thus far, we will have mostly used the default for the plot styles and layouts. Here, we will introduce how to modify things like the limits and scales on the axes, the positions and nature of the axis ticks, the colour palettes that are used, and the different types of ggplot themes that are available.
• Topic 7: Plots for publications and presentations: Thus far, we have primarily focused on data visualization as a
means of interactively exploring data. Often, however, we also want to present our plots in, for example, published
articles or in slide presentations. It is simple to save a plot in different file formats, and then insert them into a document. However, a much more efficient way of doing this is to use RMarkdown to run the R code and
automatically insert the resulting figure into a, for example, Word document, pdf document, html page, etc. In
addition, here we will also cover how to make labelled grids of subplots like those found in many scientific articles.