Introduction to Bayesian data analysis for social and behavioural sciences using R and Stan (BDRS01)
3 December 2018 - 7 December 2018, £275.00 - £510.00
This course provides a general introduction to Bayesian data analysis using R and the Bayesian probabilistic programming language Stan. We begin with a gentle introduction to the fundamental principles and concepts of Bayesian data analysis: the likelihood function, prior distributions, posterior distributions, highest posterior density intervals, posterior predictive distributions, marginal likelihoods, Bayes factors, and so on. We do this using simple probabilistic models that are easy to understand and easy to work with. We then proceed to more practically useful Bayesian analyses, starting with general linear models, followed by generalized linear models (including logistic regression and Poisson regression), and then multilevel general and generalized linear models. For these analyses, we will use real-world data sets and carry out the analysis with Stan, using the brms interface to Stan in R. With each example, we will explore general concepts such as model checking and improvement using posterior predictive checks, and model evaluation using cross-validation, WAIC, and Bayes factors. In the final part of the course, we will delve into some more advanced topics: understanding Markov Chain Monte Carlo in depth, Gaussian process regression, and probabilistic mixture models.
This course is aimed at anyone who is interested in learning and applying Bayesian data analysis in any area of science, including the social, life, and physical sciences. No prior experience or familiarity with Bayesian statistics is required.
Venue – PS statistics head office, 53 Morrison Street, Glasgow, G5 8LB
Availability – 30 places
Duration – 5 days
Contact hours – Approx. 28 hours
ECTS – Equal to 3 ECTS credits
Language – English
We offer COURSE ONLY and ACCOMMODATION PACKAGE options:
• COURSE ONLY – Includes lunch and refreshments.
• ACCOMMODATION PACKAGE (to be purchased in addition to the COURSE ONLY option) – Includes breakfast, lunch, a welcome dinner on Monday evening, a farewell dinner on Friday evening, refreshments and accommodation. Self-catering facilities are available in the accommodation, which is approximately a 6-minute walk from the PS statistics head office. Accommodation is in multiple-occupancy (maximum 3-4 people), single-sex, en-suite rooms. Arrival is on Sunday 2nd December (after 5pm) and departure on Friday 7th December (accommodation must be vacated by 9am).
Other payment options are available; please email firstname.lastname@example.org
Cancellation policy: Cancellations are accepted up to 28 days before the course start date, subject to a 25% cancellation fee. Later cancellations may be considered; contact email@example.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that PS statistics must cancel this course due to unforeseen circumstances, a full refund of the course fee will be credited. However, PS statistics cannot be held responsible for any travel, accommodation, or other expenses you incur as a result of the cancellation.
Dr. Mark Andrews
This course will be hands-on and workshop-based. Throughout each day, there will be some lecture-style presentation, i.e., using slides, introducing and explaining key concepts. However, even in these cases, the topics being covered will include practical worked examples that we will work through together.
Assumed quantitative knowledge
We assume familiarity with inferential statistics concepts like hypothesis testing and statistical significance, and some practical experience with commonly used methods like linear regression, correlation, or t-tests. Most or all of these concepts and methods are covered in a typical undergraduate statistics course in any of the sciences and related fields.
Assumed computer background
R experience is desirable but not essential. Although we will be using R extensively, all the code that we use will be made available, so attendees will only need to copy and paste it and make minor modifications. Attendees should install R and RStudio on their own computers before the workshops and have some minimal familiarity with the R environment. If additional familiarity with R is needed, many short video introductions to R and RStudio are available online (e.g., https://youtu.be/lVKMsaWju8w).
Equipment and software requirements
A laptop computer with a working version of R and RStudio is required. R and RStudio are both available as free and open source software for PCs, Macs, and Linux computers. R may be downloaded by following the links here: https://www.r-project.org/. RStudio may be downloaded by following the links here: https://www.rstudio.com/. In addition to R and RStudio, Stan for R should also be installed. Stan is also free and open source software and is available for PCs, Macs, and Linux computers. More information about Stan is available at http://mc-stan.org/, and Stan for R (i.e., RStan) can be installed by following the instructions at https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started. Many supplementary R packages will be required. The list of necessary packages will be made available to all attendees prior to the course, and these can all be installed from within RStudio with one click. We highly recommend that all attendees arrive with all the necessary software and packages installed in advance, as this will minimize troubleshooting during the workshop that might delay our progress.
UNSURE ABOUT SUITABILITY? THEN PLEASE ASK firstname.lastname@example.org
Sunday 2nd – Meet at 43 Cook Street, Glasgow G5 8JN from approx. 17:00 onwards
Monday 3rd – Classes from 09:30 to 17:30
Class 1: We will begin with an overview of what Bayesian data analysis is in essence and how it fits into statistics as it is generally practiced. Our main point here will be that Bayesian data analysis is effectively an alternative school of statistics to the traditional approach, which is variously referred to as the classical, sampling-theory-based, or frequentist approach, rather than being a specialized or advanced statistics topic. There is, however, no real need to see these two general approaches as mutually exclusive and in direct competition, and a pragmatic blend of both approaches is entirely possible.
Class 2: Introducing Bayes’ rule. Bayes’ rule can be described as a means to calculate the probability of causes from some known effects. As such, it can be used to perform statistical inference. In this section of the course, we will work through some simple and intuitive calculations using Bayes’ rule. Ultimately, all of Bayesian data analysis is based on applying these methods to more complex statistical models, so understanding these simple applications of Bayes’ rule helps provide a foundation for the more complex cases.
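To give a flavour of the kind of calculation meant here, the sketch below applies Bayes' rule to a hypothetical diagnostic-test example: the "causes" are whether or not a condition is present, and the observed "effect" is a positive test result. The course itself uses R; this is a Python illustration, and the function name `bayes_rule` and all of the numbers are hypothetical rather than taken from the course materials.

```python
def bayes_rule(priors, likelihoods):
    """Posterior probability of each cause given one observed effect.

    priors: dict mapping cause -> P(cause)
    likelihoods: dict mapping cause -> P(effect | cause)
    """
    # Numerator of Bayes' rule for each cause: P(effect | cause) * P(cause)
    joint = {cause: likelihoods[cause] * priors[cause] for cause in priors}
    # Denominator: P(effect), obtained by summing over all possible causes
    evidence = sum(joint.values())
    return {cause: p / evidence for cause, p in joint.items()}


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 10% false-positive rate
posterior = bayes_rule(
    priors={"condition": 0.01, "no condition": 0.99},
    likelihoods={"condition": 0.95, "no condition": 0.10},
)
print(round(posterior["condition"], 3))  # 0.088
```

Despite the fairly accurate test, the probability of the condition given a positive result stays below 10%, because the low prior (prevalence) dominates; this is the kind of intuition the simple examples are meant to build.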
Class 3: Bayesian inference in a simple statistical model. In this section, we will work through a classic statistical inference problem, namely inferring the number of red marbles in an urn of red and black marbles. This problem is easy to analyse completely using R alone, yet it allows us to delve into all the key concepts of Bayesian statistics, including the likelihood function, prior distributions, posterior distributions, maximum a posteriori estimation, highest posterior density intervals, posterior predictive intervals, marginal likelihoods, Bayes factors, and model evaluation of out-of-sample generalization.
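A minimal sketch of the urn problem, written in Python rather than the R used in the course, might look as follows. It assumes (hypothetically) an urn of 10 marbles, 10 draws made with replacement of which 7 were red, and a uniform prior over 0 to 10 red marbles; the course's own set-up may differ. It shows the likelihood, prior, posterior, marginal likelihood, a maximum a posteriori estimate, and a posterior predictive probability.

```python
from math import comb


def urn_posterior(n_marbles, n_draws, n_red_drawn):
    """Posterior over the number of red marbles in the urn.

    Assumes draws are made with replacement and a uniform prior
    over 0..n_marbles red marbles.
    """
    hypotheses = range(n_marbles + 1)
    prior = [1 / (n_marbles + 1)] * (n_marbles + 1)
    # Binomial likelihood of the observed red count under each hypothesis
    likelihood = [
        comb(n_draws, n_red_drawn)
        * (r / n_marbles) ** n_red_drawn
        * (1 - r / n_marbles) ** (n_draws - n_red_drawn)
        for r in hypotheses
    ]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    # Marginal likelihood: probability of the data averaged over the prior
    marginal_likelihood = sum(unnormalized)
    posterior = [u / marginal_likelihood for u in unnormalized]
    return posterior, marginal_likelihood


posterior, marginal = urn_posterior(n_marbles=10, n_draws=10, n_red_drawn=7)
# Maximum a posteriori estimate: the most probable number of red marbles
map_estimate = max(range(len(posterior)), key=posterior.__getitem__)
# Posterior predictive probability that the next draw is red
p_next_red = sum(p * r / 10 for r, p in enumerate(posterior))
print(map_estimate)  # 7
```

The marginal likelihood is returned as well because the ratio of two models' marginal likelihoods is the Bayes factor used to compare them.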
Tuesday 4th – Classes from 09:30 to 17:30
Class 4: Bayesian analysis of linear and normal models. Statistical models based on linear relationships and normal distributions are a mainstay of statistical analyses in general. They encompass models such as linear regression, Pearson’s correlation, t-tests, ANOVA, ANCOVA, and so on. In this section, we will describe how to do Bayesian analysis of linear and normal models, paying particular attention to Bayesian linear regression. One of the aims of this section is to identify some important and interesting parallels between Bayesian and classical or frequentist analyses. These parallels show how Bayesian and classical analyses can be seen as ultimately providing two different perspectives on the same problem.
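One such parallel can be sketched in a few lines of Python (the course uses R). The sketch below makes a simplifying assumption: the noise variance `sigma2` is known, so a normal prior on the regression coefficients gives a closed-form normal posterior. The data are simulated and the helper name `bayes_linreg_posterior` is hypothetical. With a very broad prior, the posterior mean essentially coincides with the ordinary least-squares estimate.

```python
import numpy as np


def bayes_linreg_posterior(X, y, sigma2, prior_mean, prior_cov):
    """Closed-form posterior for regression coefficients.

    Assumes y ~ Normal(X @ beta, sigma2 * I) with known noise variance
    sigma2, and a Normal(prior_mean, prior_cov) prior on beta.
    """
    prior_precision = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_precision + X.T @ X / sigma2)
    post_mean = post_cov @ (prior_precision @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_cov


rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)  # simulated data

# A very broad prior, so the data dominate the posterior
post_mean, post_cov = bayes_linreg_posterior(
    X, y, sigma2=0.25, prior_mean=np.zeros(2), prior_cov=1e6 * np.eye(2)
)
ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # classical least-squares estimate
print(post_mean, ols)
```

Narrowing `prior_cov` pulls the posterior mean towards `prior_mean`, which is the Bayesian counterpart of a shrinkage (ridge-type) estimator.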
Class 5: The previous section provides a so-called analytical approach to linear and normal models, in which we can calculate the desired quantities and distributions by way of simple formulae. However, analytical approaches to Bayesian analyses are only possible in a relatively restricted set of cases. By contrast, numerical methods, specifically Markov Chain Monte Carlo (MCMC) methods, can be applied to virtually any Bayesian model. In this section, we will re-perform the analysis presented in the previous section using MCMC methods. For this, we will use the brms package in R, which provides an exceptionally easy-to-use interface to Stan.
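The course performs this step with brms and Stan (Stan's default sampler is a form of Hamiltonian Monte Carlo). To illustrate the underlying idea only, the Python sketch below swaps in a much simpler MCMC method, a random-walk Metropolis sampler, on a normal model with a known standard deviation, where the analytical posterior is available for comparison. The data are simulated, and the step size, iteration count, and warm-up length are arbitrary choices for this example.

```python
import numpy as np


def log_posterior(mu, y, sigma, prior_mu, prior_sd):
    # Unnormalized log posterior: normal likelihood (known sigma) + normal prior
    log_lik = -0.5 * np.sum((y - mu) ** 2) / sigma**2
    log_prior = -0.5 * (mu - prior_mu) ** 2 / prior_sd**2
    return log_lik + log_prior


def metropolis(y, sigma, prior_mu, prior_sd, n_iter=20000, step=0.5, seed=1):
    rng = np.random.default_rng(seed)
    mu = 0.0
    current = log_posterior(mu, y, sigma, prior_mu, prior_sd)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        proposal = mu + rng.normal(scale=step)  # symmetric random-walk proposal
        proposed = log_posterior(proposal, y, sigma, prior_mu, prior_sd)
        # Accept with probability min(1, posterior ratio), on the log scale
        if np.log(rng.uniform()) < proposed - current:
            mu, current = proposal, proposed
        samples[i] = mu  # a rejected proposal repeats the current value
    return samples


rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=50)  # simulated data
samples = metropolis(y, sigma=1.0, prior_mu=0.0, prior_sd=10.0)[2000:]  # drop warm-up

# Analytical (conjugate normal-normal) posterior for comparison
post_prec = 1 / 10.0**2 + len(y) / 1.0**2
post_mean = (0.0 / 10.0**2 + y.sum() / 1.0**2) / post_prec
print(samples.mean(), post_mean)
```

The sample mean and standard deviation of the draws should agree closely with the analytical posterior mean and `1 / sqrt(post_prec)`; the point of MCMC is that the same sampler would still work if no such formula existed.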
Class 6: This section continues the previous one, but explores a wider range of linear and normal models, namely general linear models. These include models with multiple predictors, some or all of which may be categorical, and interactions between these predictors. We will use brms for all of these analyses. For all the examples covered here, we will use real-world data sets taken from a variety of different fields.
Wednesday 5th – Classes from 09:30 to 17:30
Class 7: Bayesian generalized linear models. Generalized linear models include logistic regression (including multinomial and ordinal logistic regression), Poisson regression, negative binomial regression, and other models. Again, for these analyses we will use the brms package and explore this wide range of models using real-world data sets.
Class 8: Model evaluation and checking. An important part of any analysis is evaluating the suitability of the chosen or assumed statistical models. This general topic incorporates hypothesis testing. In this section, we will discuss this topic in depth, paying particular attention to posterior predictive checks, cross-validation, information criteria, and Bayes factors. We will revisit many of the examples covered so far and perform model checking, model evaluation, and hypothesis testing on the models that we used.
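As an illustration of one of the information criteria mentioned, the Python sketch below computes WAIC from a matrix of pointwise log-likelihoods (posterior draws by observations), using the usual definition: the log pointwise predictive density (lppd) minus an effective-number-of-parameters penalty (p_waic), multiplied by -2. In R this would normally be done with existing tools (for example, the loo package that brms relies on), not by hand; the function name `waic` and the simulated posterior draws below are assumptions for illustration.

```python
import numpy as np


def waic(log_lik):
    """WAIC from an (S draws x N observations) matrix of pointwise log-likelihoods.

    Returns (waic, lppd, p_waic), with WAIC on the deviance scale.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # lppd: sum over observations of log(mean over draws of the likelihood),
    # computed stably with the log-sum-exp trick
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    # p_waic: sum over observations of the posterior variance of the log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2 * (lppd - p_waic), lppd, p_waic


rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=50)  # simulated data
# Posterior draws for the mean under a flat prior with known sigma = 1
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)
# Pointwise normal log-likelihood, shape (4000, 50)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
w, lppd, p_waic = waic(log_lik)
print(round(p_waic, 2))  # typically close to 1 for this one-parameter model
```

Lower WAIC indicates better expected out-of-sample predictive performance, so the same function applied to competing models' log-likelihood matrices gives a basis for comparing them.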
Thursday 6th – Classes from 09:30 to 17:30
Class 9: Multilevel general and generalized linear models. In this section, we will cover the multilevel variants of the regression models (i.e., linear, logistic, Poisson, etc.) that we have covered so far. The topic of multilevel (or hierarchical) models is a major one, and multilevel models are widely used throughout the sciences. In general, multilevel models arise whenever data are correlated due to membership of a group (or group of groups, and so on). For example, if we have data concerning how socioeconomic status relates to educational achievement, the data might come from individual children. But these children are in separate schools, the schools are in separate cities, and the cities are in separate countries. Thus, the entire data set comprises groups (of groups, etc.) of data subsets, and there may be important variation across these subsets. The entire day is devoted to multilevel regression models. As before, we will use a wide range of real-world data sets, and move between linear, logistic, and other models as we explore these analyses. We will pay particular attention to when and how to use varying-slope and varying-intercept models, and how to choose between maximal and minimal models. We will cover model checking and evaluation in the same depth as with the previous models.
Friday 7th – Classes from 09:30 to 16:00
Class 10: MCMC in depth. Although we will have used MCMC methods extensively by this point, we will have hidden some of their technical details. As one approaches more advanced Bayesian topics, a deeper understanding of MCMC methods is required. In this section, we will begin by discussing simple Monte Carlo (MC) approaches like rejection sampling and importance sampling, and then proceed to Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling, Metropolis-Hastings sampling, slice sampling, and Hamiltonian Monte Carlo.
Class 11: Customized and bespoke statistical models. Thus far, we have used the brms package for almost all of our analyses. While brms is an excellent tool, in some cases, especially in more advanced analyses, a pre-defined statistical model (e.g., a linear or logistic regression model) is not sufficient, and it is necessary to develop customized probabilistic models directly in the Stan language itself. In this final section of the course, we will delve into how to write Stan code directly. We will first explore the Stan code that brms creates and learn how to modify this code. We will then write customized models that perform nonlinear regression using Gaussian processes and radial basis functions, as well as finite mixture models. Through these examples, we will learn how to write and analyse custom statistical models that are well suited to whatever specialized problem we are working on.
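For a rough picture of what Gaussian process regression computes, the Python sketch below gives the closed-form posterior mean and variance of a zero-mean GP with a squared-exponential (sometimes called RBF) kernel and Gaussian noise. This is not the Stan code used in the course: here the kernel hyperparameters and noise level are fixed, hypothetical values, whereas a full Bayesian model written in Stan would typically place priors on them and infer them. The helper names and the simulated sine-wave data are also assumptions for illustration.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, amplitude=1.0):
    # Squared-exponential covariance between 1-D input vectors a and b
    d = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_sd=0.1, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the points x_test."""
    K = sq_exp_kernel(x_train, x_train, **kernel_args)
    K = K + noise_sd**2 * np.eye(len(x_train))  # add observation noise
    K_s = sq_exp_kernel(x_train, x_test, **kernel_args)
    K_ss = sq_exp_kernel(x_test, x_test, **kernel_args)
    # Cholesky factorization for a numerically stable solve of K^{-1} y
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var


rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)  # simulated nonlinear data
x_new = np.array([np.pi / 2, 3 * np.pi / 2])
mean, var = gp_posterior(x, y, x_new, noise_sd=0.1, length_scale=1.0)
print(mean)  # should be near sin(x_new), i.e. roughly 1 and -1
```

No functional form (such as a sine) is specified anywhere; the nonlinearity is captured entirely by the covariance kernel, which is what makes GPs useful for the nonlinear regression problems described above.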