BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//PS Statistics - ECPv4.6.2//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:PS Statistics
X-ORIGINAL-URL:https://www.psstatistics.com
X-WR-CALDESC:Events for PS Statistics
BEGIN:VEVENT
DTSTART;VALUE=DATE:20200421
DTEND;VALUE=DATE:20200425
DTSTAMP:20200408T193537Z
CREATED:20161111T092018Z
LAST-MODIFIED:20200326T124315Z
UID:1939-1587427200-1587772799@www.psstatistics.com
SUMMARY:ONLINE COURSE - Introduction to Bayesian hierarchical modelling using R (IBHM04) This course will be delivered live
DESCRIPTION:\nThis course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.\nThis is a ‘LIVE COURSE’ – the instructor will be delivering lectures and coaching attendees through the accompanying computer practicals via video link\, so a good internet connection is essential. \n\nPlease feel free to email oliverhooker@psstatistics.com with any questions\, full course details below. \n\nCourse Overview:\nThis course will cover introductory hierarchical modelling for real-world data sets from a Bayesian perspective. These methods lie at the forefront of statistics research and are a vital tool in the scientist’s toolbox. The course focuses on introducing concepts and demonstrating good practice in hierarchical models. All methods are demonstrated with data sets which participants can run themselves. Participants will be taught how to fit hierarchical models using the Bayesian modelling software JAGS and Stan through the R software interface. The course covers the full gamut from simple regression models through to full generalised multivariate hierarchical structures. A Bayesian approach is taken throughout\, meaning that participants can include all available information in their models and estimate all unknown quantities with uncertainty. Participants are encouraged to bring their own data sets for discussion with the course tutors. \n\n\nIntended Audience\nResearch postgraduates\, practicing academics and professionals in government and industry. \nVenue – Delivered remotely \nTime zone – UK (GMT) \nAvailability – 15 places \nDuration – 4 days \nContact hours – Approx. 30 hours \nECTS – Equal to 3 ECTS \nLanguage – English \nPLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered\, contact oliverhooker@psstatistics.com. 
Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances a full refund of the course fees (and accommodation fees if booked through PS Statistics) will be credited. \n\nAndrew Parnell\n\nTeaching Format\n\nThere will be morning lectures based on the modules outlined in the course timetable. In the afternoon there will be practicals based on the topics covered that morning. Data sets for computer practicals will be provided by the instructors\, but participants are welcome to bring their own data. \nAssumed quantitative knowledge \nA basic understanding of regression methods and generalised linear models. \nAssumed computer background \nFamiliarity with R. Ability to import/export data\, manipulate data frames\, fit basic statistical models & generate simple exploratory and diagnostic plots. \nEquipment and software requirements \nA laptop/personal computer with a working version of R\, RStudio\, JAGS and Stan installed. All are available for both PC and Mac and can be downloaded for free by following these links. \nhttps://cran.r-project.org/ \nhttps://www.rstudio.com/ \nhttp://mcmc-jags.sourceforge.net \nhttp://mc-stan.org/ \nIt is essential that you come with all necessary software and packages already installed (you will be sent a list of packages prior to the course) as internet access may not always be available. 
\nUNSURE ABOUT SUITABILITY? THEN PLEASE ASK oliverhooker@psstatistics.com \n\nCourse Programme\n\nTuesday 21st – Classes from 09:00 to 17:00 \nModule 1: Introduction to Bayesian Statistics\nModule 2: Linear and generalised linear models (GLMs)\nPractical: Using R\, JAGS and Stan for fitting GLMs \nWednesday 22nd – Classes from 09:00 to 17:00 \nModule 3: Simple hierarchical regression models\nModule 4: Hierarchical models for non-Gaussian data\nPractical: Fitting hierarchical models \nThursday 23rd – Classes from 09:00 to 17:00 \nModule 5: Hierarchical models vs mixed effects models\nModule 6: Multivariate and multi-layer hierarchical models\nPractical: Advanced examples of hierarchical models \nFriday 24th – Classes from 09:00 to 17:00 \nModule 7: Shrinkage and variable selection\nModule 8: Hierarchical models and partial pooling\nPractical: Shrinkage modelling \n
URL:https://www.psstatistics.com/course/introduction-to-bayesian-hierarchical-modelling-using-r-ibhm04/
LOCATION:United Kingdom
ATTACH;FMTTYPE=image/jpeg:https://www.psstatistics.com/wp-content/uploads/2016/11/IBHM-pic-new-resized.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20200504
DTEND;VALUE=DATE:20200509
DTSTAMP:20200408T193537Z
CREATED:20190424T190106Z
LAST-MODIFIED:20200326T205045Z
UID:3444-1588550400-1588982399@www.psstatistics.com
SUMMARY:ONLINE COURSE - Python for data science\, machine learning\, and scientific computing (PDMS02) This course will be delivered live
DESCRIPTION:\nThis course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.\nThis is a ‘LIVE COURSE’ – the instructor will be delivering lectures and coaching attendees through the accompanying computer practicals via video link\, so a good internet connection is essential. \nPlease feel free to email oliverhooker@psstatistics.com with any questions\, full course details below. \nCourse Overview:\nPython is one of the most widely used and highly valued programming languages in the world\, and is especially widely used in data science\, machine learning\, and in other scientific computing applications. This course provides both a general introduction to programming with Python and a comprehensive introduction to using Python for data science\, machine learning\, and scientific computing. The major topics that we will cover include the following: the fundamentals of general purpose programming in Python; using Jupyter notebooks as a reproducible interactive Python programming environment; numerical computing using numpy; data processing and manipulation using pandas; data visualization using matplotlib\, seaborn\, ggplot\, bokeh\, altair\, etc; symbolic mathematics using sympy; data science and machine learning using scikit-learn\, keras\, and tensorflow; Bayesian modelling using PyMC3 and PyStan; high performance computing with Cython\, Numba\, IPyParallel\, and Dask. Overall\, this course aims to provide a solid introduction to Python generally as a programming language\, and to its principal tools for doing data science\, machine learning\, and scientific computing. (Note that this course will focus on Python 3 exclusively\, given that Python 2 has now reached its end of life.) \n\n\nIntended Audience\nThis course is aimed at anyone who is interested in learning the fundamentals of Python generally and especially how Python can be used for data science\, broadly defined. 
Python and Python based data science is applicable to academic research in all fields of science and engineering\, as well as to data intensive industries and services such as finance\, pharmaceuticals\, healthcare\, IT\, and manufacturing. \nVenue – Delivered remotely \nTime zone – UK (GMT) \nAvailability – 15 places \nDuration – 5 days \nContact hours – Approx. 28 hours \nECTS – Equal to 3 ECTS \nLanguage – English \nPLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered\, contact oliverhooker@psstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances a full refund of the course fees (and accommodation fees if booked through PS Statistics) will be credited. However\, PS Statistics will not be held responsible/liable for any travel fees\, accommodation costs or other expenses incurred by you as a result of the cancellation. Because of this\, PS Statistics strongly recommends that any travel and accommodation booked by you or your institute is refundable/flexible\, and that you delay booking your travel and accommodation until as close to the course start date as is economically viable. \n\nDr. Mark Andrews\n\nTeaching Format\n\nThis course will be hands-on and workshop based. Throughout each day\, there will be some lecture style presentation\, i.e.\, using slides\, introducing and explaining key concepts. However\, even in these cases\, the topics being covered will include practical worked examples that we will work through together. \nAssumed quantitative knowledge \nWe will assume only a minimal amount of familiarity with some general statistical and mathematical concepts. 
These concepts will arise when we discuss numerical computing\, symbolic maths\, and statistics and machine learning. However\, expertise and proficiency with these concepts are not necessary. Anyone who has taken any undergraduate (Bachelor’s) level course on (applied) statistics or mathematics can be assumed to have sufficient familiarity with these concepts. \nAssumed computer background \nNo prior experience with Python or any other programming language is required. Of course\, any familiarity with any other programming language will be helpful\, but is not required. \nEquipment and software requirements \nAttendees of the course should bring a laptop computer with Python (version 3) and the Python packages that we will use (such as numpy\, pandas\, sympy\, etc) installed. All the required software is free and open source and is available on Windows\, MacOS\, and Linux. Instructions on how to install and configure all the software will be provided before the start of the course. We will also provide time during the workshops to ensure that all software is installed and configured properly. \nUNSURE ABOUT SUITABILITY? THEN PLEASE ASK oliverhooker@psstatistics.com \n\nCourse Programme\n\nSunday 3rd \nMeet at 43 Cook Street\, Glasgow G5 8JN between 17:00 – 21:00 \nMonday 4th – Classes from 09:30 to 17:30 \n• Topic 1: The What and Why of Python. In order to provide some general background and context\, we will describe where Python came from\, what its major design principles and intended uses originally were\, and where and how it is currently used. We will see that Python is now extremely widely used\, especially in powering the web\, in data science and machine learning\, and in system level programming. Here\, we will also compare and contrast Python and R\, given that both are extremely widely used in data science. \n• Topic 2: Installing and setting up Python. There are many ways to write and execute code in Python. 
Which to use depends on personal preference and the type of programming that is being done. Here\, we will explore some of the commonly used Integrated Development Environments (IDEs) for Python\, which include Spyder and PyCharm. We will also mention and briefly describe Jupyter notebooks\, which are widely used for scientific applications of Python\, and are an excellent tool for doing reproducible interactive work. We will cover Jupyter more extensively starting on Day 3. Also as part of this topic\, we will describe how to use virtual environments and package installers such as pip and conda. \n• Topic 3: Introduction to Python: Data Structures. We will begin our coverage of programming with Python by introducing its different data structures and the operations on them. This will begin with the elementary data types such as integers\, floats\, Booleans\, and strings\, and the common operations that can be applied to these data types. We will then proceed to the so-called collection data structures\, which primarily include lists\, dictionaries\, tuples\, and sets. \n• Topic 4: Introduction to Python: Programming. Having introduced Python’s data types\, we will now turn to how to program in Python. We will begin with iteration\, such as the for and while loops. We will then cover conditionals and functions. \nTuesday 5th – Classes from 09:30 to 17:30 \n• Topic 5: Modules\, packages\, and imports. Python is extended by hundreds of thousands of additional packages. Here\, we will cover how to install and import these packages\, and also how to write our own modules and packages. \n• Topic 6: Numerical programming with numpy. Although not part of Python’s official standard library\, the numpy package is part of the de facto standard library for any scientific and numerical programming. Here we will introduce numpy\, especially numpy arrays and their built-in functions (i.e. “methods”). \n• Topic 7: Data processing with pandas. 
The pandas library provides the means to represent and manipulate data frames. Like numpy\, pandas can be seen as part of the de facto standard library for data oriented uses of Python. \n• Topic 8: Object Oriented Programming. Python is an object oriented language\, and object oriented programming in Python is used extensively in anything beyond the very simplest types of programs. Moreover\, compared to other languages\, object oriented programming in Python is relatively easy to learn. Here\, we provide a comprehensive introduction to object oriented programming in Python. \n• Topic 9: Other Python programming features. In this section\, we will cover some important features of Python not yet covered. These include exception handling\, list and dictionary comprehensions\, itertools\, advanced collection types including defaultdict\, anonymous functions\, decorators\, etc. \nWednesday 6th – Classes from 09:30 to 17:30 \n• Topic 10: Jupyter notebooks and JupyterLab. Although we have already introduced Jupyter notebooks\, here we will explore them properly. Jupyter notebooks are a reproducible and interactive computing environment that supports numerous programming languages\, although Python remains the principal language used in Jupyter notebooks. Here\, we’ll explore their major features and how they can be shared easily using GitHub and Binder. \n• Topic 11: Data Visualization. Python provides many options for data visualization. The matplotlib library is a low level plotting library that allows for considerable control of the plot\, albeit at the price of a considerable amount of low level code. Based on matplotlib\, and providing a much higher level interface to the plot\, is the seaborn library. This allows us to produce complex data visualizations with a minimal amount of code. Similar to seaborn is ggplot\, which is a direct port of the widely used R based visualization library. 
In this section\, we will also consider a set of other visualization libraries for Python\, including plotly\, bokeh\, and altair. \n• Topic 12: Symbolic mathematics. Symbolic mathematics systems\, also known as computer algebra systems\, allow us to algebraically manipulate and solve symbolic mathematical expressions. In Python\, the principal symbolic mathematics library is sympy. This allows us to simplify mathematical expressions; compute derivatives\, integrals\, and limits; solve equations; algebraically manipulate matrices; and more. \n• Topic 13: Statistical data analysis. In this section\, we will describe how to perform widely used statistical analyses in Python. Here we will start with the statsmodels package\, which provides linear and generalized linear models as well as many other widely used statistical models. We will also introduce the scikit-learn package\, which we will use more widely on Day 4\, and use it for regression and classification analysis. \nThursday 7th – Classes from 09:30 to 17:30 \n• Topic 14: Machine learning. Python is arguably the most widely used language for machine learning. In this section\, we will explore some of the major Python machine learning tools that are part of the scikit-learn package. This section continues our coverage of this package\, which began in Topic 13 on Day 3. Here\, we will cover machine learning tools such as support vector machines\, decision trees\, random forests\, k-means clustering\, dimensionality reduction\, model evaluation\, and cross-validation. \n• Topic 15: Neural networks and deep learning. A popular subfield of machine learning involves the use of artificial neural networks and deep learning methods. In this section\, we will explore neural networks and deep learning using the keras library\, which is a high level interface to neural network and deep learning libraries such as Tensorflow\, Theano\, or the Microsoft Cognitive Toolkit (CNTK). 
Examples that we will consider here include image classification and other classification problems taken from\, for example\, the UCI Machine Learning Repository. \nFriday 8th – Classes from 09:30 to 16:00 \n• Topic 16: Bayesian models. Two probabilistic programming languages for Bayesian modelling in Python are PyMC3 and PyStan. PyMC3 is a Python native probabilistic programming language\, while PyStan is the Python interface to the Stan programming language\, which is also very widely used in R. Both PyMC3 and PyStan are extremely powerful tools and can implement arbitrary probabilistic models. Here\, we will not have time to explore either in depth\, but we will be able to work through a number of nontrivial examples\, which will illustrate the general features and usage of both languages. \n• Topic 17: High performance Python. The final topic that we will consider in this course is high performance computing with Python. While many of the tools that we considered above run extremely quickly because they interface with compiled code written in C/C++ or Fortran\, Python itself is a high level\, dynamically typed\, and interpreted programming language. As such\, native Python code does not execute as fast as compiled languages such as C/C++ or Fortran. However\, it is possible to achieve compiled language speeds in Python by compiling Python code. Here\, we will consider Cython and Numba\, both of which allow us to achieve C/C++ speeds in Python with minimal extensions to our code. Also in this section\, we will consider parallelization in Python\, in particular using IPyParallel and Dask\, both of which allow easy parallel and distributed processing using Python. \n
URL:https://www.psstatistics.com/course/python-for-data-science-machine-learning-and-scientific-computing-pdms02/
LOCATION:United Kingdom
ATTACH;FMTTYPE=image/jpeg:https://www.psstatistics.com/wp-content/uploads/2019/04/dsap.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20200525
DTEND;VALUE=DATE:20200530
DTSTAMP:20200408T193537Z
CREATED:20190424T185728Z
LAST-MODIFIED:20200327T044908Z
UID:3449-1590364800-1590796799@www.psstatistics.com
SUMMARY:ONLINE COURSE - Generalised Linear (MIXED) (GLMM)\, Nonlinear (NLGLM) And General Additive Models (MIXED) (GAMM) (GNAM02) This course will be delivered live
DESCRIPTION:\nThis course will now be delivered live by video link in light of travel restrictions due to the COVID-19 (Coronavirus) outbreak.\nThis is a ‘LIVE COURSE’ – the instructor will be delivering lectures and coaching attendees through the accompanying computer practicals via video link\, so a good internet connection is essential. \nPlease feel free to email oliverhooker@psstatistics.com with any questions\, full course details below. \nCourse Overview:\nThis course provides a general introduction to nonlinear regression analysis\, covering major topics including\, but not limited to\, general and generalized linear models\, generalized additive models\, spline and radial basis function regression\, and Gaussian process regression. We approach the general topic of nonlinear regression by showing how the powerful and flexible statistical modelling framework of general and generalized linear models\, and their multilevel counterparts\, can be extended to handle nonlinear relationships between predictor and outcome variables. We begin by providing a comprehensive practical and theoretical overview of regression\, including multilevel regression\, using general and generalized linear models. Here\, we pay particular attention to the many variants of general and generalized linear models\, and how these provide a very widely applicable set of tools for statistical modeling. After this introduction\, we then proceed to cover practically and conceptually simple extensions to the general and generalized linear models framework using parametric nonlinear models and polynomial regression. We will then cover more powerful and flexible extensions of this modeling framework by way of the general concept of basis functions. We’ll begin our coverage of basis function regression with the major topic of spline regression\, and then proceed to cover radial basis functions and the multilayer perceptron\, both of which are types of artificial neural networks. 
We then move on to the major topic of generalized additive models (GAMs) and generalized additive mixed models (GAMMs)\, which can be viewed as a generalization of all the basis function regression topics\, but which cover a wider range of topics including nonlinear spatial and temporal models and interaction models. Finally\, we will cover the powerful Bayesian nonlinear regression method of Gaussian process regression. \n\n\n\nIntended Audience\nThis course is aimed at anyone who is interested in learning and applying nonlinear regression methods. These methods have major applications throughout economics and the other social sciences\, life sciences\, physical sciences\, and machine learning. \nVenue – Delivered remotely \nTime zone – UK (GMT) \nAvailability – 15 places \nDuration – 5 days \nContact hours – Approx. 28 hours \nECTS – Equal to 3 ECTS \nLanguage – English \nPLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered\, contact oliverhooker@psstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances a full refund of the course fees (and accommodation fees if booked through PS Statistics) will be credited. \n\nDr. Mark Andrews\n\nTeaching Format\n\nThis course will be hands-on and workshop based. Throughout each day\, there will be some lecture style presentation\, i.e.\, using slides\, introducing and explaining key concepts. However\, even in these cases\, the topics being covered will include practical worked examples that we will work through together. \nAssumed quantitative knowledge \nWe assume familiarity with linear regression analysis\, and the major concepts of classical inferential statistics (p-values\, hypothesis testing\, confidence intervals\, model comparison\, etc). 
Some familiarity with common generalized linear models such as logistic or Poisson regression will also be assumed. \nAssumed computer background \nR experience is desirable but not essential. Although we will be using R extensively\, all the code that we use will be made available\, and so attendees will just need to make minor modifications to this code. Attendees should install R and RStudio on their own computers before the workshops\, and have some minimal familiarity with the R environment. \nEquipment and software requirements \nA laptop computer with working versions of R and RStudio is required. R and RStudio are both available as free and open source software for PCs\, Macs\, and Linux computers. R may be downloaded by following the links here: https://www.r-project.org/. RStudio may be downloaded by following the links here: https://www.rstudio.com/. All the R packages that we will use in this course can be downloaded and installed during the workshop itself as and when they are needed\, and a full list of required packages will be made available to all attendees prior to the course. In some cases\, some additional open-source software will need to be installed to use some R packages. These include Stan for probabilistic modeling; Keras for neural network modeling; and Prophet for forecasting. Directions on how to install this software will also be provided before and during the course. \nUNSURE ABOUT SUITABILITY? THEN PLEASE ASK oliverhooker@psstatistics.com \n\nCourse Programme\n\nMonday 25th – Classes from 09:30 to 17:30 \nModule 1: General and generalized linear models\, including multilevel models. In order to provide a solid foundation for the remainder of the course\, we begin by providing a comprehensive practical and theoretical overview of the principles of general and generalized linear models\, also covering their multilevel (or hierarchical) counterparts. 
General and generalized linear models provide a powerful set of tools for statistical modeling\, which are extremely widely used and widely applicable. Their underlying theoretical principles are quite simple and elegant\, and once understood\, it becomes clear how these models can be extended in many different ways to handle different statistical modeling situations. \nFor this module\, we will use very commonly used R tools such as lm\, glm\, lme4::lmer\, and lme4::glmer. In addition\, we will also use the R based brms package\, which uses the Stan probabilistic programming language. This package allows us to perform all the same analyses that are provided by lm\, glm\, lmer\, glmer\, etc.\, using an almost identical syntax\, but also allows us to perform a much wider range of general and generalized linear model analyses. \nTuesday 26th – Classes from 09:30 to 17:30 \nHaving established a solid regression modeling foundation\, on the second day we will cover a range of nonlinear modeling extensions to the general and generalized linear modeling framework. \nModule 2: Polynomial regression. Polynomial regression is both a conceptually and practically simple extension of linear modeling. It can be easily accomplished using the poly function along with tools like lm\, glm\, lme4::lmer\, and lme4::glmer. Here\, we will also cover piecewise linear and polynomial regression\, using R packages such as segmented.\n\nModule 3: Parametric nonlinear regression. In some cases of nonlinear regression\, a bespoke parametric function for the relationship between the predictors and outcome variable is used. These functions are often obtained from scientific knowledge of the problem at hand. In R\, we can use the nls function to perform parametric nonlinear regression.\n\nModule 4: Spline regression. Nonlinear regression using splines is a powerful and flexible non-parametric or semi-parametric nonlinear regression method. It is also an example of a basis function regression method. 
Here\, we will cover spline regression using the splines::bs and splines::ns functions\, which can be used with lm\, glm\, lme4::lmer\, lme4::glmer\, brms\, etc.\n\nModule 5: Radial basis functions. Regression using radial basis functions is a set of methods that are closely related to spline regression. They have a long history of usage in machine learning and can also be viewed as a type of artificial neural network model. Here\, we will explore radial basis function models using the Stan programming language\, which will allow us to build powerful and flexible versions of these models.\n\nModule 6: Multilayer perceptron. Closely related to radial basis functions are multilayer perceptrons. These and their variants and extensions are a major building block of deep learning (machine learning) methods. We will explore multilayer perceptrons in Stan\, but we will also use the powerful Keras library. \nWednesday 27th – Classes from 09:30 to 17:30 \nModule 7: Generalized additive models. We now turn to the major module of generalized additive models (GAMs). GAMs generalize many of the concepts and modules covered so far and represent a powerful and flexible framework for nonlinear modeling. In R\, the mgcv package provides an extensive set of tools for working with GAMs. Here\, we will provide in-depth coverage of mgcv\, including choosing smooth terms\, controlling overfitting and complexity\, prediction\, model evaluation\, and so on.\n\nModule 9: Generalized additive mixed models. GAMs can also be used in linear mixed effects models\, where they are known as generalized additive mixed models (GAMMs). GAMMs can also be fitted with the mgcv package. \nThursday 28th – Classes from 09:30 to 17:30 \nModule 10: Interaction nonlinear regression. A powerful feature of GAMs and GAMMs is the ability to model nonlinear interactions\, whether between two continuous variables\, or between one continuous and one categorical variable. 
Amongst other things\, interactions between continuous variables allow us to do spatial and spatio-temporal modeling. Interactions between categorical and continuous variables allow us to model how nonlinear relationships between a predictor and outcome change as a function of the value of different categorical variables. \nModule 11: Nonlinear regression for time-series and forecasting. One major application of nonlinear regression is modeling time-series and forecasting. Here\, we will explore the prophet library for time-series forecasting. This library\, available for both Python and R\, gives us a GAM-like framework for modeling time-series and making forecasts. \nFriday 29th – Classes from 09:30 to 16:00 \nModule 12: Gaussian process regression. Our final module deals with a type of Bayesian nonlinear regression known as Gaussian process regression. Gaussian process regression can be viewed as a kind of basis function regression\, but with an infinite number of basis functions. In that sense\, it generalizes spline regression\, radial basis functions\, multilayer perceptrons\, and generalized additive models\, and it provides a means to overcome some practically challenging problems in nonlinear regression\, such as selecting the number and type of smooth functions. Here\, we will explore Gaussian process regression using Stan. \n
URL:https://www.psstatistics.com/course/generalised-linear-glm-nonlinear-nlglm-and-general-additive-models-gam-gnam02/
LOCATION:United Kingdom
ATTACH;FMTTYPE=image/jpeg:https://www.psstatistics.com/wp-content/uploads/2019/04/gnmr01.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20200629
DTEND;VALUE=DATE:20200704
DTSTAMP:20200408T193537Z
CREATED:20191218T000657Z
LAST-MODIFIED:20200326T204220Z
UID:6056-1593388800-1593820799@www.psstatistics.com
SUMMARY:ONLINE COURSE - Reproducible Data Science using RMarkdown\, Git\, R packages\, Docker\, Make & Drake\, and other tools (RDRP01) This course will be delivered live
DESCRIPTION:\nCourse Overview:\nThis course provides a comprehensive introduction to doing reproducible data analysis\, which we define as analysis where the entire workflow or pipeline is as open and transparent as possible\, making it possible for others\, including our future selves\, to exactly reproduce any of its results. We cover this topic by providing a thorough introduction to a set of R based and general computing tools such as RMarkdown\, Git & GitHub\, R packages\, Docker\, GNU Make\, and Drake\, and show how they can be used together to do reproducible data analysis that can then be shared with others. After a general introduction on Day 1\, where we introduce the core concept of a research compendium\, we will begin by covering RMarkdown\, knitr\, and related tools. These are vital tools for reproducible research that allow us to produce data analysis reports\, i.e. articles\, slides\, posters\, websites\, etc.\, by embedding analysis code (R\, Python\, etc) within the text of the report; this code is then executed\, and the results it produces are inserted into the final output document. On Day 2\, we provide a comprehensive introduction to version control using Git\, including using GitHub. Git and GitHub are vital tools for the organization\, maintenance\, and distribution of our code\, especially for large scale and long term projects involving multiple collaborators. On Day 3\, we cover how to create\, maintain\, and distribute R packages. R packages are the principal means of distributing reusable R code generally\, and here we will also look at how R packages can be used to create\, maintain\, and distribute research compendia. On Day 4\, we cover Docker\, which is now a very popular means of producing reproducible computing environments across different devices\, platforms\, and operating systems. 
On Day 5\, we cover build automation tools\, particularly GNU Make and Drake\, which are used for automatically running complex analysis code that involves multiple inter-dependencies between files. GNU Make is a general-purpose build automation tool\, while Drake is specifically designed for complex data analysis pipelines in R. On each day\, therefore\, we aim to provide a comprehensive and thorough introduction to a set of valuable and generally useful computing tools\, each of which plays a key role in allowing us to do reproducible data science. \n\n\n\nIntended Audience\nThis course is relevant to anyone doing data science\, whether in industry or in academic research. \nVenue – Delivered remotely \nTime zone – UK (GMT) \nAvailability – 15 places \nDuration – 5 days \nContact hours – Approx. 28 hours \nECTS – Equal to 3 ECTS credits \nLanguage – English \nPLEASE READ – CANCELLATION POLICY: Cancellations are accepted up to 28 days before the course start date subject to a 25% cancellation fee. Cancellations later than this may be considered\, contact oliverhooker@psstatistics.com. Failure to attend will result in the full cost of the course being charged. In the unfortunate event that a course is cancelled due to unforeseen circumstances\, a full refund of the course fees (and accommodation fees if booked through PS Statistics) will be credited. \n\n\n\n \nDr. Mark Andrews\n\n\n\n\nTeaching Format\n\nThis course will be hands-on and workshop-based. Each day\, there may be some lecture-style presentation\, i.e.\, using slides\, introducing and explaining key concepts. However\, this will be minimal\, and our focus on each day will be the practical mastery of the computing tools we cover. \nAssumed quantitative knowledge \nThough we assume all participants will be experienced with some methods of statistical data analysis\, no knowledge of any specific topic is required or assumed. 
\nAssumed computer background \nWe will only assume a minimal familiarity with R and RStudio. More extensive R experience is desirable but not essential. No experience whatsoever with RMarkdown\, Git\, R package development\, Docker\, Make or Drake will be assumed. \nEquipment and software requirements \nA laptop with all the required software\, i.e. R/RStudio\, RMarkdown\, Git\, etc.\, installed is necessary. All this software is free and open source and available on Windows\, MacOS\, and Linux. Instructions on how to install this software on each of the platforms will be distributed in advance of the workshop\, and in most cases it can also be installed within minutes during the workshop itself. \nUNSURE ABOUT SUITABILITY? THEN PLEASE ASK oliverhooker@psstatistics.com \n\n\n\nCourse Programme\n\nMonday 29th – Classes from 09:30 to 17:30 \n• Topic 1: Doing reproducible data science. We begin by providing an overview of reproducible data analysis generally and of this course in particular. We’ll address why reproducible data analysis is valuable and survey the wide range of tools available for accomplishing it. We’ll explain that reproducible data analysis is sometimes motivated in terms of open science\, which is committed to doing research where the data\, analysis code\, and results are made fully open and transparent to others. However\, reproducible data analysis can also be motivated simply as a means of doing higher-quality\, more trustworthy\, and more robust data analysis\, even when that analysis is of a confidential nature. Here\, we will also introduce the central concept of a research compendium\, which is a bundling of the data\, analysis code\, and dynamic document files that produce the final reports of the analysis. We will then survey the wide range of tools for creating\, maintaining\, and distributing research compendia that we will cover in the remainder of the course. \n• Topic 2: RMarkdown. 
RMarkdown is a file format that contains a mixture of R code and text\, and from which we can produce data analysis reports (or slides\, web pages\, etc.). The report is produced by automatically executing all the analysis code in the RMarkdown file and inserting the results\, such as tables\, figures\, etc.\, along with the text into the final pdf\, html\, or MS Word output document. While the basics of RMarkdown can be learned quickly\, our aim here is to provide a thorough and comprehensive introduction to RMarkdown so as to get the most out of it. This will include covering markdown syntax; mathematical typesetting with LaTeX; bibliography and citation management; cross-references; formatting tables; controlling the placement of figures; scientific diagrams with TikZ; using alternative document templates; and creating new customized templates. We will primarily focus on creating articles as the output format\, but will also cover creating web pages and slides. \nTuesday 30th – Classes from 09:30 to 17:30 \n• Topic 3: Git & GitHub. The next major tool that we will cover is Git. Git is version control software\, and version control software generally is vital for the organization and development of a set of source code files\, especially when working collaboratively. We will argue that all the source code files in a data analysis project\, including RMarkdown files\, should be under version control from the beginning of the project. Using Git for this is an obvious choice because Git is powerful\, open source\, and now the most popular and most widely used version control system worldwide. In addition\, GitHub is an excellent\, free-to-use\, and popular hosting site for Git repositories. 
Here\, we will cover initializing Git repositories and cloning existing ones; staging and committing new or modified files to the repository; writing commit messages; pushing and pulling to and from remote repositories; checking out previous versions of the repository; resetting or reverting to a previous state of the repository (i.e. undoing); and branching\, merging\, and rebasing. The last of these topics\, i.e. branching etc.\, describes some especially powerful features of Git\, ones that are vital for long-term and complex projects\, especially those involving multiple collaborators. \nWednesday 1st – Classes from 09:30 to 17:30 \n• Topic 4: R packages. R packages are the means by which add-on or contributed R code is distributed\, usually through CRAN and GitHub. In addition to being a major general tool for R users\, packages can also be used specifically for developing and distributing research compendia. In this section of the course\, we will provide a thorough introduction to developing R packages that can then be pushed to GitHub to be installed by others. We will cover all the major aspects of an R package: writing reusable functions; writing documentation for our code using roxygen2; writing tests to ensure that our code is working as expected; adding data files\, including their documentation; writing the DESCRIPTION file\, where we provide all the information about what our package does\, how to use it\, what package dependencies it has\, who the authors are\, etc.; writing vignettes\, which are long-form documentation or tutorials; uploading to GitHub for distribution; and using pkgdown to create a website for the package or compendium. This section of the course is intended to provide a comprehensive introduction to developing R packages generally\, and research compendia in particular. 
For the latter topic\, we will follow the general guidelines outlined in Marwick et al. (2018)\, “Packaging Data Analytical Work Reproducibly Using R (and Friends)”. \nThursday 2nd – Classes from 09:30 to 17:30 \n• Topic 5: Docker. Docker is a powerful and now widely used “containerization” software that can be used to create reproducible environments and software stacks. This allows users to run software identically across different devices\, platforms\, and operating systems without installing any software other than Docker itself\, which is open source and cross-platform. Thus\, Docker allows us to write our code and perform our analyses on one machine in a container using a specific stack of software. We may then create a specification of this container that others can download and which allows them to recreate the same environment with the same software stack on their devices. They can then run our code identically\, using identical versions of all the software\, including R packages and lower-level code libraries. Distributing our research compendium to run in a Docker container is the ultimate standard of reproducibility short of using identical hardware devices. In this section of the course\, we will learn how to pull general Docker images from a Docker repository and run containers based on them. We will then focus on an R-based Docker image\, namely rocker\, that will allow us to run an R/RStudio based container. We will then extend this rocker image to create a customized R/RStudio environment with all the packages that we require to run our compendium. We will create a Dockerfile specification of this compendium that we can distribute online\, and that will allow others to download and recreate our environment exactly. Finally\, we will distribute this Docker-based environment via the Binder website\, which runs the container on a server so that it can then be used interactively through an RStudio server session running in the container. 
As part of this coverage\, we will also cover the packrat and checkpoint R packages\, which can be used for version control of R package dependencies. \nFriday 3rd – Classes from 09:30 to 16:00 \n• Topic 6: Build automation with Make and Drake: Executing simple analyses may be as simple as running a short R script or RMarkdown file. On the other hand\, complex analyses may involve dozens of scripts\, each pertaining to a particular part of the analysis pipeline\, there may be complex inter-dependencies between files\, and the entire pipeline may take hours or even days to complete. Tools such as GNU Make and Drake allow us to run our entire analysis pipeline using a single shell or R command. More importantly\, these tools identify the inter-dependencies in the code base and so allow us to run only those parts of the pipeline that are affected after any change is made. GNU Make is a generally useful tool for any software development\, and can be used for many analysis-related tasks\, especially those that involve code in multiple different languages. Drake\, on the other hand\, was specifically designed for R-based workflows\, particularly those that involve high-performance and distributed computing. In this final section of the course\, therefore\, we will explore how to use Make and Drake to automate analysis workflows. To do so\, we will use some relatively simple but otherwise typical data analysis projects\, involving data cleaning and model fitting\, followed by report generation. Here\, we will also deal with parallel and distributed computing workflows and how these may be automated by Make and Drake. \n\n\n\n
URL:https://www.psstatistics.com/course/reproducible-data-science-and-r-package-design-rdrp01/
LOCATION:53 Morrison Street\, Glasgow\, Scotland\, G5 8LB\, United Kingdom
GEO:55.8535874;-4.267977
X-APPLE-STRUCTURED-LOCATION;VALUE=URI;X-ADDRESS=53 Morrison Street Glasgow Scotland G5 8LB United Kingdom;X-APPLE-RADIUS=500;X-TITLE=53 Morrison Street:geo:55.8535874,-4.267977
ATTACH;FMTTYPE=image/jpeg:https://www.psstatistics.com/wp-content/uploads/2019/12/RPKG01.jpg
END:VEVENT
END:VCALENDAR