Keynote and invited speakers
Keynote Speakers
Sharon Machlis
Sharon Machlis is Director of Editorial Data & Analytics at IDG Communications, a global technology media company that publishes Computerworld, CIO, PCWorld, and Macworld, among other brands. In her current role, she hosts InfoWorld’s “Do More With R” video series, writes R how-to articles, helps colleagues with data journalism projects, codes tools for colleagues around the world, and analyzes editorial web metrics. Sharon is the author of “Practical R for Mass Communication & Journalism” and the Computerworld Beginner’s Guide to R. She has also written a number of R packages, including SunsetTSA.
Kelly O'Briant
Kelly O’Briant is a Solutions Engineer at RStudio interested in configuration and workflow management with a passion for R administration. She has a background in data science, software engineering and cloud computing, with degrees in bioinformatics and computational sciences. In 2016, Kelly founded the Washington DC chapter of R-Ladies, a global organization dedicated to increasing gender diversity in the R community.
Tomas Kalibera
Tomas Kalibera is a researcher at the Czech Technical University. As an active R Core developer, Tomas has been contributing improvements and bug fixes to multiple parts of the R runtime, including the byte-code compiler and interpreter, source reference handling, Windows-specific code, package installation, memory management, object dispatch, and function invocation. He implemented and maintains rchk, a tool for finding PROTECT bugs, which he uses regularly to check CRAN packages. He has been helping the CRAN team with checking and debugging packages, and is a member of the R Foundation.
Francesco Bartolucci
Francesco Bartolucci is a Professor of Statistics in the Department of Economics of the University of Perugia, where he also coordinated the Doctorate program in “Mathematical and Statistical Methods for the Economic and Social Sciences”. His main research interests are longitudinal and panel data, latent variable and mixture models, marginal models for categorical data, and optimization and Markov chain Monte Carlo algorithms. He co-authored the R packages LMest, MultiCIRT, MLCIRTwithin, cquad and extRC.
Jared Lander
Jared Lander is Chief Data Scientist at Lander Analytics, and Adjunct Professor at Columbia Business School, where he teaches the Introduction to Programming in R course. He is the organizer of the New York Open Statistical Programming Meetup and a Series Editor for Pearson. He is the author of the R packages coefplot, useful, resumer and RepoGenerator, as well as the book “R for Everyone: Advanced Analytics and Graphics”. His work with the Minnesota Vikings in the 2015 NFL Draft has been featured in the Wall Street Journal.
Stephanie Hicks
Stephanie Hicks is an Assistant Professor in the Department of Biostatistics at Johns Hopkins Bloomberg School of Public Health. She is also a faculty member of the Johns Hopkins Data Science Lab, co-host of The Corresponding Author podcast discussing data science in academia, and co-founder of R-Ladies Baltimore. Her research interests are at the intersection of statistics, genomics, and data science. She actively contributes software packages to Bioconductor and is involved in teaching courses for data science and the analysis of genomics data.
Invited Speakers
brquasi: Improved quasi-likelihood estimation
Reduction of bias in the estimation of generalized linear models has seen extensive applied usage. The main reasons for this are the availability of comprehensive R packages for its application (e.g. brglm2; https://cran.r-project.org/package=brglm2) and the desirable side-effects that bias reduction has in models for categorical responses (see, e.g. http://arxiv.org/abs/1812.01938). Nevertheless, a significant limitation of the methods behind software like brglm2 is that they rely on the full specification of the model, and do not apply, for example, to the modelling of overdispersed data using quasi-likelihoods.
In this talk, we introduce the brquasi R package that allows users, for the first time, to reduce bias in the quasi-likelihood estimation of regression models without the need to resort to resampling (as methods such as the jackknife and bootstrap typically require). The only requirement is one extra smoothness assumption, which holds for the widely used quasi-likelihood models implemented in R.
We illustrate the properties of the new estimators, and the inference and prediction capabilities in brquasi, using examples from the modelling of overdispersed data and relative risk regression.
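For readers new to bias reduction in fully specified GLMs, here is a minimal sketch using brglm2, the package cited above (the data are toy values; brquasi’s own interface is not shown here):

    # install.packages("brglm2")
    library(brglm2)

    # toy binary-response data, for illustration only
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * x))

    fit_ml <- glm(y ~ x, family = binomial())                     # maximum likelihood
    fit_br <- glm(y ~ x, family = binomial(), method = brglmFit)  # bias-reduced fit
    summary(fit_br)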
And you thought CRAN was harsh
Many R workflows revolve around packages and git. Typically, they use some form of continuous integration, such as Travis or GitLab CI. The general idea is that R developers are notified if a commit causes the package to fail some checks. This talk will describe the additional rigorous steps that we apply to our checks via the inteRgrate package. Using this package allows us to standardise code style, catch errors more quickly, and produce more readable commits.
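As a hedged illustration of how such checks might be invoked in a CI script (the function names below are assumptions drawn from the inteRgrate documentation; verify them against the current README):

    # remotes::install_github("jumpingrivers/inteRgrate")
    library(inteRgrate)

    check_pkg()              # assumed: runs R CMD check with stricter defaults
    check_lintr()            # assumed: fails the build on lintr style violations
    check_tidy_description() # assumed: enforces a tidy DESCRIPTION file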
How can you write a good R package?
Read a starter guide, the “R packages” book, “Writing R Extensions”, and then?
This talk will feature ways to take your R package, and your R development skills, to the next level!
We shall present tools that equip you with helpful flags and metrics, from R CMD check locally and on the cloud, to lintr::lint_package() and covr::package_coverage().
We shall also mention tools that automagically improve your package or its docs: the styler and pkgdown packages.
Furthermore, we shall explain how rOpenSci Software Peer Review helps package authors receive human and humane feedback.
Finally, we shall advocate for learning from others’ experience, be it by directly reading code, or by reading blogs such as the R-hub blog and developers’ forums such as the R-package-devel mailing list.
We don’t aim to go through a catalogue: we hope you’ll come away from this talk with one or a few life-changing habits for your daily R development work.
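As a minimal sketch of the flags and metrics mentioned above, run from the root of a package source tree (output and thresholds will vary by package):

    library(lintr)
    library(covr)
    library(styler)

    lintr::lint_package()            # static analysis: style issues and common errors
    cov <- covr::package_coverage()  # measure test coverage across R/ and tests/
    covr::report(cov)                # browse an interactive coverage report
    styler::style_pkg()              # restyle the package to the tidyverse style guide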
Building Agile data products leveraging the R package structure
Building a complex data science product can span many months or even years. This poses at least two risks to the successful completion of the project. The data scientist might work too long in isolation, losing touch with stakeholders and the intended users of the product. And as the product grows in complexity, reproducibility may deteriorate.
During the first half of the talk we will discuss the benefits of the Agile approach to data science. What are the pros of early deployment and continuous delivery? How can one build a Minimum Viable Product? How to deal with the uncertainty in data science? This part will be generic to data science rather than specific to R users. In the second part, however, it will be apparent that the R package structure is ideal for an Agile approach. We can clearly separate the product code (R folder) from the exploration code (Vignettes folder). All the best practices for high-quality software development will be enforced upon the product by using R packages instead of individual scripts. Meanwhile, hypothesis testing can be done quickly without polluting the product. Finally, I propose to create a data pipeline from the product code, using drake.
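A minimal sketch of such a pipeline, assuming hypothetical package functions clean_data() and fit_model() living in the R folder:

    library(drake)

    plan <- drake_plan(
      raw     = read.csv(file_in("data/raw.csv")),   # hypothetical input file
      cleaned = clean_data(raw),                     # product code from the R folder
      model   = fit_model(cleaned),
      report  = rmarkdown::render(
        knitr_in("vignettes/report.Rmd"),
        output_file = file_out("report.html")
      )
    )

    make(plan)  # rebuilds only the targets whose dependencies have changed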
Better than Deep Learning: Gradient Boosting Machines (GBM) – with 2020 updates
With all the hype about deep learning and “AI”, it is not well publicized that for structured/tabular data widely encountered in business applications it is actually another machine learning algorithm, the gradient boosting machine (GBM), that most often achieves the highest accuracy in supervised learning/prediction tasks. In this talk we’ll review some of the main open source GBM implementations such as xgboost, h2o, lightgbm, catboost, Spark MLlib (all of them available from R) and we’ll discuss some of their main features and characteristics (such as training speed, memory footprint, scalability to multiple CPU cores and in a distributed setting, GPU implementations, prediction speed etc). If you have seen an earlier version of my talk with the same title (for example at eRum 2018, or the video recording from several other conferences/meetups), this talk will have plenty of updates that will make it worth hearing (for example more details on the GPU implementations, new results on catboost, or exciting updates on Spark MLlib).
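As a minimal, illustrative sketch of one of the implementations discussed, a GBM fit with xgboost on its bundled example data (the parameters are arbitrary defaults, not tuned values):

    library(xgboost)

    data(agaricus.train, package = "xgboost")
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

    bst <- xgb.train(
      params  = list(objective = "binary:logistic", max_depth = 6, eta = 0.1),
      data    = dtrain,
      nrounds = 100
    )

    pred <- predict(bst, agaricus.train$data)  # predicted class probabilities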
Multi-Omics Factor Analysis Plus: A probabilistic framework for comprehensive and scalable integration of multi-modal data
Technological advances have led to a growing number of studies that profile multiple molecular layers simultaneously on large cohorts of samples (e.g., patient tissues) and, more recently, at single cell resolution. While this promises a more comprehensive understanding, computational strategies for an integrative analysis are essential to obtain valuable insights into the underlying biological processes. Furthermore, such methods need to handle complex experimental designs that include multiple data modalities and multiple groups of samples as well as scale to the growing dimensions of the data. We present Multi-Omics Factor Analysis (MOFA) and its extension MOFA+ that provide a statistical framework for the comprehensive and scalable integration of such multi-modal data. MOFA+ reconstructs a low-dimensional representation of the data using a fast stochastic variational inference scheme and employs flexible sparsity constraints that enable joint modelling of variation across multiple sample groups and data modalities. It enables decomposing variation into factors present in all, some, or single data modalities or sample groups and promotes interpretable factors that can directly be linked to molecular drivers. Once learnt, the factors enable a variety of downstream analyses, including the inference of differentiation trajectories, denoising, feature selection or the identification of sample subgroups.
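A hedged sketch of the typical MOFA+ workflow via the MOFA2 Bioconductor package, assuming ‘data’ is a list of feature-by-sample matrices, one per modality (the options left implicit here are defaults, not recommendations):

    library(MOFA2)

    mofa <- create_mofa(data)    # 'data': list of feature x sample matrices
    mofa <- prepare_mofa(mofa)   # set data, model and training options
    mofa <- run_mofa(mofa)       # fit via stochastic variational inference

    factors <- get_factors(mofa) # low-dimensional factors for downstream analyses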
Testing Shiny: What, why, and how.
It’s now a common saying that in software engineering, everything that can be tested should be tested. And R code is no exception: the more you test, the more you’re likely to catch errors at an early stage of your project. And there are a lot of tools available out there to do exactly that: test your R code to protect your project against bugs.
But Shiny is a little bit different: some parts of the app need a Shiny runtime, some parts are pure back-end functions, and others are pure web elements. So, how can we test Shiny efficiently? What do we test, and why?
In this talk, Colin will cover some of the lesser known tools that Shiny developers can integrate in their workflow.
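As one minimal example of the kind of tooling involved, shiny::testServer() (available since shiny 1.5.0) lets you drive a server function from a test; the tiny server below is hypothetical:

    library(shiny)
    library(testthat)

    server <- function(input, output, session) {
      doubled <- reactive(input$x * 2)
      output$out <- renderText(doubled())
    }

    testServer(server, {
      session$setInputs(x = 21)    # simulate user input
      expect_equal(doubled(), 42)  # assert on reactive values directly
    })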
From writing code to infoRming policy: a case study of reproducible research in transport planning
R provides unparalleled support for reproducible research. Its command-line interface and scriptable nature are revolutionary for people who previously relied on explaining a long series of steps in a graphical user interface to enable others to reproduce their work. Furthermore, R has many tools to enable the efficient replication of results in everything ranging from minimal examples (e.g. via the function dput() and the package reprex) to large projects (e.g. via Makefiles and workflow management packages such as drake).
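A minimal sketch of the two helpers named above:

    # dput() serialises an object as runnable R code that others can paste:
    dput(head(mtcars, 2))

    # reprex() renders a small, self-contained example (code plus output)
    # ready to share on forums or issue trackers:
    reprex::reprex({
      x <- c(1, 5, 9)
      mean(x)
    })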
Reproducibility (and its corollary, falsifiability) has been recognised as a cornerstone of science since the time of Karl Popper, but few have considered the implications for policy. This presentation will outline ways in which research design decisions can maximise the chances of informing evidence-based policies. This includes the choice of software and the way in which code underlying research is written, maintained and disseminated. Case studies from my work on the Propensity to Cycle Tool (the results of which are freely available at www.pct.bike), which has informed government transport policies, and the stats19 package for accessing road traffic casualty data will illustrate these points. The talk will conclude with concrete steps that everyone can take to maximise the reproducibility of not only their code but also the key results of research, to encourage scientific debate and evidence-based decisions.
dplyr 1.0.0
dplyr, the data manipulation package of the tidyverse, recently reached the 1.0.0 milestone. This version brings new functions, enhances error messages throughout the package, consolidates the foundations of the package thanks to vctrs, embraces selection capabilities thanks to tidyselect, makes the interface nicer with across(), and is blessed with an amazing new logo. Several blog posts before and after the release have covered the changes. In this talk, we’ll focus on summarise(), one of the main dplyr functions, which has received interesting new superpowers.
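A minimal sketch of those new summarise() capabilities, using across() and summaries that return more than one value per group:

    library(dplyr)

    mtcars %>%
      group_by(cyl) %>%
      summarise(
        across(c(mpg, hp), mean),  # summarise several columns at once
        rng = range(disp)          # summaries may now return multiple rows
      )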
CRANalerts: A Shinyapp-as-a-Service for Impatient R Users
You’re super excited for your favorite package to get updated soon. It has to be soon, you’ve been waiting so long for those cool new features! You keep checking CRAN and GitHub and Twitter every few days to see news about the new release, but you’re always afraid you may miss it. There’s got to be a better way… CRANalerts to the rescue! In this talk I’ll discuss my experience building a Shiny app to solve this problem, along with the issues that come with it.
Invited CovidR Winners
COVID-19 Data Hub
Built with R, available in any language, the Data Hub provides a worldwide, fine-grained, unified dataset helpful for a better understanding of COVID-19. The user can instantly download up-to-date, structured, historical daily data across several official sources. The data are crunched hourly by the R package COVID19 and made available in CSV format in cloud storage, so as to be easily accessible from Excel, R, Python… and any other software. We welcome external contributors to join and extend the number of supported data sources. All sources are properly documented, along with their citation. COVID-19 Data Hub can spot misalignments between data sources and automatically inform authorities of possible errors. All logs are available at https://covid19datahub.io. The package, by Emanuele Guidotti and David Ardia, is available on CRAN.
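A minimal sketch of pulling the data from R (the column selection shown is illustrative):

    # install.packages("COVID19")
    library(COVID19)

    # country-level (level = 1) historical daily data; level = 2 and
    # level = 3 give state- and city-level granularity where supported
    x <- covid19(country = "Italy", level = 1)
    head(x[, c("date", "confirmed", "deaths", "recovered")])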
COVOID: Modelling COVID-19 Transmission and Interventions
COVOID is an evolving but fully functional R package and accompanying Shiny app for simulation modelling of both COVID-19 transmission and the interventions intended to reduce that transmission, using deterministic compartmental models (DCMs). The package contains an expanding API for simulating and estimating homogeneous and age-structured SIR, SEIR and extended models. In particular, COVOID allows the simultaneous simulation of setting-specific (e.g. school closures) and general interventions over varying time intervals. This is informed through the incorporation of publicly available data on population demographics from the United Nations, age- and setting-specific contact rates from previously published research, and COVID-19 incidence counts from the European CDC. The built-in Shiny app enables ease of use and demonstration of key concepts to those without R programming backgrounds. Coauthors: Oisin Fitzgerald and Tim Churches.
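COVOID’s own API is not reproduced here; as a generic illustration of the deterministic compartmental models it builds on, a minimal SIR model with deSolve:

    library(deSolve)

    sir <- function(t, state, pars) {
      with(as.list(c(state, pars)), {
        dS <- -beta * S * I / N
        dI <-  beta * S * I / N - gamma * I
        dR <-  gamma * I
        list(c(dS, dI, dR))
      })
    }

    out <- ode(
      y     = c(S = 999, I = 1, R = 0),            # initial compartment sizes
      times = seq(0, 180, by = 1),                 # days
      func  = sir,
      parms = c(beta = 0.3, gamma = 0.1, N = 1000) # illustrative rates
    )
    head(out)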