1 Introduction
openstatsware
Workshop: Good Software Engineering Practice for R Packages
October 16, 2023
Disclaimer
Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of their employers.
Doug
Bioengineering Ph.D. from UC Berkeley (2017)
Brief stint as a Data Scientist in the shipping logistics industry
Working as a Data Scientist at Roche
Woefully bad at wet-lab biology, turned to computers in shame
Turns out good software is a big bottleneck for statisticians & scientists
Find me on @dgkf and
Phil
Recently completed a Ph.D. in biostatistics at UC Berkeley
Former intern at Genentech and Roche (supervised by Daniel, among others)
Associate at the Analysis Group in Montreal
Authored and contributed to a variety of R packages available on Bioconductor, CRAN and GitHub
Feel free to connect
Daniel
Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 3 years
Multiple R packages on CRAN and Bioconductor, co-wrote book on Likelihood and Bayesian Inference, chair of openstatsware
Feel free to connect
openstatsware
Since: 19 August 2022 (just celebrated our first birthday!)
Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP)
Who: Currently 38 statisticians from 28 organizations
Old name: ASA BIOP Software Engineering Working Group (SWE WG)
What: Engineer packages and spread best practices
What you will learn here
Understand the basic structure of an R package
Create your own R package
Learn about & apply a professional development workflow
Learn & apply the fundamentals of quality control for R packages
Get a crash course in version control to stay organized
Try out modern collaboration techniques on GitHub.com
Learn how to make an R package available to others
Get a starting point for sustainable Shiny app development
Program outline: Day 1
13:00 - 13:30  Introduction and outline
13:30 - 14:30  R Package Syntax
14:30 - 15:00  Break
15:00 - 16:00  Software Engineering Workflow
16:00 - 16:55  Package Quality
Program outline: Day 2
13:00 - 14:00  Collaboration via GitHub
14:00 - 14:45  Publication of R Packages
14:45 - 15:15  Break
15:15 - 16:15  Shiny Development
16:15 - 16:30  Summary
What you will need
GitHub.com (free) account
Recommended: posit.cloud
Free tier sufficient
Comes with everything installed
Alternative: local R development environment with
Curiosity 🦝
Positive attitude 😄
What do we mean by GSWEP4R*?
Applying the concept of GxP (good practice guidelines) to software engineering (SWE) with R
Improve the quality of R code and packages, particularly (but not only) in regulated environments
Not a universal standard; we share our perspectives
Collection of best practices
Do not reinvent the wheel: learn from the IT and open-source space
Why care about GSWEP4R?
The move to, and integration of, R in pharma is a clear trend
R is a powerful yet complex ecosystem
Core component: R packages
Mature analysts: users & contributors
Deep understanding crucial, even to just assess quality
Analyses increasingly require complex scripts/programs \(\leadsto\) line between programming and data analysis blurs
Value: de-risking use of R and efficiency gains
Start small - from script to package
Encapsulate behavior (functions)
Avoid global state/variables
Adopt consistent coding style
Document well
Add test cases
Version your code
Share as ‘bundle’
\(\leadsto\) R package
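As a rough illustration of the first steps from script to package (encapsulate behavior, avoid global state, document, test), here is a minimal R sketch. The functions `flag_script()` and `is_significant()` and the argument `alpha` are hypothetical examples, not part of any package. The roxygen2-style `#'` comments would become the help page once the function lives in a package, and the final checks would usually move into a testthat file under `tests/testthat/`.

```r
# Script style: the result depends on a global variable, so reassigning
# `threshold` anywhere in the session silently changes the behavior.
threshold <- 0.05
flag_script <- function(p) p < threshold

# Package style: the same logic as a self-contained, documented function.
# (Hypothetical example function, documented with roxygen2-style comments.)

#' Flag significant p-values
#'
#' @param p Numeric vector of p-values.
#' @param alpha Single numeric significance level.
#' @return Logical vector, `TRUE` where `p < alpha`.
#' @export
is_significant <- function(p, alpha = 0.05) {
  # Fail early on invalid input instead of returning something misleading
  stopifnot(is.numeric(p), is.numeric(alpha), length(alpha) == 1)
  p < alpha
}

# Minimal checks; in a package these would live in tests/testthat/
stopifnot(identical(is_significant(c(0.01, 0.2)), c(TRUE, FALSE)))
stopifnot(identical(is_significant(0.03, alpha = 0.01), FALSE))
```

Passing `alpha` explicitly keeps the function's output fully determined by its inputs, which is what makes it straightforward to test, version and share as part of a package.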
The R package ecosystem - huge success
GxP + R =
Core infrastructure packages only through industry
Quality, burden sharing: open-source pharmaverse and others
Open methodological packages can de-risk innovative methods
R packages make (statistical/methodological) code
testable (with documented evidence thereof, cf. 21 CFR Part 11)
reusable
shareable
easier to document