1 Introduction

openstatsware Workshop: Good Software Engineering Practice for R Packages

Doug, Phil, Daniel

October 16, 2023

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenters and not necessarily those of their employers.

Doug

  • Bioengineering Ph.D. from UC Berkeley (2017)
  • Brief stint as a Data Scientist in the shipping logistics industry
  • Working as a Data Scientist at Roche
  • Woefully bad at wet-lab biology, turned to computers in shame
  • Turns out good software is a big bottleneck for statisticians & scientists
  • Find me on @dgkf and

Phil

  • Recently completed a Ph.D. in biostatistics at UC Berkeley
  • Former intern at Genentech and Roche (supervised by Daniel, among others)
  • Associate at the Analysis Group in Montreal
  • Authored and contributed to a variety of R packages available on Bioconductor, CRAN and GitHub
  • Feel free to connect

Daniel

  • Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 3 years
  • Multiple R packages on CRAN and Bioconductor, co-wrote the book Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

openstatsware

  • Since: 19 August 2022, just celebrated our first birthday!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP)
  • Who: Currently 38 statisticians from 28 organizations
  • Old name: ASA BIOP Software Engineering Working Group (SWE WG)
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply the fundamentals of quality control for R packages
  • Get a crash course in version control to stay organized
  • Try out modern collaboration techniques on GitHub.com
  • Learn how to make an R package available to others
  • Get a starting point for sustainable Shiny app development

Program outline: Day 1

13:00 - 13:30 Introduction and outline
13:30 - 14:30 R Package Syntax
14:30 - 15:00 Break
15:00 - 16:00 Software Engineering Workflow
16:00 - 16:55 Package Quality

Program outline: Day 2

13:00 - 14:00 Collaboration via GitHub
14:00 - 14:45 Publication of R Packages
14:45 - 15:15 Break
15:15 - 16:15 Shiny Development
16:15 - 16:30 Summary

House-keeping

What you will need

  • GitHub.com (free) account
  • Recommended: posit.cloud
    • Free tier sufficient
    • Comes with everything installed
    • Alternative: local R development environment with
      • git
      • R, RStudio IDE, and Rtools (on Windows)
  • Curiosity 🦝
  • Positive attitude 😄

Enter menti.com: 5224 0445

What do we mean by GSWEP4R*?

  • Applying the concept of GxP (good practice guidelines) to software engineering (SWE) with R
  • Improve the quality of R code and packages, particularly, but not only, in regulated environments
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the IT and open-source space

Why care about GSWEP4R?

  • The move to, and integration of, R in pharma is a clear trend
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding is crucial, even just to assess quality
  • Analyses increasingly require complex scripts/programs
    \(\leadsto\) line between programming and data analysis blurs
  • Value: de-risking use of R and efficiency gains

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Version your code
  7. Share as ‘bundle’

\(\leadsto\) R package
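The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the workshop's own material: the function `group_mean()`, its arguments, and the example data are hypothetical. It covers steps 1 to 5 (a function with explicit inputs instead of a global variable, consistent style, roxygen2-style documentation comments, and a simple test); steps 6 and 7 happen outside the code, via version control and the package structure.

```r
# Script style, shown only for contrast: the logic depends on a global `df`
#   df <- data.frame(group = c("A", "A", "B"), value = c(1, 3, 10))
#   mean(df$value[df$group == "A"], na.rm = TRUE)

# Steps 1-4: encapsulate the behavior in a function with explicit inputs
# (no global state), consistent style, and roxygen2-style documentation.
# `group_mean()` is a hypothetical example function.

#' Mean of a numeric column within one group
#'
#' @param data A data frame with a `group` column.
#' @param group_value The value of `group` to average over.
#' @param col Name of the numeric column to summarise.
#' @return A single numeric value.
#' @export
group_mean <- function(data, group_value, col = "value") {
  # Fail early with an informative error instead of silently returning NaN
  stopifnot(is.data.frame(data), all(c("group", col) %in% names(data)))
  in_group <- data$group == group_value
  mean(data[[col]][in_group], na.rm = TRUE)
}

# Step 5: a test case. In a package this would typically live in
# tests/testthat/ and use testthat expectations; base R is used here
# so the sketch runs without extra dependencies.
example_data <- data.frame(group = c("A", "A", "B"), value = c(1, 3, 10))
stopifnot(
  group_mean(example_data, "A") == 2,
  group_mean(example_data, "B") == 10
)
```

Inside a package, roxygen2 would turn the `#'` comments into help pages, and the `stopifnot()` checks would usually be replaced by testthat expectations.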

The R package ecosystem - huge success

GxP + R =

  • Core infrastructure packages can only come from industry
  • Quality, burden sharing: open-source pharmaverse and others
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information