1 Introduction

Good Software Engineering Practice for R Packages

Daniel Sjoberg, Laura Harris, Matt Secrest

July 20, 2023

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of Roche.

Daniel

  • Senior Principal Data Scientist at Genentech.
  • Previously, Senior Biostatistician at Memorial Sloan Kettering Cancer Center in New York City.
  • Research interests include adaptive methods in clinical trials, precision medicine, and predictive modeling.
  • Winner of the 2021 American Statistical Association (ASA) Innovation in Statistical Programming and Analytics award.

Matt Secrest

  • Senior Data Scientist at Genentech
  • Masters in Epidemiology from McGill University
  • RWD epidemiologist for 7 years, including 3 in RWD product development
  • Developer of the open-source psborrow2 R package for Bayesian dynamic borrowing analyses as well as internal Roche packages
  • Feel free to connect on LinkedIn or GitHub

Laura

  • Masters in Public Health, Biostatistics
  • Biostatistician at Denali for 2 years; Head of Data Science & Statistical Programming R team at Denali for 4 years; 12 years in Statistical Programming, OMNI Biomarker Development, and Clinical Pharmacology Modeling and Simulation at Genentech; Biostatistician at a Pharma CRO and at UCSF.
  • Development of data simulation packages and Shiny analysis applications at Denali
  • Feel free to connect on LinkedIn or GitHub

What you will learn today

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply fundamentals of quality control for R packages
  • Get a crash course in version control to stay organized
  • Try out modern collaboration techniques on GitHub.com
  • Learn how to make an R package available to others
  • Get a starting point for sustainable Shiny app development

Program outline

09:00 - 09:30 Introduction
09:30 - 10:30 R Packages
10:30 - 10:45 Coffee Break
10:45 - 11:45 An R Package Engineering Workflow
11:45 - 12:45 Lunch Break
12:45 - 13:45 Ensuring Quality
13:45 - 14:45 Version Control & Collaboration
14:45 - 15:00 Coffee Break
15:00 - 15:45 Publication
15:45 - 16:30 Shiny Development
16:30 - 17:00 Conclusion

House-keeping

Enter Gitter Chat Channel

What you will need

  • GitHub.com (free) account
  • Recommended: posit.cloud
    • Free tier sufficient
    • Comes with everything installed
    • Alternative: local R development environment with
      • git
      • Rtools/R/RStudio IDE
  • Curiosity 🦝
  • Positive attitude 😄

What do we mean by GSWEP4R*?

  • Applying the concept of GxP to software engineering (SWE) with R
  • Improve quality of R code/packages, particularly (but not only) in regulated environments
  • Not a fixed term, we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from IT/open source space

Why care about GSWEP4R?

  • Move to / integration of R in pharma is a clear trend
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
    \(\leadsto\) line between programming and data analysis blurs
  • Value: de-risking use of R and efficiency gains

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Version your code
  7. Share as ‘bundle’

\(\leadsto\) R package
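
As a rough illustration of steps 1, 2, 4 and 5 above, the sketch below contrasts a script-style helper that reads a global variable with a self-contained, documented function and a small base-R test. The function names `mean_ci_script` and `mean_ci`, the argument `conf_level`, and the example data are hypothetical and chosen only for this example. The roxygen-style comments assume the package would use roxygen2 for documentation.

```r
# Script style: the result silently depends on a global variable
# (step 2 advises against this).
conf_level <- 0.95
mean_ci_script <- function(x) {
  stats::t.test(x, conf.level = conf_level)$conf.int
}

# Package style: all inputs are explicit arguments (steps 1 and 2),
# and the function is documented (step 4).

#' Confidence interval for a mean
#'
#' @param x Numeric vector with at least two observations.
#' @param conf_level Confidence level, a single number in (0, 1).
#' @return Named numeric vector with elements `lower` and `upper`.
mean_ci <- function(x, conf_level = 0.95) {
  stopifnot(is.numeric(x), length(x) >= 2)
  stopifnot(
    is.numeric(conf_level), length(conf_level) == 1,
    conf_level > 0, conf_level < 1
  )
  ci <- stats::t.test(x, conf.level = conf_level)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# A minimal test case (step 5), written with base R only.
x <- c(4.1, 5.2, 6.3, 5.8, 4.9)
ci95 <- mean_ci(x)
ci99 <- mean_ci(x, conf_level = 0.99)

# The interval contains the sample mean ...
stopifnot(ci95[["lower"]] < mean(x), mean(x) < ci95[["upper"]])
# ... and a higher confidence level gives a wider interval.
stopifnot(
  ci99[["upper"]] - ci99[["lower"]] > ci95[["upper"]] - ci95[["lower"]]
)
```

In a package, the function would typically live in a file under `R/` and the checks in a test file, so that they run automatically whenever the package is checked.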

The R package ecosystem - huge success

GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and others
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, Comments?

License information