1 Introduction

R/Pharma workshop: Good Software Engineering Practice for R Packages

Daniel Sabanés Bové and Joe Zhu

October 28, 2024

Disclaimer

Any opinions expressed in this presentation and on the following slides are solely those of the presenters and not necessarily those of their employers.

Daniel

  • Ph.D. in Statistics from the University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-wrote the book “Likelihood and Bayesian Inference”, chair of openstatsware
  • Feel free to connect on LinkedIn, GitHub or via RCONIS

Joe

  • Ph.D. in Statistics
  • Postdoc at the University of Oxford for 6 years, Data Scientist at Roche for the last 4 years, technical engineering lead for the NEST SME team, technical lead for auto-translation and slide automation initiatives at Roche.
  • Multiple open-source packages on GitHub and CRAN, see this page for details.
  • Feel free to connect on LinkedIn or GitHub

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - more than 2 years already!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 50 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about the professional development workflow
  • Learn and apply the fundamentals of quality control for R packages
  • Learn how to make an R package available to others

Program outline

Time (GMT+8)    Topic
12:00-12:15     Introduction and outline
12:15-12:55     R packages, what are they?
12:55-13:25     Workflow for creating R packages
13:25-13:40     Break
13:40-14:20     Package quality
14:20-14:50     Publication
14:50-15:00     Conclusion

Housekeeping

What you will need

  • Local R development environment with
    • git
    • R, the RStudio IDE and Rtools (Rtools on Windows only)
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄

Speed intros and what would you like to learn?

  • Please enter the following in the chat - in one sentence:
    • Your name 🐵
    • Your motivation for this workshop, or what you would like to learn 🧠

What do we mean by GSWEP4R (Good Software Engineering Practice for R)?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve the quality and longevity of R code and packages
  • Not a universal standard; we share our perspectives
  • A collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts are both users and contributors
    • A deep understanding is crucial, even just to assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as a ‘bundle’

\(\leadsto\) R package
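As a minimal sketch of the first steps above (encapsulate behavior, avoid global state, document, add tests), consider a hypothetical helper `filter_significant()`. The function name and the 0.05 default are illustrative assumptions, not part of the workshop material; the roxygen2-style `#'` comments only anticipate package documentation, and base R `stopifnot()` stands in for a proper testthat test.

```r
# Script style (hypothetical): the filter silently depends on a global `threshold`
# threshold <- 0.05
# significant <- p_values[p_values < threshold]

#' Keep p-values below a significance threshold
#'
#' Hypothetical helper, used only to illustrate steps 1, 2 and 4.
#'
#' @param p_values Numeric vector of p-values.
#' @param threshold Significance level, passed explicitly instead of being
#'   read from a global variable.
#' @return The elements of `p_values` strictly below `threshold`.
filter_significant <- function(p_values, threshold = 0.05) {
  # Fail early on unexpected input instead of returning nonsense
  stopifnot(is.numeric(p_values), length(threshold) == 1)
  p_values[p_values < threshold]
}

# Step 5 (sketch): simple checks in base R; inside a package these would
# typically live in tests/testthat/ as testthat expectations.
stopifnot(
  identical(filter_significant(c(0.01, 0.2, 0.04)), c(0.01, 0.04)),
  identical(filter_significant(c(0.01, 0.2), threshold = 0.5), c(0.01, 0.2))
)
```

Because the threshold is an argument rather than a global variable, the function behaves the same wherever it is called, which is what makes it testable and reusable once it moves into a package's `R/` directory.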

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information