2 R packages

R/Pharma workshop: Good Software Engineering Practice for R Packages

Daniel Sabanés Bové

October 28, 2024

Introduction

What You Know Already

  • Packages provide a mechanism for loading optional code, data, and documentation
  • A library is a directory into which packages are installed
  • install.packages() is used to install packages into the library
  • library() is used to load and attach packages from the library
    • “Attach” means that the package is put on your search path, so its exported objects can be used directly
  • Remember that package ≠ library
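The difference between loading and attaching can be seen with the base package tools. It ships with R but is not attached by default, which is the only assumption here. The paths printed by .libPaths() depend on your installation.

```r
# A library is a directory of installed packages; R may search several
print(.libPaths())

# install.packages("usethis") would install into .libPaths()[1]

# Load the tools namespace without attaching it
invisible(loadNamespace("tools"))
print("package:tools" %in% search())  # FALSE unless tools was attached earlier
print(tools::file_ext("report.pdf"))  # "::" reaches exported objects without attaching

# library() loads *and* attaches, putting tools on the search path
library(tools)
print("package:tools" %in% search())  # TRUE
print(file_ext("report.pdf"))         # exported objects now usable directly
```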

What We Want to Talk About Now

  • How to write, build, test, and check your own package 😊
  • How to do this in a methodical and sustainable way
  • Give tips and tricks based on practical experience

How Familiar Are You with R Packages?

Please enter in the chat:

  • 0 if you have never written an R package
  • 1 if you have written an R package but never submitted to CRAN
  • 2 if you have written an R package and submitted to CRAN

Contents of a Package

How is a Package Structured?

Package source = directory with files and subdirectories

  • Mandatory:
    • DESCRIPTION
    • NAMESPACE
    • R
    • man
  • Typically also includes:
    • data
    • inst
    • src
    • tests
    • vignettes
    • NEWS

How to Get Started Quickly

Once upon a time, developers would set up this structure manually 🥱

Nowadays, it’s super fast with:

  • usethis::create_package()
  • RStudio > File > New Project > New Directory > R Package
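A minimal sketch of the console route, assuming usethis is installed. The package name mypackage and the temporary location are placeholders. Note that create_package() expects the parent directory to exist already.

```r
# Create a throwaway parent directory so nothing lands in a real project
parent_dir <- tempfile("pkgs")
dir.create(parent_dir)
pkg_path <- file.path(parent_dir, "mypackage")

# open = FALSE: don't switch the session to the new project
usethis::create_package(pkg_path, open = FALSE)

# Inspect the generated skeleton (DESCRIPTION, NAMESPACE, R/, ignore files, ...)
print(list.files(pkg_path, all.files = TRUE, no.. = TRUE))
```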

DESCRIPTION File

  • Package: Choose the name of your package
    • Not unimportant!
    • Check CRAN to see if your name is available
  • Title: Add a Title for Your Package (Title Case)
  • Version: Start with a low version number, e.g. 0.1.0
    • Use Major.Minor.Patch syntax
  • Authors@R: Add authors and maintainer
  • Description: Like an abstract, including references
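For illustration, these fields might look as follows in a DESCRIPTION file. The package name mysum and the author are placeholders. The snippet writes the text to a temporary file and reads it back with base R's read.dcf(), which parses the same DCF format R uses for DESCRIPTION. Authors@R holds R code that evaluates to a person() object.

```r
desc_lines <- c(
  "Package: mysum",
  "Title: Sum Two Numbers in a Friendly Way",
  "Version: 0.1.0",
  "Authors@R:",
  "    person(\"Jane\", \"Doe\", email = \"jane.doe@example.com\",",
  "           role = c(\"aut\", \"cre\"))",
  "Description: Provides a function that sums two numbers. Written as an",
  "    exercise in R package development."
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() returns a one-row character matrix with one column per field
fields <- read.dcf(desc_file)
print(fields[, "Title"])

# Authors@R is R code: evaluating it yields a person object
authors <- eval(parse(text = fields[, "Authors@R"]))
print(format(authors))

# Versions compare component by component, not as plain strings
print(package_version(fields[, "Version"]) < "1.0.0")
```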

DESCRIPTION File (cont’d)

  • License: Important for open sourcing
    • Consider permissive licenses such as Apache and MIT
  • Depends:
    • Which R version users need to have at a minimum
    • Ideally don’t put any package here
    • Packages listed here are loaded and attached whenever users call library() on your package
  • Imports: Packages from which you import functions, methods, or classes
  • Suggests: Packages needed only for optional tasks, such as processing documentation (roxygen2), running examples and tests (testthat), or building vignettes
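A sketch of how the dependency fields might look, again parsed with base R's read.dcf(). The listed packages and the minimum R version are only examples. In practice these fields are usually maintained with helpers such as usethis::use_mit_license() or usethis::use_package("testthat", type = "Suggests") rather than edited by hand.

```r
desc_lines <- c(
  "License: MIT + file LICENSE",
  "Depends: R (>= 4.1.0)",
  "Imports: stats, utils",
  "Suggests: knitr, rmarkdown, roxygen2, testthat (>= 3.0.0)"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)
fields <- read.dcf(desc_file)

# Split a comma-separated dependency field into individual entries
split_deps <- function(x) trimws(strsplit(x, ",")[[1]])

imports  <- split_deps(fields[, "Imports"])
suggests <- split_deps(fields[, "Suggests"])
print(imports)
print(suggests)

# Does the running R satisfy the minimum version declared in Depends?
min_r <- sub("^R \\(>= ([0-9.]+)\\)$", "\\1", fields[, "Depends"])
print(getRversion() >= min_r)
```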

R Folder

  • Only contains R code files (recommended to use .R suffix)
    • Can create a file with usethis::use_r("filename")
  • Assigns R objects: mostly functions, but also constants, data sets, etc.
  • Should not have side effects at the top level, e.g. avoid calling library(), require(), or options()
  • If code in one file needs to be sourced before another, put the following at the top of the dependent file (roxygen2 then updates the Collate field of DESCRIPTION automatically):
#' @include dependency.R
NULL
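One way to avoid such side effects is sketched below. It refers to other packages with :: and makes any option change temporary with on.exit(). The function name print_rounded is hypothetical.

```r
# Avoid at the top level of files in R/: this code runs once when the package
# is installed, not when users call your functions, and touches global state.
# library(stats)
# options(digits = 3)

# Instead, make option changes local to a function call and restore them
print_rounded <- function(x, digits = 3) {
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)  # runs even if print() fails
  utils::capture.output(print(x))
}

print(print_rounded(pi))
```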

NAMESPACE File

  • Defines the namespace of the package, to work with R’s namespace management system
  • Namespace directives in this file allow you to specify:
    • Which objects are exported to users and other packages
    • Which are imported from other packages
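Both kinds of directive are usually generated from roxygen2 tags rather than written by hand. A sketch with a hypothetical function median_abs() that imports median() from stats and is itself exported:

```r
#' Median of Absolute Values
#'
#' @param x A numeric vector.
#' @return The median of `abs(x)`.
#' @importFrom stats median
#' @export
median_abs <- function(x) {
  median(abs(x))
}

# After documenting, roxygen2 would write these directives to NAMESPACE:
# export(median_abs)
# importFrom(stats,median)

print(median_abs(c(-3, 1, 2)))  # median of 3, 1, 2
```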

NAMESPACE File (cont’d)

  • Controls the order in which R looks up variables used in package code:
    1. Local (in the function body etc.)
    2. Package namespace
    3. Imports
    4. Base namespace
    5. Normal search() path
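This lookup order can be observed directly in base R by walking up the enclosing environments of a package function, here stats::sd. The "local" step is the evaluation frame created at each call, whose enclosure is the first environment below.

```r
# Walk up the enclosing environments of a function from the stats package
env <- environment(stats::sd)   # starts at the stats namespace
chain <- character()
for (i in 1:4) {
  chain <- c(chain, environmentName(env))
  env <- parent.env(env)
}
print(chain)
# "stats" "imports:stats" "base" "R_GlobalEnv"
```

After the global environment, lookup continues along the attached packages on the search() path.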

man Folder

  • Contains documentation files for the objects in the package in the .Rd format
    • The syntax is a bit similar to LaTeX
  • All user level objects should be documented
  • Internal objects don’t need to be documented — but I recommend it!
  • Once upon a time, developers would set up these .Rd files and the NAMESPACE manually 🥱
  • Fortunately, nowadays we have roxygen2! 🚀
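To get a feel for the LaTeX-like .Rd syntax, the sketch below writes a small hand-made .Rd file for the my_sum() example to a temporary location. It then parses the file with tools::parse_Rd() and renders it as plain text with tools::Rd2txt(), roughly as a console help page looks.

```r
rd_lines <- c(
  "\\name{my_sum}",
  "\\alias{my_sum}",
  "\\title{My Summation Function}",
  "\\usage{",
  "my_sum(x, y)",
  "}",
  "\\arguments{",
  "\\item{x}{first summand.}",
  "\\item{y}{second summand.}",
  "}",
  "\\value{The sum of \\code{x} and \\code{y}.}",
  "\\description{This is my first function and it sums two numbers.}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# parse_Rd() turns the markup into a tree of tagged sections
rd <- tools::parse_Rd(rd_file)
sections <- unlist(lapply(rd, attr, "Rd_tag"))
print(sections[sections != "TEXT"])

# Render it as plain text, similar to ?my_sum in the console
tools::Rd2txt(rd)
```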

roxygen2 to the Rescue!

  • We can write the documentation source directly in the R script, right above the objects we are documenting
  • The syntax consists of special comments starting with #' and tags starting with @
  • In RStudio, running Build > More > Document will render the .Rd files and the NAMESPACE file for you
  • Get started with usethis::use_roxygen_md()
  • With your cursor inside a function in RStudio, you can create a roxygen2 skeleton with Code > Insert Roxygen Skeleton

Setting up roxygen2 in your project

roxygen2 Source

R/my_sum.R:

#' My Summation Function
#'
#' This is my first function and it sums two numbers.
#'
#' @param x first summand.
#' @param y second summand.
#'
#' @return The sum of `x` and `y`.
#' @export
#' 
#' @note This function is a bit boring but that is ok.
#' @seealso [Arithmetic] for an easier way.
#'
#' @examples
#' my_sum(1, 2)
my_sum <- function(x, y) {
  x + y
}

roxygen2 Output

man/my_sum.Rd:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/my_sum.R
\name{my_sum}
\alias{my_sum}
\title{My Summation Function}
\usage{
my_sum(x, y)
}
\arguments{
\item{x}{first summand.}

\item{y}{second summand.}
}
\value{
The sum of \code{x} and \code{y}.
}
\description{
This is my first function and it sums two numbers.
}
\note{
This function is a bit boring but that is ok.
}
\examples{
my_sum(1, 2)
}
\seealso{
\link{Arithmetic} for an easier way.
}

roxygen2 Output (cont’d)

NAMESPACE:

# Generated by roxygen2: do not edit by hand

export(my_sum)
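Both outputs can be reproduced end to end with roxygen2::roxygenise(), the function that the Document menu item and devtools::document() call. The sketch assumes roxygen2 is installed. The package mysum, its author, and the temporary location are placeholders. The NAMESPACE starts as a placeholder carrying roxygen2's header so that roxygen2 is allowed to overwrite it.

```r
pkg <- file.path(tempfile(), "mysum")
dir.create(file.path(pkg, "R"), recursive = TRUE)

writeLines(c(
  "Package: mysum",
  "Title: Sum Two Numbers",
  "Version: 0.1.0",
  "Authors@R: person(\"Jane\", \"Doe\", email = \"jane.doe@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: A toy package used to try out roxygen2.",
  "License: GPL-3",
  "Encoding: UTF-8",
  "Roxygen: list(markdown = TRUE)"
), file.path(pkg, "DESCRIPTION"))

# Placeholder NAMESPACE marked as roxygen-managed
writeLines("# Generated by roxygen2: do not edit by hand", file.path(pkg, "NAMESPACE"))

writeLines(c(
  "#' My Summation Function",
  "#'",
  "#' @param x first summand.",
  "#' @param y second summand.",
  "#' @return The sum of `x` and `y`.",
  "#' @export",
  "my_sum <- function(x, y) {",
  "  x + y",
  "}"
), file.path(pkg, "R", "my_sum.R"))

# Writes man/my_sum.Rd and rewrites NAMESPACE
roxygen2::roxygenise(pkg)
print(readLines(file.path(pkg, "NAMESPACE")))
```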

tests Folder

  • Where to store the unit tests covering the functionality of the package
  • Get started with usethis::use_testthat() and usethis::use_test(), then populate the tests/testthat folder with unit tests
  • Rarely, tests cannot be run within the testthat framework; these can go into R scripts directly in the tests directory
  • We will look at unit tests in detail later
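As a first taste, a test file created with usethis::use_test("my_sum") might contain something like the sketch below. It assumes testthat (3rd edition) is installed. my_sum() is defined inline only so that the snippet runs on its own; in a package it would live in R/my_sum.R.

```r
library(testthat)

# Defined here only to keep the example self-contained
my_sum <- function(x, y) {
  x + y
}

# Contents of tests/testthat/test-my_sum.R
test_that("my_sum() adds two numbers", {
  expect_equal(my_sum(1, 2), 3)
  expect_equal(my_sum(-1, 1), 0)
})

test_that("my_sum() is vectorised like `+`", {
  expect_equal(my_sum(1:3, 1), c(2, 3, 4))
})

test_that("my_sum() rejects non-numeric input", {
  expect_error(my_sum("a", 1))
})
```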

data Folder

  • For (example) data that you ship in your package to the user
    • Get started with usethis::use_data()
    • Note: Usually we enable lazy data loading (LazyData: true in DESCRIPTION), so users don’t need a data() call before using the data
  • If you generate the example data, save the R script, too
    • Put that into data-raw folder, start with usethis::use_data_raw()
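A sketch of what a script created by usethis::use_data_raw("example_patients") might contain. The dataset and its columns are made up for illustration. In a package you would end the script with usethis::use_data(example_patients, overwrite = TRUE), which stores the object as data/example_patients.rda. So that the sketch runs without an active usethis project, it instead saves to a temporary folder with base save(), loosely mirroring that step.

```r
# data-raw/example_patients.R (hypothetical dataset)
set.seed(123)
example_patients <- data.frame(
  id = sprintf("P%03d", 1:10),
  age = round(runif(10, min = 18, max = 80)),
  treated = rep(c(TRUE, FALSE), 5)
)

# In a package: usethis::use_data(example_patients, overwrite = TRUE)
# Base-R stand-in, writing into a temporary "package" directory:
pkg <- tempfile("pkg")
dir.create(file.path(pkg, "data"), recursive = TRUE)
save(example_patients,
     file = file.path(pkg, "data", "example_patients.rda"),
     compress = "bzip2")
```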

inst Folder

  • Contents will be copied recursively to installation directory
    • Avoid subfolder names that clash with the standard package folders (e.g. R, data, man)
  • For data that is used by functions in the package itself
    • Those would typically go into inst/extdata folder
    • Locate with system.file("path/file", package = "mypackage"), where the path is relative to inst (e.g. "extdata/file.csv")
  • CITATION: For custom citation() output
    • Create it with usethis::use_citation()
  • inst/doc can contain documentation files (typically pdf)
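A sketch of how a package might locate its own files in inst/extdata. The helper example_path() and the package name mypackage are hypothetical, so the checks below use the stats package, which ships with R and has no extdata folder.

```r
# Hypothetical helper inside "mypackage": path of a file shipped in inst/extdata
# (the inst/ prefix disappears once the package is installed)
example_path <- function(file, package = "mypackage") {
  system.file("extdata", file, package = package, mustWork = TRUE)
}

# The installed package directory, where the contents of inst/ end up
print(system.file(package = "stats"))

# By default, system.file() silently returns "" for missing files ...
print(system.file("extdata", "missing.csv", package = "stats"))

# ... whereas mustWork = TRUE turns a missing file into an error
res <- tryCatch(example_path("missing.csv", package = "stats"),
                error = function(e) "error: file not found")
print(res)
```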

src Folder

  • Contains sources and headers for any code that needs compilation
  • Ideally contains code in a single language
    • Mixing C, C++ and Fortran usually works with the OS’s native compilers, since R itself relies on them
  • Much more complex to write and maintain than an R only package
  • Typically only makes sense for
    • Wrapping existing libraries for use in R
    • Speeding up complex computations (see optimization chapter)

vignettes Folder

  • Special case of documentation files (pdf or html) created by compiling source files
  • Package users don’t need to recompile the vignettes - they are delivered with the package
  • Start a new vignette with usethis::use_vignette()
    • Starts an Rmd vignette, compiled with knitr
  • Important for the user to understand the high-level ideas
  • Complements function-level documentation from our roxygen2 chunks

NEWS File

  • Lists user-visible changes worth mentioning
  • In each new release, add items at the top under the version they refer to
  • Don’t discard old items: leave them in the file after the newer items
  • Start one with usethis::use_news_md()

Building the Package

Documenting the Package

  • The first step is to produce the documentation files and NAMESPACE
  • In RStudio: Build > More > Document
  • In the console: devtools::document()

Checking the Package

  • R comes with a built-in check command for packages: “the R package checker”, aka R CMD check
  • More than 50 individual checks are run, including things like:
    • Can the package be installed?
    • Is the code syntax ok?
    • Is the documentation complete?
    • Do tests run successfully?
    • Do examples run successfully?
  • In RStudio: Build > Check
  • In the console: devtools::check()

Building the Package

  • The R package folder can be compressed into a single package file
  • Typically we only build the “source” package manually
    • In RStudio: Build > More > Build Source Package
    • In the console: devtools::build()
  • Makes it easy to share the package with others and submit to CRAN
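Outside RStudio, the check and build steps correspond to the command-line tools R CMD build and R CMD check, which devtools wraps. The sketch calls them from R via system2() on a minimal placeholder package mysum. It assumes the R executable sits in R.home("bin"). Because my_sum() has no help page, the check will likely report an "Undocumented code objects" WARNING, which is exactly the kind of problem the checker is meant to catch. Running it takes a while.

```r
r_bin <- file.path(R.home("bin"), "R")

# A minimal placeholder package in a fresh temporary directory
work_dir <- tempfile("build")
pkg <- file.path(work_dir, "mysum")
dir.create(file.path(pkg, "R"), recursive = TRUE)
writeLines(c(
  "Package: mysum",
  "Title: Sum Two Numbers",
  "Version: 0.1.0",
  "Authors@R: person(\"Jane\", \"Doe\", email = \"jane.doe@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: A toy package used to try out building and checking.",
  "License: GPL-3",
  "Encoding: UTF-8"
), file.path(pkg, "DESCRIPTION"))
writeLines("export(my_sum)", file.path(pkg, "NAMESPACE"))
writeLines("my_sum <- function(x, y) x + y", file.path(pkg, "R", "my_sum.R"))

old_wd <- setwd(work_dir)
# R CMD build writes mysum_0.1.0.tar.gz into the current directory
build_log <- system2(r_bin, c("CMD", "build", "mysum"),
                     stdout = TRUE, stderr = TRUE)
# Check the built tarball; --no-manual skips the LaTeX PDF manual
check_log <- system2(r_bin, c("CMD", "check", "--no-manual", "mysum_0.1.0.tar.gz"),
                     stdout = TRUE, stderr = TRUE)
setwd(old_wd)

cat(tail(check_log, 5), sep = "\n")
```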

Installing the Package

  • R comes with a built-in install command for packages: R CMD INSTALL
  • In RStudio: Build > Install
  • In the console: devtools::install()
  • Note: During development it’s usually sufficient to use Build > More > Load All
    • Runs devtools::load_all()
    • Roughly simulates what happens when the package is installed and loaded
    • Unexported objects and test helpers (tests/testthat/helper*.R) are also available
    • Key: much faster!
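The sketch below shows load_all() making an internal helper callable without installing anything. It assumes pkgload, the package devtools::load_all() delegates to, is installed. The package mysum and the helper add_one() are placeholders; only my_sum() is listed in NAMESPACE.

```r
pkg <- file.path(tempfile("dev"), "mysum")
dir.create(file.path(pkg, "R"), recursive = TRUE)
writeLines(c(
  "Package: mysum",
  "Title: Sum Two Numbers",
  "Version: 0.1.0",
  "Authors@R: person(\"Jane\", \"Doe\", email = \"jane.doe@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: A toy package used to try out load_all().",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))
# Only my_sum() is exported; add_one() is an internal helper
writeLines("export(my_sum)", file.path(pkg, "NAMESPACE"))
writeLines(c(
  "my_sum <- function(x, y) x + y",
  "add_one <- function(x) my_sum(x, 1)"
), file.path(pkg, "R", "my_sum.R"))

# Sources the package code without installing it; with the default
# export_all = TRUE, unexported objects are made available as well
pkgload::load_all(pkg)
print(my_sum(1, 2))
print(add_one(41))  # internal helper, still callable during development
```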

Exercise

Let’s try this out now 😊

  1. Set up a new R package with a fancy name
  2. Fill out the DESCRIPTION file
  3. Include a new function
  4. Add roxygen2 documentation
  5. Export the function to the namespace
  6. Produce the package documentation
  7. Run checks
  8. Build the package
