
Package Management for Offline RStudio Connect Installations


If you are administering RStudio Connect in an offline environment, you’ll need to follow certain steps to ensure the R packages used by your team are available in Connect. These steps are different from the process you might have used in the past to provide packages to RStudio.

Package Library vs Package Repository

A repository is a directory containing uninstalled R packages, either as source tarballs or as platform-specific binaries. A repository also contains a PACKAGES file that indexes its contents (package names, versions, and dependencies).

A library is a directory containing installed R packages.

Though a repository and a library look very similar, they are two distinct entities.

Usually, two steps are required to use an R package:

  1. The package is installed from a repository into a library using install.packages().
  2. The package is loaded from the library for an analysis using library().
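To see the distinction on your own machine, base R can list both: `.libPaths()` returns the libraries that `library()` searches, and `getOption("repos")` returns the repositories that `install.packages()` reads from. A minimal sketch using only base R:

```r
# Libraries: directories of *installed* packages that library() searches, in order.
libs <- .libPaths()
print(libs)

# Repositories: URLs that install.packages() reads PACKAGES indexes and tarballs from.
repos <- getOption("repos")
print(repos)

# The installed copy of a package lives inside one of the libraries above.
stats_dir <- find.package("stats")
print(stats_dir)
```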

In offline RStudio environments, it is common for administrators to skip setting up a local package repository and instead install packages directly from an online CRAN repository into a system library. The system library (a set of folders) is then moved to the offline environment or placed on shared storage. R users load packages from the system library with the library() function. The administrator tells R to look in the correct directory by defining R_LIBS_SITE in the Renviron.site file or by calling the R function .libPaths() in the Rprofile.site file.
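As an illustration of the .libPaths() approach, an Rprofile.site entry might look like the sketch below. The path `/opt/R/site-library` is a hypothetical location for the copied system library; substitute your own. Wrapping the code in `local()` keeps the helper variable out of users' sessions.

```r
# Sketch of an Rprofile.site entry. '/opt/R/site-library' is a hypothetical path.
# The Renviron.site alternative is a single line: R_LIBS_SITE=/opt/R/site-library
local({
  site_lib <- "/opt/R/site-library"
  if (dir.exists(site_lib)) {
    # Search the shared system library before the default libraries.
    .libPaths(c(site_lib, .libPaths()))
  }
})
```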

Offline use of RStudio Connect, however, requires admins to set up a package repository. Connect needs a repository so that it can manage and isolate package dependencies for deployed content. In brief, Connect installs packages from the local repository into a private library for each piece of deployed content. Individual libraries guarantee that the content will have the correct packages, even if other content on the server requires a different version of the same package. (In practice, the process is optimized to cache packages while guaranteeing that the correct versions are always available.)

In environments with both RStudio and RStudio Connect, a local package repository should be used for both. Specifically, users or administrators should install packages in RStudio Server’s system library from the local repository. Installing packages into the system library from a different repository can result in a mismatch between Connect and RStudio that will cause deployment failures.

Setting up a local repository

A package repository is a set of source files arranged inside of a specific folder hierarchy. It is possible to set up the scaffolding for a repository manually, but an easier approach is to use the miniCRAN and packrat R packages.
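For reference, the manual route can be sketched with base R alone: a source repository is a `src/contrib` folder holding `<name>_<version>.tar.gz` files plus a PACKAGES index, which `tools::write_PACKAGES()` generates. The sketch assumes a Linux-style file system, and `hellopkg` is a hypothetical toy package built only for the demo (real tarballs would come from CRAN or `R CMD build`).

```r
# Manual repository scaffolding with base R only (utils + tools).
repo    <- file.path(tempdir(), "repoName")
contrib <- file.path(repo, "src", "contrib")  # source tarballs must live here
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# Hypothetical toy package: a directory containing only a DESCRIPTION file.
build_dir <- file.path(tempdir(), "build")
dir.create(file.path(build_dir, "hellopkg"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "Package: hellopkg",
  "Version: 0.1.0",
  "Title: Toy Package",
  "Description: Used only to demonstrate the repository layout.",
  "License: GPL-3",
  "Author: Example",
  "Maintainer: Example <example@example.com>"
), file.path(build_dir, "hellopkg", "DESCRIPTION"))

# Archive it under the <name>_<version>.tar.gz name a source repository expects.
owd <- setwd(build_dir)
utils::tar(file.path(contrib, "hellopkg_0.1.0.tar.gz"), files = "hellopkg",
           compression = "gzip", tar = "internal")
setwd(owd)

# Build the PACKAGES index from the tarballs in src/contrib.
tools::write_PACKAGES(contrib, type = "source")

# R can now read the repository through a file:// URL.
ap <- available.packages(repos = paste0("file://", repo), type = "source")
print(ap[, c("Package", "Version"), drop = FALSE])
```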

From within an online environment:

  1. Open R

  2. Install packrat and miniCRAN by running the R command: install.packages(c('miniCRAN', 'packrat'))

  3. Create a package repository by running: packrat::repos_create('/path/to/directory/repoName/')

  4. Create a list of R packages to add to the local repository: pkgs <- c('<YOUR_PACKAGES_HERE>', '<EACH_IN_QUOTES>', '<SEPARATED_BY_COMMAS>')

  5. Add the packages to the repository and update the PACKAGES file: miniCRAN::addPackage(pkgs, path = '/path/to/directory/repoName')

Step 5 downloads all the packages, and their dependencies, into the local repository. This can take some time. When it finishes, /path/to/directory/repoName should contain a source tarball (.tar.gz file) for each R package and a PACKAGES file listing information about each package. The tarballs and the PACKAGES file are located in /path/to/directory/repoName/src/contrib.
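If you rebuild the repository regularly, steps 2-5 can be wrapped in a small function on the online machine. This is a sketch: `build_local_repo` is a hypothetical name, and the default mirror URL is an assumption; substitute your preferred CRAN mirror. Defining the function does not contact the network; calling it does.

```r
# Hypothetical helper wrapping steps 2-5. Call it on a machine WITH internet access.
build_local_repo <- function(pkgs, path, cran = "https://cloud.r-project.org") {
  # Step 2: pass the package names as ONE character vector; a second bare
  # argument to install.packages() would be taken as the `lib` directory.
  needed <- c("miniCRAN", "packrat")
  absent <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
  if (length(absent) > 0) install.packages(absent, repos = cran)

  # Step 3: create the repository scaffolding.
  packrat::repos_create(path)

  # Step 5: download `pkgs` plus dependencies, then rebuild PACKAGES.
  miniCRAN::addPackage(pkgs, path = path, repos = cran, type = "source")

  # Return the tarball names now stored in the repository.
  invisible(list.files(file.path(path, "src", "contrib"), pattern = "\\.tar\\.gz$"))
}

# Usage (step 4 is choosing `pkgs`):
# build_local_repo(c("dplyr", "ggplot2"), "/path/to/directory/repoName")
```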

Next, copy the entire directory to a location accessible from the offline environment (both RStudio and RStudio Connect).

What if I already have a system library?

If you already have a system library, it is important to make sure the local repository contains the exact package versions RStudio is currently using. To do so:

Follow steps 1-3 above, then:

  1. Enumerate the installed packages, leaving out base packages such as stats and utils, which ship with R and are not available from CRAN: pkgs <- as.data.frame(installed.packages(), stringsAsFactors = FALSE); pkgs <- pkgs[is.na(pkgs$Priority) | pkgs$Priority != 'base', ]

  2. Add the version of each package in the system library to the repository: miniCRAN::addOldPackage(pkgs$Package, path = '/path/to/directory/repoName', vers = pkgs$Version, deps = FALSE)
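Before copying the repository offline, it can help to confirm that every installed package version now has a matching tarball. The helper below is a hypothetical sketch: it only checks file names of the form `<name>_<version>.tar.gz`, and it searches subfolders of `src/contrib` in case older versions are kept in an archive folder (an assumption about layout, not documented miniCRAN behavior).

```r
# Hypothetical helper: installed packages with no matching <name>_<version>.tar.gz
# anywhere under the repository's src/contrib. `installed` is a matrix shaped
# like the output of installed.packages().
missing_from_repo <- function(installed, repo_path) {
  contrib <- file.path(repo_path, "src", "contrib")
  tarballs <- basename(list.files(contrib, pattern = "\\.tar\\.gz$",
                                  recursive = TRUE))
  # Base packages ship with R and are never in a CRAN-style repository.
  prio <- installed[, "Priority"]
  installed <- installed[is.na(prio) | prio != "base", c("Package", "Version"),
                         drop = FALSE]
  expected <- paste0(installed[, "Package"], "_", installed[, "Version"], ".tar.gz")
  installed[!expected %in% tarballs, , drop = FALSE]
}

# Usage on the RStudio server, with the placeholder path from the steps above:
# missing_from_repo(installed.packages(), "/path/to/directory/repoName")
```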

Maintaining a Repository

To add a package to the local repository, use the same miniCRAN function, addPackage. After each addition, copy the updated repository to the offline location.

The addPackage function automatically handles package versions. For example, say you add version 1.0 of a package. Later, version 2.0 of the package is released. Running addPackage again will add version 2.0 alongside version 1.0. By default, install.packages installs the latest version. Analysts can manually install older versions into RStudio using devtools::install_version. RStudio Connect automatically installs the version that was in use in RStudio when the content was deployed.
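To check which versions of a package the repository actually holds, a hypothetical helper like the one below can list them from the tarball names. It is a sketch that assumes the `<name>_<version>.tar.gz` naming and searches subfolders of `src/contrib`. A specific tarball can then be installed with base R, for example `install.packages("/path/to/directory/repoName/src/contrib/<pkg>_1.0.tar.gz", repos = NULL, type = "source")`.

```r
# Hypothetical helper: versions of `pkg` stored as tarballs under src/contrib
# (subfolders included), newest first.
repo_versions <- function(repo_path, pkg) {
  contrib <- file.path(repo_path, "src", "contrib")
  # Treat dots in package names (e.g. data.table) literally in the regex.
  prefix <- paste0("^", gsub(".", "[.]", pkg, fixed = TRUE), "_")
  files <- basename(list.files(contrib, pattern = paste0(prefix, ".*\\.tar\\.gz$"),
                               recursive = TRUE))
  if (length(files) == 0) return(character(0))
  vers <- sub(paste0(prefix, "(.*)\\.tar\\.gz$"), "\\1", files)
  # package_version() compares numerically, so "1.10" sorts above "1.9".
  vers[order(package_version(vers), decreasing = TRUE)]
}
```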

Telling R about the Local Repository (Required Step!)

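To use the local package repository, you have to tell R where the repository lives. This declaration is similar to setting R_LIBS_SITE or modifying .libPaths(), except that it points install.packages() at a repository rather than pointing library() at a library.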

To do so, run the R code:

packrat::repos_add(repoName = "file:///path/to/directory/repoName/")

If the server is offline, also remove the default CRAN repository so that R does not try to reach it:

packrat::repos_remove('CRAN')

The packrat::repos_add() call is a wrapper around R’s repos option; the equivalent base R code is:

r <- getOption("repos")
r["repoName"] <- "file:///path/to/directory/repoName/"
options(repos = r)

In short, add the following lines of code to the Rprofile.site file (packrat must be installed on the offline server for them to run):

packrat::repos_add(repoName = "file:///path/to/directory/repoName/")
packrat::repos_remove('CRAN')
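On a server where packrat is not installed, the same two steps can be written in base R. This is a sketch of an Rprofile.site fragment using the placeholder path from this article; `local()` keeps the temporary variable out of users' sessions.

```r
# Base-R equivalent of repos_add() + repos_remove('CRAN') for Rprofile.site.
local({
  r <- getOption("repos")
  r["repoName"] <- "file:///path/to/directory/repoName/"
  r <- r[names(r) != "CRAN"]  # drop the default CRAN entry on an offline server
  options(repos = r)
})
```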
