
Comparing RStudio Package Manager and Other Repository Tools

RStudio Package Manager is a new repository management server that organizes and centralizes R packages across your organization. For more information visit www.rstudio.com.

Many organizations have a central repository management tool that supports multiple languages, such as Java and C++. Some of these tools offer limited support for R. If your organization already has a central repository tool, you might wonder why you would need another tool just for R packages.

Requirements Unique to R

R packages are structured and distributed differently from packages in other languages, and it is easy to run into conflicts when working with them if you are not careful.

1. Package Metadata

First, check whether your repository tool supports the metadata associated with an R repository. Placing package tar files into a folder is not enough: R functions like install.packages rely on additional index metadata (the PACKAGES files in the repository) that has to be updated any time a package in the repository changes. This metadata is stored in a format specific to R repositories.
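To illustrate what this metadata looks like, the Python sketch below parses a small index in the layout of a CRAN-style PACKAGES file: one record per package, "Field: value" lines, indented continuation lines, and a blank line between records. The sample records and the parse_packages_index name are made up for illustration. Real repositories also publish compressed variants (PACKAGES.gz, PACKAGES.rds), and in R the index is typically regenerated with tools::write_PACKAGES().

```python
def parse_packages_index(text):
    """Parse PACKAGES-style text into a list of dicts, one per package.

    Hypothetical helper for illustration only; it handles "Field: value"
    lines, indented continuation lines, and blank-line record separators.
    """
    records = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current = {}
            last_field = None
        elif line[0] in " \t" and last_field is not None:
            # An indented line continues the previous field's value.
            current[last_field] += " " + line.strip()
        else:
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        records.append(current)
    return records


# Illustrative sample, not real CRAN metadata.
SAMPLE = """\
Package: alpha
Version: 1.2.0
Imports: beta (>= 0.5),
    gamma

Package: beta
Version: 0.6.1
"""

index = parse_packages_index(SAMPLE)
for rec in index:
    print(rec["Package"], rec["Version"], rec.get("Imports", ""))
```

Because install.packages reads this index rather than scanning the folder, any tool that adds or replaces a tarball without regenerating the index leaves R with a stale view of the repository.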

2. Package Archive

Second, R repositories have a specific structure for storing older versions of packages. RStudio Package Manager accommodates this structure for CRAN packages, but it also helps users update internal packages or packages from Git(Hub) by automatically archiving prior versions. Archiving prior package versions is a critical component of reproducibility tools (like the packrat package), and is a strict requirement for organizations using RStudio Connect. 
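On CRAN, the current source tarball for a package sits directly in src/contrib/, while older versions are kept under src/contrib/Archive/<package>/. The Python sketch below shows that bookkeeping when a new version is published. source_path and publish are hypothetical helpers, and the in-memory dictionary stands in for the repository index; this is not how Package Manager is implemented.

```python
def source_path(package, version, current_version):
    """Return the repository-relative path of a source tarball.

    Assumes the CRAN-style layout: the current version lives directly in
    src/contrib/, older versions under src/contrib/Archive/<package>/.
    """
    tarball = f"{package}_{version}.tar.gz"
    if version == current_version:
        return f"src/contrib/{tarball}"
    return f"src/contrib/Archive/{package}/{tarball}"


def publish(current, package, new_version):
    """Record a new version and return where the previous tarball is archived.

    `current` maps package names to their current version (a hypothetical
    stand-in for the repository index). Returns None for a first release.
    """
    old_version = current.get(package)
    current[package] = new_version
    if old_version is None:
        return None
    return f"src/contrib/Archive/{package}/{package}_{old_version}.tar.gz"


current = {"mypkg": "1.0.0"}
moved = publish(current, "mypkg", "1.1.0")
print(moved)  # src/contrib/Archive/mypkg/mypkg_1.0.0.tar.gz
print(source_path("mypkg", "1.1.0", current["mypkg"]))  # src/contrib/mypkg_1.1.0.tar.gz
```

Keeping the old tarball at a predictable path is what lets a tool like packrat reinstall an exact earlier version later.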

Features Unique to RStudio Package Manager

RStudio implemented several features to make working with R packages reliable, consistent, and efficient.

1. Additional Metadata

Other repository tools make R's public repository (CRAN) available by proxying requests: when a user asks for an R package from CRAN, the repository passes the request to CRAN and downloads the package. Unfortunately, this simple approach makes consistent updates hard to achieve, because CRAN keeps changing while packages are downloaded and cached at different times. Instead of consistent updates, users run into mysterious installation failures and caching problems. RStudio addresses this by providing every Package Manager installation with important metadata on all of CRAN and updating it daily. This metadata supplements the default metadata provided by CRAN and lets users browse CRAN without downloading any packages. With it, Package Manager delivers consistent updates, caches packages correctly, and downloads packages lazily, only when they are requested.

2. Subsets of CRAN

The metadata described above also gives Package Manager a unique ability to host subsets of CRAN. Users can provide the top-level packages they want, and Package Manager does all the work to determine the full set of necessary dependencies. Package Manager also uses its knowledge of dependencies to provide a preview of changes prior to the addition of new packages or package updates.
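Determining "the full set of necessary dependencies" amounts to a transitive closure over the dependency graph. The Python sketch below assumes a toy in-memory graph with made-up package names (in practice the edges would come from fields like Depends, Imports, and LinkingTo). It ignores version constraints and base R packages. dependency_closure and preview_changes are hypothetical names; Package Manager's actual resolver is not shown here.

```python
from collections import deque


def dependency_closure(requested, deps):
    """Return every package needed to install `requested`, including itself.

    `deps` maps a package name to the names of the packages it depends on.
    """
    needed = set()
    queue = deque(requested)
    while queue:
        pkg = queue.popleft()
        if pkg in needed:
            continue  # already visited; also guards against cycles
        needed.add(pkg)
        # Packages missing from the graph are treated as having no dependencies.
        queue.extend(deps.get(pkg, []))
    return needed


def preview_changes(current_subset, requested, deps):
    """Return the packages that adding `requested` would newly bring in."""
    return dependency_closure(requested, deps) - set(current_subset)


# Toy graph with made-up package names.
DEPS = {
    "app": ["plotkit", "dataio"],
    "plotkit": ["colors", "core"],
    "dataio": ["core"],
    "colors": [],
    "core": [],
    "extras": ["core"],
}

subset = dependency_closure(["app"], DEPS)
print(sorted(subset))  # ['app', 'colors', 'core', 'dataio', 'plotkit']
print(sorted(preview_changes(subset, ["extras"], DEPS)))  # ['extras']
```

Skipping already-visited packages keeps the walk linear in the size of the graph and avoids looping on circular dependencies. The set difference in preview_changes mirrors the idea of showing users what an addition would pull in before it is applied.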

3. Checkpoints

Every change to RStudio Package Manager is tracked and versioned. These changes are available in a calendar, and users can easily "time travel" to prior points in the repository's history, making it easy to reproduce older package environments.
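Conceptually, "time travel" means resolving a requested date to the most recent checkpoint at or before it. The Python sketch below does that with a binary search over sorted snapshot dates. The dates and the snapshot_for name are made up for illustration, and this says nothing about how Package Manager stores checkpoints or how clients select one.

```python
import bisect
from datetime import date


def snapshot_for(snapshots, when):
    """Return the latest snapshot date on or before `when`.

    `snapshots` must be a sorted list of dates on which the repository changed.
    Raises ValueError if `when` predates the first snapshot.
    """
    i = bisect.bisect_right(snapshots, when)
    if i == 0:
        raise ValueError(f"no snapshot on or before {when}")
    return snapshots[i - 1]


# Made-up checkpoint dates.
SNAPSHOTS = [date(2018, 10, 1), date(2018, 10, 15), date(2018, 11, 2)]
print(snapshot_for(SNAPSHOTS, date(2018, 10, 20)))  # 2018-10-15
```

Picking the checkpoint at or before the requested date, rather than the nearest one, ensures a reproduced environment never includes packages that did not yet exist on that date.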

4. System Dependencies

Future versions of RStudio Package Manager will provide supplemental information on the system requirements needed for R packages.

5. Package Binaries

When a user installs R packages on Linux, those packages are traditionally compiled from source, which can be time-consuming for both IT staff and data scientists. Future versions of RStudio Package Manager will add support for delivering Linux binaries of CRAN packages. Such binaries are generally not publicly available, and they greatly speed up package installation when using R on Linux servers.

Using RStudio Package Manager in Production

RStudio Package Manager includes the same best-in-class security measures and usage tracking you would find in other repository tools. Likewise, Package Manager has a flexible storage model, support for high availability, integrations with Git, and options for offline operation. While Package Manager meets the requirements of most IT organizations, it does not require an IT organization to operate. Package Manager has been specifically designed to run without root privileges, and it can even be co-located with RStudio Server or RStudio Connect. This often allows R administrators to manage the service themselves.

It is possible to use RStudio Package Manager in conjunction with another repository tool by using repository proxies. In this scenario, RStudio Package Manager is placed behind a central repository, and the central repository is configured to proxy requests to Package Manager. This can be useful if the IT organization wants the benefits of Package Manager while still satisfying any compliance requirement that users interact only with the central repository.

Finally, R is not the only language with dedicated package managers. It is worth checking whether your organization already has a precedent for language-specific tools, such as Anaconda (Python) or npm (JavaScript).
