Running RStudio with Docker containers

Important! We do not recommend deploying independent containers for individual users. Instead, we recommend using the load balancing features of RStudio professional products to balance user sessions and connections.

Overview

RStudio professional products (RStudio Server Pro, RStudio Connect, and RStudio Package Manager) are designed to run on Linux servers. The traditional server model assumes that each product is installed on a server, typically with one product per server. These servers can be scaled out in a cluster to load balance sessions and support high availability.

Today, most customers run RStudio products on virtual machines (VMs); however, it is increasingly common to run RStudio professional products in Docker containers, especially in cloud environments. Below are a few recommendations for running RStudio with containers.

Recommendations

The architectural model for running RStudio with containers is similar to the model for VMs, with containers substituted for VMs. You can think of it as the same model with a different target. For example, the same architecture used with vSphere on VMware could also apply to containers on AWS ECS. As with VMs, your containers must meet the RStudio professional product requirements, including installing and running RStudio products as root.

1. Use persistent storage to preserve state

In a container model, you must still preserve content and metadata with persistent storage that can be attached to your containers. Persistent storage preserves things like code, packages, and project files, as well as application data, metadata, and logs.
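As a minimal sketch of attaching persistent storage, the Python helper below assembles a `docker run` command that bind-mounts host directories into the container. The image name, container name, and host paths are hypothetical placeholders, and the container paths and port 8787 are assumptions about a typical RStudio Server layout; adjust them to match your installation.

```python
import shlex


def docker_run_command(image, name, volumes, port=8787):
    """Build a `docker run` argument list that bind-mounts persistent storage."""
    args = ["docker", "run", "--detach", "--name", name,
            "--publish", f"{port}:{port}"]
    for host_path, container_path in volumes.items():
        # --volume HOST:CONTAINER keeps the data on the host, so it survives
        # when the container is removed or replaced.
        args += ["--volume", f"{host_path}:{container_path}"]
    args.append(image)
    return args


# Hypothetical host paths and image name; the container paths are assumptions.
volumes = {
    "/mnt/rstudio/home": "/home",                     # user code and project files
    "/mnt/rstudio/state": "/var/lib/rstudio-server",  # application data and metadata
}
command = docker_run_command(
    "example.org/rstudio-server-pro:latest", "rstudio", volumes
)
print(shlex.join(command))
```

Building the command as a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.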

2. Use an always-up container model

We recommend always keeping your container up and running. Users are likely to access content, schedule jobs, or leave sessions running during off hours to complete their tasks. If you add or remove containers in a load-balanced environment, we recommend that at least one container remain up at all times. Container orchestration systems such as Kubernetes, Mesos, or Docker Swarm can handle the low-level details of managing containers.
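The rule that at least one container stays up can be expressed as a lower bound in whatever scaling logic you use. The sketch below is illustrative only: `target_replicas`, its parameters, and the default limits are hypothetical, and an orchestrator would normally enforce the same bound through its own minimum-replica setting.

```python
import math


def target_replicas(active_sessions, sessions_per_container, minimum=1, maximum=4):
    """Return how many containers to run, never dropping below `minimum`."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1 so one container is always up")
    if maximum < minimum:
        raise ValueError("maximum must be greater than or equal to minimum")
    if sessions_per_container < 1:
        raise ValueError("sessions_per_container must be at least 1")
    # Containers needed to serve the current load, rounded up.
    needed = math.ceil(active_sessions / sessions_per_container)
    # Clamp to [minimum, maximum]; with zero sessions this still returns `minimum`.
    return max(minimum, min(needed, maximum))


print(target_replicas(active_sessions=0, sessions_per_container=10))
print(target_replicas(active_sessions=25, sessions_per_container=10))
```

Even with no active sessions the function keeps one container running, so scheduled jobs and off-hours access still have somewhere to land.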

3. Treat infrastructure as code

We strongly recommend that you treat infrastructure as code so that your environment is reproducible. Consider using tools like Terraform, CloudFormation, Ansible, Chef, Puppet, or OpsWorks for infrastructure and configuration management. If you are new to writing infrastructure as code, start by writing a simple recipe for installing and configuring your R environment.
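To illustrate what a simple recipe might look like, the sketch below keeps the R environment in a declarative Python dictionary and derives the install commands from it. This is a stand-in for the idea rather than a replacement for the tools named above. The package lists are hypothetical examples, and the sketch assumes a Debian or Ubuntu host where R is provided by the `r-base` package.

```python
import json

# Declarative description of the environment (hypothetical example values).
R_ENVIRONMENT = {
    "system_packages": ["r-base"],
    "r_packages": ["shiny", "rmarkdown"],
    "cran_mirror": "https://cloud.r-project.org",
}


def install_commands(spec):
    """Turn the declarative spec into a deterministic list of commands."""
    commands = []
    if spec["system_packages"]:
        commands.append(
            ["apt-get", "install", "--yes", *sorted(spec["system_packages"])]
        )
    if spec["r_packages"]:
        # json.dumps yields double-quoted strings, which are valid R literals
        # for plain package names and URLs.
        names = ", ".join(json.dumps(p) for p in sorted(spec["r_packages"]))
        mirror = json.dumps(spec["cran_mirror"])
        commands.append(
            ["Rscript", "-e", f"install.packages(c({names}), repos={mirror})"]
        )
    return commands


for cmd in install_commands(R_ENVIRONMENT):
    print(cmd)
```

Because the spec is plain data kept under version control, the same input always produces the same commands, which is the property that makes the environment reproducible.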

A note about Launcher

RStudio Server Pro v1.2 and above can launch interactive and batch jobs on different backend services such as Kubernetes. These jobs run on Docker images that your organization can modify. See "Using Docker images with RStudio Server Pro, Launcher, and Kubernetes", or contact sales@rstudio.com to learn more.

Tips and best practices

  • Avoid frequent reads and writes to Docker containers for performance reasons. Instead, bake versions of R, Python, packages, and drivers into the image so they are not installed at runtime.
  • Use an init subsystem such as tini or dumb-init so that RStudio is not running as PID 1. If RStudio runs as PID 1, stopping the RStudio service kills the container.
  • Run RStudio Server Pro with a supervisor so that it logs verbosely.
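The first two tips can be sketched together as a small Python function that renders a Dockerfile: packages are installed at build time, and tini is the entry point so the server does not run as PID 1. Everything specific here is an assumption for illustration: the base image name is hypothetical and is assumed to be Debian or Ubuntu based with R already installed, tini is assumed to be available from the distribution's package repository at `/usr/bin/tini`, and the server command and its foreground flag should be checked against your RStudio version.

```python
import json


def render_dockerfile(base_image, r_packages, server_command):
    """Render a Dockerfile that bakes in R packages and starts under tini."""
    quoted = ", ".join(json.dumps(p) for p in r_packages)
    # Exec-form ENTRYPOINT is a JSON array; tini becomes PID 1 and runs the
    # server as its child process.
    entrypoint = json.dumps(["/usr/bin/tini", "--", *server_command])
    lines = [
        f"FROM {base_image}",
        # Assumption: tini is installable from the distribution repository.
        "RUN apt-get update && apt-get install --yes tini"
        " && rm -rf /var/lib/apt/lists/*",
        # Packages are installed at build time, not when the container starts.
        f"RUN Rscript -e 'install.packages(c({quoted}),"
        ' repos="https://cloud.r-project.org")\'',
        f"ENTRYPOINT {entrypoint}",
    ]
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    "example.org/r-base-image:latest",  # hypothetical base image
    ["shiny", "rmarkdown"],
    # Assumed path and flag for running the server in the foreground.
    ["/usr/lib/rstudio-server/bin/rserver", "--server-daemonize=0"],
)
print(dockerfile)
```

With tini as PID 1, the server process can stop or restart without the container exiting, and tini forwards signals so the container still shuts down cleanly.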
