Important! We do not recommend deploying a separate container for each user. Instead, we recommend using the load-balancing features of RStudio professional products to balance user sessions and connections.
RStudio professional products, namely RStudio Workbench (previously RStudio Server Pro), RStudio Connect, and RStudio Package Manager, are designed to run on Linux servers. The traditional server model assumes that each product is installed directly on a server, typically with each product running on its own server. These servers can be scaled out in a cluster to load balance sessions and support high availability.
Today, most customers run RStudio products on virtual machines (VMs); however, it is increasingly common to run RStudio professional products in Docker containers, especially in cloud environments.
To run any RStudio professional product in a Docker-based environment, the following requirements must be met:
1. Privileged containers
Running any RStudio Workbench Docker image requires the container to run using the --privileged flag.
2. Product licenses
Running any RStudio professional product inside Docker requires a valid license for that product. Deactivate the license before stopping the container; otherwise, the license will continue to count as active.
3. Persistent storage
In a container model you still must preserve content and metadata with persistent storage that can be attached to your containers. Persistent storage will preserve things like code, packages, project files as well as application data, metadata, and logs.
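The three requirements can be sketched as a small Python helper that assembles the docker run arguments, plus a stop sequence that deactivates the license first. This is a minimal sketch, not an RStudio-provided tool: the image name rstudio/rstudio-workbench, the RSW_LICENSE environment variable, and port 8787 are assumptions based on RStudio's public Workbench images, and docker_run_args and docker_stop_sequence are hypothetical helper names. The deactivation step uses rstudio-server license-manager deactivate; verify the equivalent command for Connect or Package Manager in that product's admin guide.

```python
import shlex
from typing import List

# Assumptions: image name/tag, license variable name, and port follow RStudio's
# public Workbench images; verify them against the image you actually use.
DEFAULT_IMAGE = "rstudio/rstudio-workbench:latest"
LICENSE_ENV = "RSW_LICENSE"
WORKBENCH_PORT = 8787


def docker_run_args(name: str, home_volume: str, image: str = DEFAULT_IMAGE) -> List[str]:
    """Hypothetical helper: build a `docker run` command meeting all three requirements."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--privileged",                      # 1. privileged container
        "--env", LICENSE_ENV,                # 2. license key passed through from the host's environment
        "--volume", f"{home_volume}:/home",  # 3. persistent storage for user home directories
        "--publish", f"{WORKBENCH_PORT}:{WORKBENCH_PORT}",
        image,
    ]


def docker_stop_sequence(name: str) -> List[List[str]]:
    """Hypothetical helper: deactivate the license *before* stopping the container."""
    return [
        ["docker", "exec", name, "rstudio-server", "license-manager", "deactivate"],
        ["docker", "stop", name],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    print(shlex.join(docker_run_args("rsw", "rsw-home")))
    for step in docker_stop_sequence("rsw"):
        print(shlex.join(step))
```

Passing --env RSW_LICENSE without a value makes Docker copy the variable from the invoking shell, which keeps the key itself out of the command line.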
More details on implementing each RStudio professional product in a Docker environment are available in the documentation for each product.
The architectural model for running RStudio with containers is similar to running RStudio with VMs, with containers substituted for VMs. You can think of it as the same model with a different target. For example, the same architecture used with vSphere on VMware could also apply to containers on Amazon ECS. As with VMs, your containers must meet the RStudio professional product requirements, including installing and running RStudio products as root.
1. Use an always up container model
We recommend keeping your containers up and running at all times. Users are likely to access content, schedule jobs, or leave sessions running outside business hours to complete their tasks. If you add or remove containers in a load-balanced environment, keep at least one container running at all times. Container orchestration systems such as Kubernetes, Mesos, or Docker Swarm can handle the low-level details of managing containers.
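The "at least one container stays up" rule can be expressed as a small scale-down planner. This is a hypothetical Python sketch, not how any particular orchestrator implements scaling; in Kubernetes, Mesos, or Docker Swarm you would normally get the same effect by never setting the replica count below one. Containers are kept in list order, and everything past the target count is selected for stopping.

```python
from typing import List


def plan_scale_down(running: List[str], desired: int, min_up: int = 1) -> List[str]:
    """Hypothetical helper: pick containers to stop while keeping at least `min_up` running.

    Containers are kept in list order; anything past the target count is stopped.
    """
    if min_up < 1:
        raise ValueError("an always-up model needs min_up >= 1")
    # Even if the caller asks for zero containers, keep `min_up` of them running.
    target = max(desired, min_up)
    # Slicing past the end yields an empty list, i.e. nothing to stop.
    return running[target:]


if __name__ == "__main__":
    print(plan_scale_down(["rsw-1", "rsw-2", "rsw-3"], desired=0))
```

Here a request to scale to zero still leaves rsw-1 running and returns only rsw-2 and rsw-3. Each container in the returned list should have its license deactivated before it is stopped.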
2. Treat infrastructure as code
We strongly recommend you treat infrastructure as code so your environment is reproducible. Consider using tools like Terraform, CloudFormation, Ansible, Chef, Puppet, or OpsWorks for infrastructure and configuration management. If you are new to writing infrastructure as code, start by writing a simple recipe for installing and configuring your R environment.
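As a first recipe, the R environment can be described as data and rendered into a Dockerfile, so the same inputs always produce the same image definition. The sketch below is a hypothetical Python helper, not part of any RStudio product or of the tools listed above. The base image, the CRAN mirror, the /usr/local/bin/startup.sh start command, and the availability of apt-get and Rscript in the base image are all assumptions. It also applies two of the tips later in this article: packages are installed at build time rather than at runtime, and dumb-init is used as the entrypoint.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RRecipe:
    """Hypothetical recipe: everything needed to rebuild the image from scratch."""
    base_image: str                       # pin an exact tag, not "latest"
    r_packages: Tuple[str, ...] = ()
    # Point this at a dated snapshot repository to pin package versions; the
    # CRAN cloud mirror shown here always serves the latest versions.
    cran_repo: str = "https://cloud.r-project.org"
    # Assumed start command of the base image; setting ENTRYPOINT below resets
    # any CMD inherited from the base image, so it has to be restated.
    start_command: Tuple[str, ...] = ("/usr/local/bin/startup.sh",)


def render_dockerfile(recipe: RRecipe) -> str:
    """Render a deterministic Dockerfile (sorted packages) from the recipe."""
    lines = [
        f"FROM {recipe.base_image}",
        # Assumes a Debian/Ubuntu base image where dumb-init is an apt package.
        "RUN apt-get update && apt-get install -y --no-install-recommends dumb-init"
        " && rm -rf /var/lib/apt/lists/*",
    ]
    if recipe.r_packages:
        pkgs = ", ".join(f'"{p}"' for p in sorted(recipe.r_packages))
        # Bake packages in at build time rather than installing them at runtime.
        lines.append(
            f"RUN Rscript -e 'install.packages(c({pkgs}), repos = \"{recipe.cran_repo}\")'"
        )
    # dumb-init runs as PID 1 and reaps orphaned processes on RStudio's behalf.
    lines.append('ENTRYPOINT ["/usr/bin/dumb-init", "--"]')
    cmd = ", ".join(f'"{part}"' for part in recipe.start_command)
    lines.append(f"CMD [{cmd}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(RRecipe(base_image="example/base:1.0", r_packages=("shiny", "dplyr"))))
```

In practice the rendered file would be committed to version control and built by whichever configuration-management or CI tool you already use.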
A note about Launcher
RStudio Server Pro v1.2 and later (now RStudio Workbench) can launch interactive and batch jobs on different backend services such as Kubernetes. These jobs run in Docker images that your organization can modify. See Using Docker images with RStudio Workbench, Launcher, and Kubernetes, or contact RStudio to learn more.
Tips and best practices
- For performance reasons, avoid frequent reads and writes to the container's own filesystem. Instead, bake versions of R, Python, packages, and drivers into the image so they are not installed at runtime.
- Use an init process such as dumb-init so that RStudio does not run as PID 1, because RStudio does not know how to handle orphaned processes.
- Avoid the temptation to bake all server state into a Docker container. The persistent storage described above (the Home Directory for RStudio Workbench and the Data Directory for RStudio Connect and RStudio Package Manager) should persist as containers stop and start.
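A quick pre-flight check can confirm that a container's state directory is mounted from outside the container before it starts. This is a hypothetical Python sketch; the paths below are assumed default locations (/home for Workbench home directories, /var/lib/rstudio-connect and /var/lib/rstudio-pm for the Connect and Package Manager data directories), and your configuration may move them.

```python
from typing import Dict, List

# Assumed default locations of persistent state; your configuration may differ.
PERSISTENT_PATHS: Dict[str, str] = {
    "workbench": "/home",                       # Home Directory
    "connect": "/var/lib/rstudio-connect",      # Data Directory
    "package-manager": "/var/lib/rstudio-pm",   # Data Directory
}


def has_persistent_mount(product: str, volumes: List[str]) -> bool:
    """Hypothetical check: is the product's state path mounted from outside the container?

    `volumes` holds `docker run --volume` specs such as "name:/path" or "name:/path:ro".
    Only an exact match on the container path (ignoring a trailing slash) counts.
    """
    required = PERSISTENT_PATHS[product]  # raises KeyError for an unknown product
    for spec in volumes:
        parts = spec.split(":")
        if len(parts) >= 2 and parts[1].rstrip("/") == required:
            return True
    return False
```

Requiring an exact match is deliberately strict: a mount of a parent directory would also preserve the data, but an explicit mount of the state directory is easier to audit.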