Important! We do not recommend deploying independent containers for individual users. Instead we recommend using the load balancing features of RStudio professional products to balance user sessions and connections.
RStudio professional products (RStudio Server Pro, RStudio Connect, and RStudio Package Manager) are designed to run on Linux servers. In the traditional server model, each product is installed on a server, typically its own. These servers can be scaled out in a cluster to load balance sessions and support high availability.
Today, most customers run RStudio products on virtual machines (VMs); however, it's increasingly common to run RStudio professional products in Docker containers, especially in cloud environments. Below are a few recommendations for running RStudio with containers.
The architectural model for running RStudio with containers is similar to running RStudio with VMs, with containers substituted for VMs. You can think of it as the same model with a different target. For example, the same architecture used with vSphere on VMware could also apply to containers on AWS's ECS. As with VMs, your containers must meet the RStudio professional product requirements, including installing and running RStudio products as root.
1. Use persistent storage to preserve state
In a container model you must still preserve content and metadata with persistent storage that can be attached to your containers. Persistent storage preserves things like code, packages, and project files, as well as application data, metadata, and logs.
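As a rough illustration of attaching persistent storage, the sketch below builds a `docker run` command line that mounts host directories into a container. The image name, container name, and paths are hypothetical placeholders, not values from RStudio documentation; only the `docker run` flags (`--detach`, `--name`, `-v`) are standard Docker options.

```python
from typing import Dict, List


def docker_run_args(image: str, name: str, volumes: Dict[str, str]) -> List[str]:
    """Build a `docker run` argument list that mounts persistent storage.

    `volumes` maps a host path (or named volume) to a path inside the container.
    """
    args = ["docker", "run", "--detach", "--name", name]
    for source, target in volumes.items():
        # -v SOURCE:TARGET keeps the data outside the container's writable
        # layer, so it survives when the container is removed and recreated.
        args += ["-v", f"{source}:{target}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical image name and host path, for illustration only.
    cmd = docker_run_args(
        image="example/rstudio-server-pro:latest",
        name="rsp",
        volumes={"/mnt/rstudio/home": "/home"},
    )
    print(" ".join(cmd))
```

Building the argument list as data rather than a hand-typed string makes it easy to review which directories are persisted before anything is launched.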
2. Use an always up container model
We recommend always keeping your container up and running. Chances are good that users will access content, schedule jobs, or leave sessions running during off hours in order to complete their tasks. If you add or remove containers in a load-balanced environment, we recommend that at least one container remain up at all times. Container orchestration systems such as Kubernetes, Mesos, or Docker Swarm can handle the low-level details of managing containers.
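The "at least one container stays up" rule can be expressed as a floor on whatever scaling logic you use. The following is a minimal sketch under assumed inputs: the function name, the session counts, and the per-container capacity are all hypothetical, and a real orchestrator would apply the equivalent setting through its own configuration.

```python
def desired_containers(active_sessions: int,
                       sessions_per_container: int,
                       max_containers: int) -> int:
    """Return how many containers to run, never fewer than one."""
    if sessions_per_container <= 0 or max_containers < 1:
        raise ValueError("capacity and max_containers must be positive")
    # Ceiling division: containers needed to hold the current sessions.
    needed = -(-max(active_sessions, 0) // sessions_per_container)
    # Clamp to [1, max_containers] so that scheduled jobs and off-hours
    # access still have a running container even when nobody is logged in.
    return max(1, min(needed, max_containers))


if __name__ == "__main__":
    for sessions in (0, 25, 1000):
        print(sessions, "->", desired_containers(sessions, 10, 5))
```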
3. Treat infrastructure as code
We strongly recommend you treat infrastructure as code so your environment is reproducible. Consider using tools like Terraform, CloudFormation, Ansible, Chef, Puppet, or OpsWorks for infrastructure and configuration management. If you are new to writing infrastructure as code, start by writing a simple recipe for installing and configuring your R environment.
A note about Launcher
RStudio Server Pro v1.2 and above is able to launch interactive and batch jobs on different backend services such as Kubernetes. These jobs run on Docker images that can be modified by your organization. See Using Docker images with RStudio Server Pro, Launcher, and Kubernetes, or contact email@example.com to learn more.
Tips and best practices
- Avoid frequent reads/writes to Docker containers for performance reasons. Instead, bake in versions of R, Python, packages, and drivers so they are not installed at runtime.
- Use an init subsystem such as dumb-init so that RStudio is not running as PID 1, because RStudio does not know how to handle orphaned processes.
- Avoid the temptation to bake all server state into a Docker container. The persistent storage described above (the Home Directory for RStudio Server Pro and the Data Directory for RStudio Connect and RStudio Package Manager) should persist as containers stop and start.
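To make the init-subsystem tip more concrete: inside a container, orphaned processes are re-parented to PID 1, and PID 1 is expected to wait on them so they do not linger as zombies. The sketch below shows only that reaping step, using the standard `os.waitpid` call; the function name `reap_children` is illustrative, and this is not how dumb-init itself is implemented.

```python
import os
from typing import List, Tuple


def reap_children() -> List[Tuple[int, int]]:
    """Collect every child process that has already exited, without blocking.

    Returns (pid, exit_code) pairs; a child killed by a signal is reported
    with the negated signal number.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns (0, 0) instead of waiting
            # when children exist but none has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children are still running; nothing to reap right now
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        reaped.append((pid, code))
    return reaped
```

An init process would typically call something like this whenever it receives SIGCHLD, and would also forward signals to the main process. In practice it is simpler to use an existing tool such as dumb-init than to write your own.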