Scaling and Performance Tuning in RStudio Connect


What if my content will be viewed by a large number of people?



TL;DR: Understanding Connect's architecture helps you scale and tune content appropriately. If you're short on time, jump to Guidelines for Parameters.

RStudio Connect can host content intended for a wide audience. The primary concern when scaling content in RStudio Connect is understanding when and how R code is executed on the server. This article covers the options available to ensure that R scales with viewership and makes the best use of the resources on the Connect server.

Not Covered: Other considerations for widely accessed content are high availability and horizontal scaling. Both require content to be hosted by more than one server. This type of scaling is currently not supported in RStudio Connect, but is available in Shiny Server Pro.

Content published on RStudio Connect can broadly fall into two categories:

1. Static Content

Static content is content that does not require an R backend. Examples include PDF documents, plots, and HTML files. HTML files can include interactive elements, such as visualizations created by htmlwidgets, as long as the interactive elements do not require R.

2. Shiny Content

One of the unique aspects of Connect is the ability to host content that uses the Shiny framework. Examples include Shiny applications and R Markdown documents with runtime: shiny specified. Shiny works by connecting an end user's browser (the client) with an R process running on the server. When a user changes an input, the client sends a message to the server and a portion of the R code is re-run. The server sends the result back as output.

For Shiny Content, Connect enables scaling through four parameters that determine the content's scheduler.

Content Scheduler
When a user requests content with Shiny components, RStudio Connect opens a channel between the user and an R process. Now, suppose a second user requests the same content. There are two potential scenarios:

In Scenario A, both users are linked to the same R process. Because R is single-threaded, if user A and user B both change an input and trigger R calculations, their requests are handled sequentially: user B has to wait for user A's calculation to complete, and then for their own, before seeing updated output.

In Scenario B, Connect links each user to their own R process. If user A and user B both change an input, their calculations happen simultaneously.

Why wouldn't Connect always select Scenario B? The answer has to do with memory and initial load time. When two users are connected to the same R process, they share everything that is loaded outside of the server function*.

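To see this, consider when the different pieces of Shiny application code are executed: code outside the server function runs once, when the R process starts, while the server function runs once for every user session.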

This works because Shiny makes use of R's scoping rules (see the Shiny article on scoping rules). In Scenario B, all of the Shiny code has to be re-run in the second process, including loading any globally available data. This means the memory usage is roughly twice what it would be in Scenario A. Additionally, spinning up an R process and executing all of the Shiny code takes time. While the application is more responsive to both users after the web page is loaded, it takes longer for them to connect in the first place.
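As a minimal sketch of this split, the hypothetical app.R below marks which code runs once per R process and which runs once per connected user. The built-in mtcars data set stands in for a large data load, and the input and output names (column, summary) are illustrative assumptions, not anything Connect requires.

```r
library(shiny)

# Runs ONCE per R process, when the process starts.
# Every user connected to this process shares this object.
# (mtcars is a stand-in for an expensive load such as readRDS().)
shared_data <- mtcars

ui <- fluidPage(
  selectInput("column", "Column", choices = names(shared_data)),
  verbatimTextOutput("summary")
)

# Runs ONCE per user session. Anything created in here is
# private to that user and is repeated for every new connection.
server <- function(input, output, session) {
  output$summary <- renderPrint({
    # Re-runs whenever this user changes the input.
    summary(shared_data[[input$column]])
  })
}

# Build the app object; shiny::runApp(app) starts it locally.
app <- shinyApp(ui = ui, server = server)
```

Under Scenario A, two users share one copy of shared_data. Under Scenario B, each of the two R processes loads its own copy before the first user on that process sees anything.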

*R Markdown Documents with runtime: shiny

R Markdown documents are a bit different. In essence, an Rmd with runtime: shiny specified places everything (data loading, UI creation, etc.) inside the server function. The implication is that an Rmd with runtime: shiny will always consume more memory as users connect.

Coming Soon: Prerendered Shiny Documents that alleviate this problem.

The scheduling parameters tell Connect to act somewhere in between Scenario A and Scenario B, balancing app responsiveness against memory consumption and load time. Four parameters specify this behavior.

Guidelines for Parameters

Max processes - Determines the maximum number of R processes that will be created. Max processes x Max connections per process = the maximum number of concurrent connections. Default value is 3.

Pick a value that will support the expected number of concurrent users.
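The capacity formula above can be worked through in a few lines of R. This is only arithmetic on the two settings; the helper name processes_for() and the figure of 150 expected users are hypothetical.

```r
# Defaults quoted in this article.
max_processes <- 3
max_connections_per_process <- 20

# Upper bound on concurrent connections for one piece of content.
capacity <- max_processes * max_connections_per_process
capacity
# [1] 60

# Hypothetical helper: smallest Max processes value that can hold
# `expected_users` concurrent connections at a given per-process limit.
processes_for <- function(expected_users, connections_per_process) {
  ceiling(expected_users / connections_per_process)
}

processes_for(150, max_connections_per_process)
# [1] 8
```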

Min processes - Determines the minimum number of R processes that will always be running. When a user connects to a running R process, only the server function is run. Everything in global.R is already processed. This can speed up the app load time, but incurs the cost of running an R process around the clock with loaded data.

Pick a number close to Max processes if your application loads a large amount of data to be shared by every user. Pick a small number to minimize the amount of memory consumed.

Max connections per process - The maximum number of connections per R process. Default value is 20.

Pick a small number if your application involves heavy computation. Pick a larger number if your application takes a long time to load, but after loading is very responsive to user selections.

Load factor - Determines how aggressively new R processes will be created. A value close to 0 means new processes will be spun up aggressively to try to keep the number of connections per process small. A value close to 1 means the number of connections per process will be close to Max connections per process. Default value is 0.5.

Pick a small number if your application loads quickly but involves expensive computation. Pick a number closer to 1 if your application loads slowly but is fast after loading, or if you want to minimize the amount of memory consumed.
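To build intuition for how Load factor interacts with the other three settings, the toy model below estimates how many R processes would be running for a given number of connected users. It is an assumption made for illustration, not Connect's actual scheduling algorithm: it simply treats Load factor x Max connections per process as a target number of connections per process. The function name and the example values are hypothetical.

```r
# Toy model (NOT Connect's real scheduler): aim for roughly
# load_factor * max_connections connections per process, then
# keep the process count between min_processes and max_processes.
estimate_processes <- function(users, min_processes, max_processes,
                               max_connections, load_factor) {
  target_per_process <- max(1, load_factor * max_connections)
  wanted <- ceiling(users / target_per_process)
  min(max_processes, max(min_processes, wanted))
}

# 30 concurrent users, up to 10 processes, 20 connections each,
# evaluated at three different load factors.
sapply(c(0.1, 0.5, 1.0), function(lf) {
  estimate_processes(users = 30, min_processes = 0, max_processes = 10,
                     max_connections = 20, load_factor = lf)
})
# [1] 10  3  2
```

In this model a low load factor trades memory (ten copies of any globally loaded data) for responsiveness, while a load factor of 1 packs the same users into two processes.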

My application is still slow …

If the deployed application is still slow to respond after tuning these parameters, try refactoring your application code. No setting will make up for inefficient code, so profiling is highly recommended before deploying an app. A good place to start is the Profiler, which shows where code is running slowly. It can also be important to check reactive dependencies - watch this video to learn more.
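As a starting point for profiling, the sketch below uses the profvis package, which powers the RStudio Profiler, on two hypothetical functions that build the same data frame. The function names and the row count are illustrative; in practice you would wrap the slow part of your own application code.

```r
library(profvis)

# Deliberately slow: grows a data frame one row at a time.
slow_build <- function(n) {
  out <- data.frame()
  for (i in seq_len(n)) {
    out <- rbind(out, data.frame(id = i, value = sqrt(i)))
  }
  out
}

# Faster equivalent: build each column as a vector in one step.
fast_build <- function(n) {
  data.frame(id = seq_len(n), value = sqrt(seq_len(n)))
}

# Profile both versions. Printing `p` opens the interactive
# flame graph showing where the time was spent.
p <- profvis({
  slow <- slow_build(3000)
  fast <- fast_build(3000)
})
```

Code that is slow in a profile like this stays slow for every connected user, so fixing it usually pays off more than raising Max processes.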