Best Practices for Python in RStudio Connect

RStudio Connect allows users to deploy Shiny applications, R Markdown notebooks, and Plumber REST APIs that use Python via the reticulate package. This lets data science teams combine the best features and libraries of both languages in a single piece of content.

Adding a new language to a data science project increases the complexity of developing and deploying applications and notebooks: you are now dealing not only with R and its dependencies but also with Python and its dependencies. This added complexity, together with the differences between your local development environment and the RStudio Connect server (a different OS, R version, Python version, and other pieces), can make deploying these applications difficult. Read more about environments.

The recipe and tips described on this page will help reduce the number of moving pieces and make the deployment of reticulated apps easier and less prone to errors and frustration.

Use Python native tools for environments and package management

reticulate comes with some handy functions to install Python packages and manage environments, such as py_install(), conda_create(), virtualenv_create(), and use_python().

These functions are an easy way for R users to get started with reticulate and Python, but we recommend moving to standard Python tooling such as virtualenv and pip once you are more comfortable with Python and ready to deploy a project.

Relying on these functions is likely to cause errors and inconsistencies between development on a local machine and deployment on RStudio Connect. Also, never call these functions inside an R script, because they will be executed at deployment time and will likely make the deployment fail.

Which Python to use

In the same spirit of using standard Python tooling, you should not use the system Python that is included with Linux distributions and macOS. Installing or upgrading libraries in that Python installation can break system functionality that depends on it.

In general, it is better to install a standalone Python from Python.org or Anaconda. This also gives you other advantages, such as managing multiple versions of Python on the same system without conflicts.
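To confirm which interpreter you are actually running, a short check like the one below can help. It is only a sketch: the function name `interpreter_summary` is made up for this example, and looking for a `conda-meta` directory is a heuristic for conda-managed installations, not an official API.

```python
import os
import sys


def interpreter_summary():
    """Describe the Python interpreter that is running this script."""
    # virtualenv/venv environments report a prefix that differs from the base
    # installation; very old virtualenv releases set sys.real_prefix instead.
    in_virtualenv = (
        getattr(sys, "real_prefix", None) is not None
        or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    # Heuristic (assumption): conda installations and environments contain a
    # "conda-meta" directory under their prefix.
    conda_managed = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "in_virtualenv": in_virtualenv,
        "conda_managed": conda_managed,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If the reported executable lives under a system location such as /usr/bin and neither flag is true, you are most likely using the system Python.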

Use a virtualenv in every project

As with software projects, all data science work should be reproducible in order to make deployments easier. In every project you should use packrat for R and virtualenv or conda for Python.

Both virtualenv and conda allow you to pin the Python version that will be used in the environment. conda will automatically install that version for you, while with virtualenv you need to have that version already installed. It's always a good idea to pin the Python version per project, the same way you pin the Python dependencies.

We recommend keeping the virtual environment directory in the root of the project because it's easier to track. For example, to create a virtual environment in the project directory you can do:

# Go to the project directory
cd <PROJECT DIR>

# Using virtualenv, using the Python that is currently on the PATH
virtualenv .venv

# Using virtualenv, using another version of Python explicitly
virtualenv .venv --python=python3.6

# Or using conda
conda create -p .venv python=3.6

You can then add .venv to the ignore file of your version control system, such as .gitignore.

It is also recommended that you have a way to track the dependencies of a Python environment, the most common one being a requirements.txt file. This file will not be used by rsconnect, but it will help you in the future.
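The usual way to produce this file is to run `pip freeze > requirements.txt` inside the activated environment. As an illustration of what such a file contains, the sketch below builds similar `name==version` lines with the standard library's `importlib.metadata` (Python 3.8 or later). It is an approximation, not a replacement for `pip freeze`: it does not handle editable or VCS installs, and `pinned_requirements` is a name invented for this example.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect the output to requirements.txt if you want to keep it.
    print("\n".join(pinned_requirements()))
```

Run it with the interpreter inside .venv so that it lists the packages of the project environment rather than those of another Python installation.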

Be sure to always have numpy installed in the environment as this is a requirement for moving data between R and Python.
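A quick way to verify this before deploying is to ask the environment's interpreter whether numpy can be found. The helper below is a minimal sketch; `has_module` is a hypothetical name, and it only checks that the module is discoverable, not that it imports without errors.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("numpy"):
        print("numpy is available in this environment")
    else:
        print("numpy is missing; install it with: pip install numpy")
```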

The RETICULATE_PYTHON environment variable

Once you have created a virtual environment, you need to tell reticulate which Python to use. The recommended way is to use the RETICULATE_PYTHON environment variable.

This variable is used by the rsconnect package when deploying to RStudio Connect to determine the dependencies of a Python project. The easiest way to set it is on a per-project basis, for example in the .Rprofile of a project:

Sys.setenv(RETICULATE_PYTHON = ".venv/bin/python")

When deploying the app using the publish wizard in RStudio, do not add .Rprofile to the bundle, because RStudio Connect recreates the environment and manages this for you in the deployment environment.
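The path .venv/bin/python applies to Linux and macOS. Environments created by virtualenv on Windows place the interpreter under Scripts\python.exe instead. The sketch below computes the expected path for a virtualenv-style layout so you can check that it exists before pointing RETICULATE_PYTHON at it. The function name `venv_python` is hypothetical, and conda environments on Windows use a different layout that this sketch does not cover.

```python
import os
from pathlib import Path


def venv_python(project_dir, venv_name=".venv"):
    """Return the expected interpreter path inside a project's virtualenv."""
    venv = Path(project_dir) / venv_name
    if os.name == "nt":
        # virtualenv on Windows puts the interpreter under Scripts\
        return venv / "Scripts" / "python.exe"
    return venv / "bin" / "python"


if __name__ == "__main__":
    candidate = venv_python(".")
    status = "found" if candidate.is_file() else "not found"
    print(f"{candidate}: {status}")
```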

Python versions

Using virtualenv and the RETICULATE_PYTHON environment variable allows you to pin the Python version that RStudio Connect will use to recreate the environment. After that, the server administrator just needs to make sure that version exists on the server.

Given the nature of reticulate, there might be some incompatibilities when using it with newer versions of Python. The recommended version for reticulate 1.12 is Python 3.6.
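To catch a version mismatch early, you can compare the running interpreter against the version you intend to pin. The following is a minimal sketch; `matches_pin` is an illustrative name, and the pin "3.6" simply mirrors the recommendation above.

```python
import sys


def matches_pin(pinned, actual=None):
    """Return True if `actual` (major, minor, ...) matches a pin such as '3.6'."""
    actual = sys.version_info if actual is None else actual
    wanted = tuple(int(part) for part in pinned.split("."))
    # Compare only as many components as the pin specifies.
    return tuple(actual[: len(wanted)]) == wanted


if __name__ == "__main__":
    pin = "3.6"
    running = "{}.{}.{}".format(*sys.version_info[:3])
    verdict = "matches" if matches_pin(pin) else "does not match"
    print(f"Python {running} {verdict} the pinned version {pin}")
```

Running this with the interpreter that RETICULATE_PYTHON points to shows whether the local environment uses the version you expect the server to provide.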

See also

  1. FAQ on Using Python with RStudio Connect
  2. Troubleshooting Python with RStudio Connect
