Best Practices for Using Python with RStudio Connect

RStudio Connect allows you to deploy Shiny applications, R Markdown reports and Plumber APIs that use Python via the reticulate package. This allows data science teams to create content that combines the best features and libraries of both R and Python.

The concepts and best practices described on this page will help you work with Python in RStudio and make the deployment of Python content to RStudio Connect less prone to errors and frustration.

 

Reproducible Python and R Environments

Adding a new language to a data science project makes applications and notebooks more complex to develop and deploy, since you need to manage R code and its dependencies as well as Python code and its dependencies.

This increased complexity and the differences between the development environment on your local machine and the RStudio Connect environment can make deployment of these applications difficult. For more information on environments and considerations, refer to the best practices on reproducible environments for data science projects.

 

Use Python native tools for environments and package management

reticulate includes some convenience functions for installing Python packages and managing environments, such as py_install(), conda_create(), virtualenv_create(), and use_python().

These functions are an easy way for R users to get started with reticulate and Python. However, they are likely to cause errors and inconsistencies between development on a local machine and deployment in RStudio Connect. We do not recommend using them inside a project that you deploy to RStudio Connect, because they are executed at deployment time and will likely make the deployment fail.

In general, we recommend migrating to standard Python tooling such as virtualenv and pip when you are more comfortable with Python and are ready to deploy a project to RStudio Connect.

 

Which version of Python to use

Following the same recommendation to use standard Python tooling, you should not use the system Python that ships with operating systems such as macOS or Linux. Installing and upgrading libraries within the system/framework installation of Python can corrupt core system functionality.

In general, we recommend installing a standalone Python installation from Python.org or Anaconda. This gives you other advantages such as managing multiple versions of Python on the same system without package and version conflicts.

 

Use a virtualenv in every project

Like all software projects, data science projects should be reproducible and portable, which makes them easier to deploy with RStudio Connect. In R, you should be using packrat, and in Python you should use virtualenv or conda in every project.

Both virtualenv and conda allow you to pin the Python version that will be used in the environment. With conda, that version is installed for you automatically; with virtualenv, it must already be installed on the system. It's always a good idea to pin the Python version per project, the same way you pin the Python dependencies.

We recommend keeping the virtualenv directory in the root of your R project because it's easier to track. For example, to create a virtual environment in the project directory you can run:

# Go to the project directory
cd <PROJECT DIR>

# Using virtualenv, using the Python that is currently on the PATH
virtualenv .venv

# Using virtualenv, using another version of Python explicitly
virtualenv .venv --python=python3.6

# Or using conda
conda create -p .venv python=3.6

You can then add .venv to your version control system's ignore file, such as .gitignore.

We also recommend capturing the packages in your Python environment in a file, the most common format being requirements.txt.
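The usual way to produce such a file is to run pip freeze inside the activated environment and redirect the output to requirements.txt. As a rough illustration of what ends up in that file, the sketch below lists the installed distributions as pinned name==version lines using only the standard library. It assumes Python 3.8 or later (for importlib.metadata), and the function name pinned_requirements is hypothetical; its output is similar to, but not identical with, what pip freeze prints.

```python
from importlib import metadata  # part of the standard library from Python 3.8


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions.

    Hypothetical helper for illustration; `pip freeze` remains the usual tool.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is incomplete
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect the output to a file if you want to keep it.
    print("\n".join(pinned_requirements()))
```

Run the script with the interpreter inside .venv so that it reports the packages of the project environment rather than those of another Python installation.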

Note that you should always have the numpy Python package installed in your environments because this is a requirement for reticulate to move data between R and Python.
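A quick way to confirm this before deploying is to ask the environment's interpreter whether numpy can be imported. The following is a minimal sketch using the standard library; the helper name has_package is an assumption made for this example and is not part of reticulate or rsconnect.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True when the running interpreter can find the top-level package `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_package("numpy"):
        print("numpy is available in this environment")
    else:
        print("numpy is missing; install it with: python -m pip install numpy")
```

The check only tells you about the interpreter that runs it, so it should be run with the Python inside the project's virtual environment.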

 

The RETICULATE_PYTHON environment variable

Once you have created a virtual environment, you need to point reticulate to the correct Python interpreter. The recommended way is to use the RETICULATE_PYTHON environment variable.

This environment variable is used by the rsconnect package when deploying to RStudio Connect to discover the dependencies of a Python project. The easiest way to set it is on a per-project basis, for example in the .Rprofile of a project:

Sys.setenv(RETICULATE_PYTHON = ".venv/bin/python")

When deploying the app with the publish wizard in RStudio, do not add .Rprofile to the bundle, because RStudio Connect recreates the environment and manages this setting for you on the deployment environment.
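To check that the interpreter you point RETICULATE_PYTHON at really belongs to a virtual environment, you can run a small script with it. The sketch below is one possible check, not something reticulate provides: the function name in_virtual_environment is hypothetical, and the test only recognizes environments created with virtualenv or venv. A conda environment is not detected this way.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases leave sys.base_prefix pointing at the base installation.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtual_environment())
```

For example, saving this as check_env.py (a hypothetical file name) and running .venv/bin/python check_env.py from the project root prints the interpreter path, which you can compare with the value given to RETICULATE_PYTHON.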

 

Python versions

Using virtualenv and the RETICULATE_PYTHON environment variable allows you to pin the Python version that RStudio Connect will use to recreate the environment. After that, the administrator just needs to make sure that the correct versions of Python are installed on the server. Refer to the support article on Configuring Python with RStudio Server Pro and RStudio Connect for more information.

Given the nature of reticulate, there might be some incompatibilities when using it with newer versions of Python. The recommended version for reticulate 1.12 is Python 3.6.
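If you want to catch a version mismatch early, a check along the following lines can be run with the project's interpreter. It is only a sketch: the constant PINNED_VERSION and the function matches_pin are names invented for this example, and the pinned value of 3.6 simply mirrors the recommendation above, so change it to match your own pin.

```python
import sys

# Assumption: the project pins Python 3.6, the version recommended above for
# reticulate 1.12. Adjust this tuple to the version your project pins.
PINNED_VERSION = (3, 6)


def matches_pin(version_info=sys.version_info, pinned=PINNED_VERSION) -> bool:
    """Return True when the major and minor version equal the pinned version."""
    return tuple(version_info[:2]) == tuple(pinned)


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    pinned = ".".join(str(part) for part in PINNED_VERSION)
    if matches_pin():
        print(f"Python {running} matches the pinned version {pinned}")
    else:
        print(f"Python {running} does not match the pinned version {pinned}")
```

Only the major and minor version are compared, because patch releases of the same minor version are normally interchangeable for this purpose.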

See also

  1. FAQ on Using Python with RStudio Connect
  2. Troubleshooting Python with RStudio Connect
