Understanding the RStudio Product Databases


RStudio Connect and RStudio Package Manager rely on databases to store metadata. Out of the box, both products come with a SQLite database. If you are running either product on a single server, you don't need anything else.



However, if you are running RStudio Connect or RStudio Package Manager on multiple servers, you will need to provide an external Postgres installation. The individual products will manage all of the data and tables inside the Postgres installation.



Common Questions:

Can I use an existing Postgres installation for RStudio Connect or Package Manager?

Yes! The RStudio product requires read/write access to a dedicated Postgres schema, but the schema can live in a Postgres installation that houses other schemas as well.

Can I use a different database provider like Oracle, MySQL, or SQL Server?

No, not at this time.

Do I need a dedicated DBA for the RStudio Connect or Package Manager database(s)?

No. The product manages all of the data inside the database, including data permissions. A DBA can assist with the initial setup and potentially with data backups, but consider this database an application requirement, not a part of your data organization.

What is stored in the database?

The RStudio Connect and RStudio Package Manager databases store metadata about content, users, packages, and settings. The databases also store metrics, including content and package usage.

In particular, the RStudio Connect database does not store the data used by the applications or reports hosted on RStudio Connect. For example, if you have a dashboard that shows sales forecasting data, the application code retrieves that data from your company data warehouse. The RStudio Connect database would not contain any sales data.

How big are the databases?

The size of each database will depend on the amount of content and activity on the server. A good rule of thumb is to start with 1 GB of storage for a Postgres installation or 1 GB of disk space for the SQLite database (located at /var/lib/rstudio-connect/db or /var/lib/rstudio-pm/db by default).
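
To see how a single-server installation compares with the 1 GB rule of thumb, you can total the files under the SQLite directory. The following is a minimal sketch, assuming the default locations listed above are directories. The function names and the 1 GB default budget are illustrative and are not part of either product.

```python
import os


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def exceeds_budget(path, budget_bytes=1024 ** 3):
    """Return True if the files under path use more than budget_bytes (1 GB by default)."""
    return directory_size_bytes(path) > budget_bytes


# Example call (the path is the default from this article; adjust if yours differs):
# exceeds_budget("/var/lib/rstudio-connect/db")
```

A directory that does not exist reports a size of 0, so check that the path matches your installation before relying on the result.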


Can I migrate from a single-node server using SQLite to a multi-node configuration with a Postgres installation?

RStudio Connect supports database migrations; see the admin guide chapter on migrations. RStudio Package Manager does not support database migrations at this time.

How should I handle data backups for the product databases?

SQLite: RStudio Connect has built-in support for backing up the SQLite database while the RStudio Connect service is running; see the admin guide. For RStudio Package Manager, stop the service and make a copy of the SQLite database.
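
The stop-and-copy approach for RStudio Package Manager could be scripted roughly as follows. This is a sketch under several assumptions: the service is managed by systemd under the name rstudio-pm, the SQLite database lives in the default /var/lib/rstudio-pm/db directory, and /backups/rstudio-pm is a hypothetical backup location. The function names are illustrative only.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def copy_database(db_dir, backup_root, timestamp=None):
    """Copy the database directory into a new timestamped folder and return its path."""
    stamp = timestamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / f"db-{stamp}"
    # copytree creates the destination and fails if it already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(db_dir, destination)
    return destination


def backup_with_service_stopped(service, db_dir, backup_root):
    """Stop the service, copy the database, then start the service again."""
    subprocess.run(["systemctl", "stop", service], check=True)
    try:
        return copy_database(db_dir, backup_root)
    finally:
        # Restart the service even if the copy fails.
        subprocess.run(["systemctl", "start", service], check=True)


# Example call (service name and backup path are assumptions; run with
# sufficient privileges to control the service):
# backup_with_service_stopped("rstudio-pm", "/var/lib/rstudio-pm/db", "/backups/rstudio-pm")
```

Keeping the copy step in its own function makes it possible to try it against a scratch directory before pointing it at a real installation.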

Postgres: Postgres has native support for backups. For example, a cron job can be set to use the pg_dump command to create backups on a schedule.
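
As an illustration of the scheduled Postgres backup, the script below assembles and runs a pg_dump command for a single schema. It is a sketch, not a recommended configuration: the database name, schema name, user, and output path are hypothetical placeholders, and credentials are assumed to come from a .pgpass file or the PGPASSWORD environment variable.

```python
import subprocess
from datetime import datetime


def build_pg_dump_command(dbname, schema, output_file,
                          host="localhost", port=5432, username=None):
    """Build the pg_dump argument list for dumping one schema in custom format."""
    command = [
        "pg_dump",
        "--host", host,
        "--port", str(port),
        "--format", "custom",   # compressed archive that pg_restore can read
        "--schema", schema,     # dump only the product's dedicated schema
        "--file", output_file,
    ]
    if username:
        command += ["--username", username]
    command.append(dbname)
    return command


def run_backup(dbname, schema, backup_dir, **connection):
    """Run pg_dump and return the path of the dump file that was written."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    output_file = f"{backup_dir}/{schema}-{stamp}.dump"
    subprocess.run(
        build_pg_dump_command(dbname, schema, output_file, **connection),
        check=True,  # raise if pg_dump exits with an error
    )
    return output_file


# Example call with hypothetical names:
# run_backup("rstudio", "connect", "/backups/postgres", host="db.example.com", username="rsc")
#
# A crontab entry such as the following would run a script like this nightly
# at 02:00 (the script path is hypothetical):
# 0 2 * * * /usr/bin/python3 /opt/backups/pg_backup.py
```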

RStudio Connect and Package Manager also rely on disk storage. The database and disk storage should be kept in sync and backed up together.

What about the file storage requirements?

In addition to the database, RStudio Connect requires file storage that includes the source code deployed to RStudio Connect, rendered reports, log files, and R packages. The admin guide outlines the on-disk storage requirements.

RStudio Package Manager also relies on storage, and administrators can choose between shared files and S3. For more details, refer to the admin guide.


For a comprehensive overview, please see the admin guide chapter on databases.