
Importing Data with RStudio


This feature is currently only available in RStudio Preview, 0.99.1130 or higher.

Introduction

Importing data into R is a necessary step that can, at times, be time-consuming. To ease this task, RStudio includes new features to import data from CSV, Excel (xls, xlsx), SPSS (sav, por), SAS and Stata (dta) files.

Importing data

The importers are grouped into three categories: delimited data, Excel data and statistical data. To access them, use the "Import Dataset" dropdown in the "Environment" pane, or open the "Tools" menu and choose the "Import Dataset" submenu.

Importing data from CSV files

The CSV importer lets you:

  • Import from the file system or a URL
  • Change column data types
  • Skip columns or include only selected columns
  • Rename the data set
  • Skip the first N rows
  • Use the header row for column names
  • Trim spaces in names
  • Change the column delimiter
  • Select the encoding
  • Select quote, escape, comment and NA identifiers
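Behind the dialog, the CSV importer builds a call to the readr package, and its "Code Preview" shows the exact code it will run. As a rough sketch, assuming readr is installed, the options listed above correspond to read_delim() arguments along these lines; the sample file, its columns and the object name scores are made up for illustration.

```r
library(readr)

# Hypothetical input: one preamble line, a ';'-delimited header, padded fields
path <- tempfile(fileext = ".csv")
writeLines(c(
  "exported from a hypothetical survey tool",
  "id;name;score;notes",
  "1;  Ana  ;9.5;first",
  "2;Bo;-;second"
), path)

scores <- read_delim(                    # the object name renames the data set
  path,                                  # a local path (readr also accepts URLs)
  delim = ";",                           # change the column delimiter
  skip = 1,                              # skip the first N rows
  col_names = TRUE,                      # use the header row for column names
  trim_ws = TRUE,                        # trim surrounding spaces in fields
  quote = "\"",                          # quote identifier
  escape_double = TRUE,                  # escape: "" inside quotes means "
  comment = "",                          # comment identifier (none in this file)
  na = c("", "NA", "-"),                 # NA identifiers
  locale = locale(encoding = "UTF-8"),   # encoding selection
  col_types = cols(
    id    = col_integer(),               # change column data types
    name  = col_character(),
    score = col_double(),
    notes = col_skip()                   # skip a column
  )
)
scores
```

The "-" in the second row becomes NA because it is listed in na, and the notes column is dropped entirely by col_skip().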

For example, you can easily import a CSV file from data.gov by pasting the URL https://data.montgomerycountymd.gov/api/views/6rqk-pdub/rows.csv?accessType=DOWNLOAD and selecting "Import".

Importing data from Excel files

The Excel importer lets you:

  • Import from the file system or a URL
  • Change column data types
  • Skip columns
  • Rename the data set
  • Select a specific Excel sheet
  • Skip the first N rows
  • Select NA identifiers
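The Excel options above map naturally onto the readxl package's read_excel() arguments. The following is a minimal sketch, assuming readxl is installed; import_excel is a hypothetical helper, not part of RStudio or readxl. Since read_excel() reads local files, the sketch downloads URLs to a temporary file first.

```r
# Hypothetical helper: each argument mirrors one option of the Excel importer.
# readxl is only needed when the function is called.
import_excel <- function(path,             # file system path or URL
                         sheet = 1,        # a specific sheet, by position or name
                         skip = 0,         # skip the first N rows
                         na = "",          # NA identifiers
                         col_names = TRUE, # FALSE when the first row is not a header
                         col_types = NULL  # NULL guesses; "skip" drops a column
                         ) {
  if (grepl("^(https?|ftp)://", path)) {
    # read_excel() expects a local file path, so download URLs to a temp file,
    # keeping the original extension (query string removed) for format detection
    ext <- tools::file_ext(sub("\\?.*$", "", path))
    local_copy <- tempfile(fileext = paste0(".", ext))
    download.file(path, local_copy, mode = "wb")  # binary mode for Excel files
    path <- local_copy
  }
  readxl::read_excel(path, sheet = sheet, skip = skip, na = na,
                     col_names = col_names, col_types = col_types)
}

# Example call (not run here; needs network access and readxl):
# slsummar <- import_excel("http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls",
#                          skip = 6, col_names = FALSE)
```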

For example, you can easily import an xls file from data.gov by pasting the URL http://www.fns.usda.gov/sites/default/files/pd/slsummar.xls and selecting "Import".

Notice that this file contains two tables and therefore requires the first few rows to be removed.

We can clean this up by skipping 6 rows from this file and unchecking the "First Row as Names" checkbox.

 

The file looks better, but some columns are displayed as strings even though they clearly contain numeric data. We can fix this by selecting "numeric" from each of those columns' dropdowns.
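In the generated code, choosing "numeric" changes the column types passed to the reader. If a table has already been imported with text columns, the same fix can be applied afterwards in base R. A minimal sketch; the table, its columns and the helper to_numeric are hypothetical:

```r
# Hypothetical table whose numeric columns arrived as text
counts <- data.frame(
  region = c("north", "south"),
  x      = c("1200", "350"),
  y      = c("800", "n/a"),
  stringsAsFactors = FALSE
)

# Convert the named character columns to numeric.
# Values as.numeric() cannot parse (such as "n/a") become NA, with a warning.
to_numeric <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.numeric)
  df
}

counts <- to_numeric(counts, c("x", "y"))
str(counts)
```

Leaving the coercion warning visible is deliberate: it flags cells that were not clean numbers and may need their own NA identifier.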

The final step is to click "Import", which runs the code shown under "Code Preview" and imports the data into RStudio.

Importing data from SPSS, SAS and Stata files

The SPSS, SAS and Stata importer lets you:

  • Import from the file system or a URL
  • Rename the data set
  • Specify a model file
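The statistical importers rely on the haven package, which has one reader per format: read_sav(), read_por(), read_dta() and read_sas(). The sketch below picks a reader from the file extension. The helpers haven_reader and import_stat_file are hypothetical, and mapping the "model file" option to read_sas()'s catalog_file argument (a .sas7bcat file of value labels) is an assumption.

```r
# Pick a haven reader from the file extension (assumption: extensions are accurate)
haven_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    sav      = "read_sav",   # SPSS
    por      = "read_por",   # SPSS portable
    dta      = "read_dta",   # Stata
    sas7bdat = "read_sas",   # SAS
    stop("unsupported file extension: ", ext)
  )
}

# Hypothetical importer; haven is only needed when this is called.
import_stat_file <- function(path, model_file = NULL) {
  reader_name <- haven_reader(path)
  reader <- getExportedValue("haven", reader_name)
  if (reader_name == "read_sas" && !is.null(model_file)) {
    # Assumption: the model file is a SAS catalog holding value labels
    reader(path, catalog_file = model_file)
  } else {
    reader(path)
  }
}
```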

 


Comments

  • attoumand

    I'm using RStudio version 0.99.903, but I don't see the new features to import data from csv, xls, xlsx, sav, dta, por, sas and stata files. I see only two options (Local File and Web URL).
    My OS is Windows 10 32-bit.
    Can someone please help me?

  • Javier Luraschi

    0.99.903 does not yet contain this functionality; try installing a newer preview from here instead: https://www.rstudio.com/products/rstudio/download/preview/

  • mspinola10

    I am working with RStudio Preview 1.0.12 (Windows 10, 64-bit).

    I am trying to read a txt file, but when I want to change one of my columns from character to factor, it asks me to "Please enter the format string".
    What is that, and why is it asking me for it?

  • Javier Luraschi

    Thanks for the feedback. We are planning to improve this by asking for a comma-separated list of factor levels.

    In the meantime, you can specify the factor levels as an R vector, for example: c("factor1", "factor2", "factor3")
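The "format string" is R code that evaluates to a character vector of levels; readr applies such a vector through col_factor(levels = ...) while reading. A base R sketch of what the levels do, using a hypothetical column of survey answers:

```r
# Hypothetical column read as character
answers <- c("agree", "neutral", "disagree", "agree", "unsure")

# The levels vector fixes which values are allowed and their order
lvls <- c("agree", "neutral", "disagree")
answers_f <- factor(answers, levels = lvls)

levels(answers_f)   # "agree" "neutral" "disagree"
table(answers_f)    # counts per level, in the order given
is.na(answers_f)    # "unsure" is not a level, so it becomes NA
```

In base R the unlisted value turns into NA silently; readr instead reports it as a parsing problem, which makes typos in the levels easier to spot.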

  • matjung

    I believe this function is available in rstudio-1.0.136 (CentOS 7, 64-bit),
    but I get this message:
    Preparing data import requires an updated version of the readr package.

    Updating the readr package fails.
    Based on the error messages, readr depends on curl.
    For whatever reason, RStudio does not find libcurl:
    No package 'libcurl' found
    Package libcurl was not found in the pkg-config search path.
    Perhaps you should add the directory containing `libcurl.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'libcurl' found
    How can I fix that?

  • Camilla L. Nesbo

    I recently upgraded my RStudio and am now having issues with setNames.
    I used to use
    FileT = setNames(data.frame(t(File[,-1])), File[,1])
    To put the column names in the File to be the row names in the transposed FileT.

    Now it just puts all the names into the first cell of the data frame....
    Anyone know what I can do to fix this?

  • Javier Luraschi

    @Matjung: See https://github.com/jeroen/curl; you probably want to install the libcurl development files with `sudo yum install libcurl-devel` on CentOS 7.

  • Javier Luraschi

    @Camilla: I'm not aware of any changes in setNames. I would suggest opening a new question in our support forum so that some of my colleagues can help you out.

  • Robert Scott

    The Import Dataset dropdown is a potentially very convenient feature, but would be much more useful if it gave the option to read csv files etc. as proper data frames. Currently it imports files as one of these *@!^* "tibble" things, which screws up a lot of legacy code and even some base R functions, often creating a debugging nightmare. It is particularly insidious since tibbles appear the same as data frames in the environment pane, and this support article does not even mention that the data is imported as a tibble. I am sure that Camilla is not the first, and will not be the last to be tripped up by this. Of course it is always possible to convert the tibble to a data frame after import, but that rather destroys the convenience of this feature. Would it be possible at least to give an option to import data as a data frame? You could still make tibbles the default, but at least people would be aware what class it is.

    @Camilla: This is the reason for your problem. Like most people, you were probably not aware that when imported using this feature, your "File" is not a data frame, hence [, 1] indexing does not work properly i.e. it returns another "tibble" instead of a vector.
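Code that has to work with both classes can sidestep the difference Robert describes: `[[` extracts a column as a plain vector from either a data frame or a tibble, and as.data.frame() turns a tibble back into a plain data frame. A base R sketch of Camilla's transpose with a hypothetical File table:

```r
# Hypothetical table: first column holds labels, the others hold values
File <- data.frame(
  id = c("a", "b"),
  x  = c(1, 2),
  y  = c(3, 4),
  stringsAsFactors = FALSE
)

# If File was imported as a tibble, this converts it to a plain data frame;
# for a data frame it changes nothing.
File <- as.data.frame(File)

# File[[1]] is a plain vector for data frames and tibbles alike,
# whereas File[, 1] on a tibble is still a one-column tibble.
FileT <- setNames(data.frame(t(File[, -1])), File[[1]])
FileT
```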

  • Javier Luraschi

    @Robert, we've added back the option to import from CSV using base functions. It is currently available on the daily builds under the "From Text (base)..." drop-down option. Would this help?

  • Robert Scott

    I have had a quick try with this, and it works fine. N.B. I have not extensively tested all the options, but if this is simply re-implementing what existed before, that should not be necessary. Many thanks for the quick response.

  • Javier Luraschi

    @Robert Yes, the entry "From Text (base)..." launches exactly the same components from previous versions, so we are confident it works the same way it used to.

  • vidyasagar

    How do we perform descriptive statistics and all other statistical analysis on data imported from Excel?

  • Hassan Alamdari

    I have set up the Excel importer and changed the column type to numeric, but it keeps importing as character. Any ideas on what to try to fix this problem?

  • Javier Luraschi

    @vidyasagar the question seems too generic to be answered in a comment; is there a more specific question or issue you can share?

    @Hassan Alamdari, I can't reproduce this issue, could you share which version of readxl you are using and a few rows/cols of data to reproduce this issue?

  • Jesse Spencer-Smith

    It would be tremendously helpful to be able to choose the pipe character "|" as a delimiter when importing .csv files.
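In code, a pipe-delimited file can already be read without the dialog. A base R sketch with hypothetical inline data (with a real file, pass its path as `file` instead of `text`); readr's read_delim(file, delim = "|") is the readr counterpart.

```r
# Hypothetical pipe-delimited content
txt <- "id|city|population
1|Springfield|30720
2|Shelbyville|12500"

towns <- read.table(
  text = txt,
  sep = "|",                # pipe as the field delimiter
  header = TRUE,            # first row holds the column names
  stringsAsFactors = FALSE  # keep text columns as character
)
towns
```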