The second post in my series on R! Last week I covered some of the basics in R. This week, I am going to cover what I think is the natural place to start: how to import data.
There are lots of different ways to import data into R. Actually, there are lots of different ways to do pretty much everything in R! I learned this quickly once I started using R.
I decided to focus on how to do something really well using one set of packages. This saved me from trying to learn all the different ways to achieve the same task.
This led me to focus on the tidyverse. The tidyverse is a collection of R packages that covers everything from importing data to transforming and visualizing it.
The two tidyverse packages that I use the most for importing data are readr and readxl.
The first step is to install the packages. If you haven’t done so, you can install all the tidyverse packages for this week and all subsequent posts by calling the install.packages() function on tidyverse.
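As a minimal sketch, the install step could look like the following. The requireNamespace() check is my own addition so the download is skipped if the tidyverse is already installed. The code assumes an interactive R session with an internet connection.

```r
# Install the tidyverse collection only if it is not already available.
# readr and readxl are both installed as part of it.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
```

Installing only needs to happen once per machine, whereas loading has to be repeated in every new R session.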
The next step is to load the two relevant packages. This can be done by calling the library() function once for each package, with the package name as the argument.
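Loading the two packages might look like this, assuming they were installed in the previous step:

```r
# Attach the two import packages for the current session
library(readr)   # read_csv() and friends for delimited text files
library(readxl)  # read_excel() for .xls and .xlsx files
```

Note that readxl is installed with the tidyverse but is not attached by library(tidyverse), so it needs its own library() call.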
Now we can import .csv data to the “data_csv” object using the read_csv() function from the readr package. The only argument that is required is the file path.
We can specify the function explicitly by first naming the package, followed by two colons (::) and then the function. This ensures that we are using the function from the package that we expect, even if another loaded package has a function with the same name.
data_csv <- readr::read_csv("H:/Feathers Analytics/R_import_data.csv")
To have a look at the “data_csv” object that we just created, we can apply the print() function. This prints the object. In this case, it is a tibble, which is the tidyverse's version of a data frame.
We can also examine the structure of an object by using the str() function. This is particularly useful for understanding a data set, because it lists each column's name and type along with its first few values.
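Here is a small sketch of print() and str() in action. So that it runs anywhere, it writes a tiny CSV to a temporary file first. The file contents and the column names (id, name, score) are made up for illustration and are not the data from my own file.

```r
library(readr)

# Hypothetical stand-in data, written to a temporary file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90.5", "2,Bob,82"), tmp_csv)

# Import the file; readr guesses the column types from the contents
data_csv <- read_csv(tmp_csv)

print(data_csv)  # shows the rows, with the column types under the header
str(data_csv)    # shows the structure: each column's name, type, and first values
```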
We can also import Excel data to the “data_xl” object using the read_excel() function from readxl. Again, the only required argument is the file path, but there are other arguments that can be passed as well.
To find the documentation on a specific function or package, we can write a question mark before the name of the function.
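For example, the help pages for read_excel() and for the readxl package as a whole could be opened like this (a sketch that assumes readxl is installed):

```r
library(readxl)

?read_excel               # help page for a function from an attached package
?readxl::read_excel       # same page, works even if readxl is not attached
help(package = "readxl")  # index of everything documented in the package
```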
data_xl <- read_excel("H:/Feathers Analytics/R_import_data.xlsx")
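To give a feel for the optional arguments, here is a sketch using sheet and n_max. It reads the example workbook that ships with readxl (located with readxl_example()) rather than my own file, and the object names are just placeholders.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
excel_sheets(example_path)

# Read only the first 5 data rows of the "mtcars" sheet
cars_xl <- read_excel(example_path, sheet = "mtcars", n_max = 5)
cars_xl
```

If sheet is left out, read_excel() reads the first sheet in the workbook.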
An object can also be printed by just calling the name of the object. The print() function is not explicitly required.
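A quick sketch of the difference, using a small made-up data frame rather than the imported data:

```r
# Small hypothetical data frame standing in for an imported data set
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77))

scores         # typing the name at the console prints the object
print(scores)  # explicit call, same output

# Inside a for loop nothing is printed automatically, so print() is needed
for (i in 1) {
  scores         # prints nothing
  print(scores)  # prints the data frame
}
```

So the shortcut works for exploring at the console, but print() is still the way to go inside loops.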
That’s my introduction to importing data into R! I mainly use the readr and readxl packages as I am typically dealing with data from Excel. In fact, I usually connect to data from other sources using Power Query in Excel and perform some manipulation there before loading it into R.
This is mostly because I am more comfortable in that environment from using it so frequently in Power BI. However, next week I will cover some of the main data transformation verbs in the dplyr package!