Importing Data in R

Wobbly Geek

Importing data is the process of loading and reading data into R from various sources.

There are five common types of data to import:

  • Flat files: text-based files, such as CSV or TSV files, where data is organized in rows and columns.

  • Data from Excel

  • Databases

  • Web

  • Statistical software

read.csv()

Part of base R. It is designed for reading comma-separated values (CSV) files, so it uses a comma as the column separator by default.

data <- read.csv("data.csv", stringsAsFactors = TRUE)

stringsAsFactors = TRUE converts character columns to factors. Since R 4.0.0 the default is FALSE, so character columns stay as character vectors unless you ask otherwise.
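To see the difference without a file on disk, the sketch below passes inline CSV text through the text argument, which read.csv() forwards to read.table(). The column names and values are made up for illustration, and the default behaviour shown assumes R 4.0.0 or later:

```r
# Small inline CSV used in place of data.csv (illustrative values)
csv_text <- "name,group\nAlice,a\nBob,b\nCharlie,a"

# Default since R 4.0.0: character columns stay character
df_chr <- read.csv(text = csv_text)
str(df_chr$group)

# Opt in to factors: each distinct string becomes a factor level
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(df_fct$group)
levels(df_fct$group)
```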

read.csv2():

Reads European-style CSV files, where a semicolon is the field separator and a comma is the decimal point.
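A minimal sketch with made-up prices, again using the text argument in place of a file, shows read.csv2() splitting on semicolons and turning 1,50 into the number 1.5:

```r
# European-style CSV: semicolons between fields, commas as decimal marks
eu_text <- "item;price\npen;1,50\nbook;12,99"

# read.csv2() defaults to sep = ";" and dec = ","
prices <- read.csv2(text = eu_text)

# price is parsed as a numeric column
str(prices)
sum(prices$price)
```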

read_csv():

Part of the readr package. It is similar to read.csv(), but it returns a tibble instead of a data frame.

It expects commas between fields and a period as the decimal mark; files with other delimiters are handled by its siblings read_csv2(), read_tsv(), and read_delim().

Example table

Name,Age,Salary
Alice,28,55,000
Bob,32,60,000
Charlie,22,48,000
David,35,72,000

If we read this file with read_csv(), the output is:

# A tibble: 4 x 3
  Name     Age Salary
  <chr>  <dbl>  <dbl>
1 Alice      28     55
2 Bob        32     60
3 Charlie    22     48
4 David      35     72

read_csv2():

Provided in the readr package. It assumes that semicolons separate the fields and that a comma marks the decimal point in numeric values, so 3,14 is read as 3.14.

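The output above is wrong. The salaries contain unquoted commas, so each data row has four fields while the header has three, and read_csv() keeps only the part before the comma (55 instead of 55000).

read_csv2() does not fix this on its own: it expects semicolons between fields, so a comma-delimited file is not split into columns at all. The reliable fix is to quote the amounts (for example Alice,28,"55,000") or to save the file in the European style that read_csv2() expects. Correctly parsed, the table looks like this: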

# A tibble: 4 x 3
  Name     Age Salary
  <chr>  <int>  <dbl>
1 Alice     28  55000
2 Bob       32  60000
3 Charlie   22  48000
4 David     35  72000
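A quick way to catch this problem in base R is count.fields(), which reports how many fields each line has. The sketch below rebuilds the first rows of the example inline and then applies the quoting fix; stripping the thousands separator with gsub() is one possible clean-up step, not the only one:

```r
# First rows of the example file, written inline (salaries contain unquoted commas)
bad_lines <- c("Name,Age,Salary",
               "Alice,28,55,000",
               "Bob,32,60,000")

# Count comma-separated fields per line: the header has 3, each data row has 4
con <- textConnection(bad_lines)
n_fields <- count.fields(con, sep = ",")
close(con)
n_fields

# Fix: quote the amounts so each salary stays in a single field
good_lines <- c("Name,Age,Salary",
                "Alice,28,\"55,000\"",
                "Bob,32,\"60,000\"")
salaries <- read.csv(text = good_lines)

# The quoted amounts arrive as text; drop the thousands separator and convert
salaries$Salary <- as.numeric(gsub(",", "", salaries$Salary))
salaries
```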

read.delim()

Defaults to the tab character as the separator between values and the period as the decimal character.

It is typically used for .txt or .tab files. Its signature is:

read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", ...)

It can also be used simply as:

# Read the tab-delimited file into a data frame
data <- read.delim("data.txt", header = TRUE)

# View the contents of the data frame
print(data)
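The same call can be tried without a data.txt file by passing tab-separated text through the text argument; the columns here are hypothetical:

```r
# Tab-separated text in place of data.txt (illustrative values)
tsv_text <- "id\tscore\n1\t3.5\n2\t4.0"

# header = TRUE and sep = "\t" are already read.delim()'s defaults
scores <- read.delim(text = tsv_text)
print(scores)
```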

read.table()

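A more general function that can read data from a variety of tabular text formats, including whitespace-separated files and files with custom delimiters (fixed-width files are read with the related read.fwf()). Because its defaults are header = FALSE and whitespace as the separator, you usually need to specify the header, the delimiter, and other parameters explicitly.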

mydata <- read.table("c:/mydata.csv", header=TRUE,
  sep=",", row.names="id")
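As a sketch of what that call does, here it is on inline data standing in for c:/mydata.csv; the id, height, and weight columns are invented for illustration:

```r
# Inline stand-in for c:/mydata.csv (hypothetical columns)
table_lines <- c("id,height,weight",
                 "p1,170,65",
                 "p2,182,80")

# read.table() defaults to header = FALSE and whitespace separators,
# so both are set explicitly; row.names = "id" turns that column into row names
mydata <- read.table(text = table_lines, header = TRUE, sep = ",", row.names = "id")

# Rows can now be looked up by id
mydata["p2", "weight"]
```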

read_tsv()

Part of the readr package.

read_tsv() reads tab-delimited text files into a tibble.

read_delim() is a more general function that reads files with any delimiter, given by the delim argument.

read_tsv(file, col_names = TRUE, col_types = NULL, skip = 0, n_max = Inf)
read_delim(file, delim = "/")
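For comparison, a base R sketch of reading slash-delimited data, as read_delim(file, delim = "/") would, uses read.table() with sep = "/". The data is made up, and base R returns a data frame where readr returns a tibble:

```r
# Slash-delimited text (illustrative values)
slash_text <- "fruit/colour/count\napple/red/3\nlime/green/5"

# sep = "/" plays the role of readr's delim argument
fruits <- read.table(text = slash_text, header = TRUE, sep = "/")
str(fruits)
```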

read_xlsx()

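Reads Excel .xlsx files. It is part of the readxl package, alongside read_xls() for older .xls files and read_excel(), which detects the file type automatically.

excel_sheets() lists the names of the sheets in an Excel workbook, which is useful before importing data with read_excel().

The separate xlsx package (which requires Java) provides read.xlsx() instead; its second argument is the sheet index:

library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)

read_excel() from the readxl package imports the data: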

read_excel(path, sheet = 1, range = NULL, col_names = TRUE, col_types = NULL, ...)

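From SPSS

The haven package's read_sav() reads .sav files directly. An older route is to export the data from SPSS as a portable (.por) file and read that with Hmisc. In SPSS:

get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.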

# in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors

fread()

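Part of the data.table package.

Similar to read.table(), but it infers column types and separators automatically.

It is extremely fast, especially on large files.

# data.table = FALSE returns a plain data.frame instead of a data.table
fread(input, sep = "auto", header = "auto", data.table = FALSE, ...)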
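
| Data Format | Package | Import Function | Saving Function |
|---|---|---|---|
| CSV | readr | read_csv() | write_csv() |
| TSV | readr | read_tsv() | write_tsv() |
| Delimited Text | readr | read_delim() | write_delim() |
| Excel | readxl (import), writexl (saving) | read_excel() | write_xlsx() |
| Table (whitespace-separated) | readr | read_table() | Not applicable (base R has write.table()) |
| SAS | haven | read_sas() | write_xpt() (SAS transport file) |
| SPSS | haven | read_sav() | write_sav() |
| Feather | arrow | read_feather() | write_feather() |
| Parquet | arrow | read_parquet() | write_parquet() |
| Arrow IPC stream | arrow | read_ipc_stream() | write_ipc_stream() |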
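The table pairs each import function with a saving function. The same round trip can be sketched in base R with write.csv() and read.csv() and a temporary file; the data frame is illustrative:

```r
# Small data frame to save (illustrative values)
original <- data.frame(name = c("Alice", "Bob"), age = c(28L, 32L))

# Write to a temporary CSV file; row.names = FALSE avoids an extra index column
path <- tempfile(fileext = ".csv")
write.csv(original, path, row.names = FALSE)

# Read it back and compare with the original
restored <- read.csv(path)
all.equal(original, restored)

unlink(path)  # remove the temporary file
```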

Import from Web

# Example: Reading data from a web URL
url <- "https://example.com/data.csv"
data <- read.table(url, header = TRUE, sep = ",")

Import API Data

library(httr)
library(jsonlite)

# Make an API request, stop on HTTP errors, and parse the JSON response
response <- GET("https://api.example.com/data")
stop_for_status(response)
data <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

The GET() function from httr is used to make HTTP GET requests, and fromJSON() from jsonlite is used to parse JSON responses.

Import JSON Data

library(jsonlite)

# Read JSON data from a file
data <- fromJSON("data.json")

Further Reading

https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf

Written by

Wobbly Geek

Hello, I'm passionate about transforming raw data into actionable insights, driven by a lifelong fascination with numbers. As a data analyst, I enjoy uncovering meaningful patterns and collaborating with like-minded individuals. I'm also a strong advocate for mental health and use data to contribute to this important cause. My background in the medical field enhances my analytical approach, bridging the gap between healthcare and data analysis.