Importing Data in R
The process of loading and reading data into R from various sources.
5 types of data to import:
Flat files - Flat files are text-based files where data is organized in rows and columns.
Data from Excel
Databases
Web
Statistical software
read.csv()
Specifically designed for reading comma-separated values (CSV) files; the separator defaults to a comma.
data <- read.csv("data.csv", stringsAsFactors = TRUE)
The argument is spelled stringsAsFactors. It defaulted to TRUE before R 4.0.0 and defaults to FALSE since then. When it is TRUE, character vectors are converted to factors.
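To see what the argument changes, here is a minimal sketch using a small made-up file written to a temporary path. The file contents and variable names are only for illustration, and the argument is set explicitly in both calls so the result does not depend on your R version.

```r
# Write a small CSV to a temporary file (made-up data, for illustration only).
path <- tempfile(fileext = ".csv")
writeLines(c("name,group", "Alice,a", "Bob,b", "Charlie,a"), path)

# Keep character columns as character vectors.
as_chr <- read.csv(path, stringsAsFactors = FALSE)

# Convert character columns to factors.
as_fct <- read.csv(path, stringsAsFactors = TRUE)

class(as_chr$group)   # character
class(as_fct$group)   # factor
levels(as_fct$group)  # the distinct values found in the column
```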
read.csv2()
Reads European-style CSV files, where a semicolon is the field separator and a comma is the decimal mark.
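A short sketch of the European-style layout, again with a made-up temporary file (the item names and prices are invented for the example):

```r
# Semicolon-separated fields, comma as the decimal mark (made-up data).
path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "apple;3,14", "pear;2,50"), path)

eu <- read.csv2(path)

# price is read as a number: the comma is treated as the decimal mark.
str(eu)
```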
read_csv()
Part of the readr package. Similar to read.csv(), but the result is a tibble. It reads comma-delimited files only; for other delimiters use read_delim().
It treats every comma as a field separator, so a comma used as a thousands separator inside an unquoted number splits that value into two fields.
Example table
Name,Age,Salary
Alice,28,55,000
Bob,32,60,000
Charlie,22,48,000
David,35,72,000
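Each data row has four comma-separated fields but the header names only three columns, so read_csv() typically raises a parsing warning. In the output below, Salary keeps only the digits before the thousands separator (how the extra field is handled can vary between readr versions):
# A tibble: 4 x 3
  Name      Age Salary
  <chr>   <dbl>  <dbl>
1 Alice      28     55
2 Bob        32     60
3 Charlie    22     48
4 David      35     72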
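read_csv2()
Provided in the readr package. Expects a semicolon as the field separator and a comma as the decimal mark. For example, 3,14 is read as 3.14.
read_csv2() does not repair the example above: that file contains no semicolons, so each line would be read as a single column, and a comma would be taken as a decimal mark rather than a thousands separator. To get the full amounts, the values containing a comma have to be quoted in the file (for example "55,000") and parsed as numbers, or the thousands separators have to be removed. The intended result is:
# A tibble: 4 x 3
  Name      Age Salary
  <chr>   <int>  <dbl>
1 Alice      28  55000
2 Bob        32  60000
3 Charlie    22  48000
4 David      35  72000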
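One way to read the salary example with the full amounts, sketched under the assumption that you can change the file so the salary values are quoted. The column types are declared explicitly here; col_number() drops the grouping mark when parsing.

```r
library(readr)

# Same table as above, but with the salary values quoted (an assumption:
# the original file would need to be written this way).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Age,Salary",
  'Alice,28,"55,000"',
  'Bob,32,"60,000"',
  'Charlie,22,"48,000"',
  'David,35,"72,000"'
), path)

salaries <- read_csv(
  path,
  col_types = cols(
    Name   = col_character(),
    Age    = col_integer(),
    Salary = col_number()   # ignores the "," grouping mark
  )
)

salaries
```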
read.delim()
Defaults to the tab character as the separator between values and the period as the decimal character. It is typically used for tab-delimited .txt or .tab files.
read.delim(file, header = TRUE, sep = "\t", quote = "\"", ...)
It can also be used simply as:
# Read the tab-delimited file into a data frame
data <- read.delim("data.txt", header = TRUE)
# View the contents of the data frame
print(data)
read.table()
A more general function that reads tabular text files with custom delimiters. Its defaults are header = FALSE and whitespace as the separator, so you usually need to specify the delimiter and other parameters explicitly. Fixed-width files are handled by read.fwf() instead.
mydata <- read.table("c:/mydata.csv", header = TRUE,
                     sep = ",", row.names = "id")
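A runnable sketch of the same idea with a custom delimiter. The pipe-delimited file, its column names and its values are made up for the example.

```r
# Pipe-delimited text file written to a temporary path (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("id|name|score", "a1|Alice|9.5", "b2|Bob|7.25"), path)

results <- read.table(
  path,
  header = TRUE,            # first line holds the column names
  sep = "|",                # custom delimiter
  row.names = "id",         # use the id column as row names
  stringsAsFactors = FALSE
)

# Rows can now be looked up by id.
results["b2", "score"]
```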
read_tsv()
Found in the readr package. Reads tab-delimited text files into a tibble.
read_tsv(file, col_names = TRUE, col_types = NULL, skip = 0, n_max = Inf)
read_delim()
A more general readr function that reads a file with any delimiter.
read_delim(file, delim = "/")
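A small sketch showing both functions on the same made-up data, once tab-delimited and once pipe-delimited. The file contents are invented, and the column types are spelled out so nothing is left to guessing.

```r
library(readr)

types <- cols(city = col_character(), pop = col_double())

# Tab-delimited file (made-up data).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("city\tpop", "Oslo\t700000", "Bergen\t285000"), tsv_path)
cities_tsv <- read_tsv(tsv_path, col_types = types)

# Same data with "|" as the delimiter.
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("city|pop", "Oslo|700000", "Bergen|285000"), pipe_path)
cities_pipe <- read_delim(pipe_path, delim = "|", col_types = types)

# Both calls should produce the same columns.
identical(cities_tsv$city, cities_pipe$city)
```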
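read_xlsx()
Reads .xlsx files. Part of the readxl package.
excel_sheets()
Lists the names of the sheets in an Excel workbook, which is useful before importing data with read_excel().
The xlsx package offers a similarly named function, read.xlsx(), which takes the sheet index as its second argument:
library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)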
read_excel()
Imports data from .xls and .xlsx files. Part of the readxl package.
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, ...)
With sheet = NULL, the first sheet is read.
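A minimal sketch of the typical workflow: list the sheets first, then read one of them. It uses the sample workbook that ships with readxl (located with readxl_example()), so no file of your own is needed; the chosen sheet and range are just for illustration.

```r
library(readxl)

# Path to a sample workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# Inspect the sheet names before importing.
sheets <- excel_sheets(path)
sheets

# Read one sheet by name, restricted to a cell range:
# one header row plus five data rows, three columns.
mtcars_part <- read_excel(path, sheet = "mtcars", range = "A1:C6")
mtcars_part
```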
From SPSS
# in SPSS: save the dataset in portable format
get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.
# in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels = TRUE)
# use.value.labels = TRUE converts value labels to R factors
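As an alternative to the Hmisc route, .sav files can also be read directly with read_sav() from the haven package. The sketch below assumes haven is installed and uses the iris.sav example file bundled with it, so the path is not one of your own files.

```r
library(haven)

# Example .sav file that ships with the haven package.
path <- system.file("examples", "iris.sav", package = "haven")

iris_sav <- read_sav(path)

# Labelled SPSS variables can be converted to R factors with as_factor().
species <- as_factor(iris_sav$Species)

dim(iris_sav)
head(species)
```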
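fread()
Part of the data.table package. Similar to read.table(), but it infers the separator and column types automatically and is extremely fast.
fread(input, sep = "auto", header = "auto", data.table = FALSE, ...)
Setting data.table = FALSE returns a plain data frame instead of a data.table.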
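A quick sketch of the automatic detection, using the text argument so no file is needed. The data is made up; note that neither the separator nor the column types are given.

```r
library(data.table)

# Literal input via `text =`; fread() works out the ";" separator itself.
dt <- fread(text = "id;name;score\n1;Alice;9.5\n2;Bob;7.25")

class(dt)          # a data.table (which is also a data.frame)
sapply(dt, class)  # column types inferred from the data
```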
Data Format | Import Function | Saving Function |
CSV | read_csv() | write_csv() |
TSV | read_tsv() | write_tsv() |
Delimited Text | read_delim() | write_delim() |
Excel | read_excel() | write_xlsx() (writexl package) |
Table | read_table() | write.table() (base R) |
SAS | read_sas() | Not Applicable |
SPSS | read_sav() | write_sav() |
Feather | read_feather() | write_feather() |
Parquet | read_parquet() | write_parquet() |
Arrow | read_arrow() | write_arrow() |
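The import and saving functions in the table come in pairs. A minimal round-trip sketch for the CSV row, with a small made-up data frame written to a temporary file and read back:

```r
library(readr)

# Made-up data frame to save and re-import.
scores <- data.frame(name = c("Alice", "Bob"), score = c(9.5, 7.25))

path <- tempfile(fileext = ".csv")
write_csv(scores, path)

back <- read_csv(
  path,
  col_types = cols(name = col_character(), score = col_double())
)

# The values survive the round trip.
identical(back$name, scores$name)
all.equal(back$score, scores$score)
```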
Import from Web
# Example: Reading data from a web URL
url <- "https://example.com/data.csv"
data <- read.table(url, header = TRUE, sep = ",")
Import API Data
library(httr)
library(jsonlite)
# Make an API request and parse JSON response
response <- GET("https://api.example.com/data")
data <- fromJSON(content(response, "text"))
The GET() function from httr is used to make HTTP GET requests, and fromJSON() from jsonlite is used to parse JSON responses.
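To show what the parsing step produces without making a network call, here is a small sketch where a JSON string stands in for the response text. The payload shape is hypothetical; a real API will return its own structure.

```r
library(jsonlite)

# Stand-in for content(response, "text"); the fields are made up.
body <- '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'

# An array of objects with the same fields is simplified to a data frame.
parsed <- fromJSON(body)

class(parsed)
parsed$name
```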
Import JSON Data
library(jsonlite)
# Read JSON data from a file
data <- fromJSON("data.json")
Further Reading
https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf
Written by
Wobbly Geek