ClinTrialX: Simplifying Clinical Trials Data Fetching Using R
Visit: https://ineelhere.github.io/clintrialx/
The clintrialx R package offers a user-friendly way to interact with two clinical trial data sources: ClinicalTrials.gov and the AACT (Aggregate Analysis of ClinicalTrials.gov) database. In this guide, we’ll explore how to use clintrialx to fetch, analyze, and work with clinical trial data efficiently.
Installation
First, you need to install the devtools package if you haven't already:
install.packages("devtools")
Next, use devtools to install the clintrialx package from GitHub:
devtools::install_github("ineelhere/clintrialx")
Load the clintrialx package:
library(clintrialx)
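If you rerun a script often, it can help to install only what is missing. The sketch below is one way to do that. The names needs_install and install_clintrialx_if_missing are hypothetical helpers, not part of clintrialx; the install calls inside them are the same ones shown above.

```r
# Returns TRUE when a package cannot be found in the current library paths.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical convenience wrapper: installs devtools and clintrialx only if absent.
# Defined here but not called, because calling it may download packages.
install_clintrialx_if_missing <- function() {
  if (needs_install("devtools")) {
    install.packages("devtools")
  }
  if (needs_install("clintrialx")) {
    devtools::install_github("ineelhere/clintrialx")
  }
  # TRUE when clintrialx is available afterwards
  invisible(!needs_install("clintrialx"))
}
```

Calling install_clintrialx_if_missing() once at the top of a script should then be enough before library(clintrialx).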
Retrieving Clinical Trial Data
Single Trial Data
To retrieve data for a single clinical trial, use the ctg_get_nct function with the trial's NCT number:
trial_data <- ctg_get_nct("NCT04000165")
print(trial_data)
# A tibble: 1 × 30
`NCT Number` `Study Title` `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NCT04000165 A Dose-Finding … https://cl… NA COMPLETED "Background:\n… YES Sickle Ce… DRUG: AG-348
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
# `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
# Enrollment <chr>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
# `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <chr>,
# `Results First Posted` <chr>, `Last Update Posted` <chr>, Locations <chr>, `Study Documents` <chr>
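NCT numbers follow a fixed pattern: the letters NCT followed by eight digits. A minimal sketch for checking identifiers before passing them to ctg_get_nct might look like this. The helper is_valid_nct is hypothetical and not a clintrialx function.

```r
# Hypothetical helper: TRUE for strings shaped like "NCT" plus exactly eight digits.
is_valid_nct <- function(ids) {
  grepl("^NCT[0-9]{8}$", ids)
}

ids <- c("NCT04000165", "nct04000165", "NCT0400016")
valid <- is_valid_nct(ids)

# Keep only the well-formed identifiers.
ids[valid]
```

Only the first identifier passes here: the second uses lowercase letters and the third has seven digits.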
Multiple Trials Data
You can also retrieve data for multiple trials by passing a vector of NCT numbers:
multiple_trials <- ctg_get_nct(c("NCT04000165", "NCT04002440"))
print(multiple_trials)
# A tibble: 2 × 30
`NCT Number` `Study Title` `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NCT04000165 A Dose-Finding … https://cl… NA COMPLETED "Background:\n… YES Sickle Ce… DRUG: AG-348
2 NCT04002440 Directed Use of… https://cl… DREAM-… ACTIVE_NOT_RE… "Among ambulat… NO Efficacy|… COMBINATION_…
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
# `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
# Enrollment <chr>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
# `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <chr>,
# `Results First Posted` <chr>, `Last Update Posted` <chr>, Locations <chr>, `Study Documents` <chr>
If you are interested in specific fields, you can specify them using the fields parameter:
specific_fields <- ctg_get_nct(
  c("NCT04000165", "NCT04002440"),
  fields = c("NCT Number", "Study Title", "Study Status")
)
print(specific_fields)
# A tibble: 2 × 3
`NCT Number` `Study Title` `Study Status`
<chr> <chr> <chr>
1 NCT04000165 A Dose-Finding Study of AG-348 in Sickle Cell Disease COMPLETED
2 NCT04002440 Directed Use of REmote Patient Management System AMia to Achieve Prescribed Dry Weight ACTIVE_NOT_RECRUITING
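Because the column names contain spaces, they need backticks or `[[ ]]` when you work with the result. The sketch below rebuilds the two rows printed above as a plain data frame (so it runs without a network call) and filters them by status in base R.

```r
# Stand-in for the `specific_fields` result, with values copied from the output above.
specific_fields <- data.frame(
  "NCT Number" = c("NCT04000165", "NCT04002440"),
  "Study Title" = c(
    "A Dose-Finding Study of AG-348 in Sickle Cell Disease",
    "Directed Use of REmote Patient Management System AMia to Achieve Prescribed Dry Weight"
  ),
  "Study Status" = c("COMPLETED", "ACTIVE_NOT_RECRUITING"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Select rows by a column whose name contains a space.
completed <- specific_fields[specific_fields[["Study Status"]] == "COMPLETED", ]
completed_ids <- completed[["NCT Number"]]
completed_ids
```

The same indexing should work on the tibble returned by ctg_get_nct, since tibbles support `[[ ]]` with column names.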
Advanced Queries
Filtering by Condition, Location, and Status
Use ctg_get_fields to filter trials by condition, location, and status:
ctg_get_fields(condition = "diabetes", location = "Kolkata", status = "RECRUITING")
# A tibble: 8 × 30
`NCT Number` `Study Title` `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NCT05929066 A Study of Reta… https://cl… TRIUMP… RECRUITING "The purpose o… NO Obesity|O… DRUG: Retatr…
2 NCT06221969 A Research Stud… https://cl… NA RECRUITING "This study wi… NO Type 2 Di… DRUG: Cagril…
3 NCT06065540 A Research Stud… https://cl… REIMAG… RECRUITING "The study wil… NO Type 2 Di… DRUG: Cagril…
4 NCT05254002 A Study to Lear… https://cl… CONFID… RECRUITING "Finerenone wo… NO Type 2 Di… DRUG: Finere…
5 NCT04596631 A Research Stud… https://cl… PIONEE… RECRUITING "This study co… NO Diabetes … DRUG: Oral s…
6 NCT06370715 A Study of LY90… https://cl… NA RECRUITING "The purpose o… NO Diabetes … DRUG: Insuli…
7 NCT05929079 A Study of Reta… https://cl… TRIUMP… RECRUITING "The purpose o… NO Type 2 Di… DRUG: Retatr…
8 NCT06269107 A Research Stud… https://cl… COMBIN… RECRUITING "This study wi… NO Type 2 Di… DRUG: IcoSem…
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
# `Other Outcome Measures` <lgl>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
# Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
# `Start Date` <date>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
# `Results First Posted` <lgl>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <lgl>
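The Conditions column in the output above packs several values into one string, separated by `|`. A sketch for splitting and tallying them in base R follows. The example strings are hypothetical stand-ins, since the printed output truncates the real values.

```r
# Hypothetical Conditions values in the same pipe-delimited shape as the output above.
conditions <- c("Obesity|Type 2 Diabetes", "Type 2 Diabetes")

# fixed = TRUE treats "|" as a literal character rather than regex alternation.
split_conditions <- strsplit(conditions, "|", fixed = TRUE)

# Count how often each individual condition appears across all trials.
condition_counts <- sort(table(unlist(split_conditions)), decreasing = TRUE)
condition_counts
```

With a real result, replace the conditions vector with the Conditions column of the returned tibble.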
Filtering by Title and Status
To filter trials by title and status with a specified page size:
ctg_get_fields(title = "vaccine", status = "COMPLETED", page_size = 50)
# A tibble: 50 × 30
`NCT Number` `Study Title` `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NCT00783926 Phase 1 Study … https://cl… NA COMPLETED "The objective… NO Influenza… BIOLOGICAL: …
2 NCT00518726 Safety and Imm… https://cl… NA COMPLETED "To evaluate t… NO Influenza BIOLOGICAL: …
3 NCT00003357 Vaccine Therap… https://cl… NA COMPLETED "RATIONALE: Va… NO Breast Ca… BIOLOGICAL: …
4 NCT05398926 Immunogenicity… https://cl… NA COMPLETED "This is an op… NO COVID-19 BIOLOGICAL: …
5 NCT00657657 Safety and Imm… https://cl… NA COMPLETED "In this study… YES Hepatitis… BIOLOGICAL: …
6 NCT00323557 Immuno-Augment… https://cl… NA COMPLETED "The goal of t… YES Leukemia DRUG: Sargra…
7 NCT02453048 Study of BPZE1… https://cl… NA COMPLETED "This study ev… YES Pertussis… BIOLOGICAL: …
8 NCT00240526 LT F-up Study … https://cl… NA COMPLETED "To evaluate t… YES Hepatitis… BIOLOGICAL: …
9 NCT05107557 Immunogenicity… https://cl… NA COMPLETED "This study is… NO COVID-19 BIOLOGICAL: …
10 NCT00707148 Pertussis Vacc… https://cl… NA COMPLETED "The purpose o… NO Diphtheri… DRUG: Placeb…
# ℹ 40 more rows
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
# `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
# Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
# `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
# `Results First Posted` <date>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <chr>
# ℹ Use `print(n = ...)` to see more rows
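Note that `Start Date` is printed as <chr> here but as <date> in the Kolkata query, so column types can differ between results. One plausible cause, stated here as an assumption, is that some records carry only a year and month. The sketch below normalizes such strings to dates; parse_ctg_date is a hypothetical helper, not part of clintrialx, and mapping a month-only value to the first of the month is a choice you may not want.

```r
# Hypothetical helper: convert "YYYY-MM-DD" or "YYYY-MM" strings to Date.
# Month-only values are mapped to the first day of that month (an assumption).
parse_ctg_date <- function(x) {
  x <- as.character(x)
  month_only <- !is.na(x) & grepl("^[0-9]{4}-[0-9]{2}$", x)
  x[month_only] <- paste0(x[month_only], "-01")
  # Anything that still does not match the format becomes NA.
  as.Date(x, format = "%Y-%m-%d")
}

parsed <- parse_ctg_date(c("2019-07-15", "2019-07", NA))
parsed
```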
Counting Trials
To count the number of trials that match specific criteria:
ctg_count(
  condition = "Cancer",
  location = "India",
  title = NULL,
  intervention = "Drug",
  status = "RECRUITING"
)
[1] 100
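Counts are handy for comparing several statuses before fetching anything. The sketch below loops a counting function over a vector of statuses. count_by_status is a hypothetical helper, and fake_count is a stand-in with made-up numbers so the example runs offline; with the real package you would pass ctg_count instead.

```r
# Hypothetical helper: call count_fun once per status, forwarding other filters via `...`.
count_by_status <- function(statuses, count_fun, ...) {
  vapply(statuses, function(s) count_fun(status = s, ...), numeric(1))
}

# Stand-in for ctg_count; the numbers are invented for illustration only.
fake_count <- function(status, ...) {
  c(RECRUITING = 100, COMPLETED = 250)[[status]]
}

counts <- count_by_status(c("RECRUITING", "COMPLETED"), fake_count, condition = "Cancer")
counts
```

Assuming ctg_count returns a single number as in the output above, the real call would be count_by_status(c("RECRUITING", "COMPLETED"), ctg_count, condition = "Cancer", location = "India").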
Bulk Data Fetching
Fetch All Data
To fetch all available data based on your query:
data <- ctg_bulk_fetch(title = "vaccine", status = "COMPLETED")
data
Fetching Page 6/6 - ================================= Completed 100% 🕒 00:00:17
# A tibble: 5,513 × 30
`NCT Number` `Study Title` `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NCT00783926 Phase 1 Study … https://cl… NA COMPLETED "The objective… NO Influenza… BIOLOGICAL: …
2 NCT00518726 Safety and Imm… https://cl… NA COMPLETED "To evaluate t… NO Influenza BIOLOGICAL: …
3 NCT00003357 Vaccine Therap… https://cl… NA COMPLETED "RATIONALE: Va… NO Breast Ca… BIOLOGICAL: …
4 NCT05398926 Immunogenicity… https://cl… NA COMPLETED "This is an op… NO COVID-19 BIOLOGICAL: …
5 NCT00657657 Safety and Imm… https://cl… NA COMPLETED "In this study… YES Hepatitis… BIOLOGICAL: …
6 NCT00323557 Immuno-Augment… https://cl… NA COMPLETED "The goal of t… YES Leukemia DRUG: Sargra…
7 NCT02453048 Study of BPZE1… https://cl… NA COMPLETED "This study ev… YES Pertussis… BIOLOGICAL: …
8 NCT00240526 LT F-up Study … https://cl… NA COMPLETED "To evaluate t… YES Hepatitis… BIOLOGICAL: …
9 NCT05107557 Immunogenicity… https://cl… NA COMPLETED "This study is… NO COVID-19 BIOLOGICAL: …
10 NCT00707148 Pertussis Vacc… https://cl… NA COMPLETED "The purpose o… NO Diphtheri… DRUG: Placeb…
# ℹ 5,503 more rows
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
# `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
# Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
# `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
# `Results First Posted` <date>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <chr>
# ℹ Use `print(n = ...)` to see more rows
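The progress line above reports 6 pages for 5,513 rows, which is consistent with up to 1,000 rows per page. That page size is inferred from this output rather than confirmed by the package documentation, so treat it as an assumption. The arithmetic can be sketched as:

```r
# Hypothetical helper: number of pages needed for a given row count.
# The default page_size of 1000 is an assumption inferred from the output above.
pages_needed <- function(total_rows, page_size = 1000) {
  ceiling(total_rows / page_size)
}

pages_needed(5513)  # 6, matching "Fetching Page 6/6"
```

Combined with ctg_count, an estimate like this can give a rough idea of how long a bulk fetch will run before you start it.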
To fetch every available record, leave all filters set to NULL:
ctg_bulk_fetch(
  condition = NULL,
  location = NULL,
  title = NULL,
  intervention = NULL,
  status = NULL
)
Another example
To fetch data for trials in India:
trials <- ctg_bulk_fetch(location = "india")
print(trials)
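Bulk fetches can take a while, so you may want to store the result on disk and reuse it. The sketch below is one simple way to do that with saveRDS and readRDS. cached_fetch is a hypothetical helper, and fake_fetch stands in for a real ctg_bulk_fetch call so the example runs offline.

```r
# Hypothetical helper: read the cached file if present, otherwise fetch and save.
cached_fetch <- function(path, fetch_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fun()
  saveRDS(result, path)
  result
}

# Stand-in fetcher that records how many times it actually runs.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(id = c("A", "B"))
}

cache_path <- tempfile(fileext = ".rds")
first <- cached_fetch(cache_path, fake_fetch)   # runs the fetcher and writes the file
second <- cached_fetch(cache_path, fake_fetch)  # reads the file, fetcher not called
calls
```

With the real package, a call such as cached_fetch("india_trials.rds", function() ctg_bulk_fetch(location = "india")) would follow the same pattern. Delete the file when you want fresh data.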
AACT Database Connection and Custom Queries
Setting Up Environment Variables
First, set up an AACT account as described at https://ineelhere.github.io/clintrialx/#setup-aact-account
Set your database credentials in the .Renviron file and load them:
readRenviron(".Renviron")
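The connection call in this guide reads environment variables named user and password, so those are presumably the names your .Renviron file should define. A small sketch for confirming that variables are set before connecting is shown here. missing_env_vars is a hypothetical helper, and the demo uses throwaway variable names so it does not touch your real credentials.

```r
# Hypothetical helper: return the names of variables that are unset or empty.
missing_env_vars <- function(vars) {
  vars[!nzchar(Sys.getenv(vars))]
}

# Offline demonstration with throwaway variable names.
Sys.setenv(CLINTRIALX_DEMO_SET = "example")
Sys.unsetenv("CLINTRIALX_DEMO_UNSET")
not_set <- missing_env_vars(c("CLINTRIALX_DEMO_SET", "CLINTRIALX_DEMO_UNSET"))
not_set
```

After readRenviron(".Renviron"), calling missing_env_vars(c("user", "password")) should return an empty character vector if both credentials were loaded.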
Connecting to the Database
Establish a connection to the database using your credentials:
con <- aact_connection(Sys.getenv('user'), Sys.getenv('password'))
Running a Custom Query
You can run custom SQL queries on the database. For example, to select specific fields from the studies table:
query <- "SELECT nct_id, source, enrollment, overall_status FROM studies LIMIT 5;"
results <- aact_custom_query(con, query)
print(results)
nct_id source enrollment overall_status
1 NCT06105710 University of California, San Francisco 24 NOT_YET_RECRUITING
2 NCT05813210 Kantonsspital Olten 772 NOT_YET_RECRUITING
3 NCT04868110 University of Kansas Medical Center 56 ACTIVE_NOT_RECRUITING
4 NCT04097210 Helsinki University Central Hospital 110 COMPLETED
5 NCT03184623 Medical Centre Leeuwarden 23 COMPLETED
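Once the query result is in R, ordinary data frame tools apply. The sketch below rebuilds the five rows printed above (leaving out the source column for brevity) and summarizes them by status in base R. It assumes the real result behaves like a regular data frame, which the printed output suggests but does not prove.

```r
# Stand-in for `results`, with values copied from the output above.
results <- data.frame(
  nct_id = c("NCT06105710", "NCT05813210", "NCT04868110", "NCT04097210", "NCT03184623"),
  enrollment = c(24, 772, 56, 110, 23),
  overall_status = c(
    "NOT_YET_RECRUITING", "NOT_YET_RECRUITING", "ACTIVE_NOT_RECRUITING",
    "COMPLETED", "COMPLETED"
  ),
  stringsAsFactors = FALSE
)

# Number of trials per status.
status_counts <- table(results$overall_status)

# Total planned enrollment per status.
enrollment_by_status <- tapply(results$enrollment, results$overall_status, sum)
enrollment_by_status
```

For large tables it is usually better to aggregate in SQL before the data reaches R, but for a handful of rows this is enough.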
The clintrialx package thus provides a set of tools for accessing and analyzing clinical trial data. Whether you need data for a single trial or multiple trials, or want to run advanced queries, clintrialx has you covered. By following this guide, you should be able to use the package effectively in your research and data analysis.
Written by
Indraneel Chakraborty
I am Indraneel Chakraborty - a recovering Bioinformatician in love with Technology, Data Science and DevOps. Solving problems (not limited to Bioinformatics) with code-first, data-centric approaches on cloud architecture is my primary focus. Currently, I'm working with Elucidata as a Bioinformatics Engineer, helping teams to scale up using advanced workflow management systems like Nextflow and cloud-based solutions to effectively manage technological resources, thereby cutting costs and time taken in providing ML ready biomedical data. Other than these, I am also involved in development of webapps using R-Shiny (R programming) and Streamlit (Python). Apart from my full-time job, I also volunteer as an application creator at Streamlit, open source lesson maintainer at The Carpentries, technical reviewer at Packt Publications, Community member at Data Science Festival London and beta tester at Coursera. Found my profile interesting? Let's talk!