ClinTrialX: Simplifying Clinical Trials Data Fetching Using R

Visit: https://ineelhere.github.io/clintrialx/

The clintrialx R package provides a simple interface to two clinical trial data sources: ClinicalTrials.gov and the AACT (Aggregate Analysis of ClinicalTrials.gov) database. This guide shows how to use clintrialx to fetch, query, and work with clinical trial data.

Installation

First, you need to install the devtools package if you haven't already:

install.packages("devtools")

Next, use devtools to install the clintrialx package from GitHub:

devtools::install_github("ineelhere/clintrialx")
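If you rerun a setup script often, it can help to check what is already installed before calling the installers. The following is a minimal base R sketch; missing_packages is a hypothetical helper name (not part of clintrialx or devtools), and the install calls are left commented out because they need network access.

```r
# Return the subset of `pkgs` that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "clintrialx"))
message(
  "Packages still to install: ",
  if (length(to_install) > 0) paste(to_install, collapse = ", ") else "none"
)

# Uncomment to install only what is missing (needs network access):
# if ("devtools" %in% to_install) install.packages("devtools")
# if ("clintrialx" %in% to_install) devtools::install_github("ineelhere/clintrialx")
```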

Load the clintrialx library:

library(clintrialx)

Retrieving Clinical Trial Data

Single Trial Data

To retrieve data for a single clinical trial, use the ctg_get_nct function with the trial's NCT number:

trial_data <- ctg_get_nct("NCT04000165")
print(trial_data)
# A tibble: 1 × 30
  `NCT Number` `Study Title`    `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
  <chr>        <chr>            <chr>       <chr>   <chr>          <chr>           <chr>           <chr>      <chr>        
1 NCT04000165  A Dose-Finding … https://cl… NA      COMPLETED      "Background:\n… YES             Sickle Ce… DRUG: AG-348 
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
#   `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
#   Enrollment <chr>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
#   `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <chr>,
#   `Results First Posted` <chr>, `Last Update Posted` <chr>, Locations <chr>, `Study Documents` <chr>

Multiple Trials Data

You can also retrieve data for multiple trials by passing a vector of NCT numbers:

multiple_trials <- ctg_get_nct(c("NCT04000165", "NCT04002440"))
print(multiple_trials)
# A tibble: 2 × 30
  `NCT Number` `Study Title`    `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
  <chr>        <chr>            <chr>       <chr>   <chr>          <chr>           <chr>           <chr>      <chr>        
1 NCT04000165  A Dose-Finding … https://cl… NA      COMPLETED      "Background:\n… YES             Sickle Ce… DRUG: AG-348 
2 NCT04002440  Directed Use of… https://cl… DREAM-… ACTIVE_NOT_RE… "Among ambulat… NO              Efficacy|… COMBINATION_…
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
#   `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
#   Enrollment <chr>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
#   `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <chr>,
#   `Results First Posted` <chr>, `Last Update Posted` <chr>, Locations <chr>, `Study Documents` <chr>
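When NCT numbers come from a spreadsheet or user input, a quick format check before calling ctg_get_nct can catch typos early. This is a small sketch, assuming that an NCT number is the prefix "NCT" followed by exactly eight digits (as in the identifiers used in this guide). is_valid_nct is a hypothetical helper, and the last two IDs in the example are deliberately malformed.

```r
# Assumption: a well-formed NCT number is "NCT" followed by exactly 8 digits.
is_valid_nct <- function(ids) {
  grepl("^NCT[0-9]{8}$", ids)
}

# The first two IDs come from this guide; the last two are malformed on purpose
# (lowercase prefix, and only seven digits).
ids <- c("NCT04000165", "NCT04002440", "nct04000165", "NCT0400016")
valid <- is_valid_nct(ids)

# Keep the well-formed identifiers and report the rest.
good_ids <- ids[valid]
bad_ids <- ids[!valid]
print(good_ids)
print(bad_ids)
```

Only good_ids would then be passed on to ctg_get_nct.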

If you are interested in specific fields, you can specify them using the fields parameter:

specific_fields <- ctg_get_nct(
  c("NCT04000165", "NCT04002440"),
  fields = c("NCT Number", "Study Title", "Study Status")
)
print(specific_fields)
# A tibble: 2 × 3
  `NCT Number` `Study Title`                                                                          `Study Status`       
  <chr>        <chr>                                                                                  <chr>                
1 NCT04000165  A Dose-Finding Study of AG-348 in Sickle Cell Disease                                  COMPLETED            
2 NCT04002440  Directed Use of REmote Patient Management System AMia to Achieve Prescribed Dry Weight ACTIVE_NOT_RECRUITING
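The returned column names contain spaces, so they need backticks or string indexing when you work with them in R. The sketch below rebuilds the three-column result shown above as a plain data frame (so it runs without a network call) and shows both access styles. The same indexing should work on the tibble returned by the package, though that is not tested here.

```r
# Rebuild the three-column result shown above.
# check.names = FALSE keeps the spaces in the column names.
trials <- data.frame(
  `NCT Number` = c("NCT04000165", "NCT04002440"),
  `Study Title` = c(
    "A Dose-Finding Study of AG-348 in Sickle Cell Disease",
    "Directed Use of REmote Patient Management System AMia to Achieve Prescribed Dry Weight"
  ),
  `Study Status` = c("COMPLETED", "ACTIVE_NOT_RECRUITING"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Two equivalent ways to read a column whose name contains a space.
status_a <- trials[["Study Status"]]
status_b <- trials$`Study Status`

# NCT numbers of the completed studies.
completed_ids <- trials[["NCT Number"]][trials[["Study Status"]] == "COMPLETED"]
print(completed_ids)
```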

Advanced Queries

Filtering by Condition, Location, and Status

You can filter trials based on conditions, locations, and statuses:

ctg_get_fields(condition = "diabetes", location = "Kolkata", status = "RECRUITING")
# A tibble: 8 × 30
  `NCT Number` `Study Title`    `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
  <chr>        <chr>            <chr>       <chr>   <chr>          <chr>           <chr>           <chr>      <chr>        
1 NCT05929066  A Study of Reta… https://cl… TRIUMP… RECRUITING     "The purpose o… NO              Obesity|O… DRUG: Retatr…
2 NCT06221969  A Research Stud… https://cl… NA      RECRUITING     "This study wi… NO              Type 2 Di… DRUG: Cagril…
3 NCT06065540  A Research Stud… https://cl… REIMAG… RECRUITING     "The study wil… NO              Type 2 Di… DRUG: Cagril…
4 NCT05254002  A Study to Lear… https://cl… CONFID… RECRUITING     "Finerenone wo… NO              Type 2 Di… DRUG: Finere…
5 NCT04596631  A Research Stud… https://cl… PIONEE… RECRUITING     "This study co… NO              Diabetes … DRUG: Oral s…
6 NCT06370715  A Study of LY90… https://cl… NA      RECRUITING     "The purpose o… NO              Diabetes … DRUG: Insuli…
7 NCT05929079  A Study of Reta… https://cl… TRIUMP… RECRUITING     "The purpose o… NO              Type 2 Di… DRUG: Retatr…
8 NCT06269107  A Research Stud… https://cl… COMBIN… RECRUITING     "This study wi… NO              Type 2 Di… DRUG: IcoSem…
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
#   `Other Outcome Measures` <lgl>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
#   Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
#   `Start Date` <date>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
#   `Results First Posted` <lgl>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <lgl>
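Some columns, such as Conditions, appear to pack several values into one string separated by "|" (for example the truncated "Obesity|O…" above). Assuming that separator, a field can be split into individual values with base R. The condition strings in this sketch are placeholders, since the real values are truncated in the printed output.

```r
# Placeholder Conditions values for illustration only; the real strings are
# truncated in the output above and are not reproduced here.
conditions <- c("Condition A|Condition B", "Condition C")

# fixed = TRUE treats "|" as a literal character, not regex alternation.
split_conditions <- strsplit(conditions, "|", fixed = TRUE)

# Number of conditions listed per trial.
n_conditions <- lengths(split_conditions)
print(n_conditions)

# Frequency of each individual condition across all trials.
condition_counts <- table(unlist(split_conditions))
print(condition_counts)
```

On a real result, conditions would be the Conditions column of the returned tibble.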

Filtering by Title and Status

To filter trials by title and status, and to control how many records are returned with page_size:

ctg_get_fields(title = "vaccine", status = "COMPLETED", page_size = 50)
# A tibble: 50 × 30
   `NCT Number` `Study Title`   `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
   <chr>        <chr>           <chr>       <chr>   <chr>          <chr>           <chr>           <chr>      <chr>        
 1 NCT00783926  Phase 1 Study … https://cl… NA      COMPLETED      "The objective… NO              Influenza… BIOLOGICAL: …
 2 NCT00518726  Safety and Imm… https://cl… NA      COMPLETED      "To evaluate t… NO              Influenza  BIOLOGICAL: …
 3 NCT00003357  Vaccine Therap… https://cl… NA      COMPLETED      "RATIONALE: Va… NO              Breast Ca… BIOLOGICAL: …
 4 NCT05398926  Immunogenicity… https://cl… NA      COMPLETED      "This is an op… NO              COVID-19   BIOLOGICAL: …
 5 NCT00657657  Safety and Imm… https://cl… NA      COMPLETED      "In this study… YES             Hepatitis… BIOLOGICAL: …
 6 NCT00323557  Immuno-Augment… https://cl… NA      COMPLETED      "The goal of t… YES             Leukemia   DRUG: Sargra…
 7 NCT02453048  Study of BPZE1… https://cl… NA      COMPLETED      "This study ev… YES             Pertussis… BIOLOGICAL: …
 8 NCT00240526  LT F-up Study … https://cl… NA      COMPLETED      "To evaluate t… YES             Hepatitis… BIOLOGICAL: …
 9 NCT05107557  Immunogenicity… https://cl… NA      COMPLETED      "This study is… NO              COVID-19   BIOLOGICAL: …
10 NCT00707148  Pertussis Vacc… https://cl… NA      COMPLETED      "The purpose o… NO              Diphtheri… DRUG: Placeb…
# ℹ 40 more rows
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
#   `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
#   Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
#   `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
#   `Results First Posted` <date>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <chr>
# ℹ Use `print(n = ...)` to see more rows
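Once a result is in memory, ordinary R summaries apply. As a small illustration, the sketch below tabulates the Study Results values for the ten rows displayed above; on a real result you would pass the full Study Results column instead of this hand-copied vector.

```r
# Study Results values for the ten rows displayed above, in order.
study_results <- c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "NO", "NO")

# Count trials with and without posted results.
result_counts <- table(study_results)
print(result_counts)

# Share of the displayed trials that have posted results.
share_with_results <- mean(study_results == "YES")
print(share_with_results)
```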

Counting Trials

To count the number of trials that match specific criteria:

ctg_count(
  condition = "Cancer",
  location = "India",
  title = NULL,
  intervention = "Drug",
  status = "RECRUITING"
)
[1] 100
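To compare counts across several statuses, one option is to call the counting function once per status and collect the results. The sketch below is an assumption-laden illustration: count_by_status is a hypothetical wrapper, and fake_count is an offline stand-in that returns meaningless numbers so the example runs without network access. The commented call at the end shows how ctg_count might be plugged in, using only the arguments that appear in this guide.

```r
# Call `count_fn` once per status and collect the counts in a named vector.
# Extra filters (condition, location, ...) are passed through unchanged.
count_by_status <- function(statuses, count_fn, ...) {
  vapply(
    statuses,
    function(s) as.numeric(count_fn(status = s, ...)),
    numeric(1)
  )
}

# Offline stand-in for ctg_count. It returns the length of the status string,
# which is NOT trial data; it only demonstrates the call shape.
fake_count <- function(status, ...) nchar(status)

counts <- count_by_status(
  c("RECRUITING", "COMPLETED"), fake_count,
  condition = "Cancer", location = "India"
)
print(counts)

# With clintrialx loaded, the real call might look like this (untested here):
# count_by_status(c("RECRUITING", "COMPLETED"), ctg_count,
#                 condition = "Cancer", location = "India",
#                 title = NULL, intervention = "Drug")
```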

Bulk Data Fetching

Fetch All Data

To fetch every record that matches your query, page by page, rather than a single page of results:

data <- ctg_bulk_fetch(title = "vaccine", status = "COMPLETED")
data
Fetching Page 6/6 - ================================= Completed 100% 🕒 00:00:17                                           
# A tibble: 5,513 × 30
   `NCT Number` `Study Title`   `Study URL` Acronym `Study Status` `Brief Summary` `Study Results` Conditions Interventions
   <chr>        <chr>           <chr>       <chr>   <chr>          <chr>           <chr>           <chr>      <chr>        
 1 NCT00783926  Phase 1 Study … https://cl… NA      COMPLETED      "The objective… NO              Influenza… BIOLOGICAL: …
 2 NCT00518726  Safety and Imm… https://cl… NA      COMPLETED      "To evaluate t… NO              Influenza  BIOLOGICAL: …
 3 NCT00003357  Vaccine Therap… https://cl… NA      COMPLETED      "RATIONALE: Va… NO              Breast Ca… BIOLOGICAL: …
 4 NCT05398926  Immunogenicity… https://cl… NA      COMPLETED      "This is an op… NO              COVID-19   BIOLOGICAL: …
 5 NCT00657657  Safety and Imm… https://cl… NA      COMPLETED      "In this study… YES             Hepatitis… BIOLOGICAL: …
 6 NCT00323557  Immuno-Augment… https://cl… NA      COMPLETED      "The goal of t… YES             Leukemia   DRUG: Sargra…
 7 NCT02453048  Study of BPZE1… https://cl… NA      COMPLETED      "This study ev… YES             Pertussis… BIOLOGICAL: …
 8 NCT00240526  LT F-up Study … https://cl… NA      COMPLETED      "To evaluate t… YES             Hepatitis… BIOLOGICAL: …
 9 NCT05107557  Immunogenicity… https://cl… NA      COMPLETED      "This study is… NO              COVID-19   BIOLOGICAL: …
10 NCT00707148  Pertussis Vacc… https://cl… NA      COMPLETED      "The purpose o… NO              Diphtheri… DRUG: Placeb…
# ℹ 5,503 more rows
# ℹ 21 more variables: `Primary Outcome Measures` <chr>, `Secondary Outcome Measures` <chr>,
#   `Other Outcome Measures` <chr>, Sponsor <chr>, Collaborators <chr>, Sex <chr>, Age <chr>, Phases <chr>,
#   Enrollment <dbl>, `Funder Type` <chr>, `Study Type` <chr>, `Study Design` <chr>, `Other IDs` <chr>,
#   `Start Date` <chr>, `Primary Completion Date` <chr>, `Completion Date` <chr>, `First Posted` <date>,
#   `Results First Posted` <date>, `Last Update Posted` <date>, Locations <chr>, `Study Documents` <chr>
# ℹ Use `print(n = ...)` to see more rows

To fetch every available trial without any filtering, set all filter arguments to NULL:

ctg_bulk_fetch(
  condition = NULL,
  location = NULL,
  title = NULL,
  intervention = NULL,
  status = NULL
)

Fetch by Location

To fetch data for all trials with a location in India:

trials <- ctg_bulk_fetch(location = "india")
print(trials)
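Bulk results can be large, so it is often worth saving them to disk instead of refetching. Here is a minimal sketch using base R's write.csv and read.csv. The two-row data frame is a stand-in built from trials shown earlier in this guide, and the temporary file path is only for demonstration.

```r
# Small stand-in for a bulk result, using two trials shown earlier in this guide.
trials <- data.frame(
  `NCT Number` = c("NCT04000165", "NCT04002440"),
  `Study Status` = c("COMPLETED", "ACTIVE_NOT_RECRUITING"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary file; use a real path such as "trials.csv" in practice.
csv_path <- tempfile(fileext = ".csv")
write.csv(trials, csv_path, row.names = FALSE)

# Read it back; check.names = FALSE preserves the spaces in the column names.
restored <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)
print(identical(names(restored), names(trials)))
```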

AACT Database Connection and Custom Queries

Setting Up Environment Variables

First, follow the AACT account setup instructions: https://ineelhere.github.io/clintrialx/#setup-aact-account

Store your AACT username and password in a .Renviron file as the variables user and password (one name=value pair per line), then load them:

readRenviron(".Renviron")
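Before connecting, it can be useful to confirm that the credentials were actually loaded. The following sketch checks for the two variable names used in this guide's connection call (user and password); missing_env is a hypothetical helper, not part of clintrialx.

```r
# Return the names of environment variables that are unset or empty.
missing_env <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Variable names taken from the connection call in this guide.
needed <- c("user", "password")
absent <- missing_env(needed)

if (length(absent) > 0) {
  message("Missing in .Renviron: ", paste(absent, collapse = ", "))
} else {
  message("AACT credentials found.")
}
```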

Connecting to the Database

Establish a connection to the database using your credentials:

con <- aact_connection(Sys.getenv('user'), Sys.getenv('password'))

Running a Custom Query

You can run custom SQL queries on the database. For example, to select specific fields from the studies table:

query <- "SELECT nct_id, source, enrollment, overall_status FROM studies LIMIT 5;"
results <- aact_custom_query(con, query)
print(results)
       nct_id                                  source enrollment        overall_status
1 NCT06105710 University of California, San Francisco         24    NOT_YET_RECRUITING
2 NCT05813210                     Kantonsspital Olten        772    NOT_YET_RECRUITING
3 NCT04868110     University of Kansas Medical Center         56 ACTIVE_NOT_RECRUITING
4 NCT04097210    Helsinki University Central Hospital        110             COMPLETED
5 NCT03184623               Medical Centre Leeuwarden         23             COMPLETED
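The printed result looks like an ordinary data frame, so it can be summarized with base R. As an illustration, the sketch below rebuilds the five rows shown above by hand (so it runs without a database connection) and totals enrollment per status.

```r
# The five rows returned by the query above, rebuilt as a data frame.
results <- data.frame(
  nct_id = c("NCT06105710", "NCT05813210", "NCT04868110", "NCT04097210", "NCT03184623"),
  enrollment = c(24, 772, 56, 110, 23),
  overall_status = c(
    "NOT_YET_RECRUITING", "NOT_YET_RECRUITING",
    "ACTIVE_NOT_RECRUITING", "COMPLETED", "COMPLETED"
  ),
  stringsAsFactors = FALSE
)

# Total enrollment per status.
enrollment_by_status <- aggregate(enrollment ~ overall_status, data = results, FUN = sum)
print(enrollment_by_status)
```

For the full studies table, the same grouping is usually better done in SQL with GROUP BY, so that only the summary is transferred.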

The clintrialx package covers the common ways of accessing clinical trial data from R: looking up one or more trials by NCT number, filtering by condition, location, title, or status, counting and bulk-fetching matches, and running custom SQL against AACT. With the examples in this guide, you should be able to apply the package to your own research and data analysis.

Written by

Indraneel Chakraborty

I am Indraneel Chakraborty, a recovering Bioinformatician in love with Technology, Data Science and DevOps. Solving problems (not limited to Bioinformatics) with code-first, data-centric approaches on cloud architecture is my primary focus. Currently, I'm working with Elucidata as a Bioinformatics Engineer, helping teams scale up using advanced workflow management systems like Nextflow and cloud-based solutions to manage technological resources effectively, thereby cutting the cost and time of providing ML-ready biomedical data. I am also involved in developing web apps using R-Shiny (R programming) and Streamlit (Python). Apart from my full-time job, I volunteer as an application creator at Streamlit, open source lesson maintainer at The Carpentries, technical reviewer at Packt Publications, community member at Data Science Festival London, and beta tester at Coursera. Found my profile interesting? Let's talk!