My experience in GA4GH @GSoC '24

Aarav Mehta

In the summer of 2024, guided by Pavel Nikonorov and Alex Kanitz, I took on a substantial technical challenge. I am proud to present the result: the "Extensible GA4GH Client Library/SDK and Command Line Interface implemented in Rust" project.

Over the course of the project, I dug into the details of the GA4GH Task Execution Service (TES) API and built a Rust library for accessing it safely, along with a command line interface so that researchers can easily use these APIs. This project stands as a testament to collaborative effort and the shared vision of advancing scientific discovery through technology.

Click here for the submitted proposal.

Background

The Global Alliance for Genomics and Health, better known as GA4GH, is a "policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework."

ELIXIR coordinates and develops life science resources across Europe so that researchers can more easily find, analyze and share data, exchange expertise, and implement best practices.

ELIXIR Cloud & AAI is a cross-platform initiative of ELIXIR and a Driver Project of the GA4GH that develops services towards establishing a federated cloud computing network that enables the analysis of population-scale genomic and phenotypic data across participating, international nodes.

ELIXIR Cloud Components are the Web Components developed and managed by the ELIXIR Cloud & AAI community.

Motivation

The motivation behind this project was to bridge the gap between the vast array of tools available in the genomics and health domain and the end-users who could benefit from them.

  • Development of the Generic GA4GH SDK Library: My project aimed to simplify the use of GA4GH services for researchers and developers by creating a Rust-based SDK and CLI that provide safe, efficient API access. Rust's speed and memory-safety guarantees, together with its suitability for confidential computing, made it a good fit for this project.

  • Creation of a Command Line Interface to access the library: The CLI is mainly intended for researchers, so that they can use the APIs without worrying about the details of each API or about how to send and access data securely.

Designing the System Architecture

In May, I had many meetings with my mentors to decide on the architecture of the GA4GH SDK/CLI project.

The first step was to work out the architecture of the library: which structs it would contain and how they would interact. This is the architecture we finally settled on. There are two versions here: one is a low-level view of how the different structs interact, while the other is a high-level view of how the libraries interact.

For the CLI, we decided to use Rust as well, since it is a robust language and a Rust CLI is easier to integrate with a Rust library.

Core Components and Features

The core components of this project are:

  • Core Library / SDK: a modular, extensible Rust-based library for interacting with
    GA4GH services including DRS, TRS, TES, WES, and Passports.

  • Multiple API Versions, supported (experimentally) through automatic code generation from OpenAPI specs.

  • Cross-Language Bindings for various programming languages (e.g., Python, Go, C, Java).

  • Confidential Computing Support for hardware-protected data privacy.

  • Command Line Interface (CLI): a user-friendly tool to interface with GA4GH services.

Pull Requests

Once the system architecture was settled, we started writing the code for the library. The work was divided into several PRs:

  1. feat: an initial client implementation of GA4GH ServiceInfo and TES APIs as a library

    • Description: This was the first PR I opened during my GSoC period, after creating an initial version of the GA4GH-SDK library. Alex later suggested splitting it into multiple PRs, so after making many of the suggested changes, I closed this PR and split the work into smaller PRs.
  2. chore: add .gitignore : The .gitignore file was added in this PR.

  3. docs: add README : The README file was added in this PR.

  4. chore: add Rust build system : The Cargo.toml file was added in this PR.

  5. feat: add script to generate OAI models

    • Description: This PR introduces a script that automates the generation of Rust models from OpenAPI specifications, so that the structs needed by the main functions are built.
  6. feat(serviceinfo,tes): add models

    • Description: This PR adds all the autogenerated models created by running the script from #27. It doesn't contain any manually written code; everything is autogenerated.
  7. feat: add configuration & transport structs

    • Description: This pull request introduces two new structs: Configuration for storing API request details and Transport for making HTTP requests using the reqwest library. It also includes unit tests for the Transport struct.
  8. feat(serviceinfo): add struct

    • Description: This pull request introduces the ServiceInfo struct, which queries any implementation of a GA4GH API and returns its service details.
  9. feat(tes): add struct

    • Description: This pull request introduces the TES struct, with task management methods (create, list, get, delete, and get the status of tasks) that use the TES API, along with the corresponding unit tests and integration tests against Funnel.
  10. ci: workflows for local runs and GH actions

    • Description: This pull request introduced initial continuous integration workflows for local runs and GitHub CI/CD, but it was later closed in favor of the cookiecutter templates developed by Javed in ci: add workflows.
  11. feat(cli): add TES support

    • Description: This pull request adds a new CLI tool for TES task management, with commands for creating, listing, retrieving, checking the status of, and canceling tasks, along with usage documentation.
  12. feat(cli): add config

    • Description: This pull request adds the ability to read configuration from a JSON file and updates the documentation with usage examples for the CLI tool.
  13. python bindings

    • Description: This branch adds Python bindings for the entire Rust library created so far, so that users can access the library from Python as well.
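
To make the layering described in "feat: add configuration & transport structs" and "feat(tes): add struct" more concrete, here is a minimal sketch of how a Configuration, a Transport, and a TES wrapper could fit together. It is not the actual ga4gh-sdk code: the field names, the Request struct, and every method signature are assumptions made for illustration. The sketch only builds request descriptions instead of sending them with reqwest, so that it stays dependency-free. The endpoint paths (tasks, tasks/{id}, tasks/{id}:cancel) follow the GA4GH TES specification.

```rust
// Hypothetical sketch: names and signatures are illustrative assumptions,
// not the real ga4gh-sdk API.

/// Connection details for one service endpoint.
#[derive(Debug, Clone)]
pub struct Configuration {
    pub base_url: String,
    pub bearer_token: Option<String>,
}

impl Configuration {
    /// Stores the base URL without a trailing slash.
    pub fn new(base_url: &str) -> Self {
        Self {
            base_url: base_url.trim_end_matches('/').to_string(),
            bearer_token: None,
        }
    }

    /// Attaches a bearer token that will be sent with every request.
    pub fn with_token(mut self, token: &str) -> Self {
        self.bearer_token = Some(token.to_string());
        self
    }
}

/// A request described as plain data. A real transport would hand this
/// to an HTTP client (the project uses reqwest) instead of returning it.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub method: &'static str,
    pub url: String,
    pub headers: Vec<(String, String)>,
}

/// Turns a method and a relative path into a full request.
pub struct Transport {
    config: Configuration,
}

impl Transport {
    pub fn new(config: Configuration) -> Self {
        Self { config }
    }

    pub fn build(&self, method: &'static str, path: &str) -> Request {
        let url = format!("{}/{}", self.config.base_url, path.trim_start_matches('/'));
        let mut headers = vec![("Accept".to_string(), "application/json".to_string())];
        if let Some(token) = &self.config.bearer_token {
            headers.push(("Authorization".to_string(), format!("Bearer {}", token)));
        }
        Request { method, url, headers }
    }
}

/// Thin TES wrapper: each method maps to one TES endpoint.
pub struct Tes {
    transport: Transport,
}

impl Tes {
    pub fn new(config: Configuration) -> Self {
        Self { transport: Transport::new(config) }
    }

    pub fn create_task(&self) -> Request {
        self.transport.build("POST", "tasks")
    }

    pub fn list_tasks(&self) -> Request {
        self.transport.build("GET", "tasks")
    }

    pub fn get_task(&self, id: &str) -> Request {
        self.transport.build("GET", &format!("tasks/{}", id))
    }

    pub fn cancel_task(&self, id: &str) -> Request {
        self.transport.build("POST", &format!("tasks/{}:cancel", id))
    }
}
```

For example, Tes::new(Configuration::new("http://localhost:8000/ga4gh/tes/v1")).get_task("abc") describes a GET request to http://localhost:8000/ga4gh/tes/v1/tasks/abc. The host, port, and path prefix here are placeholders. Keeping the URL and header logic in one Transport means the other GA4GH services could reuse it unchanged.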

Live Demo

Here is a live demo of the GA4GH-CLI:

Next Steps and Future Contributions

  • The SDK and CLI are complete for the TES and ServiceInfo APIs.

  • Version 0.1.0 is available here: https://github.com/elixir-cloud-aai/ga4gh-sdk

  • We are looking for contributors to integrate the TRS, DRS and WES APIs as well.

  • The Python bindings are complete, and we are also looking for contributors willing to add bindings for other languages.

  • This project will be presented at the ELIXIR BioHackathon by Alex and Pavel, to showcase the confidential computing capabilities of GENXT:
    BioHack Cloud stack: ga4gh-sdk, aTLS lib, TES-crypt4gh-middleware
    https://github.com/elixir-europe/biohackathon-projects-2024/blob/main/30.md

Presenting the work

  1. My Final Report

  2. I spoke about my project at the session, “Towards GA4GH-powered SPEs/TREs”. Here is what I presented: slides, live demo

  3. I will also speak at ELIXIR Google Summer of Code 2024 Final presentations.

Outlook

I achieved all the primary objectives outlined in the proposal. The project's scope and timeframe did not allow for integrating APIs other than TES, but the basic outline for the other libraries is in place, and they can follow the same architecture as TES (with the exception of AAI). All the other goals were met, including unit tests, integration tests, and documentation. The stretch goal of language bindings is also almost complete. Overall, I am very happy with the work done, and I am excited to see how this project evolves, how GENXT integrates confidential computing into it, and how researchers use it.

Building on the project's foundation, I think the following are the most important next steps:

  • Generalizing the integration tests so that they also run against TES implementations other than Funnel.

  • Adding the remaining GA4GH APIs to the project.
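
One way to approach the first point is to write each integration check once against a small trait and then run it against every backend. The sketch below only illustrates that idea: the TesBackend trait, the InMemoryBackend stand-in, and the check function are hypothetical names that do not exist in ga4gh-sdk, and the State enum lists only a subset of the task states defined by the TES specification. A real suite would implement the trait with HTTP calls to Funnel or another TES server, and would poll for the final state, since cancellation is not necessarily immediate.

```rust
use std::collections::HashMap;

/// Subset of TES task states, enough for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Queued,
    Running,
    Complete,
    Canceled,
}

/// Hypothetical abstraction over a TES implementation under test.
pub trait TesBackend {
    fn create_task(&mut self, name: &str) -> String;
    fn get_state(&self, id: &str) -> Option<State>;
    fn cancel_task(&mut self, id: &str) -> bool;
}

/// In-memory stand-in so the check can run without a server.
pub struct InMemoryBackend {
    next_id: u32,
    tasks: HashMap<String, State>,
}

impl InMemoryBackend {
    pub fn new() -> Self {
        Self { next_id: 0, tasks: HashMap::new() }
    }
}

impl TesBackend for InMemoryBackend {
    fn create_task(&mut self, _name: &str) -> String {
        self.next_id += 1;
        let id = format!("task-{}", self.next_id);
        self.tasks.insert(id.clone(), State::Queued);
        id
    }

    fn get_state(&self, id: &str) -> Option<State> {
        self.tasks.get(id).copied()
    }

    fn cancel_task(&mut self, id: &str) -> bool {
        match self.tasks.get_mut(id) {
            Some(state) => {
                *state = State::Canceled;
                true
            }
            None => false,
        }
    }
}

/// Backend-agnostic check: a created task must be visible and cancelable.
pub fn check_create_then_cancel<B: TesBackend>(backend: &mut B) -> Result<(), String> {
    let id = backend.create_task("hello-world");
    if backend.get_state(&id).is_none() {
        return Err(format!("task {} not found after creation", id));
    }
    if !backend.cancel_task(&id) {
        return Err(format!("task {} could not be canceled", id));
    }
    match backend.get_state(&id) {
        Some(State::Canceled) => Ok(()),
        other => Err(format!("expected Canceled, got {:?}", other)),
    }
}
```

With this shape, supporting another TES server in the test suite would mean adding one more TesBackend implementation, while check_create_then_cancel and any similar checks stay unchanged.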

Acknowledgment

I would like to express my special thanks to my mentors, Pavel Nikonorov and Alex Kanitz, who gave me the golden opportunity to work on the project "Extensible GA4GH Client Library/SDK and Command Line Interface implemented in Rust". Their invaluable guidance, feedback, and support throughout this journey were instrumental, and taught me a lot about how to work in an organization and about GA4GH and GENXT. Their expertise and insights were pivotal in shaping the direction and outcomes of this project.

I also want to thank Ankur Patil, Ishan Pandhare and Shishir Kushwaha for motivating me to go ahead and do GSoC in March.

I also want to extend my appreciation to the GSoC program for providing me with this remarkable opportunity to contribute to a project with such profound implications for the future of genomics and health.
