Personal highlights from Nextflow Summit: Boston 2024

Ken Brewer

I had the pleasure of attending the Nextflow Summit: Boston 2024 last week. It was a fantastic experience catching up with friends in the Boston bioinformatics community and finally meeting in person some of my collaborators from the nf-core community, whom I mostly know only by their GitHub handles. There were a number of excellent talks, but here are some key details and themes that stood out to me.

1) Nextflow adoption and impact is accelerating

In Evan Floden's welcome address, he shared some convincing data from telemetry and other sources showing that Nextflow is the fastest-growing workflow management system for biology. One piece of data came from a recent bioRxiv pre-print by the EuroFAANG and nf-core teams:

This chart is exciting to me for two reasons:

  1. Unlike download stats alone, scientific publications likely represent successful scientific output.

  2. Publications are likely a lagging indicator. If the trend in Nextflow citations from 2022 to 2023 holds, we may be seeing a transformational shift in the adoption of containerized workflows for science.

To me, it seems like Nextflow's foundational hypothesis that better software enables better science is increasingly being validated by the numbers.

2) Imaging analysis has a strong and growing presence in the nf-core community

Although the early years of Nextflow saw the strongest adoption in processing next-generation sequencing (NGS) data, a growing range of scientific disciplines has been taking advantage of the utility it provides. As someone with a strong scientific interest in image analysis (see pycytominer), I was thrilled to see several talks focused specifically on new Nextflow pipelines designed for large-scale processing of various forms of imaging data. The pipelines demoed covered multiple imaging data modalities.

The recently launched combinatorial fluorescence in-situ hybridization analysis pipeline nf-core/molkart also had a prominent role in Seqera Labs' demo of their new Data Studios feature:

Rob Syme from Seqera demoing nf-core/molkart in Data Studios

3) Multiple companies are improving their Nextflow infrastructure offerings

As expected, Seqera Labs announced and demoed several new features for Seqera Platform, including the previously mentioned Data Studios. One exciting free-to-everyone feature is Seqera Containers, which provides a dead-simple interface for building on-demand, multi-architecture Docker images for any combination of PyPI and Conda packages.

Beyond that, I was excited to see other companies expanding their Nextflow infrastructure tooling.

MemVerge discussed their Memory Machine Cloud offering, which can act as a control plane for launching Nextflow pipelines. It has the very clever capability of pausing a job mid-processing on a spot instance and resuming it on a different instance when the spot allocation is reclaimed. MemVerge also discussed their efforts to address the I/O bottleneck of many data-intensive bioinformatics pipelines through their work with the open-source JuiceFS file system.

Rescale announced their support for an executor plugin that allows Nextflow to orchestrate tightly coupled jobs on Rescale's cloud-based High-Performance Computing (HPC) or High-Throughput Computing (HTC) platforms.

Finally, Colby Ford from Tuple discussed Ahab, which can manage Nextflow, Snakemake, WDL, and CWL-based workflows in Kubernetes clusters deployed in Azure.

Conclusions

It's clear that the Nextflow ecosystem is rapidly evolving, with growing adoption, expanding use cases in imaging analysis, and significant advancements in infrastructure tooling. It's an exciting time to be part of the Nextflow community!

Changelog:

  • 2024-06-14 - Updated cover image to picture of me meeting up with nf-core collaborator Maxime Garcia.

Written by

Ken Brewer

Passionate about computational biology, bioinformatics, software engineering, machine learning and data science. Currently building data pipelines that can scale for the future of genetic medicine at GeneDx.