Nextflow, nf-core, Seqera and more

I am excited to announce that starting next week I will be joining Seqera as a Senior Developer Advocate supporting the Nextflow, nf-core and Seqera Platform user communities. For this role, I'm going to be relocating to the San Francisco Bay Area to help build community within the vibrant biopharma and tech scenes there. As part of introducing myself to the broader community, I wanted to share a little about my journey as a scientist and engineer and why I am so passionate about the powerful open-source and commercial tools that are being built by Seqera.

Complexity of scientific workflows

As a PhD student in the Breaker Lab at Yale, my research focus was building computational pipelines for the discovery of novel noncoding RNA motifs. This involved computationally-intensive sifting through terabytes of bacterial genomes looking for RNA motifs with certain structural homologies and gene associations. As the number of different tools I wanted to incorporate into my research pipelines grew, I quickly began dealing with the complexity of programming large, multi-step pipelines.

While iterating on a complex, multi-step workflow, minor matters like the naming of intermediate files and discrepancies between my laptop and HPC cluster compute environments can quickly become sources of major frustration. I never ended up finding Nextflow during this time period, but I became strongly interested in software engineering best practices. I was convinced there had to be better ways to address these challenges than bash spaghetti with python meatballs.

Image: bash spaghetti with Python meatballs

Moving to biotech, finding Nextflow

For my first role out of academia, I joined ProFound Therapeutics in 2021. ProFound was then a stealth-mode biotech startup in the Flagship Pioneering family of companies. I was their second computational hire, and my first project was to scale certain computational analyses to a massive collection of novel proteins they were studying. Before I started building pipelines, I took time to do a careful assessment of modern tooling for orchestrating complex scientific workflows.

There are a number of excellent bio-specific and general tech tools for building reproducible workflows, but Nextflow stood out to me for two key reasons:

Nextflow had excellent portability, where the same pipeline logic could be executed on a local computer, in AWS Cloud, or in an HPC cluster.
nf-core, a vibrant community of Nextflow users across the globe, was collaborating on a collection of gold-standard, open-source pipelines for all kinds of bioinformatic analyses.

After bringing my analysis of Nextflow's advantages to my computational lead, we decided to make Nextflow a core technology in our platform and were off to the races... sort of.

Building with Seqera Platform

It turned out that there was a lot more to setting up a scalable bioinformatic platform than simply choosing to use Nextflow. We ran into headaches configuring our AWS Batch executors, setting up automations, and training scientists who wanted to analyze their own data on the complexities of AWS.

Luckily, Seqera had recently come onto the scene with a solution that seemed tailor-made for many of our biggest headaches. Seqera was founded by the original developers of Nextflow, and their flagship product was a platform we could deploy into our own AWS account. Not only could Seqera Platform quickly deploy and modify some of the tricky parts of AWS infrastructure, it could also set up automations, provide observability, and offer a user-friendly GUI for non-technical folks to run our Nextflow pipelines.

Thanks to my advocacy and the buy-in of my technical lead, we signed up with Seqera as one of their earliest commercial customers. The rock-solid combination of Nextflow and Seqera ended up fully living up to its promise and more as the Seqera team continued to add game-changing new features to both Nextflow and Seqera Platform over the nearly two years I worked with their team as a customer.

Super-scaling bioinformatics

For my next role after ProFound Tx, I joined GeneDx as a Senior Software Engineer on the bioinformatics platform team. The opportunity at GeneDx appealed to me for two reasons:

With hundreds of patients' genetic sequencing tests passing through bioinformatics pipelines every day, the data quality and reliability of my team's work were going to be critically important from Day 1.
I had the opportunity to act as technical lead for parts of the "Cloudflow" project: a major migration of bioinformatic pipelines from WDL running on-prem to Nextflow running in the cloud.

While the original migration plan involved setting up an in-house Nextflow orchestrator and building a variant-calling pipeline from scratch, we made two changes to the plan not long after I joined:

Instead of building a Nextflow pipeline from scratch, we decided to try configuring the open-source nf-core/sarek pipeline as the starting point for our planned production pipeline.
We started a proof-of-concept agreement with Seqera to explore using Seqera Platform as our pipeline orchestration solution.

With the combination of Seqera Platform and a gold-standard nf-core pipeline to build on, our small Cloudflow project team began delivering on project milestones at an incredible pace. Within a few short months of our change in direction, we had set up scalable bioinformatics infrastructure connected to Seqera Platform in not one, but two public clouds. We also developed proof-of-concept durable automations that handled the entire process, from data coming off the sequencer to processed genomic variants being uploaded to our data portal.

What next?

Given the passion I've developed for Nextflow and Seqera products over the past 3+ years as a user and customer, it was a natural fit for me to join Seqera's community team when a position opened up. While the specifics of my role as developer advocate are still to be determined, here are a few topics that I'm incredibly passionate about and that you'll likely hear me talk about in the coming months:

Highlighting the benefits for biopharma and healthcare companies that choose to build on and contribute to open-source projects like nf-core pipelines.
Expanding the range of scientific disciplines that are choosing to build sharable, reproducible pipelines using Nextflow.
Diversifying the bioinformatics talent and leadership pool by offering training and support to individuals from marginalized and underrepresented communities.
Bringing the power of modern software best practices like continuous integration and continuous deployments to complex data pipelines.

I'm thrilled to be getting more deeply involved in the incredibly vibrant open-source science communities that are built around Nextflow, nf-core, Seqera, and more! I'm also very much looking forward to meeting more of you in person and virtually over the coming months!
