Article Review: Clinician driven automated data preprocessing in nuclear medicine AI environments

Aldo Yang

Objectives

  • Introduced the rule set table (RST), an interface for incorporating clinicians' input into data preprocessing (DP) for AI models in nuclear medicine.
  • Evaluated the impact of RST on the predictive performance of machine learning (ML) models in three cancer cohorts: glioma, prostate cancer, and diffuse large B-cell lymphoma (DLBCL).
  • Demonstrated that RST, when combined with manual DP, improved the balanced accuracy (BACC) of ML models by up to 18% compared to models without RST.
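Because every gain in this review is quoted in balanced accuracy, it helps to recall what the metric measures: BACC averages recall, TP / (TP + FN), over the classes, so a model cannot score well simply by predicting the majority class. Below is a minimal pure-Python sketch; the function is illustrative, and scikit-learn's `balanced_accuracy_score` computes the same quantity with its default settings.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the mean of per-class recall, for any number of classes."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # all samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels: 8 negatives, 2 positives. Always predicting 0 gives
# 80% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

This is why BACC is a sensible choice for clinical cohorts, where class imbalance is common.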

Methodology

  • Implemented the RST to translate clinician input, expressed as one of four actions (exp-keep, exp-remove, pref-keep, pref-remove), into machine-readable instructions for DP algorithms.
  • Incorporated commonly used algorithms for DP of clinical cohorts in single- and multi-center scenarios.
  • Used 100-fold Monte Carlo cross-validation for the single-center cohorts and a dual-center setup for the DLBCL cohort.
  • Used XGBoost as the classifier for all models.
  • For each cohort, compared model performance under each RST action and without RST, in both manual DP and automated DP (ML-driven data preparation, MLDP) settings.
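The paper's exact mechanics are not reproduced in this review, so the following is only a sketch of one way an RST could become machine-readable instructions for a feature-selection step. It assumes that exp-keep and exp-remove act as hard constraints and that pref-keep and pref-remove reweight a data-driven importance score. The `Action` and `Rule` types, `apply_rst`, and the `pref_weight` multiplier are all hypothetical, since how the pref-* actions are weighted is one of the open questions raised below.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXP_KEEP = "exp-keep"        # assumed: feature must always be kept
    EXP_REMOVE = "exp-remove"    # assumed: feature must always be dropped
    PREF_KEEP = "pref-keep"      # assumed: feature is favored in the ranking
    PREF_REMOVE = "pref-remove"  # assumed: feature is penalized in the ranking


@dataclass(frozen=True)
class Rule:
    feature: str
    action: Action


def apply_rst(scores, rules, k, pref_weight=1.5):
    """Pick k features from data-driven importance scores, subject to RST rules.

    scores: dict mapping feature name -> non-negative importance score.
    pref_weight: HYPOTHETICAL multiplier for pref-* rules; the paper does not
    spell out how these actions are weighted.
    """
    actions = {rule.feature: rule.action for rule in rules}
    # Explicit keeps are always selected, regardless of their score.
    keep = [f for f in scores if actions.get(f) is Action.EXP_KEEP]
    candidates = {}
    for feature, score in scores.items():
        action = actions.get(feature)
        if action in (Action.EXP_KEEP, Action.EXP_REMOVE):
            continue  # explicit rules bypass the data-driven ranking
        if action is Action.PREF_KEEP:
            score *= pref_weight
        elif action is Action.PREF_REMOVE:
            score /= pref_weight
        candidates[feature] = score
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    # Fill whatever slots remain after the explicit keeps.
    return keep + ranked[: max(k - len(keep), 0)]


# Toy scores and rules (hypothetical feature names, not from the paper).
scores = {"SUVmax": 0.30, "age": 0.10, "MTV": 0.25, "scanner_id": 0.40, "texture_x": 0.22}
rules = [
    Rule("age", Action.EXP_KEEP),
    Rule("scanner_id", Action.EXP_REMOVE),
    Rule("texture_x", Action.PREF_KEEP),
    Rule("MTV", Action.PREF_REMOVE),
]
print(apply_rst(scores, rules, k=3))  # ['age', 'texture_x', 'SUVmax']
print(apply_rst(scores, [], k=3))     # ['scanner_id', 'SUVmax', 'MTV']
```

With no rules, the purely data-driven ranking picks the toy feature scanner_id; with the rules applied, it is excluded and the clinician-favored features take its place. Treating explicit actions as hard constraints and preferential actions as soft reweighting is one design choice among several.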

Results

  • Combining manual DP with RST increased model performance by up to 18% BACC compared to models without RST.
  • Models using the "exp-keep" and "pref-keep" instructions showed the largest gains across all datasets: +18% BACC (glioma), +6% BACC (prostate), and +3% BACC (DLBCL) relative to the other models.
  • BACC values for each scenario are reported in Table 3 of the paper, with p-values and confidence intervals in Supplemental Table S3.
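Table 3 and Supplemental Table S3 summarize BACC per scenario. For the single-center cohorts those numbers rest on 100-fold Monte Carlo cross-validation, i.e. 100 independent random train/test splits. The sketch below shows that evaluation loop under several assumptions not taken from the paper: a 20% test fraction, a toy threshold classifier standing in for XGBoost, synthetic data, and an empirical percentile interval across folds (the review does not say how the reported confidence intervals were computed). All function names are hypothetical.

```python
import random
import statistics


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def monte_carlo_cv(X, y, fit_predict, n_folds=100, test_fraction=0.2, seed=0):
    """Repeated random train/test splits (Monte Carlo CV); one BACC per fold."""
    rng = random.Random(seed)
    n_test = max(1, round(len(y) * test_fraction))
    fold_scores = []
    for _ in range(n_folds):
        idx = list(range(len(y)))
        rng.shuffle(idx)  # a fresh random split each fold; test sets may overlap
        test, train = idx[:n_test], idx[n_test:]
        y_pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        fold_scores.append(balanced_accuracy([y[i] for i in test], y_pred))
    return fold_scores


def threshold_classifier(X_train, y_train, X_test):
    """Toy stand-in for XGBoost: split one feature at the midpoint of class means."""
    m0 = statistics.mean([x for x, t in zip(X_train, y_train) if t == 0])
    m1 = statistics.mean([x for x, t in zip(X_train, y_train) if t == 1])
    cut = (m0 + m1) / 2
    return [int((x > cut) == (m1 > m0)) for x in X_test]


def summarize(fold_scores):
    """Mean BACC plus an empirical 2.5th to 97.5th percentile interval."""
    cuts = statistics.quantiles(fold_scores, n=40, method="inclusive")
    return statistics.mean(fold_scores), cuts[0], cuts[-1]


# Synthetic single-feature data, 50 samples per class.
data_rng = random.Random(42)
X = [data_rng.gauss(0.0, 1.0) for _ in range(50)] + [data_rng.gauss(2.0, 1.0) for _ in range(50)]
y = [0] * 50 + [1] * 50
fold_scores = monte_carlo_cv(X, y, threshold_classifier, n_folds=100)
mean_bacc, low, high = summarize(fold_scores)
print(f"BACC {mean_bacc:.2f} (95% percentile interval {low:.2f}-{high:.2f})")
```

Unlike k-fold cross-validation, Monte Carlo splits let the number of repetitions be chosen independently of the test fraction, at the cost of test sets overlapping between folds.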

Discussion

  • The study presents a novel approach (RST) for incorporating clinical domain knowledge into DP, which is a significant contribution. However, the validation relies on previously identified high-ranking features, which limits how well it can assess RST's ability to discover new relevant features.
  • The study could benefit from a more detailed explanation of how the "pref-keep" and "pref-remove" actions are weighted and prioritized within the DP algorithms. The criteria in Supplemental Table S1 are relatively general.
  • While the study compares manual DP and MLDP, it would be valuable to test RST with automated DP methods other than MLDP.
  • The retrospective nature of the study, using pre-identified features, is acknowledged as a limitation. A prospective study involving clinicians actively providing input through RST would strengthen the findings.

Reference: Clinician driven automated data preprocessing in nuclear medicine AI environments
