Objectives

Introduced the rule set table (RST) as an interface for incorporating clinician input into data preprocessing (DP) for AI models in nuclear medicine.
Evaluated the impact of RST on the predictive performance of machine learning (ML) models in three different cancer cohorts (glioma, prostate, and diffuse large B-cell lymphoma (DLBCL)).
Demonstrated that RST, when combined with manual DP, improved the balanced accuracy (BACC) of ML models by up to 18% compared to models without RST.

Methodology

Implemented the RST to translate clinician input, expressed as one of four actions (exp-keep, exp-remove, pref-keep, pref-remove), into machine-readable instructions for DP algorithms.
Incorporated commonly used algorithms for DP of clinical cohorts in single and multi-center scenarios.
Used a 100-fold Monte Carlo cross-validation scheme for the single-center cohorts and a dual-center setup for the DLBCL cohort.
Employed the XGBoost algorithm for classification tasks across all established models.
Compared model performance for each RST action, as well as without RST, in both manual and automated (ML-driven data preparation, MLDP) settings for each cohort.
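The way an RST might steer a feature-selection step can be sketched as follows. This is a minimal illustration, not the study's implementation: the four action names come from the summary above, but the semantics assigned to them here (the "exp-" actions as hard constraints, the "pref-" actions as score adjustments), the `Rule` class, the `apply_rst` function, and the `bonus` parameter are all hypothetical.

```python
from dataclasses import dataclass

# Action names are taken from the text; their semantics below are assumptions.
ACTIONS = {"exp-keep", "exp-remove", "pref-keep", "pref-remove"}


@dataclass(frozen=True)
class Rule:
    """One row of a hypothetical rule set table: a feature and an action."""
    feature: str
    action: str

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown RST action: {self.action!r}")


def apply_rst(scores, rules, k, bonus=0.1):
    """Select up to k features from `scores` (feature -> relevance score).

    Assumed semantics (hypothetical, not confirmed by the source):
      exp-keep    -> feature is always selected
      exp-remove  -> feature is never selected
      pref-keep   -> feature's score is raised by `bonus`
      pref-remove -> feature's score is lowered by `bonus`
    """
    action = {r.feature: r.action for r in rules}
    forced = [f for f in scores if action.get(f) == "exp-keep"]

    candidates = {}
    for feature, score in scores.items():
        a = action.get(feature)
        if a in ("exp-keep", "exp-remove"):
            continue  # hard constraints are handled outside the ranking
        if a == "pref-keep":
            score += bonus
        elif a == "pref-remove":
            score -= bonus
        candidates[feature] = score

    # Fill the remaining slots with the best-scoring unconstrained features.
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return forced + ranked[: max(0, k - len(forced))]


# Usage with made-up feature names and scores.
scores = {"a": 0.9, "b": 0.5, "c": 0.45, "d": 0.1}
rules = [Rule("d", "exp-keep"), Rule("a", "exp-remove"), Rule("c", "pref-keep")]
selected = apply_rst(scores, rules, k=2)
```

In this toy run the low-scoring feature `d` is retained because of its hard rule, the top-scoring feature `a` is excluded, and `c` overtakes `b` because of its preference bonus. How the actual DP algorithms weight the preference actions is not stated in the summary.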

Results

Combining manual preprocessing with RST increased the BACC of ML models by up to 18% compared to models without RST.
Models given "exp-keep" and "pref-keep" instructions showed the largest gains over the other models in every dataset: +18% BACC (glioma), +6% BACC (prostate), and +3% BACC (DLBCL).
Specific BACC values for different scenarios are provided in Table 3, along with p-values and confidence intervals in Supplemental Table S3.
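For readers unfamiliar with the metric and the validation scheme behind these figures, the sketch below shows balanced accuracy computed as the mean of per-class recall, together with a generator of repeated random train/test splits in the style of Monte Carlo cross-validation. The function names, the `test_fraction` value, and the seed are illustrative assumptions; the summary specifies only that 100 folds were used for the single-center cohorts.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def monte_carlo_splits(n_samples, n_folds=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) for repeated random splits.

    Unlike k-fold cross-validation, each split is drawn independently,
    so test sets from different folds may overlap. The 20% test fraction
    is an assumed value, not one reported in the source.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_folds):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# A classifier that always predicts the majority class reaches 75% plain
# accuracy on this toy data but only 0.5 balanced accuracy.
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

Averaging `balanced_accuracy` over the test sets of all folds gives a single cross-validated score per configuration, which is the kind of quantity that can then be compared between runs with and without RST.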

Critical Analysis

The study presents a novel approach (RST) to incorporate clinical domain knowledge into the DP process, which is a significant contribution. However, the validation relies on previously identified high-ranking features, limiting the assessment of RST's ability to discover new relevant features.
The study could benefit from a more detailed explanation of how the "pref-keep" and "pref-remove" actions are weighted and prioritized within the DP algorithms. The criteria in Supplemental Table S1 are relatively general.
While the study compares manual DP and MLDP, it would be valuable to investigate the performance of RST with other automated DP methods besides MLDP.
The retrospective nature of the study, using pre-identified features, is acknowledged as a limitation. A prospective study involving clinicians actively providing input through RST would strengthen the findings.