Comparing OpenEvidence and ChatGPT: Evaluating Adherence to NCCN Melanoma Guidelines for Staging, Workup, and Treatment Options

Christina Bear · 26 min read

How we compared ChatGPT and OpenEvidence on their ability to provide accurate, guideline-based medical advice for melanoma staging, work-up, and treatment.

Introduction

Large language models (LLMs) such as ChatGPT (GPT-4) are increasingly being explored as tools to assist clinicians by generating evidence-based answers to medical questions, including staging, workup, and treatment recommendations. However, while LLMs excel at producing fluent, contextually appropriate responses, they may sometimes generate outdated or incorrect information if not carefully aligned to current guidelines. In contrast, tools like OpenEvidence are explicitly designed to cite and link to the latest medical literature, improving transparency and traceability, but they can be slower or less conversational in their output.

In this post, we evaluate two leading AI-based clinical tools, ChatGPT (GPT-4) and OpenEvidence, on their ability to provide accurate and up-to-date melanoma staging, workup, and treatment recommendations according to the latest clinical guidelines. By comparing the strengths and limitations of each approach, we aim to highlight how these technologies perform in a guideline-driven clinical context.

What We're Testing

The AI Models

  • ChatGPT (GPT-4): A general-purpose AI assistant that can answer questions on any topic

  • OpenEvidence: A specialized AI system designed specifically for medical knowledge and evidence-based medicine

The Question

Can these AI systems provide accurate, up-to-date medical advice that matches what expert physicians would recommend based on the latest clinical guidelines?

Methods

How We Tested the AI Systems

The Clinical Scenarios

We gave both AI systems the same three medical scenarios and asked them to provide staging and treatment recommendations. For each clinical scenario, we generated three independent responses (runs) from each model. This approach allowed us to measure not only the average performance but also the consistency and robustness of each model across multiple attempts. Here's an example of one of the prompts we used:

Actual Prompt Used:

  • Note: “Melanoma scenario” changed for each prompt

You are a dermatologist.

Respond in EXACTLY three sections, each beginning with the Markdown H2 header shown (## Stage, ## Work-up, ## Treatment). No other headers.

─────────────────────────────────────
Melanoma scenario
• Pathology: 1.4 mm Breslow thickness, non-ulcerated superficial-spreading melanoma on the anterior calf  
• Lymph nodes: no palpable inguinal or popliteal nodes  
• Imaging: none yet
─────────────────────────────────────

1. Stage – Provide the TNM designation and overall stage.  
2. Work-up – Summarize the recommended next diagnostic or staging steps.  
3. Treatment – Summarize the initial management you would recommend.

Cite your sources for each section, if possible, using references or links. Cite all sources inline in each section using raw identifiers only — write out full URLs, DOIs (doi:...), PMIDs (PMID: ...), or PMCs (PMC: ...), not hyperlinks or plain text.

Example Response from ChatGPT to above prompt:

Stage
The TNM classification for this melanoma scenario is as follows: T2a N0 M0. The "T2" indicates a Breslow thickness of more than 1.0 mm but not more than 2.0 mm, and "a" signifies that it is non-ulcerated. "N0" implies no regional lymph node metastasis, and "M0" indicates no distant metastasis. According to the American Joint Committee on Cancer (AJCC) 8th Edition Cancer Staging Manual, this corresponds to a Stage IB melanoma (doi:10.1002/cncr.36683).

Work-up
For a Stage IB melanoma, additional work-up may include a sentinel lymph node biopsy (SLNB) to evaluate the presence of metastasis in the regional lymph nodes, as the SLNB is the standard of care for melanomas with a Breslow thickness greater than 0.8 mm or with other high-risk features (doi:10.1200/JCO.2018.78.2005). Advanced imaging studies, such as PET/CT or MRI, are not routinely recommended unless there are clinical indications of metastasis (PMID: 31675400).

Treatment
Initial management for a Stage IB melanoma typically involves a wide local excision of the primary lesion with 1-2 cm margins, depending on the anatomical location and surgical considerations, to ensure complete removal of the melanoma (PMID: 29116527). If the SLNB is positive, further management may include a completion lymph node dissection or consideration of adjuvant therapy with immune checkpoint inhibitors such as pembrolizumab in certain cases (doi:10.1056/NEJMoa1709030). Regular follow-up for surveillance is also recommended to monitor for recurrence or new primary melanomas (PMID: 31675400).
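To make the multi-run setup concrete, here is a minimal sketch of how the repeated ChatGPT runs could be collected with the OpenAI Python SDK. The helper name and model string are illustrative, not the exact pipeline code; OpenEvidence responses were gathered separately through its own interface.

```python
# Illustrative sketch (not the exact pipeline code): send the same scenario
# prompt several times and keep each response as an independent run.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def collect_runs(scenario_prompt: str, n_runs: int = 3, model: str = "gpt-4") -> list[str]:
    """Return n_runs independent completions for one melanoma scenario prompt."""
    runs = []
    for _ in range(n_runs):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": scenario_prompt}],
        )
        runs.append(resp.choices[0].message.content)
    return runs
```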

The Gold Standard

For each scenario, we compared the AI responses against the official NCCN Clinical Practice Guidelines for Melanoma (v2.2024). These guidelines represent the consensus of expert physicians and are considered the standard of care.

Here are the exact gold standards we used for each scenario:

Stage 0/IA Melanoma (Melanoma in situ)

Stage 0: Melanoma in situ
Stage IA: Tumor <0.8 mm thick, no ulceration

Work-up
History and physical examination (H&P)
Routine imaging and laboratory tests not recommended
Imaging only if needed to evaluate specific signs or symptoms

Treatment
Wide excision (category 1 for stage IA)
Proceed to follow-up (ME-10)

Stage IB (T2a) Melanoma

Stage IB (T2a):
T2a: Tumor >1.0–2.0 mm thick without ulceration (Stage IB)

Work-up
History and physical examination (H&P)
Baseline imaging and laboratory tests not recommended, unless:
Needed for surgical planning
Prior to systemic treatment discussion/initiation
Imaging if needed to evaluate specific signs or symptoms
Discuss and offer sentinel node biopsy (SLNB)

Treatment
Wide excision (category 1)
Either without SLNB
Or with SLNB
If sentinel node negative →
Clinical trial for stage II
Or observation (ME-11)
Then proceed to follow-up (ME-10 and ME-11)
If sentinel node positive → proceed to Stage III workup and treatment (ME-5)

Stage II (T2b or higher) Melanoma

Stage II (T2b or higher):
T2b or higher: Tumor >1.0–2.0 mm thick with ulceration (T2b), or any thicker tumor

Work-up
History and physical examination (H&P)
Baseline imaging and laboratory tests not recommended, unless:
Needed for surgical planning
Prior to systemic treatment discussion/initiation
Imaging if needed to evaluate specific signs or symptoms
Discuss and offer sentinel node biopsy (SLNB)

Treatment
Wide excision (category 1)
Either without SLNB
Or with SLNB
If sentinel node negative →
Clinical trial for stage II
Or observation (ME-11)
Or for pathological stage IIB or IIC:
Pembrolizumab (category 1)
Nivolumab (category 1)
+/- primary tumor site radiation therapy (category 2B)
Then proceed to follow-up (ME-10 and ME-11)
If sentinel node positive → proceed to Stage III workup and treatment (ME-5)

How We Evaluated the Responses

1. Similarity Metrics (How Close to the Gold Standard?)

We used three different ways to measure how similar the AI responses were to the expert guidelines (a minimal computation sketch follows the list):

  • SBERT Similarity: Measures how similar the meaning is between answers (ignores exact words)

  • ROUGE Similarity: Measures how many words/phrases overlap between answers

  • BLEU Similarity: Measures exact word matching (very strict - low scores are normal)
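As promised above, here is a minimal sketch of how these three metrics can be computed against the gold standard. The specific libraries (sentence-transformers, rouge-score, NLTK), the SBERT model name, and the choice of ROUGE-L are assumptions for illustration, not necessarily the exact configuration we used.

```python
# Illustrative metric computation: SBERT cosine similarity, ROUGE-L, and BLEU.
from sentence_transformers import SentenceTransformer, util
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_sbert = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def similarity_scores(model_answer: str, gold_standard: str) -> dict:
    # SBERT: cosine similarity of sentence embeddings (meaning-level match)
    emb = _sbert.encode([model_answer, gold_standard], convert_to_tensor=True)
    sbert = util.cos_sim(emb[0], emb[1]).item()

    # ROUGE-L: longest-common-subsequence overlap (word/phrase-level match)
    rouge = rouge_scorer.RougeScorer(["rougeL"]).score(
        gold_standard, model_answer)["rougeL"].fmeasure

    # BLEU: exact n-gram matching (very strict, so low scores are expected)
    bleu = sentence_bleu([gold_standard.split()], model_answer.split(),
                         smoothing_function=SmoothingFunction().method1)
    return {"sbert": sbert, "rouge": rouge, "bleu": bleu}
```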

2. AI Physician Grading (Medical Expert Evaluation)

We created an AI "physician grader" that evaluates responses the way a clinician would. Its scores are meant to be compared against the other grading metrics and to support manual review by human physicians, which serves as the ultimate gold standard. Here's how it works:

What is a System Prompt?

Think of a system prompt as the "job description" we give to an AI. It tells the AI what role to play and how to behave. In our case, we told the AI: "You are a dermatologist and expert in melanoma. Grade this answer against the gold standard."

The Grading System

Our AI physician grader evaluates each response on two main categories:

Medical Accuracy (0-6 points total):

  • Stage: Is the cancer staging correct? (0-2 points)

  • Workup: Are the recommended tests appropriate? (0-2 points)

  • Treatment: Is the treatment plan correct? (0-2 points)

Communication Quality (0-10 points total):

  • Accuracy: Are the medical facts correct? (0-2 points)

  • Relevance: Does it answer the specific question? (0-2 points)

  • Depth: Is there enough detail? (0-2 points)

  • Clarity: Is it well-written and clear? (0-2 points)

  • Completeness: Does it cover everything needed? (0-2 points)

Total Score: 0-16 points (medical accuracy + communication quality)

3. Citation Analysis and Validity Checks

To evaluate the reliability and recency of the references provided by each model, we performed a detailed citation analysis for every answer. Our process, sketched in code after this list, included:

  • Extraction: All citations (DOIs, PMIDs, URLs) were extracted from each model's output for every run and prompt variant. Duplicate citations within runs were counted only once per model answer.

  • Validation: Each citation was checked for validity by attempting to resolve it via official registries (CrossRef, PubMed, or direct URL access). Citations that did not resolve or were not found in the registry were marked as invalid.

  • Recency: For valid citations, we extracted the publication year. Citations from before 2021 were flagged as 'old' to assess whether models referenced up-to-date literature.
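Here is the simplified sketch referenced above. It assumes the public CrossRef REST API and NCBI E-utilities endpoints; the regexes and error handling are stripped down for illustration.

```python
# Illustrative citation extraction and validation against public registries.
import re
import requests

DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\)\]]+")
PMID_RE = re.compile(r"PMID:?\s*(\d{6,9})", re.IGNORECASE)

def extract_citations(answer: str) -> dict:
    """Pull unique DOIs and PMIDs out of one model answer."""
    return {
        "dois": sorted({m.rstrip(".,;") for m in DOI_RE.findall(answer)}),
        "pmids": sorted(set(PMID_RE.findall(answer))),
    }

def check_doi(doi: str) -> tuple[bool, int | None]:
    """Resolve a DOI via CrossRef; return (is_valid, publication_year)."""
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if r.status_code != 200:
        return False, None
    date_parts = r.json()["message"].get("issued", {}).get("date-parts", [[None]])
    return True, date_parts[0][0]

def check_pmid(pmid: str) -> bool:
    """Check that a PMID exists via the NCBI E-utilities esummary endpoint."""
    r = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi",
        params={"db": "pubmed", "id": pmid, "retmode": "json"},
        timeout=10,
    )
    result = r.json().get("result", {}) if r.status_code == 200 else {}
    return pmid in result and "error" not in result.get(pmid, {})

def is_old(year: int | None, cutoff: int = 2021) -> bool:
    # Citations published before the cutoff year are flagged as 'old'
    return year is not None and year < cutoff
```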

The Complete Results

Here's the full comparison table showing how both AI systems performed across all metrics:

| Metric | GPT-4 | OpenEvidence | Winner | What This Measures |
|---|---|---|---|---|
| SBERT Similarity | 0.717 ± 0.029 | 0.709 ± 0.040 | GPT-4 | How similar the meaning is |
| ROUGE Similarity | 0.156 ± 0.020 | 0.177 ± 0.020 | OpenEvidence | How much text overlaps |
| BLEU Similarity | 0.009 ± 0.003 | 0.018 ± 0.009 | OpenEvidence | Exact word matching |
| LLM Score (0-16) | 11.667 ± 1.414 | 12.889 ± 2.315 | OpenEvidence | Overall physician evaluation |
| LLM Normalized (0-1) | 0.729 ± 0.088 | 0.806 ± 0.145 | OpenEvidence | Physician score scaled to 0-1 |
| Section Score (0-6) | 4.222 ± 0.441 | 4.667 ± 0.866 | OpenEvidence | Medical accuracy only |
| Global Score (0-10) | 7.444 ± 1.014 | 8.222 ± 1.481 | OpenEvidence | Communication quality only |

Citation Results

| Model | Valid | Invalid | Old (<2021) |
|---|---|---|---|
| ChatGPT | 18/22 (81.8%) | 4/22 (18.2%) | 16/22 (72.7%) |
| OpenEvidence | 121/121 (100.0%) | 0/121 (0.0%) | 62/121 (51.2%) |
  • Valid: Citation resolves to a real reference (DOI/PMID/URL)

  • Invalid: Citation does not resolve, is fake, or not in the official registry (CrossRef/PubMed)

  • Old: Citation is from before 2021.

  • Note: see invalid citations below in supplemental material

Key Findings

OpenEvidence Wins 6 Out of 7 Metrics (85.7% Win Rate)

GPT-4 wins only 1 metric:

  • SBERT Similarity (0.717 vs 0.709) - semantic similarity

OpenEvidence wins:

  • ROUGE Similarity (0.177 vs 0.156) - text overlap

  • BLEU Similarity (0.018 vs 0.009) - exact word matching

  • LLM Score (12.9 vs 11.7) - physician evaluation

  • LLM Normalized (0.81 vs 0.73) - scaled physician evaluation

  • Section Score (4.7 vs 4.2) - medical accuracy

  • Global Score (8.2 vs 7.4) - communication quality

Key findings from the variability results (a small aggregation sketch follows this list):

  • OpenEvidence's LLM scores show higher variability for some prompts, indicating it sometimes gives a wider range of answer quality.

  • ChatGPT's scores are generally more consistent (lower standard deviation), but its average scores are often lower than OpenEvidence's.

  • For the easiest prompt (stage_0_ia), both models are highly consistent (very low standard deviation).

  • For more complex prompts (stage_ib_t2a, stage_ii_t2b_or_higher), OpenEvidence's higher mean is sometimes accompanied by higher variability, suggesting it occasionally produces both very strong and weaker answers.

  • Overall: OpenEvidence is more likely to produce top scores, but with a bit more spread; ChatGPT is steadier, but less likely to hit the highest marks.
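For reference, here is a small sketch of the aggregation behind these mean ± SD observations; the input layout is an assumption for illustration, not the exact pipeline code.

```python
# Illustrative aggregation: mean and standard deviation of LLM scores per
# (model, prompt) group across runs.
from collections import defaultdict
from statistics import mean, stdev

def summarize(scores: list[dict]) -> dict:
    """scores: e.g. [{"model": "ChatGPT", "prompt": "stage_ib_t2a", "llm_score": 11}, ...]"""
    grouped = defaultdict(list)
    for s in scores:
        grouped[(s["model"], s["prompt"])].append(s["llm_score"])
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in grouped.items()
    }
```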

Key findings from citation analysis:

  • OpenEvidence consistently provided valid citations: Across all runs and prompt variants, OpenEvidence produced only valid citations, with none found to be hallucinated or broken. This demonstrates the strength of specialized medical LLMs in evidence-based referencing.

  • ChatGPT occasionally hallucinated or provided broken citations: Several citations from ChatGPT did not resolve or were not found in official registries, highlighting a known limitation of general-purpose LLMs in generating reliable references.

  • Recency gap: Both models frequently cited older literature, but OpenEvidence had a higher proportion of recent (post-2020) references compared to ChatGPT.

What This Means

  1. OpenEvidence is more accurate - It provides more medically correct information

  2. OpenEvidence is more complete - It covers more of the required details

  3. OpenEvidence is clearer - It communicates medical information better

  4. OpenEvidence can give the best answers, but is less consistent - ChatGPT is more consistent, but rarely the most accurate

  5. OpenEvidence cites more extensively - It includes more references, which are usually valid, though sometimes slightly older

  6. Overall, specialized medical LLMs work better for this task - A system designed for medicine outperforms a general-purpose LLM

Conclusions

Key Insights

  • In straightforward melanoma clinical scenarios, the specialized LLM outperformed the general-purpose one. In our evaluation of clear-cut clinical scenarios involving melanoma staging, work-up, and treatment recommendations, OpenEvidence, a tool designed for evidence-based medicine, produced more accurate, complete, and guideline-concordant answers than the general-purpose LLM across most metrics.

  • Guideline adherence can be systematically assessed. Using a structured evaluation pipeline that combined semantic similarity, physician-style grading, and human validation allowed us to measure how closely LLM responses followed the NCCN melanoma guidelines.

  • Human oversight remains necessary. Even in these straightforward melanoma cases, both LLMs occasionally omitted important details or introduced minor inaccuracies, showing that expert review is still essential.

  • Evaluation frameworks are valuable for benchmarking. Our structured, multi-metric approach demonstrates that with appropriate tools and benchmarks, it is possible to meaningfully compare LLMs for specific clinical tasks such as interpreting melanoma guidelines.

Limitations

  • Focused on melanoma staging scenarios. Our evaluation was limited to straightforward melanoma cases and may not generalize to other skin cancers or more complex situations.

  • Emphasized guideline adherence over outcomes. We assessed how well LLMs followed guidelines but did not evaluate whether their recommendations would improve patient outcomes.

  • Did not measure clinical impact. The study did not test the effect of using LLMs in real clinical settings.

  • Single evaluation per model. Each LLM was tested on a single set of runs (three per scenario), which may not capture the full variability in performance.

Future Work

  • Broaden to other skin cancers and more complex/multi-step clinical scenarios.

  • Study clinical utility. Research should measure how LLM recommendations affect care quality, safety, and efficiency in practice.

  • Develop quality control tools. Building automated checks for LLM outputs could help maintain accuracy and reliability at scale.


The Technical Details (For Those Who Want to Dig Deeper)

The AI Grader

How We Built the AI Physician Grader

We used OpenAI's GPT-4 to create an AI "physician" that evaluates medical responses. Here's the system prompt we used:

You are a dermatologist and expert in melanoma. You will GRADE a model answer against the following gold standard excerpt from the most recent NCCN melanoma guidelines. Base your evaluation strictly on this reference.

GRADE the model answer on the following:

Section Accuracy (score each 0, 1, or 2 - WHOLE NUMBERS ONLY):
- Stage: 2 = fully correct, 1 = minor error/omission, 0 = major error/omission
- Workup: 2 = fully correct, 1 = minor error/omission, 0 = major error/omission
- Treatment: 2 = fully correct, 1 = minor error/omission, 0 = major error/omission

Global Criteria (score each 0, 1, or 2 - WHOLE NUMBERS ONLY):
1. ACCURACY: Is the medical information factually correct?
2. RELEVANCE: Does the answer address the specific question asked?
3. DEPTH: Does the answer provide sufficient detail and explanation?
4. CLARITY: Is the answer clearly written and easy to understand?
5. COMPLETENESS: Does the answer cover all necessary aspects of the question?

For each issue, be as specific as possible about what is incorrect, missing, or unclear.

Structured Outputs Explained

Instead of having the AI write free-form text like "This answer is pretty good but missing some details," we made it fill out a specific form with exact scores and specific issues. This ensures consistent, comparable results.

The AI returns graded results in this exact format:

{
  "section_accuracy": {
    "Stage": 2,
    "Workup": 1, 
    "Treatment": 2
  },
  "global": {
    "Accuracy": 2,
    "Relevance": 2,
    "Depth": 1,
    "Clarity": 2,
    "Completeness": 1
  },
  "issues": [
    "Treatment section omits specific adjuvant therapy options",
    "Workup section suggests unnecessary imaging"
  ]
}
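For completeness, here is an illustrative sketch of how such a grading call could be wired up with the OpenAI Python SDK and how the aggregate scores in the results tables are derived from the returned JSON. The model name and helper function are assumptions, not the exact pipeline code.

```python
# Illustrative grading call: system prompt + gold standard + model answer in,
# structured JSON scores out.
import json
from openai import OpenAI

client = OpenAI()

def grade_answer(grader_prompt: str, gold_standard: str, model_answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed grader model
        messages=[
            {"role": "system", "content": grader_prompt},
            {"role": "user", "content": (
                f"GOLD STANDARD:\n{gold_standard}\n\n"
                f"MODEL ANSWER:\n{model_answer}\n\n"
                "Return your grades as JSON only."
            )},
        ],
        response_format={"type": "json_object"},  # force structured JSON output
    )
    graded = json.loads(resp.choices[0].message.content)

    # Derive the aggregate scores reported in the results tables
    graded["section_score"] = sum(graded["section_accuracy"].values())      # 0-6
    graded["global_score"] = sum(graded["global"].values())                 # 0-10
    graded["llm_score"] = graded["section_score"] + graded["global_score"]  # 0-16
    return graded
```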

Human Validation

To ensure our AI grading was reliable, we created a system (sketched in code below) that:

  1. Exports all data to CSV files that human physicians can review

  2. Provides detailed scoring breakdowns for each response

  3. Lists specific issues found by the AI grader

  4. Includes the gold standard guidelines for comparison

This allows human physicians to:

  • Review each AI response against the guidelines

  • Compare their scores with the AI scores

  • Identify any discrepancies

  • Provide their own expert evaluation
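As a rough illustration of the export step referenced above, this sketch writes one CSV row per graded run; the column names are assumptions, not the exact schema.

```python
# Illustrative CSV export for human physician review.
import csv

def export_for_review(graded_runs: list[dict], path: str = "review_export.csv") -> None:
    fields = ["model", "prompt", "run", "section_score", "global_score",
              "llm_score", "issues", "gold_standard"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for run in graded_runs:
            row = dict(run)
            # Join the grader's list of issues into a single readable cell
            row["issues"] = "; ".join(row.get("issues", []))
            writer.writerow({k: row.get(k, "") for k in fields})
```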

Supplemental Materials

Detailed Results by Prompt

Stage 0/IA Melanoma (Melanoma in situ)

| Model | Run | SBERT | ROUGE | BLEU | LLM Score | Key Issues |
|---|---|---|---|---|---|---|
| ChatGPT | 1 | 0.744 | 0.127 | 0.006 | 11/16 | Suggested dermatoscopic evaluation and Mohs surgery (not in guidelines); omitted follow-up reference |
| ChatGPT | 2 | 0.751 | 0.178 | 0.009 | 15/16 | Omitted "category 1 for stage IA" specification; missing follow-up guideline reference |
| ChatGPT | 3 | 0.745 | 0.132 | 0.007 | 16/16 | Perfect score - no issues identified |
| OpenEvidence | 1 | 0.737 | 0.192 | 0.029 | 14/16 | Suggested non-surgical options for melanoma in situ |
| OpenEvidence | 2 | 0.738 | 0.171 | 0.010 | 11/16 | Omitted history and physical examination; missing "wide excision" specification |
| OpenEvidence | 3 | 0.725 | 0.158 | 0.007 | 14/16 | Suggested 9-10 mm margins (not specified in guidelines) |

Stage IB (T2a) Melanoma

| Model | Run | SBERT | ROUGE | BLEU | LLM Score | Key Issues |
|---|---|---|---|---|---|---|
| ChatGPT | 1 | 0.690 | 0.163 | 0.009 | 11/16 | Suggested high-resolution ultrasound; omitted clinical trial/observation options; missing Stage III referral |
| ChatGPT | 2 | 0.692 | 0.163 | 0.011 | 11/16 | Suggested baseline imaging; omitted clinical trial/observation; missing follow-up procedures |
| ChatGPT | 3 | 0.745 | 0.177 | 0.011 | 13/16 | Omitted clinical trial or observation options for negative SLNB |
| OpenEvidence | 1 | 0.758 | 0.201 | 0.023 | 16/16 | Perfect score - no issues identified |
| OpenEvidence | 2 | 0.742 | 0.214 | 0.026 | 16/16 | Perfect score - no issues identified |
| OpenEvidence | 3 | 0.661 | 0.161 | 0.003 | 16/16 | Perfect score - no issues identified |

Stage II (T2b or higher) Melanoma

| Model | Run | SBERT | ROUGE | BLEU | LLM Score | Key Issues |
|---|---|---|---|---|---|---|
| ChatGPT | 1 | 0.709 | 0.149 | 0.010 | 11/16 | Suggested baseline CT/PET imaging; omitted pembrolizumab/nivolumab for stage IIB/IIC |
| ChatGPT | 2 | 0.702 | 0.141 | 0.004 | 11/16 | Omitted specific adjuvant therapy options; missing clinical trial/observation |
| ChatGPT | 3 | 0.678 | 0.177 | 0.012 | 9/16 | Incorrect T2b definition (1.01-2.0 mm); suggested unnecessary imaging |
| OpenEvidence | 1 | 0.648 | 0.158 | 0.021 | 11/16 | Omitted specific conditions for baseline imaging; missing adjuvant therapy options |
| OpenEvidence | 2 | 0.683 | 0.174 | 0.023 | 13/16 | Omitted pembrolizumab/nivolumab for pathological stage IIB/IIC |
| OpenEvidence | 3 | 0.685 | 0.167 | 0.023 | 11/16 | Omitted specific adjuvant therapy options; missing follow-up procedures |

Citation Validity Supplemental Material

Invalid citations found

| Model | Prompt | Run | Type | Citation |
|---|---|---|---|---|
| ChatGPT | stage_0_ia | 1 | DOI | 10.1007/s12094-014-1218-5 |
| ChatGPT | stage_ib_t2a | 1 | DOI | 10.1001/jama.2017.16261 |
| ChatGPT | stage_ib_t2a | 3 | DOI | 10.1002/cncr.36683 |
| ChatGPT | stage_ib_t2a | 3 | DOI | 10.1200/JCO.2018.78.2005 |

Full citation list

| Model | Prompt | Run | Type | Citation | Valid | Year | Old |
|---|---|---|---|---|---|---|---|
| ChatGPT | stage_0_ia | 1 | DOI | 10.1007/s12094-014-1218-5 | INVALID | - | |
| ChatGPT | stage_0_ia | 1 | DOI | 10.3322/caac.21348 | VALID | 2016 | OLD |
| ChatGPT | stage_ib_t2a | 1 | DOI | 10.1002/cncr.32764 | VALID | 2020 | OLD |
| ChatGPT | stage_ib_t2a | 1 | DOI | 10.1001/jama.2017.16261 | INVALID | - | |
| ChatGPT | stage_ii_t2b_or_higher | 1 | DOI | 10.1007/978-3-319-40618-3_41 | VALID | 2017 | OLD |
| ChatGPT | stage_ii_t2b_or_higher | 1 | DOI | 10.1200/JCO.2009.23.4799 | VALID | 2009 | OLD |
| ChatGPT | stage_ii_t2b_or_higher | 1 | DOI | 10.1097/CMR.0000000000000743 | VALID | 2021 | |
| ChatGPT | stage_0_ia | 2 | DOI | 10.1007/978-3-319-40618-3_48 | VALID | 2017 | OLD |
| ChatGPT | stage_0_ia | 2 | DOI | 10.1016/j.jaad.2011.06.038 | VALID | 2012 | OLD |
| ChatGPT | stage_0_ia | 2 | DOI | 10.1001/jamadermatol.2013.7117 | VALID | 2014 | OLD |
| ChatGPT | stage_ib_t2a | 2 | DOI | 10.3322/caac.21392 | VALID | 2017 | OLD |
| ChatGPT | stage_ib_t2a | 2 | DOI | 10.1097/CMR.0000000000000785 | VALID | 2021 | |
| ChatGPT | stage_ii_t2b_or_higher | 2 | DOI | 10.1200/JCO.2016.67.1529 | VALID | 2016 | OLD |
| ChatGPT | stage_ii_t2b_or_higher | 2 | DOI | 10.1016/j.jaad.2018.02.022 | VALID | 2018 | OLD |
| ChatGPT | stage_ii_t2b_or_higher | 2 | DOI | 10.1016/j.jaad.2018.02.022 | VALID | 2018 | OLD |
| ChatGPT | stage_0_ia | 3 | DOI | 10.3322/caac.21388 | VALID | 2017 | OLD |
| ChatGPT | stage_0_ia | 3 | DOI | 10.1016/j.jaad.2018.03.037 | VALID | 2018 | OLD |
| ChatGPT | stage_0_ia | 3 | DOI | 10.1016/j.jaad.2018.03.037 | VALID | 2018 | OLD |
| ChatGPT | stage_ib_t2a | 3 | DOI | 10.1002/cncr.36683 | INVALID | - | |
| ChatGPT | stage_ib_t2a | 3 | DOI | 10.1200/JCO.2018.78.2005 | INVALID | - | |
| ChatGPT | stage_ib_t2a | 3 | DOI | 10.1056/NEJMoa1709030 | VALID | 2017 | OLD |
| ChatGPT | stage_ii_t2b_or_higher | 3 | DOI | 10.1007/978-3-319-40618-3 | VALID | 2017 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1001/jamadermatol.2023.4193 | VALID | 2023 | |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1001/jamasurg.2023.6904 | VALID | 2024 | |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 1 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1001/jamadermatol.2023.4193 | VALID | 2023 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1001/jamasurg.2023.6904 | VALID | 2024 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ii_t2b_or_higher | 3 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1001/jamasurg.2023.6904 | VALID | 2024 | |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ib_t2a | 1 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 1 | PMID | 31758078 | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1038/s41379-019-0402-x | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.3390/jcm13061607 | VALID | 2024 | |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1001/jamasurg.2023.6904 | VALID | 2024 | |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1097/PRS.0000000000002367 | VALID | 2016 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1007/s11912-019-0843-x | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 2 | DOI | 10.1001/jamanetworkopen.2022.50613 | VALID | 2023 | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1016/j.jaad.2018.08.055 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.3322/caac.21409 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1038/s41379-019-0402-x | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1056/NEJMra2034861 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.3390/jcm13061607 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1001/jamasurg.2023.6904 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1097/PRS.0000000000002367 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1007/s11912-019-0843-x | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1200/JCO.2017.75.7724 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 2 | URL | https://doi.org/10.1001/jamanetworkopen.2022.50613 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1038/s41379-019-0402-x | VALID | 2020 | OLD |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1001/jamasurg.2023.6904 | VALID | 2024 | |
| OpenEvidence | stage_ib_t2a | 3 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1016/j.jaad.2018.08.055 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.3322/caac.21409 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1038/s41379-019-0402-x | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1111/bjd.16892 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1200/JCO.2017.75.7724 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1001/jamasurg.2023.6904 | VALID | - | |
| OpenEvidence | stage_ib_t2a | 3 | URL | https://doi.org/10.1056/NEJMra2034861 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.1016/j.suc.2014.07.001 | VALID | 2014 | OLD |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.1001/jamadermatol.2016.2668 | VALID | 2016 | OLD |
| OpenEvidence | stage_0_ia | 1 | DOI | 10.1097/PRS.0000000000002367 | VALID | 2016 | OLD |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.1016/j.jaad.2018.08.055 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.3322/caac.21409 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.1056/NEJMra2034861 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.1016/j.suc.2014.07.001 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.1001/jamadermatol.2016.2668 | VALID | - | |
| OpenEvidence | stage_0_ia | 1 | URL | https://doi.org/10.1097/PRS.0000000000002367 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1056/NEJMra2034861 | VALID | 2021 | |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1016/j.suc.2014.07.001 | VALID | 2014 | OLD |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1001/jamadermatol.2016.2668 | VALID | 2016 | OLD |
| OpenEvidence | stage_0_ia | 2 | DOI | 10.1097/PRS.0000000000002367 | VALID | 2016 | OLD |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1016/j.jaad.2018.08.055 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1111/bjd.16892 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1056/NEJMra2034861 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1016/j.suc.2014.07.001 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1200/JCO.2017.75.7724 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1001/jamadermatol.2016.2668 | VALID | - | |
| OpenEvidence | stage_0_ia | 2 | URL | https://doi.org/10.1097/PRS.0000000000002367 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1016/j.jaad.2018.08.055 | VALID | 2019 | OLD |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1111/bjd.16892 | VALID | 2018 | OLD |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.3322/caac.21409 | VALID | 2017 | OLD |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1016/j.cps.2021.05.004 | VALID | 2021 | |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1016/j.suc.2014.07.001 | VALID | 2014 | OLD |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1200/JCO.2017.75.7724 | VALID | 2018 | OLD |
| OpenEvidence | stage_0_ia | 3 | DOI | 10.1016/j.jaad.2019.01.051 | VALID | 2019 | OLD |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1016/j.jaad.2018.08.055 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1111/bjd.16892 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.3322/caac.21409 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1016/j.cps.2021.05.004 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1016/j.suc.2014.07.001 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1200/JCO.2017.75.7724 | VALID | - | |
| OpenEvidence | stage_0_ia | 3 | URL | https://doi.org/10.1016/j.jaad.2019.01.051 | VALID | - | |