MSKCC Skin Tone Labeling Dataset

This dataset contains detailed skin tone annotations collected in a prospective, single-center observational study conducted at Memorial Sloan Kettering Cancer Center between 2023 and 2024. The cohort consists of 64 adult patients who underwent full-body skin examinations by board-certified dermatologists. To ensure diverse representation across the spectrum of skin tones, patients were recruited to achieve a balanced distribution across all six Fitzpatrick Skin Types. The dataset was developed to evaluate the reliability of different skin tone labeling methods and to support fairness research in dermatologic AI.

The dataset comprises both patient-level and site-level metadata for skin tone classification using the Fitzpatrick Skin Type (FST) scale, the Monk Skin Tone (MST) scale, the Pantone SkinTone Guide, and colorimeter readings (SkinColorCatch, Delfin Technologies). A total of 4,879 dermoscopic images are included. Skin tone assessments were collected at both lesional and non-lesional (normal skin) sites, each mapped to a standardized anatomic location. All skin lesions are assumed to be benign, as they were imaged immediately following dermatologic evaluation.

All data were collected under an IRB-approved protocol with informed consent. The dataset has been fully de-identified in accordance with HIPAA regulations, and no protected health information (PHI) is included.

Files

Description | Size | Type
The complete bundle of all images, metadata, and supplemental files related to this dataset. | 5.5 GB | ZIP
The metadata for this dataset. | 968.1 KB | CSV
Supplemental data dictionary | 7.0 KB | TXT
Characteristics of patients enrolled in the study. | 5.7 KB | CSV
Characteristics of skin sites analyzed in the study. | 71.8 KB | CSV
Analysis of inter-rater agreement for in-person Pantone and MST on both lesional and non-lesional sites. | 90.1 KB | CSV
Analysis of inter-rater agreement for in-person colorimeter device measurements on non-lesional sites. | 47.0 KB | CSV
Analysis of clustering of FST, MST, and Pantone classes in the 2-dimensional CIELAB color space on non-lesional sites. | 23.6 KB | CSV
Analysis of concordance between MST ratings of non-lesional sites performed by Rater 1 in-person versus through 3D total body photographs (TBP). | 55.7 KB | CSV
Analysis of colorimeter vs. image-extracted ITA in non-lesional sites. | 274.5 KB | CSV
Analysis of crowd-sourced FST vs. in-person FST rating. | 167.5 KB | CSV
Analysis of estimated risk scores from an AI algorithm (ADAE) | 34.6 KB | CSV
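
One of the files listed above compares colorimeter and image-extracted ITA. ITA is commonly expanded as the individual typology angle, usually defined from the CIELAB lightness L* and yellow-blue component b* as arctan((L* - 50) / b*) expressed in degrees. The sketch below illustrates that standard definition only. It is not taken from this dataset's own analysis code, the function name is hypothetical, and no column names from the CSV files are assumed.

```python
import math


def individual_typology_angle(l_star: float, b_star: float) -> float:
    """Return the ITA in degrees for CIELAB values L* and b*.

    Hypothetical helper illustrating the commonly used definition
    ITA = arctan((L* - 50) / b*) * 180 / pi.
    atan2 is used so that b* == 0 does not raise a division error;
    for b* > 0, the usual case for skin, it matches the arctan form.
    """
    return math.degrees(math.atan2(l_star - 50.0, b_star))


# Example with made-up CIELAB values (not from the dataset):
# lighter skin gives a larger angle, darker skin a smaller one.
lighter = individual_typology_angle(70.0, 20.0)
darker = individual_typology_angle(40.0, 20.0)
```

Higher ITA values correspond to lighter skin and lower values to darker skin, which is why the angle is often used as a continuous complement to categorical scales such as FST and MST.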

Dataset Details

Published
DOI
10.34970/962049
Images
4,879
Attributions
  • Memorial Sloan Kettering Cancer Center

Licenses

CC-BY

This content is free to use, modify, and share as long as you provide credit to the original creator.

How to Cite