MSKCC Skin Tone Labeling Dataset
Description:

This dataset contains detailed skin tone annotations collected in a prospective, single-center observational study conducted at Memorial Sloan Kettering Cancer Center from 2023 to 2024. The cohort consists of 64 adult patients who underwent full-body skin examinations by board-certified dermatologists. To ensure diverse representation across the spectrum of skin tones, patients were recruited to achieve a balanced distribution across all six Fitzpatrick Skin Types. This dataset was developed to evaluate the reliability of different skin tone labeling methods and to support fairness research in dermatologic AI.

The dataset comprises both patient-level and site-level metadata for skin tone classification using the Fitzpatrick Skin Type scale, Monk Skin Tone scale, Pantone SkinTone Guide, and colorimeter readings (SkinColorCatch, Delfin Technologies). A total of 4,879 dermoscopic images are included. Skin tone assessments were collected across both lesional and non-lesional (normal skin) sites, mapped to standardized anatomic locations. All skin lesions are assumed to be benign, as they were imaged immediately following dermatologic evaluation.
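
As an illustration of how the patient-level and site-level metadata described above might be combined, the following is a minimal sketch only. The field names (patient_id, fitzpatrick, anatomic_site, lesional, monk) and every value are hypothetical placeholders chosen for this example; they are not the dataset's actual schema or contents, so consult the released metadata files for the real column names.

```python
# Toy records only: field names and values are hypothetical placeholders,
# not the real schema or contents of the dataset.
from collections import Counter

# Patient-level metadata: one record per patient.
patients = [
    {"patient_id": "P01", "fitzpatrick": "II"},
    {"patient_id": "P02", "fitzpatrick": "V"},
]

# Site-level metadata: one record per imaged site, keyed by patient.
sites = [
    {"patient_id": "P01", "anatomic_site": "forearm", "lesional": False, "monk": 3},
    {"patient_id": "P01", "anatomic_site": "back", "lesional": True, "monk": 4},
    {"patient_id": "P02", "anatomic_site": "forearm", "lesional": False, "monk": 8},
]


def attach_patient_labels(patients, sites):
    """Copy each patient's label onto every site record for that patient."""
    by_id = {p["patient_id"]: p for p in patients}
    return [
        {**s, "fitzpatrick": by_id[s["patient_id"]]["fitzpatrick"]}
        for s in sites
    ]


joined = attach_patient_labels(patients, sites)

# Count site records per patient-level skin type, e.g. to check balance.
sites_per_type = Counter(row["fitzpatrick"] for row in joined)
print(sites_per_type)
```

Keeping the two levels separate and joining on a patient identifier lets one patient-level label be compared against several site-level assessments for the same person, which is the kind of comparison a labeling-reliability analysis would need.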

All data were collected under an IRB-approved protocol with informed consent. The dataset has been fully de-identified in accordance with HIPAA regulations, and no protected health information (PHI) is included.