MILK10k consists of 10,480 images: a clinical close-up and a dermatoscopic image for each of 5,240 lesions. The metadata include age (in 5-year intervals), sex, anatomic site, skin tone, diagnosis, the method used to establish ground truth (histopathology or other means), and, if a dermatoscopic image of the same lesion was previously included in ISIC, its ISIC identifier. Skin tone is categorized into six levels, from very dark (0) to very light (5); the scale is intentionally distinct from the Fitzpatrick skin types to avoid confusion. Most patients had skin tones in the middle ranges. Of the 5,240 lesions, 95.7% were biopsied or excised, with histopathology serving as the gold standard for diagnosis. Diagnoses were mapped to both the ISIC-Dx diagnostic scheme and a simplified classification based on the ISIC2018/2019 challenge and HAM10000 diagnostic categories. The dataset includes 11 broad diagnostic categories:

  1. Basal cell carcinoma (bcc)
  2. Melanocytic nevus (nv)
  3. Benign keratinocytic lesion (bkl)
  4. Squamous cell carcinoma/keratoacanthoma (sccka)
  5. Melanoma (mel)
  6. Actinic keratosis/intraepidermal carcinoma (akiec)
  7. Dermatofibroma (df)
  8. Inflammatory and infectious conditions (inf)
  9. Vascular lesions and hemorrhage (vasc)
  10. Other benign proliferations including collision tumors (ben_oth)
  11. Other malignant proliferations including collision tumors (mal_oth)
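
For readers working with the abbreviations programmatically, the category list above can be captured as a small lookup table. This is a minimal sketch: the codes and names come from the list, while the helper `category_name` is a hypothetical convenience function, not part of any ISIC tooling.

```python
# Abbreviations and names are copied from the category list above.
DIAGNOSTIC_CATEGORIES = {
    "bcc": "Basal cell carcinoma",
    "nv": "Melanocytic nevus",
    "bkl": "Benign keratinocytic lesion",
    "sccka": "Squamous cell carcinoma/keratoacanthoma",
    "mel": "Melanoma",
    "akiec": "Actinic keratosis/intraepidermal carcinoma",
    "df": "Dermatofibroma",
    "inf": "Inflammatory and infectious conditions",
    "vasc": "Vascular lesions and hemorrhage",
    "ben_oth": "Other benign proliferations including collision tumors",
    "mal_oth": "Other malignant proliferations including collision tumors",
}


def category_name(code: str) -> str:
    """Return the full category name for an abbreviation (case-insensitive).

    Hypothetical helper for illustration only.
    """
    key = code.strip().lower()
    if key not in DIAGNOSTIC_CATEGORIES:
        raise KeyError(f"unknown diagnostic category: {code!r}")
    return DIAGNOSTIC_CATEGORIES[key]
```

The lookup is case-insensitive because the capitalization of the codes in the distributed files is not stated here and may differ from the lowercase form shown in the list.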

We also provide the most specific ISIC-Dx diagnosis and its parent branch in the ISIC-Dx diagnostic tree. All images have been annotated using the MONET framework, and the metadata include probabilities for the following concept term groups:

  1. Ulceration, crust
  2. Hair
  3. Vasculature, vessels
  4. Erythema
  5. Pigmentation
  6. Gel, water drop, fluid, dermoscopy liquid
  7. Skin markings, pen ink, purple pen
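
One possible use of these probabilities is to flag images in which a concept is likely present, for example to screen for hair or pen markings before training. The sketch below assumes the probabilities have been read into a per-image dictionary; the column names in `MONET_CONCEPTS` and the 0.5 threshold are hypothetical, and the actual headers in the metadata CSV may differ.

```python
from typing import Dict, List

# Hypothetical column names, one per concept term group listed above.
# The real headers in the metadata file are not given here and may differ.
MONET_CONCEPTS = [
    "ulceration_crust",
    "hair",
    "vasculature_vessels",
    "erythema",
    "pigmentation",
    "gel_water_drop_fluid_dermoscopy_liquid",
    "skin_markings_pen_ink_purple_pen",
]


def flag_concepts(row: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the concept groups whose probability is at or above `threshold`.

    Concepts missing from `row` are treated as probability 0.
    """
    return [c for c in MONET_CONCEPTS if float(row.get(c, 0.0)) >= threshold]


# Made-up probabilities for a single image, for illustration only.
example_row = {"hair": 0.82, "erythema": 0.10, "pigmentation": 0.55}
flagged = flag_concepts(example_row)
```

The threshold is an arbitrary illustration; a suitable cut-off depends on the concept and on how the flags will be used.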

In addition to MILK10k, we have curated a smaller benchmark dataset, called MILK10k Benchmark, derived from the same sources and covering the same diagnostic categories. This dataset is available as part of a live challenge within the ISIC framework and can be accessed on ISIC.

Images were provided by the following institutions:

  • Department of Dermatology, Medical University of Vienna, Vienna, Austria
  • Medicine Faculty Department of Dermatology, Ankara University, Ankara, Turkey
  • Mayne Academy of General Practice, Medical School, The University of Queensland, Australia
  • Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, USA
  • Independent Researcher, 1000 Skopje, North Macedonia

Files

Description | Size | Type
The complete bundle of all images, metadata, and supplemental files related to this dataset. | 341.5 MB | ZIP
The metadata for this dataset. | 2.0 MB | CSV
Model input metadata containing non-diagnostic image-level attributes | 2.5 MB | CSV
Ground truth file containing one-hot encoded lesion-level diagnostic labels | 184.3 KB | CSV
Image-level diagnostic attributes absent from the model input file | 626.1 KB | CSV
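
Because the ground truth file stores one-hot encoded lesion-level labels, a common first step is to collapse each row back into a single category code. The following is a sketch under stated assumptions: the `lesion_id` column, the lowercase category column names, and the in-memory `SAMPLE` data are all hypothetical stand-ins for the real file, whose layout is not described here.

```python
import csv
import io

# Assumed column names: one column per diagnostic category abbreviation.
CATEGORY_CODES = [
    "bcc", "nv", "bkl", "sccka", "mel", "akiec",
    "df", "inf", "vasc", "ben_oth", "mal_oth",
]


def decode_one_hot(row, codes=CATEGORY_CODES):
    """Return the single category code whose column equals 1 in `row`."""
    active = [c for c in codes if float(row[c]) == 1.0]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active label, found {active}")
    return active[0]


# Tiny made-up sample standing in for the ground truth CSV.
SAMPLE = (
    "lesion_id," + ",".join(CATEGORY_CODES) + "\n"
    "lesion_a,0,0,0,0,1,0,0,0,0,0,0\n"
    "lesion_b,1,0,0,0,0,0,0,0,0,0,0\n"
)

# Map each (hypothetical) lesion identifier to its decoded category code.
labels = {
    r["lesion_id"]: decode_one_hot(r)
    for r in csv.DictReader(io.StringIO(SAMPLE))
}
```

To read the real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the column names to match its header. Raising on rows with zero or multiple active labels makes encoding problems visible early rather than silently picking a label.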

Dataset Details

Published
DOI: 10.34970/648456
Images: 10,480
Attributions: MILK study team

Licenses

CC-BY-NC

This content is free to use, modify, and share for non-commercial purposes, as long as you provide credit to the original creator.

How to Cite