Description:

MILK10k consists of 10,480 images: a paired clinical close-up and dermatoscopic image for each of 5,240 lesions. The metadata include age (in 5-year intervals), sex, anatomic site, skin tone, diagnosis, the method of ground-truth establishment (histopathology or other means), and, if a dermatoscopic image of the same lesion was previously included in ISIC, its corresponding ISIC identifier. Skin tone is categorized on a six-level scale ranging from very dark (0) to very light (5); this scale is intentionally distinct from the Fitzpatrick skin types to avoid confusion. Most patients had skin tones in the middle of this range. Of the 5,240 lesions, 95.7% were biopsied or excised, with histopathology serving as the gold standard for diagnosis. Diagnoses were mapped both to the ISIC-Dx diagnostic scheme and to a simplified classification based on the ISIC2018/2019 challenge and HAM10000 diagnostic categories. The dataset includes 11 broad diagnostic categories:

  1. Basal cell carcinoma (bcc)
  2. Melanocytic nevus (nv)
  3. Benign keratinocytic lesion (bkl)
  4. Squamous cell carcinoma/keratoacanthoma (sccka)
  5. Melanoma (mel)
  6. Actinic keratosis/intraepidermal carcinoma (akiec)
  7. Dermatofibroma (df)
  8. Inflammatory and infectious conditions (inf)
  9. Vascular lesions and hemorrhage (vasc)
  10. Other benign proliferations including collision tumors (ben_oth)
  11. Other malignant proliferations including collision tumors (mal_oth)
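As a rough illustration of how these fields fit together, the following is a minimal sketch that validates one lesion's metadata record. It assumes a hypothetical record layout: the `LesionRecord` class, its field names (`lesion_id`, `skin_tone`, `diagnosis`, `images`) and the image-type labels `"clinical"` and `"dermoscopic"` are illustrative only, not the official MILK10k column names. It checks that each lesion has exactly one clinical close-up and one dermatoscopic image, that skin tone lies on the 0 to 5 scale, and that the diagnosis uses one of the 11 category codes listed above.

```python
from dataclasses import dataclass, field

# The 11 broad diagnostic categories and their abbreviations, as listed above.
CATEGORIES = {
    "bcc": "Basal cell carcinoma",
    "nv": "Melanocytic nevus",
    "bkl": "Benign keratinocytic lesion",
    "sccka": "Squamous cell carcinoma/keratoacanthoma",
    "mel": "Melanoma",
    "akiec": "Actinic keratosis/intraepidermal carcinoma",
    "df": "Dermatofibroma",
    "inf": "Inflammatory and infectious conditions",
    "vasc": "Vascular lesions and hemorrhage",
    "ben_oth": "Other benign proliferations including collision tumors",
    "mal_oth": "Other malignant proliferations including collision tumors",
}

# MILK10k skin tone scale: 0 = very dark ... 5 = very light (not Fitzpatrick types).
SKIN_TONE_RANGE = range(0, 6)

# Hypothetical labels for the two image types that make up each lesion pair.
REQUIRED_IMAGE_TYPES = {"clinical", "dermoscopic"}


@dataclass
class LesionRecord:
    """Hypothetical per-lesion record; field names are illustrative only."""
    lesion_id: str
    skin_tone: int
    diagnosis: str
    images: dict[str, str] = field(default_factory=dict)  # image type -> image ID


def validate(record: LesionRecord) -> list[str]:
    """Return a list of problems found in one lesion record (empty if none)."""
    problems = []
    if record.skin_tone not in SKIN_TONE_RANGE:
        problems.append(f"skin_tone {record.skin_tone} outside 0-5")
    if record.diagnosis not in CATEGORIES:
        problems.append(f"unknown diagnosis code {record.diagnosis!r}")
    # Dict keys are unique, so set equality means exactly one image of each type.
    if set(record.images) != REQUIRED_IMAGE_TYPES:
        problems.append("expected exactly one clinical and one dermoscopic image")
    return problems


rec = LesionRecord("L0001", skin_tone=3, diagnosis="bcc",
                   images={"clinical": "IMG_A", "dermoscopic": "IMG_B"})
print(validate(rec))  # []
```

Because every valid record carries one image of each type, 5,240 such records account for all 2 × 5,240 = 10,480 images.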

Additionally, we provide the most specific ISIC-Dx diagnosis and its parent branch in the ISIC-Dx diagnostic tree. All images have also been annotated using the MONET framework, and the metadata include probabilities for the following concept term groups:

  1. Ulceration, crust
  2. Hair
  3. Vasculature, vessels
  4. Erythema
  5. Pigmentation
  6. Gel, water drop, fluid, dermoscopy liquid
  7. Skin markings, pen ink, purple pen
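One way these per-image concept probabilities might be used is to flag images in which a concept is likely present, for example to spot hair, gel or pen markings. The sketch below assumes hypothetical short keys for the seven concept groups (the actual metadata column names may differ) and an arbitrary 0.5 cutoff that is an illustrative choice, not a MILK10k recommendation. The probabilities in `example` are invented for illustration and are not taken from the dataset.

```python
# Hypothetical short keys for the seven MONET concept term groups listed above.
MONET_CONCEPTS = {
    "ulceration_crust": "Ulceration, crust",
    "hair": "Hair",
    "vasculature_vessels": "Vasculature, vessels",
    "erythema": "Erythema",
    "pigmentation": "Pigmentation",
    "gel_fluid": "Gel, water drop, fluid, dermoscopy liquid",
    "skin_markings": "Skin markings, pen ink, purple pen",
}


def flag_concepts(probs: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the concept groups whose probability is at or above the threshold.

    The default 0.5 cutoff is an arbitrary illustrative choice.
    """
    unknown = set(probs) - set(MONET_CONCEPTS)
    if unknown:
        raise KeyError(f"unrecognized concept keys: {sorted(unknown)}")
    # Preserve the order of the input mapping.
    return [MONET_CONCEPTS[key] for key, p in probs.items() if p >= threshold]


# Invented probabilities for a single image, for illustration only.
example = {"ulceration_crust": 0.12, "hair": 0.81, "vasculature_vessels": 0.40,
           "erythema": 0.55, "pigmentation": 0.93, "gel_fluid": 0.05,
           "skin_markings": 0.02}
print(flag_concepts(example))  # ['Hair', 'Erythema', 'Pigmentation']
```

Raising on unknown keys is a deliberate choice in this sketch, so that a mismatch between the assumed key names and a real metadata file fails loudly instead of silently returning no flags.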

In addition to MILK10k, we have curated a smaller benchmark dataset, MILK10k Benchmark, derived from the same sources and covering the same diagnostic categories. It is available on ISIC as part of a live challenge.

Images were provided by the following institutions:

  • Department of Dermatology, Medical University of Vienna, Vienna, Austria
  • Medicine Faculty Department of Dermatology, Ankara University, Ankara, Turkey
  • Mayne Academy of General Practice, Medical School, The University of Queensland, Australia
  • Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, USA
  • Independent Researcher, 1000 Skopje, North Macedonia